  • SNP Analysis

    Has anyone used any SNP analysis pipelines other than the standard variant pipeline? Any suggestions on good ones to try?

  • #2
    A truly brilliant question cmm8cmm8. Kudos. I was wondering exactly the same thing.

    • #3
      In reply to cmm8cmm8:
      We use most of the libraries from Bioconductor, a project that develops and shares open-source software for precise and repeatable analysis of biological data.

      There you will find libraries for several platforms, such as Affymetrix, Agilent, and Illumina.

      I hope this helps.

      • #4
        In reply to manoj.b:
        I think there's some confusion here between SNP genotyping on microarrays and SNP detection from sequencing data. I believe the Bioconductor packages are for the former, whereas the question was about the latter.

        Question: what do you consider "standard"? Only the vendors' pipelines? What about MAQ?

        I'm currently testing NextGENe, which is supposedly designed for SNP detection (well, really for mutation detection). I'd love to hear what other people use.

        • #5
          Thanks for the replies. By "standard" I was referring to the variant pipeline that comes with the instrument.

          • #6
            For SOLiD data, we use BFAST (admittedly my own aligner) [https://secure.genome.ucla.edu/index.php/BFAST]. Its output is converted to SAM format for use with samtools [http://samtools.sourceforge.net/].

            We then call SNPs with samtools using the MAQ consensus model, tuning its parameters on known data to reach the desired true-positive rate (TPR) and false-positive rate (FPR) for heterozygous calls.

            Nils
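One way to read "train on known data" is to score the caller's heterozygous calls against sites whose genotypes are already known (for example from a SNP array) and compare TPR and FPR across parameter settings. A minimal sketch, assuming the truth set and the calls are already loaded as Python collections keyed by (chromosome, position); the function name, key format, and genotype labels are hypothetical, and this is not code from samtools or MAQ.

```python
from typing import Dict, Set, Tuple

# Hypothetical site key: (chromosome, 1-based position).
Site = Tuple[str, int]


def het_call_rates(truth: Dict[Site, str], called_het: Set[Site]) -> Tuple[float, float]:
    """Return (TPR, FPR) for heterozygous calls, scored only at sites with a known genotype.

    truth maps each site to "het", "hom_ref" or "hom_alt" (hypothetical labels).
    called_het is the set of sites the caller reported as heterozygous.
    Calls at sites missing from truth are ignored.
    """
    tp = fn = fp = tn = 0
    for site, genotype in truth.items():
        is_het = genotype == "het"
        was_called = site in called_het
        if is_het and was_called:
            tp += 1
        elif is_het:
            fn += 1
        elif was_called:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Toy example: two known hets, two known hom-ref sites, one call outside the truth set.
truth = {("chr1", 100): "het", ("chr1", 200): "het",
         ("chr1", 300): "hom_ref", ("chr1", 400): "hom_ref"}
called = {("chr1", 100), ("chr1", 300), ("chr2", 5)}
tpr, fpr = het_call_rates(truth, called)
```

Running this once per parameter setting gives one (FPR, TPR) point per setting, which is what you compare when choosing parameters.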

            • #7
              Nilshomer,

              I recognize that your data is SOLiD, but I was wondering about your method for consensus calling, in which you "train on known data" to find the best parameter settings.

              I, too, am interested in doing this. I have a 1M SNP Illumina array and next-gen exome data from the Illumina GA2. What type of data did you train on?
              Which parameters did you find needed the most tweaking?
              Did you also find that the number of variants called by MAQ (or samtools, in your case) was very high? I get >180,000 variants in the cns.filter.snp file when using the parameters from easyrun. This seems like far too many, but I'm having difficulty distinguishing true variants from false positives.

              Looking forward to hearing your input...

              • #8
                We are novice bioinformaticists, so we use CLC bio's Genomics Workbench. The DIP (deletion-insertion polymorphism) algorithm works well. The SNP algorithm definitely detects known SNPs, and we are optimizing its settings for the best sensitivity and specificity. So far, when we maximize specificity using the X and Y chromosomes (where SNPs in male DNA samples should be homozygous), sensitivity drops and we miss too many known SNPs. Relaxing the criteria improves sensitivity, but then we get too many false positives.
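The X/Y check described above can be reduced to one number to track while tuning: among calls on the male sex chromosomes outside the pseudoautosomal regions (PARs), what fraction are heterozygous? A minimal sketch, assuming calls are already parsed into (chromosome, position, genotype) tuples; the function names and genotype labels are hypothetical, and the PAR interval in the usage lines is a placeholder, not the real coordinates for any reference build.

```python
from typing import Iterable, List, Tuple

# Hypothetical formats: a call is (chromosome, position, genotype) with genotype "het" or "hom";
# an interval is (chromosome, start, end), inclusive.
Call = Tuple[str, int, str]
Interval = Tuple[str, int, int]


def in_any(chrom: str, pos: int, intervals: Iterable[Interval]) -> bool:
    """True if (chrom, pos) falls inside any of the given intervals."""
    return any(c == chrom and start <= pos <= end for c, start, end in intervals)


def male_sex_chrom_het_fraction(calls: Iterable[Call], par: List[Interval],
                                sex_chroms: Tuple[str, ...] = ("chrX", "chrY")) -> float:
    """Fraction of male X/Y calls (outside the given PARs) that are heterozygous.

    Outside the PARs a male carries one copy of X and one of Y, so a true variant
    there should be called homozygous; heterozygous calls are a proxy for errors.
    """
    total = het = 0
    for chrom, pos, genotype in calls:
        if chrom not in sex_chroms or in_any(chrom, pos, par):
            continue  # autosomal or pseudoautosomal: not informative here
        total += 1
        if genotype == "het":
            het += 1
    return het / total if total else 0.0


# Usage with a placeholder PAR interval; substitute the PARs for your reference build.
par = [("chrX", 1, 1000)]
calls = [("chrX", 500, "het"), ("chrX", 5000, "hom"),
         ("chrX", 6000, "het"), ("chr1", 10, "het")]
fraction = male_sex_chrom_het_fraction(calls, par)
```

A lower fraction suggests fewer false positives, but as the post notes, it says nothing about sensitivity, so it is best tracked alongside the rate at which known SNPs are recovered.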

                • #9
                  In reply to erichpowell:
                  I would plot an ROC curve over the samtools parameters, scored at sites that the Illumina array genotyped as heterozygous, assuming no genotyping error (in reality about 1 in 10,000). I found varying the "-r" parameter to be the most valuable. Further filtering, such as requiring a variant to be seen on both strands with sufficient coverage and quality, also helps a lot. We applied all these methods in our paper (self-publicity).
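A minimal sketch of both ideas in this reply: a filter that keeps a candidate only if the variant is seen on both strands with enough depth and quality, and an ROC sweep that scores the surviving calls against array-genotyped sites, treating the array as error-free as suggested above and using array hets as positives and other genotyped sites as negatives. The record fields, default thresholds, and function names are hypothetical, and sweeping a single quality cutoff here stands in for varying samtools parameters such as `-r`, which would mean re-running the caller at each setting.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Site = Tuple[str, int]  # hypothetical key: (chromosome, position)


@dataclass
class Candidate:
    # Hypothetical per-site summary; fill it from whatever your caller or pileup reports.
    site: Site
    qual: float   # variant quality score
    depth: int    # total read depth at the site
    alt_fwd: int  # reads supporting the variant on the forward strand
    alt_rev: int  # reads supporting the variant on the reverse strand


def passes_filters(c: Candidate, min_depth: int = 10, min_qual: float = 20.0,
                   min_per_strand: int = 1) -> bool:
    """Require enough depth and quality, and variant support on both strands."""
    return (c.depth >= min_depth and c.qual >= min_qual
            and c.alt_fwd >= min_per_strand and c.alt_rev >= min_per_strand)


def roc_points(cands: List[Candidate], array_het: Set[Site], array_not_het: Set[Site],
               thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """(threshold, FPR, TPR) for each quality cutoff, scored only at array-genotyped sites."""
    points = []
    for t in thresholds:
        called = {c.site for c in cands if passes_filters(c, min_qual=t)}
        tp = len(called & array_het)      # array hets we called
        fp = len(called & array_not_het)  # array non-hets we called anyway
        tpr = tp / len(array_het) if array_het else 0.0
        fpr = fp / len(array_not_het) if array_not_het else 0.0
        points.append((t, fpr, tpr))
    return points


cands = [
    Candidate(("chr1", 100), qual=50.0, depth=30, alt_fwd=8, alt_rev=7),
    Candidate(("chr1", 200), qual=25.0, depth=20, alt_fwd=5, alt_rev=4),
    Candidate(("chr1", 300), qual=30.0, depth=25, alt_fwd=9, alt_rev=0),  # one strand only
    Candidate(("chr1", 400), qual=22.0, depth=15, alt_fwd=3, alt_rev=2),
]
array_het = {("chr1", 100), ("chr1", 200), ("chr1", 300)}
array_not_het = {("chr1", 400), ("chr1", 500)}
points = roc_points(cands, array_het, array_not_het, [20.0, 40.0])
```

Plotting TPR against FPR over a finer grid of cutoffs gives the ROC curve; the strand filter removes the one-strand candidate at chr1:300 regardless of its quality.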

                  • #10
                    We use the BioScope software from Life Technologies, which implements the diBayes algorithm. If you fiddle a bit with the settings, it seems to work reasonably well, but we have not yet done any thorough testing.

                    • #11
                      I have been using CLC to detect SNPs from two SAM files, but I am having big problems getting everything running. Is there a package in Bioconductor, or another free tool, that I can use?

                      Thanks for your help.

                      • #12
                        Hi,

                        Without using LifeTech's BioScope/LifeScope, I think the following pipeline can be applied to SOLiD data for SNP/indel detection.

                        1) *.csfasta+*.qual / *.XSQ -> SAM/BAM
                        BFAST, BWA, or NovoalignCS

                        2) SAM/BAM -> SNP/indel detection
                        SAMtools or GATK (more accurate)

                        3) Annotation
                        GATK or ANNOVAR

                        I think SAMtools and GATK do not use color-space information to detect SNPs/indels. That is one of the advantages of BioScope/LifeScope.
                        Last edited by HiroMishima; 10-31-2011, 04:44 PM.
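To illustrate what "color-space information" means: SOLiD reads encode each pair of adjacent bases as one of four colors. A minimal sketch of the standard two-base encoding (A=0, C=1, G=2, T=3; color = XOR of the two base codes) shows that a true SNP changes two adjacent colors, while a single color-call error changes only one and, if decoded naively, corrupts every base after it. That difference is the kind of evidence a color-aware caller can use. The function names are hypothetical, and the sketch ignores the primer base and quality values found in real .csfasta/.qual data.

```python
from typing import List

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode(seq: str) -> List[int]:
    """One color per adjacent base pair: XOR of the two 2-bit base codes."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: List[int]) -> str:
    """Rebuild bases from a known first base; each base depends on the one before it."""
    bases = [first_base]
    for c in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ c])
    return "".join(bases)


ref = "ACGTTA"
snp = "ACATTA"  # G -> A at index 2
ref_colors = encode(ref)
snp_colors = encode(snp)
# Indices where the colors differ: a true SNP shows up as two adjacent changes.
changed = [i for i, (x, y) in enumerate(zip(ref_colors, snp_colors)) if x != y]

# A single color-call error instead: only one color differs from the reference.
error_colors = list(ref_colors)
error_colors[1] = 0
naive_bases = decode("A", error_colors)  # every base after the error is wrong
```

Here `changed` is `[1, 2]` for the SNP, while the error read differs from the reference colors at only one position yet decodes to a base sequence that is wrong from the error onward.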

                        • #13
                          In reply to HiroMishima:
                          I was wondering about this, too. Thanks for the information!
