
  • Which Pipeline is Correct??

    Dear All,

    I'd be very grateful for any advice on which (if any) of the following pipelines to use for SNP discovery. I find a different number of SNPs with each approach, even though the same SNP filters are applied throughout.

    Raw FastQ File --> Direct Mapping to Reference --> SNPs discovered 1,200

    Raw FastQ File --> De Novo Assembly --> Extract Paired Contigs --> Map Paired Contigs to Reference --> SNPs discovered 1,089

    Raw FastQ File --> De Novo Assembly --> Extract All Contigs --> Map Contigs to Reference --> SNPs discovered 1,383

    How can I know which SNPs are correct? Would it be useful to test the quality of the corresponding consensus sequence (for sets of known nucleotides), or would that not help to judge which is most accurate?

    Best wishes lg36

  • #2
    It does depend on the quality of your reference, but I would say the first. I see no reason to do a de novo assembly, especially for SNP discovery, when you can do a mapping instead. De novo will always give worse and more questionable results.

    As for how to determine which ones are correct ... back to the lab you go! Independent verification via a different methodology is the definitive proof. Oh, you can also take your sequencing results, apply statistical filters to them, and list the SNPs that fall into, say, p<0.05, but where is the fun in that?
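The post does not say which statistical filter is meant. As one simple possibility, here is a minimal sketch of a one-sided binomial test: given the read depth at a site and the number of reads supporting the alternate allele, it asks how likely that many non-reference reads would be from sequencing error alone. The function names, the 1% per-base error rate, and the example counts are all assumptions made for illustration; real SNP callers use more elaborate models.

```python
from math import comb


def binom_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def passes_filter(alt_reads, depth, error_rate=0.01, alpha=0.05):
    """Keep a candidate SNP if its alt-read count is unlikely under error alone.

    error_rate and alpha are illustrative defaults, not values from the thread.
    """
    return binom_tail(alt_reads, depth, error_rate) < alpha


if __name__ == "__main__":
    # Hypothetical sites: (alt-supporting reads, total depth)
    for alt_reads, depth in [(10, 20), (1, 20)]:
        p_value = binom_tail(alt_reads, depth, 0.01)
        print(alt_reads, depth, p_value, passes_filter(alt_reads, depth))
```

A filter like this only ranks calls by how well the reads support them; it does not replace the independent lab verification mentioned above.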


    • #3
      A mapping-based approach may be confused by long indels or large-scale changes, which leads to false SNPs. This is not that infrequent in human SNP discovery. Assembly can do better in such cases because it takes advantage of between-read information more effectively.

      On the other hand, although I believe an assembly-based approach is advantageous in theory for small genomes, many existing assemblers and contig aligners are not fine-tuned for assembly-based SNP discovery. On Illumina data, for which the tool chain is relatively complete and mature, the overall accuracy of mapping-based calls is likely to be better unless you are very careful about the assembly.

      Anyway, which is better depends heavily on how you did the analysis and on the divergence from your reference. We cannot tell from the numbers alone. I recommend you look at the calls unique to one set in IGV/tview and get a sense for yourself. This is the cheapest yet a very effective way to answer your own question.
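To find the calls unique to one set before opening them in a viewer, the three call sets can be compared as plain sets. The sketch below assumes each SNP is keyed by a hypothetical `(chrom, pos, ref, alt)` tuple and uses made-up example calls; the pipeline names and the `unique_calls` helper are illustrative, not part of any tool mentioned in the thread.

```python
def unique_calls(call_sets):
    """For each named call set, return the SNPs that appear in no other set."""
    unique = {}
    for name, calls in call_sets.items():
        # Union of every other pipeline's calls
        others = set().union(*(c for n, c in call_sets.items() if n != name))
        unique[name] = calls - others
    return unique


# Made-up calls keyed as (chrom, pos, ref, alt); real input would come from
# each pipeline's variant output.
call_sets = {
    "direct_mapping": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")},
    "paired_contigs": {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A")},
    "all_contigs": {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A"), ("chr2", 900, "T", "C")},
}

shared = set.intersection(*call_sets.values())
unique = unique_calls(call_sets)

if __name__ == "__main__":
    print("shared by all:", sorted(shared))
    for name, calls in unique.items():
        print(name, "only:", sorted(calls))
```

The positions in each "only" list are the ones worth inspecting by eye, since they are where the pipelines disagree.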


      • #4
        Thanks both so much for your help. Just to give you some more information, the second method gives me a consensus sequence that best matches the consensus sequence we have already PCR'd in the lab via a different method. Does this make method 2 more correct than method 1 or 3?
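One way to put a number on "most representative" is to count the positions where each pipeline's consensus disagrees with the PCR-verified sequence. This is a minimal sketch that assumes the sequences are already aligned and of equal length; the sequences, method labels, and the `mismatches` helper are invented for illustration.

```python
def mismatches(consensus, verified):
    """Return 0-based positions where two pre-aligned sequences differ."""
    if len(consensus) != len(verified):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (c, v) in enumerate(zip(consensus, verified)) if c != v]


# Invented example: a short lab-verified region and three pipeline consensuses.
verified = "ACGTACGTAC"
consensus_by_method = {
    "method_1": "ACGTTCGTAC",
    "method_2": "ACGTACGTAC",
    "method_3": "ACGAACGTAT",
}

report = {name: mismatches(seq, verified) for name, seq in consensus_by_method.items()}

if __name__ == "__main__":
    for name, positions in report.items():
        print(name, len(positions), "mismatches at", positions)
```

Agreement over the verified region is evidence about that region only, so it is probably best read alongside the rest of the calls rather than as a verdict on the whole pipeline.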


        • #5
          Hi lg36,

          Which tools did you use for mapping and SNP discovery?


          • #6
            Hi lg36
            I'm sort of having the same doubt. How did you solve this issue?
            best wishes
            Jorge
