SEQanswers

lg36 06-01-2012 06:27 PM

Which Pipeline is Correct??
 
Dear All,

I'd be very grateful for any advice on which (if any) of the following pipelines to use for SNP discovery. Each approach gives me a different number of SNPs, even though the same SNP filters are applied throughout.

Raw FastQ File --> Direct Mapping to Reference --> SNPs discovered 1,200

Raw FastQ File --> De Novo Assembly --> Extract Paired Contigs --> Map Paired Contigs to Reference --> SNPs discovered 1,089

Raw FastQ File --> De Novo Assembly --> Extract All Contigs --> Map Contigs to Reference --> SNPs discovered 1,383

How can I know which SNPs are correct? Would it be useful to test the quality of the corresponding consensus sequence (for sets of known nucleotides?), or would that not help to judge which is most accurate?

Best wishes lg36

westerman 06-02-2012 04:26 AM

It does depend on the quality of your reference, but I would say the first. I see no reason to do a de-novo assembly, especially for SNP discovery, when you can do a mapping instead. De-novo will always give worse and more questionable results.

As for how to determine which ones are correct ... back to the lab you go! Independent verification via different methodology is the definitive proof. Oh, you can also take your sequencing results, apply statistical filters to them, and list the SNPs that fall into, say, p<0.05, but where is the fun in that?
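For what it's worth, here is a minimal sketch of what such a statistical filter could look like. It assumes (this is not stated in the thread) that each call carries a Phred-scaled quality Q = -10 * log10(p), where p is the estimated probability that the call is wrong, as the QUAL column of a VCF file does. The function names and the (chrom, pos, qual) tuple layout are made up for illustration.

```python
import math

def phred_to_error_prob(qual):
    """Convert a Phred-scaled quality Q into an error probability p = 10^(-Q/10)."""
    return 10 ** (-qual / 10)

def error_prob_to_phred(p):
    """Convert an error probability p into a Phred-scaled quality Q = -10*log10(p)."""
    return -10 * math.log10(p)

def filter_by_error_prob(calls, max_p=0.05):
    """Keep calls whose estimated error probability is below max_p.

    `calls` is an iterable of (chrom, pos, qual) tuples; this layout is a
    hypothetical one chosen for the example.
    """
    return [c for c in calls if phred_to_error_prob(c[2]) < max_p]

if __name__ == "__main__":
    # p < 0.05 corresponds to a quality just above 13
    print(round(error_prob_to_phred(0.05), 2))
    example = [("chr1", 100, 10.0), ("chr1", 250, 30.0), ("chr2", 42, 13.0)]
    print(filter_by_error_prob(example))
```

Under that assumption, p<0.05 is a rather lenient cut-off (roughly Q13), so it says little about which of the three pipelines is closer to the truth.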

lh3 06-02-2012 12:13 PM

A mapping-based approach may be confused by long indels or large-scale changes, which leads to false SNPs. This is not that infrequent in human SNP discovery. Assembly can do better in such cases, as it takes advantage of between-read information more effectively.

On the other hand, although I believe an assembly-based approach is advantageous in theory for small genomes, many existing assemblers and contig aligners are not fine-tuned for assembly-based SNP discovery. On Illumina data, for which the tool chain is relatively complete and mature, the overall accuracy of mapping-based calls is likely to be better unless you are very careful about the assembly.

Anyway, which is better depends heavily on how you did the analysis and on the divergence from your reference. We cannot tell just from the numbers. I recommend you look at calls unique to one set in IGV/tview and get a sense for yourself. This is the cheapest yet a very effective way to answer your own question.
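To find the calls unique to one set, something along these lines might do. It is only a sketch and assumes the three call sets are available as VCF-style, tab-separated lines (CHROM, POS, ID, REF, ALT, ...), which the thread does not actually say. The names `Snp`, `parse_vcf_lines`, `unique_calls` and `igv_region` are invented for the example.

```python
from typing import Dict, Iterable, List, NamedTuple, Set

class Snp(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str

def parse_vcf_lines(lines: Iterable[str]) -> Set[Snp]:
    """Collect single-base substitutions from VCF-style lines (assumed format)."""
    snps = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            # keep SNPs only: one reference base replaced by one alternate base
            if len(ref) == 1 and len(alt) == 1 and alt != ".":
                snps.add(Snp(chrom, pos, ref.upper(), alt.upper()))
    return snps

def unique_calls(call_sets: Dict[str, Set[Snp]]) -> Dict[str, List[Snp]]:
    """For each named call set, list the SNPs found in no other set."""
    result = {}
    for name, calls in call_sets.items():
        others = set().union(*(s for n, s in call_sets.items() if n != name))
        result[name] = sorted(calls - others)
    return result

def igv_region(snp: Snp, flank: int = 50) -> str:
    """Locus string (chrom:start-end) around a SNP, for pasting into IGV."""
    return f"{snp.chrom}:{max(1, snp.pos - flank)}-{snp.pos + flank}"

if __name__ == "__main__":
    direct = parse_vcf_lines(["chr1\t100\t.\tA\tG", "chr1\t500\t.\tC\tT"])
    paired = parse_vcf_lines(["chr1\t100\t.\tA\tG"])
    for name, snps in unique_calls({"direct": direct, "paired": paired}).items():
        print(name, [igv_region(s) for s in snps])
```

The loci printed for each set are the ones worth inspecting by eye first, since they are where the pipelines disagree.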

lg36 06-02-2012 12:38 PM

Thanks both so much for your help. Just to give you some more information, the second method gives me a consensus sequence that is the most representative of the consensus sequence we have already PCR'd in the lab via a different method. Does this make method 2 more correct than method 1 or 3?
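To put a number on "most representative", one option is to score each pipeline's consensus against the lab sequence. The sketch below assumes the two sequences have already been aligned to the same length (gaps written as "-"), and the function name `identity` is just illustrative; it is not the method used in the thread.

```python
def identity(consensus, validated):
    """Fraction of matching bases between two pre-aligned sequences.

    Positions where either sequence has an N or a gap are skipped, so
    uncalled bases are neither rewarded nor penalised.
    """
    if len(consensus) != len(validated):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(consensus.upper(), validated.upper()):
        if a in "N-" or b in "N-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else float("nan")

if __name__ == "__main__":
    lab = "ACGTACGT"  # toy stand-in for the PCR'd sequence
    for name, seq in {"method 1": "ACGTACGA", "method 2": "ACGTACGT"}.items():
        print(name, identity(seq, lab))
```

A higher identity over the PCR'd region only speaks to that region, so it is supporting evidence for a method rather than proof that its SNPs are correct genome-wide.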

milo0615 01-21-2016 03:37 PM

Hi lg36,

Which tools did you use for mapping and SNP discovery?

jorge-bariloche 09-21-2016 02:37 AM

Hi lg36,
I'm having much the same doubt. How did you solve this issue?
Best wishes,
Jorge

