SEQanswers

Old 06-01-2012, 06:27 PM   #1
lg36
Member
 
Location: london

Join Date: Mar 2012
Posts: 12
Which Pipeline is Correct??

Dear All,

I'd be very grateful for any advice on which (if any) of the following pipelines to use for SNP discovery. Each approach gives me a different number of SNPs, even though the same SNP filters are applied throughout.

Raw FastQ File --> Direct Mapping to Reference --> SNPs discovered 1,200

Raw FastQ File --> De Novo Assembly --> Extract Paired Contigs --> Map Paired Contigs to Reference --> SNPs discovered 1,089

Raw FastQ File --> De Novo Assembly --> Extract All Contigs --> Map Contigs to Reference --> SNPs discovered 1,383

How can I know which SNPs are correct? Would it be useful to test the quality of the corresponding consensus sequence (for sets of known nucleotides), or would that not help to judge which is most accurate?

Best wishes lg36
Old 06-02-2012, 04:26 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

It does depend on the quality of your reference, but I would say the first. I see no reason to do a de novo assembly, especially for SNP discovery, when you can do a mapping instead. De novo will always give worse and more questionable results.

As for how to determine which ones are correct ... back to the lab you go! Independent verification via a different methodology is the definitive proof. Oh, you can also take your sequencing results, apply statistical filters to them, and list the SNPs that fall into, say, p<0.05, but where is the fun in that?
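One way to read the "statistical filter" idea above is as a threshold on call quality. The sketch below assumes your SNP caller reports a Phred-scaled quality (as the VCF QUAL column does, where QUAL = -10 * log10 of the probability that the call is wrong). The function names and the (label, qual) tuple layout are made up for illustration, and a quality cutoff is only a rough stand-in for a proper significance test.

```python
import math

def phred_to_error_prob(qual):
    """Convert a Phred-scaled quality into the probability that the call is wrong."""
    return 10 ** (-qual / 10)

def error_prob_to_phred(p):
    """Convert an error probability into a Phred-scaled quality."""
    return -10 * math.log10(p)

def filter_by_error_prob(calls, max_p=0.05):
    """Keep (label, qual) calls whose implied error probability is below max_p."""
    return [call for call in calls if phred_to_error_prob(call[1]) < max_p]

# Hypothetical calls: (label, Phred-scaled quality)
example_calls = [("snp_a", 30.0), ("snp_b", 10.0), ("snp_c", 13.5)]
kept = filter_by_error_prob(example_calls, max_p=0.05)
```

Under that assumption, p<0.05 corresponds to a quality just above 13, which is a fairly permissive cutoff.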
Old 06-02-2012, 12:13 PM   #3
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

A mapping-based approach may be confused by long indels or large-scale changes, which leads to false SNPs. This is not that infrequent in human SNP discovery. Assembly can do better in such cases, as it takes more effective advantage of between-read information.

On the other hand, although I believe an assembly-based approach is advantageous in theory for small genomes, many existing assemblers and contig aligners are not fine-tuned for assembly-based SNP discovery. On Illumina data, for which the tool chain is relatively complete and mature, the overall accuracy of mapping-based calls is likely to be better unless you are very careful about the assembly.

Anyway, which is better depends heavily on how you did the analysis and on the divergence from your reference. We cannot tell just from the numbers. I recommend you look at the calls unique to one set in IGV/tview and get a sense for yourself. This is the cheapest yet a very effective way to answer your own question.
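To get the list of calls unique to one set before opening IGV/tview, something like the following could work. It is only a sketch: it assumes each pipeline's calls are available as VCF text lines with the standard tab-separated CHROM, POS, ID, REF, ALT columns, and the names `Snp`, `parse_vcf_lines` and `unique_calls` are invented for this example.

```python
from typing import NamedTuple

class Snp(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str

def parse_vcf_lines(lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines, skipping headers."""
    calls = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        # A record may list several alternate alleles separated by commas.
        for allele in alt.split(","):
            calls.add(Snp(chrom, int(pos), ref, allele))
    return calls

def unique_calls(call_sets):
    """For each named call set, return the sorted calls found in no other set."""
    unique = {}
    for name, calls in call_sets.items():
        others = set().union(*(c for n, c in call_sets.items() if n != name))
        unique[name] = sorted(calls - others)
    return unique
```

The positions returned for each pipeline are the ones worth inspecting by eye, since calls shared by all three sets are less likely to be pipeline artifacts.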
Old 06-02-2012, 12:38 PM   #4
lg36
Member
 
Location: london

Join Date: Mar 2012
Posts: 12

Thanks both so much for your help. Just to give you some more information: the second method gives me the consensus sequence that is most representative of the consensus sequence we have already PCR'd in the lab via a different method. Does this make method 2 more correct than method 1 or 3?
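The comparison against the lab sequence can be made concrete by counting mismatches per pipeline. This is a minimal sketch, assuming each consensus has already been aligned to the PCR-validated sequence so that the two strings are the same length and position-matched. The function names are hypothetical, and positions where either sequence has an N are skipped.

```python
def mismatches(consensus, validated):
    """List (1-based position, consensus base, validated base) where the two differ."""
    if len(consensus) != len(validated):
        raise ValueError("sequences must be pre-aligned to the same length")
    pairs = zip(consensus.upper(), validated.upper())
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(pairs)
        if a != b and "N" not in (a, b)
    ]

def identity(consensus, validated):
    """Fraction of compared (non-N) positions at which the two sequences agree."""
    compared = sum(
        1 for a, b in zip(consensus.upper(), validated.upper()) if "N" not in (a, b)
    )
    if compared == 0:
        raise ValueError("no comparable positions")
    return 1 - len(mismatches(consensus, validated)) / compared
```

A higher identity for one method only says that its consensus agrees better with the validated region; it does not by itself confirm SNP calls elsewhere in the genome.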
Old 01-21-2016, 03:37 PM   #5
milo0615
Member
 
Location: Walnut, California

Join Date: Dec 2012
Posts: 39

Hi lg36,

Which tools did you use for mapping and SNP discovery?

Old 09-21-2016, 02:37 AM   #6
jorge-bariloche
Junior Member
 
Location: Argentina

Join Date: Oct 2014
Posts: 4

Hi lg36
I'm sort of having the same doubt. How did you solve this issue?
best wishes
Jorge