SEQanswers

Old 05-30-2012, 07:07 AM   #1
lg36
Member
 
Location: london

Join Date: Mar 2012
Posts: 12
Does a reliable consensus mean more reliable SNPs?

Dear All,

I'm relatively new to WGS analysis so please excuse any naivety on my part.

Before getting the WGS sequences I have confirmed the presence or absence of certain oligonucleotides in various bacterial DNA samples. So I know I should see these sequences in the final consensus sequence.

Is it true to think that if I can produce a more reliable consensus sequence then the SNP calls are also likely to be more reliable? I appreciate that there are many SNP quality filters etc. that will also be applied and that can lead to differences between a consensus and a SNP call, but I just wanted to get an idea of the overall correlation between the consensus and SNPs.

If there is a high correlation between the two then surely if I make sure that my consensus sequences are as reliable as possible, when I come to calling the SNPs from the same mapped reads they will be more reliable?

Apologies if I'm totally wrong about this.

Best wishes lg36
Old 06-04-2012, 12:36 AM   #2
mbayer
Member
 
Location: Dundee, Scotland

Join Date: Mar 2009
Posts: 29

Hi lg36,

you're not wrong about this at all -- this is in fact a pretty important factor in SNP discovery.

Your SNPs can only ever be as good as your reference and your mapping. If your reference contains errors, these will propagate right through into your SNP calls, and similarly if you mismap lots of reads you will also increase your false positive SNP rate.

I routinely map the reads from the individual used to make the reference back to the reference before I do any mapping of other individuals onto that reference for SNP discovery. I then call SNPs on that mapping first, and I always get SNPs here.

In a homozygous or haploid organism this will give you a list of positions where the reference most likely contains errors -- in an ideal case there should be zero SNPs when I map the reads back onto the reference that was made from the same reads. I don't know what you work with, but I am fortunate in that I do a lot of work with cultivated barley, which is essentially homozygous, and that obviously simplifies matters.

I then subtract the list of SNPs called there from any list of SNPs generated with reads from a different individual -- it's essentially a way of removing background noise. I guess if you have a heterozygous organism and it's well curated you could probably use a public, curated list of SNPs instead.
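For illustration only, the subtraction step could look something like the sketch below. It assumes both SNP lists are tab-separated, VCF-style lines and that a SNP counts as background if its chromosome and position match, whatever the allele. The function names and the toy records are hypothetical and are not taken from any particular tool.

```python
def parse_snp_positions(vcf_lines):
    """Collect (chrom, pos) keys from VCF-style text lines, skipping headers."""
    positions = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        positions.add((fields[0], int(fields[1])))
    return positions


def subtract_background(sample_lines, background_lines):
    """Keep sample SNP records whose position is absent from the background list."""
    background = parse_snp_positions(background_lines)
    kept = []
    for line in sample_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if (fields[0], int(fields[1])) not in background:
            kept.append(line.rstrip("\n"))
    return kept


# Hypothetical toy data: SNPs called from the reference individual's own reads
background_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1H\t100\t.\tA\tG",
    "chr1H\t250\t.\tC\tT",
]
# Hypothetical toy data: SNPs called from a different individual
sample_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1H\t100\t.\tA\tG",  # also in the background list, so dropped
    "chr1H\t300\t.\tG\tA",
    "chr2H\t250\t.\tC\tT",  # same position number, different chromosome, so kept
]

filtered = subtract_background(sample_vcf, background_vcf)
for record in filtered:
    print(record)
```

Matching on position alone is the stricter choice: it discards any site where the reference looks doubtful, which is also where the possible increase in false negatives comes from.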

This gives you much cleaner SNP sets and reduces the false positive rate but the caveat is that potentially you may be increasing your false negative rate (I don't have any data on this yet). It all depends on what your SNPs are for - if reliability is key, then this works well. You may also want to remove duplicates from the mapping -- that also reduces your FP rate.
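To make the duplicate removal idea concrete, here is a deliberately simplified sketch. It assumes single-end reads and treats reads with the same chromosome, start position and strand as duplicates, keeping the one with the highest mapping quality. The Read tuple, the remove_duplicates function and the toy reads are hypothetical; real duplicate-marking tools use more careful rules, so in practice you would run one of those on your BAM file.

```python
from collections import namedtuple

# Hypothetical minimal read record; real alignments carry many more fields
Read = namedtuple("Read", ["name", "chrom", "start", "strand", "mapq"])


def remove_duplicates(reads):
    """Keep one read per (chrom, start, strand), preferring the highest mapq."""
    best = {}
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    return sorted(best.values(), key=lambda r: (r.chrom, r.start, r.strand))


# Hypothetical toy reads
reads = [
    Read("r1", "chr1H", 100, "+", 60),
    Read("r2", "chr1H", 100, "+", 30),  # same start and strand as r1
    Read("r3", "chr1H", 100, "-", 40),  # opposite strand, not a duplicate
    Read("r4", "chr1H", 180, "+", 20),
]

deduplicated = remove_duplicates(reads)
for read in deduplicated:
    print(read.name, read.chrom, read.start, read.strand, read.mapq)
```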

cheers

Micha