SEQanswers

Old 02-11-2013, 01:57 AM   #1
dan
wiki wiki
 
Location: Cambridge, England

Join Date: Jul 2008
Posts: 266
Variation calling, pool samples or not?

I seem to remember from talks and manuals that it's good practice to pool reads from all available samples when calling (per-sample) variations, as the added depth improves call statistics.

Can anyone point me to literature or discussion on this specific point? I can't find anything concrete.


Cheers,
Dan.
__________________
Homepage: Dan Bolser
MetaBase the database of biological databases.
Old 02-11-2013, 06:04 AM   #2
severin
Genome Informatics Facility
 
Location: Iowa @isugif

Join Date: Sep 2009
Posts: 105
GATK best practices

Quote:
Originally Posted by dan
I seem to remember from talks and manuals that it's good practice to pool reads from all available samples when calling (per-sample) variations, as the added depth improves call statistics.

Can anyone point me to literature or discussion on this specific point? I can't find anything concrete.


Cheers,
Dan.
I was literally just reading about that here.

http://www.broadinstitute.org/gatk/g...best-practices

I'm not sure what is recommended for the Haplotype Caller, though, as it is still a little experimental.
Old 02-11-2013, 06:19 AM   #3
dan

Nice link. It's very clear, but there isn't much detail, e.g. why are samples called together?

Not surprised that haplotype calling is up in the air ;-)


Cheers,
Old 02-11-2013, 06:22 AM   #4
dan

Here is a good answer :-)
http://www.biostars.org/p/10926/
Old 02-11-2013, 06:59 AM   #5
dgscofield
Member
 
Location: Uppsala, Sweden

Join Date: Nov 2010
Posts: 27

Especially if you're after allele frequency spectra, estimators are generally better for pooled samples. The 2010 Genetics article by Futschik and Schlötterer will get you started.

One common-sense statistical issue with calling SNPs in individuals is that allele-specific coverage in any sample fluctuates stochastically above and below its expected value, and there is nothing you can do about that. Pooling samples reduces the influence of this error term relative to the detection threshold for reasonably frequent alleles. If there is weak evidence for a SNP in a single individual considered alone, but that same SNP is segregating at reasonable frequency within the population, that prior knowledge strengthens the evidence for the SNP in the individual.
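
To make the last point concrete, here is a minimal Python sketch of how a population-level allele frequency can act as a prior on one individual's genotype. It is a toy model and not what GATK or any other caller actually implements: the function names, the 1% per-read error rate, the binomial read model, and the Hardy-Weinberg prior are all assumptions chosen for illustration.

```python
from math import comb


def genotype_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Binomial likelihood of the observed read counts under each diploid genotype.

    error_rate is an assumed per-read miscall probability (illustrative value).
    """
    depth = ref_reads + alt_reads
    # Probability that a single read shows the alternate allele, per genotype.
    p_alt = {
        "ref/ref": error_rate,
        "ref/alt": 0.5,
        "alt/alt": 1.0 - error_rate,
    }
    return {
        genotype: comb(depth, alt_reads) * p ** alt_reads * (1.0 - p) ** ref_reads
        for genotype, p in p_alt.items()
    }


def hardy_weinberg_prior(alt_freq):
    """Genotype prior from a population alt-allele frequency (assumes Hardy-Weinberg)."""
    q = alt_freq
    return {
        "ref/ref": (1.0 - q) ** 2,
        "ref/alt": 2.0 * q * (1.0 - q),
        "alt/alt": q ** 2,
    }


def genotype_posterior(ref_reads, alt_reads, alt_freq, error_rate=0.01):
    """Combine read likelihoods with the population prior using Bayes' rule."""
    likelihood = genotype_likelihoods(ref_reads, alt_reads, error_rate)
    prior = hardy_weinberg_prior(alt_freq)
    unnormalised = {g: likelihood[g] * prior[g] for g in likelihood}
    total = sum(unnormalised.values())
    return {g: value / total for g, value in unnormalised.items()}


if __name__ == "__main__":
    # Weak evidence in one individual: 3 reference reads, 1 alternate read.
    for alt_freq in (0.001, 0.2):
        posterior = genotype_posterior(3, 1, alt_freq)
        print(f"population alt frequency {alt_freq}: "
              f"P(ref/alt | reads) = {posterior['ref/alt']:.3f}")
```

With the same four reads, the posterior probability of a heterozygous call in this toy model goes from about 1% when the allele is assumed rare (frequency 0.001) to about 76% when it is assumed to segregate at frequency 0.2. In a joint-calling setting the frequency would be estimated from the other samples rather than supplied by hand.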

Tags
pipeline, pool, statistics, variation calling, vcf
