SEQanswers

02-09-2012, 07:30 AM #1
giorgifm
Member
 
Location: Columbia University Medical Center

Join Date: Aug 2011
Posts: 35
Coverage "standards" for SNP detection in tumor samples

Dear all,

I was wondering whether there is a standard "coverage" for exome SNP calling in tumor-vs-healthy samples (same patient). As we know, tumor samples have intrinsically higher mutability (Parsons et al., 1993). I was thinking of applying a threshold of at least 20X for the healthy sample and 50X for the tumor sample. Do these look sufficient to you?

Also, there appears to be no standard definition of coverage, so by "50X" I mean exome-wide coverage by uniquely mapping 100 bp Illumina paired-end reads, after duplicate removal.
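Since coverage numbers are only comparable when the definition is explicit, here is a minimal Python sketch of the two figures usually quoted for an exome: mean depth over the targeted bases, and the fraction of targeted bases at or above a threshold. It assumes you already have one depth value per targeted base, for example from the tab-separated chrom/pos/depth output of `samtools depth -a -b targets.bed` run on the duplicate-removed BAM. The function names are hypothetical.

```python
from statistics import mean


def parse_depth_lines(lines):
    """Parse 'chrom<TAB>pos<TAB>depth' lines into a list of integer depths."""
    return [int(line.rstrip("\n").split("\t")[2]) for line in lines if line.strip()]


def coverage_summary(depths, thresholds=(20, 50)):
    """Summarise per-base depths over the targeted bases.

    depths: one integer per targeted base, zero-depth bases included
    (otherwise the mean is inflated). Returns the mean depth and, for each
    threshold, the fraction of targeted bases covered at or above it.
    """
    if not depths:
        raise ValueError("no targeted bases supplied")
    summary = {"mean": mean(depths)}
    for t in thresholds:
        summary[f">={t}x"] = sum(d >= t for d in depths) / len(depths)
    return summary


if __name__ == "__main__":
    toy = ["chr1\t100\t60\n", "chr1\t101\t45\n", "chr1\t102\t10\n", "chr1\t103\t0\n"]
    print(coverage_summary(parse_depth_lines(toy)))
```

A mean of 50X can hide a sizeable fraction of targets below 20X, so reporting both kinds of number makes a per-sample threshold easier to interpret.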

Thanks!

Federico
02-09-2012, 12:33 PM #2
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 362

No, there is no standard. It depends on how many calls you want to make accurately. Something like SomaticSniper will happily call variants in low-coverage areas, but you will have little confidence in the genotypes, even with 40x coverage for an exome sample.
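One way to see why low-coverage calls carry little confidence is to treat the number of variant-supporting reads at a site as binomial. The sketch below ignores sequencing error and mapping bias, and gives the probability of seeing at least `min_alt_reads` variant reads at a given depth and variant allele fraction (VAF). Tumor VAFs are often well below 0.5 because of normal-cell contamination and subclonality. The cutoff and VAF in the usage lines are illustrative assumptions, not SomaticSniper parameters.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads):
    """P(X >= min_alt_reads) for X ~ Binomial(depth, vaf).

    Each read is treated as an independent draw that shows the variant
    allele with probability `vaf`; errors and mapping bias are ignored.
    """
    p_below = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Hypothetical caller rule: at least 4 variant reads; subclonal VAF of 0.1.
    for depth in (20, 40, 100, 500):
        print(depth, round(detection_probability(depth, vaf=0.1, min_alt_reads=4), 3))
```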

I'm doing some development work on cancer panels, and we've been advised (this is targeted resequencing, not exome sequencing) to aim for 500x to 1000x coverage. I was a little iffy about these figures until I started doing the analysis on exomes myself, just to test things out.

I imagine this is prohibitively expensive for exomes, so in terms of depth I think "as much as you can afford". Remember that you will also want to be confident about the genotype calls in your normal samples.
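For the normal sample, a rough check is how often a true germline heterozygous site would show no (or almost no) variant reads in the normal and so look homozygous reference; the tumor allele at that site could then be mistaken for a somatic mutation. This sketch assumes perfect 50/50 allele balance and no sequencing errors, which exome capture does not guarantee, and the function names are hypothetical.

```python
from math import comb


def p_het_looks_homref(normal_depth, max_alt_reads=0):
    """P(X <= max_alt_reads) for X ~ Binomial(normal_depth, 0.5).

    Chance that a true germline heterozygous site shows at most
    `max_alt_reads` variant reads in the normal (assumes 50/50 balance).
    """
    hits = sum(comb(normal_depth, i) for i in range(max_alt_reads + 1))
    return hits / 2**normal_depth


def expected_misclassified(normal_depths, max_alt_reads=0):
    """Expected number of germline hets that look homozygous reference,
    given the normal depth observed at each germline het site."""
    return sum(p_het_looks_homref(d, max_alt_reads) for d in normal_depths)


if __name__ == "__main__":
    for depth in (5, 10, 20):
        print(depth, p_het_looks_homref(depth, max_alt_reads=1))
```

Because an exome contains many germline heterozygous sites, even a small per-site probability at low normal depth can add up to a noticeable number of spurious "somatic" calls.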
03-22-2012, 11:20 AM #3
giorgifm
Member
 
Location: Columbia University Medical Center

Join Date: Aug 2011
Posts: 35

Thank you for your answer, Bukowski. So far we are aiming for around 40x coverage, which seems to be the minimum at which the significance of the somatic mutations we find stabilizes.
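Somatic calls are often scored with Fisher's exact test on the tumor and normal reference/variant read counts. As a minimal stdlib-only sketch (not necessarily the test used in this project), the one-sided p-value below shows that the same tumor allele fraction gives a much smaller p-value at higher depth, which is one way to read the observation that significance only stabilizes above a certain coverage.

```python
from math import comb


def fisher_one_sided(t_alt, t_ref, n_alt, n_ref):
    """One-sided Fisher's exact test: is the variant read fraction higher
    in the tumor than in the normal?

    Sums hypergeometric probabilities of all 2x2 tables with the same
    margins whose tumor variant count is >= the observed one.
    """
    total = t_alt + t_ref + n_alt + n_ref
    alt_total = t_alt + n_alt
    tumor_total = t_alt + t_ref
    numerator = 0
    for x in range(t_alt, min(alt_total, tumor_total) + 1):
        numerator += comb(alt_total, x) * comb(total - alt_total, tumor_total - x)
    return numerator / comb(total, tumor_total)


if __name__ == "__main__":
    # 20% tumor VAF and no variant reads in the normal, at 20x and 40x.
    print(fisher_one_sided(4, 16, 0, 20))
    print(fisher_one_sided(8, 32, 0, 40))
```

At 20x in both samples the p-value is roughly 0.053, while at 40x it drops to roughly 0.0027, even though the allele fractions are identical.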
06-24-2013, 02:11 PM #4
rama
Member
 
Location: Boston, USA

Join Date: Jan 2011
Posts: 20
Finding the depth of coverage with more confidence

This is an extension of the original question in this thread. Does anybody know how I can calculate sequencing accuracy at various depths of coverage? Based on this, I want to choose the coverage that gives more confidence. Thanks in advance to all.
06-24-2013, 02:25 PM #5
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

A couple ideas: http://genome.sph.umich.edu/wiki/SNP...Set_Properties

Also, for any metric, you can tentatively assume that your higher-coverage/higher-quality-score calls will be more "correct" than your lower-coverage/lower-quality-score calls. So, for any metric, compare the call sets at different coverage thresholds against your highest-quality set. One caveat is that mapping artifacts or other problems can produce super-high coverage, so make sure your "high quality set" looks real.
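A minimal sketch of that comparison, grouping SNV calls by depth and using the transition/transversion (Ti/Tv) ratio as the example metric. Ti/Tv is one commonly examined call-set property, but choosing it here is an assumption; any other per-call metric can be swapped in. Each bin's value can then be compared with that of the highest-coverage bin. The function names and bin edges are hypothetical.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    return (ref.upper(), alt.upper()) in TRANSITIONS


def titv_by_depth_bin(calls, bin_edges):
    """Group SNV calls into depth bins and compute Ti/Tv per bin.

    calls: iterable of (depth, ref, alt) tuples for single-base SNVs.
    bin_edges: ascending lower edges, e.g. [0, 20, 40, 100]; a call falls
    into the highest bin whose edge is <= its depth.
    Returns {lower_edge: (n_calls, titv or None if no transversions)}.
    """
    counts = {edge: [0, 0] for edge in bin_edges}  # [transitions, transversions]
    for depth, ref, alt in calls:
        eligible = [e for e in bin_edges if e <= depth]
        if not eligible:
            continue
        edge = max(eligible)
        counts[edge][0 if is_transition(ref, alt) else 1] += 1
    return {
        edge: (ti + tv, ti / tv if tv else None)
        for edge, (ti, tv) in counts.items()
    }
```

The expected Ti/Tv depends on the region sequenced, so the useful signal is how far lower-depth bins drift from the trusted high-depth bin, not the absolute value.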
06-24-2013, 03:45 PM #6
rama
Member
 
Location: Boston, USA

Join Date: Jan 2011
Posts: 20

Thanks a bunch for the pointer.
Once we identify the "high quality set", is there a way to compute metrics at different coverage thresholds? I am not sure how to do it: do I have to randomly subsample the sequence reads and check the variant calls, or just compare with the consensus?
06-26-2013, 02:34 PM #7
Joann
Senior Member
 
Location: Woodbridge CT

Join Date: Oct 2008
Posts: 231
Global Alliance White Paper on Clinical Data

There is a consortium on clinical data, described in the white paper linked here:

http://www.broadinstitute.org/files/...PaperJune3.pdf

Page 30 lists the names of the organizers and their institutions; you may be able to obtain follow-up information from them on current "standards" questions about clinical data.

Please post any standards statements you obtain from them here in this forum and/or in the Wiki, so that others are kept informed and consensus parameters can be disseminated more rapidly.
06-26-2013, 03:41 PM #8
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

I was thinking of just separating the calls by coverage, i.e., make a set of calls at >100x coverage, a set at 90-100x, a set at 80-90x, etc., and compare them. Or use quality score instead of coverage if you prefer that metric.

Your idea is interesting, though: you could take a set of high-quality calls, randomly take smaller and smaller subsets of reads at the same positions, redo the calling, and see how low the coverage can go before your "subset calls" deviate too much from the legitimate set. The problem is that if your high-quality calls are at "easy" sites, this strategy won't necessarily apply to the rest of the genome.
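The read-subsampling idea can be sketched as a simulation: at each trusted site, keep every read independently with probability target_depth/depth (binomial thinning), re-apply a calling rule, and record how often the site is still called. `simple_call` and its thresholds are a hypothetical stand-in for re-running a real caller on downsampled BAMs, and the same "easy sites" caveat applies to whatever sites you feed in.

```python
import random


def downsample_counts(ref, alt, keep_fraction, rng):
    """Keep each read independently with probability keep_fraction."""
    kept_ref = sum(rng.random() < keep_fraction for _ in range(ref))
    kept_alt = sum(rng.random() < keep_fraction for _ in range(alt))
    return kept_ref, kept_alt


def simple_call(ref, alt, min_alt=3, min_vaf=0.1):
    """Hypothetical calling rule standing in for a real variant caller."""
    depth = ref + alt
    return depth > 0 and alt >= min_alt and alt / depth >= min_vaf


def recall_at_depth(hq_sites, target_depth, trials=100, seed=0):
    """Fraction of trusted sites still called after thinning to ~target_depth.

    hq_sites: list of (ref_count, alt_count) at sites in the high-quality set.
    """
    rng = random.Random(seed)
    called = total = 0
    for ref, alt in hq_sites:
        depth = ref + alt
        if depth == 0:
            continue  # nothing to thin at an uncovered site
        keep = min(1.0, target_depth / depth)
        for _ in range(trials):
            r, a = downsample_counts(ref, alt, keep, rng)
            called += simple_call(r, a)
            total += 1
    return called / total if total else 0.0
```

Plotting `recall_at_depth` for a range of target depths gives a curve whose knee suggests how low coverage can go before subset calls start to deviate from the trusted set.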
All times are GMT -8.