Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|--------|----------------|-------|---------|-----------|
| Option "calmd"; Reporting indels and Somatic mutations for Whole Exome Seq data | angerusso | Bioinformatics | 0 | 01-10-2012 03:32 PM |
| Planning computing budget for Exome-seq data analysis | zxyeo | General | 6 | 11-23-2011 12:22 AM |
| RNA-Seq: SeqGene: a comprehensive software solution for mining exome- and transcripto | Newsbot! | Literature Watch | 0 | 07-01-2011 03:30 AM |
| Illumina Human Exome vs Agilent Human Exome | GW_OK | Sample Prep / Library Generation | 23 | 06-28-2011 12:06 PM |
| RNA-Seq: Screening the human exome: a comparison of whole genome and whole transcript | Newsbot! | Literature Watch | 0 | 07-06-2010 02:00 AM |
#1
ulz_peter
Senior Member
Location: Graz, Austria
Join Date: Feb 2010
Posts: 219
Hi guys,

I am working on some exome datasets and ran into a problem: is there already a published way of finding sites that should have been captured but were not sequenced? For example, we did exome sequencing of a patient; I did the alignment, SNP calling (GATK), and variant annotation with ANNOVAR, and we are now down to 3 SNPs. While we try to validate those experimentally, I was wondering how to determine the parts of the exome that are not covered by, let's say, 20+ reads. Moreover, how could one detect homozygous exon deletions? They should not be covered by any reads at all, but zero coverage can also occur by chance.

Any help appreciated,
Peter
#2
NGSfan
Senior Member
Location: Austria
Join Date: Apr 2009
Posts: 181
Hi ulz_peter,

We are also interested in this question. I'm surprised others have not asked it and are only looking for SNPs/indels in exome data. I have not found a published package that does this yet. Hard cutoffs on reads per exon are not safe: you will get too many false-positive deletions if the capture did not go well, or when comparing one sample to another (unless you normalize read counts across samples). However, I am collaborating with someone who is working on this problem using a statistical approach.

Last edited by NGSfan; 01-31-2011 at 05:30 AM.
#3
Member
Location: WTSI
Join Date: Dec 2010
Posts: 41
I haven't looked into this problem specifically, but I have a feeling BEDTools contains functions (for example genomeCoverageBed) for calculating what you need.

So something like: calculate base-wise genome coverage of the exome data -> filter against the capture regions, discarding bases outside them -> report the bases with coverage below 20x.
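A minimal Python sketch of those three steps (the data structures here are hypothetical stand-ins, not BEDTools' actual output format):

```python
# Sketch of the pipeline above, independent of BEDTools:
# per-base coverage -> restrict to capture regions -> keep bases below 20x.
# The coverage dict and interval list are made-up illustrations.

def low_coverage_bases(coverage, capture_regions, min_depth=20):
    """Return (chrom, pos) of captured bases covered by fewer than min_depth reads.

    coverage: dict mapping (chrom, pos) -> read depth (absent positions = depth 0)
    capture_regions: list of (chrom, start, end) half-open intervals
    """
    missed = []
    for chrom, start, end in capture_regions:
        for pos in range(start, end):
            if coverage.get((chrom, pos), 0) < min_depth:
                missed.append((chrom, pos))
    return missed

# Toy example: a 5 bp target where two bases fall below 20x.
cov = {("chr1", 100): 25, ("chr1", 101): 30, ("chr1", 102): 5,
       ("chr1", 103): 22}                    # position 104 has no reads at all
regions = [("chr1", 100, 105)]
print(low_coverage_bases(cov, regions))      # -> [('chr1', 102), ('chr1', 104)]
```

In practice you would compute the per-base depths with a tool such as BEDTools and only use the filtering step; the dict-based input is just to keep the sketch self-contained.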
#4
Senior Member
Location: Palo Alto
Join Date: Apr 2009
Posts: 213
Quote:

I wrote a Perl script to do this myself. It calculates the mean coverage across each targeted interval and generates a set of "% targets above mean coverage" metrics. It would be easy to tweak the script to instead output all intervals below a specific coverage value (whatever you think equates to "too low" for variant calling: 4x coverage or whatever). For your information, I compared several exome pull-down platforms and saw varying amounts of uncovered targets in each (roughly 1%-7% missed). It appeared that even with additional sequencing, 1-2% of the targets on each platform simply did not pull down adequately and therefore failed.
__________________
Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
Projects: U87MG whole genome sequence [Website] [Paper]
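The Perl script itself is not shown in the post; the following is a hypothetical Python re-sketch of the metrics it describes (interval names, depths, and the 4x cutoff are illustrative):

```python
# Per-interval mean coverage and a "% targets above cutoff" metric,
# as described above. All inputs here are invented examples.

def interval_metrics(depths_per_interval, min_mean=4.0):
    """depths_per_interval: dict mapping target name -> list of per-base depths.
    Returns (fraction of targets with mean >= min_mean, names of failing targets)."""
    failing = []
    for name, depths in depths_per_interval.items():
        mean_depth = sum(depths) / len(depths)
        if mean_depth < min_mean:
            failing.append(name)
    frac_ok = 1 - len(failing) / len(depths_per_interval)
    return frac_ok, failing

targets = {"exon1": [10, 12, 8, 9],   # mean 9.75 -> passes
           "exon2": [0, 1, 0, 2],     # mean 0.75 -> fails the 4x cutoff
           "exon3": [5, 4, 4, 5]}     # mean 4.5  -> passes
frac, missed = interval_metrics(targets, min_mean=4.0)
print(frac, missed)                   # -> 0.6666666666666667 ['exon2']
```

Note that a mean-based metric can hide uneven coverage inside an interval; the base-wise approach discussed earlier catches that case.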
#5
Senior Member
Location: San Diego
Join Date: May 2008
Posts: 912
BEDTools does indeed have something. Below is an excerpt from its histogram output. Column 5 is the depth, column 6 is how many bases were covered at exactly that depth, column 7 is the total length of the feature, and column 8 is the fraction of the feature covered at that depth.

So for the exon below, 8 bases were totally uncovered, and 122 bases were covered at 5x or greater. I made it with this command line, courtesy of another poster on SEQanswers:

coverageBed -abam reads.bam -b exons.bed -hist > result.txt

chr1 176160618 176161057 Spna1-41 0   8 439 0.0182232
chr1 176160618 176161057 Spna1-41 1  59 439 0.1343964
chr1 176160618 176161057 Spna1-41 2  92 439 0.2095672
chr1 176160618 176161057 Spna1-41 3 103 439 0.2346241
chr1 176160618 176161057 Spna1-41 4  55 439 0.1252847
chr1 176160618 176161057 Spna1-41 5  21 439 0.0478360
chr1 176160618 176161057 Spna1-41 6  49 439 0.1116173
chr1 176160618 176161057 Spna1-41 7  27 439 0.0615034
chr1 176160618 176161057 Spna1-41 8  25 439 0.0569476
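For completeness, the two numbers quoted in the post can be recomputed from those rows with a few lines of Python (the column positions are taken from the excerpt itself):

```python
# Recompute "8 bases uncovered" and "122 bases at 5x or greater"
# from the coverageBed -hist rows shown in the post.

hist = """\
chr1 176160618 176161057 Spna1-41 0 8 439 0.0182232
chr1 176160618 176161057 Spna1-41 1 59 439 0.1343964
chr1 176160618 176161057 Spna1-41 2 92 439 0.2095672
chr1 176160618 176161057 Spna1-41 3 103 439 0.2346241
chr1 176160618 176161057 Spna1-41 4 55 439 0.1252847
chr1 176160618 176161057 Spna1-41 5 21 439 0.0478360
chr1 176160618 176161057 Spna1-41 6 49 439 0.1116173
chr1 176160618 176161057 Spna1-41 7 27 439 0.0615034
chr1 176160618 176161057 Spna1-41 8 25 439 0.0569476
"""

uncovered, at_least_5x = 0, 0
for line in hist.splitlines():
    fields = line.split()
    depth, n_bases = int(fields[4]), int(fields[5])   # columns 5 and 6
    if depth == 0:
        uncovered += n_bases
    if depth >= 5:
        at_least_5x += n_bases

print(uncovered, at_least_5x)   # -> 8 122
```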
#6
NGSfan
Senior Member
Location: Austria
Join Date: Apr 2009
Posts: 181
Quote:

Yes, this is precisely the problem: you will always have 1-2% of target regions (e.g. exons) that are not captured. If you have, say, 15,000 target regions, you could falsely call up to 300 regions as deletions. So you need a way to identify the "bad baits" and remove them from the analysis. The question is: in how many samples does a bait have to fail before you call it bad? 5? 10? 20?

This is a tricky problem in that I don't think thresholds like reads per exon would work robustly. Variability in capture efficiency, sequencing depth, etc. would make it hard to use one cutoff consistently, no? Just brainstorming here...

Last edited by NGSfan; 02-03-2011 at 09:16 AM.
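One way to make the "bad bait" question concrete is to normalize against each sample's overall bait performance before counting failures. This is only an illustrative sketch; the 10% threshold, the failure count, and the data are arbitrary choices, not a validated method:

```python
# Flag baits that fail in at least min_failed samples, where "fail" means
# the bait's count is below fail_below * that sample's mean bait count.
# All thresholds and counts here are hypothetical.

def flag_bad_baits(counts, fail_below=0.1, min_failed=5):
    """counts: dict mapping bait name -> list of raw read counts, one per sample."""
    n_samples = len(next(iter(counts.values())))
    # Per-sample mean count over all baits, used as a crude depth normalizer.
    sample_means = [sum(c[i] for c in counts.values()) / len(counts)
                    for i in range(n_samples)]
    bad = []
    for bait, c in counts.items():
        failed = sum(1 for i in range(n_samples)
                     if c[i] < fail_below * sample_means[i])
        if failed >= min_failed:
            bad.append(bait)
    return bad

bait_counts = {"b1": [100, 120, 90, 110, 100],
               "b2": [80, 100, 70, 90, 85],
               "b3": [0, 1, 0, 2, 0]}   # near zero in every sample
print(flag_bad_baits(bait_counts, min_failed=5))   # -> ['b3']
```

Normalizing per sample addresses the point above that a single raw-count cutoff cannot work across samples with different capture efficiency and depth.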
#7
NGSfan
Senior Member
Location: Austria
Join Date: Apr 2009
Posts: 181
Just to give you a taste of what we have working (I cannot give details since we have not published yet):

We have ~60 cell lines sequenced with targeted exonic capture. Our method takes the bait locations, calculates the coverage, and determines deleted regions. It does not use simple cutoffs; instead, it looks at the distribution of normalized read counts across all cell lines for each bait and determines what read count represents copy number 2 (diploid), what is less than diploid (deletion), and what is greater than diploid (amplification). For now we can call the deletions reliably, but copy-number counts are trickier. It's very preliminary, though! So far, the CNV/deletion algorithms I know of are only for whole-genome sequencing.

IGV tracks: http://www.sendspace.com/file/dzh753
Copy number/deletion prediction: http://www.sendspace.com/file/tj4xii

Last edited by NGSfan; 02-16-2011 at 08:25 AM.
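Since the method above is unpublished, here is only a generic sketch of the stated idea: use the per-bait distribution across cell lines to anchor "copy number 2" and call deviations from it. Using the median as the diploid reference is my assumption; all names and counts are invented:

```python
# Generic copy-number sketch: treat the per-bait median of normalized read
# counts across cell lines as diploid, then scale each sample against it.
# This is NOT the poster's method; it only illustrates the idea.
import statistics

def call_copy_number(counts_by_sample, bait):
    """counts_by_sample: dict sample -> dict bait -> normalized read count.
    Returns sample -> rounded copy-number estimate for the given bait."""
    values = [s[bait] for s in counts_by_sample.values()]
    diploid = statistics.median(values)        # assumed "copy number 2" level
    return {name: round(2 * s[bait] / diploid)
            for name, s in counts_by_sample.items()}

samples = {"lineA": {"baitX": 100},
           "lineB": {"baitX": 95},
           "lineC": {"baitX": 5},    # homozygous-deletion candidate
           "lineD": {"baitX": 210},  # amplification candidate
           "lineE": {"baitX": 105}}
print(call_copy_number(samples, "baitX"))
# lineC -> 0 (deleted), lineD -> 4 (amplified), the rest -> 2
```

As the post notes, calling exact copy numbers this way is less reliable than calling outright deletions, since amplification ratios are noisy and the median itself assumes most cell lines are diploid at the bait.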
#8
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285
Is it the NCI-60 set?
#9
NGSfan
Senior Member
Location: Austria
Join Date: Apr 2009
Posts: 181
No, it's our own exome capture sequencing from 60-80 different cell lines; it probably has considerable overlap, though.

I didn't know there was exome capture data available for the NCI-60. Where would I find it? It would be interesting to run our method on that data.
#10
Member
Location: Connecticut
Join Date: Jun 2009
Posts: 74
I don't know if others have run into the problem of not being able to get hold of the BED files for commercial exome baits. I spoke with a sales rep, and he told me it is proprietary information. Yikes.
#11
Senior Member
Location: San Diego
Join Date: May 2008
Posts: 912
Quote:

We used the Agilent mouse exome set, and they gave us a .bed file.
#12
NGSfan
Senior Member
Location: Austria
Join Date: Apr 2009
Posts: 181
Quote:

That doesn't sound good. It is crucial to have the bait BED files; how else will you know what was probed and what wasn't? Maybe the sales rep meant they are only available to paying customers?
#13
Member
Location: Connecticut
Join Date: Jun 2009
Posts: 74
That's good to hear; at least now they can't say they aren't giving out the BED files.

I'm using the human SureSelect All Exon 50 Mb kit.
#14
Member
Location: Connecticut
Join Date: Jun 2009
Posts: 74
Thanks, everyone. After speaking with the company, I will finally have my hands on the files.
#15
Senior Member
Location: San Diego
Join Date: May 2008
Posts: 912
Hooray, happy ending.