SEQanswers

Old 09-16-2010, 07:58 AM   #1
foxyg
Member
 
Location: US

Join Date: May 2010
Posts: 54
Default Exome Sequencing reads alignment outside capture regions

For Illumina paired-end exome sequencing (single lane) with the Agilent SureSelect protocol, does anyone know what fraction of reads to expect from the non-exon part of the genome? I found that a large portion, 30% of reads, aligned outside the capture region. If you have worked with exome sequencing, is this a reasonable number?

Thanks
Old 09-16-2010, 08:07 AM   #2
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

That is reasonable; we see ~40% off target. Perhaps this number changes with their latest 50 Mb capture kit!
__________________
--
bioinfosm
Old 09-16-2010, 08:11 AM   #3
svl
Member
 
Location: Netherlands

Join Date: Sep 2009
Posts: 43
Default

Here as well. Using SOLiD: 65-70% of bases (of mappable reads) are on target.
Old 09-16-2010, 08:25 AM   #4
foxyg
Member
 
Location: US

Join Date: May 2010
Posts: 54
Default

Are the reads aligned outside the capture region really from non-captured regions, or are they just misaligned to those places?
Old 09-16-2010, 08:24 PM   #5
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Default

Out of curiosity, how are you defining 'off target' vs. 'on target' reads? Do you mean reads that fall entirely within the capture sequence (i.e. the boundaries of each probe), or overlap the target region by at least one base, or are within distance X of a target, etc.?
Old 09-16-2010, 08:34 PM   #6
foxyg
Member
 
Location: US

Join Date: May 2010
Posts: 54
Default

I counted the individual bases that hit inside and outside the capture region. I suppose I could also just count the read start locations; it wouldn't make much difference.
Old 09-16-2010, 08:58 PM   #7
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

I pad the target regions with ~80-100bp and then do the intersection with my bam files using bedtools:

Code:
intersectBed -wa -abam alignments.bam -b target_regions.100bp_padded.bed > alignments.ontarget.bam
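
To make the effect of padding concrete, here is a minimal Python sketch of the same idea without bedtools. It assumes half-open BED-style coordinates and simple gap-free read intervals; the function names (`pad_intervals`, `overlaps_any`) and the example coordinates are made up for illustration and are not part of bedtools.

```python
def pad_intervals(intervals, flank, chrom_sizes=None):
    """Extend each (chrom, start, end) interval by `flank` bases on both sides.

    Starts are clamped at 0; ends are clamped to the chromosome length
    only if a `chrom_sizes` dict is supplied (hypothetical helper argument).
    """
    padded = []
    for chrom, start, end in intervals:
        new_start = max(0, start - flank)
        new_end = end + flank
        if chrom_sizes and chrom in chrom_sizes:
            new_end = min(new_end, chrom_sizes[chrom])
        padded.append((chrom, new_start, new_end))
    return padded


def overlaps_any(read, targets):
    """True if the read shares at least one base with any target interval."""
    chrom, start, end = read
    return any(
        t_chrom == chrom and start < t_end and t_start < end
        for t_chrom, t_start, t_end in targets
    )


# Illustrative data only: one 120 bp target and two reads.
targets = [("chr1", 1000, 1120)]
padded = pad_intervals(targets, flank=100)

reads = [("chr1", 1150, 1200), ("chr1", 5000, 5050)]
on_target_raw = [overlaps_any(r, targets) for r in reads]
on_target_padded = [overlaps_any(r, padded) for r in reads]
```

In this toy case the first read sits 30 bp past the target edge, so it only counts as on target once the region is padded; the distant second read stays off target either way.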
Old 09-16-2010, 09:10 PM   #8
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Default

Thanks zee. That's what I was wondering. Without the padding it seems like you could substantially overestimate the rate of off-target hits. And since the hybridization is going to grab DNA fragments that only partially overlap the target regions, you expect a fair amount of off-target sequence. Aligned bases that are within ~80-100 bases of the target region are quite different from bases that are very distant from the target region when evaluating the success of the enrichment.

Last edited by malachig; 09-16-2010 at 11:24 PM.
Old 09-16-2010, 09:16 PM   #9
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

I picked up this hint from the Bainbridge et al., 2010 paper on whole exome capture sequencing. There is a section in the Materials and Methods which contains:

Quote:
Target exons were padded to a minimum length of 80 bp, and consolidated to remove redundant overlaps.
It makes sense to allow for some padding around the target region. Even with their protocol they only recovered at most 51% for SOLiD and 78% with Illumina PE.
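
One possible reading of the quoted sentence (pad short targets up to a minimum length, then merge overlaps) can be sketched in Python as below. This is only an assumption about what "padded to a minimum length of 80 bp" means: the paper's exact procedure is not given here, and the symmetric padding, the function names and the example exons are all hypothetical. Intervals are half-open and on a single chromosome.

```python
def pad_to_min_length(start, end, min_len=80):
    """Grow an interval roughly symmetrically until it is at least `min_len` long."""
    length = end - start
    if length >= min_len:
        return start, end
    deficit = min_len - length
    new_start = max(0, start - deficit // 2)
    return new_start, new_start + min_len


def merge_overlaps(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Illustrative exons: the first is only 30 bp long.
exons = [(100, 130), (150, 400), (1000, 1200)]
padded = [pad_to_min_length(s, e) for s, e in exons]
consolidated = merge_overlaps(padded)
```

Here the short exon grows to (75, 155), which then overlaps its neighbour and is consolidated into a single target (75, 400).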
Old 09-17-2010, 05:05 AM   #10
svl
Member
 
Location: Netherlands

Join Date: Sep 2009
Posts: 43
Default

Quote:
Originally Posted by malachig View Post
Out of curiosity, how are you defining 'off target' vs. 'on target' reads? Do you mean reads that fall entirely within the capture sequence (i.e. the boundaries of each probe), or overlap the target region by at least one base, or are within distance X of a target, etc.?
We do not use "on target reads" but "on target bases". The percentage of "on target" bases is determined over all mapped reads. So if 10% of a read overlaps the target region, it contributes 10%; if it overlaps almost completely, at 95%, it contributes 95%. This seems a more honest definition of "on target" to me than some cutoff (overlap or distance)...
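
The base-level definition above can be sketched in a few lines of Python. This is a simplified illustration, not the poster's actual pipeline: it assumes one chromosome, gap-free alignments given as half-open (start, end) intervals, and target intervals that do not overlap each other. The function names and coordinates are invented for the example.

```python
def overlap_bases(read, targets):
    """Number of read bases that fall inside any (non-overlapping) target."""
    r_start, r_end = read
    return sum(
        max(0, min(r_end, t_end) - max(r_start, t_start))
        for t_start, t_end in targets
    )


def on_target_base_fraction(reads, targets):
    """Fraction of all mapped bases that lie within the target regions."""
    total = sum(end - start for start, end in reads)
    on_target = sum(overlap_bases(read, targets) for read in reads)
    return on_target / total if total else 0.0


# One 100 bp target and three 100 bp reads (illustrative values).
targets = [(1000, 1100)]
reads = [
    (910, 1010),   # 10 of 100 bases on target
    (1005, 1105),  # 95 of 100 bases on target
    (5000, 5100),  # entirely off target
]
fraction = on_target_base_fraction(reads, targets)
```

With these reads, 105 of 300 mapped bases are on target, so each read contributes in proportion to its overlap rather than being counted as wholly on or off target.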
Old 11-06-2010, 07:32 AM   #11
Sheila
Member
 
Location: Europe

Join Date: Jun 2009
Posts: 17
Question Percentage of coverage for each chromosome - Agilent design

Hi there,
Does anyone have some numbers on % of coverage for each individual chromosome? I got some weird results for chr7. Looking at the file Agilent provided us with the probe design (based on hg18) I saw that the last coordinates for chr7 were
chr7 158828612 158828732
chr7 158829397 158829517
chr7 158829517 158829637
chr7 158835676 158835796
chr7 158835748 158835868
chr7 158851160 158851280
chr7 158896436 158896556
chr7 158902496 158902616
chr7 158935127 158935247
chr7 158937377 158937497

whereas the length of chr7 in hg18 is 158821424.
Was the whole-exome capture kit (the one for 38Mb) designed on the hg18 or the hg19 human reference? We were told it was hg18.
I'm very confused...

Regards,

S.
Old 11-08-2010, 05:56 AM   #12
adamdeluca
Member
 
Location: Iowa City, IA

Join Date: Jul 2010
Posts: 95
Default

Quote:
Originally Posted by Sheila View Post
Hi there,
chr7 158937377 158937497

whereas the length of chr7 in hg18 is 158821424.
Was the whole-exome capture kit (the one for 38Mb) designed on the hg18 or the hg19 human reference? We were told it was hg18.
It is designed to hg19. From my hg19 annotation file:
chr7 158937377 158937497 A_36_B106723 1000 +

You can get the original file from the Agilent eArray site:
earray.chem.agilent.com/earray/
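
A quick sanity check for this kind of build mismatch is to compare probe end coordinates against the chromosome lengths of the assumed reference. The Python sketch below uses the hg18 chr7 length quoted in the thread and two of the probe intervals listed above, plus one made-up in-bounds interval; the function name `out_of_bounds` is hypothetical.

```python
# chr7 length in hg18, as quoted earlier in the thread.
HG18_LENGTHS = {"chr7": 158821424}

probes = [
    ("chr7", 1000, 1120),            # made-up interval, for contrast
    ("chr7", 158828612, 158828732),  # from the probe file excerpt
    ("chr7", 158937377, 158937497),  # from the probe file excerpt
]


def out_of_bounds(intervals, chrom_lengths):
    """Return intervals whose end lies beyond the known chromosome length."""
    return [
        (chrom, start, end)
        for chrom, start, end in intervals
        if chrom in chrom_lengths and end > chrom_lengths[chrom]
    ]


bad = out_of_bounds(probes, HG18_LENGTHS)
for chrom, start, end in bad:
    print(f"{chrom}:{start}-{end} exceeds hg18 length {HG18_LENGTHS[chrom]}")
```

Any interval reported here cannot belong to the assumed build, which is a hint that the design file uses a different reference.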
Old 03-09-2011, 04:21 AM   #13
chariko
Member
 
Location: Spain

Join Date: Jun 2010
Posts: 56
Default

Quote:
Originally Posted by Sheila View Post
Hi there,
Does anyone have some numbers on % of coverage for each individual chromosome? I got some weird results for chr7. Looking at the file Agilent provided us with the probe design (based on hg18) I saw that the last coordinates for chr7 were
chr7 158828612 158828732
chr7 158829397 158829517
...

S.
Hi Sheila,

I contacted Agilent, but they didn't send me the reference file for the exome. From what I read, they did send this file to you, so could you please send me your reference file? As I understand it, you used the whole-exome capture kit (the one for 38Mb) in your analysis.

Thanks
All times are GMT -8.