SEQanswers

03-01-2017, 09:24 PM   #1
hinkwok
Junior Member
 
Location: Hong Kong

Join Date: Jun 2011
Posts: 4
IDT exome panel - high duplication rates with high data throughput

Hi All,

We did an exome capture using the IDT exome panel, with 12 samples per capture. To increase the data throughput per sample, the same capture pool was sequenced on two PE100 HiSeq lanes.

When the data from only one lane are analyzed, the raw coverage is 96X, while the usable reads (mapped to the exome, with duplicates removed) give 55X coverage (duplication rate = 20%). Yet when the data from both lanes are analyzed together, the raw coverage roughly doubles (192X), but the usable coverage only reaches 88X (less than double the 55X), and the duplication rate rises to 40%.

Does anyone have experience using the IDT exome panel for capture? What percentage of usable reads and what duplication rate do you typically get?

Are there better ways to increase data throughput while retaining high usable read coverage?

Million thanks!
Hin
03-01-2017, 10:28 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,453

Could you explain your methodology in more detail? For example, the sequencing platform, run mode, library preparation, origin of the samples, and so forth. It sounds like you are finding duplicate reads between different libraries, which is not a useful approach.
03-01-2017, 10:54 PM   #3
hinkwok
Junior Member
 
Location: Hong Kong

Join Date: Jun 2011
Posts: 4

Hi Brian,

We did the sequencing on a HiSeq 1500, paired-end 100 bp. Library preparation was done with the KAPA Hyper Prep Kit, and the samples are frozen tissues.

To clarify, the same captured library pool was loaded into both lanes, so all lab conditions should be consistent between the two. It is just that when we analyze one lane on its own, versus pooling the data from both lanes and analyzing them together, there are discrepancies in the usable read percentage and the duplication rate.

I suspect that some molecules are unique within one lane but not when both lanes are considered together (and so they are classified as duplicate reads).

So would there be ways to increase data throughput and retain the high usable read coverage? Would fewer samples per capture reaction help?

Thanks,
Hin
03-02-2017, 02:02 AM   #4
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 907

I do not have experience with IDT exome capture, but their product specification should include stats on capture efficiency such as % mappable reads, % on-target reads, % targets recovered and so on. Your capture efficiency is lower than what I have seen with SureSelect products.

Duplicate rate depends on library diversity (the number of unique fragments) and sequencing depth. For instance, a library with 10M unique fragments will show fewer duplicates when 10M reads are sequenced than when 50M reads are sequenced.
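
To put rough numbers on that example, here is a minimal sketch. It assumes every unique fragment is equally likely to be sequenced (simple random sampling with replacement), which is an idealisation: real libraries are skewed by PCR and capture bias, so actual duplicate rates tend to be higher. The function names are made up for illustration.

```python
import math

def expected_unique_reads(library_size, reads_sequenced):
    """Expected number of distinct fragments observed when reads are drawn
    at random from a library of equally abundant fragments (assumption)."""
    return library_size * (1.0 - math.exp(-reads_sequenced / library_size))

def expected_duplicate_rate(library_size, reads_sequenced):
    """Fraction of sequenced reads expected to be duplicates."""
    unique = expected_unique_reads(library_size, reads_sequenced)
    return 1.0 - unique / reads_sequenced

if __name__ == "__main__":
    library = 10_000_000  # 10M unique fragments, as in the example above
    for depth in (10_000_000, 50_000_000):
        rate = expected_duplicate_rate(library, depth)
        print(f"{depth:>11,d} reads: {rate:.1%} expected duplicates")
```

Under this simplified model the 10M-fragment library gives roughly 37% duplicates at 10M reads and roughly 80% at 50M reads.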

To increase sequencing depth without increasing duplicates, the library diversity needs to be increased. That depends on the diversity of the library going into the capture reaction, the capture probe design and the post-capture amplification. The easiest but costly approach would be to repeat the whole capture process.
03-02-2017, 03:11 AM   #5
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 331

Quote:
Originally Posted by hinkwok
So would there be ways to increase data throughput and retain the high usable read coverage? Would fewer samples per capture reaction help?
Does this imply you did pre-capture pooling? It could be that the post-capture PCR needed a few extra cycles, perhaps because the hybridisations didn't yield well enough; that will increase your duplicates.

This is higher than I have seen for other IDT panels though, which makes me think library prep is the issue here.
03-02-2017, 08:39 AM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,453

Thanks for explaining the procedure in more detail.

Quote:
Originally Posted by hinkwok
When the data from only one lane are analyzed, the raw coverage is 96X, while the usable reads (mapped to the exome, with duplicates removed) give 55X coverage (duplication rate = 20%). Yet when the data from both lanes are analyzed together, the raw coverage roughly doubles (192X), but the usable coverage only reaches 88X (less than double the 55X), and the duplication rate rises to 40%.
This is expected when you have highly PCR-amplified libraries. As nucacidhunter stated, with a fixed number of unique molecules, the more deeply you sequence their clones, the more duplicates you will find. To avoid this, it's helpful to start with more DNA and do less amplification, though I don't know what the constraints of the IDT kit are.
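
As a rough sanity check on the reported figures, the sketch below works backwards from the 20% duplication rate seen in one lane to an implied ratio of reads to unique molecules, then predicts the rate when the read count is doubled. It assumes all molecules are equally likely to be sequenced and that the reported rates are comparable fractions of reads; neither is guaranteed for real data, and the function names are illustrative only.

```python
import math

def duplicate_rate(depth_ratio):
    """Expected duplicate fraction when (reads / unique molecules) equals
    depth_ratio, assuming uniform random sampling of molecules."""
    # 1 - (1 - e^-x) / x, written with expm1 for numerical accuracy
    return 1.0 + math.expm1(-depth_ratio) / depth_ratio

def depth_ratio_for(observed_rate, lo=1e-9, hi=100.0):
    """Bisection search for the depth ratio giving the observed duplicate rate
    (duplicate_rate increases monotonically with depth_ratio)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if duplicate_rate(mid) < observed_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

if __name__ == "__main__":
    one_lane = depth_ratio_for(0.20)          # 20% duplicates reported for one lane
    two_lanes = duplicate_rate(2 * one_lane)  # same library, twice as many reads
    print(f"implied reads per unique molecule (one lane): {one_lane:.2f}")
    print(f"predicted duplicate rate (two lanes): {two_lanes:.1%}")
```

Under these assumptions the prediction for two lanes comes out at roughly 35%, in the same range as the 40% reported. The remaining gap is plausibly due to uneven fragment abundance after PCR and capture, which this uniform model ignores.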

I strongly recommend against mapping reads to the exome in any situation, as this leads to false-positive variant calls. Exome-capture data should be mapped to the whole genome. You will find that a substantial portion of the reads actually map to the genome outside of the baited regions, and these can often be useful (particularly when they map to pseudogenes that look like the baits).

Incidentally, you can remove duplicates using Clumpify prior to mapping, which will reduce the mapping time substantially when you have a high rate of duplicates. The command would be something like:

Code:
clumpify.sh in1=r1.fq.gz in2=r2.fq.gz out1=clumped1.fq.gz out2=clumped2.fq.gz dedupe subs=5
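
To see how many reads the deduplication step removed, one option is simply to count records in the input and output files. This is a minimal sketch, assuming standard 4-line FASTQ records in gzipped files; the helper names are hypothetical and not part of any tool.

```python
import gzip

def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4

def fraction_removed(reads_before, reads_after):
    """Fraction of reads discarded by deduplication."""
    if reads_before == 0:
        raise ValueError("input file contains no reads")
    return 1.0 - reads_after / reads_before

# Example usage, with the file names from the command above:
#   before = count_fastq_reads("r1.fq.gz")
#   after = count_fastq_reads("clumped1.fq.gz")
#   print(f"{fraction_removed(before, after):.1%} of read pairs removed")
```

Counting only the first file of each pair is enough here, since paired files hold the same number of records.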