SEQanswers


EvilTwin 01-31-2012 05:26 AM

Pre-Capture Pooling with Nimblegen SeqCap EZ v3: SNP detection quality?
 
Dear all,

Does anyone already have experience with the new Nimblegen SeqCap EZ v3 targeted exome enrichment kit, specifically with pre-capture pooling?
The kit explicitly supports pre-capture pooling of (barcoded) samples followed by a pooled targeted enrichment.
(In a test at my institution, the v3 kit itself performed satisfactorily in terms of on-target rate and target coverage.)

Now I am especially looking for information concerning the specificity of SNP-calls when performing such a pre-capture pooling approach.

The enrichment involves some PCR cycles when the samples are already pooled (as far as I gathered), and I wondered if or to what degree cross-hybridization between the captured fragments of different samples occurs or can occur at this step. However, my knowledge (and so far also my understanding) about this technology is limited.

If such cross-hybridization occurs, wouldn't the specificity of the SNP calls become much worse compared to a single sample enrichment? I would expect that there is a considerable fraction of reads from a specific sample (assigned via the sample-specific barcodes) which bear SNPs or InDels stemming from cross-hybridization events with the fragments from another sample.

Or do I maybe misunderstand something in general, and cross-hybridization cannot happen? Or can those events easily be identified and filtered? Or does that only play a very minor role and can be neglected?

I asked Nimblegen about it... They seem to have a hard time finding someone in the company who can provide any information on that topic (I have been asking repeatedly and waiting for weeks now). I also pestered the competitors at Agilent, but they just stated (of course) that the SNP detection specificity with the Nimblegen pre-capture pooling enrichment is worse than with their own enrichment using the Human All Exon 50 MB or v4 kits, and that they would advise against pre-capture pooling with their kits. They did not provide any arguments or data.

I studied the publications on in-solution targeted exome enrichment kit comparisons (see this thread: http://seqanswers.com/forums/showthread.php?t=14617), and they largely agree that the Nimblegen capture probe design (DNA probes, shorter, but many) is in the end slightly more efficient for SNP calling than Agilent's design (RNA probes and longer, meaning higher binding specificity, but fewer of them). Agilent won on overall detection counts because of its larger target region compared to the older Nimblegen kits.
Although the older Nimblegen version (v2) apparently also supports pre-capture pooling, the two studies that compared that kit with Agilent's SureSelect Human All Exon 50 MB both used single sample enrichment as far as I can tell.

I should mention that I am comparatively new to NGS, and I am an end user, but I have tried to read up on the field as well as possible during the past few months (btw, SeqAnswers was a great help). I am not directly involved in the practical steps of enrichment and sequencing, which are done by our NGS core facility. However, they also cannot answer the pre-capture pooling question.

I thoroughly searched for available information on the topic on the web and on SeqAnswers, but I couldn't find any. If I missed or completely misunderstood something, I would be glad if you could point me towards it.

Any information or opinion is very welcome!

Thanks a lot!

Heisman 01-31-2012 08:49 AM

You might want to read through this paper: http://nar.oxfordjournals.org/content/40/1/e3.abstract

I have never used Nimblegen so I can't comment specifically regarding it.

sehrrot 06-24-2012 10:20 PM

Hi Eviltwin

Could you share how much on-target and avg coverage you get from SeqCap EZ v3?

I am about to get results from that kit and I need some sort of comparison with others' data. I've previously done the SeqCap EZ V2 exome with post-multiplexing and got around 70% on-target and 100X coverage. Thus, I am expecting that V3 should end up higher than a 70% on-target rate and 70X coverage (64 Mb vs. 40 Mb target regions).
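
As a rough sanity check on that expectation, mean coverage can be scaled by the ratio of the target sizes. This is only a sketch: the function name is made up for illustration, and it assumes the same number of on-target bases is sequenced per sample with both kits, which real runs will not match exactly.

```python
def expected_coverage(old_coverage, old_target_mb, new_target_mb):
    """Scale mean coverage by the ratio of target sizes.

    Assumption: the same number of on-target bases is sequenced for both
    kits, so coverage is inversely proportional to target size.
    """
    return old_coverage * old_target_mb / new_target_mb


# Numbers from the post: 100X on a 40 Mb target, moving to a 64 Mb target.
print(expected_coverage(100, 40, 64))  # 62.5
```

Under these assumptions the naive estimate lands at roughly 60X, so anything near 70X would already mean slightly more on-target sequence per sample than in the V2 run.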

EvilTwin 06-26-2012 12:53 AM

Hi sehrrot,

How exactly do you calculate your on-target and coverage data? Maybe I can learn something :-)

We tried the SeqCap EZ 3.0 and got ~ 70x average coverage per Exome when pooling 4 samples on an Illumina HiSeq 2000 lane (GATK DoC Walker), with ~ 60 % bases strictly on-target (Picard CollectHSMetrics with the supplied "capture.bed" as target/bait file).

I had a talk with some Roche/Nimblegen guys a while ago, and they stated that Nimblegen calculates its on-target values by counting anything within +/- 150 bp of the target regions as on-target, which of course increases that value. Also, the primary focus in developing SeqCap V3 was to increase the target region size, while it is allegedly very hard to increase the on-target efficiency. So I would not expect much difference from V2 in that respect.
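
To illustrate how much the +/- 150 bp convention can matter, here is a minimal sketch that computes the fraction of aligned bases falling on target, once strictly and once with padded targets. It is not how Picard or Nimblegen implement their metrics; the function names, the half-open `(chrom, start, end)` interval format, and the toy reads are all assumptions made for the example.

```python
from collections import defaultdict


def merge_padded(targets, padding=0):
    """Pad each target by `padding` bp on both sides and merge overlaps."""
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((max(0, start - padding), end + padding))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:          # overlaps or touches previous
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    return merged


def on_target_fraction(reads, targets, padding=0):
    """Fraction of aligned bases that overlap the (padded) target regions."""
    merged = merge_padded(targets, padding)
    total_bases = 0
    on_target_bases = 0
    for chrom, start, end in reads:
        total_bases += end - start
        for t_start, t_end in merged.get(chrom, []):
            on_target_bases += max(0, min(end, t_end) - max(start, t_start))
    return on_target_bases / total_bases if total_bases else 0.0


# Toy data: one 200 bp target and three 100 bp aligned reads.
targets = [("chr1", 1000, 1200)]
reads = [("chr1", 950, 1050), ("chr1", 1250, 1350), ("chr1", 5000, 5100)]

strict = on_target_fraction(reads, targets)               # 50 of 300 bases
padded = on_target_fraction(reads, targets, padding=150)  # 200 of 300 bases
print(f"strict: {strict:.2f}, padded: {padded:.2f}")
```

The same reads give a very different on-target value depending on the definition, so numbers from different sources are only comparable when the padding is known.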

EvilTwin 06-26-2012 01:03 AM

@ Heisman,

I just realized I never thanked you for the hint on that paper, but still… thanks a lot!

As I understand it, our sequencing core facility had at that time already applied a (or at least some form of) double-indexing method, so erroneous read assignment should be reduced.

As for the pre-capture pooling, there doesn’t seem to be artificial variant enrichment in the capture pools (e.g. if one sample bears a rare variation, it stays specific for that sample and doesn't show up in the others).
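
A check of this kind could be sketched as follows: at a site where only one sample in the pool is known to carry the variant, look for alternate-allele reads in the other samples. Everything here is hypothetical, including the function name, the thresholds, and the read counts, and the check only works if the carrier status is known from an independent source.

```python
def leaked_variants(allele_counts, carrier, min_alt_fraction=0.02, min_depth=20):
    """Return pool-mates that show the carrier's variant above a threshold.

    allele_counts: {sample: (ref_reads, alt_reads)} at a single site.
    carrier: the one sample expected to carry the variant.
    The thresholds are illustrative, not recommended values.
    """
    flagged = {}
    for sample, (ref, alt) in allele_counts.items():
        if sample == carrier:
            continue
        depth = ref + alt
        if depth >= min_depth and alt / depth >= min_alt_fraction:
            flagged[sample] = alt / depth
    return flagged


# Made-up counts for four samples captured in one pool; S1 is the carrier.
site = {"S1": (35, 33), "S2": (70, 0), "S3": (66, 1), "S4": (60, 4)}
print(leaked_variants(site, carrier="S1"))  # {'S4': 0.0625}
```

A single stray read (as in S3) is more likely sequencing error than carry-over, which is why the sketch uses a minimum alternate-allele fraction rather than flagging any non-zero count.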

sehrrot 07-04-2012 05:03 AM

Hi EvilTwin

I just got my NimbleGen v3 exome data from the sequencer. But I was shocked when I checked the sequence duplication level in FastQC, which is nearly 50%... I will go on with the mapping and check how good the sequencing quality is.

EvilTwin 07-05-2012 08:07 AM

Hi sehrrot,

that is strange, we typically got 5-10 % (MarkDuplicates in Picard Tools). One run was exceptional with 25 % duplication, but as I understand there was some technical problem...

sehrrot 07-05-2012 04:41 PM

Hi EvilTwin

I think so. I am still waiting for my pipeline to report the on-target rate and coverage. The duplicate level from Picard is around 20-25%, which is lower than the FastQC value but higher than what I've previously seen with the NimbleGen exome V2 (which was around 5-8%). I've done the NimbleGen V3 exome with pre-multiplexing, as I did for V2 as well, and got nice results there (the performance was actually the same between pre-multiplexing and post-multiplexing; I've tested with NimbleGen and Illumina and also compared with Agilent post-multiplexing). I am still not sure why the duplicate level is high...

Anyway thanks for your reply.

EvilTwin 07-06-2012 01:55 AM

Hi sehrrot,

Initially we also compared Agilent 50 MB post-capture multiplexing and NimbleGen V3 pre-capture multiplexing on the same set of samples, and there were no dramatic differences in duplication in that run (5 vs 6 %), with NimbleGen doing slightly better in on-target rate and per-interval coverage (and naturally overall coverage due to the larger target size).
I checked some samples with FastQC, and it also displays higher duplication levels, around 15-20 % (so perhaps much of it comes from unmapped reads?).

I will ask the sequencing core facility what exactly the problem was with the run displaying the exceptionally high level of duplication.

sehrrot 07-06-2012 04:05 AM

Hi EvilTwin

Thanks for sharing. I am guessing my samples had a problem in sample prep or capture efficiency. Otherwise, it might be a problem with my HiSeq, as I've seen a dramatic quality drop after cycle 80 and subsequently a loss of clusters in read 2.

Anyway, apart from that, I think I found out why the duplicate level is higher in FastQC than in Picard. FastQC counts sequences that are identical to others, but these could come from enriched fragments and are not necessarily duplicates. Picard, in contrast, uses the paired-end information: it only calls two sequences duplicates if they are identical and the reads of the pair also have the same start positions.
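
The difference between the two definitions can be shown with a small toy example. This is a simplified sketch, not the actual FastQC or Picard MarkDuplicates logic: the function names and the four made-up fragments are assumptions, and both real tools apply further rules that are ignored here.

```python
from collections import Counter


def sequence_duplication(read1_seqs):
    """Fraction of reads whose sequence has already been seen.

    Uses sequence identity of read 1 only, with no alignment information.
    """
    counts = Counter(read1_seqs)
    return sum(c - 1 for c in counts.values()) / len(read1_seqs)


def positional_duplication(pair_positions):
    """Fraction of read pairs sharing (chrom, read1_start, read2_start).

    Uses the mapped start positions of both reads in the pair.
    """
    counts = Counter(pair_positions)
    return sum(c - 1 for c in counts.values()) / len(pair_positions)


# Toy fragments: (read1 sequence, chrom, read1 start, read2 start).
fragments = [
    ("ACGT", "chr1", 100, 300),
    ("ACGT", "chr1", 100, 300),  # same sequence, same pair positions
    ("ACGT", "chr1", 100, 320),  # same read 1, but the mate starts elsewhere
    ("TTGA", "chr1", 500, 700),
]

by_sequence = sequence_duplication([f[0] for f in fragments])
by_position = positional_duplication([f[1:] for f in fragments])
print(by_sequence, by_position)  # 0.5 0.25
```

The third fragment counts as a duplicate when only the read 1 sequence is compared, but not when both ends of the pair are taken into account, which is one way a sequence-based estimate can exceed a position-based one on densely covered capture targets.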

