**bioinfosm** · 03-11-2010, 09:34 AM

Originally posted by upenn_ngs View Post

Another factor, many GC rich regions are dropped from both whole genome sequencing as well as the exome capture. This image from the Broad.

http://www.postimage.org/image.php?v=aV6cBnA

Do you know what the WGS track is, and from where one can obtain it for the IGV?

**NGSfan** · 03-17-2010, 02:16 AM

Originally posted by Xi Wang View Post

Oh. But if you have the data, you can try what I just mentioned.

And for PE reads, I don't think it can improve much, because it is the DNA fragments that are amplified. So the coverage should bear some relationship to the GC content of the DNA fragments. On the other hand, the read GC content and the DNA fragment GC content are highly correlated. As a result, the relationship between read GC content and coverage largely reflects reality.

I'm not clear on why converting the read coverage to a log scale would help understand distribution better. Simply visualizing coverage on a log scale will simply change the scale you're looking at, no?

Maybe I'm not understanding the advantage. Could you show me an example?

**if1** · 03-17-2010, 02:17 AM

Originally posted by bioinfosm View Post

...
We notice that only 20% of reads map on-target! Is that a common thing? (Illumina 75bp PE)

Hi,
we have also run a TE using SureSelect with Illumina 76bp single-end reads, and we obtained results similar to Tewhey et al. (Genome Biol 2009): 50% of uniquely aligned reads were on target, with a uniformity of capture similar to what was reported in the paper.
I am wondering if anyone else has results for Illumina 76bp paired-end reads, as it seems from the Agilent website that the % on target should increase from 50% to 70% using the PE protocol.

Thanks

**NGSfan** · 03-17-2010, 02:33 AM

Originally posted by bioinfosm View Post

Another point which I did not notice here is, # of reads actually sequenced to get 30x exome coverage for the agilent capture stuff.

We notice that only 20% of reads map on-target! Is that a common thing? (Illumina 75bp PE)

Be careful with Tewhey's enrichment calculation: they calculated 48% "on or near target", meaning +/- 150bp around the target! If you look in the text, they actually only got 37% "on target", so while your 20% is low, it's not terrible in comparison. How much of the genome are you targeting?

In our case with 76-bp single end reads, we targeted 0.09% of the genome and enriched to 35% of the sequences being "on target" , which is a ~390-fold enrichment. If you were to actually convert Tewhey's numbers to solely "on target" (from 0.12% to 37%), then their claim of "about 400 fold enrichment" is actually ~290-fold. Just a small criticism.

We have just completed a 76-bp paired end run with 4 samples multiplexed - I will let you know what we get with our alignment results
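The fold-enrichment figures above come from dividing the on-target read fraction by the fraction of the genome that was targeted. A minimal sketch of that arithmetic follows. The function name is hypothetical and the inputs are the rounded percentages quoted in the post. With those rounded inputs the Tewhey conversion comes out nearer 308-fold than ~290-fold, and the gap may simply reflect unrounded source values (an assumption).

```python
def fold_enrichment(on_target_fraction, target_fraction_of_genome):
    """Observed on-target read fraction divided by the fraction expected by chance.

    Hypothetical helper for illustration; both arguments are fractions in (0, 1].
    """
    if not 0 < target_fraction_of_genome <= 1:
        raise ValueError("target_fraction_of_genome must be in (0, 1]")
    return on_target_fraction / target_fraction_of_genome


# Rounded figures quoted in the post above.
print(round(fold_enrichment(0.35, 0.0009)))  # 389 (0.09% of genome, 35% on target)
print(round(fold_enrichment(0.37, 0.0012)))  # 308 (0.12% of genome, 37% on target)
```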

**bioinfosm** · 03-18-2010, 12:03 PM

Thanks NGSfan.

The on-or-near number, taking +/- 200bp, goes to 15%, still pretty low I would guess.

It would be interesting to check where the remaining 85% of reads went!
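For anyone wanting to reproduce an "on target" versus "on or near target" number, here is a rough sketch of the counting. It assumes reads are reduced to a single mapped position and targets are 0-based half-open intervals. The function name and toy coordinates are made up for illustration, and for real data an interval tool such as bedtools would be far faster than this nested loop.

```python
def fraction_on_target(read_positions, targets, padding=0):
    """Fraction of reads whose position falls inside any target, widened by `padding` bp.

    read_positions: list of (chrom, pos)
    targets: list of (chrom, start, end), 0-based half-open
    Hypothetical helper; quadratic time, fine only for small examples.
    """
    if not read_positions:
        return 0.0
    hits = 0
    for chrom, pos in read_positions:
        for t_chrom, start, end in targets:
            if chrom == t_chrom and start - padding <= pos < end + padding:
                hits += 1
                break  # count each read at most once
    return hits / len(read_positions)


# Toy coordinates, not real data.
targets = [("chr6", 1000, 1200)]
reads = [("chr6", 1100), ("chr6", 1300), ("chr6", 5000), ("chr1", 1100)]
print(fraction_on_target(reads, targets))               # 0.25 (strictly on target)
print(fraction_on_target(reads, targets, padding=150))  # 0.5  (on or near target)
```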

**Xi Wang** · 03-19-2010, 08:55 AM

Originally posted by NGSfan View Post

I'm not clear on why converting the read coverage to a log scale would help understand distribution better. Simply visualizing coverage on a log scale will simply change the scale you're looking at, no?

Maybe I'm not understanding the advantage. Could you show me an example?

I do not have such data, so I can't show you examples.
But I noticed that there are fewer regions with high GC content than regions with average GC content, and the same holds for low-GC regions. That is to say, there are more points in the figure around the average GC (x-axis), so high-coverage points are more likely to appear in this part. This is what you saw in the figure. If you take the log, the high-coverage points decrease more than the low-coverage ones do. Such a figure is more likely to reflect the true relationship between read coverage and GC content.
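One possible reading of this suggestion, sketched in Python: bin the points by GC percent, then compare the raw maximum coverage in each bin with the mean of log10(coverage + 1). Crowded bins near average GC are more likely to contain a few very high points, and the log summary is less dominated by them. The function name, the bin width, and the toy numbers are all assumptions for illustration.

```python
import math
from collections import defaultdict


def summarize_by_gc_bin(points, bin_width=10):
    """Group (gc_percent, coverage) points into GC bins.

    gc_percent is an integer 0-100. Returns
    {bin_start_percent: (n_points, max_coverage, mean_log10_coverage)}.
    Hypothetical helper; log10(c + 1) is used so zero coverage is allowed.
    """
    bins = defaultdict(list)
    last_bin = 100 // bin_width - 1
    for gc_percent, cov in points:
        idx = min(gc_percent // bin_width, last_bin)  # put 100% in the top bin
        bins[idx].append(cov)
    summary = {}
    for idx in sorted(bins):
        covs = bins[idx]
        mean_log = sum(math.log10(c + 1) for c in covs) / len(covs)
        summary[idx * bin_width] = (len(covs), max(covs), mean_log)
    return summary


# Toy points (gc_percent, coverage), not real data.
points = [(45, 10), (48, 200), (50, 30), (52, 12), (75, 8)]
for gc_bin, (n, max_cov, mean_log) in summarize_by_gc_bin(points).items():
    print(gc_bin, n, max_cov, round(mean_log, 2))
```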

**bryan haffer** · 06-14-2010, 06:52 AM

If you want to be able to enrich repetitive regions without bias, look at RainDance Technologies. Using their RainStorm approach, you can design PCR primers to capture 99% or greater of your target regions. The technology will also provide better uniformity allowing for less sequencing than SureSelect.

https://www.raindancetech.com

**NGSfan** · 06-14-2010, 08:00 AM

Hi Bryan,

Thanks for the suggestion. The Rain Dance approach seems to really be the better approach to handle these biased regions, albeit on a smaller scale.

For example, it doesn't scale very well for say, 1000 genes like the SureSelect, or if you want the whole exome, for example.

**ECO** · 06-14-2010, 09:17 AM

Originally posted by bryan haffer View Post

If you want to be able to enrich repetitive regions without bias, look at RainDance Technologies. Using their RainStorm approach, you can design PCR primers to capture 99% or greater of your target regions. The technology will also provide better uniformity allowing for less sequencing than SureSelect.

www.raindancetech.com

In the interest of full disclosure. In the future please make your affiliations more clear.

**DoubleA** · 03-17-2011, 04:15 PM

NGSfan,
Our lab has been using a custom SureSelect library to capture and sequence the extended HLA region (~8Mb). We have found that we get great coverage (>40X) over regions with <60% GC content while we get very poor coverage of regions with >60% GC content. I don't think that these results are out of the ordinary. I just came across the manuscript below...

Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries - PubMed

http://www.ncbi.nlm.nih.gov/pubmed/21338519

Despite the ever-increasing output of Illumina sequencing data, loci with extreme base compositions are often under-represented or absent. To evaluate sources of base-composition bias, we traced genomic sequences ranging from 6% to 90% GC through the process by quantitative PCR. We identified PCR du …

They have shown that the PCR steps in the library construction process create a huge bias against regions of high GC content. They have also shown how to resolve this problem. Check it out...

DoubleA

**Sam64** · 06-23-2011, 07:11 AM

DoubleA,
Thank you for the link. The manuscript is very interesting, but it refers only to the Illumina protocol.
Does anybody have experience with the Agilent SureSelect protocol, in particular with the Herculase II enzyme? Does this enzyme help recover fragments with a high GC percentage (like the AccuPrime Taq HiFi used in the manuscript)?
Sam64

**DoubleA** · 07-27-2011, 02:37 PM

Hello everyone,

I recently tried three different polymerases as well as different PCR conditions (increases in denaturation time) in an attempt to increase read coverage in regions with >60% GC content. I enriched 12 Illumina PE libraries with either AccuPrime, Phusion, or KAPA polymerase (4 with each polymerase). I should mention that I ligated the same adapter and incorporated a unique bar code for each library using the three-primer enrichment approach (PE 1.0, PE 2.0, and a primer containing the bar code).

Following library production, I pooled 4 libraries (all 4 created with the same polymerase) and performed a hybridization with a custom SureSelect bait library covering ~8Mb of the HLA region on human chromosome 6 (baits on ~3.8 Mb). Following hybridization and elution, I performed a final enrichment with primers covering the last 20bp of the 5' and 3' ends of the Illumina libraries. All 3 pools (12 libraries) were mixed and run on a HiSeq lane for a single 40bp run.

The coverage of the target region was a bit low (10-50X), so we'll probably sequence these libraries in the future with a PE X 100bp run. I have attached graphs of read coverage vs %GC content (20bp window). As you can see, the coverage of the GC-rich regions is pretty similar for each polymerase, and the increase in denaturation time per cycle did not help much either. Below is the % of reads mapping to regions with GC content >60%. I thought some of you might be interested in these results.

Regards,
Double A

AccuPrime 15 second denaturation/cycle: 12.4%
AccuPrime 30 second denaturation/cycle: 12.9%
AccuPrime 45 second denaturation/cycle: 13.2%
AccuPrime 60 second denaturation/cycle: 14.4%

Phusion 15 second denaturation/cycle: 13.5%
Phusion 30 second denaturation/cycle: 16.4%
Phusion 45 second denaturation/cycle: 13.1%
Phusion 60 second denaturation/cycle: 12.6%

KAPA 15 second denaturation/cycle: 13.9%
KAPA 15 second denaturation/cycle: 10.7%
KAPA 15 second denaturation/cycle: 13.5%
KAPA 15 second denaturation/cycle: 11.6%
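The post does not say exactly how these percentages were computed. One plausible approach, sketched below under that caveat, is to compute the GC fraction of each 20bp window and report the share of reads falling in windows above 60% GC. The function names and the toy windows are hypothetical.

```python
def gc_fraction(seq):
    """GC fraction of a sequence, ignoring anything other than A, C, G, T."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def percent_reads_in_high_gc(windows, threshold=0.60):
    """Percent of reads that fall in windows with GC fraction above `threshold`.

    windows: list of (window_sequence, read_count). Hypothetical input layout.
    """
    total = sum(count for _, count in windows)
    if total == 0:
        return 0.0
    high = sum(count for seq, count in windows if gc_fraction(seq) > threshold)
    return 100.0 * high / total


# Two toy 20bp windows, not real data: one at 80% GC, one at 20% GC.
windows = [("GCGCGCGCGCGCGCGCATAT", 5), ("ATATATATATGCGCATATAT", 35)]
print(percent_reads_in_high_gc(windows))  # 12.5
```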

**arvi8689** · 11-07-2011, 01:58 PM

coverage calc

Hey

I am trying to sequence the exome and the capture kit is 100 Mb.

The sequencing core promised 120 million reads per lane, and we are using paired-end 100bp reads; our fragment size is 250 base pairs.

My calculation was: 120 million reads * 200 bp = 24 billion bases read

so coverage = 24 billion bases / 100 Mb = 240x coverage (average)

But some people say I will get a coverage of only 120x. What could be the reason? Or is the coverage actually 240x?
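The gap between 240x and 120x may come down to whether "120 million reads" means read pairs or individual reads. That is an assumption, and the sequencing core would have to confirm which they mean. A small sketch of both readings, with a hypothetical helper name:

```python
def mean_coverage(n_reads, read_length_bp, target_size_bp):
    """Naive mean coverage: total sequenced bases divided by target size.

    Assumes every base lands on the target, which capture data never achieves.
    """
    return n_reads * read_length_bp / target_size_bp


target_bp = 100_000_000  # 100 Mb capture kit, as stated in the post

# Reading 1: 120 million read PAIRS, i.e. 240 million reads of 100 bp.
print(mean_coverage(120_000_000 * 2, 100, target_bp))  # 240.0

# Reading 2: 120 million individual reads of 100 bp (60 million pairs).
print(mean_coverage(120_000_000, 100, target_bp))      # 120.0
```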

**DoubleA** · 11-07-2011, 02:13 PM

Hi Arvi8689,

There are two things that will reduce your fold coverage with an exome capture experiment. First, at least 10% of your reads will be PCR duplicates and should be removed before alignment. Second, ~60-80% of your unique reads will be "on target". Therefore, it's likely that only 50% of your initial reads will be unique and map to your target.

Double A
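As a rough sketch of how those two losses combine, the naive coverage can be scaled by the non-duplicate fraction and the on-target fraction. The helper name is hypothetical and the rates are the approximate ranges given above, not measurements.

```python
def effective_coverage(raw_coverage, duplicate_fraction, on_target_fraction):
    """Scale naive coverage by the share of reads that are unique and on target."""
    return raw_coverage * (1 - duplicate_fraction) * on_target_fraction


raw = 240  # naive estimate from the question above
for on_target in (0.6, 0.8):
    # 10% duplicates, then 60% or 80% of unique reads on target
    print(on_target, round(effective_coverage(raw, 0.10, on_target), 1))
```

With 10% duplicates this keeps 54-72% of the reads, in line with the rough "only 50%" figure if the duplicate rate runs above 10%.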
