**dpryan** · 07-20-2015, 04:29 AM

As I mentioned in your other thread, most variant callers are aware of the concept of paired-end reads and will deal with them in a coherent manner. You're unlikely to gain anything by merging overlapping regions for your use case.

Regarding 200bp SE reads vs. 100bp PE sequencing, which gives the best results will probably vary by facility. For our internally produced data I think the 100bp PE reads might produce slightly better results, but I suppose one would need to do the comparison to really say for sure.

**evakoe** · 07-20-2015, 04:41 AM

Originally posted by dpryan

most variant callers are aware of the concept of paired-end reads and will deal with them in a coherent manner.

I see your point, thanks for sharing. Still, I wonder if the extra accuracy that I gain by the overlapping reads is worth the additional sequencing. But that is probably something that everybody has to decide for themselves.

**GenoMax** · 07-20-2015, 05:04 AM

Originally posted by evakoe

I see your point, thanks for sharing. Still, I wonder if the extra accuracy that I gain by the overlapping reads is worth the additional sequencing. But that is probably something that everybody has to decide for themselves.

The price differential is small between SE and PE reads (based on a casual search for exome sequencing for a single sample on GenoHub.com). If you were doing thousands of samples then it may become a consideration but then you would be negotiating directly with the sequencing center.

**evakoe** · 07-20-2015, 05:24 AM

Originally posted by GenoMax

The price differential is small between SE and PE reads

Ok, let's assume that the price for SE and PE for a given read length and number of reads is identical. My point was that I might get more data out of SE, since I don't "lose" bases to the overlap between PE mates. As a consequence, PE is more expensive to get to the same final coverage.

**dpryan** · 07-20-2015, 05:27 AM

While you lose a few bases with PE data you also probably have a slightly higher alignment rate, so the actual effective coverage is probably not that different.

**evakoe** · 07-20-2015, 05:40 AM

Originally posted by dpryan

While you lose a few bases with PE data you also probably have a slightly higher alignment rate, so the actual effective coverage is probably not that different.

That would be very easy to test. I can just remove the paired end information and treat my reads as SE and compare the alignment accuracy. I will do that and post the results.

Unfortunately, the overlap is much more than a few bases. Imagine the PE reads with adapters already removed. When I now merge the overlapping reads and then count the number of bases, I have 30% fewer bases than without the merging.
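To make the arithmetic behind that figure concrete, here is a minimal sketch. The function name, the 100bp read length, and the 140bp fragment length are assumptions chosen for illustration (140bp is simply a value that reproduces a 30% loss for 2x100bp reads); real libraries have a distribution of fragment lengths, so the actual number is an average over that distribution.

```python
def fraction_lost_by_merging(read_len: int, fragment_len: int) -> float:
    """Fraction of adapter-trimmed bases that disappear when a read pair is merged.

    Hypothetical helper for illustration; assumes perfect adapter trimming.
    """
    # After adapter trimming, each mate contributes at most fragment_len bases.
    per_mate = min(read_len, fragment_len)
    total_bases = 2 * per_mate
    # Bases sequenced twice (once by each mate); zero if the mates do not overlap.
    overlap = max(0, total_bases - fragment_len)
    return overlap / total_bases


# Illustrative values only: 2x100bp reads on a 140bp fragment.
print(fraction_lost_by_merging(100, 140))
# A fragment longer than both reads combined has no overlap to lose.
print(fraction_lost_by_merging(100, 250))
```

Under these assumptions the doubly sequenced bases are the ones that "vanish" on merging, which is why longer fragments reduce the loss.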

**dpryan** · 07-20-2015, 05:41 AM

Given that, your SE reads would be heavily contaminated with adapters and similar junk, so again I doubt you'd be gaining anything with SE reads.

**evakoe** · 07-20-2015, 05:45 AM

Originally posted by dpryan

Given that, your SE reads would be heavily contaminated with adapters and similar junk, so again I doubt you'd be gaining anything with SE reads.

But would SE reads be more heavily contaminated with adapters than PE reads? With the PE data, 11% of the bases from the raw reads are adapters, though there is quite some variability per sample.

**dpryan** · 07-20-2015, 05:48 AM

Yes, they'd be even more contaminated than PE samples, since you're likely starting with essentially the same fragment size pool.
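A rough sketch of why that follows, assuming the same fragment pool is sequenced either as 2x100bp PE or 1x200bp SE. The fragment lengths below are made up for illustration, and the model is simplified: every base read past the end of a fragment is counted as adapter or junk.

```python
def adapter_fraction(read_len: int, mates_per_fragment: int, fragment_lens: list[int]) -> float:
    """Fraction of raw bases that run past the fragment end into adapter/junk.

    Hypothetical helper for illustration; not taken from any tool.
    """
    adapter_bases = 0
    total_bases = 0
    for frag in fragment_lens:
        # Each mate reads through into adapter once it exceeds the fragment length.
        adapter_bases += mates_per_fragment * max(0, read_len - frag)
        total_bases += mates_per_fragment * read_len
    return adapter_bases / total_bases


# Made-up fragment size pool (bp), identical for both designs.
fragments = [80, 120, 150, 180, 250]

pe = adapter_fraction(read_len=100, mates_per_fragment=2, fragment_lens=fragments)
se = adapter_fraction(read_len=200, mates_per_fragment=1, fragment_lens=fragments)
print(f"2x100bp PE: {pe:.2f}   1x200bp SE: {se:.2f}")
```

With the same total number of bases, the single long read overruns far more fragments than two short mates do, so the SE design ends up with the larger adapter fraction in this toy pool.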

**evakoe** · 07-20-2015, 05:52 AM

Originally posted by dpryan

Yes, they'd be even more contaminated than PE samples, since you're likely starting with essentially the same fragment size pool.

I didn't know this. Thank you for your input.

**Brian Bushnell** · 07-20-2015, 08:05 AM

Originally posted by evakoe

That would be very easy to test. I can just remove the paired end information and treat my reads as SE and compare the alignment accuracy. I will do that and post the results.

Unfortunately, the overlap is much more than a few bases. Imagine the PE reads with adapters already removed. When I now merge the overlapping reads and then count the number of bases, I have 30% fewer bases than without the merging.

PE reads are better than SE in every way, if you want to align and call variations. Not only will they have a marginally higher alignment rate (and potentially noticeably higher in areas with more significant mutations), they will be substantially more accurate. So you increase true-positives and decrease false-positives... at the same time that you gain a substantially improved ability to detect structural variations, and to trim adapters (which works much better with PE reads). Furthermore, if you decide to do duplicate removal (which is a good idea for an amplified library), PE libraries will have a much lower duplicate-removal rate because it's possible to measure the insert size of a pair, and thus determine more accurately whether it is a duplicate or just a fragment that happens to start at the same location.
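The duplicate-removal point can be illustrated with a toy sketch. This is not how any particular duplicate-marking tool is implemented; the fragment coordinates are invented, and strand and soft-clipping are ignored for brevity. The idea is only that SE data can key a fragment on its start position, while PE data can key it on both ends.

```python
from collections import Counter


def count_duplicates(keys) -> int:
    """Number of records that would be flagged as duplicates of an earlier record."""
    counts = Counter(keys)
    return sum(n - 1 for n in counts.values())


# Invented fragments as (chrom, start, end). Only the first two are true duplicates.
fragments = [
    ("chr1", 100, 300),
    ("chr1", 100, 300),
    ("chr1", 100, 350),
    ("chr1", 100, 420),
    ("chr2", 500, 700),
]

# SE: only the start of the fragment is observed.
se_keys = [(chrom, start) for chrom, start, _end in fragments]
# PE: both ends are observed, so the insert size is part of the key.
pe_keys = fragments

print("SE duplicates:", count_duplicates(se_keys))
print("PE duplicates:", count_duplicates(pe_keys))
```

In this toy set the SE key flags three reads as duplicates, while the PE key flags only the one pair that truly shares both ends.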

If you worry about wasting bases in the overlap (which is not really a waste), then just aim for longer fragments. Exon capture does not capture fragments matching the bounds of exons; it enriches for any fragments that hybridize to baits that are designed to contain sequence of or around the exon targets. That means if you have one or more baits designed to cover a 200bp exon, and a 500bp fragment that contains the exon, they can still hybridize. I don't know about that specific kit, but all the exon-capture data I've seen - from several different kits - had coverage extending well out into the introns.

**evakoe** · 07-21-2015, 12:50 AM

Originally posted by Brian Bushnell

PE reads are better than SE in every way, if you want to align and call variations. Not only will they have a marginally higher alignment rate (and potentially noticeably higher in areas with more significant mutations), they will be substantially more accurate. So you increase true-positives and decrease false-positives... at the same time that you gain a substantially improved ability to detect structural variations, and to trim adapters (which works much better with PE reads). Furthermore, if you decide to do duplicate removal (which is a good idea for an amplified library), PE libraries will have a much lower duplicate-removal rate because it's possible to measure the insert size of a pair, and thus determine more accurately whether it is a duplicate or just a fragment that happens to start at the same location.

A quick test of treating my PE data as SE did not show a decrease in alignment efficiency, but a slight increase in the general error rate, the number of mismatches and indels. But I agree that PE is more accurate in general, I think enough publications have shown this.

Originally posted by Brian Bushnell

If you worry about wasting bases in the overlap (which is not really a waste), then just aim for longer fragments.

I also arrived at this conclusion and I am already discussing with the wet lab people on how we can implement this. But I am glad to hear that you don't consider the overlap a waste, maybe I did not appreciate the increase in accuracy enough. Thank you.

**Brian Bushnell** · 07-21-2015, 09:01 AM

Originally posted by evakoe

A quick test of treating my PE data as SE did not show a decrease in alignment efficiency, but a slight increase in the general error rate, the number of mismatches and indels. But I agree that PE is more accurate in general, I think enough publications have shown this.

I should have mentioned that this is aligner-specific. PE reads will only map at a higher rate than SE reads if the aligner either has a "rescue" operation, which uses a mapped read as an anchor to search for a mapping location for its unaligned mate (one that did not initially align due to a high error rate or major mutations), or allows lower-scoring mappings for properly paired reads. Aligners that do neither will generally give the same mapping rate whether you align the reads as PE or treat them as SE.

**evakoe** · 07-22-2015, 12:35 AM

Originally posted by Brian Bushnell

If you worry about wasting bases in the overlap (which is not really a waste), then just aim for longer fragments.

Aiming for longer fragments increases the number of reads/bases that are off-target. There is likely an optimal fragment length, but I don't think that our lab could produce it reliably anyway.


overlapping paired-end reads vs single end reads
