**What_Da_Seq** · 11-18-2009, 07:09 AM

It is, of course, a statistical problem. What if you adjust your read coverage (not your proportions) to a lower, perhaps even consistent, level, essentially taking read coverage out of the equation? Just speculating here.
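One way to picture the suggestion is random downsampling to a fixed mean depth. The following is a minimal sketch, assuming reads are held in memory as half-open `(start, end)` intervals on a single region; the function name and parameters are hypothetical, and with real alignments you would subsample with your usual tools instead.

```python
import random

def downsample_to_depth(reads, region_length, target_depth, seed=0):
    """Randomly keep a subset of reads so mean depth is about target_depth.

    reads: list of (start, end) half-open intervals on one region.
    """
    total_bases = sum(end - start for start, end in reads)
    current_depth = total_bases / region_length
    if current_depth <= target_depth:
        # Already at or below the target, so keep everything.
        return list(reads)
    # Keep the same fraction of reads as the desired fraction of depth.
    keep = round(len(reads) * target_depth / current_depth)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(reads, keep)

# Toy data: 9000 reads of 100 bp on a 1 kb region (about 900X).
toy_reads = [(i % 900, i % 900 + 100) for i in range(9000)]
subset = downsample_to_depth(toy_reads, region_length=1000, target_depth=100)
```

Because the subset is drawn uniformly at random, the relative proportions of reads supporting each allele are preserved on average; only the depth changes.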

**dwmohr** · 11-18-2009, 10:46 AM

Have you tried filtering your sequences for duplicates? We find this essential when dealing with long-range PCR libraries. We've used bwa/picard/samtools and the FASTX toolkit/CLC bio with success.
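For readers unfamiliar with coordinate-based duplicate filtering, here is a minimal sketch of the general idea: pairs that share chromosome, strand, and both end positions are collapsed to one. This is an illustration only, not how any of the tools named above are implemented; the dictionary keys and function name are assumptions.

```python
def filter_coordinate_duplicates(pairs):
    """Keep the first read pair seen for each (chrom, strand, start, end)."""
    seen = set()
    kept = []
    for pair in pairs:
        key = (pair["chrom"], pair["strand"], pair["start"], pair["end"])
        if key not in seen:
            seen.add(key)
            kept.append(pair)
    return kept

# Toy input: the first two pairs share all coordinates.
toy_pairs = [
    {"chrom": "chr1", "strand": "+", "start": 100, "end": 350},
    {"chrom": "chr1", "strand": "+", "start": 100, "end": 350},
    {"chrom": "chr1", "strand": "+", "start": 100, "end": 360},
]
unique_pairs = filter_coordinate_duplicates(toy_pairs)
```

Note that this treats every coordinate match as a duplicate, which is exactly the assumption that becomes questionable at very high depth.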

**nilshomer** · 11-18-2009, 06:01 PM

Originally posted by dwmohr

Have you tried filtering your sequences for duplicates? We find this essential when dealing with long-range PCR libraries. We've used bwa/picard/samtools and the FASTX toolkit/CLC bio with success.

How do you identify duplicates if you expect at least two reads to have the same starting position? Even if you require that both ends of a pair match, at >1500X coverage you would still expect some pairs to share both end positions purely by chance.

Anybody have any other ideas to identify PCR duplicates on high coverage data? I don't think it is possible.
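To put a rough number on the chance-collision concern, here is a minimal sketch that assumes read pairs fall independently and uniformly over all (start, fragment length) combinations. The function name and the example numbers (a 10 kb amplicon, 2 x 50 bp pairs, 100 distinct fragment lengths) are illustrative assumptions, not values from this thread.

```python
def expected_chance_duplicates(n_pairs, n_slots):
    """Expected number of pairs flagged as duplicates purely by chance.

    Assumes each pair lands independently and uniformly in one of n_slots
    distinct (start, fragment length) combinations.
    """
    # Expected number of slots hit by at least one pair.
    occupied = n_slots * (1.0 - (1.0 - 1.0 / n_slots) ** n_pairs)
    # Every pair beyond the first in a slot looks like a duplicate.
    return n_pairs - occupied

# Hypothetical numbers: 10 kb amplicon, 2 x 50 bp pairs at 1500X,
# 100 distinct fragment lengths.
amplicon_length = 10_000
n_pairs = 1500 * amplicon_length // 100   # total bases needed / bases per pair
n_slots = amplicon_length * 100           # start positions x fragment lengths
chance_fraction = expected_chance_duplicates(n_pairs, n_slots) / n_pairs
print(f"fraction flagged by chance: {chance_fraction:.3f}")
```

Under these made-up numbers, several percent of perfectly good pairs would be removed by a coordinate-only filter, and the fraction grows with depth or with a narrower fragment size range.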

**simonandrews** · 11-19-2009, 01:15 AM

Originally posted by nilshomer

Anybody have any other ideas to identify PCR duplicates on high coverage data? I don't think it is possible.

I suppose you'd have to take an observed/expected approach. If you know the number and size distribution of your sequences you can work out the likelihood of exact overlaps of different depths (assuming reads are randomly distributed). Anything falling too far from the expected range would be suspicious.
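A minimal sketch of such an observed/expected check, assuming read starts are uniformly distributed so that the count at each start position is roughly Poisson. The function names and the `alpha` threshold are hypothetical; because many positions are tested at once, the threshold should be strict.

```python
import math
from collections import Counter

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def suspicious_starts(starts, n_positions, alpha=1e-6):
    """Return {position: count} for start positions that are unexpectedly deep.

    starts: start coordinate of every read; n_positions: number of
    possible start positions in the region.
    """
    lam = len(starts) / n_positions  # expected reads per start position
    counts = Counter(starts)
    return {pos: c for pos, c in counts.items() if poisson_sf(c, lam) < alpha}

# Toy data: two reads at every position, plus 50 extra copies at 500.
toy_starts = list(range(1000)) * 2 + [500] * 50
flagged = suspicious_starts(toy_starts, n_positions=1000)
```

The same idea extends to paired ends by counting (start, end) combinations and using the fragment size distribution to set the expected count per combination.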

You could also maybe look at the ratio of exact overlaps to non-exact overlaps. If you have a region composed mostly of exact overlaps then that's not right for a randomly fragmented library. This should work even with unevenly distributed reads.
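As a sketch of this ratio, the snippet below counts overlapping read pairs within one window and reports the fraction whose coordinates match exactly. It assumes half-open `(start, end)` intervals and uses an all-pairs loop, so it only suits small windows; the function name is hypothetical.

```python
from itertools import combinations

def exact_overlap_fraction(reads):
    """Fraction of overlapping read pairs that have identical coordinates."""
    exact = 0
    partial = 0
    for (s1, e1), (s2, e2) in combinations(reads, 2):
        if s1 < e2 and s2 < e1:  # half-open intervals overlap
            if (s1, e1) == (s2, e2):
                exact += 1
            else:
                partial += 1
    total = exact + partial
    return exact / total if total else 0.0

# A clonal-looking window versus a staggered, random-looking one.
clonal_window = [(0, 50)] * 4
staggered_window = [(0, 50), (10, 60), (20, 70)]
clonal_fraction = exact_overlap_fraction(clonal_window)
staggered_fraction = exact_overlap_fraction(staggered_window)
```

A window dominated by exact overlaps scores near 1, while a randomly fragmented library should score near 0 regardless of how deep that window is.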

Neither of these is going to detect small PCR effects, but normally we'd expect that when the PCR goes wrong it often goes very wrong, and those are the problems we're more interested in sorting out.

**nilshomer** · 11-19-2009, 01:27 AM

Originally posted by simonandrews

I suppose you'd have to take an observed/expected approach. If you know the number and size distribution of your sequences you can work out the likelihood of exact overlaps of different depths (assuming reads are randomly distributed). Anything falling too far from the expected range would be suspicious.

You could also maybe look at the ratio of exact overlaps to non-exact overlaps. If you have a region composed mostly of exact overlaps then that's not right for a randomly fragmented library. This should work even with unevenly distributed reads.

Neither of these is going to detect small PCR effects, but normally we'd expect that when the PCR goes wrong it often goes very wrong, and those are the problems we're more interested in sorting out.

That should work. I am also thinking about clonal reads for SOLiD data. In that case, it won't be as bad as when things go wrong with PCR during library prep.

Very high depth of coverage