06-27-2014, 09:14 AM   #2
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Low library complexity, whether from insufficient input DNA, overamplification, contamination, or highly biased capture, produces a lot of duplicates. It sounds like there were problems with your library prep, and maybe it should be redone; that duplication level is much higher than I'd expect. Before redoing anything, though, just run pileup and see whether there is enough coverage for whatever you're doing. What matters is the fraction of the target area covered to at least X depth, not the average coverage.
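
To illustrate why the fraction covered to at least X depth is more informative than the average, here is a minimal Python sketch. It assumes you already have per-base depths as a plain list of integers; the function name `coverage_summary` and the toy numbers are made up for illustration and are not part of pileup or any other tool.

```python
def coverage_summary(depths, min_depth):
    """Return (mean depth, fraction of positions with depth >= min_depth).

    `depths` is a hypothetical list of per-base depths for the region of interest.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, covered / len(depths)


# Two toy regions with the same average depth but very different usable coverage.
even = [10] * 10           # every base covered 10x
skewed = [100] + [0] * 9   # one base at 100x, the rest uncovered

print(coverage_summary(even, 5))
print(coverage_summary(skewed, 5))
```

Both toy regions average 10x, but only the first has every base at 5x or better; in the second, nine of the ten bases have no coverage at all.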

PCR duplicates should be removed before calling variations. However, I would suggest removing only exact duplicate reads, rather than everything that maps to the same location even when the base calls differ. And just to clarify: are these paired reads, and are you removing them based on both reads mapping to the same location?
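
As a rough sketch of what "exact duplicates only" means for paired reads, the Python below keeps the first copy of each pair whose two sequences are both identical to an earlier pair. The function name and the in-memory tuple format are assumptions for illustration only; a real dedupe tool would also deal with things like reverse complements and much larger data.

```python
def remove_exact_duplicate_pairs(pairs):
    """Keep the first occurrence of each (read1, read2) sequence pair.

    `pairs` is a hypothetical iterable of (name, seq1, seq2) tuples.
    A pair is dropped only if BOTH sequences exactly match an earlier pair.
    """
    seen = set()
    unique = []
    for name, seq1, seq2 in pairs:
        key = (seq1, seq2)
        if key in seen:
            continue  # exact duplicate of a pair already kept
        seen.add(key)
        unique.append((name, seq1, seq2))
    return unique


pairs = [
    ("a", "ACGT", "TTGA"),
    ("b", "ACGT", "TTGA"),  # exact duplicate of "a": removed
    ("c", "ACGT", "TTGC"),  # differs by one base call in read 2: kept
]
print([name for name, _, _ in remove_exact_duplicate_pairs(pairs)])
```

Pair "c" would likely map to the same location as "a", so a position-based approach would discard it; exact matching keeps it because its base calls differ.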

Removing duplicates rather than marking them is more efficient, because downstream programs don't need to process as much data. That said, if the reads are low quality, you can use marked duplicates to generate a consensus.
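
One simple way to build a consensus from a group of duplicates is a quality-weighted vote at each position. The sketch below is just one possible approach under stated assumptions, not what any particular tool does: the reads in a group are assumed to be the same length and already aligned to each other, and qualities are assumed to be Phred scores given as integers.

```python
from collections import defaultdict


def consensus(reads):
    """Quality-weighted consensus of a group of duplicate reads.

    `reads` is a hypothetical list of (sequence, qualities) tuples, where
    `qualities` is a list of Phred scores, one per base.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0][0])
    for seq, quals in reads:
        if len(seq) != length or len(quals) != length:
            raise ValueError("all reads and quality lists must have equal length")

    out = []
    for i in range(length):
        votes = defaultdict(int)
        for seq, quals in reads:
            votes[seq[i]] += quals[i]  # each base call votes with its quality
        # Ties go to the alphabetically first base, to keep the result deterministic.
        out.append(max(sorted(votes), key=votes.get))
    return "".join(out)


group = [
    ("ACGT", [30, 30, 30, 30]),
    ("ACGA", [30, 30, 30, 5]),   # low-quality disagreement at the last base
    ("ACGT", [30, 30, 30, 20]),
]
print(consensus(group))
```

Here the low-quality "A" at the last position is outvoted by the two higher-quality "T" calls.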

I use the unique coverage (the coverage left after duplicates are removed) when calling variations, as it's more relevant.