sowmyai 05-05-2014 06:01 PM

How does BWA identify PCR duplicates correctly?
Does it use the usual "same-start-position-for-both-paired-ends" approach? If so, why do many users do this check manually or use picard and the like to do this check instead of just filtering by this flag in the CIGAR string? Am I missing something?

swbarnes2 05-06-2014 12:13 PM

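The duplicate flag is a bit in the SAM FLAG field (0x400), not in the CIGAR string. It is left at 0 by default, and is only changed if a piece of software sets it. I'm pretty sure bwa does not check for duplicates, so the fact that the flag is unset in bwa output doesn't mean that bwa actually determined that the read is not a PCR duplicate. That is why people run Picard or similar tools afterwards to mark duplicates.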
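
If it helps, here is a rough sketch of what "checking the flag" means in practice. It only tests whether bit 0x400 of the FLAG column (the second tab-separated field of a SAM record) is set. The two records are made up for illustration, and `is_marked_duplicate` is just a name I picked, not part of bwa, samtools, or Picard.

```python
# Bit 0x400 (decimal 1024) in the SAM FLAG field means
# "PCR or optical duplicate" according to the SAM specification.
DUPLICATE_BIT = 0x400


def is_marked_duplicate(sam_line: str) -> bool:
    """Return True if the record's FLAG field has the duplicate bit set.

    This reports only whether some tool has marked the read; an unset bit
    does not prove that the read is unique.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # FLAG is the second column of a SAM record
    return bool(flag & DUPLICATE_BIT)


# Hypothetical records: FLAG 99 is a properly paired first-in-pair read,
# and 1123 is the same flag with the duplicate bit added (99 + 1024).
unmarked = "read1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*"
marked = "read2\t1123\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*"

print(is_marked_duplicate(unmarked))
print(is_marked_duplicate(marked))
```

On a file straight out of the aligner you would expect this to return False for every read, which tells you nothing about duplication until a duplicate-marking tool has been run on it.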

