08-05-2011, 06:04 AM   #3
fabrice
Member
 
Location: paris

Join Date: Oct 2009
Posts: 86

Thank you.

So does this mean that FastQC is closer to the truth?

Picard works like this:

Q: How does MarkDuplicates work?
A: Essentially what it does (for pairs; single-end data is also handled) is to find the 5' coordinates and mapping orientations of each read pair. When doing this it takes into account all clipping that has taken place as well as any gaps or jumps in the alignment. You can thus think of it as determining "if all the bases from the read were aligned, where would the 5' most base have been aligned". It then matches all read pairs that have identical 5' coordinates and orientations and marks as duplicates all but the "best" pair. "Best" is defined as the read pair having the highest sum of base qualities, counting only bases with Q >= 15.

If your reads have been divided into separate BAMs by chromosome, inter-chromosomal pairs will not be identified, but MarkDuplicates will not fail due to inability to find the mate pair for a read.
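
To make the matching described in that answer concrete, here is a rough Python sketch of the idea. It is not Picard's code: the Read class and the function names five_prime, pair_key, pair_score and mark_duplicates are all made up for illustration, and it assumes each read is reduced to a chromosome, a 1-based leftmost aligned position, a CIGAR string, a strand flag and a list of base qualities. It only shows the "unclipped 5' coordinate plus orientation" key and the "keep the pair with the highest sum of Q >= 15 qualities" rule.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import List

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")   # CIGAR operations that advance along the reference
CLIPS = set("SH")              # soft and hard clips


@dataclass
class Read:                    # hypothetical minimal read record
    chrom: str
    pos: int                   # 1-based leftmost aligned position, as in SAM
    cigar: str
    reverse: bool              # True if mapped to the reverse strand
    quals: List[int]


def five_prime(read: Read) -> int:
    """Where the 5'-most base would sit if clipped bases had been aligned."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(read.cigar)]
    if not read.reverse:
        # Forward strand: step back over the leading clips.
        lead = 0
        for n, op in ops:
            if op not in CLIPS:
                break
            lead += n
        return read.pos - lead
    # Reverse strand: the 5' end is the rightmost base, so walk over the
    # reference span (including deletions and skips) and the trailing clips.
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    trail = 0
    for n, op in reversed(ops):
        if op not in CLIPS:
            break
        trail += n
    return read.pos + ref_len - 1 + trail


def pair_key(r1: Read, r2: Read):
    # Sort the two ends so the key does not depend on mate order.
    return tuple(sorted((r.chrom, five_prime(r), r.reverse) for r in (r1, r2)))


def pair_score(r1: Read, r2: Read, min_q: int = 15) -> int:
    # Sum of base qualities, counting only bases with Q >= min_q.
    return sum(q for r in (r1, r2) for q in r.quals if q >= min_q)


def mark_duplicates(pairs):
    """Return one flag per pair; True means 'marked as duplicate'."""
    groups = defaultdict(list)
    for i, (r1, r2) in enumerate(pairs):
        groups[pair_key(r1, r2)].append(i)
    dup = [False] * len(pairs)
    for idxs in groups.values():
        best = max(idxs, key=lambda i: pair_score(*pairs[i]))
        for i in idxs:
            if i != best:
                dup[i] = True
    return dup
```

With this sketch, a forward read at position 105 with CIGAR 5S45M gets the same 5' coordinate (100) as an unclipped 50M read at position 100, so the two pairs fall into one group if their mates also agree. That is how pairs with different clipping can still be called duplicates of each other.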