Dear members,
My understanding is that when I run Picard MarkDuplicates and choose to keep the duplicates, they are flagged in the output BAM file.
My question is: how do I find the records marked as duplicates in the output BAM file?
Details: I ran the command like this:
java -jar MarkDuplicates.jar INPUT=sorted.bam OUTPUT=sorted.flag_dup.bam METRICS_FILE=metrics.txt REMOVE_DUPLICATES=false ASSUME_SORTED=true
It runs and generates sorted.flag_dup.bam, in which the duplicates must be flagged in one of the optional fields.
According to specs, optional fields are in the format: <TAG>:<VTYPE>:<VALUE> and
0x0400 means that the read is either a PCR duplicate or an optical duplicate.
My optional fields look like this:
XT:A:R NM:i:1 SM:i:0 AM:i:0 X0:i:454 XM:i:1 XO:i:0 XG:i:0 MD:Z:31A5
The site http://picard.sourceforge.net/explain-flags.html tells me that
"read is PCR or optical duplicate" is 1024 in integer format.
I searched the resulting file for i:1024 and found some X0:i:1024 and some X1:i:1024 entries,
but there are only about 200 of them, whereas the metrics file reports 131000 duplicates, which is likely correct.
Could someone kindly explain how I can find the duplicate records in a BAM/SAM file generated by Picard?
Thank you very much in advance
Vlad
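
A note on where this flag lives: in the SAM specification, 0x0400 (1024) is one bit of the FLAG field, which is the second column of every alignment record, not one of the optional <TAG>:<VTYPE>:<VALUE> fields. That would explain why searching for i:1024 only hits unrelated tags such as X0:i:1024. Below is a minimal Python sketch of testing that bit, assuming the BAM has first been converted to tab-separated SAM text. The function names and the example records are hypothetical and only for illustration.

```python
DUPLICATE_FLAG = 0x400  # 1024: "PCR or optical duplicate" bit of the SAM FLAG field


def is_duplicate(sam_line):
    """Return True if the FLAG field (column 2) of a SAM record has bit 0x400 set."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    return bool(flag & DUPLICATE_FLAG)


def count_duplicates(lines):
    """Count duplicate-flagged records, skipping empty lines and '@' header lines."""
    return sum(
        1
        for line in lines
        if line and not line.startswith("@") and is_duplicate(line)
    )


# Hypothetical records, made up for illustration (not taken from the real file).
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    # FLAG 16: not a duplicate, even though an optional tag happens to be X0:i:1024
    "read1\t16\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R\tX0:i:1024",
    # FLAG 1040 = 1024 + 16: reverse strand and marked as duplicate
    "read2\t1040\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R\tX0:i:454",
]

print(count_duplicates(example))  # prints 1
```

If samtools is available, `samtools view -f 1024 sorted.flag_dup.bam` should list only the records with this bit set, and `samtools view -c -f 1024 sorted.flag_dup.bam` should count them, which can then be compared with the number in metrics.txt.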