![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Samtools flagstat on Tophat generated .bam file | rg_gis | Bioinformatics | 6 | 02-27-2015 09:35 PM |
How to use Picard's MarkDuplicates | cliff | Bioinformatics | 12 | 01-26-2015 10:56 PM |
Single end RNAseq- no BAM file generated | ibn.adam | RNA Sequencing | 0 | 10-28-2011 06:53 AM |
MarkDuplicates in picard | bair | Bioinformatics | 3 | 12-23-2010 11:00 AM |
Picard MarkDuplicates | wangzkai | Bioinformatics | 2 | 05-18-2010 09:14 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: NY Join Date: Apr 2010
Posts: 6
|
![]()
Dear members,
When I run Picard MarkDuplicates, in my understanding, if I choose to keep the duplicates, they will be flagged in my BAM output file. The question is how to find records marked "duplicates" in the output BAM file ??? Details: I ran the command like this: java -jar MarkDuplicates.jar INPUT=sorted.bam OUTPUT=sorted.flag_dup.bam METRICS_FILE=metrics.txt REMOVE_DUPLICATES=false ASSUME_SORTED=true It runs and generate the sorted.flag_dup.bam in which duplicates must be flagged in one of optional field According to specs, optional fields are in the format: <TAG>:<VTYPE>:<VALUE> and 0x0400 means that the read is either a PCR duplicate or an optical duplicate. My optional fields look like that: XT:A:R NM:i:1 SM:i:0 AM:i:0 X0:i:454 XM:i:1 XO:i:0 XG:i:0 MD:Z:31A5 The site http://picard.sourceforge.net/explain-flags.html tells me that read is PCR or optical duplicate is '1024' in Integer format I am searching resulting file i:1024 and find some X0:i:1024 and some X1:i:1024 but it is only about 200 of them, whereas the metrics said I have 131000 duplicates, which is likely true. Could someone kindly explain how do I find the duplicate records in BAM/SAM file generated by Picard? Thank you very much in advance Vlad Last edited by makarovv; 11-09-2010 at 01:59 PM. |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
|
![]()
read (or re-read) carefully the bam specification.
Reads are flagged with 0x400 (check section 2.2.2.) if they are considered duplicates. So you have to filter those ones out. With samtools: $ samtools view -F 1023 your_bam > your_bam.without_dups.sam
__________________
-drd |
![]() |
![]() |
![]() |
#3 |
Junior Member
Location: NY Join Date: Apr 2010
Posts: 6
|
![]()
Dear Drio,
It is samtools view -F 0x400 sorted.mark_dup.bam > rm_dup.sam to output those reads marked duplicates. Thank you for pointing me in right direction |
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
|
![]()
I made a typo while replying, you can both provide the decimal or the hexadecimal representation of the number that encodes your desire filtering.
__________________
-drd |
![]() |
![]() |
![]() |
#5 |
Junior Member
Location: NY Join Date: Apr 2010
Posts: 6
|
![]()
Yes, thanks, I figured out:
samtools view -F 0x40 - to filter out duplicates (create rmdup sam file) samtools view -f 0x40 - to filter out evrything else, but duplicates (save duplicates in separate sam file) I wish samtools authors provided more clear examples |
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
|
![]()
0x400 not 0x40
__________________
-drd |
![]() |
![]() |
![]() |
#7 |
Junior Member
Location: NY Join Date: Apr 2010
Posts: 6
|
![]()
Yes, of course. My typo
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|