I am trying to use Picard MarkDuplicates to remove PCR duplicates from a human RNA-seq BAM file. The run was paired-end, but only about 30% of my reads are properly paired (that is another story).
My command for picard was this:
The picard_info.txt metrics file reports 52% duplicates in my data. However, when I convert the output file to SAM I can see (by eye) that duplicates are still there:
Original data:
Post picard:
I have also tried setting REMOVE_DUPLICATES=false (I think this just flags duplicates and leaves them in the new file rather than excluding them), but this gives me exactly the same result:
In all three SAM extracts you can clearly see that the five reads on chr1 starting at position 13484 are duplicates, paired with reads starting at 13572, yet all of them have been left in and I can't see that any flag has changed.
Am I doing something wrong? Please help!
Thanks
Helen
My command for picard was this:
PHP Code:
java -Xmx8g -jar /path/to/MarkDuplicates.jar INPUT=accepted_hits_sorted.bam OUTPUT=picard.bam METRICS_FILE=picard_info.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=LENIENT
Original data:
PHP Code:
1329_105_1480_F3 355 chr1 13484 1 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG ,,@NPYG423BC553AC.2BPB;:7OH0.-=><1,I3!5=D<4)-OD=44 NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92080 XS:A:+ HI:i:0
1863_1224_411_F3 99 chr1 13484 3 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG UUZ_[][VXY[XRNOYJFURZZULJZ_ZOQ[SRRTW@CPBJHJWU\FAMM NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357483 XS:A:+ HI:i:0
1939_1338_752_F3 355 chr1 13484 1 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG 66?=3CEA:=LIE?CH8?F@J;?G4@QH7;KEHJIE1@@4=AD9=P@8<< NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92080 XS:A:+ HI:i:0
1942_131_1549_F3 355 chr1 13484 3 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG ,,9:.A>-.34-./491+#&DA20)>?-,36,**8<%#++20..6I<-.. NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357483 XS:A:+ HI:i:0
2022_911_2004_F3 99 chr1 13484 3 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG OOTYX[TKCHXUHIUVONHDTUPTOTXQKMWQPTRB/DH?QJKNNZJJQQ NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357483 XS:A:+ HI:i:0
486_302_1756_F3 99 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA __ZZZ[ZQWUVV?I]`TEQWYX<;RYH==GXI;5F];!(EYE0CV\D7NN NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
667_379_431_F3 355 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA ``]_]][X^__WAIZ\K=RYYY??PRPORMLEB<DYM'*NYT;9R[E5@@ NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
1481_526_56_F3 355 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA RRRNC==:HHC@3CG=:1<=BB2DMG>0:B70/.0@>!!/;K>/=B1)11 NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
1631_628_1988_F3 355 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA ^^]``_NFW]^[IO_^V39UWN=JTVH6BSYB8HSYOD@>LR;9QY6!00 NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
1934_635_52_F3 99 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA ZZ]\\[YZ^WVZHL_`^PQYVS<EYVQHNWWJ=1@V@9=9LSABRW2!22 NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
2219_1966_235_F3 355 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA NNBAQRE9HJH6#CMNF5EFFA$0<AJ;9CE,!-?K0!22;M4+BC,%77 NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
1329_105_1480_F5-BC 403 chr1 13572 1 35M = 13484 -123 GCTACAAAGGTGAAACCCAGGAGAGTGTGGAGTCC ::74<@':CAOIH9AHFUWL:FXE6?KNOUVCCKK NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92007 XS:A:+ HI:i:0
1863_1224_411_F5-BC 147 chr1 13572 3 35M = 13484 -123 GCTACAAAGGTGAAACCCAGGAGAGTGTGGAGTCC __]XRR?G_\_a^^]]bcccbba_[[accccbabb NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357410 XS:A:+ HI:i:0
1939_1338_752_F5-BC 403 chr1 13572 1 35M = 13484 -123 GCTACAAAGGTGAAACCCAGGAGAGTGTGGAGTCC 44.$5H+6GIJFQHHTW[VVTOQGEJOWYVSSVYY NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92007 XS:A:+ HI:i:0
Post picard:
PHP Code:
1329_105_1480_F3 355 chr1 13484 1 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG ,,@NPYG423BC553AC.2BPB;:7OH0.-=><1,I3!5=D<4)-OD=44 NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92080 XS:A:+ HI:i:0
1863_1224_411_F3 99 chr1 13484 3 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG UUZ_[][VXY[XRNOYJFURZZULJZ_ZOQ[SRRTW@CPBJHJWU\FAMM NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357483 XS:A:+ HI:i:0
1939_1338_752_F3 355 chr1 13484 1 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG 66?=3CEA:=LIE?CH8?F@J;?G4@QH7;KEHJIE1@@4=AD9=P@8<< NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92080 XS:A:+ HI:i:0
1942_131_1549_F3 355 chr1 13484 3 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG ,,9:.A>-.34-./491+#&DA20)>?-,36,**8<%#++20..6I<-.. NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357483 XS:A:+ HI:i:0
2022_911_2004_F3 99 chr1 13484 3 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG OOTYX[TKCHXUHIUVONHDTUPTOTXQKMWQPTRB/DH?QJKNNZJJQQ NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357483 XS:A:+ HI:i:0
486_302_1756_F3 99 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA __ZZZ[ZQWUVV?I]`TEQWYX<;RYH==GXI;5F];!(EYE0CV\D7NN NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
667_379_431_F3 355 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA ``]_]][X^__WAIZ\K=RYYY??PRPORMLEB<DYM'*NYT;9R[E5@@ NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
1481_526_56_F3 355 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA RRRNC==:HHC@3CG=:1<=BB2DMG>0:B70/.0@>!!/;K>/=B1)11 NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
1631_628_1988_F3 355 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA ^^]``_NFW]^[IO_^V39UWN=JTVH6BSYB8HSYOD@>LR;9QY6!00 NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
1934_635_52_F3 99 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA ZZ]\\[YZ^WVZHL_`^PQYVS<EYVQHNWWJ=1@V@9=9LSABRW2!22 NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
2219_1966_235_F3 355 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA NNBAQRE9HJH6#CMNF5EFFA$0<AJ;9CE,!-?K0!22;M4+BC,%77 NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
1329_105_1480_F5-BC 403 chr1 13572 1 35M = 13484 -123 GCTACAAAGGTGAAACCCAGGAGAGTGTGGAGTCC ::74<@':CAOIH9AHFUWL:FXE6?KNOUVCCKK NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92007 XS:A:+ HI:i:0
1863_1224_411_F5-BC 147 chr1 13572 3 35M = 13484 -123 GCTACAAAGGTGAAACCCAGGAGAGTGTGGAGTCC __]XRR?G_\_a^^]]bcccbba_[[accccbabb NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357410 XS:A:+ HI:i:0
1939_1338_752_F5-BC 403 chr1 13572 1 35M = 13484 -123 GCTACAAAGGTGAAACCCAGGAGAGTGTGGAGTCC 44.$5H+6GIJFQHHTW[VVTOQGEJOWYVSSVYY NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92007 XS:A:+ HI:i:0
Post picard with REMOVE_DUPLICATES=false:
PHP Code:
1329_105_1480_F3 355 chr1 13484 1 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG ,,@NPYG423BC553AC.2BPB;:7OH0.-=><1,I3!5=D<4)-OD=44 NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92080 XS:A:+ HI:i:0
1863_1224_411_F3 99 chr1 13484 3 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG UUZ_[][VXY[XRNOYJFURZZULJZ_ZOQ[SRRTW@CPBJHJWU\FAMM NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357483 XS:A:+ HI:i:0
1939_1338_752_F3 355 chr1 13484 1 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG 66?=3CEA:=LIE?CH8?F@J;?G4@QH7;KEHJIE1@@4=AD9=P@8<< NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92080 XS:A:+ HI:i:0
1942_131_1549_F3 355 chr1 13484 3 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG ,,9:.A>-.34-./491+#&DA20)>?-,36,**8<%#++20..6I<-.. NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357483 XS:A:+ HI:i:0
2022_911_2004_F3 99 chr1 13484 3 50M = 13572 123 CAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGG OOTYX[TKCHXUHIUVONHDTUPTOTXQKMWQPTRB/DH?QJKNNZJJQQ NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357483 XS:A:+ HI:i:0
486_302_1756_F3 99 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA __ZZZ[ZQWUVV?I]`TEQWYX<;RYH==GXI;5F];!(EYE0CV\D7NN NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
667_379_431_F3 355 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA ``]_]][X^__WAIZ\K=RYYY??PRPORMLEB<DYM'*NYT;9R[E5@@ NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
1481_526_56_F3 355 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA RRRNC==:HHC@3CG=:1<=BB2DMG>0:B70/.0@>!!/;K>/=B1)11 NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
1631_628_1988_F3 355 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA ^^]``_NFW]^[IO_^V39UWN=JTVH6BSYB8HSYOD@>LR;9QY6!00 NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
1934_635_52_F3 99 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA ZZ]\\[YZ^WVZHL_`^PQYVS<EYVQHNWWJ=1@V@9=9LSABRW2!22 NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
2219_1966_235_F3 355 chr1 13528 0 50M = 13615 122 CGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACA NNBAQRE9HJH6#CMNF5EFFA$0<AJ;9CE,!-?K0!22;M4+BC,%77 NM:i:0 NH:i:5 CC:Z:chr12 CP:i:92036 XS:A:+ HI:i:0
1329_105_1480_F5-BC 403 chr1 13572 1 35M = 13484 -123 GCTACAAAGGTGAAACCCAGGAGAGTGTGGAGTCC ::74<@':CAOIH9AHFUWL:FXE6?KNOUVCCKK NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92007 XS:A:+ HI:i:0
1863_1224_411_F5-BC 147 chr1 13572 3 35M = 13484 -123 GCTACAAAGGTGAAACCCAGGAGAGTGTGGAGTCC __]XRR?G_\_a^^]]bcccbba_[[accccbabb NM:i:0 NH:i:2 CC:Z:chr2 CP:i:114357410 XS:A:+ HI:i:0
1939_1338_752_F5-BC 403 chr1 13572 1 35M = 13484 -123 GCTACAAAGGTGAAACCCAGGAGAGTGTGGAGTCC 44.$5H+6GIJFQHHTW[VVTOQGEJOWYVSSVYY NM:i:0 NH:i:4 CC:Z:chr12 CP:i:92007 XS:A:+ HI:i:0
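Rather than reading the flags by eye, the FLAG column (the second field of each SAM record) can be decoded bit by bit. The following is only a minimal Python sketch, assuming the bit meanings given in the SAM specification; the helper names `decode_flag`, `is_duplicate` and `summarize` are made up for illustration and are not part of Picard or samtools. For the four FLAG values that appear in the extracts above (355, 99, 403, 147), it shows that none has the duplicate bit (0x400) set, and that 355 and 403 additionally carry the "not primary alignment" bit (0x100).

```python
# SAM FLAG bit meanings, as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "not_primary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def is_duplicate(flag):
    """True if the duplicate bit (0x400) is set."""
    return bool(flag & 0x400)


def summarize(sam_lines):
    """Yield (read name, flag, is duplicate, is not primary) per SAM record.

    Header lines (starting with '@') and blank lines are skipped.
    Splitting on any whitespace is enough here because only the first
    two fields (QNAME and FLAG) are used.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split()
        flag = int(fields[1])
        yield fields[0], flag, is_duplicate(flag), bool(flag & 0x100)


if __name__ == "__main__":
    # The four distinct FLAG values seen in the extracts above.
    for flag in (355, 99, 403, 147):
        print(flag, decode_flag(flag), "duplicate:", is_duplicate(flag))
```

To run the same check over a whole file, the output BAM could first be converted to SAM text and its lines passed to `summarize`; counting the records for which the third element is true gives the number of reads actually flagged as duplicates.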