#1 |
Member
Location: London Join Date: Jan 2010
Posts: 65
Hello all,
What is in the metrics file output by Picard's MarkDuplicates? Can I get the number of reads marked as duplicates from this file? Thanks
#2 |
Member
Location: Melbourne Join Date: Jan 2010
Posts: 21
Yes, it tells you the number of reads that have been marked as duplicates, as well as the total number of reads. But note that reads that Picard marks as duplicates do not necessarily have identical sequences; they just map to the same chromosomal location.
#3 | |
Member
Location: London Join Date: Jan 2010
Posts: 65
Here is what I got from Picard:

## METRICS CLASS net.sf.picard.sam.DuplicationMetrics
LIBRARY | Unknown Library
UNPAIRED_READS_EXAMINED | 27221401
READ_PAIRS_EXAMINED | 548559917
UNMAPPED_READS | 190908169
UNPAIRED_READ_DUPLICATES | 14563968
READ_PAIR_DUPLICATES | 58165860
READ_PAIR_OPTICAL_DUPLICATES | 0
PERCENT_DUPLICATION | 0.11642
ESTIMATED_LIBRARY_SIZE | 2400441897

## HISTOGRAM java.lang.Double
BIN | VALUE
1.0 | 1
2.0 | 1.795707
3.0 | 2.428856
4.0 | 2.932657
5.0 | 3.333535
6.0 | 3.652516
7.0 | 3.906332
8.0 | 4.108295

What is this histogram about? My original BAM file has 657624702 read pairs, so 2 * 657624702 reads in total. After removing duplicates, the BAM file has 1184353716 reads in total. So presumably 2 * 657624702 - 1184353716 = 130895688 reads were removed. I couldn't find this number in Picard's output metrics file. Any help? Thanks
#4 |
Junior Member
Location: Boston, MA Join Date: Dec 2010
Posts: 1
The histogram is explained in one of the FAQs on the Picard wiki:
http://sourceforge.net/apps/mediawik...kDuplicates.3F

The reason you couldn't get that number is that READ_PAIR_DUPLICATES counts pairs, not individual reads; in effect, the number of duplicate paired reads is divided in half before it is reported. You also need to add the unpaired duplicates. So in your case, 2 * 58165860 (READ_PAIR_DUPLICATES) + 14563968 (UNPAIRED_READ_DUPLICATES) = 130895688, which is the number of duplicates you were missing. =)
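
For anyone who wants to script this check, here is a minimal Python sketch of the arithmetic above. It assumes the metrics section is tab-separated, with the column names on the line after `## METRICS CLASS` and the values on the line after that; the `SAMPLE` text is rebuilt from the numbers posted in this thread rather than read from a real file, and the function names are made up for illustration.

```python
def parse_duplication_metrics(text):
    """Return the first metrics row as a dict of column name -> string value.

    Assumes a tab-separated header line and value line directly after the
    '## METRICS CLASS' line (an assumption about the file layout).
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no METRICS CLASS section found")


def duplicate_read_count(row):
    """Duplicate reads = unpaired duplicates + 2 reads per duplicate pair."""
    return int(row["UNPAIRED_READ_DUPLICATES"]) + 2 * int(row["READ_PAIR_DUPLICATES"])


def examined_read_count(row):
    """Mapped reads examined = unpaired reads + 2 reads per examined pair."""
    return int(row["UNPAIRED_READS_EXAMINED"]) + 2 * int(row["READ_PAIRS_EXAMINED"])


# Sample text rebuilt from the values quoted in post #3 (hypothetical layout).
COLUMNS = [
    "LIBRARY", "UNPAIRED_READS_EXAMINED", "READ_PAIRS_EXAMINED",
    "UNMAPPED_READS", "UNPAIRED_READ_DUPLICATES", "READ_PAIR_DUPLICATES",
    "READ_PAIR_OPTICAL_DUPLICATES", "PERCENT_DUPLICATION",
    "ESTIMATED_LIBRARY_SIZE",
]
VALUES = [
    "Unknown Library", "27221401", "548559917", "190908169", "14563968",
    "58165860", "0", "0.11642", "2400441897",
]
SAMPLE = "\n".join([
    "## METRICS CLASS\tnet.sf.picard.sam.DuplicationMetrics",
    "\t".join(COLUMNS),
    "\t".join(VALUES),
])

row = parse_duplication_metrics(SAMPLE)
dups = duplicate_read_count(row)
examined = examined_read_count(row)

print(dups)                       # 130895688
print(round(dups / examined, 5))  # 0.11642
```

With these numbers, the ratio of duplicate reads to examined reads also reproduces the PERCENT_DUPLICATION value in the metrics row, which is a handy sanity check that the pairs are being counted twice.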