10-16-2012, 07:14 PM   #1
gary
Member

Location: Shanghai

Join Date: Dec 2009
Posts: 16

MarkDuplicates result

Hello everyone! My MarkDuplicates result looks like this:

LIBRARY	UNPAIRED_READS_EXAMINED	READ_PAIRS_EXAMINED	UNMAPPED_READS	UNPAIRED_READ_DUPLICATES	READ_PAIR_DUPLICATES	READ_PAIR_OPTICAL_DUPLICATES	PERCENT_DUPLICATION	ESTIMATED_LIBRARY_SIZE
TSC_3	351087	23255402	695866	192406	1108098	370488	0.051398	347340970

## HISTOGRAM	java.lang.Double
BIN	VALUE
1.0	1.015653
2.0	1.965532
3.0	2.853897
4.0	3.68473
5.0	4.461759
6.0	5.188466
7.0	5.868112
8.0	6.503743
9.0	7.098211

My question is: what does the ESTIMATED_LIBRARY_SIZE mean? Picard's help info is:

ESTIMATED_LIBRARY_SIZE: The estimated number of unique molecules in the library based on PE duplication.

But I still can't understand.

Last edited by gary; 10-16-2012 at 07:17 PM.
03-27-2013, 07:07 AM   #2
davidblaney
Member

Location: Oxford, UK

Join Date: Nov 2011
Posts: 17

It is the estimated number of unique fragments in your sequenced library.

Picard computes it as follows (from the API documentation):

Quote:
Estimates the size of a library based on the number of paired end molecules observed and the number of unique pairs observed. Based on the Lander-Waterman equation that states:

C/X = 1 - exp( -N/X )

where
X = number of distinct molecules in library
N = number of read pairs
C = number of distinct fragments observed in read pairs
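
The equation has no closed-form solution for X, so it has to be solved numerically. Below is a minimal Python sketch that does this by bisection. It is not Picard's code: the function name `estimate_library_size` is made up for illustration, and the mapping from the metrics columns to N and C (N = READ_PAIRS_EXAMINED - READ_PAIR_OPTICAL_DUPLICATES, C = READ_PAIRS_EXAMINED - READ_PAIR_DUPLICATES) is my reading of how Picard derives them, so treat it as an assumption. With the numbers from the first post the result should land close to the reported 347340970.

```python
import math


def estimate_library_size(read_pairs, unique_read_pairs):
    """Solve C/X = 1 - exp(-N/X) for X by bisection (illustrative sketch).

    read_pairs        -> N, number of read pairs
    unique_read_pairs -> C, number of distinct fragments observed
    """
    n, c = float(read_pairs), float(unique_read_pairs)
    if not 0 < c < n:
        raise ValueError("need 0 < unique_read_pairs < read_pairs")

    def f(x):
        # c/x - (1 - exp(-n/x)); positive while x is below the solution,
        # negative once x is above it. expm1 keeps precision for small n/x.
        return c / x + math.expm1(-n / x)

    # The library cannot be smaller than the number of distinct fragments seen.
    lo, hi = c, 2.0 * c
    while f(hi) > 0:  # grow the upper bound until it brackets the root
        hi *= 2.0

    for _ in range(100):  # halve the bracket until it is negligibly small
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Values taken from the metrics in the first post (column mapping assumed).
READ_PAIRS_EXAMINED = 23255402
READ_PAIR_DUPLICATES = 1108098
READ_PAIR_OPTICAL_DUPLICATES = 370488

n_pairs = READ_PAIRS_EXAMINED - READ_PAIR_OPTICAL_DUPLICATES
c_unique = READ_PAIRS_EXAMINED - READ_PAIR_DUPLICATES
print(round(estimate_library_size(n_pairs, c_unique)))
```

Intuitively: the fewer duplicates you see for a given number of read pairs, the larger the pool of distinct molecules you must have been sampling from. Here only about 3% of the non-optical pairs are duplicates, so the estimated library is roughly 15 times larger than the number of pairs sequenced.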