04-03-2010, 02:05 AM   #13
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

In our experience, simply taking the total number of reads (mapped or unmapped) is not a good way to estimate library size.

Sometimes a few very strongly expressed genes are differentially expressed, and because they make up a large share of the total counts, they skew this number. After you divide by the total counts, these few strongly expressed genes appear roughly equal across samples, while all the remaining genes look differentially expressed.
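A small numerical illustration of this effect, in Python. The counts are invented for illustration only: gene g0 dominates the library and is truly 10-fold lower in sample B, while g1 to g9 are unchanged. After dividing by total counts, g0 looks almost unchanged and every other gene looks about 9-fold up in B.

```python
# Hypothetical toy counts: g0 is very highly expressed and truly 10-fold
# lower in sample B; genes g1..g9 are unchanged between the samples.
counts_a = {"g0": 100000, **{f"g{i}": 100 for i in range(1, 10)}}
counts_b = {"g0": 10000, **{f"g{i}": 100 for i in range(1, 10)}}

def total_count_normalize(counts):
    """Divide each gene's count by the library's total count."""
    total = sum(counts.values())
    return {gene: c / total for gene, c in counts.items()}

norm_a = total_count_normalize(counts_a)
norm_b = total_count_normalize(counts_b)

# Apparent fold change B/A for each gene after total-count normalization.
fold_changes = {g: norm_b[g] / norm_a[g] for g in counts_a}

print(round(fold_changes["g0"], 2))  # 0.93 (true value: 0.1)
print(round(fold_changes["g1"], 2))  # 9.26 (true value: 1.0)
```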

The following simple alternative works much better:

- Construct a "reference sample" by taking, for each gene, the geometric mean of the counts in all samples.

- To get the sequencing depth of a sample relative to the reference, divide, for each gene, the count in your sample by the count in the reference sample. This gives, for each gene, an estimate of the depth ratio.

- Simply take the median of all the quotients to get the relative depth of the library.
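The three steps can be sketched in Python as below. This is a minimal illustration, not the DESeq code itself: the function name `size_factors` and the toy data are hypothetical, and skipping genes that have a zero count in any sample is an assumption of this sketch (their geometric mean would be zero, so the quotient is undefined).

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (hypothetical helper for illustration).

    counts: one list of raw counts per sample, genes in the same order.
    Assumption of this sketch: genes with a zero count in any sample are
    skipped, since their geometric mean is 0.
    """
    n_samples = len(counts)
    n_genes = len(counts[0])

    # Step 1: reference sample = per-gene geometric mean across samples,
    # computed as exp(mean(log counts)) to avoid overflow in the product.
    reference = []
    for g in range(n_genes):
        values = [sample[g] for sample in counts]
        if min(values) == 0:
            reference.append(None)  # gene unusable for this estimate
        else:
            reference.append(math.exp(sum(math.log(v) for v in values) / n_samples))

    factors = []
    for sample in counts:
        # Step 2: per-gene quotient sample / reference = depth-ratio estimate.
        quotients = [sample[g] / reference[g]
                     for g in range(n_genes) if reference[g] is not None]
        # Step 3: the median quotient is the sample's relative depth.
        factors.append(median(quotients))
    return factors

# Hypothetical toy data: sample B is sequenced twice as deeply as A, and
# gene 0 is very highly expressed and truly 10-fold lower in B.
sample_a = [100000] + [100] * 9
sample_b = [20000] + [200] * 9
factors = size_factors([sample_a, sample_b])
print([round(f, 2) for f in factors])  # [0.71, 1.41]
```

Here the two size factors differ by exactly a factor of 2, the true depth ratio, even though the dominant gene changes. Dividing total counts (21800 / 100900) would instead suggest that B is only about 0.22 times as deep as A.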

This is what the 'estimateSizeFactors' function of our DESeq package does.