rspitale 01-05-2013 06:22 PM

Cutoffs for differential Expression analysis (DESeq and Cufflinks/Cuffdiff)

Hey All,

This is my first post, so please bear with me.

I have been doing differential expression with DESeq and Cuffdiff.

I was wondering at what level people usually exclude genes from the analysis.

For example, in DESeq, what raw-count cutoff do you use (e.g., keep 10 and up)?

For Cuffdiff, at what level do you cut off RPKM/FPKM (e.g., use only 1 and above for differential expression)?

Also, for multiple samples do you sort all sample types by expression and remove all genes for each sample at that value?

Sorry if this is confusing. Thanks!

 Richard Barker 01-09-2013 02:48 PM

For Cuffdiff I normally use 10 as the minimum per-locus count for significance testing. Hope this helps... I'm now trying to get DESeq working, so I can't help with that one...

 dpryan 01-10-2013 12:47 AM

For DESeq I exclude genes with <20 counts (summed across all samples). Those are usually too few counts to calculate a meaningful statistic (see the Bourgon et al. (2010) paper in PNAS or the DESeq vignette, though I don't recall either of those giving specific recommendations for a threshold...).

 Richard Barker 01-10-2013 05:31 AM

Hi dpryan,
Can you share the commands you used for DESeq, so I can see a working example to guide my future attempts?

 dpryan 01-10-2013 05:50 AM

Have a look at the "Filtering by overall count" section of the DESeq vignette, which I'll just copy and modify below:
Code:

```
rs = rowSums(counts(cdsFull))
use = (rs > 20)
cdsFilt = cdsFull[use,]
```
The best threshold will probably vary with the number of samples. It's convenient to make the plot shown in the vignette as "Figure 9" so you can eyeball whether the threshold is inappropriate. See also "Figure 10".
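
If you want to get a feel for how the choice of threshold changes the number of genes kept before touching real data, here is a minimal sketch in base R. It assumes a plain count matrix instead of a DESeq CountDataSet, and the matrix `counts_mat`, the gene/sample names and the candidate `thresholds` are all made up for illustration.

```r
# Hypothetical toy data: 6 genes (rows) x 4 samples (columns) of raw counts.
counts_mat <- matrix(
  c(  0,  1,   0,  2,
      5,  4,   6,  5,
     10, 12,   9, 11,
    100, 90, 120, 95,
      3,  0,   1,  0,
     50, 60,  40, 55),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4))
)

# Filter criterion: total counts per gene, summed across all samples.
rs <- rowSums(counts_mat)

# How many genes survive at each candidate threshold (illustrative values)?
thresholds <- c(5, 10, 20, 50)
kept <- sapply(thresholds, function(t) sum(rs > t))
names(kept) <- thresholds
print(kept)

# Apply one threshold. gene2 sums to exactly 20, so the strict ">" drops it;
# use ">=" instead if genes sitting exactly on the threshold should be kept.
use <- rs > 20
filtered <- counts_mat[use, , drop = FALSE]
print(rownames(filtered))
```

With a real data set you would swap `counts_mat` for `counts(cdsFull)`; the rest stays the same.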
