Hello,
I am conducting an RNA-Seq analysis to identify significantly differentially expressed genes (SDEGs) among 5 conditions. I followed the Tuxedo protocol (tophat/cufflinks/cuffmerge/cuffdiff/cummeRbund, http://www.nature.com/nprot/journal/...t.2012.016.pdf) and I am at the final step, where I need to determine the number of SDEGs.
I followed the cummeRbund instructions in the manual (http://compbio.mit.edu/cummeRbund/manual_2_0.html); my output is in the code block below.
When I use diffData() and keep only the significant results with subset(), I get a total of 4500 SDEGs. This count covers all pairwise comparisons, which I can extract separately using another subset() on the sample names. But when I use the getSig() method with alpha=0.05, I only get 1386 gene IDs, which is less than a third of what I get with diffData(). I also tested at the "isoforms" level to see whether it would match, but it did not.
The part in the cummeRbund manual about getSig() says "a alpha value can be provided on which to filter the resulting list (the default is 0.05 to match the default of cuffdiff)."
From what I understand, the diffData() method is supposed to just extract the information from cuffdiff without running any additional tests, so shouldn't I get the same number of genes?
I would be grateful if someone could clearly explain the difference between these two methods and which one should be used to determine SDEGs.
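One check that may help narrow this down is sketched below. It assumes that diffData() returns one row per gene per pairwise comparison (which the per-comparison subset() mentioned above suggests), so that nrow() counts gene/comparison pairs rather than distinct genes. The data frame is a made-up stand-in for diffData(genes(cuff_data)), with invented gene IDs and sample names; only the column names gene_id, sample_1, sample_2 and significant are meant to mirror the real table.

```r
# Toy stand-in for diffData(genes(cuff_data)); all rows are invented.
gene.diff <- data.frame(
  gene_id     = c("XLOC_1", "XLOC_1", "XLOC_1", "XLOC_2", "XLOC_2", "XLOC_3"),
  sample_1    = c("A", "A", "B", "A", "B", "A"),
  sample_2    = c("B", "C", "C", "B", "C", "C"),
  significant = c("yes", "yes", "yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Same filter as in the post
sig.gene.diff <- subset(gene.diff, significant == "yes")

# Number of significant rows (gene x comparison pairs)
n_sig_rows <- nrow(sig.gene.diff)

# Number of distinct genes that are significant in at least one comparison
n_sig_genes <- length(unique(sig.gene.diff$gene_id))

# Significant rows broken down by pairwise comparison
per_comparison <- table(paste(sig.gene.diff$sample_1,
                              sig.gene.diff$sample_2, sep = "_vs_"))
```

Running the same two counts on the real sig.gene.diff would show how much of the gap between 4500 and 1386 comes from genes counted in several comparisons. This may not account for the whole difference: getSig() could also treat multiple-testing correction differently depending on whether a single comparison is selected, which is worth checking in the manual.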
Thanks a lot for your help,
Oli
Code:
> cuff_data<-readCufflinks(genome="genome.fa",gtfFile="merged.gtf")
CuffSet instance with:
  5 samples
  16011 genes
  36460 isoforms
  24184 TSS
  15312 CDS
  160110 promoters
  241840 splicing
  135440 relCDS
> gene.diff<-diffData(genes(cuff_data))
> sig.gene.diff<-subset(gene.diff, significant=="yes")
> nrow(sig.gene.diff)
[1] 4500
> mySigGeneIds<-getSig(cuff_data,alpha=0.05,level='genes')
> length(mySigGeneIds)
[1] 1386
> mySigIsoformIds<-getSig(cuff_data,alpha=0.05,level='isoforms')
> length(mySigIsoformIds)
[1] 900