I have a dose response experiment using Rats, with Controls and 5 treatment doses, each with 5 biological replicates.
The data is single read RNAseq, 50bp reads, run on an ABI 5500xl, and I mapped the reads to the Ensembl rel. 66 Rat genome using ABI's LifeScope 2.5.1.
For mapped reads, I have an average of 12,216,885 reads on exons per sample (the numbers range from a low of 5,171,445 for a Dose group 4 animal, and a high of 17,323,22 for a Dose group 3 animal)
I've been analyzing this data for differential gene expression with a couple of approaches and am getting radically different results.
DESeq using raw counts summarized on genes, genes with FDR < 0.05:
Dose 1 = n/a (have not run this one yet)
Dose 2 = 45
Dose 3 = 146
Dose 4 = 520
Dose 5 = 540
Now with DESeq, I did analyze each dose as a single pair of Controls (N=5) and Treatment (N=5)
Similarly, I only have the highest dose done using cuffdiff v2.0.0 (3384), but again, genes with q-value < 0.05:
Dose 5 = 64
Then, I pulled the mapped BAM files into Partek Genome Suite 6.6, re-indexed them to Refseq genes, got gene RPKM values and did a 1way ANOVA on all 30 samples, with linear contrasts for individual Dose fold change and p-values.
genes significant by dose at FDR < 0.05 = 1524
genes significant by linear contrast at FDR < 0.05
Dose 1 = 3
Dose 2 = 3
Dose 3 = 280
Dose 4 = 1722
Dose 5 = 3665
Even if I break these down and run each Dose as a discrete ANOVA (n=10) I get, genes significant by dose at FDR < 0.05:
Dose 1 = 0
Dose 2 = 1
Dose 3 = 70
Dose 4 = 1043
Dose 5 = 1848
And just for comparison, I have these exact same animals run on Affy HT_Rat30_PM (Titan) arrays, again as a single ANOVA (N=30) of RMA normalized data, with linear contrasts for each Dose
genes significant by dose at FDR < 0.05 = 2375
genes significant by linear contrast at FDR < 0.05
Dose 1 = 2
Dose 2 = 67
Dose 3 = 1053
Dose 4 = 2223
Dose 5 = 3073
I have not yet looked at these lists of genes for overlap and so forth yet, although just casually looking at the top ranked (by FDR) genes in Partek and DESeq, I see a lot of the same ones. But what I am wondering about is the large discrepancy in the numbers of significant genes.
Any ideas why the different RNAseq methods would be so different? Is this purely a function of the differences between count data and RPKM data and the different variance models? I was not expecting the results to be identical, but at least more consistent than this?
The data is single read RNAseq, 50bp reads, run on an ABI 5500xl, and I mapped the reads to the Ensembl rel. 66 Rat genome using ABI's LifeScope 2.5.1.
For mapped reads, I have an average of 12,216,885 reads on exons per sample (the numbers range from a low of 5,171,445 for a Dose group 4 animal, and a high of 17,323,22 for a Dose group 3 animal)
I've been analyzing this data for differential gene expression with a couple of approaches and am getting radically different results.
DESeq using raw counts summarized on genes, genes with FDR < 0.05:
Dose 1 = n/a (have not run this one yet)
Dose 2 = 45
Dose 3 = 146
Dose 4 = 520
Dose 5 = 540
Now with DESeq, I did analyze each dose as a single pair of Controls (N=5) and Treatment (N=5)
Similarly, I only have the highest dose done using cuffdiff v2.0.0 (3384), but again, genes with q-value < 0.05:
Dose 5 = 64
Then, I pulled the mapped BAM files into Partek Genome Suite 6.6, re-indexed them to Refseq genes, got gene RPKM values and did a 1way ANOVA on all 30 samples, with linear contrasts for individual Dose fold change and p-values.
genes significant by dose at FDR < 0.05 = 1524
genes significant by linear contrast at FDR < 0.05
Dose 1 = 3
Dose 2 = 3
Dose 3 = 280
Dose 4 = 1722
Dose 5 = 3665
Even if I break these down and run each Dose as a discrete ANOVA (n=10) I get, genes significant by dose at FDR < 0.05:
Dose 1 = 0
Dose 2 = 1
Dose 3 = 70
Dose 4 = 1043
Dose 5 = 1848
And just for comparison, I have these exact same animals run on Affy HT_Rat30_PM (Titan) arrays, again as a single ANOVA (N=30) of RMA normalized data, with linear contrasts for each Dose
genes significant by dose at FDR < 0.05 = 2375
genes significant by linear contrast at FDR < 0.05
Dose 1 = 2
Dose 2 = 67
Dose 3 = 1053
Dose 4 = 2223
Dose 5 = 3073
I have not yet looked at these lists of genes for overlap and so forth yet, although just casually looking at the top ranked (by FDR) genes in Partek and DESeq, I see a lot of the same ones. But what I am wondering about is the large discrepancy in the numbers of significant genes.
Any ideas why the different RNAseq methods would be so different? Is this purely a function of the differences between count data and RPKM data and the different variance models? I was not expecting the results to be identical, but at least more consistent than this?
Comment