Hi everybody,
I am currently analysing an RNA-seq experiment I ran on recombinant inbred lines.
So far I have used DESeq, but my design is somewhat complicated and I couldn't do the proper analysis with it, so I am now using DESeq2.
However, I am still unsure which statistical approach is appropriate for my data and would be immensely grateful for some help and insight from you smart people! :-)
So, this is what I have done so far:
- I have a count matrix with raw read counts for my mapped genes
- I set up the data.frame describing my samples as you can see below
> ExpDesign
          genotype flowering sampling
Bur_1_1     parent      late      tp1
Bur_1_2     parent      late      tp1
Bur_1_3     parent      late      tp1
Bur_2_1     parent      late      tp2
Bur_2_2     parent      late      tp2
Bur_2_3     parent      late      tp2
Col_1_1     parent     early      tp1
Col_1_2     parent     early      tp1
Col_1_3     parent     early      tp1
Col_2_1     parent     early      tp2
Col_2_2     parent     early      tp2
Col_2_3     parent     early      tp2
pool01  pool_lines     early      tp1
pool02  pool_lines     early      tp1
pool03  pool_lines     early      tp1
pool04  pool_lines      late      tp1
pool05  pool_lines      late      tp1
pool06  pool_lines      late      tp1
pool07  pool_lines     early      tp2
pool08  pool_lines     early      tp2
pool09  pool_lines     early      tp2
pool10  pool_lines      late      tp2
pool11  pool_lines      late      tp2
pool12  pool_lines      late      tp2
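To make the layout of this table easier to check, here is a minimal base R sketch that rebuilds it and counts the replicates per combination of factors. The `rep()` calls are just my way of reproducing the table above, and `cell_counts` is a name introduced for illustration; no DESeq2 functions are involved.

```r
# Rebuild the sample table shown above (24 samples in the same row order).
ExpDesign <- data.frame(
  genotype  = rep(c("parent", "pool_lines"), each = 12),
  flowering = c(rep(c("late", "early"), each = 6),                  # Bur = late, Col = early
                rep(rep(c("early", "late"), each = 3), times = 2)), # pools alternate in blocks of 3
  sampling  = c(rep(rep(c("tp1", "tp2"), each = 3), times = 2),     # parents: tp1/tp2 in blocks of 3
                rep(c("tp1", "tp2"), each = 6)),                    # pools: 6 x tp1, then 6 x tp2
  stringsAsFactors = TRUE
)
rownames(ExpDesign) <- c(
  paste0(rep(c("Bur", "Col"), each = 6), "_",
         rep(rep(1:2, each = 3), times = 2), "_",
         rep(1:3, times = 4)),
  sprintf("pool%02d", 1:12)
)

# Cross-tabulate the three factors: one cell per genotype/flowering/sampling combination.
cell_counts <- table(ExpDesign)
print(ftable(cell_counts))
```

If the table is reproduced correctly, every one of the 2 x 2 x 2 combinations should contain three samples, i.e. the design is a balanced full factorial.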
- I created the DESeqDataSet with an additive design and ran the analysis:
> se_input <- DESeqDataSetFromMatrix(countData = se, colData = ExpDesign, design = ~ genotype + flowering + sampling)
> se_input
class: DESeqDataSet
dim: 21646 24
exptData(0):
assays(1): counts
rownames(21646): AT1G01010 AT1G01020 ... ATMG01380 ATMG01390
rowData metadata column names(0):
colnames(24): Bur_1_1 Bur_1_2 ... pool11 pool12
colData names(3): genotype flowering sampling
> se_input_DESeq <- DESeq(se_input)
> se_res <- results(se_input_DESeq)
> head(se_res)
DataFrame with 6 rows and 6 columns
            baseMean log2FoldChange      lfcSE       stat       pvalue        padj
           <numeric>      <numeric>  <numeric>  <numeric>    <numeric>   <numeric>
AT1G01010  64.594945    -0.30440446 0.23903587 -1.2734677 0.2028521166 0.339217533
AT1G01020 140.260321    -0.05978862 0.17145139 -0.3487205 0.7272991151 0.820126969
AT1G01030  33.542471    -0.73783880 0.21824157 -3.3808352 0.0007226586 0.004030036
AT1G01040 514.335016    -0.49919387 0.14493772 -3.4441958 0.0005727608 0.003338910
AT1G01046   6.191449    -0.49877559 0.26084673 -1.9121405 0.0558581775 0.129957010
AT1G01050 689.292548    -0.13212720 0.06934961 -1.9052335 0.0567497202 0.131422678
So, now I am wondering: how do I account for interactions between my three factors (genotype, flowering and sampling) in the analysis?
And what does the output table se_res actually tell me? What does the p-value signify: a significant difference across all three factors, or something else?
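To make the interaction question more concrete, here is a minimal base R sketch of what the additive formula and a formula with interaction terms would estimate for a layout like mine. It uses `expand.grid()` to generate the same 2 x 2 x 2 combinations with three replicates each (the row order differs from my real table), and `design_df`, `mm_additive` and `mm_full` are names introduced only for this illustration. Whether the three-way interaction is the right model for the biology is an open question on my side; this only shows which coefficients each formula produces.

```r
# Same factor combinations as in the sample table: 2 x 2 x 2 cells, 3 replicates each.
# NOTE: row order differs from the real ExpDesign; this is for illustration only.
design_df <- expand.grid(
  replicate = 1:3,
  sampling  = c("tp1", "tp2"),
  flowering = c("early", "late"),
  genotype  = c("parent", "pool_lines"),
  stringsAsFactors = TRUE
)

# Additive model: one main-effect coefficient per factor, no interactions.
mm_additive <- model.matrix(~ genotype + flowering + sampling, data = design_df)

# Full model: main effects plus all two-way and the three-way interaction.
mm_full <- model.matrix(~ genotype * flowering * sampling, data = design_df)

print(colnames(mm_additive))

# Coefficients that only exist once interactions are included.
print(setdiff(colnames(mm_full), colnames(mm_additive)))

# The full model is estimable only if its matrix has full column rank.
print(qr(mm_full)$rank == ncol(mm_full))
```

In DESeq2 the same formula would go into the `design` argument of `DESeqDataSetFromMatrix()`. As far as I understand the DESeq2 documentation, `results()` called without further arguments on the additive design reports only the last variable in the formula (here `sampling`, tp2 versus tp1), not a joint test of all three factors, so please correct me if that reading is wrong.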