I have a question about data handling in RNA-seq experiments, and I guess microarrays too.
How do these methods take total cellular expression into account?
For instance, say I was looking at a global analysis of the transcriptome of whole Arabidopsis early seed vs. mature seed. One would expect overall expression in the late, dormant seed to be low.
Let's pretend there are only 5 genes in my seed, A, B, C, D, and E. Here are the absolute expression values.
Early:
A: 10 transcripts/cell
B: 10 transcripts/cell
C: 20 transcripts/cell
D: 40 transcripts/cell
E: 20 transcripts/cell
Late:
A: 1 transcript/cell
B: 1 transcript/cell
C: 2 transcripts/cell
D: 4 transcripts/cell
E: 2 transcripts/cell
Although Late has lower absolute expression values, if I took 5 micrograms of total RNA from both to prepare a library, each sample would have 10% transcript A, 10% transcript B, 20% transcript C, 40% transcript D, and 20% transcript E - it would just take more late seeds to produce that much RNA.
If differential expression analysis were performed on these with no manipulation of the data, it would appear as if the expression levels of these genes were the same in both tissues, right?
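To make the arithmetic in this first scenario concrete, here is a minimal Python sketch using my made-up numbers. It assumes (a simplification on my part) that the share of reads a gene gets is simply its share of all transcripts in the sample, ignoring transcript length, rRNA, and sampling noise. The function name `proportions` is just something I made up for illustration.

```python
# Toy absolute expression values (transcripts/cell) from scenario 1.
early = {"A": 10, "B": 10, "C": 20, "D": 40, "E": 20}
late = {"A": 1, "B": 1, "C": 2, "D": 4, "E": 2}

def proportions(counts):
    """Return each gene's fraction of the total transcript pool."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

early_prop = proportions(early)
late_prop = proportions(late)

# Assumption: a fixed mass of RNA goes into each library, so the
# sequencer only ever sees these fractions, not the per-cell values.
for gene in early:
    print(gene, early_prop[gene], late_prop[gene])
```

Under that assumption the two columns come out identical for every gene, even though every gene is 10-fold lower per cell in the late seed.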
How is this corrected for in analyses? Is a housekeeping gene generally used to normalize expression patterns, and are housekeeping genes even useful when looking at cells with dormant expression and limited proliferation?
Let's take a look at one more scenario. Instead, let's say I am comparing early seed to maturing seed, in which a family of transcripts coding for storage proteins is very highly expressed.
Early:
A: 10 transcripts/cell
B: 10 transcripts/cell
C: 20 transcripts/cell
D (storage protein): 40 transcripts/cell
E (storage protein): 20 transcripts/cell
Late:
A: 10 transcripts/cell
B: 10 transcripts/cell
C: 20 transcripts/cell
D (storage protein): 400 transcripts/cell
E (storage protein): 200 transcripts/cell
In this example, if I were to take both libraries and perform sequencing, A, B, and C would appear to be downregulated in late seed, even though their absolute expression per cell is unchanged; only their share of the total transcript pool has dropped.
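Here is the same kind of sketch for this second scenario, again in Python with my toy numbers and the same simplifying assumption that a gene's read share equals its share of the transcript pool. The names `proportions` and `apparent_fold` are hypothetical, just for illustration.

```python
# Toy absolute expression values (transcripts/cell) from scenario 2.
early = {"A": 10, "B": 10, "C": 20, "D": 40, "E": 20}
late = {"A": 10, "B": 10, "C": 20, "D": 400, "E": 200}

def proportions(counts):
    """Return each gene's fraction of the total transcript pool."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

early_prop = proportions(early)
late_prop = proportions(late)

# Fold change as it would look from proportions alone (late / early),
# next to the true per-cell fold change.
apparent_fold = {g: late_prop[g] / early_prop[g] for g in early}
true_fold = {g: late[g] / early[g] for g in early}

for gene in early:
    print(gene, "apparent:", apparent_fold[gene], "true:", true_fold[gene])
```

With these numbers the early pool is 100 transcripts/cell and the late pool is 640, so A, B, and C go from 10%, 10%, and 20% of the pool to about 1.6%, 1.6%, and 3.1%. They look roughly 6.4-fold down despite being unchanged, while the storage proteins look only about 1.56-fold up instead of 10-fold.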
So basically my question is: are we actually just measuring the relative proportions of transcripts in transcriptional profiling experiments, or is there some sort of correction method that allows us to estimate actual absolute values? Sorry for the long post. Wanted to be clear.