
**NikTuzov** · 04-12-2013, 12:44 PM

One can also say that Model 2) contains more information about the data: it is always possible to convert 2) into 1), but not the other way round. Why do they lose information on purpose?

**Wolfgang Huber** · 04-13-2013, 01:05 AM

Dear Nik,

Thanks. This is a good question that has probably already been asked by everyone working in this field. Here's what has motivated us to follow the approach used by DESeq.

Biologists are usually not just interested in rejecting the null hypothesis of no differential expression overall, but want to pinpoint the particular genes affected. Your test for model 2 is, as far as I can see, more susceptible to rejections for one gene when in fact other genes are differentially expressed (especially if the latter take up a lot of reads).

Also, in model 1 it is straightforward to add a layer that accounts for overdispersion (i.e. biological variation in the rates underlying the counting/sampling), which is crucial for applications. I am sure that can also be done for model 2, but I am less aware of where it has been done.
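As a rough illustration of what such an overdispersion layer does, the sketch below compares the variance of a plain Poisson count with that of a gamma-Poisson (negative binomial) count, using the common parameterization Var = mu + alpha * mu^2. The dispersion value `alpha = 0.01` and the function names are assumptions made for illustration; they do not come from this thread.

```python
import math

def poisson_variance(mu):
    """Variance of a Poisson count with mean mu (sampling noise only)."""
    return mu

def negative_binomial_variance(mu, alpha):
    """Variance of a gamma-Poisson (negative binomial) count.

    alpha is the dispersion: the squared coefficient of variation of the
    underlying rate across biological replicates. alpha = 0 gives Poisson.
    """
    return mu + alpha * mu ** 2

mu = 50_000      # expected read count for one gene (hypothetical)
alpha = 0.01     # hypothetical dispersion, i.e. 10% biological CV

sd_poisson = math.sqrt(poisson_variance(mu))
sd_negbin = math.sqrt(negative_binomial_variance(mu, alpha))

print(f"Poisson sd:           {sd_poisson:.1f}")
print(f"Negative binomial sd: {sd_negbin:.1f}")
```

For highly expressed genes the `alpha * mu ** 2` term dominates, so even a modest amount of biological variation makes the count far noisier than the Poisson model alone would suggest.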

Best wishes
Wolfgang

**dietmar13** · 04-13-2013, 09:16 AM

Dear Nik,

My opinion on your example is as follows, regardless of how expression was measured (with millions of photons collected by the camera in the case of microarrays, or with thousands of reads sequenced from the libraries; in the first case the photons are treated as one analog value and in the second as digital count data, which has a tremendous impact on the statistical methods used).

Ultimately, the unit of transcriptomics is the gene/transcript, not the read:
T: 1 (0.05 <- 5e4/1e6)
N: 2 (0.1 <- 2e5/2e6)

What a biologist wants to know is: is the difference in expression of a gene/transcript statistically significant and relevant? The latter is a matter of real biological experiments (or at least of biological interpretation of omics data); the former is a matter of biological replicates.

To see whether the 1:2 difference is significant, you have to know the biological variation in both groups! This can only be estimated from biological replicates, not from technical replicates, nor by modelling the variance with information borrowed from other genes, nor by using reads as the units of interest...

This is all the more true for cancer and normal samples, because here the variance for a gene can be completely different in the two groups, as well as compared to other genes...

dietmar

**NikTuzov** · 04-15-2013, 06:44 AM

> Your test for model 2 is, as far as I can see, more susceptible to rejections for one gene when in fact other genes are differentially expressed (especially if the latter take up a lot of reads).

In Model 2 it is obvious, but Model 1 has the very same problem. If, for a given library, transcript X attracts more reads, it means that fewer of them are left for the rest of the transcripts. One can use some normalization tricks to mitigate that, but it's unlikely to resolve the issue completely.
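A toy sketch of this effect, under idealized assumptions: the abundances, gene names, and library sizes below are hypothetical, counts are taken at their expected values (no sampling noise), and the normalization shown is a median-of-ratios size factor in the style used by DESeq, written out by hand rather than called from any package.

```python
import math
from statistics import median

# Hypothetical absolute abundances (arbitrary units); only gene "X" changes.
abundance_a = {"X": 100, "g1": 300, "g2": 300, "g3": 300}
abundance_b = {"X": 1000, "g1": 300, "g2": 300, "g3": 300}

def expected_counts(abundance, library_size):
    """Expected reads per gene when reads are shared in proportion to abundance."""
    total = sum(abundance.values())
    return {gene: library_size * a / total for gene, a in abundance.items()}

counts_a = expected_counts(abundance_a, 1_000_000)
counts_b = expected_counts(abundance_b, 1_000_000)

# g1 is unchanged biologically, yet its share of reads drops because X takes more.
naive_ratio = counts_b["g1"] / counts_a["g1"]

def size_factors(samples):
    """Median-of-ratios size factors: median over genes of count / geometric mean."""
    genes = list(samples[0].keys())
    log_geo_mean = {
        g: sum(math.log(s[g]) for s in samples) / len(samples) for g in genes
    }
    return [
        math.exp(median(math.log(s[g]) - log_geo_mean[g] for g in genes))
        for s in samples
    ]

sf_a, sf_b = size_factors([counts_a, counts_b])
normalized_ratio = (counts_b["g1"] / sf_b) / (counts_a["g1"] / sf_a)

print(f"g1 ratio B/A, library-size scaling only: {naive_ratio:.3f}")
print(f"g1 ratio B/A, median-of-ratios scaling:  {normalized_ratio:.3f}")
```

The correction works here only because most genes are unchanged, so the median ratio reflects the unchanged majority. If a large fraction of genes shifted in the same direction, the size factors would absorb part of the real signal, which is the sense in which normalization mitigates rather than resolves the problem.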

**NikTuzov** · 04-15-2013, 06:48 AM

> This is all the more true for cancer and normal samples, because here the variance for a gene can be completely different in both groups as well as compared to other genes...

How do you know what can or cannot be different? If that knowledge comes from having lots of biological replicates, then the issue is moot to begin with.

**NikTuzov** · 04-16-2013, 08:30 AM

Actually, I made a mistake. If Poisson is used in Model 1, then the p-value can be obtained even without replication. The Poisson setup approximately corresponds to the raw counts being Poisson(lambda = n * p), where n is the library size and p is the probability of success from Model 2. If we normalize the "raw" count by dividing it by the library size, Model 1 will be about the same as Model 2.

It's somewhat confusing that, even though we are interested in the proportion p, the normalized proxy for that proportion is a count that is itself assumed to come from a Poisson distribution.
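One way to see how a p-value can come out of the Poisson setup without replication is sketched below. It assumes two independent counts k1 ~ Poisson(n1 * p1) and k2 ~ Poisson(n2 * p2); conditional on the total k1 + k2, and under the null p1 = p2, k1 is Binomial(k1 + k2, n1 / (n1 + n2)). The function names and the normal approximation to that binomial are choices made for this illustration, and the counts are the ones from the T/N example earlier in the thread (5e4 of 1e6 reads versus 2e5 of 2e6 reads).

```python
import math

def poisson_two_sample_z(k1, n1, k2, n2):
    """z-score for H0: the same per-read proportion p in both libraries.

    Conditional on the total k1 + k2, two independent Poisson counts with
    means n1 * p and n2 * p give k1 ~ Binomial(k1 + k2, n1 / (n1 + n2)).
    """
    total = k1 + k2
    q = n1 / (n1 + n2)                    # expected share of reads from library 1
    mean = total * q
    sd = math.sqrt(total * q * (1 - q))
    return (k1 - mean) / sd

def two_sided_p(z):
    """Two-sided p-value from the normal approximation to the binomial."""
    return math.erfc(abs(z) / math.sqrt(2))

# T: 5e4 reads out of 1e6; N: 2e5 reads out of 2e6.
z = poisson_two_sample_z(50_000, 1_000_000, 200_000, 2_000_000)
p_value = two_sided_p(z)

print(f"z = {z:.2f}, two-sided p = {p_value:.3g}")
```

A p-value obtained this way reflects sampling noise in the read counts only. It says nothing about biological variation between individuals, which is the part that replicates are needed for.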


RNA-Seq, Differential Expression: a theoretical question of modeling methodology
