I have some RNA-Seq data from a time course experiment that is a mix of (mostly) plant and bacterial reads. A senior scientist (I'm a PhD student) has approached me asking me to do a differential expression analysis between the bacterial reads at a random time point in my time course and some reads from an experiment where the bacteria were grown in culture.
I can see how the difference between in planta and in vitro gene expression could be interesting. I've been racking my brains trying to think how you would design an experiment to look at this - no luck. I'm moderately confident that comparing data from different experiments is not the way to do it though.
It's the same strain of bacteria, though from different preparations. The data from the other experiment are RPKM values (and possibly raw counts per gene) from reads aligned against the same reference genome. I'm already aware that using RPKM values for downstream analysis is a bad idea. I can get the lengths of the genes and (I think) work backwards from RPKM to counts, but that is going to result in two libraries of massively different sizes: a lot of the real estate in the time course experiment is taken up by plant RNA, whereas in the in vitro experiment it's all bacterial RNA. I'm dubious about simply adjusting for library size, since the difference in size means any variation in the time course data is just going to be magnified horrifically.
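For reference, working backwards from RPKM might look something like the sketch below. It assumes the standard definition, RPKM = count / (gene length in kb × mapped reads in millions), and it assumes the total number of mapped reads for the in vitro library is known, which may not be the case for supplied data. The gene names and values are made up purely for illustration.

```python
def rpkm_to_counts(rpkm, gene_length_bp, total_mapped_reads):
    """Invert RPKM = count / (length_kb * mapped_millions) to estimate a raw count.

    total_mapped_reads must be the same total that was used when the RPKM
    values were computed (an assumption; it has to come from the other lab).
    """
    length_kb = gene_length_bp / 1_000
    mapped_millions = total_mapped_reads / 1_000_000
    return rpkm * length_kb * mapped_millions


# Hypothetical example: gene name -> (RPKM, gene length in bp)
genes = {"geneA": (50.0, 2000), "geneB": (10.0, 500)}
total_mapped = 4_000_000  # hypothetical library size

# Round to integers, since count-based tools expect whole read counts
counts = {
    name: round(rpkm_to_counts(rpkm, length, total_mapped))
    for name, (rpkm, length) in genes.items()
}
print(counts)
```

Any rounding applied when the RPKM values were reported carries through, so the recovered counts are only approximate, particularly for lowly expressed genes.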
Ah, also: in my data I have 3 replicates at each time point, but the supplied data has only a single replicate. DESeq can handle 1 replicate if it must, but this just screams dangerous to me.
Are my objections running along the correct lines (magnification of variation, starting with different bacterial preparations)? Is there anything else that I should add? Is there any way to make this a good thing (nothing occurs to me), even as an incredibly tentative exploratory process? Anything anyone could add would be grand.
Cheers
Ben.