Hi,
I have some data from various samples (cell types) in different species.
I want to compare and analyze gene expression variability across the different species.
I've plotted the average expression (tags per million, 'tpm') for each sample. The average is more or less the same for all samples from a given species, but it varies greatly across species: for example, all human samples are at about 20 tpm and all mouse samples at about 25 tpm, but all dog samples are at about 100 tpm.
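One sanity check on these averages, as a minimal sketch: if the tpm values are scaled so that each sample sums to one million (an assumption, I have not verified this for every sample), then the per-sample mean is simply 1e6 divided by the number of genes in the table, so species with different numbers of annotated genes would show different averages for that reason alone. The toy data and gene counts below are hypothetical.

```python
import numpy as np

def to_tpm(counts):
    """Scale each sample (column) of a genes x samples matrix to sum to 1e6."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0, keepdims=True) * 1e6

def per_sample_mean(tpm):
    """Mean expression of each sample (column)."""
    return np.asarray(tpm, dtype=float).mean(axis=0)

# Hypothetical toy data: same count distribution, different numbers of genes.
rng = np.random.default_rng(0)
species_a = to_tpm(rng.negative_binomial(2, 0.01, size=(40_000, 3)))
species_b = to_tpm(rng.negative_binomial(2, 0.01, size=(10_000, 3)))

mean_a = per_sample_mean(species_a)  # 1e6 / 40000 for every sample
mean_b = per_sample_mean(species_b)  # 1e6 / 10000 for every sample
print(mean_a, mean_b)
```

If the averages track the gene counts like this, the difference between species says nothing about expression itself.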
I'm guessing this is because the data is not normalized, and therefore not comparable across species until it is.
I tried several recently developed normalization methods (RLE, TMM) as implemented in edgeR and DESeq, normalizing the data from the different species together (matched by Entrez ID), separately for each cell type. However, this does not make the average expression of the different species much more similar, and the averages of samples from the same species have now become much more dissimilar.
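For reference, the idea behind RLE (median-of-ratios) scaling can be sketched as below. This is only an illustration in Python with NumPy, not the edgeR or DESeq code itself; the function name `rle_size_factors` and the toy matrix are made up, and it assumes a genes x samples matrix of raw counts with rows already matched across samples.

```python
import numpy as np

def rle_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with nonzero counts in every sample, so logs are finite.
    keep = (counts > 0).all(axis=1)
    log_counts = np.log(counts[keep])
    # Per-gene log geometric mean across samples acts as a pseudo-reference.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median ratio of each sample to the pseudo-reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Hypothetical toy counts: sample 2 is sample 1 sequenced twice as deep.
counts = np.array([[10, 20], [100, 200], [5, 10], [0, 3]])
factors = rle_size_factors(counts)
normalized = counts / factors
print(factors)
```

Note that the factors rest on the assumption that most genes are not differentially expressed between the samples being scaled, which may not hold between species.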
The only samples that reached a similar average expression level across the different species are the Universal RNA samples, which are a mix of different tissues rather than a specific cell type.
I'm really confused. Does the above mean the normalization is fine and I shouldn't worry that average expression values differ between species, or does it imply something is wrong?
Maybe I should instead normalize by ignoring the gene ids and looking only at the whole expression profile (negative binomial)? That is, sort the genes in every sample by their expression values and apply a different normalization factor at every expression rank, whichever gene sits there. I would then end up with 2 negative binomial curves that look the same, but the genes on the x axis would be ordered differently depending on the species.
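What I describe here seems close to quantile normalization. A minimal sketch of that idea, assuming a genes x samples matrix with the same number of rows in every sample (the function name and toy values are hypothetical, and ties are not averaged as a full implementation would do):

```python
import numpy as np

def quantile_normalize(expr):
    """Force every sample (column) to share the same value distribution."""
    expr = np.asarray(expr, dtype=float)
    # Rank of each gene within its own sample (0 = lowest expression).
    order = np.argsort(expr, axis=0)
    ranks = np.argsort(order, axis=0)
    # Reference distribution: mean of the sorted columns at each rank.
    reference = np.sort(expr, axis=0).mean(axis=1)
    # Replace each value by the reference value at its rank.
    return reference[ranks]

# Hypothetical toy matrix: 3 genes x 2 samples.
expr = np.array([[5.0, 4.0], [2.0, 1.0], [3.0, 6.0]])
qn = quantile_normalize(expr)
print(qn)
```

After this step every sample has an identical distribution and therefore an identical mean, but each gene keeps its rank within its own sample.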
thanks.