  • edgeR: Normalization does not seem to work

    Hi folks,

    I have a problem with edgeR, and I hope someone here can help me with it.
    I'm trying to compare 11 samples (4 conditions, 3 replicates each, except one condition where a sample failed).
    The sequencing depth differs hugely (in one condition only 2,000-3,000 reads map*, in another condition 120,000-550,000 reads).
    I performed the DE analysis with edgeR, and there are indeed differentially expressed genes. But on closer inspection, the DE genes are always upregulated in the samples that have the higher sequencing depth/more reads.
    So it seems the normalization doesn't work.
    One of my colleagues took a look at the smear plot (attached) and immediately said "this is wrong". If I look at how it should look, e.g. the figure from the TMM paper http://genomebiology.com/2010/11/3/r25 , then I'd also say "yes, this is wrong".
    If I look at the pseudocounts, I'd say that the normalization didn't work properly, because the counts are still extremely different between the conditions.

    I've also checked again using only the counts from the 2 conditions with the most reads (#1: 57,770 / 19,668 / 82,123; #2: 124,571 / 554,628), under the assumption that this would change the result of the normalization, but the end result doesn't change considerably.
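    For clarity, the kind of check I mean looks roughly like this (a sketch using the DGEList d from the script below; cpm() is just one way to look at the normalized values, not the pseudo-counts edgeR stores internally):
    Code:
    # Inspect what calcNormFactors actually did
    d$samples                                      # lib.size and norm.factors per sample
    d$samples$lib.size * d$samples$norm.factors    # effective library sizes
    head(cpm(d, normalized.lib.sizes=TRUE))        # normalized counts per million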



    Any clue what could have gone wrong there?
    I'm especially worried because I've already worked with edgeR on another dataset, which is less uneven but also shows some bias in the distribution of DE genes (not as extreme, but now I don't know whether that's biology or a technical problem).


    Background:

    Used R commands, as taken from the tutorial:
    Code:
    x <- read.delim("input.csv", row.names="Contig")       # raw count table, contigs as rows
    group <- c(1,1,1,2,2,2,3,3,4,4,4)                      # 4 conditions, one with a failed replicate
    library("edgeR")
    d <- DGEList(counts=x, group=group)
    d$samples$lib.size <- colSums(d$counts)
    d <- calcNormFactors(d)                                # TMM normalization factors
    d <- estimateCommonDisp(d, verbose=TRUE)
    d <- estimateTagwiseDisp(d, trend="none")
    result21 <- exactTest(d, pair=c(2,1))                  # condition 2 vs condition 1
    plotSmear(d, pair=c(2,1), de.tags=rownames(topTags(result21, n=min(dim(result21$table[result21$table[,3]<0.05,])[1], 500))$table), pch=19, cex=1, lowess=TRUE, smooth.scatter=TRUE)
    - Why is the coverage so uneven and so low?
    It's a meta-transcriptomics sample, and apparently 99.9% of the sequencing went to the host, because the samples were mainly scraped off the host tissue.

    - What did I do to get the counts?
    First we did a de novo cross-assembly, but I was told afterwards that these were gnotobiotic animals with 4 different known bacteria. The assembly confirmed this, but for easier handling we mapped the data back to the original genomes (concatenated together, then just ran bowtie2 with default parameters, and then obtained the read counts with subread). Unmapped reads are being assembled right now, to catch any biological differences between the bacteria in the samples and those in the database, but that shouldn't affect the current data massively.
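    For completeness, the counting step looks roughly like this (a sketch only, assuming the subread step was Rsubread's featureCounts; the BAM directory and the annotation file are placeholders):
    Code:
    # Count reads per gene from the bowtie2 BAMs against the concatenated genomes
    # (file names are placeholders, and the GTF is a hypothetical annotation).
    library(Rsubread)
    bams <- list.files("bowtie2_out", pattern="\\.bam$", full.names=TRUE)
    fc <- featureCounts(files=bams,
                        annot.ext="concatenated_genomes.gtf",
                        isGTFAnnotationFile=TRUE,
                        GTF.featureType="CDS",
                        GTF.attrType="gene_id")
    counts <- fc$counts    # gene-by-sample matrix, written out as input.csv for edgeR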

    *I'm aware that the statistics will either suck or will not be existent for this.


    I'm really worried at the moment, any idea would help.
    Thanks!
    Last edited by bastianwur; 03-20-2015, 05:58 AM.

  • #2
    Your data sounds pretty crazy and it is quite possible that TMM is not a strong enough normalization. You might try quantile, but it is hard to imagine any normalization that will robustly handle systematic imbalances in sequencing depth as strong as in your data.

    However, your conclusion that the DE genes shouldn't be mostly in one direction is not correct. It is perfectly possible for almost all the DE to be in one direction even when normalization has worked perfectly; in fact it is one of the aims of normalization to allow this.
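    If it helps, one way to try quantile normalization is via limma-voom rather than edgeR's exact test (a sketch only; the design and the contrast below are assumptions based on your four groups):
    Code:
    # Sketch: quantile normalization via limma-voom, as an alternative to TMM.
    library(edgeR)
    library(limma)
    group <- factor(c(1,1,1,2,2,2,3,3,4,4,4))
    design <- model.matrix(~0 + group)
    d <- DGEList(counts=x, group=group)        # x = the same count matrix as before
    v <- voom(d, design, normalize.method="quantile")
    fit <- lmFit(v, design)
    fit <- eBayes(contrasts.fit(fit, makeContrasts(group2 - group1, levels=design)))
    topTable(fit)                              # e.g. condition 2 vs condition 1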



    • #3
      Okay, thanks, I'll give it a try.

      Originally posted by Gordon Smyth
      However, your conclusion that the DE genes shouldn't be mostly in one direction is not correct. It is perfectly possible for almost all the DE to be in one direction even when normalization has worked perfectly; in fact it is one of the aims of normalization to allow this.
      Yeah, I'm aware that this could be the case: the effect in the other direction may just be distributed over more genes and therefore never reach statistical significance. But I don't believe that for our data; we don't expect it, it wouldn't make any sense, and the fact that the regulation follows the sequencing depth so exactly makes it very suspicious.

      But as said, I'm going to try it, thanks.



      • #4
        Okay... it doesn't seem to work. That was sort of expected :/

        Any idea what I could do with this data?
        The data itself is:
        - 3 sampling sites in a gut, from the tissue
        - 1 sampling site in the gut, from the content
        (all in triplicate, except one where a sample failed... plus one more sampling site from the content, in triplicate, which had to be discarded due to DNA contamination)

        2 of the sampling sites from the tissue have the mentioned extremely low coverage, so I'd rather not do anything with them.
        The 2 remaining sampling sites (1 tissue, 1 content) are not from the same location (e.g. one was upper gut, the other lower gut), so you cannot really compare them either.
        We also cannot take a look at which genes are present, because the animals have been inoculated with a known mix of bacteria, so there's nothing to gain from that.
        I'm still waiting for the assembly of the reads, which didn't map to the genomes, to see the genetic differences, but I'm not sure if that'll be meaningful.

        If anyone has an idea what to try here, it'd be really appreciated.



        • #5
          In this paper,
          "Anders, S., McCarthy, D. J., Chen, Y., Okoniewski, M., Smyth, G. K., Huber, W., & Robinson, M. D. (2013). Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature protocols, 8(9), 1765-1786."
          they mention that "In edgeR, it is recommended to remove features without at least 1 read per million in n of the samples, where n is the size of the smallest group of replicates."

          However, since your sequencing depths are highly different from each other, maybe you can try removing features separately for each of the 6 pairwise comparisons.
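          For example, something along these lines (a sketch; here n is taken as 2, the size of the smallest surviving group in this dataset, which is my assumption):
          Code:
          # Keep only features with >= 1 count per million in at least n samples,
          # where n is the size of the smallest group of replicates (assumed 2 here).
          n <- 2
          keep <- rowSums(cpm(d) >= 1) >= n
          d <- d[keep, , keep.lib.sizes=FALSE]
          d <- calcNormFactors(d)    # recompute normalization factors after filtering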



          • #6
            Thanks, that helped a bit.

            I've talked to the lab person, and we actually now went with only comparing the 2 better groups (which don't fit together).
            The organism distribution is such that in one group you have 3/4 organism A and 1/4 organism B, whereas in the second group it's the other way around.
            No surprise: in both cases the DE result was that the organism with more reads has the upregulated genes.
            Then I thought again about what I had discussed earlier with a colleague: with normalization you try to make the RNA output of a cell (or at least of a population) comparable. That is not the same as making the RNA output of the ecosystem comparable, since the proportions of the different players can vary within the ecosystem. While we cannot do anything about this in our other meta-omics data (because we cannot fully define the organisms), here I know which bacteria I have -> so I separated the expression per organism and then did the normalization and DE on that.
            That gives some really crappy output (as expected; in one case the FDR needs to be 0.1, otherwise I don't get any results), but I now at least get up- and downregulation per organism, and they make a bit of sense (e.g. biofilm formation at a surface).
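            Roughly what that looks like (a sketch; the vector "organism", mapping each contig to one of the known bacteria, is hypothetical):
            Code:
            # Split the count matrix per organism, then normalize and run the exact
            # test within each organism separately. 'organism' is a hypothetical
            # vector assigning each contig to one of the known bacteria.
            results <- lapply(split(rownames(x), organism), function(ids) {
              di <- DGEList(counts=x[ids, ], group=group)
              di <- calcNormFactors(di)
              di <- estimateCommonDisp(di)
              di <- estimateTagwiseDisp(di)
              exactTest(di, pair=c(2,1))
            })
            topTags(results[[1]])    # DE table for the first organism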
            That'll probably be good enough to fill the poster, and that's about it; afterwards we'll throw the data away.

            Thanks for the help, everyone.
