I previously analyzed a set of 24 samples and found that gene X had 5,591 counts for sample_Y1 and 50,113 counts for sample_Y2. When I looked at the pseudocounts for these two samples (using equalizeLibSizes(dge)$pseudo.counts), I got Y1=6900.57 and Y2=52776.9.
Then I added 12 more samples for condition Z (totally unrelated to condition Y) and added a new column to my design matrix so it looked like this:
            Y  Z
sample1     1  0
...
sample_Y1   1  0
sample_Y2   1  0
...
sample24    1  0
sample_Z1   0  1
...
sample_Z12  0  1
But the problem is that now Y1=0.0 and Y2=23458.0 for gene X. I understand that these values would decrease, because the library sizes for my first 24 samples were around 20 million reads, and for the last 12 they were around 2 million reads.
What I don't understand is why Y1 would change so dramatically (dropping to zero) while Y2 would only roughly halve. Any help is much appreciated.
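To make my expectation concrete, here is a rough back-of-the-envelope sketch in Python (not edgeR code). It rests on several assumptions of mine: that the common library size is the geometric mean of the library sizes (my understanding of equalizeLibSizes, which may be incomplete), that the library sizes are exactly 20 million and 2 million, that all normalization factors are 1, and that pseudocounts scale proportionally with the common library size. The real pseudocounts come from a quantile adjustment, so this only approximates what I expected to see.

```python
import math

# Assumed library sizes: 24 original samples at ~20M reads, 12 new samples at ~2M reads.
lib_sizes = [20e6] * 24 + [2e6] * 12


def geometric_mean(values):
    """Geometric mean, assumed here to be how the common library size is chosen."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Common library size before and after adding the 12 condition-Z samples.
common_before = geometric_mean(lib_sizes[:24])
common_after = geometric_mean(lib_sizes)

# Factor by which I would naively expect every pseudocount to shrink.
scale = common_after / common_before

# Pseudocounts from the original 24-sample analysis (values from this post).
y1_before = 6900.57
y2_before = 52776.9

# Naive expectation under simple proportional scaling.
y1_expected = y1_before * scale
y2_expected = y2_before * scale

print(f"scale factor: {scale:.3f}")
print(f"expected Y1:  {y1_expected:.1f}")
print(f"expected Y2:  {y2_expected:.1f}")
```

Under these assumptions the scale factor is roughly 0.46, so I would expect Y2 to land near 24,500 (close to the 23458.0 I observe) and Y1 to land near 3,200, not at 0.0.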