Hi Folks --
I'm a bit confused about the method cuffdiff uses to normalize expression levels between the two (or more) conditions you feed it. I generally use geometric normalization, but I'm unclear whether normalization then depends on the expression levels in the conditions being compared, or on some independent factor that stays constant regardless of which conditions you feed to cuffdiff.
For example, say we want to normalize to GeneX, which has FPKM = 10 in Condition 1 and FPKM = 20 in Condition 2. Cuffdiff could normalize in one of two ways:
1) Multiply all FPKM values in Condition 1 by 2, or divide all FPKM values in Condition 2 by 2, so that GeneX's FPKM is the same in both conditions.
2) Knowing that GeneX is, on average, usually around 30, scale the FPKMs in each condition independently (by 3 in Condition 1 and by 1.5 in Condition 2) so that, again, GeneX is the same in both conditions.
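To make the difference between the two schemes concrete, here is a toy sketch using the GeneX numbers above. It is not cuffdiff code; the function names and the second gene, GeneZ, are hypothetical and exist only for illustration. The point is that scheme 1's factor depends on the pair of conditions, while scheme 2's factor depends only on the condition itself and the fixed reference level.

```python
def scheme1_pairwise(cond1, cond2, ref_gene):
    """Scheme 1 (hypothetical): rescale condition 2 so ref_gene matches condition 1.

    The factor depends on BOTH conditions, so it changes with the pairing.
    """
    factor = cond1[ref_gene] / cond2[ref_gene]
    return dict(cond1), {g: v * factor for g, v in cond2.items()}


def scheme2_fixed_reference(cond, ref_gene, ref_level):
    """Scheme 2 (hypothetical): rescale one condition against a fixed external level.

    The factor depends only on this condition, so it is the same whatever
    other condition it is compared with.
    """
    factor = ref_level / cond[ref_gene]
    return {g: v * factor for g, v in cond.items()}


# FPKMs from the example; GeneZ is a made-up second gene.
cond1 = {"GeneX": 10.0, "GeneZ": 5.0}
cond2 = {"GeneX": 20.0, "GeneZ": 8.0}

a, b = scheme1_pairwise(cond1, cond2, "GeneX")
print(a["GeneX"], b["GeneX"])  # 10.0 10.0  (condition 2 divided by 2)

print(scheme2_fixed_reference(cond1, "GeneX", 30.0)["GeneX"])  # 30.0 (x3)
print(scheme2_fixed_reference(cond2, "GeneX", 30.0)["GeneX"])  # 30.0 (x1.5)
```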
I'm curious about this because, if normalization is done the first way, FPKM values can only be compared between conditions that were run together in the same cuffdiff comparison. If (2) is true, however, FPKM values could also be compared between conditions that were NOT directly paired in cuffdiff. Put another way, if (2) is true, the FPKM value for a given gene and condition is an absolute measure of enrichment in that condition. Generally speaking, I've noticed that the FPKM value for a given gene and condition tends to be pretty stable regardless of which other condition I pair it with in cuffdiff (e.g. GeneX in Sample 1 = 10 whether Sample 1 is paired with Sample 2 or with Sample 3).
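For reference, the cuffdiff manual describes its default geometric method as following DESeq's median-of-ratios approach (Anders and Huber, 2010), in which each gene's reference level is the geometric mean of its counts across the libraries in the run. Below is a simplified sketch of that size-factor calculation, assuming raw fragment counts and ignoring replicates and everything else cuffdiff does; the function names and counts are hypothetical. It shows that, because the per-gene reference is computed from whichever libraries are included, the same sample can receive a different factor depending on what it is run with.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples table.

    counts: dict gene -> list of counts, one per sample (same order for all genes).
    Genes with a zero count in any sample are skipped, since their geometric
    mean would be zero.
    """
    n = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n)]
    for values in counts.values():
        if any(v == 0 for v in values):
            continue
        log_geo_mean = sum(math.log(v) for v in values) / n
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    # Size factor per sample = median of its count / geometric-mean ratios.
    return [math.exp(median(r)) for r in log_ratios]


def table(*samples):
    """Build a gene -> [count per sample] table from per-sample dicts."""
    return {g: [s[g] for s in samples] for g in samples[0]}


# Made-up fragment counts; S2 is exactly 2x S1, S3 is flat.
s1 = {"GeneA": 100, "GeneB": 200, "GeneC": 300}
s2 = {"GeneA": 200, "GeneB": 400, "GeneC": 600}
s3 = {"GeneA": 100, "GeneB": 100, "GeneC": 100}

print(size_factors(table(s1, s2))[0])  # S1 run with S2: ≈ 0.707
print(size_factors(table(s1, s3))[0])  # S1 run with S3: ≈ 1.414
```

This toy case is deliberately extreme; with many genes and broadly similar libraries, the median over all genes may move much less when the set of samples changes.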
Does this mean that, if GeneY = 50 in Condition 1 (when I compare Condition 1 to Condition 2) but equals zero in all other conditions (e.g. Condition 2 vs. 3, 4 vs. 5), I can make any definitive statement about GeneY being higher in Condition 1 than in Conditions 3, 4 or 5 (which I have not compared Condition 1 to directly)?