Hi,
I've been aligning and counting some RNA-seq reads with SHRiMP and Cuffdiff, doing the same analysis with both an older genome assembly and a newer one, and I found an interesting possible discrepancy in my Cuffdiff output. If anyone could help explain it, it would be much appreciated.
Basically, I noticed a number of genes where the expression levels were similar between the two assemblies, yet for some reason Cuffdiff reported wildly different significance results between the two. For example:
assembly  gene  locus                 sample_1  sample_2  status  value_1  value_2  log2(fold_change)  test_stat  p_value       q_value     significant
Asmb 5    Gfap  10:90763148-90771847  lineA     lineN     OK      484.283  11.1909  -5.43545           1.62926    0.103258      0.394632    no
Asmb 4    Gfap  10:92059880-92068555  lineA     lineN     OK      526.67   12.77    -5.36606           4.09233    4.27058E-005  0.00052085  yes
Both were run with the same cuffdiff binary (Cuffdiff 2.0.2) and the exact same command (adjusted for the appropriate assembly), with an FDR of 0.05. It would stand to reason that the results should be similar between the assemblies: line A is expressed at a much higher level than line N in both cases, and the only difference I can see that might have an effect is that the span of the gene changed by 24 nucleotides between assemblies, out of roughly 8,700.
If the gene size, fold change, and FPKM values are so similar, why are the statistical values so wildly different? This does not make sense to me.
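As a sanity check on the numbers above, here is a minimal Python sketch that recomputes the fold change and locus span for both rows, and also checks whether the reported p_value looks like a two-sided normal tail of test_stat. That last part is only an assumption on my side about how the p-value relates to test_stat, not something taken from the Cuffdiff documentation, and the helper names (`log2_fold_change`, `two_sided_normal_p`, `locus_span`) are made up for illustration.

```python
import math

# Values copied from the two Cuffdiff rows above (Gfap, lineA vs lineN).
rows = {
    "Asmb 5": {"start": 90763148, "end": 90771847,
               "value_1": 484.283, "value_2": 11.1909, "test_stat": 1.62926},
    "Asmb 4": {"start": 92059880, "end": 92068555,
               "value_1": 526.67, "value_2": 12.77, "test_stat": 4.09233},
}


def log2_fold_change(value_1, value_2):
    """log2 of sample_2 expression over sample_1 expression."""
    return math.log2(value_2 / value_1)


def two_sided_normal_p(z):
    """Two-sided tail probability of a standard normal at |z|.

    ASSUMPTION: treating test_stat as a z-score is a guess used only
    to see whether it reproduces the reported p_value column.
    """
    return math.erfc(abs(z) / math.sqrt(2))


def locus_span(start, end):
    """Distance between the locus end and start coordinates."""
    return end - start


summary = {}
for name, r in rows.items():
    summary[name] = {
        "log2_fc": log2_fold_change(r["value_1"], r["value_2"]),
        "span": locus_span(r["start"], r["end"]),
        "p": two_sided_normal_p(r["test_stat"]),
    }

for name, s in summary.items():
    print(f"{name}: log2_fc={s['log2_fc']:.5f} span={s['span']} p={s['p']:.6g}")

# Difference in locus span between the two assemblies.
print("span difference:", summary["Asmb 5"]["span"] - summary["Asmb 4"]["span"])
```

If the recomputed p-values line up with the reported ones, then the fold change and gene span are nearly identical and the whole discrepancy sits in test_stat (about 1.63 vs 4.09), which would suggest the variance estimate behind it differs between the two runs.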
Thanks!