I'm seeing some odd FPKM values reported by cufflinks, and I'm wondering whether anyone else has seen this or can suggest an explanation. Essentially, the shorter a transcript is, the higher its FPKM, and the shortest transcripts reach ridiculous levels. In a typical experiment, I see:
Code:
Tscript Length    avg. FPKM
--------------    ---------
>1000                    20
200 - 1000               30
100 - 200             2,500
< 100               130,000
If I examine the alignments in IGV or directly in the SAM file, I find that the short transcripts do not in fact have ridiculously high coverage. For example, a 90 bp transcript with an FPKM over 50,000 has just 18 reads (the experiment has about 20M reads in total).
I see this with cufflinks-1.1.0 and 1.0.3, with and without upper quartile normalization.
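As a sanity check on the 90 bp example, here is a minimal sketch of the textbook FPKM formula (fragments per kilobase of transcript per million mapped fragments). It assumes each read counts as one fragment and that the full annotated transcript length is used. The function name `fpkm` is my own, and this is not necessarily how cufflinks computes its values internally, since it may apply an effective-length adjustment or other corrections that this does not model.

```python
def fpkm(fragments, length_bp, total_fragments):
    """Textbook FPKM: fragments per kilobase per million mapped fragments.

    Assumption: the full transcript length is used, with no
    effective-length or bias correction.
    """
    return fragments * 1e9 / (length_bp * total_fragments)


# Numbers from the example: 18 reads on a 90 bp transcript, ~20M reads total.
naive_fpkm = fpkm(fragments=18, length_bp=90, total_fragments=20_000_000)

# Length (in bp) that the same formula would need in order to report
# an FPKM of 50,000 for those 18 reads.
implied_length_bp = 18 * 1e9 / (50_000 * 20_000_000)

print(f"naive FPKM: {naive_fpkm:.1f}")
print(f"length implied by FPKM 50,000: {implied_length_bp:.3f} bp")
```

By this definition the transcript would come out at an FPKM of about 10, and an FPKM of 50,000 would correspond to a length of roughly 0.018 bp. That would suggest the length being divided by for short transcripts is far smaller than the annotated one, though I can't confirm that from the output alone.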