Perplexing,
Running Cufflinks (2.0.0 and the previous version) with
both the -g/--GTF-guide and -N/--upper-quartile-norm options
can produce enormous FPKM values (on the order of 1e+11), and not only
for short or novel transcripts: a large number of genes are affected.
Of 16,725 genes, 12,094 have a nonzero FPKM (min = 432, max = 5.7e+11).
For the nonzero genes:
25th percentile = 216,582
50th percentile = 1e+06
75th percentile = 3.12e+06
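
For anyone who wants to produce the same kind of summary from their own run, here is a minimal sketch. It assumes a tab-delimited tracking table with a header row and a column named FPKM (as in Cufflinks' genes.fpkm_tracking output); the function name is made up for illustration, and the quartile method may differ slightly from whatever tool produced the numbers above.

```python
import csv
import io
import statistics


def summarize_nonzero_fpkm(handle, column="FPKM"):
    """Summarize the nonzero values of one column in a tab-delimited table.

    `handle` is any open text file object; `column` is assumed to hold
    numeric FPKM values.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    values = [float(row[column]) for row in reader]
    nonzero = sorted(v for v in values if v > 0)
    if len(nonzero) < 2:
        raise ValueError("need at least two nonzero values for quartiles")
    # "inclusive" treats the data as the whole population of interest
    q25, q50, q75 = statistics.quantiles(nonzero, n=4, method="inclusive")
    return {
        "total": len(values),
        "nonzero": len(nonzero),
        "min": nonzero[0],
        "max": nonzero[-1],
        "q25": q25,
        "q50": q50,
        "q75": q75,
    }


# Tiny made-up table, only to show the call; real input would be opened
# with open("genes.fpkm_tracking") instead.
demo = io.StringIO("tracking_id\tFPKM\ng1\t0\ng2\t1\ng3\t2\ng4\t3\ng5\t4\ng6\t5\n")
summary = summarize_nonzero_fpkm(demo)
print(summary)
```

Running the same function on the -g, -G, with -N and without -N outputs gives a quick side-by-side view of how far the ranges diverge.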
I appreciate that -N should only be rescaling FPKMs,
and I have seen reasonable results from Cufflinks
in -G mode (annotation only, no de novo assembly), but these -g levels
seem strange to me.
Removing the -N option brings the FPKM levels
back to a typical range of values.
Unfortunately, the error bounds appear to be identically
equal to the mean FPKM, with and without -N, at least
for Cufflinks 2.0.0. Removing the -b/--frag-bias-correct option
(which was also used originally) restored the error bars.
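
A quick way to check for the collapsed error bounds is to count the rows where both bounds equal the FPKM itself. This is only a sketch: it assumes the tracking file has the columns FPKM, FPKM_conf_lo and FPKM_conf_hi (the names used in Cufflinks' fpkm_tracking files), and the function name is hypothetical.

```python
import csv
import io


def count_collapsed_intervals(handle):
    """Return (nonzero_rows, collapsed_rows) for a tab-delimited tracking table.

    A row counts as "collapsed" when both confidence bounds are exactly
    equal to the reported FPKM. Exact comparison is intentional here:
    the values are parsed from the same text file, so identical printed
    numbers compare equal.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    nonzero = 0
    collapsed = 0
    for row in reader:
        fpkm = float(row["FPKM"])
        if fpkm <= 0:
            continue  # zero-FPKM rows trivially have zero-width bounds
        nonzero += 1
        lo = float(row["FPKM_conf_lo"])
        hi = float(row["FPKM_conf_hi"])
        if lo == fpkm and hi == fpkm:
            collapsed += 1
    return nonzero, collapsed


# Made-up rows, only to show the call.
demo = io.StringIO(
    "tracking_id\tFPKM\tFPKM_conf_lo\tFPKM_conf_hi\n"
    "g1\t0\t0\t0\n"
    "g2\t10.5\t10.5\t10.5\n"
    "g3\t20\t15\t25\n"
)
print(count_collapsed_intervals(demo))
```

If the second number equals the first in a run with -b and drops to near zero in a run without it, that matches the behaviour described above.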
It seems that -g & -N together give an FPKM range that
is quite different from the range produced by -G & -N.
I would guess that the reason for this is related to tiling of
the annotated transcripts with faux reads in RABT.
Perhaps, somehow, this is making almost all genes
look like upper-quartile expressors, but this is only a guess.
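
To make the guess a bit more concrete, here is a toy calculation. It is not Cufflinks' implementation and does not model RABT or any internal rescaling; it only uses the textbook definition FPKM = 1e9 * fragments / (length * denominator) to show how sensitive the values are to the choice of denominator. All counts and names are invented for illustration.

```python
import statistics


def fpkm(fragments, length_bp, denominator):
    """Textbook FPKM: fragments per kilobase per million 'denominator' units."""
    return 1e9 * fragments / (length_bp * denominator)


def upper_quartile(counts):
    """75th percentile of per-locus fragment counts (inclusive method)."""
    return statistics.quantiles(counts, n=4, method="inclusive")[2]


# Invented per-locus fragment counts for five loci.
locus_counts = [10, 20, 30, 40, 50]

total_fragments = 1_000_000             # assumed library size
uq = upper_quartile(locus_counts)       # assumed alternative denominator

# Same gene, same counts, two denominators.
by_total = fpkm(100, 1000, total_fragments)
by_uq = fpkm(100, 1000, uq)

# The ratio of the two FPKMs is just the ratio of the denominators.
print(by_total, by_uq, by_uq / by_total)
```

The point is only that a denominator several orders of magnitude smaller than the library size, if left unscaled, inflates every FPKM by the same large factor. Whether that is what happens in -g mode is still a guess.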