SEQanswers

Old 02-13-2012, 11:51 AM   #1
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76
Unusually high FPKM for cufflinks

Hi,

I have been working with Cufflinks v. 1.1.0 on mRNA-Seq data from some old (2008) Illumina runs with 36 bp reads. The only options I specified were -I 5000 and -b <refSeqFasta>; no reference GFF was specified. In the resulting transcripts.gtf, I'm getting unusually high FPKM values on the scale of tens to hundreds of thousands (e.g. FPKM=83456.4, 5571.5, 1017907.8) for several thousand transcripts.

Some previous posts suggested short read length and the reference FASTA as possible culprits, but removing the -b option does not help. This is not a problem with the BAM format, since SAM input gives similar results. I also tried the newer v. 1.3.0, and it gives similar values. I'm not sure whether short transcripts are being consistently inflated.

Strangely, the older v. 0.9.3 is giving respectable FPKM values (455.4 for the transcript that had 83456.4 previously), which I'd like to trust since they match manually calculated values (not quite, but close).
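For reference, a manual check of this kind can be sketched as below. This assumes the plain FPKM definition (fragments per kilobase of transcript per million mapped fragments) with no length or bias correction; the function name and the numbers are hypothetical and not taken from my data.

```python
def naive_fpkm(fragments, transcript_len_bp, total_mapped_fragments):
    """Uncorrected FPKM: fragments per kilobase per million mapped fragments."""
    kilobases = transcript_len_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)

# Hypothetical counts, for illustration only
value = naive_fpkm(fragments=500, transcript_len_bp=1_000,
                   total_mapped_fragments=10_000_000)
print(value)
```

Because this uses the full transcript length as the denominator, it will not reproduce any correction that Cufflinks applies on top of the raw counts.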

However, I wonder why the new versions of Cufflinks are inflating the FPKM values by several orders of magnitude? Has anyone found a solution to this problem? Can I still use the new versions without causing such FPKM inflation?

Thanks

Last edited by flobpf; 02-13-2012 at 12:21 PM.
Old 02-13-2012, 01:26 PM   #2
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76
High FPKM for small transcripts

It is indeed true that small Cufflinks transcripts tend to have significantly inflated FPKMs. Anyone else seeing this?


Last edited by flobpf; 02-13-2012 at 01:34 PM.
Old 02-15-2012, 06:50 AM   #3
Nicolas
Member
 
Location: new york city

Join Date: Apr 2009
Posts: 40

Hi,

Which mode did you use Cufflinks with? With a reference file, in RABT mode, or de-novo?
Old 02-15-2012, 06:57 AM   #4
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76

Quote:
Originally Posted by Nicolas
Hi,

Which mode did you use Cufflinks with? With a reference file, in RABT mode, or de-novo?

Hi Nicolas,

I used the ~RABT mode with single-end reads. The reads were first mapped to the reference genome using TopHat, and Cufflinks was run on the accepted_hits.bam file. However, no reference GTF was specified.

Last edited by flobpf; 02-15-2012 at 07:08 AM. Reason: Not exactly RABT, not exactly denovo
Old 02-15-2012, 07:29 AM   #5
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

Yes, we see this as well, as do other groups I have spoken to. It's pretty consistent from run to run.
Old 02-17-2012, 04:47 AM   #6
Cole Trapnell
Senior Member
 
Location: Boston, MA

Join Date: Nov 2008
Posts: 212

A note about small transcripts and high FPKM: the reason you're seeing this is that with a very small transcript, the fragments that map to it have to be short (at least as short as the transcript), and thus often come from the tail of the library's fragment length distribution. What I mean by this is that if you plot a histogram of the length of each library fragment, there's usually a mean around 200-250 bp (depending on the protocol, and excluding adapters). Most fragments aren't much larger or much smaller than that - i.e. the variance is very small. However, there is a small fraction of fragments that are super short (100 bp or even smaller) or quite long (500-600 bp).

Because these are rare, Cufflinks reasons that for the small transcript to have generated them, it must be very, very abundant. In fact, it probably generated many more fragments, most of which didn't make it through all of the size selection steps during library construction. So we "upscale" the FPKM to account for this effect. You can read about this correction in the supplement of the Cufflinks paper. The reason for the change between 0.9.3 and 1.1.0 is that there were some problems in the actual implementation of the correction in 0.9.3, and we fixed them in later versions.

While the correction (in our opinion) is a good thing to do, the bottom line is that standard RNA-Seq is really not the right assay for measuring small RNA expression, because the very nature of the size selection introduces a lot of error and variability in the sampling of fragments from these species. I'm actually considering adding another status flag (similar to HIDATA, FAIL, etc.) to warn users that their library is too large for reliable quantification of a particular transcript.
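To make the upscaling concrete, here is a rough sketch of an effective-length style correction. It is not the actual Cufflinks code: the normal fragment length distribution (mean 200 bp, standard deviation 40 bp), the function names, and the read counts are all assumptions chosen for illustration.

```python
import math

def frag_len_probs(mean, sd, max_len):
    """Discretised normal over fragment lengths 1..max_len, normalised to sum to 1."""
    weights = [math.exp(-0.5 * ((l - mean) / sd) ** 2)
               for l in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def effective_length(transcript_len, probs):
    """Expected number of positions where a randomly sized fragment fits."""
    return sum(p * (transcript_len - l + 1)
               for l, p in enumerate(probs, start=1)
               if l <= transcript_len)

def fpkm(fragments, length, total_mapped):
    """Fragments per kilobase (of the given length) per million mapped fragments."""
    return fragments * 1e9 / (length * total_mapped)

# Assumed library: mean 200 bp, sd 40 bp (hypothetical values)
probs = frag_len_probs(mean=200, sd=40, max_len=1000)

# Same hypothetical fragment count for a short, a medium and a long transcript
for tlen in (100, 150, 2000):
    eff = effective_length(tlen, probs)
    naive = fpkm(50, tlen, 10_000_000)
    corrected = fpkm(50, eff, 10_000_000)
    print(tlen, round(eff, 2), round(naive, 1), round(corrected, 1))
```

Under these assumptions the long transcript is barely affected, while the 100 bp transcript has an effective length well below one base, so dividing by it inflates the FPKM by orders of magnitude relative to the uncorrected value.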

Tags
cufflinks, fpkm, mrna-seq, tophat
