
**flobpf** · 02-13-2012, 02:26 PM

High FPKM for small transcripts

It is indeed true that small Cufflinks transcripts tend to have significantly inflated FPKMs. Anyone else seeing this?

**Nicolas** · 02-15-2012, 07:50 AM

Hi,

Which mode did you run Cufflinks in: with a reference file, in RABT mode, or de novo?

**flobpf** · 02-15-2012, 07:57 AM

Originally posted by Nicolas

Hi,

Which mode did you use Cufflinks with? with a reference file, in RABT mode, or de-novo?

Hi Nicolas,

I used the ~RABT mode with single-end reads. The reads were first mapped to the reference genome using TopHat, and Cufflinks was run on the accepted_hits.bam file. However, no reference GTF was specified.

**kopi-o** · 02-15-2012, 08:29 AM

Yes, we see this as well (as do other groups I have spoken to). It's pretty consistent from run to run.

**Cole Trapnell** · 02-17-2012, 05:47 AM

A note about small transcripts and high FPKM: The reason you're seeing this is that with a very small transcript, the fragments that map to it have to be short (no longer than the transcript itself), and thus often come from the tail of the library's fragment length distribution. What I mean by this is that if you plot a histogram of the length of each library fragment, there's usually a mean around 200-250 bp (depending on the protocol, and excluding adapters). Most fragments aren't much larger or much smaller than that - i.e. the variance is very small. However, there is a small fraction of fragments that are super short (100 bp or even smaller) or quite long (500-600 bp).

Because these are rare, Cufflinks reasons that for the small transcript to have generated them, it must be very, very abundant. In fact, it probably generated many more fragments, most of which didn't make it through all of the size selection steps during library construction. So we "upscale" the FPKM to account for this effect. You can read about this correction in the supplement of the Cufflinks paper. The reason for the change between 0.9.3 and 1.1.0 is that there were some problems in the actual implementation of the correction in 0.9.3, and we fixed them in later versions.

While the correction (in our opinion) is a good thing to do, the bottom line is that standard RNA-Seq is really not the right assay for measuring small RNA expression, because the very nature of the size selection introduces a lot of error and variability in the sampling of fragments from these species. I'm actually considering adding another status flag (similar to HIDATA, FAIL, etc.) to warn users that the fragments in their library are too large for reliable quantification of a particular transcript.
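To make the "upscaling" concrete, here is a rough Python sketch of an effective-length correction of the kind described above. It is not the Cufflinks implementation: the Gaussian fragment length distribution (mean 200 bp, standard deviation 30 bp), the fragment counts, and all function names are assumptions chosen for illustration. The idea is that a transcript's length is replaced by an effective length that weights each possible start position by how likely a fragment of that size is, so a transcript shorter than the typical fragment ends up with a tiny effective length and a correspondingly large FPKM.

```python
import math


def fragment_length_pmf(mean=200.0, sd=30.0, max_len=1000):
    """Hypothetical discretized Gaussian fragment length distribution.

    Returns a list where entry i is P(fragment length == i + 1).
    The mean and sd are illustrative assumptions, not Cufflinks defaults.
    """
    weights = [math.exp(-0.5 * ((l - mean) / sd) ** 2)
               for l in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def effective_length(transcript_len, pmf):
    """Expected number of positions a fragment could start at.

    Only fragments no longer than the transcript can come from it,
    and a fragment of length l fits at (transcript_len - l + 1) positions.
    """
    return sum(p * (transcript_len - l + 1)
               for l, p in enumerate(pmf, start=1)
               if l <= transcript_len)


def fpkm(fragment_count, length, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length <= 0:
        return float("inf")
    return 1e9 * fragment_count / (length * total_fragments)


pmf = fragment_length_pmf()
total_fragments = 20_000_000  # assumed library size
count = 10                    # assumed fragments mapped to each transcript

for transcript_len in (100, 150, 300, 2000):
    eff = effective_length(transcript_len, pmf)
    naive = fpkm(count, transcript_len, total_fragments)
    corrected = fpkm(count, eff, total_fragments)
    print(f"length={transcript_len:5d}  effective={eff:10.4f}  "
          f"naive FPKM={naive:10.3f}  corrected FPKM={corrected:12.3f}")
```

Under these assumptions, the long transcript's FPKM barely changes, while the 100 bp transcript's corrected FPKM is far larger than its naive value, because only fragments from the rare short tail of the distribution could have come from it.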

Unusually high FPKM for cufflinks
