
**savova** · 04-25-2012, 08:51 AM

I need an answer to this too...

**sdriscoll** · 04-25-2012, 10:09 AM

FPKMs are simply rate measurements. You could have a gene with an FPKM of 100 that only got 20 reads. It all depends on that last part of the normalization: per million mapped reads.
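To make the "rate" point concrete, here is a minimal sketch of the textbook FPKM formula (fragments per kilobase of transcript per million mapped fragments). The function name and the numbers are hypothetical, chosen only for illustration; Cufflinks itself estimates abundances with its own statistical model, so this is not its internal calculation.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments / (length in kb * mapped fragments in millions)."""
    kilobases = transcript_length_bp / 1e3
    millions_mapped = total_mapped_fragments / 1e6
    return fragments / (kilobases * millions_mapped)


# Hypothetical gene: 20 fragments on a 200 bp transcript.
# In a library with 1 million mapped fragments the rate is high...
small_library = fpkm(20, 200, 1_000_000)

# ...but the same 20 fragments in a 50-million-fragment library
# give a 50-fold lower FPKM, because only the denominator changed.
large_library = fpkm(20, 200, 50_000_000)

print(small_library, large_library)
```

With these made-up inputs the first case works out to an FPKM of 100 from only 20 fragments, which is why a raw read count cannot be inferred from the FPKM alone.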

There is no logical bottom end cutoff for FPKM where you can say "these genes are not expressed", other than 0 of course.

If you mean that most of the genes in your results seem right but a subset of them seem to have higher FPKMs than others with similar amounts of coverage, then you're probably seeing an artifact from the Cufflinks pipeline. I have seen that many times myself for small genes like those single-exon ones. It doesn't make much sense. I recommend trying the -b option on cufflinks and/or cuffdiff. That uses the bias correction pipeline within Cufflinks and it seems to fix those erroneous FPKMs.

**savova** · 04-25-2012, 12:37 PM

I have a different problem - all my RPKM values in one dataset are shifted by 10,000 with respect to another! Both are described to have been prepared the same way. I am pasting my message to cufflinks developers:

I wanted to compare this dataset

http://0-www.ncbi.nlm.nih.gov.ilsprod.lib.neu.edu/geo/query/acc.cgi?acc=GSE29119

to available Encode datasets on other cell lines:

ENC RNA-seq Caltech RNA-seq Track Settings

http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeCaltechRnaSeq

My expression analysis with Cufflinks is weird. In particular, it seems that the
whole RPKM distribution is shifted up for the first dataset samples (HMEC and
HCC1954). For example, the minimum of both HMEC and H1HESC is 0, but the maximum
is 3*10^9 and 3*10^4 respectively. So in log space, the average RPKM for
the other cell lines is around 2-3, while for HMEC and HCC1954 it's 10-12. At this
point I went all the way back to fastq, realigned to Hg19 with bowtie,
and used cufflinks to compute RPKM - the difference remains. Any ideas why?

It is true that one library may have more reads. But isn't FPKM supposed to normalize for the number of total reads in the library and if so how can the entire distribution be shifted?

2) On another note, I also do not understand how I am getting some really small non-zero values from both datasets when the total number of reads would not seem to permit this:

total reads HMEC_expression:
2.2983e+10

min HMEC_expression >0
3.0939e-312

I would really appreciate your help.

**sdriscoll** · 04-25-2012, 12:52 PM

I've seen cuffdiff blow the read count normalizations but not cufflinks. In my case I saw a 10-fold increase in the baseline of one group's mean expression versus the other, causing almost all genes to be tagged as significantly misexpressed.

Have you tried testing the different normalization options that Cufflinks provides? Have you tried the --compatible-hits-norm option or the -N option for upper quartile normalization?

You can also look in the isoforms.fpkm_tracking files and check the "length" and "coverage" columns. You can roughly compute the amount of raw read data aligned to each gene by multiplying those columns together. Sum the column of products to get a rough "total bases aligned to genes" count and divide the column by that number to roughly normalize the counts. Try that for each sample and see if you still have that massive offset between samples.
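A minimal sketch of that manual check, assuming the tracking file is tab-delimited with a header row containing "length" and "coverage" (named above) plus a "tracking_id" column to key the rows. The helper names are hypothetical, and the rows in the example are made up rather than taken from any real run.

```python
import csv


def read_tracking(path):
    """Load a tab-delimited tracking file into a list of row dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def rough_normalized_counts(rows):
    """Return length * coverage per row, divided by the sum over all rows."""
    products = {
        row["tracking_id"]: float(row["length"]) * float(row["coverage"])
        for row in rows
    }
    total = sum(products.values())  # rough "total bases aligned to genes"
    if total == 0:
        return {name: 0.0 for name in products}
    return {name: value / total for name, value in products.items()}


# Made-up rows standing in for one sample's isoforms.fpkm_tracking file.
example_rows = [
    {"tracking_id": "tx1", "length": "1000", "coverage": "3.0"},
    {"tracking_id": "tx2", "length": "500", "coverage": "2.0"},
    {"tracking_id": "tx3", "length": "2000", "coverage": "0.0"},
]
fractions = rough_normalized_counts(example_rows)
print(fractions)
```

Running `rough_normalized_counts(read_tracking(path))` on each sample and comparing the resulting distributions would show whether the offset survives a normalization done by hand.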

**savova** · 04-25-2012, 01:27 PM

thanks, i will try this. but I am now worried this software works erratically. do you have any idea why such blowing of the normalization occurs? can i trust results from other people computed with this software?

**sdriscoll** · 04-25-2012, 01:45 PM

I don't use it as my primary quantification tool nor my primary differential expression tool. I've never seen DESeq or edgeR blow the normalization step. We are only talking about a division step, so it doesn't make sense for any software to mess it up. To me Cufflinks is very desirable, but I don't trust it so I don't use it. I have explored it quite a lot though because I very much want to be able to use it.

In your case it COULD be a result of the normalization being based on total reads aligned instead of the more robust upper quartile method. But you should check the coverages to make sure. If your manual normalizations give you the same result, then you've got some small population of highly expressed genes biasing the normalization. The -N option should fix that, as should normalizing by the upper quartile of the genes' read counts. I'd also try the -b option because it seems to help fix some other things that Cufflinks does that make me not trust it. I still don't trust it though. Maybe I'm just not smart enough to understand it.
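Here is a rough sketch of why a few dominant genes can bias total-count scaling while upper quartile scaling stays put. The counts are invented, the function names are hypothetical, and the quartile is taken over nonzero counts with Python's `statistics.quantiles`; this is an illustration of the idea, not what the -N option does internally.

```python
import statistics


def total_count_factor(counts):
    """Scaling factor based on the sum of all counts."""
    return sum(counts)


def upper_quartile_factor(counts):
    """Scaling factor based on the 75th percentile of nonzero counts."""
    nonzero = [c for c in counts if c > 0]
    return statistics.quantiles(nonzero, n=4)[2]


def scale(counts, factor):
    """Divide every count by the chosen scaling factor."""
    return [c / factor for c in counts]


# Two invented samples, identical except for one hugely expressed gene.
sample_a = [10, 20, 30, 40, 50, 60, 70, 80]
sample_b = [10, 20, 30, 40, 50, 60, 70, 80000]

# Total-count scaling: the dominant gene inflates the denominator of
# sample_b, so every other gene looks far less expressed than in sample_a.
by_total_a = scale(sample_a, total_count_factor(sample_a))
by_total_b = scale(sample_b, total_count_factor(sample_b))

# Upper quartile scaling: the outlier sits above the 75th percentile,
# so the factor, and the scaled values of the other genes, do not change.
by_uq_a = scale(sample_a, upper_quartile_factor(sample_a))
by_uq_b = scale(sample_b, upper_quartile_factor(sample_b))

print(by_total_a[0] / by_total_b[0], by_uq_a[0] / by_uq_b[0])
```

If the manual check shows the same offset as Cufflinks, comparing the two scalings like this on the real counts is one way to see whether a handful of highly expressed genes is responsible.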

**caballien** · 01-04-2013, 01:33 PM

very low fpkm?

sdriscoll-

Nearly all of my fpkm values are very low. The median of all of my replicates is ~0.1 and I have between 50 and 60 million mapped reads per sample. Very few genes are above 10. See the attached graph boxplot2.pdf and testdensity.pdf. Are these values too low, or as you said caused by a larger denominator and thus are okay? Also, I've attached a .pdf of a volcano plot, which is strange because I have ~870 significantly differentially expressed genes, but they all show up at the top of the graph where they don't belong (pvalues are not that small). Perhaps cummeRbund is just doing something improperly.

The sequencing is RNA-seq from ribosomal-depleted RNA. Could this lower the FPKMs? I did mask all repetitive regions when using cuffdiff.

The sequencing was performed on a HiSeq. The data was generated through the Tuxedo package: TopHat 2, cufflinks, cuffmerge, cummeRbund.

Cufflinks FPKM range

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News