05-21-2014, 02:14 AM   #1
Junior Member
Location: Lisbon

Join Date: Feb 2013
Posts: 5
FPKMs and Limma R package


I have generated a dataset with 9 different biological samples (plus replicates) and analyzed it using TopHat and Cufflinks. I therefore have a table of FPKM values for every gene in each sample.

I am trying to use the limma R package to model and extract differentially expressed genes across all of these samples (instead of the 2-by-2 comparisons that can be made using Cuffdiff) and have run into the following problem, on which I would really appreciate some advice.

I have to transform the FPKM values to log2 values before passing them to the lmFit() function. However, since the table contains zeros, transforming it directly produces a lot of infinite values (log2(0) = -Inf). I was therefore thinking of adding a fixed number to all of the FPKM values before the log2 transformation. So my questions are:

1. Is this a good approach?
Are there better alternatives?

2. Is there a specific value that should be added?
I was thinking of adding a small value (e.g. 10^-10, for which log2(10^-10) ~ -33, at the "opposite" end of the range from the positive log2 values; in my table the maximum log2(FPKM) is ~22).
But I am not sure if this is correct, and I would also like to know if there is a "normal" value that people usually add.
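
To make the effect of the added value concrete, here is a minimal sketch in plain Python (chosen only for illustration; in R the equivalent would be log2(fpkm + offset)). The helper name `log2_with_offset` and the FPKM values are hypothetical, not taken from the dataset discussed in this thread.

```python
import math

def log2_with_offset(fpkm, offset):
    """Return log2(fpkm + offset); the offset must be positive so zeros stay finite."""
    if offset <= 0:
        raise ValueError("offset must be positive")
    return math.log2(fpkm + offset)

# Hypothetical FPKM values for one gene across four samples (not real data).
fpkms = [0.0, 0.5, 8.0, 1024.0]

# Compare a very small offset with a larger one.
for offset in (1e-10, 1.0):
    logged = [round(log2_with_offset(x, offset), 2) for x in fpkms]
    print(offset, logged)
```

With a very small offset, a zero lands about 32 log2 units below an FPKM of 0.5, so zero versus low expression dominates the spread of the data; with a larger offset that gap shrinks to less than one unit. Which behaviour is preferable is a judgment call that this sketch does not settle.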


Note: I also have the read counts and could do everything with the voom function and then limma, but since all my initial analysis uses the FPKMs I would really like to stick with them for consistency... so any help is deeply appreciated!
EA01
05-21-2014, 03:36 AM   #2
Devon Ryan
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Adding a small count seems to be the common method. If you look at how edgeR calculates log2(rpkm), for example, you'll see that it adds a small value (0.25 by default) to the raw counts before computing CPM, which is then used to get RPKM. For comparison, a minimum of 0.25 on the raw count scale would be 0.25 FPKM for a 1kb gene in a library of 1 million fragments, and proportionally less for larger libraries (depending on how library sizes were computed).
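
The prior-count idea can be sketched as follows, again in plain Python for illustration. This is a simplified version under stated assumptions and not edgeR's actual implementation: edgeR also rescales the prior count by library size and adjusts the library sizes, which this sketch ignores. The function name `log2_rpkm` and all numbers are hypothetical.

```python
import math

def log2_rpkm(count, lib_size, gene_length_bp, prior_count=0.25):
    """Simplified log2(RPKM) with a prior count added to the raw count.

    Assumption: unlike edgeR, this sketch does not rescale the prior count
    or adjust the library size; it only shows why zero counts stay finite.
    """
    # Counts per million, computed after adding the prior count.
    cpm = (count + prior_count) / lib_size * 1e6
    # Dividing CPM by gene length in kb is a subtraction on the log2 scale.
    return math.log2(cpm) - math.log2(gene_length_bp / 1000.0)

# Hypothetical numbers: a 2 kb gene in a library of 20 million reads.
for count in (0, 10, 1000):
    print(count, round(log2_rpkm(count, 20e6, 2000), 2))
```

Because the offset is applied to the count rather than to the FPKM, the floor it implies on the FPKM scale varies with library size and gene length instead of being one fixed number for the whole table.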
05-30-2014, 04:40 AM   #3
Junior Member
Location: Lisbon

Join Date: Feb 2013
Posts: 5

I have tried this, but I am not happy with the results... I get really strange volcano plots (see figure), which I guess are a consequence of the different variance stabilization methods...
Therefore, I think I will stick with the read counts (even if it means going back and redoing my previous analysis).

EA01
06-15-2014, 11:55 PM   #4
Gordon Smyth
Location: Melbourne, Australia

Join Date: Apr 2011
Posts: 91

Yes, that's a wise decision. Use the voom function to process the counts prior to lmFit. The voom-limma pipeline needs to work with counts, rather than with FPKM.

fpkm, limma, log2, voom