Hi!
I have generated a dataset with 9 different biological samples (plus replicates) and have analyzed it using TopHat and Cufflinks. As a result, I now have a table with the FPKM values for every gene in each sample.
I am trying to use the limma R package to model and extract differentially expressed genes across all of these samples at once (instead of the 2-by-2 comparisons that can be made using Cuffdiff), and I have run into the following problem, on which I would really appreciate some advice.
I have to transform the FPKM values into log2 values before using them in the lmFit() function. However, the table contains zeros, so if I take log2 directly on the FPKM table, a lot of infinite values are generated (log2(0) is -Inf). I was therefore thinking of adding a constant to all of the FPKM values before transforming them into log2 data. So my questions are:
1. Is this a good approach? Are there better alternatives?
2. Is there a specific value that should be added?
I was thinking of adding a very small value, e.g. 10^-10, so that zeros map to log2(10^-10) ≈ -33, which lies far on the "opposite" side of the positive log2 values (in my table the maximum log2(FPKM) is about 22).
But I am not sure whether this is correct, and I would also like to know if there is a "standard" value that people usually add.
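To make the question concrete, here is a minimal sketch of the arithmetic I am looking at. It is written in Python only so that it is self-contained (the same numbers come out of log2() in R). The FPKM values are made up, and the function names log2_with_offset and zero_to_one_gap are hypothetical helpers, not part of limma or any other package.

```python
import math

def log2_with_offset(fpkm, offset):
    """Return log2(fpkm + offset); a positive offset keeps zeros finite."""
    if offset <= 0:
        raise ValueError("offset must be positive")
    return math.log2(fpkm + offset)

def transform(values, offset):
    """Apply the offset + log2 transform to a list of FPKM values."""
    return [log2_with_offset(v, offset) for v in values]

def zero_to_one_gap(offset):
    """log2 distance between FPKM = 0 and FPKM = 1 after adding the offset."""
    return log2_with_offset(1.0, offset) - log2_with_offset(0.0, offset)

# Made-up FPKM values for one gene across four samples (assumption, not real data).
example_fpkm = [0.0, 0.5, 1.0, 100.0]

for offset in (1e-10, 1.0):
    transformed = [round(x, 2) for x in transform(example_fpkm, offset)]
    print(f"offset={offset}: {transformed}, "
          f"gap between 0 and 1 FPKM = {zero_to_one_gap(offset):.2f} log2 units")
```

With the 10^-10 offset, a change from 0 to 1 FPKM becomes a difference of about 33 log2 units, whereas with an offset of 1 the same change is exactly 1 log2 unit. That dependence on the offset is what I am unsure about.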
Thanks!!!
Note: I also have the raw counts and could do everything with the voom function followed by limma, but since all my initial analysis uses the FPKMs I would really like to stick with them for consistency... so any help is deeply appreciated!