 07-23-2014, 07:58 AM #1 dlepe Junior Member   Location: Mexico Join Date: Jul 2014 Posts: 8 sum of FPKMs? I'm analyzing the expression levels of certain genes in different tissues using data from a database, and I need to count two different genes as one because I know from experimental data that they were erroneously annotated. The expression levels in the database are in FPKM, and I know I can't simply add the two genes' values to count them as one. If I had raw counts, I would just add them: gene A = 400, gene B = 300, counted as a single gene = 700. What would be the best way to do this with FPKMs? gene A = 12 FPKM, gene B = 20 FPKM, as a single gene = x? Last edited by dlepe; 07-23-2014 at 11:51 AM.
 07-23-2014, 09:43 AM #2 blakeoft Member   Location: Connecticut Join Date: Oct 2013 Posts: 79 FPKM is fragments per kilobase of transcript per million mapped reads. So then x = total number of fragments / ((total number of bases of transcript / 1000) * (mapped fragments / 1000000)) = (fragments mapped to gene A + fragments mapped to gene B) / (((bases of gene A + bases of gene B) / 1000) * (mapped fragments / 1000000)). This holds if gene A and gene B do not overlap (by that I mean that no read is mapped to both gene A and gene B). If they do overlap, you'll have to use something like the inclusion-exclusion principle. I don't think you can simply add the two FPKM values, as you mentioned.
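The count-based formula in the post above can be sketched in Python as follows. This is only an illustration: the fragment counts (400 and 300) come from the first post, while the gene lengths and the library size are made-up values, and the function names are hypothetical. It assumes the two genes do not overlap.

```python
def fpkm_from_counts(fragments, length_bases, total_mapped_fragments):
    """FPKM = fragments / (kilobases of transcript * millions of mapped fragments)."""
    kilobases = length_bases / 1000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


def merged_fpkm_from_counts(frags_a, len_a, frags_b, len_b, total_mapped_fragments):
    """Treat two non-overlapping genes as one: add the fragments, add the lengths."""
    return fpkm_from_counts(frags_a + frags_b, len_a + len_b, total_mapped_fragments)


# Counts are from the first post; lengths (2000 and 3000 bases) and the
# library size (10 million mapped fragments) are hypothetical.
x = merged_fpkm_from_counts(400, 2000, 300, 3000, 10_000_000)
print(x)  # 14.0
```

With these made-up inputs, gene A alone would be 20 FPKM and gene B alone 10 FPKM, so simply adding the two values (30) would overstate the merged gene's 14 FPKM.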
 07-24-2014, 08:55 AM #3 dlepe Junior Member   Location: Mexico Join Date: Jul 2014 Posts: 8 The thing is, I don't have the total number of mapped fragments from the libraries. I would have to see whether the raw data is available somewhere and do the mapping myself. Since I'm trying to estimate how the expression of the gene in question correlates with that of another gene, a friend suggested simply using the average of gene A and gene B as the expression value I'm looking for. His reasoning is that since FPKMs are normalized by length, and assuming the raw counts for gene A and gene B are similar, the FPKM for gene A or gene B alone should be very close to the FPKM we'd get if we calculated it for both of them as a single gene.
 07-24-2014, 10:32 AM #4 blakeoft Member   Location: Connecticut Join Date: Oct 2013 Posts: 79 I suppose you could do an average. I think a weighted average would be better suited for this. You could weight each FPKM value by the length of the corresponding gene.
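A minimal sketch of the length-weighted average suggested above, in Python. The FPKM values (12 and 20) are from the first post; the gene lengths are hypothetical, and the function name is just for illustration. Like the count-based approach, it assumes the genes do not overlap.

```python
def length_weighted_fpkm(fpkm_a, len_a, fpkm_b, len_b):
    """Average two FPKM values, weighting each by its gene's length in bases."""
    return (fpkm_a * len_a + fpkm_b * len_b) / (len_a + len_b)


# FPKM values from the first post; lengths are hypothetical.
print(length_weighted_fpkm(12, 2000, 20, 3000))  # 16.8

# With equal lengths the weighted average reduces to the plain average.
print(length_weighted_fpkm(12, 2500, 20, 2500))  # 16.0
```

Note that the plain average is the special case where both genes have the same length, so the two approaches only agree when the lengths are close.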
 07-24-2014, 10:52 AM #5 dlepe Junior Member   Location: Mexico Join Date: Jul 2014 Posts: 8 Yeah I guess, I'll see how that goes, thanks.
 07-24-2014, 11:03 AM #6 blakeoft Member   Location: Connecticut Join Date: Oct 2013 Posts: 79 I just did the math, and the weighted average is what you want, provided the genes don't overlap like I previously stated. So if gene A has FPKM a, and gene B has FPKM b, you want: (a * |A| + b * |B|) / (|A| + |B|), where |x| is the length of gene x. Edit: If you want, I can type up my reasoning in latex. I just don't know of a nice way to display fractions on seqanswers.
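One way the reasoning could go, written out in LaTeX. This is a reconstruction from the FPKM definition given earlier in the thread, not necessarily the poster's own derivation. The symbols c_A and c_B (fragments mapped to each gene) and N (total mapped fragments) are introduced here for illustration; a, b, |A| and |B| are as in the post above, and the genes are assumed not to overlap.

```latex
% FPKM of each gene, from the definition:
a = \frac{c_A}{\frac{|A|}{10^3} \cdot \frac{N}{10^6}}
\quad\Longrightarrow\quad
c_A = \frac{a \, |A| \, N}{10^9},
\qquad
c_B = \frac{b \, |B| \, N}{10^9}

% FPKM of the merged gene: add the fragments, add the lengths.
x = \frac{c_A + c_B}{\frac{|A| + |B|}{10^3} \cdot \frac{N}{10^6}}
  = \frac{10^9 \, (c_A + c_B)}{(|A| + |B|) \, N}
  = \frac{a \, |A| + b \, |B|}{|A| + |B|}
```

The total number of mapped fragments N cancels, so the merged value needs only the two FPKM values and the two gene lengths, not the library size.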
 07-24-2014, 11:05 AM #7 dlepe Junior Member   Location: Mexico Join Date: Jul 2014 Posts: 8 awesome, I'll look into it, thanks again.