SEQanswers

Old 10-14-2015, 04:26 PM   #1
rkawa
Junior Member
 
Location: Los Angeles

Join Date: Jan 2014
Posts: 5
Question How to get FPKM values for Ensembl

Hello Everyone,

I am scratching my head because I can't find a way to calculate FPKM values for Ensembl genes. I aligned the reads with STAR and got count data using HTSeq.

For RefSeq, FPKM calculation is relatively easy since there isn't much overlap between annotated features in the genome. Ensembl genes, however, contain many isoforms, and the overlapping regions are a problem when calculating FPKM.

I first thought that extracting the unique exon regions from the GTF file would do the job (this is in a different post). However, STAR also aligns reads that span two or more exons, so I would have to take the unique exon junctions into consideration for the FPKM calculation. So far I don't know how to do this effectively.

I wish HTSeq had a tool to spit out all the regions that were used in read counting; then the problem could be solved easily.

If anyone has encountered and solved this problem, I would appreciate your thoughts and input.

Thank you!

RK
Old 10-15-2015, 12:55 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479
Default

It looks like you're trying to make things vastly more complicated than they are. If you just want FPKMs from the non-multimappers, take the counts, divide by some gene length metric (there are many, e.g. the total length of the merged exons or the length of the longest transcript), and continue with the FPKM calculation from there.
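The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names, counts, and lengths are made up, and the library size is taken as the sum of the reads assigned to genes (some people use the total number of mapped reads instead). If the counts come from htseq-count, the summary rows whose names start with `__` (such as `__no_feature`) should be dropped before this step.

```python
def fpkm(counts, lengths_bp):
    """Return FPKM per gene.

    FPKM = count * 1e9 / (gene length in bp * library size),
    where library size is assumed here to be the sum of all gene counts.
    """
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("no reads were assigned to any gene")
    return {
        gene: count * 1e9 / (lengths_bp[gene] * library_size)
        for gene, count in counts.items()
    }


# Hypothetical input: two genes with invented counts and lengths.
example_counts = {"geneA": 500, "geneB": 1500}
example_lengths = {"geneA": 1000, "geneB": 3000}

example_fpkm = fpkm(example_counts, example_lengths)
for gene, value in example_fpkm.items():
    print(gene, value)
```

Both example genes end up with the same FPKM because geneB has three times the reads but is also three times as long.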

In general, though, I would just use Salmon or Kallisto. They give you transcript-level metrics, but you can simply sum them to get gene-level values.
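Summing transcript-level values to the gene level could look like the sketch below. The transcript IDs, gene IDs, and abundance values are hypothetical, and the transcript-to-gene mapping is assumed to have been built beforehand (for example from the same GTF file); reading the quantifier's actual output file is not shown.

```python
from collections import defaultdict


def sum_to_gene(tx_abundance, tx_to_gene):
    """Sum transcript-level abundances into gene-level abundances."""
    gene_abundance = defaultdict(float)
    for transcript, value in tx_abundance.items():
        gene_abundance[tx_to_gene[transcript]] += value
    return dict(gene_abundance)


# Hypothetical transcript-level abundances and transcript-to-gene mapping.
example_tx = {"tx1": 10.0, "tx2": 5.5, "tx3": 2.0}
example_map = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

example_gene = sum_to_gene(example_tx, example_map)
print(example_gene)
```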
Old 10-15-2015, 07:39 PM   #3
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367
Default

Quote:
I wish HTSeq had a tool to spit out all the regions that were used in read counting; then the problem could be solved easily.
featureCounts does just that by default.
It computes the gene lengths from the GTF file used to count the reads.
As a bonus, it's much faster.
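To illustrate what a gene length derived from a GTF file can mean when isoforms overlap, here is a small sketch that merges the exon intervals of one gene and counts each base only once. As far as I know this matches the idea behind the length featureCounts reports, but treat that as an assumption; the function name and coordinates are invented for the example.

```python
def union_exon_length(exons):
    """Total number of bases covered by at least one exon.

    `exons` is a list of (start, end) pairs with 1-based inclusive
    coordinates, as used in GTF files. Overlapping exons from
    different isoforms are merged so no base is counted twice.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(exons):
        if current_end is None or start > current_end:
            # No overlap with the running interval: close it and start a new one.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlap: extend the running interval if needed.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


# Hypothetical gene with two overlapping exons and one separate exon.
example_exons = [(100, 200), (150, 300), (400, 450)]
example_length = union_exon_length(example_exons)
print(example_length)
```

Here the first two exons merge into 100-300 (201 bases) and the third adds 51 bases.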

I take the gene lengths that featureCounts calculates from the GTF file and pass them as input to DESeq2's fpkm() function.
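For readers who want to see the arithmetic outside R, the sketch below reads a featureCounts-style table (tab-separated, with `Geneid` and `Length` columns followed by one count column per sample, after a leading `#` comment line) and computes a plain FPKM. It is not DESeq2's fpkm(), which by default uses a robust library size estimate rather than a simple column sum; the sample name and table contents are invented.

```python
import csv
import io


def fpkm_from_featurecounts(handle, sample):
    """Compute plain FPKM for one sample column of a featureCounts-style table."""
    # Skip comment lines, then read the tab-separated table by column name.
    lines = (line for line in handle if not line.startswith("#"))
    records = [
        (row["Geneid"], int(row["Length"]), float(row[sample]))
        for row in csv.DictReader(lines, delimiter="\t")
    ]
    library_size = sum(count for _, _, count in records)
    if library_size == 0:
        raise ValueError("no reads were assigned in this sample")
    return {
        gene: count * 1e9 / (length * library_size)
        for gene, length, count in records
    }


# Hypothetical table contents; a real file would be opened with open(path).
example_table = (
    "# comment line written by the counting program\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1.bam\n"
    "geneA\tchr1\t100\t1099\t+\t1000\t500\n"
    "geneB\tchr1\t5000\t7999\t-\t3000\t1500\n"
)

example_result = fpkm_from_featurecounts(io.StringIO(example_table), "sample1.bam")
print(example_result)
```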

Or, you could use Cuffdiff, but that's a different pipeline altogether.
