Hello Everyone,
I am scratching my head because I can't find a way to calculate FPKM values for Ensembl genes. I aligned the reads with STAR and got count data using HTSeq.
For RefSeq, FPKM calculation is relatively easy since there isn't much overlap between annotated features in the genome. However, Ensembl genes contain many isoforms, and the overlapping regions make it hard to decide which gene length to use when calculating FPKM.
I first thought that extracting the unique exon regions from the GTF file would do the job (this is in a different post). However, STAR also aligns reads that span two or more exons. Therefore, I think I have to take the unique exon junctions into consideration for the FPKM calculation. So far I don't know how to do this effectively.
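For concreteness, here is a minimal Python sketch of the "unique exon regions" idea: merge the overlapping exons of each gene into a union length, then apply the usual FPKM formula (count x 10^9 / (length x total fragments)). The function names and the input layout are hypothetical, not part of HTSeq or STAR, and the sketch assumes the total is the sum of the gene counts unless another value is passed in. It does not treat junction-spanning reads separately.

```python
from collections import defaultdict


def union_exon_lengths(exons):
    """Return {gene_id: length of merged exons}.

    exons: iterable of (gene_id, start, end) with 1-based inclusive
    coordinates, as in a GTF file (hypothetical input layout).
    """
    by_gene = defaultdict(list)
    for gene_id, start, end in exons:
        by_gene[gene_id].append((start, end))

    lengths = {}
    for gene_id, intervals in by_gene.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:
                # Overlapping or adjacent exon: extend the current block.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current block and start a new one.
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene_id] = total
    return lengths


def fpkm(counts, lengths, total_fragments=None):
    """FPKM = count * 1e9 / (gene length in bp * total fragments).

    Assumption: if total_fragments is not given, the sum of the gene
    counts is used as the library size.
    """
    if total_fragments is None:
        total_fragments = sum(counts.values())
    return {
        gene: count * 1e9 / (lengths[gene] * total_fragments)
        for gene, count in counts.items()
        if gene in lengths
    }


if __name__ == "__main__":
    # Toy data: two isoforms of g1 share part of an exon.
    exons = [("g1", 1, 100), ("g1", 51, 150), ("g1", 201, 300), ("g2", 1, 1000)]
    lengths = union_exon_lengths(exons)
    print(lengths)
    print(fpkm({"g1": 250, "g2": 750}, lengths))
```

In practice the exon tuples would be read from the exon lines of the GTF file and the counts from the HTSeq output, skipping its special summary lines (those starting with `__`).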
I wish HTSeq had a tool to output all the regions that were used in read counting; then the problem could be solved easily.
If anyone has encountered and solved this problem, I would appreciate your thoughts and input.
Thank you!
RK