I'm aware this topic is similar to http://seqanswers.com/forums/showthread.php?t=61116 , but I had a few additional questions.
I'd like to extract FPKM values from an RNA-seq experiment that has been analyzed with DESeq2. This requires adding a table of gene lengths to the dds object. I retrieved gene lengths from BioMart by selecting the following attributes:
Associated Gene Name
Transcript length (including UTRs and CDS)
This gets me the following table:
> head(gene_sizes)
      transcript_length     gene_name
12700               803 0610006L08Rik
12701              1589 0610006L08Rik
64621               401 0610007P14Rik
64622               408 0610007P14Rik
64619               696 0610007P14Rik
64618              1169 0610007P14Rik
Some genes have more than one transcript length, presumably because each splice variant is listed as a separate transcript. This means I can't combine the tables as shown in the previous thread.
I am not sure how to work around this. The GTF used for the alignment had gene names rather than transcript IDs. Does the data need to be remapped with a different GTF?
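Short of remapping, one common workaround is to collapse the transcript lengths to a single value per gene and then line that vector up with the rows of the count table. The sketch below uses the median transcript length, which is an assumption: max, mean, or the length of the union of all exons of the gene are other choices, and each gives somewhat different FPKM values. The lengths are the rows from the table above; the `counts` matrix and its values are hypothetical, standing in for a count table keyed by gene name.

```r
# Transcript lengths per gene, copied from the head(gene_sizes) output above
gene_sizes <- data.frame(
  transcript_length = c(803, 1589, 401, 408, 696, 1169),
  gene_name = c("0610006L08Rik", "0610006L08Rik",
                "0610007P14Rik", "0610007P14Rik",
                "0610007P14Rik", "0610007P14Rik"),
  stringsAsFactors = FALSE
)

# Collapse to one length per gene; using the median is an assumption
gene_lengths <- tapply(gene_sizes$transcript_length,
                       gene_sizes$gene_name, median)

# Hypothetical count matrix with gene names as row names,
# as produced when the GTF used for counting is keyed by gene name
counts <- matrix(c(10L, 20L, 30L, 40L), nrow = 2,
                 dimnames = list(c("0610007P14Rik", "0610006L08Rik"),
                                 c("sample1", "sample2")))

# Reorder the lengths to match the row order of the count matrix;
# genes absent from gene_sizes would come out as NA
basepairs <- as.numeric(gene_lengths[rownames(counts)])
print(basepairs)
```

With DESeq2 loaded, a vector built this way (in the same row order as dds) can be stored as `mcols(dds)$basepairs` before calling `fpkm(dds)`; it is worth checking the `fpkm` help page for your DESeq2 version to confirm how it picks up feature lengths.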
Alternatively, is there a better way to get FPKM values from a read count table?
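FPKM can also be computed directly from a count matrix and a matching vector of gene lengths, using FPKM = count × 10^9 / (gene length in bp × total counted reads in the sample). The sketch below applies that plain formula in base R, taking each sample's column sum as the library size; it is not DESeq2's implementation. DESeq2's own `fpkm()` with `robust = TRUE` (the default) scales by size factors rather than raw column sums, so its values will differ somewhat. The function name `counts_to_fpkm` and the example numbers are hypothetical.

```r
# FPKM = count * 1e9 / (length_bp * library_size)
counts_to_fpkm <- function(counts, lengths_bp) {
  stopifnot(nrow(counts) == length(lengths_bp))
  lib_size <- colSums(counts)  # total counted reads per sample
  # Divide each row by its gene length in kilobases
  # (the length vector recycles down each column, one value per row)
  per_kb <- counts / (lengths_bp / 1e3)
  # Divide each column by its library size in millions of reads
  sweep(per_kb, 2, lib_size / 1e6, "/")
}

counts <- matrix(c(500, 1500, 1000, 3000), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
lengths_bp <- c(1000, 2000)
fpkm_values <- counts_to_fpkm(counts, lengths_bp)
print(fpkm_values)
```

In this toy example both samples give 250000 for geneA and 375000 for geneB, because sample s2 is simply s1 sequenced twice as deeply; FPKM removes both the depth difference and the length difference between the two genes.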