06-08-2010, 04:34 PM #2
Location: St Louis, MO

Join Date: Nov 2009
Posts: 27
Advice on tophat/cufflinks genes/transcripts and RPKM

I've seen similar behavior when using cufflinks to look at RPKM assigned to Ensembl transcript entries. If you look carefully at exactly which chromosomal locations are involved for each transcript entry, I think you'll find that each line in transcripts.expr (or genes.expr) corresponds to a different chromosomal location within the same ENS transcript entry.

In other words, I don't think cufflinks is outputting multiple RPKMs for the same 'bundle' of reads; it's just that these 'bundles' map to multiple regions of a given ENS entry.
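If you want to do that check on the whole file rather than by eye, here is a minimal Python sketch. It assumes the .expr file is tab-delimited with a header row, and the column names it uses (trans_id, chr, left, right) are assumptions on my part, so check them against the header of your own transcripts.expr or genes.expr. It lists every ID that appears on more than one line, together with the coordinates of each line.

```python
import csv
import sys
from collections import defaultdict


def loci_per_id(lines, id_col, chrom_col="chr", left_col="left", right_col="right"):
    """Return {id: [(chrom, left, right), ...]} for IDs that occur on more than one line.

    Assumes a tab-delimited table whose first line is a header; the default
    coordinate column names are assumptions, not a documented cufflinks format.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    loci = defaultdict(list)
    for row in reader:
        loci[row[id_col]].append(
            (row[chrom_col], int(row[left_col]), int(row[right_col]))
        )
    # Keep only the IDs that appear on more than one line.
    return {key: spans for key, spans in loci.items() if len(spans) > 1}


if __name__ == "__main__":
    # Usage: python loci_per_id.py transcripts.expr trans_id
    with open(sys.argv[1], newline="") as handle:
        for key, spans in loci_per_id(handle, sys.argv[2]).items():
            print(key, *(f"{c}:{l}-{r}" for c, l, r in spans), sep="\t")
```

If the coordinates printed for a repeated ID are different regions, that is the behaviour described above rather than duplicate RPKMs for one bundle.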

I get around this by running tophat/cufflinks so that RPKM is calculated for whole ENSG genes rather than individual ENST transcripts. Run tophat without -G, but pass the Ensembl GTF to cufflinks with -G, e.g.:

tophat -o /path/to/output_folder /path/to/bowtie_index input_reads.fastq
cd /path/to/output_folder
cufflinks -G Ensembl.gtf accepted_hits.sam

Then extract your data from the genes.expr file.
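As a rough sketch of that extraction step, the Python below pulls an ID column and a value column out of a tab-delimited file with a header row. The column names are passed in rather than hard-coded because the exact genes.expr header is an assumption here (gene_id and FPKM in the usage line are guesses; use whatever your file's header actually says). It keeps every line as-is, so if an ID appears more than once you will see each value rather than a combined one.

```python
import csv
import sys


def extract_column(lines, id_col, value_col):
    """Yield (id, value) pairs from a tab-delimited table with a header row.

    The column names are parameters because the exact header of genes.expr is
    an assumption here; check it against your own file first.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in (id_col, value_col) if name not in header]
    if missing:
        raise KeyError(f"column(s) not found in header: {missing}")
    for row in reader:
        yield row[id_col], float(row[value_col])


if __name__ == "__main__":
    # Usage: python extract_column.py genes.expr gene_id FPKM > gene_values.tsv
    with open(sys.argv[1], newline="") as handle:
        for key, value in extract_column(handle, sys.argv[2], sys.argv[3]):
            print(key, value, sep="\t")
```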

Hope that helps!
sjm