05-21-2010, 03:27 AM   #1
oliviera
Member
 
Location: germany

Join Date: Apr 2010
Posts: 31
Tuning the tophat/cufflinks pipeline

Dear all,

I am using tophat/cufflinks on a set of single-end sequencing data from 4 different biological samples (without replicates). My first goal is to compare the RPKM values across the different conditions.

Here is how I use tophat:
tophat -p 4 -G /path/to/gff_file /path/to/genome_file /path/to_input_file

And here is how I use cufflinks:
cufflinks -G /path/to_gff_file /path/to/sam_file

When I look at the transcript.expr file, I am surprised to see that the same Ensembl transcript sometimes has multiple RPKM values, for example:

ENSDART00000000198 1367346 Zv8_scaffold3091 65836 148200 0.03 1 1 0 0.1 0.02 2139
ENSDART00000000198 1367332 Zv8_scaffold3091 65836 148200 1.47 1 1 1.03 1.9 0.77 2139

The same thing happens in the gene.expr file...
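
To get an idea of how many IDs are affected, a small Python sketch along these lines could list them. It assumes the file is whitespace-separated with the transcript ID in the first column, as in the two rows above; the function name find_repeated_ids is made up for illustration and is not part of cufflinks.

```python
from collections import defaultdict


def find_repeated_ids(lines):
    """Group rows by their first whitespace-separated field and
    return only the IDs that occur on more than one row."""
    rows_by_id = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows_by_id[fields[0]].append(fields)
    return {tid: rows for tid, rows in rows_by_id.items() if len(rows) > 1}


# The two rows quoted above, used as sample input (assumed column layout).
sample = [
    "ENSDART00000000198 1367346 Zv8_scaffold3091 65836 148200 0.03 1 1 0 0.1 0.02 2139",
    "ENSDART00000000198 1367332 Zv8_scaffold3091 65836 148200 1.47 1 1 1.03 1.9 0.77 2139",
]

repeated = find_repeated_ids(sample)
for tid, rows in repeated.items():
    # The second field differs between the rows, so show it next to the ID.
    print(tid, [row[1] for row in rows])
```

For the real file, the same function could be fed an open file handle instead of the sample list, since it only iterates over lines.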

Is it because I didn't use the -g 1 option in tophat to restrict reads to a single hit in the genome?
Could someone help me tune the options in tophat and cufflinks to avoid this splitting of the RPKM for the same location?

Cheers

Oliviera