Old 10-07-2011, 01:31 PM   #1
Junior Member
Location: Michigan

Join Date: Aug 2011
Posts: 3
cufflinks no FPKM values due to gtf issue


I am facing an issue with getting cufflinks results with Arabidopsis. The problem seems to be the reference gtf.

When I ran cufflinks without a reference gtf, I got FPKM values for genes in the genes.fpkm_tracking file.

tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status

CUFF.1 - - CUFF.1 - - Chr1:3675-3911 - - 49.5395 33.4668 65.6123 OK

CUFF.2 - - CUFF.2 - - Chr1:3995-4272 - - 30.6876 20.7312 40.6439 OK

CUFF.3 - - CUFF.3 - - Chr1:4467-5098 - - 21.4671 17.3165 25.6177 OK

But when I ran cufflinks with a reference gtf, I did not get any FPKM values:

tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status

XLOC_000001 - - XLOC_000001 ANAC001 TSS1 1:3630-5899 - - 0 0 0 OK

XLOC_003797 - - XLOC_003797 ARV1 TSS3983 1:5927-8737 - - 0 0 0 OK

XLOC_003798 - - XLOC_003798 NGA3 TSS3984 1:11648-13714 - - 0 0 0 OK

I am aware that cufflinks/cuffdiff need a compatible reference genome format. So for both these cufflinks runs I used a compatible reference.gtf file, which was created as follows:
  1. Downloaded NCBI dataset for Arabidopsis from
  2. Took the genes.gtf file
  3. Created cuffcmp.combined.gtf using this command: cuffcompare -s /ccmb/CoreBA/BioinfCore/Common/DATA/Cufflinks_Data/Arabidopsis/Arabidopsis_thaliana/NCBI/build9.1/Sequence/WholeGenomeFasta/genome.fa -CG -r ../arabidopsis_genes.gtf ../arabidopsis_genes.gtf
  4. Used the resulting cuffcmp.combined.gtf as the reference.gtf

I also ran cufflinks using the NCBI genes.gtf (instead of the cuffcmp.combined.gtf), but I still got no FPKM calculations in the result file.

I also ran cuffdiff with the cuffcmp.combined.gtf, and here again I did not get any FPKM values, so every test is reported as NOTEST.

I would greatly appreciate your help in figuring out what the problem is.

Thanks in advance,

ngsbee is offline   Reply With Quote
Old 10-10-2011, 10:54 AM   #2
Junior Member
Location: USA

Join Date: Jul 2010
Posts: 8

Please make sure your gtf file is compatible with the genome you aligned to. For example, check that the chromosome names are the same.
unidodo is offline   Reply With Quote
Old 10-10-2011, 11:19 AM   #3
Junior Member
Location: Michigan

Join Date: Aug 2011
Posts: 3


The problem was that the .gtf had just the chromosome numbers (eg: 1, 2, etc.), while my tophat output files had the chromosome names as Chr1, Chr2, etc.

Once I modified the gtf so that the chromosome names matched, cufflinks ran fine.
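
A quick way to spot this kind of mismatch before running cufflinks is to compare the sequence names in column 1 of the gtf with the @SQ names in the alignment header. The sketch below assumes the header text has already been dumped with `samtools view -H accepted_hits.bam`; the function names and the toy data are illustrative only and not part of any tool.

```python
def gtf_seqnames(gtf_lines):
    """Collect the distinct sequence names (column 1) from GTF lines."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(header_lines):
    """Collect the SN: values from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def report_mismatch(gtf_lines, header_lines):
    """Return (names only in the GTF, names only in the alignment header)."""
    gtf = gtf_seqnames(gtf_lines)
    bam = sam_header_seqnames(header_lines)
    return sorted(gtf - bam), sorted(bam - gtf)


# Toy data mirroring the situation described in this thread.
gtf = ['1\tprotein_coding\texon\t3631\t3913\t.\t+\t.\tgene_id "AT1G01010";\n']
header = ["@HD\tVN:1.0\tSO:coordinate\n", "@SQ\tSN:Chr1\tLN:30427671\n"]
only_gtf, only_bam = report_mismatch(gtf, header)
print(only_gtf, only_bam)  # ['1'] ['Chr1']
```

If both lists come back empty, the names agree; anything in the first list is a gtf chromosome that the aligner never saw.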
ngsbee is offline   Reply With Quote
Old 10-14-2011, 03:05 PM   #4
Junior Member
Location: ames, IA

Join Date: Jul 2011
Posts: 8

Hi Ash:

I am running into the same problem as you did and I have not found a solution yet.

Could you post some examples to explain how you fixed it? I want to know what your original chromosome numbers in the .gtf looked like and what they looked like in the tophat output. And finally, how did you modify it?

taozuo is offline   Reply With Quote
Old 10-25-2011, 07:42 AM   #5
Location: New Haven

Join Date: Jul 2008
Posts: 36

As ngsbee mentioned:

You must make the chromosome names consistent. Sometimes 1, 2, 3, ..., 22, X, Y, M are used as chromosome names, and sometimes chr1, chr2, chr3, ..., chrM are used. They must be the same in your gtf file and your mapping file. BTW, you also need to check chrM, because it is sometimes named MT.
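
As a rough illustration of that advice, the sketch below translates annotation-style names to the "chr" convention, with a small alias table for the mitochondrial chromosome. The table and the function name are hypothetical and would need adjusting to the genome build actually in use.

```python
# Hypothetical alias table: maps names seen in an annotation to the names
# used by the alignment. Adjust it to your own genome build.
ALIASES = {"MT": "chrM", "M": "chrM"}


def to_alignment_name(name, aliases=ALIASES, prefix="chr"):
    """Translate one annotation chromosome name to the alignment's convention."""
    if name in aliases:
        return aliases[name]
    if name.startswith(prefix):
        return name  # already in the target convention
    return prefix + name


for n in ["1", "22", "X", "MT", "chr2"]:
    print(n, "->", to_alignment_name(n))
```

The alias lookup comes first so that special cases such as MT are not simply turned into chrMT.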
lmf_bill is offline   Reply With Quote
Old 10-25-2011, 08:39 AM   #6
Junior Member
Location: Michigan

Join Date: Aug 2011
Posts: 3

Sorry for the late reply.
Tao, like lmf_bill said, you must check the chromosome format in your tophat output file (accepted_hits.bam) and in your reference gtf. If they are not the same, then you have to modify one of the files (I prefer the gtf) so that the formats match.

For example, if your tophat output has the chromosome format Chr1, Chr2, ..., ChrMt, ChrUn and your gtf file has the chromosome format 1, 2, ..., Mt, Un, then modify your gtf so that its chromosome names become Chr1, Chr2, ..., ChrMt, ChrUn.
This will ensure that cufflinks/cuffdiff run properly.
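
For the example just described, the gtf edit amounts to prefixing column 1 with "Chr". A minimal sketch, assuming a well-formed tab-separated gtf; the function name is made up for illustration, and the demo uses in-memory text rather than real files.

```python
import io


def add_prefix_to_gtf(src, dst, prefix="Chr"):
    """Copy GTF lines from src to dst, prefixing column 1 with `prefix`.

    Comment and blank lines are copied unchanged, and names that already
    start with the prefix are left alone, so the function is safe to re-run.
    """
    for line in src:
        if line.startswith("#") or not line.strip():
            dst.write(line)
            continue
        seqname, rest = line.split("\t", 1)
        if not seqname.startswith(prefix):
            seqname = prefix + seqname
        dst.write(seqname + "\t" + rest)


# In-memory demonstration; with real files, pass open() handles instead.
src = io.StringIO('1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n'
                  'Mt\tsrc\texon\t5\t50\t.\t-\t.\tgene_id "g2";\n')
dst = io.StringIO()
add_prefix_to_gtf(src, dst)
print(dst.getvalue(), end="")
```

Writing to a new file rather than editing in place keeps the original gtf available if the names need to be mapped differently later.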

Another issue I faced was with colons ":" in the chromosome name.
With rice data, I formatted the gtf files to match the tophat bam files. The tophat bam files had the chromosome format EG:1, EG:2, etc., so I made the gtf format the same. But I still wasn't getting any results.
Thanks to this post on SEQanswers, I was able to figure it out.
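
The thread does not record what the eventual fix was, so the following is only a sketch of how one might flag names containing a colon. The renaming helper is a hypothetical workaround, not something confirmed here; if used, the same renaming would have to be applied to the fasta before building the index and aligning, and to the gtf.

```python
def names_with_colon(names):
    """Return the sequence names that contain a colon, sorted."""
    return sorted(n for n in names if ":" in n)


def replace_colon(name, replacement="_"):
    """Hypothetical workaround: swap ':' for another character."""
    return name.replace(":", replacement)


names = ["EG:1", "EG:2", "Chr3"]
print(names_with_colon(names))            # ['EG:1', 'EG:2']
print([replace_colon(n) for n in names])  # ['EG_1', 'EG_2', 'Chr3']
```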

I have summarized below the issues I faced and how to fix them :
(1) If you are creating a Bowtie build from scratch:
* Please check the chromosome format in your fasta files.
* Compare the chr format in the fasta file to the reference gtf file. If they are in the same format (eg: Chr1), you can go straight to bowtie-build.
* If they differ, format the fasta file to match the reference gtf format. Once the chromosome formats in the fasta and your gtf are the same, you can proceed to create your build.
* Run bowtie-build.
* Making sure that the chromosome formats are uniform is a vital step to ensure that your accepted_hits.bam (tophat output) and reference gtf (required for running cuffdiff) are compatible. Only if they are compatible will you get cuffdiff results.
* Note: Currently the chr format issue fails silently. Cuffdiff doesn't handle a mismatch, nor does it generate an error or warning.
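
The fasta-versus-gtf comparison in the steps above can be sketched as follows. It assumes the sequence name is the first whitespace-delimited token of each fasta header, which may not hold for every indexing tool; the helper names and toy records are illustrative.

```python
def fasta_seqnames(fasta_lines):
    """Return sequence names from FASTA headers, in file order.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            names.append(tokens[0] if tokens else "")
    return names


def missing_from_fasta(gtf_lines, fasta_lines):
    """GTF sequence names (column 1) that have no matching FASTA record."""
    fasta = set(fasta_seqnames(fasta_lines))
    gtf = {l.split("\t", 1)[0] for l in gtf_lines
           if l.strip() and not l.startswith("#")}
    return sorted(gtf - fasta)


# Toy records: one gtf name matches the fasta, one does not.
fasta = [">Chr1 some description\n", "CCCTAAACCC\n", ">Chr2\n", "NNNN\n"]
gtf = ['1\tsrc\texon\t1\t5\t.\t+\t.\tgene_id "g1";\n',
       'Chr2\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g2";\n']
print(missing_from_fasta(gtf, fasta))  # ['1']
```

An empty result means every chromosome named in the gtf exists in the fasta, so the build can proceed.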

(2) If you already have a stable bowtie build (downloaded from the bowtie website) and you have used it to run tophat:
* Check the chr format in your reference gtf file, and make sure you format your gtf to match the chr format in the accepted_hits.bam file.
* This will ensure that your accepted_hits.bam and reference gtf are compatible and you will be able to run cuffdiff without any issues.

(3) GFF3 issues.
* DO NOT USE GFF3 FORMAT TO CREATE reference GTFs to run cuffdiff
* If you use gff3, CUFFCOMPARE program truncates the long string from the gene annotation column, and gene IDs are lost. Hence, when you run cuffdiff, your output file won't have gene IDs.
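
One way to check whether a reference gtf still carries gene IDs before running cuffdiff is to look for a `gene_id "..."` attribute on every feature line. This is only a sanity-check sketch with made-up names and toy lines; it does not reproduce what cuffcompare does internally.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def lines_without_gene_id(gtf_lines):
    """Return 1-based line numbers of feature lines lacking a gene_id."""
    bad = []
    for i, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        attrs = cols[8] if len(cols) > 8 else ""
        if not GENE_ID.search(attrs):
            bad.append(i)
    return bad


# Line 1 uses GTF-style attributes, line 2 uses GFF3-style attributes.
lines = ['Chr1\tsrc\texon\t1\t5\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
         'Chr1\tsrc\texon\t9\t20\t.\t+\t.\tID=exon1;Parent=t2\n']
print(lines_without_gene_id(lines))  # [2]
```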

(4) When you are working with "sequencing-in-progress" data:
* IT IS BEST TO USE A STABLE VERSION of GTFs and FASTAs AVAILABLE FROM REFERENCE DATABASES (eg: Ensembl) instead of getting data from independent genome sequencing groups. Files from independent groups tend to have more formatting issues, and their formats might change between versions.
ngsbee is offline   Reply With Quote

cuffcompare, cuffdiff, gtf, rna-seq;gene expression;
