SEQanswers

Old 10-07-2011, 01:31 PM   #1
ngsbee
Junior Member
 
Location: Michigan

Join Date: Aug 2011
Posts: 3
Default cufflinks no FPKM values due to gtf issue

Hi,

I am facing an issue with getting cufflinks results with Arabidopsis. The problem seems to be the reference gtf.

When I ran cufflinks without a reference gtf, I got FPKM values for genes in the genes.fpkm_tracking file.

----------------------------------------------
tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status

CUFF.1 - - CUFF.1 - - Chr1:3675-3911 - - 49.5395 33.4668 65.6123 OK

CUFF.2 - - CUFF.2 - - Chr1:3995-4272 - - 30.6876 20.7312 40.6439 OK

CUFF.3 - - CUFF.3 - - Chr1:4467-5098 - - 21.4671 17.3165 25.6177 OK
----------------------------------------------

But when I ran cufflinks with a reference gtf, I did not get any FPKM values (every gene is reported as 0):

----------------------------------------------
tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status

XLOC_000001 - - XLOC_000001 ANAC001 TSS1 1:3630-5899 - - 0 0 0 OK

XLOC_003797 - - XLOC_003797 ARV1 TSS3983 1:5927-8737 - - 0 0 0 OK

XLOC_003798 - - XLOC_003798 NGA3 TSS3984 1:11648-13714 - - 0 0 0 OK
----------------------------------------------

I am aware that cufflinks/cuffdiff need a compatible reference annotation format. So for both these cufflinks runs I used a compatible reference.gtf file, which was created as follows:
  1. Downloaded the NCBI dataset for Arabidopsis from http://cufflinks.cbcb.umd.edu/igenomes.html
  2. Took the genes.gtf file
  3. Created cuffcmp.combined.gtf using this command: cuffcompare -s /ccmb/CoreBA/BioinfCore/Common/DATA/Cufflinks_Data/Arabidopsis/Arabidopsis_thaliana/NCBI/build9.1/Sequence/WholeGenomeFasta/genome.fa -CG -r ../arabidopsis_genes.gtf ../arabidopsis_genes.gtf
  4. Used the resulting cuffcmp.combined.gtf as the reference.gtf

I also ran cufflinks using the ncbi genes.gtf (instead of using the cuffcmp.combined.gtf), but I still got no FPKM calculations in the result file.

I also ran cuffdiff with the cuffcmp.combined.gtf, and here again I did not get any FPKM values, and hence, I am getting a NOTEST.

Would greatly appreciate your help in figuring out what the problem is.

Thanks in advance,

Ash
Old 10-10-2011, 10:54 AM   #2
unidodo
Junior Member
 
Location: USA

Join Date: Jul 2010
Posts: 8
Default

Please make sure your gtf file is compatible with the genome you aligned to. For example, check that the chromosome names are the same in both.
Old 10-10-2011, 11:19 AM   #3
ngsbee
Junior Member
 
Location: Michigan

Join Date: Aug 2011
Posts: 3
Default

Thanks.

The problem was that the .gtf had just the chromosome numbers (e.g. 1, 2, etc.), while my tophat output files had the chromosome names as Chr1, Chr2, etc.

Once I modified the gtf so that the chromosome names matched, cufflinks ran fine.
Old 10-14-2011, 03:05 PM   #4
taozuo
Junior Member
 
Location: ames, IA

Join Date: Jul 2011
Posts: 8
Default

Hi Ash:

I am running into the same problem you did and I have not found a solution yet.

Could you post some examples to explain how you fixed it? I would like to know what the original chromosome names in your .gtf looked like and what they looked like in the tophat output. And finally, how did you modify the file?

Thanks
Tao
Old 10-25-2011, 07:42 AM   #5
lmf_bill
Member
 
Location: New Haven

Join Date: Jul 2008
Posts: 36
Default

As ngsbee mentioned:

You must make the chromosome names consistent. Sometimes 1, 2, 3, ..., 22, X, Y, M are used as chromosome names, and sometimes chr1, chr2, chr3, ..., chrM are used. The names must be the same in your gtf file and your mapping file. BTW, you also need to check chrM, because it is sometimes called MT.
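
To make the two naming styles concrete, here is a rough sketch in Python of converting the plain style (1, 2, X, MT) to the chr-prefixed style (chr1, chr2, chrX, chrM). The function name to_ucsc_name is made up for this example, and it assumes the mitochondrion is the only name that needs special handling, which may not be true for your genome.

```python
def to_ucsc_name(name):
    """Convert a plain chromosome name (1, X, MT) to the chr-prefixed style."""
    if name.startswith("chr"):
        return name  # already in the chr-prefixed style
    if name in ("MT", "M"):
        return "chrM"  # assumption: the mitochondrion is the only special case
    return "chr" + name


print([to_ucsc_name(n) for n in ["1", "22", "X", "MT", "chr3"]])
# ['chr1', 'chr22', 'chrX', 'chrM', 'chr3']
```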
Old 10-25-2011, 08:39 AM   #6
ngsbee
Junior Member
 
Location: Michigan

Join Date: Aug 2011
Posts: 3
Default

Sorry for the late reply.
Tao, like lmf_bill said, you must check the chromosome names in your tophat output file (accepted_hits.bam) and in your reference gtf. If they are not the same, you have to modify one of the files (I prefer to modify the gtf) so that the names match.
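
One way to do this check is sketched below in Python. It assumes you have the BAM header as text (for example from `samtools view -H accepted_hits.bam`) and the gtf as text. The function names and the toy header and gtf strings are made up for illustration, not taken from real files.

```python
def bam_header_names(sam_header_text):
    """Collect reference sequence names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_names(gtf_text):
    """Collect the names in the first (seqname) column of a GTF file."""
    names = set()
    for line in gtf_text.splitlines():
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t")[0])
    return names


# Toy inputs (illustrative values only).
header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:Chr1\tLN:1000\n@SQ\tSN:Chr2\tLN:1000\n"
gtf = '1\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "geneA";\n'

shared = bam_header_names(header) & gtf_names(gtf)
print(sorted(shared))  # an empty list means no name matches between the two files
```

If the shared set is empty, or is missing chromosomes you expect, the two files use different naming and one of them needs to be reformatted.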

For example, if your tophat output has chromosome names in the format Chr1, Chr2, ..., ChrMt, ChrUn and your gtf file has them in the format 1, 2, ..., Mt, Un, then modify your gtf so that its names become Chr1, Chr2, ..., ChrMt, ChrUn.
This will ensure that cufflinks/cuffdiff runs properly.
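
A minimal Python sketch of that modification, assuming the only difference is a missing "Chr" prefix in the first column of the gtf. The function name add_prefix_to_gtf and the file names in the usage note are hypothetical.

```python
def add_prefix_to_gtf(lines, prefix="Chr"):
    """Yield GTF lines with `prefix` added to the first (seqname) column."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line  # keep comment and blank lines unchanged
            continue
        seqname, tab, rest = line.partition("\t")
        if not seqname.startswith(prefix):
            seqname = prefix + seqname
        yield seqname + tab + rest


example = [
    '1\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "geneA";\n',
    'Mt\tprotein_coding\texon\t10\t90\t.\t-\t.\tgene_id "geneB";\n',
]
for fixed in add_prefix_to_gtf(example):
    print(fixed, end="")
```

To rewrite a real file, pass an open file handle instead of the list, e.g. open a hypothetical genes.gtf for reading and write the yielded lines to a new file with writelines. Check the result against your BAM header afterwards, since a simple prefix will not fix every naming difference.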

Another issue I faced was with colons (":") in the chromosome name.
With rice data, I formatted the gtf files to match the tophat bam files. The tophat bam files had chromosome names in the format EG:1, EG:2, etc., so I made the gtf names the same. But I still wasn't getting any results.
Thanks to this post on seqanswers: http://seqanswers.com/forums/showthr...ghlight=colons, I was able to figure it out.

I have summarized below the issues I faced and how to fix them :
-------
(1) If you are creating a Bowtie build from scratch:
* Please check the chromosome format in your fasta files
* IMPORTANT!! IF YOUR FASTA FILE HAS COLONS (:) IN IT (eg: rice ensembl fasta: >EG:1) YOU MUST REMOVE THE COLONS FROM YOUR FASTA FILE. CUFFDIFF WON'T RUN IF COLONS ARE PRESENT IN THE CHR NAME!!
* Compare the chr format in the fasta file to the reference gtf file. If they are already in the same format (eg: Chr1), no reformatting is needed.
* Otherwise, format the fasta file to match the reference gtf format. Once the chromosome formats in the fasta and your gtf are the same, you can proceed to create your build.
* Run bowtie-build.
* Making sure that the chromosome formats are uniform is a vital step to ensure that your accepted_hits.bam (tophat output) and reference gtf (required for running cuffdiff) are compatible. Only if they are compatible will you get cuffdiff results.
* Note: Currently a chr format mismatch fails silently. Cuffdiff neither handles the mismatch nor generates an error or warning.
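
The colon removal in the list above could be sketched in Python like this. It strips every colon from each header line (the lines starting with ">") and leaves sequence lines alone. The function name clean_fasta_headers is made up for this example. Remember to apply the same renaming to the gtf so the names still match.

```python
def clean_fasta_headers(lines, replacement=""):
    """Yield FASTA lines with colons removed from the header lines."""
    for line in lines:
        if line.startswith(">"):
            yield line.replace(":", replacement)  # header line: drop the colons
        else:
            yield line  # sequence line: leave untouched


example = [">EG:1\n", "ACGTACGT\n", ">EG:2\n", "TTGACA\n"]
print("".join(clean_fasta_headers(example)), end="")
```

This has to be done before running bowtie-build, since the names in accepted_hits.bam come from the index.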

(2) If you already have a stable bowtie build (downloaded from bowtie website) and you have used it to run tophat:
* Check the chr format in your reference gtf file, and format your gtf to match the chr format in the accepted_hits.bam file.
* This will ensure that your accepted_hits.bam and reference gtf are compatible, so you will be able to run cuffdiff without any issues.

(3) GFF3 issues.
* DO NOT USE GFF3 FORMAT TO CREATE reference GTFs to run cuffdiff
* If you use gff3, CUFFCOMPARE program truncates the long string from the gene annotation column, and gene IDs are lost. Hence, when you run cuffdiff, your output file won't have gene IDs.

(4) When you are working with "sequencing-in-progress" data:
* IT IS BEST TO USE A STABLE VERSION of GTFs and Fastas AVAILABLE FROM REFERENCE DATABASES (eg: Ensembl) instead of getting data from independent genome sequencing groups. More formatting issues are associated with these files, and formats might change between versions.
Tags
cuffcompare, cuffdiff, gtf, rna-seq;gene expression;