SEQanswers

10-12-2011, 02:43 AM #1
mattia
Member
 
Location: Milano

Join Date: Aug 2011
Posts: 30
Cufflinks and annotation file

Hi,
I'm using Cufflinks and I have a big problem when I use the annotation file.
In particular, if:

1) my only input is the BAM file (cufflinks_0.9.3.Linux_x86_64 -v -p 1 -Q 0 -I 300000 --library-type fr-unstranded --num-importance-samples 1000 --max-mle-iterations 5000 -a 0.01 -j 0.05 -F 0.05 --min-frags-per-transfrag 10 "accepted_hits.sorted.bam"), my outputs are genes.expr, transcripts.expr and transcripts.gtf, as shown below:

genes.expr

gene_id bundle_id chr left right FPKM FPKM_conf_lo FPKM_conf_hi status
CUFF.329 35842 chr1 11968237 11968315 923.039 862.276 983.802 OK
CUFF.333 35844 chr1 11969913 11969985 1791.31 1706.66 1875.96 OK
CUFF.631 35993 chr1 22973756 22974004 287.909 253.973 321.845 OK
CUFF.661 36008 chr1 23696746 23696781 30806.2 30455.2 31157.3 OK
CUFF.807 36081 chr1 28160911 28160947 87284.4 86693.5 87875.3 OK
CUFF.835 36095 chr1 28833876 28834087 1740.4 1656.96 1823.83 OK

transcripts.expr

trans_id bundle_id chr left right FPKM FMI frac FPKM_conf_lo FPKM_conf_hi coverage length effective_length status
CUFF.329.1 35842 chr1 11968237 11968315 923.039 1 1 862.276 983.802 6.29217 78 44 OK
CUFF.333.1 35844 chr1 11969913 11969985 1791.31 1 1 1706.66 1875.96 12.211 72 38 OK
CUFF.631.1 35993 chr1 22973756 22974004 287.909 1 1 253.973 321.845 1.96262 248 214 OK
CUFF.661.1 36008 chr1 23696746 23696781 30806.2 1 1 30455.2 31157.3 210 35 1 OK
CUFF.807.1 36081 chr1 28160911 28160947 87284.4 1 1 86693.5 87875.3 595 36 2 OK

transcripts.gtf

chr1 Cufflinks transcript 11968238 11968315 1000 . . gene_id "CUFF.329"; transcript_id "CUFF.329.1"; FPKM "923.0388234431"; frac "1.000000"; conf_lo "862.275715"; conf_hi "983.801931"; cov "6.292170";
chr1 Cufflinks exon 11968238 11968315 1000 . . gene_id "CUFF.329"; transcript_id "CUFF.329.1"; exon_number "1"; FPKM "923.0388234431"; frac "1.000000"; conf_lo "862.275715"; conf_hi "983.801931"; cov "6.292170";
chr1 Cufflinks transcript 11969914 11969985 1000 . . gene_id "CUFF.333"; transcript_id "CUFF.333.1"; FPKM "1791.3078160611"; frac "1.000000"; conf_lo "1706.660127"; conf_hi "1875.955505"; cov "12.210985";
chr1 Cufflinks exon 11969914 11969985 1000 . . gene_id "CUFF.333"; transcript_id "CUFF.333.1"; exon_number "1"; FPKM "1791.3078160611"; frac "1.000000"; conf_lo "1706.660127"; conf_hi "1875.955505"; cov "12.210985";

2) my inputs are the BAM file and the annotation file (cufflinks_0.9.3.Linux_x86_64 -v --GTF "genes.gtf" -p 1 -Q 0 -I 300000 --library-type fr-unstranded --num-importance-samples 1000 --max-mle-iterations 5000 -a 0.01 -j 0.05 -F 0.05 --min-frags-per-transfrag 10 "accepted_hits.sorted.bam"), my outputs are genes.expr, transcripts.expr and transcripts.gtf, as shown below:

genes.expr

gene_id bundle_id chr left right FPKM FPKM_conf_lo FPKM_conf_hi status
ENSG00000253101 32866 1 11868 14409 0 0 0 OK
ENSG00000223972 32866 1 12009 13670 0 0 0 OK
ENSG00000243485 32866 1 29553 31109 0 0 0 OK
ENSG00000221311 32866 1 30365 30503 0 0 0 OK

transcripts.expr

trans_id bundle_id chr left right FPKM FMI frac FPKM_conf_lo FPKM_conf_hi coverage length effective_length status
ENST00000518655 32866 1 11868 14409 0 0 0 0 0 0 1657 1657 OK
ENST00000450305 32866 1 12009 13670 0 0 0 0 0 0 632 632 OK
ENST00000473358 32866 1 29553 31097 0 0 0 0 0 0 712 712 OK
ENST00000469289 32866 1 30266 31109 0 0 0 0 0 0 535 535 OK
ENST00000408384 32866 1 30365 30503 0 0 0 0 0 0 138 138 OK

transcripts.gtf

1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000253101"; transcript_id "ENST00000518655"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000253101"; transcript_id "ENST00000518655"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000253101"; transcript_id "ENST00000518655"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000253101"; transcript_id "ENST00000518655"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12010 12057 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12179 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

You can see the differences (the gene and transcript IDs are expected to differ), especially in FPKM, FMI, frac, FPKM_conf_lo, FPKM_conf_hi and coverage, which are all 0 when I use the annotation file.

Any suggestion?
Thanks a lot!
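To compare the two runs beyond the first few lines, a quick summary of each transcripts.expr helps: how many rows have a nonzero FPKM, and which chromosome names appear. This is a minimal sketch; the helper name summarize_expr and the demo file are hypothetical, and the column positions (3 = chr, 6 = FPKM) are taken from the headers quoted above. awk's default field splitting handles both spaces and tabs.

```shell
#!/usr/bin/env bash
# Sketch: summarize a Cufflinks 0.9.x transcripts.expr (or genes.expr) file.
set -euo pipefail

# Hypothetical helper. Per the headers in this thread, column 3 is chr and
# column 6 is FPKM in both genes.expr and transcripts.expr.
summarize_expr() {
  awk 'NR > 1 { total++; if ($6 + 0 > 0) nonzero++ }
       END { printf "rows=%d fpkm_gt_0=%d\n", total + 0, nonzero + 0 }' "$1"
  # Distinct chromosome names, space-separated, in C-locale sort order.
  printf 'chromosome names: %s\n' \
    "$(awk 'NR > 1 { print $3 }' "$1" | LC_ALL=C sort -u | paste -s -d ' ' -)"
}

# Demo on rows copied from the annotated run above.
demo_dir=$(mktemp -d)
printf 'trans_id\tbundle_id\tchr\tleft\tright\tFPKM\tFMI\tfrac\tFPKM_conf_lo\tFPKM_conf_hi\tcoverage\tlength\teffective_length\tstatus\nENST00000518655\t32866\t1\t11868\t14409\t0\t0\t0\t0\t0\t0\t1657\t1657\tOK\nENST00000450305\t32866\t1\t12009\t13670\t0\t0\t0\t0\t0\t0\t632\t632\tOK\n' > "$demo_dir/transcripts.expr"
summarize_expr "$demo_dir/transcripts.expr"
```

Run on both outputs, it makes the contrast visible at a glance: the chromosome names differ (chr1 in the first run, 1 in the second) as well as the number of nonzero FPKM values.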
10-12-2011, 10:42 AM #2
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279

What exactly is your problem?
10-13-2011, 06:04 AM #3
Emilie
Member
 
Location: Toronto

Join Date: Nov 2010
Posts: 21

Hi Mattia,

It looks like you are using different chromosome names for the genome and for the genome annotation. The bowtie index you are using in TopHat has a "chr" prefix on the chromosome names (UCSC?), and the GTF file from Ensembl doesn't. That may be why the FPKM values are not calculated correctly. You can try using FASTA and GTF files from the same database (Ensembl, for example), or changing the chromosome names in the GTF file or in the initial FASTA file before building the index.

Emilie
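The chromosome-name mismatch described above can be checked directly by comparing the reference names in the alignment header with column 1 of the GTF. A minimal sketch, assuming the BAM header has first been dumped to text (for example with `samtools view -H accepted_hits.sorted.bam > header.sam`); the helper name compare_seqnames and the demo files are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list sequence names that differ between a SAM header and a GTF file.
set -euo pipefail

# Hypothetical helper. $1: SAM header text (from samtools view -H), $2: GTF file.
compare_seqnames() {
  local header_sam=$1 gtf=$2 bam_names gtf_names
  bam_names=$(mktemp)
  gtf_names=$(mktemp)
  # Reference names are the SN: fields of the @SQ header lines.
  awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' \
    "$header_sam" | LC_ALL=C sort -u > "$bam_names"
  # Sequence names are column 1 of every non-comment GTF line.
  awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' "$gtf" | LC_ALL=C sort -u > "$gtf_names"
  echo "In GTF only:"
  LC_ALL=C comm -13 "$bam_names" "$gtf_names"
  echo "In BAM only:"
  LC_ALL=C comm -23 "$bam_names" "$gtf_names"
  rm -f "$bam_names" "$gtf_names"
}

# Demo: a UCSC-style header versus an Ensembl-style GTF line from this thread.
demo_dir=$(mktemp -d)
printf '@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrM\tLN:1000\n' > "$demo_dir/header.sam"
printf '1\tdemo\texon\t12010\t12057\t.\t+\t.\tgene_id "ENSG00000223972";\n' > "$demo_dir/genes.gtf"
compare_seqnames "$demo_dir/header.sam" "$demo_dir/genes.gtf"
```

If "In GTF only" is non-empty for the real files, Cufflinks finds no reads on those annotated transcripts, which would be consistent with the all-zero FPKM values in the annotated run.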
10-13-2011, 06:26 AM #4
mattia
Member
 
Location: Milano

Join Date: Aug 2011
Posts: 30

Thanks, Emilie. Tomorrow I'll download the GTF file and the color-space bowtie index from the same source (maybe from http://cufflinks.cbcb.umd.edu/igenomes.html, or would you suggest another source?).
10-14-2011, 04:22 AM #5
mattia
Member
 
Location: Milano

Join Date: Aug 2011
Posts: 30

I downloaded Homo_Sapiens_Ensembl_GRCh37.tar.gz from http://cufflinks.cbcb.umd.edu/igenomes.html; this archive contains the reference (FASTA) and a GTF file. Even using these, I have the same problem. (For this test I used reads in FASTQ format, not in color space.)

Last edited by mattia; 10-14-2011 at 04:36 AM.
10-14-2011, 04:41 AM #6
Thomas Doktor
Senior Member
 
Location: University of Southern Denmark (SDU), Denmark

Join Date: Apr 2009
Posts: 105

Did you check whether the GTF file in Homo_Sapiens_Ensembl_GRCh37.tar.gz uses "chr1, chr2, ..." identifiers rather than "1, 2, ..."?

You can convert it this way:
Code:
awk '{print "chr"$0}' genes.gtf | sed 's/^chrMT/chrM/' > genes.cufflinks.gtf
Note that this will not convert the names of the other chromosomes (the "random" ones) correctly.
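A somewhat more defensive variant of the one-liner above, offered as a sketch: it splits on tabs so only the first (seqname) column is touched, passes comment lines through unchanged, maps MT to chrM, and leaves names that already start with "chr" alone. It assumes a tab-separated GTF and, like the original, does not translate the unplaced/"random" contig names to UCSC names. The helper name add_chr_prefix and the demo file are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: add a UCSC-style "chr" prefix to column 1 of an Ensembl-style GTF.
# Assumes a tab-separated GTF; unplaced/"random" contigs are NOT renamed to UCSC names.
set -euo pipefail

# Hypothetical helper: writes the converted GTF to stdout.
add_chr_prefix() {
  # -F'\t' and OFS='\t' keep the attribute column (which contains spaces) intact.
  awk -F'\t' -v OFS='\t' '
    /^#/ { print; next }                          # comment lines pass through unchanged
    $1 !~ /^chr/ {                                # leave names that already use chr alone
      $1 = ($1 == "MT") ? "chrM" : ("chr" $1)     # 1 -> chr1, X -> chrX, MT -> chrM
    }
    { print }' "$1"
}

# Demo on a tiny hypothetical GTF (first data row copied from this thread).
demo_dir=$(mktemp -d)
printf '#!genome-build GRCh37\n1\tdemo\texon\t12010\t12057\t.\t+\t.\tgene_id "ENSG00000223972"; transcript_id "ENST00000450305";\nMT\tdemo\texon\t1\t100\t.\t+\t.\tgene_id "demo_gene";\n' > "$demo_dir/genes.gtf"
add_chr_prefix "$demo_dir/genes.gtf" > "$demo_dir/genes.cufflinks.gtf"
cat "$demo_dir/genes.cufflinks.gtf"
```

Because only column 1 is rewritten, a "chrMT" string elsewhere on a line (unlikely, but possible in attributes) is never touched.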
10-14-2011, 05:34 AM #7
Emilie
Member
 
Location: Toronto

Join Date: Nov 2010
Posts: 21

Hi Mattia,

Before running Cufflinks, did you re-run TopHat with the new bowtie index you downloaded? TopHat and Cufflinks need to use the same chromosome names.

Emilie
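The advice to re-run TopHat can be made concrete by deriving both the bowtie index and the GTF from the same bundle root, so that TopHat and Cufflinks see the same chromosome names. This is only a sketch: the iGenomes-style layout (Sequence/BowtieIndex/genome, Annotation/Genes/genes.gtf), the helper names and all paths are assumptions, and it just prints the commands unless DRY_RUN=0. It relies on TopHat's default output directory tophat_out (check the BAM file name your TopHat version writes), and color-space reads would need additional TopHat options not shown here.

```shell
#!/usr/bin/env bash
# Sketch: re-run TopHat and Cufflinks with index and annotation from ONE bundle.
set -euo pipefail

# Print each command; only execute it when DRY_RUN=0 (default is a dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Hypothetical helper. $1: bundle root (assumed iGenomes-style layout), $2: reads file.
rerun_pipeline() {
  local root=$1 reads=$2
  local index="$root/Sequence/BowtieIndex/genome"
  local gtf="$root/Annotation/Genes/genes.gtf"
  # TopHat writes its output to ./tophat_out by default.
  run tophat -p 1 "$index" "$reads"
  # Same --GTF option as in the original Cufflinks command.
  run cufflinks -p 1 --GTF "$gtf" tophat_out/accepted_hits.bam
}

rerun_pipeline /path/to/Homo_sapiens/Ensembl/GRCh37 reads.fastq
```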
All times are GMT -8.