#1 |
Member
Location: Milano Join Date: Aug 2011
Posts: 30
|
Hi,
I'm using Cufflinks and I have a big problem when I use the annotation file. In particular:

1) When my input is only the BAM file:

cufflinks_0.9.3.Linux_x86_64 -v -p 1 -Q 0 -I 300000 --library-type fr-unstranded --num-importance-samples 1000 --max-mle-iterations 5000 -a 0.01 -j 0.05 -F 0.05 --min-frags-per-transfrag 10 "accepted_hits.sorted.bam"

my outputs are genes.expr, transcripts.expr and transcripts.gtf, as shown below.

genes.expr

gene_id | bundle_id | chr | left | right | FPKM | FPKM_conf_lo | FPKM_conf_hi | status |
CUFF.329 | 35842 | chr1 | 11968237 | 11968315 | 923.039 | 862.276 | 983.802 | OK |
CUFF.333 | 35844 | chr1 | 11969913 | 11969985 | 1791.31 | 1706.66 | 1875.96 | OK |
CUFF.631 | 35993 | chr1 | 22973756 | 22974004 | 287.909 | 253.973 | 321.845 | OK |
CUFF.661 | 36008 | chr1 | 23696746 | 23696781 | 30806.2 | 30455.2 | 31157.3 | OK |
CUFF.807 | 36081 | chr1 | 28160911 | 28160947 | 87284.4 | 86693.5 | 87875.3 | OK |
CUFF.835 | 36095 | chr1 | 28833876 | 28834087 | 1740.4 | 1656.96 | 1823.83 | OK |

transcripts.expr

trans_id | bundle_id | chr | left | right | FPKM | FMI | frac | FPKM_conf_lo | FPKM_conf_hi | coverage | length | effective_length | status |
CUFF.329.1 | 35842 | chr1 | 11968237 | 11968315 | 923.039 | 1 | 1 | 862.276 | 983.802 | 6.29217 | 78 | 44 | OK |
CUFF.333.1 | 35844 | chr1 | 11969913 | 11969985 | 1791.31 | 1 | 1 | 1706.66 | 1875.96 | 12.211 | 72 | 38 | OK |
CUFF.631.1 | 35993 | chr1 | 22973756 | 22974004 | 287.909 | 1 | 1 | 253.973 | 321.845 | 1.96262 | 248 | 214 | OK |
CUFF.661.1 | 36008 | chr1 | 23696746 | 23696781 | 30806.2 | 1 | 1 | 30455.2 | 31157.3 | 210 | 35 | 1 | OK |
CUFF.807.1 | 36081 | chr1 | 28160911 | 28160947 | 87284.4 | 1 | 1 | 86693.5 | 87875.3 | 595 | 36 | 2 | OK |

transcripts.gtf

chr1 Cufflinks transcript 11968238 11968315 1000 . . gene_id "CUFF.329"; transcript_id "CUFF.329.1"; FPKM "923.0388234431"; frac "1.000000"; conf_lo "862.275715"; conf_hi "983.801931"; cov "6.292170";
chr1 Cufflinks exon 11968238 11968315 1000 . . gene_id "CUFF.329"; transcript_id "CUFF.329.1"; exon_number "1"; FPKM "923.0388234431"; frac "1.000000"; conf_lo "862.275715"; conf_hi "983.801931"; cov "6.292170";
chr1 Cufflinks transcript 11969914 11969985 1000 . . gene_id "CUFF.333"; transcript_id "CUFF.333.1"; FPKM "1791.3078160611"; frac "1.000000"; conf_lo "1706.660127"; conf_hi "1875.955505"; cov "12.210985";
chr1 Cufflinks exon 11969914 11969985 1000 . . gene_id "CUFF.333"; transcript_id "CUFF.333.1"; exon_number "1"; FPKM "1791.3078160611"; frac "1.000000"; conf_lo "1706.660127"; conf_hi "1875.955505"; cov "12.210985";

2) When my inputs are the BAM file and the annotation file:

cufflinks_0.9.3.Linux_x86_64 -v --GTF "genes.gtf" -p 1 -Q 0 -I 300000 --library-type fr-unstranded --num-importance-samples 1000 --max-mle-iterations 5000 -a 0.01 -j 0.05 -F 0.05 --min-frags-per-transfrag 10 "accepted_hits.sorted.bam"

my outputs are genes.expr, transcripts.expr and transcripts.gtf, as shown below.

genes.expr

gene_id | bundle_id | chr | left | right | FPKM | FPKM_conf_lo | FPKM_conf_hi | status |
ENSG00000253101 | 32866 | 1 | 11868 | 14409 | 0 | 0 | 0 | OK |
ENSG00000223972 | 32866 | 1 | 12009 | 13670 | 0 | 0 | 0 | OK |
ENSG00000243485 | 32866 | 1 | 29553 | 31109 | 0 | 0 | 0 | OK |
ENSG00000221311 | 32866 | 1 | 30365 | 30503 | 0 | 0 | 0 | OK |

transcripts.expr

trans_id | bundle_id | chr | left | right | FPKM | FMI | frac | FPKM_conf_lo | FPKM_conf_hi | coverage | length | effective_length | status |
ENST00000518655 | 32866 | 1 | 11868 | 14409 | 0 | 0 | 0 | 0 | 0 | 0 | 1657 | 1657 | OK |
ENST00000450305 | 32866 | 1 | 12009 | 13670 | 0 | 0 | 0 | 0 | 0 | 0 | 632 | 632 | OK |
ENST00000473358 | 32866 | 1 | 29553 | 31097 | 0 | 0 | 0 | 0 | 0 | 0 | 712 | 712 | OK |
ENST00000469289 | 32866 | 1 | 30266 | 31109 | 0 | 0 | 0 | 0 | 0 | 0 | 535 | 535 | OK |
ENST00000408384 | 32866 | 1 | 30365 | 30503 | 0 | 0 | 0 | 0 | 0 | 0 | 138 | 138 | OK |

transcripts.gtf

1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000253101"; transcript_id "ENST00000518655"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000253101"; transcript_id "ENST00000518655"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000253101"; transcript_id "ENST00000518655"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000253101"; transcript_id "ENST00000518655"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12010 12057 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12179 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

You can see the difference (the change in gene and transcript IDs is expected, but with the annotation file every FPKM is 0).
Any suggestions? Thanks a lot! |
#2 |
Senior Member
Location: Phoenix, AZ Join Date: Mar 2010
Posts: 279
|
What exactly is your problem?
|
#3 |
Member
Location: Toronto Join Date: Nov 2010
Posts: 21
|
Hi Mattia,
It looks like you are using different chromosome names for the genome and for the genome annotation. The Bowtie index you are using in TopHat has a "chr" prefix on the chromosome names (UCSC?), and the GTF file from Ensembl doesn't. That may be why the FPKM values are not calculated correctly. You can try using FASTA and GTF files from the same database (Ensembl, for example), or changing the chromosome names either in the GTF file or in the initial FASTA file before building the index.
Emilie |
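The mismatch described above can be checked directly by comparing the reference names in the alignment header with the seqnames in the GTF. This is only a rough sketch: it assumes the header has first been dumped to a text file (for example with `samtools view -H accepted_hits.sorted.bam > header.sam`), and the function names and `header.sam` are made up for illustration.

```shell
#!/bin/sh
# Print the reference names (the SN: field of each @SQ line)
# from SAM header text read on stdin.
sam_header_seqnames() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }' | LC_ALL=C sort -u
}

# Print the distinct seqnames (column 1) of a GTF file,
# skipping any '#' comment lines.
gtf_seqnames() {
    grep -v '^#' "$1" | cut -f1 | LC_ALL=C sort -u
}

# Print the names that occur in the GTF but not in the header.
# $1 = text file holding the SAM header (hypothetical: header.sam)
# $2 = GTF file
gtf_names_missing_from_header() {
    hdr=$(mktemp) && gtf=$(mktemp) || return 1
    sam_header_seqnames < "$1" > "$hdr"
    gtf_seqnames "$2" > "$gtf"
    # comm -13 keeps only the lines unique to the second file (the GTF).
    LC_ALL=C comm -13 "$hdr" "$gtf"
    rm -f "$hdr" "$gtf"
}
```

Running `gtf_names_missing_from_header header.sam genes.gtf` prints nothing when every GTF seqname also appears in the header; a list such as `1`, `2`, `MT` would point to the prefix problem.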
#4 |
Member
Location: Milano Join Date: Aug 2011
Posts: 30
|
Thanks Emilie. Tomorrow I'll download the GTF and the Bowtie index for color space from the same source (maybe from http://cufflinks.cbcb.umd.edu/igenomes.html, or would you suggest another one?).
|
#5 |
Member
Location: Milano Join Date: Aug 2011
Posts: 30
|
I downloaded Homo_Sapiens_Ensembl_GRCh37.tar.gz from http://cufflinks.cbcb.umd.edu/igenomes.html; this archive contains both the reference (FASTA) and the GTF file. Even using these, I have the same problem. (For this test I used reads in FASTQ format, not in color space.)
Last edited by mattia; 10-14-2011 at 05:36 AM. |
#6 |
Senior Member
Location: University of Southern Denmark (SDU), Denmark Join Date: Apr 2009
Posts: 105
|
Did you check that the Homo_Sapiens_Ensembl_GRCh37.tar.gz contained a gtf file with "chr1,chr2,.." type identifiers instead of the "1,2,..." type?
You can convert it this way: Code:
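# Prefix seqnames with "chr" (MT becomes chrM); '#' header lines are left untouched.
awk 'BEGIN { FS = OFS = "\t" } /^#/ { print; next } { $1 = ($1 == "MT") ? "chrM" : "chr" $1; print }' genes.gtf > genes.cufflinks.gtf
|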
#7 |
Member
Location: Toronto Join Date: Nov 2010
Posts: 21
|
Hi Mattia
Did you re-run TopHat with the new bowtie index that you have downloaded before running Cufflinks? TopHat and Cufflinks need to be run using the same chromosome names. Emilie |
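To make the ordering concrete, here is a hedged sketch of re-aligning first and only then quantifying, so that the BAM and the GTF share one naming scheme. The `run` and `align_then_quantify` helpers and all paths are hypothetical, and the options are limited to `-p` and `-o` for TopHat and `-p` and `--GTF` for Cufflinks; color-space reads would need additional TopHat options that are not shown.

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = Bowtie index prefix
# $2 = GTF built for the same genome release as the index
# $3 = reads in FASTQ format
align_then_quantify() {
    # Align against the index whose chromosome names match the GTF.
    run tophat -p 1 -o tophat_out "$1" "$3" || return 1
    # Quantify the fresh alignments, not a BAM left over from an older index.
    run cufflinks -p 1 --GTF "$2" tophat_out/accepted_hits.bam
}
```

With `DRY_RUN=1` set, calling `align_then_quantify path/to/index genes.gtf reads.fastq` only prints the two commands, which is a cheap way to confirm that the index and the GTF come from the same download before starting a long run.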
Tags |
annotation, cufflinks, problem |