Hi,
I'm using Cufflinks and I have a serious problem when I use the annotation file.
In particular:
1) When my input is only the BAM file (cufflinks_0.9.3.Linux_x86_64 -v -p 1 -Q 0 -I 300000 --library-type fr-unstranded --num-importance-samples 1000 --max-mle-iterations 5000 -a 0.01 -j 0.05 -F 0.05 --min-frags-per-transfrag 10 "accepted_hits.sorted.bam"), the outputs genes.expr, transcripts.expr and transcripts.gtf look like this:
genes.expr
gene_id bundle_id chr left right FPKM FPKM_conf_lo FPKM_conf_hi status
CUFF.329 35842 chr1 11968237 11968315 923.039 862.276 983.802 OK
CUFF.333 35844 chr1 11969913 11969985 1791.31 1706.66 1875.96 OK
CUFF.631 35993 chr1 22973756 22974004 287.909 253.973 321.845 OK
CUFF.661 36008 chr1 23696746 23696781 30806.2 30455.2 31157.3 OK
CUFF.807 36081 chr1 28160911 28160947 87284.4 86693.5 87875.3 OK
CUFF.835 36095 chr1 28833876 28834087 1740.4 1656.96 1823.83 OK
transcripts.expr
trans_id bundle_id chr left right FPKM FMI frac FPKM_conf_lo FPKM_conf_hi coverage length effective_length status
CUFF.329.1 35842 chr1 11968237 11968315 923.039 1 1 862.276 983.802 6.29217 78 44 OK
CUFF.333.1 35844 chr1 11969913 11969985 1791.31 1 1 1706.66 1875.96 12.211 72 38 OK
CUFF.631.1 35993 chr1 22973756 22974004 287.909 1 1 253.973 321.845 1.96262 248 214 OK
CUFF.661.1 36008 chr1 23696746 23696781 30806.2 1 1 30455.2 31157.3 210 35 1 OK
CUFF.807.1 36081 chr1 28160911 28160947 87284.4 1 1 86693.5 87875.3 595 36 2 OK
transcripts.gtf
chr1 Cufflinks transcript 11968238 11968315 1000 . . gene_id "CUFF.329"; transcript_id "CUFF.329.1"; FPKM "923.0388234431"; frac "1.000000"; conf_lo "862.275715"; conf_hi "983.801931"; cov "6.292170";
chr1 Cufflinks exon 11968238 11968315 1000 . . gene_id "CUFF.329"; transcript_id "CUFF.329.1"; exon_number "1"; FPKM "923.0388234431"; frac "1.000000"; conf_lo "862.275715"; conf_hi "983.801931"; cov "6.292170";
chr1 Cufflinks transcript 11969914 11969985 1000 . . gene_id "CUFF.333"; transcript_id "CUFF.333.1"; FPKM "1791.3078160611"; frac "1.000000"; conf_lo "1706.660127"; conf_hi "1875.955505"; cov "12.210985";
chr1 Cufflinks exon 11969914 11969985 1000 . . gene_id "CUFF.333"; transcript_id "CUFF.333.1"; exon_number "1"; FPKM "1791.3078160611"; frac "1.000000"; conf_lo "1706.660127"; conf_hi "1875.955505"; cov "12.210985";
2) When my inputs are the BAM file and the annotation file (cufflinks_0.9.3.Linux_x86_64 -v --GTF "genes.gtf" -p 1 -Q 0 -I 300000 --library-type fr-unstranded --num-importance-samples 1000 --max-mle-iterations 5000 -a 0.01 -j 0.05 -F 0.05 --min-frags-per-transfrag 10 "accepted_hits.sorted.bam"), the outputs genes.expr, transcripts.expr and transcripts.gtf look like this:
genes.expr
gene_id bundle_id chr left right FPKM FPKM_conf_lo FPKM_conf_hi status
ENSG00000253101 32866 1 11868 14409 0 0 0 OK
ENSG00000223972 32866 1 12009 13670 0 0 0 OK
ENSG00000243485 32866 1 29553 31109 0 0 0 OK
ENSG00000221311 32866 1 30365 30503 0 0 0 OK
transcripts.expr
trans_id bundle_id chr left right FPKM FMI frac FPKM_conf_lo FPKM_conf_hi coverage length effective_length status
ENST00000518655 32866 1 11868 14409 0 0 0 0 0 0 1657 1657 OK
ENST00000450305 32866 1 12009 13670 0 0 0 0 0 0 632 632 OK
ENST00000473358 32866 1 29553 31097 0 0 0 0 0 0 712 712 OK
ENST00000469289 32866 1 30266 31109 0 0 0 0 0 0 535 535 OK
ENST00000408384 32866 1 30365 30503 0 0 0 0 0 0 138 138 OK
transcripts.gtf
1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000253101"; transcript_id "ENST00000518655"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000253101"; transcript_id "ENST00000518655"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000253101"; transcript_id "ENST00000518655"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000253101"; transcript_id "ENST00000518655"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12010 12057 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12179 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
You can see the difference between the two runs. The different gene and transcript IDs are expected, but with the annotation file FPKM, FMI, frac, FPKM_conf_lo, FPKM_conf_hi and coverage are all 0.
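One more difference visible in the listings above is the chromosome naming: the run without the annotation reports "chr1", while the run with genes.gtf reports "1". I don't know whether this is related to the zero values, but as a minimal sketch (plain Python, the helper names seqnames and naming_mismatch are my own and not part of Cufflinks), the sequence names of two GTF files could be compared like this:

```python
def seqnames(gtf_lines):
    """Return the set of sequence names (column 1) found in GTF lines."""
    names = set()
    for line in gtf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split(None, 1)[0])
    return names


def naming_mismatch(names_a, names_b):
    """Pairs of names that differ only by a leading 'chr' prefix."""
    def strip(name):
        return name[3:] if name.startswith("chr") else name
    return sorted(
        (a, b) for a in names_a for b in names_b
        if a != b and strip(a) == strip(b)
    )


# One line from each transcripts.gtf shown above (attributes shortened).
de_novo = ['chr1\tCufflinks\ttranscript\t11968238\t11968315\t1000\t.\t.\t'
           'gene_id "CUFF.329"; transcript_id "CUFF.329.1";']
annotated = ['1\tCufflinks\ttranscript\t11869\t14409\t1\t+\t.\t'
             'gene_id "ENSG00000253101"; transcript_id "ENST00000518655";']

print(naming_mismatch(seqnames(de_novo), seqnames(annotated)))
# prints [('chr1', '1')]
```

For real files, the lists can be replaced with open file handles, for example seqnames(open("transcripts.gtf")).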
Any suggestions?
Thanks a lot!