This seems to be a common issue, but I think I have met all the criteria that people have said would solve it, and I am still stuck. My Cuffdiff output files contain only XLOC identifiers; there are no gene names or original Ensembl IDs. I provided an Ensembl GTF containing the annotations to Cufflinks with the -g option. The only advice I didn't follow was using -G to prevent novel transcript discovery, because novel transcripts (along with alternative splicing) are the reason I am doing this analysis.
Here is my truncated pipeline:
tophat --solexa1.3-quals --no-coverage-search -g 1 -G /u/home/mcdb/xf/GTF/mm9.ensembl.gtf -p 8 -o ./$k /u/home/mcdb/x/bowtie-0.12.8/indexes/temp/mm9 $k.fastq
cufflinks -p 8 -g /u/home/mcdb/x/GTF/mm9.ensembl.gtf -o /u/home/mcdb/x/y/ output name here
cuffmerge -p 8 -s /u/home/mcdb/x/bowtie-0.12.8/indexes/genome.fa /u/home/mcdb/x/y/assemblies.txt #assemblies has the transcripts.gtf paths from cufflinks in it
cuffdiff -o diff_out -b /u/home/mcdb/x/bowtie-0.12.8/indexes/genome.fa -p 8 -L C,R -u /u/home/mcdb/x/y/merged_asm/merged.gtf sample 1 bam sample 2 bam paths #(not listed here because there were many)
Here's an example from the mm9.ensembl.gtf (sorry for the poor formatting):
chr18 protein_coding exon 3122455 3123465 . - . gene_id "ENSMUSG00000091539"; transcript_id "ENSMUST00000165255"; exon_number "1"; gene_name "Vmn1r238"; gene_biotype "protein_coding"; transcript_name "Vmn1r238-201";
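To make it easier to see which identifiers a GTF actually carries, here is a minimal sketch that pulls gene_id and gene_name out of each attribute string. The function name gtf_ids is hypothetical (not part of the Cufflinks suite), and it assumes the attributes follow the quoted key "value"; layout shown in the line above.

```shell
# Hypothetical helper: read GTF lines on stdin and print
# gene_id <TAB> gene_name, using "-" when an attribute is absent.
gtf_ids() {
  awk '{
    id = "-"; name = "-"
    # gene_id " is 9 characters long; the closing quote adds one more
    if (match($0, /gene_id "[^"]*"/))   id   = substr($0, RSTART + 9,  RLENGTH - 10)
    # gene_name " is 11 characters long; the closing quote adds one more
    if (match($0, /gene_name "[^"]*"/)) name = substr($0, RSTART + 11, RLENGTH - 12)
    print id "\t" name
  }'
}
```

Running something like `gtf_ids < merged.gtf | sort -u | head` on the cuffmerge output, and the same on mm9.ensembl.gtf, should show whether the gene names are still present at that stage or only XLOC IDs remain.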
And here is an example from genes.fpkm_tracking:
tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage C_FPKM C_conf_lo C_conf_hi C_status R_FPKM R_conf_lo R_conf_hi R_status
XLOC_000001 - - XLOC_000001 - TSS1 chr1:3044313-3044814 - - 0.592503 0 1.24438 OK 0.426783 0 0.890284 OK
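To gauge how many rows are affected, a rough check along these lines could count the entries whose gene_short_name is just "-". This is only a sketch: count_unnamed is a hypothetical helper, and it assumes the tracking file is tab-delimited with gene_short_name in the fifth column, as in the header above.

```shell
# Hypothetical helper: count rows in a *.fpkm_tracking file whose
# gene_short_name (column 5) is the placeholder "-".
count_unnamed() {
  awk -F'\t' '
    NR > 1 { total++; if ($5 == "-") unnamed++ }   # skip the header row
    END    { printf "%d of %d genes lack a gene_short_name\n", unnamed, total }
  ' "$1"
}
```

For example, `count_unnamed diff_out/genes.fpkm_tracking` would report the count for the Cuffdiff output directory used in the pipeline above.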
Is there something I have done wrong, or is there a way for me to get Ensembl IDs or gene names into these output files? Thanks in advance.