SEQanswers





12-03-2012, 02:55 PM #1
Xinlitik
Junior Member
 
Location: California

Join Date: Nov 2012
Posts: 7
Output of Cuffdiff has only XLOC

This seems to be a common issue, but I think I have met all the criteria that people have said would solve it, and I am still stuck. My Cuffdiff output files contain only XLOC identifiers; there are no gene names or original Ensembl IDs. I provided an Ensembl GTF containing the annotations with the -g option. The only piece of advice I didn't follow was using -G to prevent novel transcript assembly, because novel transcripts (along with alternative splicing) are the reason I am doing this analysis.

Here is my truncated pipeline:
tophat --solexa1.3-quals --no-coverage-search -g 1 -G /u/home/mcdb/xf/GTF/mm9.ensembl.gtf -p 8 -o ./$k /u/home/mcdb/x/bowtie-0.12.8/indexes/temp/mm9 $k.fastq

cufflinks -p 8 -g /u/home/mcdb/x/GTF/mm9.ensembl.gtf -o /u/home/mcdb/x/y/ output name here

cuffmerge -p 8 -s /u/home/mcdb/x/bowtie-0.12.8/indexes/genome.fa /u/home/mcdb/x/y/assemblies.txt #assemblies has the transcripts.gtf paths from cufflinks in it

cuffdiff -o diff_out -b /u/home/mcdb/x/bowtie-0.12.8/indexes/genome.fa -p 8 -L C,R -u /u/home/mcdb/x/y/merged_asm/merged.gtf sample 1 bam sample 2 bam paths #(not listed here because there were many)
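Cuffdiff labels genes using the GTF it is handed (merged_asm/merged.gtf here), not the GTF given to tophat or cufflinks, so a quick check is whether merged.gtf carries any name attributes at all. The sketch below counts attribute keys per transcript. It assumes, as is commonly reported, that cuffdiff fills gene_short_name from a gene_name attribute in that GTF; note also that cuffmerge accepts a reference annotation through its own -g/--ref-gtf option, which the cuffmerge call above does not use. The function name is illustrative.

```python
import re
from collections import Counter

# GTF attribute pairs look like: gene_id "XLOC_000001"; transcript_id "TCONS_00000001";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def attribute_counts(gtf_lines):
    """Count how many transcripts in a GTF carry each attribute key.

    Assumption: if no transcript has gene_name, cuffdiff has no name to report
    and shows '-' in gene_short_name.
    """
    counts = Counter()
    seen = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid is not None:
            if tid in seen:
                continue  # count each transcript once, not once per exon
            seen.add(tid)
        counts.update(attrs.keys())
    return counts
```

Calling attribute_counts(open("merged_asm/merged.gtf")) and finding no gene_name (or oId/nearest_ref) keys would point at the cuffmerge step rather than at cuffdiff.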

Here's an example from the mm9.ensembl.gtf (Sorry for the poor formatting)
chr18 protein_coding exon 3122455 3123465 . - . gene_id "ENSMUSG00000091539"; transcript_id "ENSMUST00000165255"; exon_number "1"; gene_name "Vmn1r238"; gene_biotype "protein_coding"; transcript_name "Vmn1r238-201";
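In the reference GTF the names live in the ninth, tab-separated column as key "value"; pairs (the line above lost its tabs when pasted). A minimal sketch, assuming a standard tab-delimited Ensembl GTF, that builds an Ensembl gene_id to gene_name table which could later be joined onto cuffdiff output; the function name is hypothetical.

```python
import re

# Matches key "value" pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_name_map(gtf_lines):
    """Map Ensembl gene_id -> gene_name from a reference GTF (e.g. mm9.ensembl.gtf)."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed or non-GTF lines
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            mapping[attrs["gene_id"]] = attrs["gene_name"]
    return mapping
```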

And here is an example from genes.fpkm_tracking:
tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage C_FPKM C_conf_lo C_conf_hi C_status R_FPKM R_conf_lo R_conf_hi R_status

XLOC_000001 - - XLOC_000001 - TSS1 chr1:3044313-3044814 - - 0.592503 0 1.24438 OK 0.426783 0 0.890284 OK
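Even when gene_short_name and nearest_ref_id are "-", the locus column still gives coordinates, so one fallback (a swapped-in technique, not something from this thread) is to look up which reference genes overlap each XLOC locus. The sketch assumes you already have gene spans extracted from the reference GTF; GeneSpan, parse_locus and overlapping_genes are hypothetical names, and it does not check whether cuffdiff's locus start is 0- or 1-based, which only matters at the exact boundaries.

```python
import re
from typing import NamedTuple


class GeneSpan(NamedTuple):
    chrom: str
    start: int  # GTF-style 1-based inclusive start
    end: int    # inclusive end
    gene_id: str
    gene_name: str


# Locus strings look like "chr1:3044313-3044814".
LOCUS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_locus(locus):
    """Split a cuffdiff-style locus string into (chrom, start, end)."""
    m = LOCUS_RE.match(locus)
    if m is None:
        raise ValueError(f"unexpected locus format: {locus!r}")
    return m["chrom"], int(m["start"]), int(m["end"])


def overlapping_genes(locus, spans):
    """Return every reference gene span on the same chromosome that overlaps locus."""
    chrom, start, end = parse_locus(locus)
    return [s for s in spans if s.chrom == chrom and s.start <= end and start <= s.end]
```

A linear scan is fine for a few thousand loci; for genome-wide tables an interval tree or a sorted sweep per chromosome would be the usual choice.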

Is there something I have done wrong, or is there a way for me to get Ensembl IDs or gene names into these output files? Thanks in advance.
12-20-2012, 12:15 PM #2
zimmernv
Junior Member
 
Location: Dallas, Texas

Join Date: Dec 2012
Posts: 4

I have this problem as well. Since my setup is nearly identical, I won't write down all the details. Why does using Ensembl GTF files and Ensembl genomes still lead to XLOC values in the Cuffdiff output?
08-16-2013, 04:39 AM #3
jmwhitha
Senior Member
 
Location: NC State, Raleigh, NC

Join Date: Mar 2013
Posts: 107

What I do is use the fpkm_tracking file and the complete genome record of my organism to create a GenBank Excel file. Then, for my exp.diff files, I write a Visual Basic (VBA) script that inserts columns for "gene" (locus_tag) and "gene_long_name". In other words, the XLOC tags can be linked to gene names and Ensembl IDs, and letting VBA in Excel do the lookup automatically, rather than cutting and pasting by hand, avoids human error. You just have to check that the script is correct.
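The same lookup-and-insert idea can be scripted outside Excel. The sketch below is a Python csv stand-in for the VBA approach described above, not the poster's script: it copies a tab-delimited cuffdiff .diff file and appends annotation columns keyed on test_id. The lookup dict is hypothetical (you would build it from fpkm_tracking, a GenBank record, or a GTF), and the new column names locus_tag and gene_long_name are chosen so they cannot collide with an existing column called gene.

```python
import csv
import io


def annotate_diff(in_fh, out_fh, lookup, key="test_id",
                  new_cols=("locus_tag", "gene_long_name")):
    """Copy a tab-delimited cuffdiff *.diff table, appending annotation columns.

    lookup (hypothetical) maps an XLOC id to a tuple with one value per
    entry in new_cols. Rows whose id is missing get '-' so no row is dropped.
    """
    reader = csv.DictReader(in_fh, delimiter="\t")
    writer = csv.DictWriter(out_fh, fieldnames=list(reader.fieldnames) + list(new_cols),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    missing = ("-",) * len(new_cols)
    for row in reader:
        row.update(zip(new_cols, lookup.get(row[key], missing)))
        writer.writerow(row)


def annotate_diff_text(text, lookup, **kwargs):
    """Convenience wrapper for in-memory strings, handy for quick checks."""
    out = io.StringIO()
    annotate_diff(io.StringIO(text), out, lookup, **kwargs)
    return out.getvalue()
```

For real files, open gene_exp.diff and the output file with newline="" and pass both handles to annotate_diff. Because unmatched rows are written with "-" rather than skipped, you can count how many XLOCs still lack a name, which is the kind of check the post recommends doing on the script itself.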
All times are GMT -8.