#1 |
Junior Member
Location: California Join Date: Nov 2012
Posts: 7
This seems to be a common issue, but I think I have met all the criteria that people have said would solve it, and I am still stuck. My Cuffdiff output files contain only XLOC identifiers; there are no gene names or original Ensembl IDs. I provided an Ensembl GTF containing the annotations to Cufflinks with the -g option. The only piece of advice I didn't follow was using -G to prevent novel transcript discovery, because finding novel transcripts (along with alternative splicing) is the reason I am doing this analysis.
Here is my truncated pipeline:

tophat --solexa1.3-quals --no-coverage-search -g 1 -G /u/home/mcdb/xf/GTF/mm9.ensembl.gtf -p 8 -o ./$k /u/home/mcdb/x/bowtie-0.12.8/indexes/temp/mm9 $k.fastq

cufflinks -p 8 -g /u/home/mcdb/x/GTF/mm9.ensembl.gtf -o /u/home/mcdb/x/y/ output name here

cuffmerge -p 8 -s /u/home/mcdb/x/bowtie-0.12.8/indexes/genome.fa /u/home/mcdb/x/y/assemblies.txt
#assemblies has the transcripts.gtf paths from cufflinks in it

cuffdiff -o diff_out -b /u/home/mcdb/x/bowtie-0.12.8/indexes/genome.fa -p 8 -L C,R -u /u/home/mcdb/x/y/merged_asm/merged.gtf sample 1 bam sample 2 bam paths
#(not listed here because there were many)

Here's an example from the mm9.ensembl.gtf (Sorry for the poor formatting):

chr18 protein_coding exon 3122455 3123465 . - . gene_id "ENSMUSG00000091539"; transcript_id "ENSMUST00000165255"; exon_number "1"; gene_name "Vmn1r238"; gene_biotype "protein_coding"; transcript_name "Vmn1r238-201";

And here is an example from genes.fpkm_tracking:

tracking_id	class_code	nearest_ref_id	gene_id	gene_short_name	tss_id	locus	length	coverage	C_FPKM	C_conf_lo	C_conf_hi	C_status	R_FPKM	R_conf_lo	R_conf_hi	R_status
XLOC_000001	-	-	XLOC_000001	-	TSS1	chr1:3044313-3044814	-	-	0.592503	0	1.24438	OK	0.426783	0	0.890284	OK

Is there something I have done wrong or a way for me to get Ensembl IDs or gene names into these output files?

Thanks in advance
#2 |
Junior Member
Location: Dallas, Texas Join Date: Dec 2012
Posts: 4
I have this problem as well. Since my setup is nearly identical, I won't write out all the details. Why does using Ensembl GTF files and Ensembl genomes lead to XLOC values in the Cuffdiff output?
#3 |
Senior Member
Location: NC State, Raleigh, NC Join Date: Mar 2013
Posts: 107
What I do is use the fpkm_tracking file and the complete genome record of my organism to create a GenBank Excel file. Then, with my exp.diff files, I run a Visual Basic script that inserts columns for "gene" (locus_tag) and "gene_long_name". So the XLOC tag can be linked to gene names and Ensembl IDs, and rather than cutting and pasting, VBA in Excel does it automatically, which avoids human error. You just have to check that you wrote the script correctly.
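For anyone who would rather script this outside Excel, here is a minimal Python sketch of the same idea. It is not the VBA script described above, just one possible way to link an XLOC locus back to reference genes by coordinate overlap. The function names (`index_reference_genes`, `parse_locus`, `genes_overlapping`) are made up for illustration. It assumes a tab-separated reference GTF with `gene_id` and `gene_name` attributes like the Ensembl line in post #1, and a `locus` string in the `chr:start-end` form shown in genes.fpkm_tracking. Strand is ignored, and the two files may differ by one base in their coordinate conventions, so treat borderline hits with care.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "ENSMUSG00000091539";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def index_reference_genes(gtf_lines):
    """Build {chrom: [(start, end, gene_id, gene_name), ...]} from GTF exon records.

    Each gene's span is the min start / max end over all of its exons.
    """
    spans = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        start, end = int(fields[3]), int(fields[4])
        if gene_id in spans:
            chrom, old_start, old_end, name = spans[gene_id]
            spans[gene_id] = (chrom, min(old_start, start), max(old_end, end), name)
        else:
            # "-" mirrors the placeholder Cuffdiff uses for a missing gene name
            spans[gene_id] = (fields[0], start, end, attrs.get("gene_name", "-"))

    by_chrom = defaultdict(list)
    for gene_id, (chrom, start, end, name) in spans.items():
        by_chrom[chrom].append((start, end, gene_id, name))
    return dict(by_chrom)


def parse_locus(locus):
    """Split a locus string like 'chr1:3044313-3044814' into (chrom, start, end)."""
    chrom, coords = locus.rsplit(":", 1)
    start, end = coords.split("-")
    return chrom, int(start), int(end)


def genes_overlapping(locus, by_chrom):
    """Return sorted (gene_id, gene_name) pairs whose span overlaps the locus.

    Intervals are treated as closed on both ends; strand is not considered.
    """
    chrom, start, end = parse_locus(locus)
    return sorted(
        (gene_id, name)
        for g_start, g_end, gene_id, name in by_chrom.get(chrom, [])
        if g_start <= end and g_end >= start
    )
```

To use it, pass an open reference GTF file to `index_reference_genes`, then call `genes_overlapping` with the `locus` column of each row in genes.fpkm_tracking or gene_exp.diff. An empty result means no annotated gene overlaps that locus, which is what you would expect for a genuinely novel transcript.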