  • Output of Cuffdiff has only XLOC

    This seems to be a common issue, but I think I have met all the criteria that people have said would solve it, and I am still stuck. My Cuffdiff output files contain only XLOC identifiers; there are no gene names or original Ensembl IDs. I provided an Ensembl GTF containing the annotations with the -g option. The only piece of advice I didn't follow was using -G to prevent novel transcript discovery, because finding novel transcripts (along with alternative splicing) is the reason I am doing this analysis.

    Here is my truncated pipeline:
    tophat --solexa1.3-quals --no-coverage-search -g 1 -G /u/home/mcdb/xf/GTF/mm9.ensembl.gtf -p 8 -o ./$k /u/home/mcdb/x/bowtie-0.12.8/indexes/temp/mm9 $k.fastq

    cufflinks -p 8 -g /u/home/mcdb/x/GTF/mm9.ensembl.gtf -o /u/home/mcdb/x/y/ output name here

    cuffmerge -p 8 -s /u/home/mcdb/x/bowtie-0.12.8/indexes/genome.fa /u/home/mcdb/x/y/assemblies.txt #assemblies has the transcripts.gtf paths from cufflinks in it

    cuffdiff -o diff_out -b /u/home/mcdb/x/bowtie-0.12.8/indexes/genome.fa -p 8 -L C,R -u /u/home/mcdb/x/y/merged_asm/merged.gtf sample 1 bam sample 2 bam paths #(not listed here because there were many)

    Here's an example line from mm9.ensembl.gtf (sorry for the poor formatting):
    chr18 protein_coding exon 3122455 3123465 . - . gene_id "ENSMUSG00000091539"; transcript_id "ENSMUST00000165255"; exon_number "1"; gene_name "Vmn1r238"; gene_biotype "protein_coding"; transcript_name "Vmn1r238-201";

    And here is an example from genes.fpkm_tracking:
    tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage C_FPKM C_conf_lo C_conf_hi C_status R_FPKM R_conf_lo R_conf_hi R_status

    XLOC_000001 - - XLOC_000001 - TSS1 chr1:3044313-3044814 - - 0.592503 0 1.24438 OK 0.426783 0 0.890284 OK

    Is there something I have done wrong or a way for me to get Ensembl IDs or gene names into these output files? Thanks in advance

  • #2
    I have this problem as well. My setup is nearly identical, so I won't repeat all the details. Why does using Ensembl GTF files and Ensembl genomes still lead to XLOC values in the Cuffdiff output?

    • #3
      What I do is use the fpkm_tracking file and the complete genome record of my organism to create a GenBank Excel file. Then, for my exp.diff files, I write a Visual Basic script that inserts columns for "gene" (locus_tag) and "gene_long_name". In other words, the XLOC tag can be linked to gene names and Ensembl IDs, and VBA in Excel does this automatically instead of cutting and pasting, which avoids human error. You just have to check that you wrote the script correctly.
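
      The same kind of lookup can also be scripted outside Excel. Below is a minimal Python sketch, not the VBA script described above. It assumes the GTF being read (for example merged.gtf) actually carries gene_name attributes next to the XLOC gene_id; as far as I understand, cuffmerge only writes those when it is given a reference annotation with -g, so check your file first. The function names and the sample lines in the usage note are hypothetical.

```python
import csv
import re

# Matches GTF attribute pairs such as: gene_id "XLOC_000001";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def xloc_to_names(gtf_lines):
    """Map each gene_id (e.g. XLOC_*) to the set of gene_name values seen with it."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        name = attrs.get("gene_name")  # assumption: present only if the GTF was annotated
        if gene_id and name:
            mapping.setdefault(gene_id, set()).add(name)
    return mapping


def annotate_tracking(tracking_lines, mapping):
    """Yield rows of a tab-separated tracking table, filling empty gene_short_name fields."""
    reader = csv.DictReader(tracking_lines, delimiter="\t")
    for row in reader:
        names = mapping.get(row["gene_id"])
        if names and row["gene_short_name"] == "-":
            row["gene_short_name"] = ",".join(sorted(names))
        yield row
```

      Usage would look something like the following, with made-up input lines standing in for real files (pass open file handles instead of lists when working on actual output):

```python
gtf = ['chr1\tCufflinks\texon\t3044313\t3044814\t.\t+\t.\t'
       'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; gene_name "GeneA";\n']
tracking = ["tracking_id\tgene_id\tgene_short_name\n",
            "XLOC_000001\tXLOC_000001\t-\n"]
for row in annotate_tracking(tracking, xloc_to_names(gtf)):
    print(row["tracking_id"], row["gene_short_name"])
```

      Loci with no gene_name in the GTF (truly novel ones, for instance) are left as "-".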
