  • Output of Cuffdiff has only XLOC

    This seems to be a common issue, but I think I have met all the criteria that people have said would solve it, and I am still stuck. My Cuffdiff output files contain only XLOC identifiers; there are no gene names or original Ensembl IDs. I provided an Ensembl GTF containing the annotations with the -g option. The only advice I didn't follow was using -G to prevent novel transcript discovery, because novel transcripts (along with alternative splicing) are the reason I am doing this analysis.

    Here is my truncated pipeline:
    tophat --solexa1.3-quals --no-coverage-search -g 1 -G /u/home/mcdb/xf/GTF/mm9.ensembl.gtf -p 8 -o ./$k /u/home/mcdb/x/bowtie-0.12.8/indexes/temp/mm9 $k.fastq

    cufflinks -p 8 -g /u/home/mcdb/x/GTF/mm9.ensembl.gtf -o /u/home/mcdb/x/y/ output name here

    cuffmerge -p 8 -s /u/home/mcdb/x/bowtie-0.12.8/indexes/genome.fa /u/home/mcdb/x/y/assemblies.txt #assemblies has the transcripts.gtf paths from cufflinks in it

    cuffdiff -o diff_out -b /u/home/mcdb/x/bowtie-0.12.8/indexes/genome.fa -p 8 -L C,R -u /u/home/mcdb/x/y/merged_asm/merged.gtf sample 1 bam sample 2 bam paths #(not listed here because there were many)

    Here's an example from the mm9.ensembl.gtf (Sorry for the poor formatting)
    chr18 protein_coding exon 3122455 3123465 . - . gene_id "ENSMUSG00000091539"; transcript_id "ENSMUST00000165255"; exon_number "1"; gene_name "Vmn1r238"; gene_biotype "protein_coding"; transcript_name "Vmn1r238-201";

    And here is an example from genes.fpkm_tracking:
    tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage C_FPKM C_conf_lo C_conf_hi C_status R_FPKM R_conf_lo R_conf_hi R_status

    XLOC_000001 - - XLOC_000001 - TSS1 chr1:3044313-3044814 - - 0.592503 0 1.24438 OK 0.426783 0 0.890284 OK

    Is there something I have done wrong or a way for me to get Ensembl IDs or gene names into these output files? Thanks in advance

  • #2
    I get this problem as well. Since my setup is nearly identical, I'm not going to write down all the details. Why does the use of Ensembl GTF files and the Ensembl genomes then lead to XLOC values in the Cuffdiff output?

    • #3
      What I do is use the fpkm_tracking file and the complete genome record of my organism to create a GenBank Excel file. Then, for my exp.diff files, I write a Visual Basic script that inserts columns for "gene" (locus_tag) and "gene_long_name". In other words, the XLOC tag can be linked to gene names and Ensembl IDs, and doing it with VBA in Excel rather than by cutting and pasting avoids human error. You just have to check that you wrote the script correctly.
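
      For anyone who would rather not do this in Excel, here is a rough Python sketch of the same idea: link each XLOC row to reference genes by checking whether its locus overlaps a gene in the Ensembl GTF. This is only an illustration, not what Cufflinks does internally. The function names are made up, both inputs are assumed to be tab-separated, strand is ignored, and the coordinates are compared as-is without reconciling any off-by-one differences between the two files. The XLOC_000002 row is a hypothetical one added so that the demo has a hit.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "ENSMUSG00000091539";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def load_gene_spans(gtf_lines):
    """Collapse GTF records into one (start, end, gene_id, gene_name) span per gene."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        key = (chrom, gene_id)
        name = attrs.get("gene_name", "-")
        if key in spans:
            old_start, old_end, old_name = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end), old_name)
        else:
            spans[key] = (start, end, name)
    by_chrom = defaultdict(list)
    for (chrom, gene_id), (start, end, name) in spans.items():
        by_chrom[chrom].append((start, end, gene_id, name))
    return by_chrom


def annotate_tracking(tracking_lines, by_chrom):
    """Return (tracking_id, locus, [(gene_id, gene_name), ...]) for each data row."""
    rows = iter(tracking_lines)
    header = next(rows).rstrip("\n").split("\t")
    id_col = header.index("tracking_id")
    locus_col = header.index("locus")
    annotated = []
    for line in rows:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, coords = fields[locus_col].rsplit(":", 1)
        start, end = (int(x) for x in coords.split("-"))
        # Plain interval overlap test; strand is ignored (assumption).
        hits = [(gene_id, name)
                for g_start, g_end, gene_id, name in by_chrom.get(chrom, [])
                if g_start <= end and g_end >= start]
        annotated.append((fields[id_col], fields[locus_col], hits))
    return annotated


# Demo input: the GTF record and the tracking row quoted in the first post.
gtf = [
    "chr18\tprotein_coding\texon\t3122455\t3123465\t.\t-\t.\t"
    'gene_id "ENSMUSG00000091539"; transcript_id "ENSMUST00000165255"; '
    'exon_number "1"; gene_name "Vmn1r238"; gene_biotype "protein_coding"; '
    'transcript_name "Vmn1r238-201";'
]
header = ("tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id "
          "locus length coverage C_FPKM C_conf_lo C_conf_hi C_status "
          "R_FPKM R_conf_lo R_conf_hi R_status").split()
row = ("XLOC_000001 - - XLOC_000001 - TSS1 chr1:3044313-3044814 - - "
       "0.592503 0 1.24438 OK 0.426783 0 0.890284 OK").split()
# Hypothetical second row (not from the thread), placed over the chr18 gene.
fake_row = list(row)
fake_row[0] = fake_row[3] = "XLOC_000002"
fake_row[6] = "chr18:3122000-3124000"
tracking = ["\t".join(header), "\t".join(row), "\t".join(fake_row)]

result = annotate_tracking(tracking, load_gene_spans(gtf))
for tracking_id, locus, hits in result:
    print(tracking_id, locus, hits or "no overlapping reference gene")
```

      A locus with no overlapping reference gene may be a novel transcript, and a locus overlapping several genes needs a manual look, so check the output by eye before trusting it.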
