Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cuffmerge error: loading reference annotation failed?

    Hi,

    I am trying to run cuffmerge for quite some time now but so far I always get the same message:

    Code:
    [Thu Apr 19 18:41:47 2012] Beginning transcriptome assembly merge
    -------------------------------------------
    
    [Thu Apr 19 18:41:47 2012] Preparing output location ./merged_asm/
    [Thu Apr 19 18:41:55 2012] Converting GTF files to SAM
    [18:41:55] Loading reference annotation.
    [18:42:01] Loading reference annotation.
    [Thu Apr 19 18:42:13 2012] Quantitating transcripts
    You are using Cufflinks v1.3.0, which is the most recent release.
    Command line:
    cufflinks -o ./merged_asm/ -F 0.05 -g Homo_sapiens.GRCh37.62.filter1.sorted.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 12 ./merged_asm/tmp/mergeSam_file54ILK4 
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    File ./merged_asm/tmp/mergeSam_file54ILK4 doesn't appear to be a valid BAM file, trying SAM...
    [18:42:13] Loading reference annotation.
            [FAILED]
    Error: could not execute cufflinks
    So, reading in the forum here I applied two strategies to try to solve the error:

    I applied the change in the python script for the header_for_chrom_info function as mentioned by damiankao in another threat:
    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


    Furthermore, I sorted the reference GTF, so that they have all the same order as the merged SAM file.

    I followed so far the tuxedo protocol given by Cole Trapnell et al., my version is:

    cufflinks-1.3.0.

    I guess that I am still missing something but I wonder if there is a script now that works for reference gtf files ( In my case Homo_sapiens.GRCh37.62.gtf where I changed the Chromosome IDs to chr1, chr 2, ... chrM, chrX, chrY.

    Here is the header of the mergeSam file:

    Code:
    @HD     VN:1.0  SO:coordinate
    @SQ     SN:chr1 LN:     249231242
    @SQ     SN:chr10        LN:     135516024
    @SQ     SN:chr11        LN:     134945793
    @SQ     SN:chr12        LN:     133815135
    @SQ     SN:chr13        LN:     115099423
    @SQ     SN:chr14        LN:     107288019
    @SQ     SN:chr15        LN:     102519298
    @SQ     SN:chr16        LN:     90237391
    @SQ     SN:chr17        LN:     81188573
    @SQ     SN:chr18        LN:     78005429
    @SQ     SN:chr19        LN:     59111168
    @SQ     SN:chr2 LN:     243102304
    @SQ     SN:chr20        LN:     62959443
    @SQ     SN:chr21        LN:     48111157
    @SQ     SN:chr22        LN:     51239737
    @SQ     SN:chr3 LN:     197955247
    @SQ     SN:chr4 LN:     191013476
    @SQ     SN:chr5 LN:     180899431
    @SQ     SN:chr6 LN:     171055065
    @SQ     SN:chr7 LN:     159026067
    @SQ     SN:chr8 LN:     146281416
    @SQ     SN:chr9 LN:     141150148
    @SQ     SN:chrM LN:     16023
    @SQ     SN:chrX LN:     155257848
    @SQ     SN:chrY LN:     59002251
    @PG     ID:cuffmerge    VN:1.0.0
    So, I wonder if it is possible, given a bowtie index with chr1, chr2. etc. entries instead of redoing the bowtie index with the ensemble identifiers. Would be great if someone would know the missing link

    Marc

  • #2
    no luck with cuffdiff either

    So, when using cuffdiff and say :

    cuffdiff -N -o diff_out -p 1 -L C1,C2 -u Homo_sapiens.GRCh37.62.gtf ~/s_1_sequence_gtf_tophat_accepted_hits.bam ~/s_2_sequence_gtf_tophat_accepted_hits.bam

    I got the error:
    [11:29:30] Loading reference annotation.
    /opt/sge-6_2u4/default/spool/sunnode30/job_scripts/86547: line 12: 16032 Segmentation fault (core dumped) cuffdiff -N -o diff_out -p $NSLOTS -L C1,C2 -u Homo_sapiens.GRCh37.62.filter1.sorted3.gtf ~/s_1_sequence_gtf_tophat_accepted_hits.bam ~/s_2_sequence_gtf_tophat_accepted_hits.bam

    For some reasons, the problem always points down to my reference. Funny enough, when I leave out chrX chrY and chrM, it is running beyond the Loading reference step and probably cuffmerge would as well.
    Could anyone provide me a reference for human, which worked, given that your bowtie index is:
    bowtie-inspect hg19 --names
    chr1
    chr2
    chr3
    chr4
    chr5
    chr6
    chr7
    chr8
    chr9
    chr10
    chr11
    chr12
    chr13
    chr14
    chr15
    chr16
    chr17
    chr18
    chr19
    chr20
    chr21
    chr22
    chrX
    chrY
    chrM

    Marc

    Comment


    • #3
      ok, the cuttdiff part seems to run now for 2 sets, one having a gtf with only chrX chrY and chrM. The sorting with one gtf file was done according to the header file I got for cuffmerge but is not running? So what could go wrong...

      Given the ensembl reference, I first substitute the chromosome IDs so that they correspond to the bowtie-index. Then I sort out each chromosome:

      Code:
      for i in 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 3 4 5 6 7 8 9 M X Y; 
      do 
      grep "^chr$i[[:blank:]]" Homo_sapiens.GRCh37.62.filter1.sorted.gtf > chr${i}_Homo_sapiens.GRCh37.62.filter3.sorted.gtf;
      done
      and then I combine in the "right " order with cat.

      I am not sure yet how this could result into a corrupted gtf file.

      Comment


      • #4
        Seems like a very strange error. I've never had to resort GTF files in order to get them to work with Cufflinks/merge/diff. I'm surprised the GTF files is giving you this kind of trouble.

        I almost always use GTF files from UCSC that I edit in the official gene names to make the DE output more readable. Aside from swapping in gene names I don't change anything. I'm also working with bowtie references I built from FASTA files downloaded from UCSC.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */

        Comment


        • #5
          Running now with the newer Homo_sapiens.GRCh37.66.gtf and it is fine so far.
          I guess my old reference file was somehow corrupted.

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Essential Discoveries and Tools in Epitranscriptomics
            by seqadmin


            The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist on Modified Bases...
            Yesterday, 07:01 AM
          • seqadmin
            Current Approaches to Protein Sequencing
            by seqadmin


            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
            04-04-2024, 04:25 PM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 04-11-2024, 12:08 PM
          0 responses
          54 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 10:19 PM
          0 responses
          50 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 09:21 AM
          0 responses
          44 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-04-2024, 09:00 AM
          0 responses
          55 views
          0 likes
          Last Post seqadmin  
          Working...
          X