  • Tophat -> Cufflinks: BAM header too large

    I am getting this error message when running Cufflinks on a Tophat-created BAM file. Versions: Tophat 1.3.3, Cufflinks 1.1.0, Bowtie 0.12.7, and Samtools 0.1.18.

    Tophat command:
    Code:
    /home/matthew/tophat-1.3.3/tophat -p 16 -r 195 -z pbzip2 --mate-std-dev 50 /media/hd2/tuco/bowtie.index/tuco7 \
    /media/hd2/tuco/unshuff/sgatrim/nocontam/MDM01_index1_qualshuff.nohomo.1.fq /media/hd2/tuco/unshuff/sgatrim/nocontam/MDM01_index1_qualshuff.nohomo.2.fq
    Cufflinks:
    Code:
    /home/matthew/cufflinks/cufflinks -p16 -u -o /media/hd2/tuco/tophat/406A/cuff \
    -b /media/hd2/tuco/bowtie.index/tuco.fa --upper-quartile-norm --max-mle-iterations 20000 \
    --num-importance-samples 10000 /media/hd2/tuco/tophat/406A/tophat_out/accepted_hits.bam
    Tophat finishes without error, but Cufflinks does not:

    You are using Cufflinks v1.1.0, which is the most recent release.
    Warning: BAM header too large
    File /media/hd2/tuco/tophat/406A/tophat_out/accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM...
    [21:18:11] Inspecting reads and determining fragment length distribution.
    SAM error on line 25873: CIGAR op has zero length
    SAM error on line 26633: CIGAR op has zero length
    ...

  • #2
    One other thing I forgot to include above.

    The first few lines of the SAM file:

    Code:
    @HD	VN:1.0	SO:coordinate
    @SQ	SN:10000084.208.674	LN:208
    @SQ	SN:1000016.233.27383	LN:233
    @SQ	SN:10000164.283.623	LN:283
    @SQ	SN:10000188.1527.11468	LN:1527



    • #3
      Lame... I am having the same issue and it looks like no one has responded to you. Have you figured this out yourself yet? I am wondering, are you also using this on a highly fragmented de-novo assembly with a few hundred thousand contigs/scaffolds? Maybe cufflinks doesn't work when the assembly has a large number of fragments?



      • #4
        maybe figured this out

        Hello,
        I noticed that the same general question was posted on stack exchange and didn't have an answer there either. To summarize, I changed the maximum header length constant in hits.cpp (line 731 in v1.3.0) from 4 MB to the following:

        Code:
        static const unsigned MAX_HEADER_LEN = 6 * 1024 * 1024; // 6 MB
        After changing that, the program appears to be proceeding normally.
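Before recompiling, it can help to check whether the header is actually over the limit. The following is a minimal sketch, assuming the limit is compared against the header text (the @HD and @SQ lines); the function names are illustrative and not part of Cufflinks. It ignores other header lines such as @PG, so it slightly underestimates the real size.

```python
# Sketch: estimate the size of a SAM/BAM header's text from its contig list.
# Assumption: the limit discussed in this thread applies to the header text
# (the @HD/@SQ lines). All function names here are hypothetical helpers.

MAX_HEADER_LEN = 4 * 1024 * 1024  # the 4 MB default quoted in this thread

HD_LINE = "@HD\tVN:1.0\tSO:coordinate\n"  # first header line shown in post #2


def sq_line(name, length):
    """Build one @SQ header line for a reference sequence."""
    return f"@SQ\tSN:{name}\tLN:{length}\n"


def estimate_header_bytes(contigs, hd_line=HD_LINE):
    """Byte size of a header made of hd_line plus one @SQ line per contig."""
    total = len(hd_line.encode("utf-8"))
    for name, length in contigs:
        total += len(sq_line(name, length).encode("utf-8"))
    return total


def header_exceeds_limit(contigs, limit=MAX_HEADER_LEN):
    """True if the estimated header text is larger than the given limit."""
    return estimate_header_bytes(contigs) > limit


if __name__ == "__main__":
    contigs = [("10000084.208.674", 208), ("1000016.233.27383", 233)]
    print(estimate_header_bytes(contigs), header_exceeds_limit(contigs))
```

The (name, length) pairs could come from a FASTA index or from the @SQ lines of an existing header. A few hundred thousand contigs with names of about 20 characters is enough to pass 4 MB.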

        To see my full previous post on this go to the stack exchange site:



        good luck!
        Last edited by jstjohn; 01-04-2012, 11:13 AM. Reason: Modified the link to point to my answer on biostar rather than the question.



        • #5
          Problems with warning "BAM header too large" using Cufflinks2 on Linux server

          Hi jstjohn,
          I am having a similar problem to yours when trying to run Cufflinks on my TopHat accepted_hits.bam output.


          Here is the output of the log file:

          Command line:
          cufflinks -o /outfile_location -p 16 -g /gtf_file_location -v --no-update-check -u -b /ref_fasta_location --max-bundle-frags 1000000000 /accepted_hits.bam_location
          Warning: BAM header too large
          File accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM...
          [17:10:06] Loading reference annotation.
          GFF warning: merging adjacent/overlapping segments of ENSOANT00000031404 on Contig9854 (16061-16163, 16168-16239)
          GFF warning: merging adjacent/overlapping segments of ENSOANT00000023588 on Contig9784 (7502-7816, 7821-8557)
          GFF warning: merging adjacent/overlapping segments of ENSOANT00000023588 on Contig9784 (8582-8714, 8717-8998)


          The genome I am using is very fragmented (it contains 200,000 contigs in addition to the chromosomes) and the BAM header is around 5.5 MB. However, I read in the Cufflinks2 manual that: "The header size limit in Cufflinks' BAM parser used to have a 4 megabyte limit. This has been removed to allow Cufflinks to be used on assemblies with many contigs."

          I have looked online for help with this issue, and some people have suggested changing the source code in the hits.cpp file (line 736: static const unsigned MAX_HEADER_LEN = 4 * 1024 * 1024; // 4 MB) for the Windows version, but there does not seem to be any equivalent file in the Linux version.

          Any help will be greatly appreciated.



          • #6
            Hi,

            I have the same problem. I am using cufflinks v.2. Anyone found a solution to this?
            I don't have the skills to change the cufflinks source code unfortunately...

            Jon



            • #7
              Would anyone have a Cufflinks 2 build that they compiled themselves from source (modified to allow larger BAM headers) and would be willing to share? I would need one that runs on Linux x86_64.



                • #8
                The "BAM header too large" issue is caused by the genome file that you used to build the bowtie[12] index. To resolve it, clean up the genome file by removing all scaffold sequences that do not appear in your GTF file.

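The clean-up suggested in #8 can be sketched as follows. This is a minimal illustration, assuming the first column of the GTF holds the same sequence names as the first word of each FASTA title; the function names are hypothetical. After filtering, the bowtie index would need to be rebuilt and the reads realigned against the reduced genome.

```python
# Sketch: keep only the FASTA records whose names appear in a GTF file.
# Assumption: GTF column 1 matches the first word of each FASTA title line.
# Function names are hypothetical helpers, not part of any existing tool.

def gtf_seqnames(gtf_lines):
    """Collect the sequence names (column 1) used in a GTF file."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def filter_fasta(fasta_lines, keep):
    """Yield only the FASTA lines belonging to records named in keep."""
    keeping = False
    for line in fasta_lines:
        if line.startswith(">"):
            title = line[1:].strip()
            seq_id = title.split()[0] if title else ""
            keeping = seq_id in keep
        if keeping:
            yield line


if __name__ == "__main__":
    gtf = ['Contig9854\tensembl\texon\t16061\t16163\t.\t+\t.\tgene_id "g1";\n']
    fasta = [">Contig9854 example\n", "ACGT\n", ">Contig1\n", "TTTT\n"]
    print("".join(filter_fasta(fasta, gtf_seqnames(gtf))), end="")
```

Both functions accept any iterable of lines, so open file handles can be passed directly and large genomes are streamed rather than loaded into memory.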


                • #9
                  One alternative is to replace the fragmented scaffolds with a pseudochromosome when running tophat/cufflinks/cuffdiff.
                  I don't know whether this is feasible, or whether it would affect the downstream expression calculation and differential expression measurement.

                  Is it worth a try?



                  • #10
                    I also had this problem on a shared machine where I couldn't recompile code. My transcriptome was pretty poorly assembled, so filtering out low sequence reads got the header size down to 4.1 MB. I then used sed to strip recurring patterns (like Genus_sp) from the FASTA titles in the header, which brought the header size down to 3.9 MB. I was able to reheader the accepted_hits.bam file with the truncated titles, and Cufflinks ran it just fine.
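The renaming step in #10 was done with sed. The sketch below does something similar in Python on SAM header text, as an illustration only: the function name and the example pattern are assumptions, not the poster's actual commands. It also refuses to produce duplicate names, since two contigs collapsing to the same title would make the header ambiguous. The edited header could then be applied with `samtools reheader`, and the reference FASTA passed to Cufflinks with -b would need the same shortened names.

```python
# Sketch: shorten the SN: names in the @SQ lines of a SAM header.
# Assumption: the pattern to strip (here a "Genus_sp_" prefix) is only an
# example; shorten_sq_names is a hypothetical helper, not an existing tool.
import re


def shorten_sq_names(header_lines, pattern, replacement=""):
    """Return header lines with pattern replaced inside every @SQ SN: field."""
    regex = re.compile(pattern)
    seen = set()
    out = []
    for line in header_lines:
        if line.startswith("@SQ"):
            fields = line.rstrip("\n").split("\t")
            for i, field in enumerate(fields):
                if field.startswith("SN:"):
                    name = regex.sub(replacement, field[3:])
                    if name in seen:
                        raise ValueError(f"duplicate name after shortening: {name}")
                    seen.add(name)
                    fields[i] = "SN:" + name
            line = "\t".join(fields) + "\n"
        out.append(line)
    return out


if __name__ == "__main__":
    header = [
        "@HD\tVN:1.0\tSO:coordinate\n",
        "@SQ\tSN:Genus_sp_contig1\tLN:208\n",
        "@SQ\tSN:Genus_sp_contig2\tLN:233\n",
    ]
    shorter = shorten_sq_names(header, r"^Genus_sp_")
    saved = sum(map(len, header)) - sum(map(len, shorter))
    print("".join(shorter), end="")
    print("bytes saved:", saved)
```

The saving per contig is small, but across a few hundred thousand @SQ lines it can be enough to move a header from just over 4 MB to just under it, as described above.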
