  • peromhc
    Senior Member
    • Sep 2009
    • 108

    Tophat -> Cufflinks: BAM header too large

    I am getting this error message when running Cufflinks on a TopHat-created BAM file. I am using TopHat 1.3.3, Cufflinks 1.1.0, Bowtie 0.12.7, and Samtools 0.1.18.

    Tophat command:
    Code:
    /home/matthew/tophat-1.3.3/tophat -p 16 -r 195 -z pbzip2 --mate-std-dev 50 /media/hd2/tuco/bowtie.index/tuco7 \
    /media/hd2/tuco/unshuff/sgatrim/nocontam/MDM01_index1_qualshuff.nohomo.1.fq /media/hd2/tuco/unshuff/sgatrim/nocontam/MDM01_index1_qualshuff.nohomo.2.fq
    Cufflinks:
    Code:
    /home/matthew/cufflinks/cufflinks -p16 -u -o /media/hd2/tuco/tophat/406A/cuff \
    -b /media/hd2/tuco/bowtie.index/tuco.fa --upper-quartile-norm --max-mle-iterations 20000 \
    --num-importance-samples 10000 /media/hd2/tuco/tophat/406A/tophat_out/accepted_hits.bam
    TopHat finishes without error, but Cufflinks does not:

    You are using Cufflinks v1.1.0, which is the most recent release.
    Warning: BAM header too large
    File /media/hd2/tuco/tophat/406A/tophat_out/accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM...
    [21:18:11] Inspecting reads and determining fragment length distribution.
    SAM error on line 25873: CIGAR op has zero length
    SAM error on line 26633: CIGAR op has zero length
    ...
  • peromhc
    Senior Member
    • Sep 2009
    • 108

    #2
    One other thing I forgot to include above.

    The first few lines of the SAM file:

    Code:
    @HD	VN:1.0	SO:coordinate
    @SQ	SN:10000084.208.674	LN:208
    @SQ	SN:1000016.233.27383	LN:233
    @SQ	SN:10000164.283.623	LN:283
    @SQ	SN:10000188.1527.11468	LN:1527


    • jstjohn
      Member
      • Jun 2010
      • 35

      #3
      Lame... I am having the same issue and it looks like no one has responded to you. Have you figured this out yourself yet? I am wondering, are you also using this on a highly fragmented de-novo assembly with a few hundred thousand contigs/scaffolds? Maybe cufflinks doesn't work when the assembly has a large number of fragments?


      • jstjohn
        Member
        • Jun 2010
        • 35

        #4
        maybe figured this out

        Hello,
        I noticed that the same general question was posted on Stack Exchange and didn't have an answer there either. To summarize, I changed the maximum header length constant in hits.cpp (line 731 in v1.3.0) from 4 MB to the following:

        Code:
        static const unsigned MAX_HEADER_LEN = 6 * 1024 * 1024; // 6 MB
        After changing that, the program appears to be proceeding normally.
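
Before recompiling, it can help to check how large the header actually is. The sketch below sums the bytes of the leading `@` lines of a SAM stream and compares the total with the 4 MB figure quoted above. It assumes the limit applies to the plain-text header; the function names `header_text_bytes` and `exceeds_limit` are hypothetical and not part of Cufflinks.

```python
FOUR_MB = 4 * 1024 * 1024  # the limit quoted in the post above


def header_text_bytes(sam_lines):
    """Sum the bytes of the leading '@' header lines of a SAM stream."""
    total = 0
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment
        # Count the line content plus one byte for its newline.
        total += len(line.rstrip("\n").encode("utf-8")) + 1
    return total


def exceeds_limit(sam_lines, limit=FOUR_MB):
    """True when the header text is at or above the assumed limit."""
    return header_text_bytes(sam_lines) >= limit


# Tiny example built from the header lines posted earlier in the thread.
example = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:10000084.208.674\tLN:208\n",
    "@SQ\tSN:1000016.233.27383\tLN:233\n",
    "read1\t0\t10000084.208.674\t1\t255\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
]
print(header_text_bytes(example), exceeds_limit(example))
```

On a real file, the header lines could come from `samtools view -H accepted_hits.bam`, read line by line and passed to `header_text_bytes`.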

        To see my full previous post on this go to the stack exchange site:



        good luck!
        Last edited by jstjohn; 01-04-2012, 11:13 AM. Reason: Modified the link to point to my answer on biostar rather than the question.


        • ataraxia
          Junior Member
          • Feb 2013
          • 7

          #5
          Problems with warning "BAM header too large" using Cufflinks2 on Linux server

          Hi jstjohn,
          I am having a similar problem to yours when trying to run Cufflinks on my TopHat accepted_hits.bam output.


          Here is the output of the log file:

          Command line:
          cufflinks -o /outfile_location -p 16 -g /gtf_file_location -v --no-update-check -u -b /ref_fasta_location --max-bundle-frags 1000000000 /accepted_hits.bam_location
          Warning: BAM header too large
          File accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM...
          [17:10:06] Loading reference annotation.
          GFF warning: merging adjacent/overlapping segments of ENSOANT00000031404 on Contig9854 (16061-16163, 16168-16239)
          GFF warning: merging adjacent/overlapping segments of ENSOANT00000023588 on Contig9784 (7502-7816, 7821-8557)
          GFF warning: merging adjacent/overlapping segments of ENSOANT00000023588 on Contig9784 (8582-8714, 8717-8998)


          The genome I am using is very fragmented (i.e. it contains 200,000 contigs in addition to the chromosomes) and the BAM header is around 5.5 Mb. However, I read in the Cufflinks2 manual that: "The header size limit in Cufflinks' BAM parser used to have a 4 megabyte limit. This has been removed to allow Cufflinks to be used on assemblies with many contigs."

          I have looked online for help with this issue, and some people have suggested changing the source code in the hits.cpp file (line 736: static const unsigned MAX_HEADER_LEN = 4 * 1024 * 1024; // 4 MB) for the Windows version, but there does not seem to be any equivalent file in the Linux version.

          Any help will be greatly appreciated.


          • JonB
            Member
            • Jan 2010
            • 85

            #6
            Hi,

            I have the same problem. I am using Cufflinks v.2. Has anyone found a solution to this?
            Unfortunately, I don't have the skills to change the Cufflinks source code...

            Jon


            • ataraxia
              Junior Member
              • Feb 2013
              • 7

              #7
              Would anyone have a Cufflinks 2 build that they compiled themselves from source (modified to allow larger BAM headers) and that they would be willing to share? I would need one that runs on Linux x86_64.


              • qwsqe
                Junior Member
                • Jun 2010
                • 4

                #8
                The "BAM header too large" problem is caused by the genome file that you used to build the Bowtie (1 or 2) index. To resolve it, clean up the genome file by removing all scaffold sequences that do not appear in your GTF file.
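
A minimal sketch of that cleanup, assuming the scaffold names in column 1 of the GTF match the first word of each FASTA title. The helper names `gtf_seqnames` and `filter_fasta` are hypothetical, and the sequences in the example are made up for illustration.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names found in column 1 of a GTF file."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def filter_fasta(fasta_lines, keep):
    """Keep only the FASTA records whose name is in `keep`."""
    kept = []
    keeping = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keeping = name in keep
        if keeping:
            kept.append(line)
    return kept


# Illustrative input only: one annotated contig and one unannotated contig.
gtf_example = [
    "#comment\n",
    'Contig9854\tensembl\texon\t16061\t16163\t.\t+\t.\tgene_id "g1";\n',
]
fasta_example = [
    ">Contig9854 some description\n",
    "ACGT\n",
    ">Contig1\n",
    "GGCC\n",
]
filtered = filter_fasta(fasta_example, gtf_seqnames(gtf_example))
print("".join(filtered), end="")
```

The Bowtie index would have to be rebuilt from the filtered FASTA and TopHat rerun, since the existing BAM header still lists every scaffold. Reads that originate from the removed scaffolds may then go unmapped or map elsewhere.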


                • pengchy
                  Senior Member
                  • Feb 2009
                  • 116

                  #9
                  One alternative is to replace the fragmented scaffolds with a pseudochromosome when running tophat/cufflinks/cuffdiff.
                  I don't know whether this is feasible, or whether it would affect the downstream expression estimates and differential expression measurements.

                  Is it worth a try?
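
One way to picture the pseudochromosome idea is sketched below: scaffolds are joined with runs of N, and each scaffold's start offset is recorded so coordinates can be shifted. This is an untested assumption about how it might be done, not a method confirmed in this thread; `build_pseudochromosome`, `to_pseudo_coord`, and the spacer length are all hypothetical.

```python
def build_pseudochromosome(scaffolds, spacer_len=100):
    """Join (name, sequence) pairs with N spacers.

    Returns the joined sequence and a dict mapping each scaffold
    name to its 0-based start offset in the joined sequence.
    """
    spacer = "N" * spacer_len
    parts = []
    offsets = {}
    pos = 0
    for i, (name, seq) in enumerate(scaffolds):
        if i > 0:
            parts.append(spacer)
            pos += spacer_len
        offsets[name] = pos
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets


def to_pseudo_coord(offsets, name, pos_1based):
    """Shift a 1-based scaffold position to a 1-based pseudochromosome position."""
    return offsets[name] + pos_1based


# Illustrative input only: two tiny made-up scaffolds.
pseudo, offsets = build_pseudochromosome([("a", "ACGT"), ("b", "GG")], spacer_len=3)
print(pseudo, offsets, to_pseudo_coord(offsets, "b", 1))
```

Any GTF annotation would need the same coordinate shift, and a spliced aligner might place reads across the N spacers, so the effect on expression estimates remains an open question, as the post says.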


                  • danjg
                    Junior Member
                    • Jun 2011
                    • 4

                    #10
                    I also had this problem on a shared machine where I couldn't recompile code. My transcriptome was pretty poorly assembled, so filtering out low sequence reads got the header size down to 4.1 MB. I then used sed with a regex to strip repeated strings (like Genus_sp) from the FASTA titles, which brought the header size down to 3.9 MB. I reheadered the accepted_hits.bam file with the truncated titles, and Cufflinks ran on it just fine...
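
The renaming step described above could look roughly like this in Python instead of sed. It strips a prefix from the `SN:` value of every `@SQ` line and refuses to continue if two names collide after shortening. The function name `shorten_sq_names` and the pattern `^Genus_sp_` are hypothetical stand-ins for whatever repeated string the real titles contain.

```python
import re


def shorten_sq_names(header_lines, prefix_pattern):
    """Remove a regex-matched prefix from the SN: field of each @SQ line."""
    pattern = re.compile(prefix_pattern)
    seen = set()
    out = []
    for line in header_lines:
        if line.startswith("@SQ"):
            fields = line.rstrip("\n").split("\t")
            for i, field in enumerate(fields):
                if field.startswith("SN:"):
                    new_name = pattern.sub("", field[3:], count=1)
                    if not new_name or new_name in seen:
                        # Shortened names must stay non-empty and unique.
                        raise ValueError("bad shortened name: %r" % new_name)
                    seen.add(new_name)
                    fields[i] = "SN:" + new_name
            line = "\t".join(fields) + "\n"
        out.append(line)
    return out


# Illustrative header only; the names are made up.
header = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:Genus_sp_contig1\tLN:208\n",
    "@SQ\tSN:Genus_sp_contig2\tLN:233\n",
]
short_header = shorten_sq_names(header, r"^Genus_sp_")
saved = sum(len(l) for l in header) - sum(len(l) for l in short_header)
print("".join(short_header), end="")
print("bytes saved:", saved)
```

The shortened header would then be applied with `samtools reheader`. Any reference FASTA or GTF passed to Cufflinks would presumably need the same shortened names so that they still match the BAM.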
