  • Tophat segment junction error 1, invalid BAM binary header

    Note: This has been resolved; please see post 12 in this thread for the update.

    Hello folks,

    Nearly at my wits' end with TopHat. I've gone through this twice now, using Bowtie once and Bowtie2 once. After days of mapping reads and read segments, I get the error below when it comes to "Searching for junctions via segment mapping":

    [2012-06-27 13:47:45] Beginning TopHat run (v2.0.3)
    -----------------------------------------------
    [2012-06-27 13:47:45] Checking for Bowtie
    Bowtie version: 2.0.0.6
    [2012-06-27 13:47:45] Checking for Samtools
    Samtools version: 0.1.18.0
    [2012-06-27 13:47:45] Checking for Bowtie index files
    [2012-06-27 13:47:45] Checking for reference FASTA file
    [2012-06-27 13:47:45] Generating SAM header for /work/jeremy/BowtieIndex/Mus_musculus/NCBI/build37.2/Sequence/Bowtie2Index/genome

    format: fastq
    quality scale: phred33 (default)
    [2012-06-27 13:48:39] Preparing reads
    left reads: min. length=40, max. length=100, 151335054 kept reads (1380 discarded)
    right reads: min. length=40, max. length=100, 151333005 kept reads (3429 discarded)
    [2012-06-27 16:54:38] Mapping left_kept_reads to genome genome with Bowtie2
    [2012-06-28 10:53:22] Mapping left_kept_reads_seg1 to genome genome with Bowtie2 (1/4)
    [2012-06-28 16:06:36] Mapping left_kept_reads_seg2 to genome genome with Bowtie2 (2/4)
    [2012-06-28 20:40:10] Mapping left_kept_reads_seg3 to genome genome with Bowtie2 (3/4)
    [2012-06-29 02:13:31] Mapping left_kept_reads_seg4 to genome genome with Bowtie2 (4/4)
    [2012-06-29 07:53:33] Mapping right_kept_reads to genome genome with Bowtie2
    [2012-06-30 02:23:29] Mapping right_kept_reads_seg1 to genome genome with Bowtie2 (1/4)
    [2012-06-30 07:03:59] Mapping right_kept_reads_seg2 to genome genome with Bowtie2 (2/4)
    [2012-06-30 12:44:20] Mapping right_kept_reads_seg3 to genome genome with Bowtie2 (3/4)
    [2012-06-30 17:33:54] Mapping right_kept_reads_seg4 to genome genome with Bowtie2 (4/4)
    [2012-06-30 23:10:44] Searching for junctions via segment mapping
    [FAILED]
    Error: segment-based junction search failed with err =1
    [bam_header_read] invalid BAM binary header (this is not a BAM file).

    When I look at the segment_juncs log, I find that a single one of my segment BAM files lacks a header:

    segment_juncs v2.0.3 (3443S)
    ---------------------------
    [samopen] SAM header is present: 22 sequences.
    Loading reference sequences...
    Loading 10...done
    Loading 11...done
    Loading 12...done
    Loading 13...done
    Loading 14...done
    Loading 15...done
    Loading 16...done
    Loading 17...done
    Loading 18...done
    Loading 19...done
    Loading 1...done
    Loading 2...done
    Loading 3...done
    Loading 4...done
    Loading 5...done
    Loading 6...done
    Loading 7...done
    Loading 8...done
    Loading 9...done
    Loading MT...done
    Loading X...done
    Loading Y...done
    Loading ...done
    >> Performing segment-search:
    Loading left segment hits...
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
    Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
    Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
    Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
    Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
    Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
    Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_header_read] invalid BAM binary header (this is not a BAM file).

    Does anyone have any suggestions as to how to deal with this? Since only one of the BAM files is missing a header, I'm hoping there is some way to repair it using the header found in all the other BAM files. Does anyone know if this is possible? I've tried the samtools reheader tool, but for some odd reason it sends what seems like millions of lines of weird symbols through my terminal (eventually crashing it) and doesn't change the file at all.
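
    A likely explanation for the flood of symbols: samtools reheader writes the reheadered BAM to standard output, so without a redirect (samtools reheader header.sam in.bam > out.bam) the binary data lands in the terminal and the input file is left untouched. Reheadering also assumes the input is otherwise a valid BAM, which the "invalid BAM binary header" message calls into question. As a minimal sketch for finding which segment files are damaged, the check below relies only on gzip, head, and od. The function name has_bam_magic is made up for illustration, and the directory is the one shown in the log above.

    ```shell
    # BAM files are BGZF-compressed, which gzip can decompress. A valid BAM
    # starts (after decompression) with the 4 magic bytes "BAM\1" = 42 41 4d 01.
    # has_bam_magic is a hypothetical helper name, not part of samtools or TopHat.
    has_bam_magic() {
        magic=$(gzip -dc "$1" 2>/dev/null | head -c 4 | od -An -tx1 | tr -d ' \n')
        [ "$magic" = "42414d01" ]
    }

    # Report every segment BAM in TopHat's tmp directory that fails the check.
    for f in ./tophat_out/tmp/*_seg*.bam; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        has_bam_magic "$f" || echo "bad or missing BAM magic: $f"
    done
    ```

    This only inspects the first four decompressed bytes; it does not validate the rest of the header or detect truncation. A file that fails it would probably need to be regenerated rather than repaired.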
    Last edited by JChase; 07-13-2012, 07:25 AM.

  • #2
    I re-ran TopHat and got only slightly further than before. This time the BAM files are all intact (no missing headers, and I can view all of them), but the program could not open one of my BAM files. The segment_juncs log says this was because of "Too many open files." Has anyone run into this problem before? I'm desperate to sort this out!


    [2012-07-03 23:15:54] Beginning TopHat run (v2.0.3)
    -----------------------------------------------
    [2012-07-03 23:15:54] Checking for Bowtie
    Bowtie version: 2.0.0.6
    [2012-07-03 23:15:54] Checking for Samtools
    Samtools version: 0.1.18.0
    [2012-07-03 23:15:54] Checking for Bowtie index files
    [2012-07-03 23:15:54] Checking for reference FASTA file
    [2012-07-03 23:15:54] Generating SAM header for /work/jeremy/BowtieIndex/Mus_musculus/NCBI/build37.2/Sequence/Bowtie2Index/genome
    format: fastq
    quality scale: phred33 (default)
    [2012-07-03 23:16:02] Preparing reads
    left reads: min. length=40, max. length=100, 151335054 kept reads (1380 discarded)
    right reads: min. length=40, max. length=100, 151333005 kept reads (3429 discarded)
    [2012-07-04 02:19:20] Mapping left_kept_reads to genome genome with Bowtie2
    [2012-07-04 18:36:05] Mapping left_kept_reads_seg1 to genome genome with Bowtie2 (1/4)
    [2012-07-04 22:45:41] Mapping left_kept_reads_seg2 to genome genome with Bowtie2 (2/4)
    [2012-07-05 02:56:53] Mapping left_kept_reads_seg3 to genome genome with Bowtie2 (3/4)
    [2012-07-05 07:22:00] Mapping left_kept_reads_seg4 to genome genome with Bowtie2 (4/4)
    [2012-07-05 12:48:52] Mapping right_kept_reads to genome genome with Bowtie2
    [2012-07-06 06:24:25] Mapping right_kept_reads_seg1 to genome genome with Bowtie2 (1/4)
    [2012-07-06 11:29:32] Mapping right_kept_reads_seg2 to genome genome with Bowtie2 (2/4)
    [2012-07-06 16:44:32] Mapping right_kept_reads_seg3 to genome genome with Bowtie2 (3/4)
    [2012-07-06 21:18:45] Mapping right_kept_reads_seg4 to genome genome with Bowtie2 (4/4)
    [2012-07-07 02:37:47] Searching for junctions via segment mapping
    [FAILED]
    Error: segment-based junction search failed with err =1
    Error opening SAM file tophatout2/tmp/right_kept_reads_seg2.bam

    -bash-4.1$ less segment_juncs.log
    segment_juncs v2.0.3 (3443S)
    ---------------------------
    [samopen] SAM header is present: 22 sequences.
    Loading reference sequences...
    Loading 10...done
    Loading 11...done
    Loading 12...done
    Loading 13...done
    Loading 14...done
    Loading 15...done
    Loading 16...done
    Loading 17...done
    Loading 18...done
    Loading 19...done
    Loading 1...done
    Loading 2...done
    Loading 3...done
    Loading 4...done
    Loading 5...done
    Loading 6...done
    Loading 7...done
    Loading 8...done
    Loading 9...done
    Loading MT...done
    Loading X...done
    Loading Y...done
    Loading ...done
    >> Performing segment-search:
    Loading left segment hits...
    done.
    Loading right segment hits...
    open: Too many open files
    Error opening SAM file tophatout2/tmp/right_kept_reads_seg2.bam
    Last edited by JChase; 07-07-2012, 01:00 AM.



    • #3
      There are already many threads related to this issue (http://seqanswers.com/forums/showthread.php?t=15142, http://seqanswers.com/forums/showthread.php?t=7266)

      So maybe the problem is arising due to less memory. Just try to run the process solely.



      • #4
        Originally posted by AsoBioInfo View Post
        There are already many threads related to this issue (http://seqanswers.com/forums/showthread.php?t=15142, http://seqanswers.com/forums/showthread.php?t=7266)

        So maybe the problem is arising due to less memory. Just try to run the process solely.
        Hello,

        I did read through those threads before I posted, but they refer to different errors (e.g., error 9). I'm running this process on a node with half a terabyte of memory, so I do not think that memory can be the issue. When you say "run the process solely", what exactly do you mean? Is it possible to restart the TopHat process without re-mapping?



        • #5
          What command did you use?



          • #6
            Originally posted by AsoBioInfo View Post
            What command did you use?
            tophat -p 56 -o tophatout2 --genome-read-mismatches 4 --read-mismatches 4 /NCBI/build37.2/Sequence/Bowtie2Index/genome /C4/PChapmani4F_AdpCutTrimmed.fastq,/C4/PC5F_AdpCutTrimmed.fastq,/C4/PC6F_AdpCutTrimmed.fastq,/C4/PC12F_AdpCutTrimmed.fastq /C4/PC4R_AdpCutTrimmed.fastq,/C4/PC5R_AdpCutTrimmed.fastq,/C4/PC6R_AdpCutTrimmed.fastq,/C4/PC12R_AdpCutTrimmed.fastq



            • #7
              Run the command without the option -p.



              • #8
                Originally posted by AsoBioInfo View Post
                Run the command without the option -p.
                Well, I'll give that a try... Considering it takes 4 days to do the mapping when multithreading with 54 cores, I guess it will take a while to see if this is going to throw an error with only 1 core.



                • #9
                  Even I want to give it a try too... as mentioned in the following link:



                  • #10
                    So I'm re-running Tophat on just a single core, and at this rate it should finish mapping my 300 million reads and all of their segments sometime next year. I also decided to try a few other things. I ran Tophat on just 10,000 reads, again on 54 cores, and it completed without errors; this suggests to me that multithreading in itself isn't the issue. I then tried it on a quarter of my reads (35 million forward, 35 million reverse), and the program threw the same error as above (too many open files). Will try a few more things; wish me luck.



                    • #11
                      I've seen the "too many open files" error elsewhere when dealing with alignments, specifically when sorting very large BAM files. It happens when samtools tries to merge all the intermediate files together.
                      If this is the case, the only option may be to raise the limit on simultaneously open files. By default, most Linux systems set it to 1024.

                      Code:
                      ulimit -n unlimited
                      to remove the ceiling (on Ubuntu).

                      Be careful: this only applies to the current bash session. I forget how to make it persist. In my case, I improved performance by increasing -m 10x (in samtools sort), and thus had to merge 1/10 as many files.
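
                      A slightly more cautious variant, as a sketch: on many Linux systems "ulimit -n unlimited" is rejected for non-root users, whereas raising the soft limit up to the existing hard limit needs no special privileges. For persistence, systems that use pam_limits usually read nofile entries from /etc/security/limits.conf; the user name and value in the comment below are placeholders, not recommendations.

                      ```shell
                      # Inspect the per-process open-file limits of the current shell.
                      soft=$(ulimit -Sn)   # soft limit: what processes started here actually get
                      hard=$(ulimit -Hn)   # hard limit: ceiling an unprivileged user may raise to
                      echo "soft=$soft hard=$hard"

                      # Raise the soft limit to the hard limit. This affects only this shell
                      # and the processes launched from it (e.g. a TopHat run started afterwards).
                      if [ "$soft" != "$hard" ]; then
                          ulimit -Sn "$hard" || echo "could not raise soft limit" >&2
                      fi
                      echo "soft limit now: $(ulimit -Sn)"

                      # To persist on Linux with pam_limits, lines like these (placeholder
                      # user name and value) would go in /etc/security/limits.conf:
                      #   someuser  soft  nofile  8192
                      #   someuser  hard  nofile  8192
                      ```

                      If the hard limit itself is too low, it has to be raised by an administrator; the snippet does not attempt that.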



                      • #12
                        Hello,

                        I wanted to follow up with one final post that, hopefully, will help others should they run into similar problems in the future. Like ians, I suspected that the "too many open files" error might have been a node-specific issue, despite the fact that it shouldn't have been a problem on the node I was using. In any case, I transferred my data to another node with the same specifications (64 cores, half a terabyte of memory) that had fewer people using it. I re-ran things as before, multithreading across 54 cores, and instead of the "too many open files" issue, I got this error: "Error: ReadStream::getRead() called with out-of-order id#!" I may be incorrect in this assumption, but I think this is directly related to multithreading and the fact that Bowtie2 doesn't keep multithreaded alignments in order (unless you tell it to). In any case, I was unable to multithread Tophat on my reads on either node.

                        To resolve this, it was suggested that I run Tophat on a single thread. Because 300 million reads take Tophat a long time to process on a single thread, I split my reads into fourths and ran each set of ~70 million reads on a single thread. I am happy to report that this worked! So, for those of you running into these problems, I hope that removing the multi-thread option will also work for you.
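
                        The post does not say how the reads were split. As one possible sketch, a FASTQ file can be cut by line count, provided every record occupies exactly four lines and the forward and reverse files are split with the same chunk size so that mates stay paired. The function name split_fastq, the chunk size, and the output prefixes are made up for illustration; the input file names are the ones from post 6.

                        ```shell
                        # Split a FASTQ file into chunks of a fixed number of reads.
                        # Assumes 4 lines per record (no wrapped sequence or quality lines).
                        # Usage: split_fastq <in.fastq> <reads_per_chunk> <output_prefix>
                        # split names the chunks <output_prefix>aa, <output_prefix>ab, ...
                        split_fastq() {
                            split -l $(( $2 * 4 )) "$1" "$3"
                        }

                        # Illustrative calls (chunk size is arbitrary; use the same value for
                        # the forward and reverse file of a pair):
                        #   split_fastq /C4/PC5F_AdpCutTrimmed.fastq 40000000 PC5F_chunk_
                        #   split_fastq /C4/PC5R_AdpCutTrimmed.fastq 40000000 PC5R_chunk_
                        #
                        # Each chunk pair would then go through TopHat without -p, e.g.:
                        #   tophat -o tophat_PC5_aa <other options> <index> PC5F_chunk_aa PC5R_chunk_aa
                        ```

                        Splitting by line count keeps the i-th forward chunk aligned with the i-th reverse chunk only if both input files list the mates in the same order and neither has had reads removed independently.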

                        Thanks again to those in the community who helped me work through this.



                        • #13
                          Originally posted by JChase View Post
                          Hello,

                          I re-ran things as before, multithreading across 54 cores, and instead of the "too many files open" issue, I got this error: "Error: ReadStream::getRead() called with out-of-order id#!" I may be incorrect in this assumption, but I think this is directly related to multithreading and the fact that Bowtie2 doesn't keep multithreaded alignments in order (unless you tell it to). In any case, I was unable to multithread Tophat on my reads on either node.
                          Is there a way to explicitly "keep multithreaded alignments in order"?

                          Don't give up on multithreading. If you can reserve a box, where you don't have to compete for cpu, you may find the run finishes successfully. I had the same problem:
                          http://seqanswers.com/forums/showthread.php?t=15142



                          • #14
                            Originally posted by ians View Post
                            Is there a way to explicitly "keep multithreaded alignments in order"?

                            Don't give up on multithreading. If you can reserve a box, where you don't have to compete for cpu, you may find the run finishes successfully. I had the same problem:
                            http://seqanswers.com/forums/showthread.php?t=15142
                            Well, Bowtie2 has an option to keep things in order, but I've never had luck with it. And I wouldn't know how to feed that option through Tophat to Bowtie2 anyhow...



                            • #15
                              Code:
                              ulimit -n unlimited
                              to remove the ceiling (on Ubuntu).
                              I had the "too many open files" error in tophat during segment mapping also. Changing the maximum number of open files seems to have fixed the error (Post #11). However, I am running OSX and found out you achieve the same result with a different command.

                              See the following website for details on how to change the max. open files limit on Linux and OSX: http://wiki.basho.com/Open-Files-Limit.html
