  • Newbie Question: [bam_header_read] EOF marker is absent.

    What does this error mean with respect to the completion of my samtools command?

    [bam_header_read] EOF marker is absent.

    Does it mean that the command reached the end of the file and completed satisfactorily, but simply found no specific marker indicating the end of the file?

  • #2
    The cryptic samtools error "EOF marker is absent" refers to the absence of a special empty 28-byte BGZF block, which samtools looks for at the end of the data to confirm that the BAM file is complete.

    If you see that error, either:

    (a) Your file is somehow truncated or incomplete (a real error)
    (b) Your file is from a tool not writing this EOF marker (perhaps a very old samtools?)

    Where did your BAM file come from?



    • #3
      My BAM file was actually made with BWA and the most recent version of samtools. I am concerned because, although I received the error, the files are the right size. I'll probably just redo them. Thanks for the clarification, though.



      • #4
        oiio, please post the command you are trying to execute.

        This message also appears if you run samtools on a SAM file instead of a BAM file.
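One way to rule out the SAM-instead-of-BAM case before running samtools is to look at the start of the file. The sketch below is only an illustration in Python: it assumes the file can be read with the standard gzip module (BGZF is a series of gzip members) and checks for the four-byte BAM magic `BAM\1` at the start of the decompressed data. The function name `looks_like_bam` is made up for this example.

```python
import gzip

# First four bytes of the decompressed BAM stream.
BAM_MAGIC = b"BAM\x01"


def looks_like_bam(path):
    """Return True if the file decompresses as gzip/BGZF and starts with the BAM magic."""
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == BAM_MAGIC
    except (OSError, EOFError):
        # Not gzip-compressed (for example a plain-text SAM file),
        # cut off inside the first block, or unreadable.
        return False
```

A plain-text SAM file fails the gzip check and gives False, so a False here is a hint to check the input format before worrying about truncation.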



        • #5
          The command lines are very simple... samtools sort 1.bam 1.sorted ... etc
          Also I don't think some of them would work if the file was still a SAM. Thanks though.

          Does anyone know of/practice a fast way to check a ton of BAMs for the presence of the EOF marker?



          • #6
            Something like this should work; it prints the last 28 bytes of the file in hex:

            Code:
            tail -c 28 problem.bam | hexdump -C
            You're looking for the following in hex as the final 28 bytes:

            Code:
            0x1f 0x8b 0x08 0x04 0x00 0x00 0x00 0x00
            0x00 0xff 0x06 0x00 0x42 0x43 0x02 0x00
            0x1b 0x00 0x03 0x00 0x00 0x00 0x00 0x00
            0x00 0x00 0x00 0x00
            Or in octal if you prefer that, "\037\213\010\4\0\0\0\0\0\377\6\0\102\103\2\0\033\0\3\0\0\0\0\0\0\0\0\0" as used in function bgzf_check_EOF in samtools file bgzf.c
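For checking a large number of files, the same comparison can be scripted. This is a minimal sketch in Python, assuming the marker is exactly the 28 bytes listed above and that reading only the tail of each file is enough. `has_bgzf_eof` and `check_many` are hypothetical helper names, not part of samtools.

```python
import os

# The 28-byte empty BGZF block quoted above.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 "
    "00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 "
    "00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the standard 28-byte BGZF EOF block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump straight to the last 28 bytes instead of reading the whole file.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


def check_many(paths):
    """Map each path to True/False depending on whether the EOF block is present."""
    return {path: has_bgzf_eof(path) for path in paths}
```

Something like `check_many(glob.glob("*.bam"))` would then return a dictionary mapping each path to True or False. A False only says the standard marker is missing; it does not by itself prove that the file is truncated.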



            • #7
              Awesome, thanks



              • #8
                What about this?

                What if the end is 31 bytes:

                1F 8B 08 04 00 00 00 00 00 FF 06 00 42 43 02 00 1E 00 01 00 00 FF FF 00 00 00 00 00 00 00 00

                And by the way, if you use Windows, HxD is really good for opening a BAM file, however large it is.

                Best,

                dong



                • #9
                  Originally posted by xied75
                  What if the end is 31 bytes:

                  1F 8B 08 04 00 00 00 00 00 FF 06 00 42 43 02 00 1E 00 01 00 00 FF FF 00 00 00 00 00 00 00 00

                  You're seeing a different empty BGZF block, a known bug in samtools output for uncompressed BAM. See https://github.com/lh3/samtools/pull/7 and associated mailing list thread http://sourceforge.net/mailarchive/m...sg_id=28413844

                  Edit: Recap post with current patch http://sourceforge.net/mailarchive/m...sg_id=28843382
                  Last edited by maubp; 02-25-2012, 11:07 AM. Reason: Adding another URL
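To see that both byte sequences really are empty blocks, they can be decompressed directly. This is a sketch in Python under the assumption that each BGZF block is a valid gzip member, with the total block size minus one (BSIZE) stored as a little-endian 16-bit value at offset 16 of the header. `describe_bgzf_block` is a name invented for this example.

```python
import gzip
import struct

# The standard 28-byte EOF block and the 31-byte variant quoted in this thread.
EOF_28 = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)
EMPTY_31 = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1e 00 01 00 00 ff ff 00 00 00 00 00 00 00 00"
)


def describe_bgzf_block(block):
    """Return (declared block length, decompressed payload length)."""
    # BSIZE sits at offset 16 and holds the total block length minus one.
    (bsize,) = struct.unpack_from("<H", block, 16)
    payload = gzip.decompress(block)
    return bsize + 1, len(payload)


print(describe_bgzf_block(EOF_28))    # (28, 0)
print(describe_bgzf_block(EMPTY_31))  # (31, 0)
```

Both blocks decompress to zero bytes. They differ only in how the empty deflate stream is encoded: the 31-byte version uses a stored (uncompressed) deflate block, which fits the uncompressed-BAM case described above.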



                  • #10
                    Thanks Peter, you are my hero.



                    • #11
                      Originally posted by maubp
                      You're seeing a different empty BGZF block, a known bug in samtools output for uncompressed BAM. See https://github.com/lh3/samtools/pull/7 and associated mailing list thread http://sourceforge.net/mailarchive/m...sg_id=28413844
                      Hi, sorry to bother you, but I found your code and was wondering how to apply it. I'm pretty new to Unix and bioinformatics in general; could you point me to a guide on setting this up, or give me a general step-by-step outline? Thanks a lot!



                      • #12
                        I meant it for information only really (and as a reminder to the samtools team).

                        The easy answer is to be aware that this EOF warning can be a false positive.

                        If you are interested, you'll need to learn a bit about patch files. The Unix command diff creates a list of differences, also called a patch. The Unix patch command takes these files as inputs and applies the changes to your copy of the original files. The idea is you could download the samtools source code, apply this patch (make the correction for the bug), then compile and install the fixed samtools.
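To make the idea of a patch concrete, here is a small illustration in Python using the standard difflib module, which produces the same unified format that `diff -u` writes. The "before" and "after" lines and the file names are invented for the example and are not taken from the samtools source or from the actual patch.

```python
import difflib

# Hypothetical "before" and "after" versions of a source file.
# These lines are made up for illustration; they are not samtools code.
original = ["int check(void)\n", "{\n", "    return 0;\n", "}\n"]
fixed = ["int check(void)\n", "{\n", "    return 1;\n", "}\n"]

# unified_diff yields the patch line by line, starting with the file headers.
patch_lines = list(
    difflib.unified_diff(
        original, fixed, fromfile="a/example.c", tofile="b/example.c"
    )
)
patch_text = "".join(patch_lines)
print(patch_text)
```

Lines starting with `-` are removed and lines starting with `+` are added. Saved to a file, text in this format is what the Unix patch command reads, for example `patch -p1 < fix.patch` run from the top of the source tree (`fix.patch` being a placeholder name).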



                        • #13
                          Thank you very much! I will look into that.

                          -Edwin



                          • #14
                            I got the same error:
                            [bam_header_read] EOF marker is absent.
                            [bam_header_read] invalid BAM binary header (this is not a BAM file).
                            File ./merged_asm/tmp/mergeSam_filepsu0Hv doesn't appear to be a valid BAM file, trying SAM...
                            [11:16:29] Loading reference annotation.
                            [11:16:55] Inspecting reads and determining fragment length distribution.
                            Processed 39384 loci.

                            As you can see, it falls back to trying SAM, loads the reference annotation, and then the process continues.
                            My questions are:
                            1. Is it OK to let it continue like this (trying SAM), without checking the file with tail problem.bam | hexdump -C?

                            2. It skips a large bundle, as shown below:
                            [11:16:56] Assembling transcripts and estimating abundances.
                            6:126102153-130463972 Warning: Skipping large bundle.
                            Processed 39383 loci.
                            Is it OK to proceed like this, or how can I include the large bundle?






                            • #15
                              Hi all,

                              This is a really old thread, but I have come across the same issue and I'm not sure how to fix it with the patch.

                              I am using samtools to convert a .sam file (mapped with bowtie2) to a .bam file.
                              The .sam file looks complete, but something goes wrong during the conversion when I use the commands below. I'm trying to bring this data into the anvi'o pipeline, and I am using the anvi-init-bam command to sort. Any ideas?

                              $ samtools view -F 4 -bS -u ecosphere_merged_MAPPING/Past_Sample_01.sam > ecosphere_merged_MAPPING/Past_Sample_01-RAW.bam
                              [samopen] SAM header is present: 196761 sequences.

                              $ anvi-init-bam ecosphere_merged_MAPPING/Past_Sample_01-RAW.bam -o ecosphere_merged_MAPPING/Past_Sample_01.bam

                              [28 May 17 12:34:55 SORT] Sorting BAM File... May take a while depending on the size.
                              [W::bam_hdr_read] EOF marker is absent. The input is probably truncated.
                              [E::bgzf_read] bgzf_read_block error -1 after 0 of 4 bytes
                              Traceback (most recent call last):
                              File "/usr/local/bin/anvi-init-bam", line 75, in <module>
                              output_file_path = args.output_file,))
                              File "/usr/local/bin/anvi-init-bam", line 48, in init_bam_file
                              pysam.sort("-o", output_file_path, input_file_path)
                              File "/usr/local/lib/python3.5/dist-packages/pysam/utils.py", line 75, in __call__
                              stderr))
                              pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=[bam_sort_core] truncated file. Aborting.\n'
