Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools sam to bam header

    Hi

    I am trying to sort my bam file using samtools 0.1.19-44428cd
    with the following command:

    samtools sort test_sequence.bam test_sorted

    gives:
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    Segmentation fault (core dumped)


    The bam file was generated in samtools with :
    samtools view -bS test_sequence.sam > test_sequence.bam

    and the header in the sam file (generated with bwa) is

    @SQ SN:chr10 LN:135534747
    @SQ SN:chr11 LN:135006516
    @SQ SN:chr11_gl000202_random LN:40103
    @SQ SN:chr12 LN:133851895
    @SQ SN:chr13 LN:115169878
    @SQ SN:chr14 LN:107349540
    @SQ SN:chr15 LN:102531392
    @SQ SN:chr16 LN:90354753
    @SQ SN:chr17_ctg5_hap1 LN:1680828
    @SQ SN:chr17 LN:81195210


    I also tried:

    samtools view -bT wg.fa test_sequence.sam > test_sequence.bam

    but got the same error.

    I am just starting out in this area, and any help would be appreciated.

    Thanks,
    Last edited by mattbawn; 05-28-2014, 10:51 AM.

  • #2
    Hi,
    for question specific to samtools this mailing list is a better place to ask:

    [email protected]

    You need to subscribe to it at:
    SAM (Sequence Alignment/Map) is a flexible generic format for storing nucleotide sequence alignment. SAMtools provide efficient utilities on…


    before you can post.
    ---------
    Regarding your actual question, the sam header you've posted is certainly not valid.
    You can actually use samtools view with the -H switch to show just the header of a sam or bam file.
    This should also give an error in your case.

    Comment


    • #3
      Did you look at the options in samtools view? Try

      Code:
      samtools view -bSh test_sequence.sam > test_sequence.bam

      Comment


      • #4
        [I]
        Originally posted by swbarnes2 View Post
        Did you look at the options in samtools view? Try

        Code:
        samtools view -bSh test_sequence.sam > test_sequence.bam
        [I/]

        Hi, Yes I get the same result.

        Comment


        • #5
          It's likely that the file somehow got corrupted. Just delete it and remake it.

          Comment


          • #6
            the header as you've posted it -

            @SQ SN:chr10 LN:135534747
            @SQ SN:chr11 LN:135006516
            @SQ SN:chr11_gl000202_random LN:40103
            @SQ SN:chr12 LN:133851895
            @SQ SN:chr13 LN:115169878
            @SQ SN:chr14 LN:107349540
            @SQ SN:chr15 LN:102531392
            @SQ SN:chr16 LN:90354753
            @SQ SN:chr17_ctg5_hap1 LN:1680828
            @SQ SN:chr17 LN:81195210

            lacks a @HD header line and @RG read group lines, so I wouldn't expect this to work with samtools.
            I don't know why you would get such an incomplete header with bwa though.

            Did you provide the full sam header and , if so, how did you generate it ?

            Comment


            • #7
              Originally posted by wolma View Post
              the header as you've posted it -

              @SQ SN:chr10 LN:135534747
              @SQ SN:chr11 LN:135006516
              @SQ SN:chr11_gl000202_random LN:40103
              @SQ SN:chr12 LN:133851895
              @SQ SN:chr13 LN:115169878
              @SQ SN:chr14 LN:107349540
              @SQ SN:chr15 LN:102531392
              @SQ SN:chr16 LN:90354753
              @SQ SN:chr17_ctg5_hap1 LN:1680828
              @SQ SN:chr17 LN:81195210

              lacks a @HD header line and @RG read group lines, so I wouldn't expect this to work with samtools.
              I don't know why you would get such an incomplete header with bwa though.

              Did you provide the full sam header and , if so, how did you generate it ?
              I used the bwa command:

              bwa mem -aM -t 8 hg19bwaidx eG2013-072pf_RR_FAM-SJDM-3604.fastq > testX_sequence.sam

              with and without the -aM flags. and I also tried bwa samse and obtained the same header.

              Comment


              • #8
                The @HD line isn't actually required, though it's normally there. Read groups (@RG lines) are also not required and, in fact, are normally only used if you're calling variants or want to keep some metainfo around. In fact, you don't actually need a header in a BAM file at all, though that only makes sense with unmapped reads.

                The "[bam_header_read] invalid BAM binary header (this is not a BAM file)." only occurs if the file actually isn't a BAM file, since what causes this is if the first 4 decompressed bytes aren't equal to "BAM\001" or if samtools can't read up to 4 bytes (or if it somehow reads more, though I'm not sure how that'd happen since it's just reading into a buffer and returning up to the first 4 bytes). Often this error occurs if the file somehow got corrupted or is actually a SAM file with the wrong extension.

                Comment


                • #9
                  I just found the following on github:

                  Using both the binary distribution of 0.1.18 and the 0.1.19+ (4dde8f5) source compiled on Mac OS 10.8.3, I see the following behavior with multiple valid files: $ samtools view -bS file.sam -o file...




                  Your command line takes advantage of GNU extensions to the behaviour of getopt() that allow options and arguments (e.g. filenames) to be intermingled. Mac OS X provides the BSD getopt() which implements only the standard behaviour with all options preceding the arguments.

                  The easy solution is to write your samtools command portably, with all the options at the start:

                  samtools view -bS -o file.bam file.sam

                  The alternative would be to compile samtools yourself using GNU getopt. However this is likely more bother than it's worth, and you would end up with a samtools command that behaves differently (albeit more conveniently!) from the other commands on your OS X machine.


                  i therefore retried with the following:

                  samtools view -bS -o testX_sequence.bam testX_sequence.sam

                  which at the moment seems to have worked.

                  Comment


                  • #10
                    That's actually not the source of your problem, since "samtools view -bS test_sequence.sam > test_sequence.bam" wouldn't run into that issue. Remaking the BAM file just deletes it and replaces it with a non-corrupt version, as I had previously suggested.

                    Comment


                    • #11
                      The error message "[bam_header_read] invalid BAM binary header (this is not a BAM file)"
                      is generated when "samopen" calls "bam_header_read". When bam_header_read retrieves the first four bytes from the file and either gets less than 4 Bytes or gets somthing different from "BAM\001", this kind of error is thrown.

                      Seems as if the file is corrupt.

                      Comment


                      • #12
                        but if you were right, then why did ALL the previous attempts fail with the same error ?
                        mattbawn says he tried all of these:

                        samtools view -bS test_sequence.sam > test_sequence.bam
                        samtools view -bT wg.fa test_sequence.sam > test_sequence.bam
                        samtools view -bSh test_sequence.sam > test_sequence.bam

                        but now
                        samtools view -bS -o testX_sequence.bam testX_sequence.sam
                        seems to work.

                        The only difference I can see is that this time there is no shell-dependent redirection involved.

                        P.S.: you are right about the header specification. thanks for pointing that out.

                        Comment


                        • #13
                          @wolma: To whom are you replying?

                          My guess is that there's a bad sector on the drive that kept getting used. That's more likely than finding a broken shell these days.

                          BTW, a quick way to check if a BAM file has any chance of being valid is:

                          Code:
                          zcat foo.bam | head -c 4 | od -c
                          If you don't get the following, then samtools will fail:
                          Code:
                          0000000   B   A   M 001

                          Comment


                          • #14
                            personally, I favor the less cryptic python version:

                            Code:
                            import gzip
                            with gzip.open('test.bam','rb') as i:
                                print(i.read(4))
                            which prints:
                            b'BAM\x01'

                            but leaving that aside the appropriate test would then be to just try to write to yet another file using redirection, then to the same file using -o .
                            I haven't looked into the source code, but potentially it could also have to do with how samtools handles writing to files as opposed to writing to stdout. Maybe the later fails under some exotic combination of compiler/OS/zlib version ?

                            Comment


                            • #15
                              Originally posted by wolma View Post
                              I haven't looked into the source code, but potentially it could also have to do with how samtools handles writing to files as opposed to writing to stdout. Maybe the later fails under some exotic combination of compiler/OS/zlib version ?
                              Perhaps, it's always tough to rule out obscure issues like that. The underlying C code doesn't do anything particularly weird there, so I'd expect a lot of things to break in that case (but who knows).

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Essential Discoveries and Tools in Epitranscriptomics
                                by seqadmin




                                The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                                04-22-2024, 07:01 AM
                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 08:47 AM
                              0 responses
                              13 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              60 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              60 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              54 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X