  • Samtools bam sort and index errors

    I'm having issues with the sorting/indexing of a bam file. I just started working with samtools (and all bioinformatics) last week so I'm pretty new to this.

    First I generated a SAM alignment file using STAR. Then I converted that to a BAM file, and the conversion seemed to run correctly. But when I try to sort the BAM file, I get errors. Here are my commands and the resulting output:

    samtools sort alignment1.bam alignment1.sorted
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    Segmentation fault

    If I try to go straight to indexing, I get a similar error:

    samtools index alignment1.bam alignment1.bai
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_index_core] Invalid BAM header.[bam_index_build2] fail to index the BAM file.

    Can someone help me fix these issues?

    Thanks

  • #2
    For the first part see this thread (post #2): http://seqanswers.com/forums/showthread.php?t=21908

    Are you doing this using a virtual (or real) server?

    • #3
      Show your command for creating the BAM from the SAM.
      What version of samtools are you using?

      If you know the "od" command, try this ...
      od -x YOURFILENAME.bam | head -1

      output should be

      0000000 8b1f 0408 0000 0000 ff00 0006 4342 0002

      The message you are getting is from this code which is checking for the "BAM\001" characters ...

      magic_len = bam_read(fp, buf, 4);
      if (magic_len != 4 || strncmp(buf, "BAM\001", 4) != 0) {
          fprintf(stderr, "[bam_header_read] invalid BAM binary header (this is not a BAM file).\n");
          return 0;
      }

      Possible reasons are: the file is zero length, the file is actually a SAM file, or the file is corrupted.

      Can you view the first few bytes of the "bam" file in question ?
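
The check can be sketched in shell as two layers. A BAM file is BGZF (gzip) compressed, so the raw bytes that "od" shows start with the gzip magic 1f 8b, while the "BAM\001" magic that the C code tests for only appears after decompression. The helper name looks_like_bam is made up for illustration and is not part of samtools; the sketch assumes gzip, od, and tr are available.

```shell
# Hypothetical helper (not part of samtools): returns 0 if FILE looks like a BAM file.
looks_like_bam() {
    file=$1
    # A missing or zero-length file fails immediately.
    [ -s "$file" ] || { echo "empty or missing: $file" >&2; return 1; }
    # Layer 1: the raw bytes must start with the gzip magic 1f 8b.
    raw=$(od -An -tx1 -N2 "$file" | tr -d ' \n')
    [ "$raw" = "1f8b" ] || { echo "not gzip/BGZF compressed: $file" >&2; return 1; }
    # Layer 2: the decompressed stream must start with "BAM\001" (42 41 4d 01).
    magic=$(gzip -dc "$file" 2>/dev/null | od -An -tx1 -N4 | tr -d ' \n')
    [ "$magic" = "42414d01" ] || { echo "no BAM magic after decompression: $file" >&2; return 1; }
    return 0
}
```

Usage might look like "looks_like_bam alignment1.bam && samtools sort alignment1.bam alignment1.sorted", so that sorting only starts when the file passes both checks. A zero-length file and a plain-text SAM file are both rejected, each with a different message.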
      Last edited by Richard Finney; 06-13-2013, 09:27 AM.

      • #4
        Originally posted by GenoMax:
        Are you doing this using a virtual (or real) server?
        I'm not entirely sure. I'm accessing a computer that I think has 24 cores using the ssh command in Unix, and using that to run my programs. My program files and data are stored in what I think is probably a virtual server (to access it directly from my macbook, not using Unix, I use the Go > Connect to server function in the Finder).

        Show your command for creating the BAM from the SAM.
        nohup samtools view -bS Aligned.out.sam > alignment1.bam &>secondtrybam.out&

        I tried the od command and got this:

        od -x alignment1.bam | head -1
        0000000

        My program file is : samtools-0.1.19

        Can you view the first few bytes of the "bam" file in question ?
        I just tried to use vi, more, and less to view and there doesn't seem to be anything in the file. Is there something wrong with my SAM>BAM conversion?

        Thank you

        • #5
          When I checked the SAM file, it definitely seems correct. The first several lines begin with @ and seem to refer to the chromosomes mapped, and after that I see line after line of reads.

          • #6
            What does "&>" do?

            Try running just this ...
            samtools view -bS Aligned.out.sam > alignment1.bam

            What happens ?
            What is the size of alignment1.bam ?

            • #7
              Since nohup makes the program run in the background and not quit if I have to log out of my laptop, I thought "&>" changed the nohup.out file to the name I assigned it, but I think that may have changed the command somehow.
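
For reference, in bash "&>FILE" is shorthand for ">FILE 2>&1", so it sends both stdout and stderr to FILE rather than renaming nohup.out. A minimal sketch of keeping the data and the messages apart is below. The function fake_converter and the file names are stand-ins invented for illustration so the sketch runs without samtools.

```shell
# Stand-in for "samtools view -bS Aligned.out.sam" (hypothetical, so the sketch
# runs anywhere): it writes "data" to stdout and a message to stderr.
fake_converter() {
    printf 'BINARY-DATA\n'
    printf '[samopen] SAM header is present\n' >&2
}

demo_dir=$(mktemp -d)

# stdout (the data) goes to the .bam file, stderr (the messages) to a separate log.
fake_converter > "$demo_dir/demo.bam" 2> "$demo_dir/demo.log"

# With the real tool, the same pattern under nohup would look like:
#   nohup samtools view -bS Aligned.out.sam > alignment1.bam 2> view.log &
```

When both stdout and stderr are redirected like this, nohup has nothing left to write to nohup.out, so the log ends up under the chosen name and the BAM file receives only the binary data.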

              The original bam file that was giving me trouble is 2 kb.

              The new bam file is 622 mb and counting. When I entered the simpler command I got:

              samtools view -bS Aligned.out.sam > alignment1again.bam
              [samopen] SAM header is present: 25 sequences.
              []

              and it's been running since. I think this is probably going to fix it! Thank you!

              The SAM file is about 50 GB. What can I expect the BAM file size to be at the end?

              • #8
                I checked the size of the secondtrybam.out file that I used &> on, and it's 13.04 gb. I think that the &> directed the binary to write to the .out file rather than the .bam file. I think that's the problem I had.
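
That reading matches how the shell handles redirections: they are applied left to right, and in bash "&>FILE" means ">FILE 2>&1", so the later "&>" replaces the earlier ">" as the destination for stdout. The first file is still created, but nothing is written to it. The effect can be reproduced with a small sketch. The function emit and the file names are stand-ins invented for illustration, and "&>" is written in its portable long form.

```shell
# Stand-in for the converter (hypothetical): data on stdout, a message on stderr.
emit() {
    printf 'BINARY-DATA\n'
    printf 'log message\n' >&2
}

work=$(mktemp -d)

# Same shape as "cmd > alignment1.bam &> secondtrybam.out", with bash's "&>FILE"
# spelled out as ">FILE 2>&1".
emit > "$work/first.bam" > "$work/second.out" 2>&1

# first.bam was created by the first redirection but received nothing.
bam_bytes=$(wc -c < "$work/first.bam" | tr -d ' ')
# second.out received both the data and the log message.
out_lines=$(wc -l < "$work/second.out" | tr -d ' ')
echo "first.bam: $bam_bytes bytes, second.out: $out_lines lines"
```

In bash or sh this leaves first.bam empty and puts both lines in second.out, which is the same pattern as an empty alignment1.bam next to a very large secondtrybam.out.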
