
  • help recovering data from bam w/ bad header

    I have a bam file that I need to work with to reproduce an issue for some troubleshooting. This bam file was left by a student in a collaborator's lab and we pretty much only have this bam file (no upstream or downstream derivatives of the file). Samtools view gives this error message:

    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [main_samview] fail to read the header from "7_2.bam".

    Picard's ValidateSamFile produces only the message:

    ERROR: Read groups is empty
    SAMFormatException on record 01

    along with STDERR that looks like the java code is erroring out. Bamtools stats gives this error:

    bamtools stats ERROR: could not open input BAM file(s)... Aborting.


    The file is about the size I'd expect for the mapping that was done, and the person who created it was apparently able to use it. It is a rather old copy that has been passed around a bit, but if it had been truncated during a transfer I'd have expected different errors (I think...).

    Is there some way I can dump just the data from a bam file without dealing with the bad header? Or is there some way I can read in a generic 'dummy' header over the bad one to save the data?


    Thanks,
    John

  • #2
    How's your C programming? I suspect you're going to have to code something with htslib to figure out (A) where the problem is and (B) get around it as best as possible.
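
    A first look at step (A) doesn't strictly need C. A BAM file is BGZF-compressed (a series of gzip blocks), and the first four decompressed bytes should be "BAM\1", followed by a 32-bit little-endian header text length. Below is a minimal Python sketch that checks those points using only the standard library. The function name inspect_bam_start is made up for illustration, and the BGZF check assumes the 'BC' subfield is the first gzip extra subfield, which is how these files are normally written but is not guaranteed.

```python
import gzip
import struct
import zlib


def inspect_bam_start(path):
    """Report what the first bytes of a supposed BAM file look like.

    Hypothetical helper for illustration; it only inspects the start of
    the file and does not attempt any repair.
    """
    report = {
        "gzip_magic": False,        # file starts with 1f 8b
        "bgzf_extra_field": False,  # first gzip extra subfield is 'BC', length 2
        "bam_magic": False,         # decompressed data starts with b"BAM\x01"
        "header_text_length": None, # l_text from the BAM header, if readable
    }

    with open(path, "rb") as fh:
        raw = fh.read(18)

    # Every gzip member (and so every BGZF block) starts with 1f 8b.
    report["gzip_magic"] = raw[:2] == b"\x1f\x8b"
    if not report["gzip_magic"]:
        return report

    # BGZF sets the FEXTRA flag (0x04) and stores a 'BC' subfield of length 2.
    # Assumption: 'BC' is the first subfield, at byte offset 12.
    if len(raw) >= 16 and raw[3] & 0x04:
        report["bgzf_extra_field"] = (
            raw[12:14] == b"BC" and raw[14:16] == b"\x02\x00"
        )

    # Decompress just enough to see the BAM magic and the header text length.
    try:
        with gzip.open(path, "rb") as gz:
            head = gz.read(8)
    except (OSError, EOFError, zlib.error):
        return report  # compressed stream is unreadable at the very start

    report["bam_magic"] = head[:4] == b"BAM\x01"
    if report["bam_magic"] and len(head) == 8:
        report["header_text_length"] = struct.unpack("<i", head[4:8])[0]
    return report
```

    Running print(inspect_bam_start("7_2.bam")) would show which check fails first. If gzip_magic is False, the file may not be a compressed BAM at all; if the gzip checks pass but bam_magic is False, the damage is probably in the first block.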



    • #3
      I've no experience with C coding; I mainly stick to scripting languages (Perl and Python). But I'll take a look at the htslib methods and see if I can muddle through.

      Thanks,
      John



      • #5
        I'll try the GATK ValidateSamFile and see what it says. GATK tools usually do give helpful error messages and I haven't tried that one yet.

        Thanks,
        John
