  • lre1234
    Senior Member
    • Aug 2011
    • 110

    Potentially corrupted BAM files

    Hi all,
    I have a question about detecting possible corruption in BAM files. Essentially, we had a serious hard drive failure across our network. While we have been able to restore our hard drives and retrieve our data, we have noticed issues with many files. For example, in some text files, some information has been changed to non-ASCII data, or lines have been deleted. We do not yet know the full extent of the issues but are trying to figure it out. One potential issue is that we have many thousands of mapped BAM files. We do not know whether these files have been corrupted in any way, and we are interested in finding out. One approach (I think) would be to try to convert them back to SAM files; if the conversion breaks, the file is somehow corrupted. But I was wondering if anyone knew of some other method that can check the files.

    I should note that we do have most of this backed up on a tape storage system that we could go back to, but restoring our whole system from it would take an exceedingly long time. So we would like to find the 'bad' files and replace only those. Also, I am not a software engineer, so hopefully this all makes sense.
  • Richard Finney
    Senior Member
    • Feb 2009
    • 701

    #2
    zcat the BAM file and check for errors. For example:

    rfinney@pigdog:~$ file test.bam
    test.bam: gzip compressed data, extra field

    rfinney@pigdog:~$ samtools index test.bam

    rfinney@pigdog:~$ cat test.bam | tr " " "x" > test2.bam # corrupt the file test2.bam

    rfinney@pigdog:~$ samtools index test2.bam
    [E::bam_hdr_read] invalid BAM binary header
    samtools index: "test2.bam" is corrupted or unsorted

    rfinney@pigdog:~$ zcat test.bam > /dev/null

    rfinney@pigdog:~$ zcat test2.bam > /dev/null
    gzip: test2.bam: invalid compressed data--crc error
    gzip: test2.bam: invalid compressed data--length error
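To apply the same check to thousands of files, the idea above could be wrapped in a small loop. The following is only a sketch under a few assumptions: the function name `find_corrupt_bams` is made up for illustration, the files are assumed to end in `.bam`, and `gzip -t` (test integrity without writing output) stands in for `zcat file > /dev/null`. Note that this only validates the compression layer (CRC and length); it does not inspect the BAM records themselves.

```shell
# Hypothetical helper: print the path of every .bam file under a
# directory that fails a gzip integrity test.
find_corrupt_bams() {
    dir="$1"
    find "$dir" -type f -name '*.bam' | while IFS= read -r bam; do
        # gzip -t decompresses in memory and checks CRC and length;
        # a non-zero exit status means the compressed data is damaged.
        if ! gzip -t -- "$bam" 2>/dev/null; then
            printf '%s\n' "$bam"
        fi
    done
}
```

Something like `find_corrupt_bams /path/to/alignments > bad_bams.txt` would then produce a list of candidates to restore from tape. Recent samtools versions also have a `quickcheck` subcommand that checks the header and the end-of-file marker; it is quicker but shallower than decompressing the whole file.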
    Last edited by Richard Finney; 09-01-2016, 11:46 AM.
