  • Corrupt BAM file

    Hi all, I have come across a strange problem. I am extracting a single chromosome as a SAM file from a 1000 Genomes BAM using the following command, e.g.:

    samtools view -h ftp://ftp-trace.ncbi.nih.gov/1000gen...e.20101123.bam 11 > test.sam

    (Please note it's for the entire chromosome 11.)

    Now I want to convert test.sam into a BAM file, so I use the following command:
    samtools view -bS test.sam -o test1.bam

    The problem is in test1.bam: when I open it with gedit, the first few characters are

    BAM\A3>\00\00@HD

    and the very end of the file is again full of junk:

    T\00\00\00\00\00\001\00=C\DB\00\00\002\00\8D\ED~\00\00\003\00\95\CD \00\00\004\00d\C8d \00\00\005\00<\8C\C8
    \00\00\006\00;3
    \00\00\007\00gC| \00\00\008\00vV\B9\00\00\009\00\F7\BEj\00\00\0010\00\9B\00\00\0011\004 \00\00\0012\00\F7j\FA\00\00\0013\00VZ\DD\00\00\0014\00$f\00\00\0015\00@\81\00\00\0016\00A\B4b\00\00\0017\00\CA\F0\D6\00\00\0018\00@]\A7\00\00\0019\00\97<\86\00\00\0020\00p\B1\C1\00\00\0021\00gg\DE\00\00\0022\00v\D8\00\00\00X\00\A0=A \00\00\00Y\00\FE\F7\89\00\00\00MT\00\B9@\00\00 \00\00\00GL000207.1\00\A6\00\00 \00\00\00GL000226.1\00\A0:\00\00 \00\00\00GL000229.1\00\C9M\00\00 \00\00\00GL000231.1\00\FAj\00\00 \00\00\00GL000210.1\00"l\00\00 \00\00\00GL000239.1\00 \84\00\00 \00\00\00GL000235.1\00\AA\86\00\00 \00\00\00GL000201.1\004\8D\00\00 \00\00\00GL000247.1\00F\8E\00\00 \00\00\00GL000245.1\00+\8F\00\00 \00\00\00GL000197.1\007\91\00\00 \00\00\00GL000203.1\00z\92\00\00 \00\00\00GL000246.1\00
    \95\00\00 \00\00\00GL000249.1\00f\96\00\00 \00\00\00GL000196.1\00\98\00\00 \00\00\00GL000248.1\00j\9B\00\00 \00\00\00GL000244.1\00\F9\9B\00\00 \00\00\00GL000238.1\00\9C\00\00 \00\00\00GL000202.1\00\A7\9C\00\00 \00\00\00GL000234.1\00S\9E\00\00 \00\00\00GL000232.1\00̞\00\00 \00\00\00GL000206.1\00)\A0\00\00 \00\00\00GL000240.1\00ͣ\00\00 \00\00\00GL000236.1\00Σ\00\00 \00\00\00GL000241.1\00\A8\A4\00\00 \00\00\00GL000243.1\00M\A9\00\00 \00\00\00GL000242.1\00\AA\00\00 \00\00\00GL000230.1\00\AB\AA\00\00 \00\00\00GL000237.1\00+\B3\00\00 \00\00\00GL000233.1\00u\B3\00\00 \00\00\00GL000204.1\00\9E=\00 \00\00\00GL000198.1\00\E5_\00 \00\00\00GL000208.1\00j\00 \00\00\00GL000191.1\00\C1\9F\00 \00\00\00GL000227.1\00v\F5\00 \00\00\00GL000228.1\00`\F8\00 \00\00\00GL000214.1\00\F6\00 \00\00\00GL000221.1\00_\00 \00\00\00GL000209.1\00\C1m\00 \00\00\00GL000218.1\00{u\00 \00\00\00GL000220.1\00
    x\00 \00\00\00GL000213.1\00\8F\81\00 \00\00\00GL000211.1\00\A6\8A\00 \00\00\00GL000199.1\00\92\97\00 \00\00\00GL000217.1\00u\A0\00 \00\00\00GL000216.1\00\A1\00 \00\00\00GL000215.1\00\A2\00 \00\00\00GL000205.1\00\FC\A9\00 \00\00\00GL000219.1\00\FE\BB\00 \00\00\00GL000224.1\00\ED\BD\00 \00\00\00GL000223.1\00\E7\C0\00 \00\00\00GL000195.1\00p\CA\00 \00\00\00GL000212.1\00\EA\D9\00 \00\00\00GL000222.1\00\ED\D9\00 \00\00\00GL000200.1\00\9B\DA\00 \00\00\00GL000193.1\00]\E5\00 \00\00\00GL000194.1\00\ED\EB\00 \00\00\00GL000225.1\00\E58\00 \00\00\00GL000192.1\00\A8Z\00

    Can anyone please tell me why this may be occurring?

    Thanks in advance.

    Ashwin

  • #2
    Your first command, "samtools view -h", converts a compressed binary file into an uncompressed, human-readable SAM file.

    "samtools view -bS" turns that human-readable file back into a compressed binary, which is no longer human-readable. So it makes sense that it looks like garbage; it's not supposed to be text.

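    One way to confirm that a file is an ordinary BAM rather than a corrupt one, without opening it in a text editor, is to check its magic bytes. The sketch below is only an illustration: looks_like_bam is a made-up helper name (not a samtools command), and it assumes head, od, tr and gzip are installed. It relies on two facts from the SAM/BAM specification: a BAM is stored as gzip-compatible BGZF blocks, so the file starts with the bytes 1f 8b, and the decompressed stream starts with the four bytes "BAM\1".

```shell
# Hypothetical helper (not a samtools command): succeed if FILE looks like a BAM.
# It checks only the leading magic bytes and does not validate the rest of the file.
looks_like_bam() {
    # Outer layer: BGZF is gzip-compatible, so the first two bytes are 1f 8b.
    outer=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$outer" = "1f8b" ] || return 1
    # Inner layer: the decompressed stream starts with 'B' 'A' 'M' 0x01.
    inner=$(gzip -dc < "$1" 2>/dev/null | head -c 4 | od -An -tx1 | tr -d ' \n')
    [ "$inner" = "42414d01" ]
}

# Example use (test1.bam is the file from the first post):
#   looks_like_bam test1.bam && echo "binary BAM, as expected"
```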
    I don't think you really mean to be uncompressing the whole chr 11 file locally onto your computer. I think you want to just transfer the .bam file as it is, and use "samtools view" with a chromosome and region if you want to eyeball a subset of individual lines. Otherwise, leave it as a BAM unless you are quite sure that your downstream tools can't handle the .bam format. Most software (samtools, Picard, GATK, etc.) works fine with compressed BAM files. No one wants to have to uncompress everything into .sam format to work with it, so those programs are designed to work with nice, compressed .bams.
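
    The workflow suggested here could look roughly like the sketch below. subset_bam is a hypothetical wrapper name and its three arguments are placeholders; the samtools options it uses (view -b, -o, and index) are standard. It assumes the remote BAM has its .bai index sitting next to it, since samtools needs an index to answer region queries.

```shell
# Hypothetical wrapper: copy one region of a (possibly remote) BAM into a local BAM.
#   $1 = input BAM path or URL
#   $2 = region, such as 11 or 11:1000000-1001000
#   $3 = output BAM path
subset_bam() {
    # -b keeps the output as compressed BAM; -o names the output file.
    samtools view -b -o "$3" "$1" "$2" || return 1
    # Index the result so later region queries on it are fast.
    samtools index "$3"
}

# Example use (BAM_URL stands for the full 1000 Genomes URL, which is truncated in the first post):
#   subset_bam "$BAM_URL" 11 chr11.bam
#   samtools view chr11.bam 11:1000000-1001000 | head    # look at a few reads as text
```

    Viewing a small region as text this way avoids ever writing the whole chromosome out as SAM.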


    • #3
      Thank you

      Thanks for the insight; this has sorted my problem.
