Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extract aligned sequence coordinates from SAM or BAM file

    Hi,
    I have a Illumina sequence reads which I mapped to HG19 using BWA. Resulting BAM file is indexed and sorted. Now what I want is to identify the region of alignment. Let me give an example, on chromosome 1 starting from base 1 to 10000, there are many reads aligned and then there is no any read aligned from 10000 to 20000 and then again from 20000 to 30000 there are many reads aligned and so on...

    I want an output like
    chr1 1 10000
    chr1 20000 30000
    etc...etc...

    I searched for tools doing similar functions but didn't find anything.
    Any help will be really appreciated.

    Thanks.

  • #2
    Have a look at a variation of:

    samtools view -L bedfile.bed x.bam

    Comment


    • #3
      Thnx colindaven,

      I looked into samtools view -L option but that is not what I am looking for.

      Let me explain the problem,
      I have targeted DNA sequencing data from Illumina. After aligning to reference using BWA I got BAM files. From BAM files I want to extract a continuous stretch of reference region to which multiple reads are aligned. Assume that reads are aligned in following way:
      read1 aligned to chr1:1-50
      read2 chr1:10-60
      read3 chr1:30-80
      read4 chr1:150-200
      read5 chr1:200-250
      read6 chr1:250-300
      read7 chr1:350-400
      and so on....

      Now I want continuous stretch of reference to which any read is aligned. Output should be:
      chr1:1-80
      chr1:150-300
      chr1:350-400
      and so on...

      This output tells us that there are several number of reads aligned to chr1 position 1 to 80 in continuation of each other and then no any read aligned to position 80 to 150 and so on...

      Is there an easy way to do so..??..

      Thanks.

      Comment


      • #4
        SIMPLE VERSION ... using samtools depth and awk ...
        #ploc = previous location, pchr=previous chromsome
        #Set SAMT and BAMF to samtools and your bamfile.
        export SAMT=/h1/finneyr/samtools-0.1.18/samtools
        export BAMF=98023.bam
        $SAMT depth $BAMF | awk '{ if ($2!=(ploc+1)){if (ploc!=0){printf("%s %d-%d\n",$1,s,ploc);}s=$2} ploc=$2; }


        More complicated version that handles chr1->chr2 transition and flushes at the end. (I think, this is not completely debugged and and is just a one-off. Not all corner conditions maybe addressed.)

        $SAMT depth $BAMF | \
        awk '
        BEGIN{firsttime=1;}
        {
        if (pchr!=$1) { if (firsttime==1) { firsttime = 0;} else { printf("%s %d-%d\n",pchr,s,ploc);}s=$2}
        else { if ($2!=(ploc+1)){if (ploc!=0){printf("%s %d-%d\n",$1,s,ploc);}s=$2} }
        ploc=$2; pchr=$1
        }
        END{ printf("%s %d-%d\n",pchr,s,ploc);}
        '

        CAVEAT: samtools depth doesn't do cigar 'N' quite right so this won't work for RNA in the best way.
        Last edited by Richard Finney; 08-16-2012, 06:58 AM.

        Comment


        • #5
          Originally posted by Richard Finney View Post
          SIMPLE VERSION ... using samtools depth and awk ...
          #ploc = previous location, pchr=previous chromsome
          #Set SAMT and BAMF to samtools and your bamfile.
          export SAMT=/h1/finneyr/samtools-0.1.18/samtools
          export BAMF=98023.bam
          $SAMT depth $BAMF | awk '{ if ($2!=(ploc+1)){if (ploc!=0){printf("%s %d-%d\n",$1,s,ploc);}s=$2} ploc=$2; }


          More complicated version that handles chr1->chr2 transition and flushes at the end. (I think, this is not completely debugged and and is just a one-off. Not all corner conditions maybe addressed.)

          $SAMT depth $BAMF | \
          awk '
          BEGIN{firsttime=1;}
          {
          if (pchr!=$1) { if (firsttime==1) { firsttime = 0;} else { printf("%s %d-%d\n",pchr,s,ploc);}s=$2}
          else { if ($2!=(ploc+1)){if (ploc!=0){printf("%s %d-%d\n",$1,s,ploc);}s=$2} }
          ploc=$2; pchr=$1
          }
          END{ printf("%s %d-%d\n",pchr,s,ploc);}
          '

          CAVEAT: samtools depth doesn't do cigar 'N' quite right so this won't work for RNA in the best way.
          Hi Richard Finney,
          Thank you for very helpful code. First code worked good for me.

          Comment


          • #6
            I think BEDtools should also fit your demand.
            1. convert the bam into bed by bam2bed
            2. mergeBed.

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Essential Discoveries and Tools in Epitranscriptomics
              by seqadmin


              The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist on Modified Bases...
              Yesterday, 07:01 AM
            • seqadmin
              Current Approaches to Protein Sequencing
              by seqadmin


              Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
              04-04-2024, 04:25 PM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 04-11-2024, 12:08 PM
            0 responses
            39 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 04-10-2024, 10:19 PM
            0 responses
            41 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 04-10-2024, 09:21 AM
            0 responses
            35 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 04-04-2024, 09:00 AM
            0 responses
            55 views
            0 likes
            Last Post seqadmin  
            Working...
            X