  • [Bowtie2] CIGAR string calculation.

    Hi All,

    The SAM output gives the 1-based leftmost mapping POSition of the first matching base of the reference. Is it possible to calculate the rightmost mapping position on the reference as well? If so, how? Which values should I add, and which should I subtract?

    Op BAM Description
    M 0 alignment match (can be a sequence match or mismatch)
    I 1 insertion to the reference
    D 2 deletion from the reference
    N 3 skipped region from the reference
    S 4 soft clipping (clipped sequences present in SEQ)
    H 5 hard clipping (clipped sequences NOT present in SEQ)
    P 6 padding (silent deletion from padded reference)
    = 7 sequence match
    X 8 sequence mismatch

  • #2
    Yes, you can do that. In fact, in the samtools C API there's a function (bam_calend) that does exactly that given a starting position and CIGAR string. The only CIGAR operations you have to worry about are 'M', '=', 'X', 'D', and 'N'. In each of those cases, just increment the position by the length of the operation (so 30M would increment by 30).

    Remember to subtract 1 from the result at some point, or you'll end up 1 base off. (If you were dealing with a BAM file this wouldn't be needed: the stored coordinate is 0-based there, so the sum is already the correct end position in 1-based coordinates.)
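
A minimal Python sketch of the calculation described above, working directly on the POS and CIGAR fields of a SAM line. The function name `reference_end` is made up for illustration and is not part of samtools; the sketch assumes a well-formed CIGAR string and ignores the rarely seen 'B' operation.

```python
import re

# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_end(pos, cigar):
    """Return the 1-based, inclusive rightmost reference position.

    pos   -- 1-based leftmost position (SAM POS field)
    cigar -- CIGAR string (SAM CIGAR field), e.g. "30M"
    """
    if cigar == "*":
        raise ValueError("no CIGAR available for this record")
    # Sum the lengths of the operations that consume reference bases.
    span = sum(int(length) for length, op in CIGAR_RE.findall(cigar)
               if op in REF_CONSUMING)
    # Subtract 1 because POS itself is the first covered base.
    return pos + span - 1

print(reference_end(100, "30M"))             # 129
print(reference_end(100, "5S20M2I10M3D5M"))  # 137
```

In the second call only 20M, 10M, 3D and 5M count (38 reference bases in total); the soft clip and the insertion do not move the reference position.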


    • #3
      BTW, there's also a 'B' operation (value 9, or BAM_CBACK), which I've never actually seen and seems to have been intended for Complete Genomics data. You can likely ignore it, since it's never made its way into actual use.


      • #4
        OK, thanks! I'll do that. I have another question that perhaps you can answer; otherwise I'll start a new thread.

        I've got paired-end Illumina data mapped against human hg19. When viewing the SAM output, how can I check whether a pair mapped to the forward or the reverse strand of the hg19 genome sequence?


        • #5
          Is this from strand-specific (or "directional") data? If not, you can't determine the strand of the original fragment. If this is stranded data, it ends up depending on the prep that you did. Most of them that I've seen work such that the orientation of read #1 decides the strand. When in doubt, open things in IGV and just have a look at a couple genes, that'll always clarify things.
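
For stranded data, the "orientation of read #1 decides the strand" rule could be sketched as below. The function `fragment_strand` and its parameter `read1_same_as_transcript` are hypothetical names used only for illustration; whether read 1 matches the transcript strand or its complement depends on the library prep, so that parameter has to be set from what you see in your own data.

```python
def fragment_strand(flag, read1_same_as_transcript=True):
    """Infer '+' or '-' for the original fragment from one read's SAM flag.

    Assumes a strand-specific library. For unstranded data the result
    is meaningless.
    """
    reverse = bool(flag & 0x10)   # this read aligned to the reverse strand
    is_read2 = bool(flag & 0x80)  # this read is the second in the pair
    # Read 2 comes from the opposite end of the fragment, so flip its
    # orientation to express it in terms of read 1.
    read1_reverse = (not reverse) if is_read2 else reverse
    # Flip once more for preps where read 1 is antisense to the transcript.
    transcript_reverse = read1_reverse if read1_same_as_transcript else not read1_reverse
    return "-" if transcript_reverse else "+"

print(fragment_strand(99))   # read 1, forward
print(fragment_strand(147))  # read 2, reverse (mate of the read above)
```

Both reads of a properly oriented pair give the same answer, which is what you would want when sorting pairs by strand.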


          • #6
            Originally posted by dpryan:
            Is this from strand-specific (or "directional") data? If not, you can't determine the strand of the original fragment. If this is stranded data, it ends up depending on the prep that you did. Most of them that I've seen work such that the orientation of read #1 decides the strand. When in doubt, open things in IGV and just have a look at a couple genes, that'll always clarify things.
            As far as I know these are all cDNA sequences, forward and reverse. I was hoping I could see whether a read pair maps to the forward or the reverse strand of hg19. It matters because I want to look at a few + strand genes and a few - strand genes, and it would be handy if I could sort on that during my analysis.


            • #7
              You're best off just opening things in IGV and having a look at a couple genes. Then you'll know how the library prep was done and if you can use the 0x10 bit in the flag or not.
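
As a rough complement to looking in IGV, you could also tally the 0x10 bit for the reads overlapping a gene of known strand. The sketch below assumes plain SAM text lines (for example from `samtools view` restricted to one gene's region); the helper name `orientation_counts` is invented for this example.

```python
from collections import Counter

def orientation_counts(sam_lines):
    """Count mapped reads by (mate, strand) using the SAM FLAG field."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):   # skip header lines
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x4:             # skip unmapped reads
            continue
        if flag & 0x40:
            mate = "read1"
        elif flag & 0x80:
            mate = "read2"
        else:
            mate = "unpaired"
        strand = "-" if flag & 0x10 else "+"
        counts[(mate, strand)] += 1
    return counts
```

If nearly all read 1 alignments over a + strand gene fall on one strand, the library is stranded and the 0x10 bit is informative; a roughly even split suggests an unstranded prep.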
