  • [Bowtie2] CIGAR string calculation.

    Hi All,

    The SAM output gives the 1-based leftmost mapping POSition, i.e. the reference coordinate of the first matching base. Is it possible to calculate the rightmost mapping position on the reference as well? If so, how? Which CIGAR operations should I add up, and which should I leave out?

    Op BAM Description
    M 0 alignment match (can be a sequence match or mismatch)
    I 1 insertion to the reference
    D 2 deletion from the reference
    N 3 skipped region from the reference
    S 4 soft clipping (clipped sequences present in SEQ)
    H 5 hard clipping (clipped sequences NOT present in SEQ)
    P 6 padding (silent deletion from padded reference)
    = 7 sequence match
    X 8 sequence mismatch

  • #2
    Yes, you can do that. In fact, in the samtools C API there's a function (bam_calend) that does exactly that given a starting position and CIGAR string. The only CIGAR operations you have to worry about are 'M', '=', 'X', 'D', and 'N', since those are the ones that consume reference bases. For each of those, just increment the position by the length of the operation (so 30M would increment by 30). The other operations ('I', 'S', 'H', and 'P') don't advance the reference position, so skip them.

    Remember to subtract 1 from the result at some point, or else you'll end up 1 base off: end = POS + (sum of those lengths) - 1. (If you were working from a BAM this wouldn't be needed, since the stored coordinate is 0-based there, so start plus length already gives the correct end in 1-based coordinates.)
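As a rough illustration of the calculation described above, here is a minimal Python sketch. It assumes POS and CIGAR have already been parsed out of a SAM line; the function name `reference_end` and the example values are made up for illustration and are not part of samtools.

```python
import re

# CIGAR operations that consume reference bases (per the post above).
REF_CONSUMING = set("MDN=X")

# One CIGAR element is a length followed by a single operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_end(pos, cigar):
    """Return the 1-based, inclusive rightmost reference position.

    pos   -- 1-based leftmost position (the SAM POS field)
    cigar -- CIGAR string from the SAM record, e.g. "30M"
    """
    if cigar == "*":
        raise ValueError("CIGAR is unavailable for this record")
    ref_len = sum(
        int(length)
        for length, op in CIGAR_RE.findall(cigar)
        if op in REF_CONSUMING
    )
    # Subtract 1 because POS itself is the first covered base.
    return pos + ref_len - 1


print(reference_end(100, "30M"))              # 129
print(reference_end(100, "5S20M2I10M3D15M"))  # 147
```

In the second example only 20M, 10M, 3D, and 15M count (48 reference bases), so the alignment spans 100 to 147; the soft clip and the insertion are ignored.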

    • #3
      BTW, there's also a 'B' operation (value 9, or BAM_CBACK), which I've never actually seen and seems to have been intended for Complete Genomics data. You can likely ignore it, since it's never made its way into actual use.

      • #4
        OK, thanks! I'll do that. I do have another question that perhaps you can answer; otherwise I'll start a new thread.

        I've got paired-end Illumina data mapped against human hg19. When viewing the SAM output, how can I check whether a pair mapped to the forward strand or to the reverse strand of the hg19 genome sequence?

        • #5
          Is this from strand-specific (or "directional") data? If not, you can't determine the strand of the original fragment. If this is stranded data, it ends up depending on the prep that you did. Most of them that I've seen work such that the orientation of read #1 decides the strand. When in doubt, open things in IGV and just have a look at a couple genes, that'll always clarify things.

          • #6
            Originally posted by dpryan
            Is this from strand-specific (or "directional") data? If not, you can't determine the strand of the original fragment. If this is stranded data, it ends up depending on the prep that you did. Most of them that I've seen work such that the orientation of read #1 decides the strand. When in doubt, open things in IGV and just have a look at a couple genes, that'll always clarify things.
            As far as I know these are all cDNA sequences, with both forward and reverse reads. I was hoping I could see whether a read pair maps to the forward or the reverse strand of the hg19 genome. It matters because I want to look at a few + strand genes and a few - strand genes, and it would be handy if I could sort on that during my analysis.

            • #7
              You're best off just opening things in IGV and having a look at a couple genes. Then you'll know how the library prep was done and if you can use the 0x10 bit in the flag or not.
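For reference, the 0x10 bit mentioned above can be read straight off the FLAG column. The following is a small Python sketch, not a definitive recipe: the function name `read_orientation` is hypothetical, and it only reports how each read aligned. Whether that alignment strand reflects the strand of the original transcript still depends on the library prep, as discussed above.

```python
def read_orientation(flag):
    """Decode mate number and alignment strands from a SAM FLAG value.

    Returns (which_mate, read_strand, mate_strand).
    mate_strand is only meaningful for paired reads (bit 0x1 set).
    """
    if flag & 0x40:
        which = "read1"      # first segment in the template
    elif flag & 0x80:
        which = "read2"      # last segment in the template
    else:
        which = "unknown"    # neither mate bit is set
    read_strand = "-" if flag & 0x10 else "+"   # 0x10: read is reverse-complemented
    mate_strand = "-" if flag & 0x20 else "+"   # 0x20: mate is reverse-complemented
    return which, read_strand, mate_strand


# Four FLAG values commonly seen for properly paired reads.
for flag in (99, 147, 83, 163):
    print(flag, read_orientation(flag))
# 99 ('read1', '+', '-')
# 147 ('read2', '-', '+')
# 83 ('read1', '-', '+')
# 163 ('read2', '+', '-')
```

If the IGV check shows that read #1 determines the strand for your prep, you could then group pairs by the strand of read1 (and the opposite strand of read2).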
