  • [Bowtie2] CIGAR string calculation.

    Hi All,

    The SAM output gives POS, the 1-based leftmost mapping position of the first matching base on the reference. Is it possible to calculate the rightmost mapping position on the reference as well? If yes, how? Which CIGAR operations should I add up, and what should I subtract?

    Op  BAM  Description
    M   0    alignment match (can be a sequence match or mismatch)
    I   1    insertion to the reference
    D   2    deletion from the reference
    N   3    skipped region from the reference
    S   4    soft clipping (clipped sequences present in SEQ)
    H   5    hard clipping (clipped sequences NOT present in SEQ)
    P   6    padding (silent deletion from padded reference)
    =   7    sequence match
    X   8    sequence mismatch

  • #2
    Yes, you can do that. In fact, in the samtools C API there's a function (bam_calend) that does exactly that given a starting position and CIGAR string. The only CIGAR operations you have to worry about are 'M', '=', 'X', 'D', and 'N', since those are the ones that consume bases on the reference. For each of those operations, just increment the position by the length of the operation (so 30M would increment by 30).

    Remember to subtract 1 from the result at some point (end = POS + summed lengths - 1), or else you'll end up 1 base off. If you were dealing with a BAM, this wouldn't be needed, since the coordinate is 0-based there and the result would then already be correct in 1-based coordinates.
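
The calculation described above can be sketched in a few lines of Python. This is only an illustration working on the SAM text fields, not the samtools implementation; the function name `end_position` is made up for this example, and the 'B' operation is deliberately not handled.

```python
import re

# CIGAR operations that consume reference bases, per the answer above.
REF_CONSUMING = set("MDN=X")

# One CIGAR element: a length followed by an operation character.
# 'B' is intentionally left out, so a CIGAR containing it is rejected.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def end_position(pos, cigar):
    """Return the 1-based rightmost reference position of an alignment.

    pos   -- the 1-based POS field from the SAM record
    cigar -- the CIGAR field from the SAM record, e.g. "30M"
    """
    ref_len = 0
    consumed_chars = 0
    for length, op in CIGAR_ELEMENT.findall(cigar):
        consumed_chars += len(length) + 1
        if op in REF_CONSUMING:
            ref_len += int(length)
    # Reject anything the pattern could not fully parse (including "*").
    if consumed_chars != len(cigar) or not cigar:
        raise ValueError("cannot parse CIGAR string: %r" % cigar)
    # Subtract 1 because POS is 1-based and the end is inclusive.
    return pos + ref_len - 1


print(end_position(100, "30M"))
print(end_position(100, "5S10M2I3D10M100N20M5H"))
```

Soft clips, hard clips, insertions and padding are parsed but add nothing to the reference length, which is why only five operations matter for the end coordinate.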


    • #3
      BTW, there's also a 'B' operation (value 9, or BAM_CBACK), which I've never actually seen and seems to have been intended for Complete Genomics data. You can likely ignore it, since it's never made its way into actual use.


      • #4
        OK, thanks! I'll do that. I do have another question that perhaps you can answer; otherwise I'll start a new thread.

        I've got paired-end Illumina data mapped against human hg19. When viewing the SAM output, how can I check whether a pair mapped to the forward strand or to the reverse strand of the hg19 genome sequence?


        • #5
          Is this from strand-specific (or "directional") data? If not, you can't determine the strand of the original fragment. If this is stranded data, it ends up depending on the prep that you did. Most of them that I've seen work such that the orientation of read #1 decides the strand. When in doubt, open things in IGV and just have a look at a couple genes, that'll always clarify things.


          • #6
            Originally posted by dpryan
            Is this from strand-specific (or "directional") data? If not, you can't determine the strand of the original fragment. If this is stranded data, it ends up depending on the prep that you did. Most of them that I've seen work such that the orientation of read #1 decides the strand. When in doubt, open things in IGV and just have a look at a couple genes, that'll always clarify things.
            As far as I know, these are all cDNA sequences, both forward and reverse reads. I was hoping I could see whether a read pair matches the forward or the reverse strand of the hg19 genome. It matters because I want to look at a few + stranded genes and - stranded genes, and it would be handy if I could sort on that during my analysis.


            • #7
              You're best off just opening things in IGV and having a look at a couple genes. Then you'll know how the library prep was done and if you can use the 0x10 bit in the flag or not.
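
For reference, the flag bits mentioned in this thread can be decoded as in the sketch below. The bit values (0x4 unmapped, 0x10 reverse strand, 0x80 second read in pair) come from the SAM format; the `fragment_strand` helper is hypothetical and hard-codes one assumption, namely that the orientation of read #1 decides the strand and read #2 maps to the opposite strand. Whether that assumption holds depends on the library prep, so check it in IGV first.

```python
FLAG_UNMAPPED = 0x4   # read itself is unmapped
FLAG_REVERSE = 0x10   # read aligned to the reverse strand
FLAG_READ2 = 0x80     # second read of the pair


def fragment_strand(flag):
    """Guess the strand of the original fragment from a SAM FLAG.

    ASSUMPTION: read #1 orientation gives the strand and read #2 is
    on the opposite strand. This is only valid for stranded preps
    that follow this convention. Returns "+", "-", or None if the
    read is unmapped.
    """
    if flag & FLAG_UNMAPPED:
        return None
    reverse = bool(flag & FLAG_REVERSE)
    if flag & FLAG_READ2:
        # Read #2 points the other way, so flip its orientation.
        reverse = not reverse
    return "-" if reverse else "+"


# Typical flags of properly paired reads: 99/147 and 83/163.
for flag in (99, 147, 83, 163):
    print(flag, fragment_strand(flag))
```

If the prep turns out to use the opposite convention, the returned strand simply needs to be inverted; for unstranded data the result carries no information about the original transcript.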
