
  • what's in bwa's .sai file

    Besides the content shown in the .sam file (the alignment of the best match and the number of suboptimal/all hits), the .sai file seems to contain some information about the suboptimal hits as well. Is it possible to look at the details of these hits?
    With the command "bwa samse -n INT", I can only get the positions where they mapped and the number of mismatches.

    For paired-end data, does "bwa sampe" consider the suboptimal hits? That is, the best-best pair may violate the distance constraint while a suboptimal-suboptimal or suboptimal-best pair falls within a reasonable distance.

  • #2
    The .sai file is the output of the aln command. It contains the suffix array coordinates of all the short reads that were loaded. For bwa, aligning a read is equivalent to searching for the suffix array interval of the substrings of the reference that match the read. Once the interval in the suffix array is known, the positions of the read can be recovered from it.
    If you want to know in detail how the bwa algorithm works, you can read "Fast and accurate short read alignment with Burrows-Wheeler transform" (Heng Li and Richard Durbin), which was published in Bioinformatics.
    It took me a couple of days to fully trace and understand the MAQ and BWA algorithms. ^ ^
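To make the suffix array interval idea concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions: it builds an explicit suffix array naively and finds the interval by binary search with exact matching, whereas BWA works on a BWT-based index and allows mismatches and gaps. The function names (`build_suffix_array`, `sa_interval`, `positions`) and the toy reference are made up for this example and are not part of BWA.

```python
def build_suffix_array(text):
    """Naive suffix array: start offsets of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text, sa, pattern):
    """Return the half-open range [lo, hi) of suffix array rows whose
    suffix starts with `pattern` (exact match only)."""
    m = len(pattern)

    # First row whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First row whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def positions(sa, interval):
    """Convert a suffix array interval into sorted reference offsets."""
    lo, hi = interval
    return sorted(sa[lo:hi])


reference = "GATTACAGATTACA"  # toy reference, 0-based coordinates
sa = build_suffix_array(reference)
interval = sa_interval(reference, sa, "ATTA")
hits = positions(sa, interval)
print(interval, hits)
```

The interval is compact (two integers regardless of how many times the read occurs), which is one plausible reason to store intervals first and turn them into chromosomal positions only in a later step.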

    Best

    Jing



    • #3
      I haven't used bwa to process paired-end reads, so I have no hands-on experience yet.
      "BWA first finds the positions of all the good hits, sorts them according to the chromosomal coordinates and then does a linear scan through all the potential hits to pair the two ends." According to this description of the BWA algorithm, the distance constraint is already taken into account when the two ends are paired, so the situation you describe should never happen.
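The quoted description (collect good hits, sort by coordinate, scan linearly to pair the ends) can be sketched roughly as follows. This is a simplified illustration, not BWA's actual pairing code: the `Hit` tuple, the `pair_hits` function, the `max_insert` parameter and the ranking by summed mismatches are all assumptions made for the example, and strand orientation is ignored.

```python
from collections import namedtuple

# Hypothetical hit record: which end of the pair, chromosome, position, mismatches.
Hit = namedtuple("Hit", "end chrom pos mismatches")


def pair_hits(hits1, hits2, max_insert):
    """Sort the hits of both ends by coordinate, then scan forward from each
    hit and pair it with hits of the other end within max_insert bases."""
    merged = sorted(hits1 + hits2, key=lambda h: (h.chrom, h.pos))
    pairs = []
    for i, a in enumerate(merged):
        for b in merged[i + 1:]:
            # Hits are sorted, so once we leave the window we can stop.
            if b.chrom != a.chrom or b.pos - a.pos > max_insert:
                break
            if a.end != b.end:
                pairs.append((a, b))
    # Prefer pairs with the fewest total mismatches.
    return sorted(pairs, key=lambda p: p[0].mismatches + p[1].mismatches)


# End 1: best hit on chr1, suboptimal hit on chr2.
end1 = [Hit(1, "chr1", 1000, 0), Hit(1, "chr2", 5000, 1)]
# End 2: best hit on chr2, suboptimal hit on chr3.
end2 = [Hit(2, "chr2", 5200, 0), Hit(2, "chr3", 100, 2)]

pairs = pair_hits(end1, end2, max_insert=500)
for a, b in pairs:
    print(a, b)
```

In this toy input the two best hits lie on different chromosomes, so the only consistent pair combines the suboptimal hit of end 1 with the best hit of end 2, which is exactly the case asked about in the first post.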



      • #4
        Sorry, I misunderstood you. In MAQ, the two best hits are kept in a queue for each end. If there are multiple consistent hit pairs, the mapping qualities are set much lower, so suboptimal hits are considered in MAQ. For BWA, there is no description of the details of how it pairs the two ends, but suboptimal hits should be considered there as well.

        best

        Jing



        • #5
          Hey Jing, thanks for your help.
          I also got a reply from the author (thanks to lh3):
          1) Both optimal and suboptimal hits are stored in .sai files, but only
          approximate chromosomal positions are available. Detailed alignments are
          reconstructed by samse and sampe.
          2) Sampe considers suboptimal hits in pairing.

          However, there is no way to generate detailed alignments for these suboptimal
          hits (in SAM format) using samse or sampe.
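For readers on later BWA releases: the BWA manual documents an optional XA tag that lists alternative hits as `chr,pos,CIGAR,NM;` entries, with the strand given by the sign of the position. Assuming your version emits that tag, a small Python parser could look like the sketch below. The function name `parse_xa` and the example tag value are invented for illustration, and this still gives only the summary fields, not a full SAM record per suboptimal hit.

```python
def parse_xa(tag):
    """Parse an XA:Z: tag of the form chr,±pos,CIGAR,NM;chr,±pos,CIGAR,NM;..."""
    prefix = "XA:Z:"
    if not tag.startswith(prefix):
        raise ValueError("not an XA tag: %r" % tag)
    hits = []
    for entry in tag[len(prefix):].split(";"):
        if not entry:
            continue  # skip the empty piece after the trailing ';'
        # Split from the right so a comma in the sequence name is tolerated.
        chrom, pos, cigar, nm = entry.rsplit(",", 3)
        hits.append({
            "chrom": chrom,
            "strand": pos[0],      # '+' or '-'
            "pos": int(pos[1:]),   # 1-based position
            "cigar": cigar,
            "nm": int(nm),         # edit distance
        })
    return hits


# Made-up example value, not taken from a real alignment.
example = "XA:Z:chr1,+1000,36M,1;chr2,-5000,36M,2;"
alt_hits = parse_xa(example)
for hit in alt_hits:
    print(hit)
```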



          • #6
            Hi mingkunli,

            Thank you for sharing this. ^ ^

            Best

            Jing
