
  • what's in bwa's .sai file

    Besides the content shown in the .sam file (the alignment of the best match and the number of suboptimal/all hits), the .sai file also seems to contain some information about the suboptimal hits. Is it possible to look at the details of these hits?
    With the command "bwa samse -n INT", I can only get the positions where they mapped and the number of mismatches.

    For paired-end data, does "bwa sampe" consider the suboptimal hits? That is, the best-best pair may violate the distance constraint, while a suboptimal-suboptimal or suboptimal-best pair may fall within a reasonable distance.

  • #2
    The .sai file is the output of the aln command. It contains the suffix array coordinates of all the short reads that were loaded. For bwa, sequence alignment is equivalent to searching for the suffix array interval of the substrings of the chromosome that match the short read. Once the interval in the suffix array is known, we can get the positions of the short read.
    If you want to know in detail how the bwa algorithm works, you can read "Fast and accurate short read alignment with Burrows-Wheeler transform" (Heng Li and Richard Durbin), which has been published in Bioinformatics.
    It took me a couple of days to fully trace and understand the MAQ and BWA algorithms. ^ ^
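
To make the "suffix array interval" idea concrete, here is a toy sketch. It is only an illustration under simplifying assumptions: it builds a plain suffix array and uses binary search for exact matches, whereas bwa works on a compressed index and allows mismatches. The function names and the reference string are made up for the example.

```python
from typing import List, Tuple


def build_suffix_array(text: str) -> List[int]:
    """Naive suffix array: start offsets of all suffixes, in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text: str, sa: List[int], read: str) -> Tuple[int, int]:
    """Half-open interval [lo, hi) of suffix array rows whose suffix
    starts with `read`. lo == hi means there is no exact match."""
    m = len(read)
    # Lower bound: first row whose m-character prefix is >= read.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < read:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first row whose m-character prefix is > read.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= read:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def positions(sa: List[int], interval: Tuple[int, int]) -> List[int]:
    """Convert a suffix array interval into sorted 0-based text positions."""
    lo, hi = interval
    return sorted(sa[lo:hi])


reference = "GATTACAGATTACA"  # made-up toy reference
sa = build_suffix_array(reference)
interval = sa_interval(reference, sa, "ATTA")
hits = positions(sa, interval)
print(interval, hits)
```

The interval is a compact way to store "all places this read matches": two integers, no matter how many hits there are. The positions are only looked up when they are needed.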

    Best

    Jing


    • #3
      I haven't used bwa to process paired-end reads, so I have no hands-on experience yet.
      "BWA first finds the positions of all the good hits, sorts them according to the chromosomal coordinates and then does a linear scan through all the potential hits to pair the two ends." From this description of the BWA algorithm, the distance constraints are already taken into account when the two ends are paired, so the phenomenon you mentioned will never happen.
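
The quoted description (collect the good hits, sort by coordinate, scan linearly to pair the ends) can be sketched roughly as follows. This is not bwa's code: the `Hit` record, the `pair_hits` function, the `max_dist` threshold and the coordinates are all hypothetical, and strand orientation is ignored to keep it short.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    chrom: str
    pos: int
    end: int     # 1 or 2: which read of the pair this hit belongs to
    best: bool   # True for the best hit, False for a suboptimal one


def pair_hits(hits1: List[Hit], hits2: List[Hit],
              max_dist: int) -> List[Tuple[Hit, Hit]]:
    """Sort all hits by coordinate, then scan once, pairing hits from
    opposite ends that lie on the same chromosome within max_dist."""
    ordered = sorted(hits1 + hits2, key=lambda h: (h.chrom, h.pos))
    pairs: List[Tuple[Hit, Hit]] = []
    window: List[Hit] = []  # earlier hits still close enough to pair
    for h in ordered:
        window = [w for w in window
                  if w.chrom == h.chrom and h.pos - w.pos <= max_dist]
        for w in window:
            if w.end != h.end:
                # Always report as (end 1 hit, end 2 hit).
                pairs.append((w, h) if w.end == 1 else (h, w))
        window.append(h)
    return pairs


# Made-up hits: the two best hits are on different chromosomes,
# but the two suboptimal hits sit 300 bp apart on chr2.
end1 = [Hit("chr1", 1000, 1, True), Hit("chr2", 5000, 1, False)]
end2 = [Hit("chr3", 200, 2, True), Hit("chr2", 5300, 2, False)]
pairs = pair_hits(end1, end2, max_dist=500)
print(pairs)
```

In this toy input the only consistent pair is the suboptimal-suboptimal one, which is exactly the case asked about in the first post.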


      • #4
        Sorry, I misunderstood you. In MAQ, the two best hits are kept in a queue for each end. If there are multiple consistent hit pairs, the mapping qualities are set much lower, so suboptimal hits are considered in MAQ. For BWA, I can't find a description of the details of how it pairs the two ends, but suboptimal hits should be considered there as well.

        best

        Jing


        • #5
          Hey Jing, thanks for your help.
          I also got a reply from the author (thanks to lh3):
          1) Both optimal and suboptimal hits are stored in .sai files, but only approximate chromosomal positions are available. Detailed alignments are reconstructed by samse and sampe.
          2) Sampe considers suboptimal hits in pairing.

          However, there is no way to generate the detailed alignments for these suboptimal hits (in SAM format) using samse or sampe.
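
For what "bwa samse -n INT" does report, a small parser can pull the alternative hits out of a SAM line. This is a sketch that assumes the XA tag layout given in the bwa manual, `XA:Z:(chr,pos,CIGAR,NM;)*` with a signed position for the strand; older versions may print fewer fields. The `AltHit` record, the `parse_xa` function and the sample line are made up for illustration.

```python
from typing import List, NamedTuple


class AltHit(NamedTuple):
    chrom: str
    strand: str      # "+" or "-", taken from the sign of the position
    pos: int
    cigar: str
    mismatches: int  # the NM value of the alternative hit


def parse_xa(sam_line: str) -> List[AltHit]:
    """Extract alternative hits from the XA:Z tag of one SAM record.
    Assumes each entry looks like chr,+pos,CIGAR,NM and ends with ';'."""
    fields = sam_line.rstrip("\n").split("\t")
    hits: List[AltHit] = []
    for field in fields[11:]:  # optional tags start at column 12
        if not field.startswith("XA:Z:"):
            continue
        for entry in field[len("XA:Z:"):].split(";"):
            if not entry:
                continue  # skip the empty piece after the last ';'
            chrom, pos, cigar, nm = entry.split(",")
            hits.append(AltHit(chrom, pos[0], int(pos[1:]), cigar, int(nm)))
    return hits


# Made-up SAM record with two alternative hits.
line = "\t".join([
    "read1", "0", "chr1", "100", "0", "36M", "*", "0", "0",
    "ACGT" * 9, "I" * 36,
    "XT:A:R", "NM:i:0", "X0:i:3",
    "XA:Z:chr2,+2000,36M,0;chr5,-731,36M,1;",
])
alt_hits = parse_xa(line)
for hit in alt_hits:
    print(hit)
```

A record without an XA tag simply yields an empty list, so the function can be run over a whole SAM file after skipping the header lines.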


          • #6
            hi mingkunli,

            Thank you for sharing this. ^ ^

            Best

            Jing
