Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BWA rescue of multi-mapping or unmapped reads

    In reading the BWA manual, I have come across a question that I still cannot answer. It reads:

    In paired-end alignment, BWA pairs all hits it found. It further performs Smith-Waterman alignment for unmapped reads with mates mapped to rescue mapped mates, and for high-quality anomalous pairs to fix potential alignment errors.
    So BWA will rescue mates presumably if there is a high-quality, uniquely mapping mate pair. My question is: what are the parameters that the unmapped mate have to meet to be considered correctly placed? For instance, if I have 2x100 reads, and one maps uniquely, does the other one have to map with the same stringency as I specified in the aln stage. Or if say, only 30nt of the unmapped read map, and the other 70nt do not (due to say, an inversion), will the unmapped read be mapped and the 70 non-mapping nucleotides just be soft-clipped in the BAM file?

    Any help would be greatly appreciated.

    Cheers,
    Kevin

  • #2
    In some cases, the source code is the best documentation. I assume you mean in "bwa sampe", so check out the "bwa_paired_sw" function. It looks like unmapped mates are rescued if the mapping quality of the other end is greater than SW_MIN_MAPQ (set to 17).

    Comment


    • #3
      I believe that I found what I was looking for. Per Nils' suggestion, I went to the source code.

      My primary query concerned the length of the match (of an unmapped read that was re-mapped using a Smith-Waterman alignment in the vicinity of a uniquely mapped read) that was required for BWA to place it.

      #define SW_MIN_MATCH_LEN 20
      #define SW_MIN_MAPQ 17
      ...
      bwa_cigar_t *bwa_sw_core(bwtint_t l_pac, const ubyte_t *pacseq, int len, const ubyte_t *seq, int64_t *beg, int reglen, int *n_cigar, uint32_t *_cnt)
      {
      ...
      // check whether there are too many N's
      if (reglen < SW_MIN_MATCH_LEN || (int64_t)l_pac - *beg < len) return 0;
      for (k = 0, x = 0; k < len; ++k)
      if (seq[k] >= 4) ++x;
      if ((float)x/len >= 0.25 || len - x < SW_MIN_MATCH_LEN) return 0;
      ...
      if (x < SW_MIN_MATCH_LEN || y < SW_MIN_MATCH_LEN) { // not good enough
      free(path); free(cigar); free(ref_seq);
      *n_cigar = 0;
      return 0;
      So basically, if there are more then 20 bases that match (in a Smith-Waterman re-alignment) the unmapped (or improperly mapped) mate will be rescued.

      Cheers.

      Comment


      • #4
        Thanks for the advice, Nils.

        Comment


        • #5
          It would be nice to have them as command line options to see what effect there is when varying these values.

          Comment


          • #6
            What do anomalous pairs mean. Samtools mpileup is removing such reads(when compared to seeing these in IGV) unless -A option is added but Im not sure what it means.

            Code:
            Samtools spec says
            Added `mpileup -A' to allow to use reads in anomalous pairs in SNP calling.

            Comment


            • #7
              Originally posted by nilshomer View Post
              It would be nice to have them as command line options to see what effect there is when varying these values.
              Indeed it would be!
              Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
              Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
              Projects: U87MG whole genome sequence [Website] [Paper]

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Recent Advances in Sequencing Analysis Tools
                by seqadmin


                The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                Yesterday, 07:48 AM
              • seqadmin
                Essential Discoveries and Tools in Epitranscriptomics
                by seqadmin




                The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                04-22-2024, 07:01 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Yesterday, 07:17 AM
              0 responses
              11 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 05-02-2024, 08:06 AM
              0 responses
              19 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-30-2024, 12:17 PM
              0 responses
              20 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-29-2024, 10:49 AM
              0 responses
              28 views
              0 likes
              Last Post seqadmin  
              Working...
              X