Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • [bwa] BWA -a inconsistent results

    I've been puzzling over this issue for a few days, and I'm hoping that someone here can help me out. I have a molecular barcode deduplication process that obtains a reference contig alignment for a set of input reads, and then aligns the input reads back to the contig to determine snp/indel information. I'm using bwa mem to align my reads back to my contig sequence. I'm using the -a option to bwa to provide ALL alignments back to the reference sequences, since many of my reference sequences are very similar to each other (some are even identical).

    What I've found is that bwa does NOT return all alignments when -a is specified - there's some sort of internal logic that is preventing the alignment I'm looking for from appearing in the output. Here's some sample data that reproduces my problem:

    reference sequences:
    Code:
    >GCATACAGATGATGCCCTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >GCAAACAGATGATGACCTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >AAACAGAAGGTCAAAGCTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGTCAAAGCTGTTGATGTCCCAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >ACAAAGCCAGAAGATGACCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >ACTAACAGATGATGCCCTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >TAACAGAAGGTCAAAGCTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGTCAAAGCTGTTGATGTCCCAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >TCGCACCTCACTGGTTAACAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCACTGGTCAAAGCTGTTGATGTCCCAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >TAACAGTAGGTCAACGCTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGTCAAAGCTGTTGATGTCCCAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >GACATGAACACAGCTCCACAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCACCCCCACTGGTCAAAGCTGTTGATGTCCCAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >GGCCGGACCTTGTGTTCATGCGGGCAAGAGTCCAGA
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGGTTGGGCGATTTCCTTCAAAGACCTGACCTTATGTGGCAGCAGCCTCTCAAGGTCCTCTGGACTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >TGGCACCTCGAAGATCTTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGAAGATCTTGTGCTCATACATGGCCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    read to align to references:
    Code:
    @NS500180:28:H0YNMBGXX:2:22110:22781:12271_TGGCACCTCGAAGATCTTCAACTCTATTGTGTTCAC
    CGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTG
    +
    FFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFF<FFFFFFFFFFFFFFF7FFFFFFFFF<FFFFFFAFFFFFF
    Now, note that the last reference sequence is the one I want my read to align to. I indexed the reference fasta with bwa index and made a simple bwa mem call with -a as the only flag. While there are 11 reference sequences, only 10 alignments are reported out. If I delete one of the reference sequences and re-align, the alignment I'm looking for appears.

    Can anyone explain why this is happening? I was even getting different results depending on the order of the sequences in the reference.

  • #2
    Only 8 are best matches, so it's not clear to me why it's printing 10 in the first place, or why you would expect it to print 11.

    Comment


    • #3
      Originally posted by Brian Bushnell View Post
      Only 8 are best matches, so it's not clear to me why it's printing 10 in the first place, or why you would expect it to print 11.
      Well, where does it say that they are limited at all?

      -a output all alignments for SE or unpaired PE

      Comment


      • #4
        There are always limits - you can align anything to anything, if you allow arbitrarily low identity. So all alignments really means "all alignments that were found and passed filtering criteria". I don't know what the exact filtering criteria are in this case, but it's common to only consider alignments within some threshold of the score of the best alignment.

        Comment


        • #5
          Unfortunately, none of those thresholds are documented. The alignment score of the one that doesn't show up is better than some of the ones that do, which is why this is so frustrating.

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM
          • seqadmin
            Techniques and Challenges in Conservation Genomics
            by seqadmin



            The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

            Avian Conservation
            Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
            03-08-2024, 10:41 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, Yesterday, 06:37 PM
          0 responses
          10 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, Yesterday, 06:07 PM
          0 responses
          9 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 03-22-2024, 10:03 AM
          0 responses
          49 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 03-21-2024, 07:32 AM
          0 responses
          67 views
          0 likes
          Last Post seqadmin  
          Working...
          X