Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • [bwa] BWA -a inconsistent results

    I've been puzzling over this issue for a few days, and I'm hoping that someone here can help me out. I have a molecular barcode deduplication process that obtains a reference contig alignment for a set of input reads, and then aligns the input reads back to the contig to determine snp/indel information. I'm using bwa mem to align my reads back to my contig sequence. I'm using the -a option to bwa to provide ALL alignments back to the reference sequences, since many of my reference sequences are very similar to each other (some are even identical).

    What I've found is that bwa does NOT return all alignments when -a is specified - there's some sort of internal logic that is preventing the alignment I'm looking for from appearing in the output. Here's some sample data that reproduces my problem:

    reference sequences:
    Code:
    >GCATACAGATGATGCCCTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >GCAAACAGATGATGACCTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >AAACAGAAGGTCAAAGCTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGTCAAAGCTGTTGATGTCCCAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >ACAAAGCCAGAAGATGACCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >ACTAACAGATGATGCCCTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >TAACAGAAGGTCAAAGCTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGTCAAAGCTGTTGATGTCCCAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >TCGCACCTCACTGGTTAACAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCACTGGTCAAAGCTGTTGATGTCCCAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >TAACAGTAGGTCAACGCTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGTCAAAGCTGTTGATGTCCCAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >GACATGAACACAGCTCCACAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCACCCCCACTGGTCAAAGCTGTTGATGTCCCAGATGATGCCCTGAACGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >GGCCGGACCTTGTGTTCATGCGGGCAAGAGTCCAGA
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGGTTGGGCGATTTCCTTCAAAGACCTGACCTTATGTGGCAGCAGCCTCTCAAGGTCCTCTGGACTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    >TGGCACCTCGAAGATCTTCAACTCTATTGTGTTCAC
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGAAGATCTTGTGCTCATACATGGCCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    read to align to references:
    Code:
    @NS500180:28:H0YNMBGXX:2:22110:22781:12271_TGGCACCTCGAAGATCTTCAACTCTATTGTGTTCAC
    CGAAGATCTTGTGCTCATACATGGCGACCAAGGCTCCAAGCATGAATGGTGTGAGCTTGGTGAACACAATAGAGTTG
    +
    FFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFF<FFFFFFFFFFFFFFF7FFFFFFFFF<FFFFFFAFFFFFF
    Now, note that the last reference sequence is the one I want my read to align to. I indexed the reference fasta with bwa index and made a simple bwa mem call with -a as the only flag. While there are 11 reference sequences, only 10 alignments are reported out. If I delete one of the reference sequences and re-align, the alignment I'm looking for appears.

    Can anyone explain why this is happening? I was even getting different results depending on the order of the sequences in the reference.

  • #2
    Only 8 are best matches, so it's not clear to me why it's printing 10 in the first place, or why you would expect it to print 11.

    Comment


    • #3
      Originally posted by Brian Bushnell View Post
      Only 8 are best matches, so it's not clear to me why it's printing 10 in the first place, or why you would expect it to print 11.
      Well, where does it say that they are limited at all?

      -a output all alignments for SE or unpaired PE

      Comment


      • #4
        There are always limits - you can align anything to anything, if you allow arbitrarily low identity. So all alignments really means "all alignments that were found and passed filtering criteria". I don't know what the exact filtering criteria are in this case, but it's common to only consider alignments within some threshold of the score of the best alignment.

        Comment


        • #5
          Unfortunately, none of those thresholds are documented. The alignment score of the one that doesn't show up is better than some of the ones that do, which is why this is so frustrating.

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Essential Discoveries and Tools in Epitranscriptomics
            by seqadmin




            The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
            04-22-2024, 07:01 AM
          • seqadmin
            Current Approaches to Protein Sequencing
            by seqadmin


            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
            04-04-2024, 04:25 PM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, Yesterday, 11:49 AM
          0 responses
          15 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-24-2024, 08:47 AM
          0 responses
          16 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-11-2024, 12:08 PM
          0 responses
          61 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 10:19 PM
          0 responses
          60 views
          0 likes
          Last Post seqadmin  
          Working...
          X