  • Illumina mapping with bwa

    I've been searching the forum for an answer for this, but couldn't find exactly what I was looking for. If a thread answering my question already exists, I would be grateful if someone pointed me to it.
    So, I've been trying out bwa for mapping of 75bp mate-pair Illumina data. It works fine so far, but I was trying out various parameters in bwa aln to find out the best setting and I think I need a little help.
    I've been playing with the parameters -n -q -k and -l and what I basically wanted to achieve is to allow only one or two mismatches per read. I've tried to disable seeding by setting a very high value for -l and then setting -n 1 or 2, but in IGV I can still see some reads that have a lot more mismatches.
    The last setting I tried was:
    -n 2 -k 1 -l 10 -q 15
    but still I see reads with more than 2 mismatches...
    I would be very interested to know why there are still more mismatches in my reads than I thought I allowed, and also to see some example parameters that you use for bwa Illumina mapping. I know it depends on the project (e.g. using -q 15 only if many of your reads have poor quality towards the end), but it would be good to get an idea of what other people use regarding seeding vs. no seeding, number of mismatches, etc.
    Thanks in advance!

  • #2
    I have the same problem. Hope someone can help us!



    • #3
      You are seeing reads with more mismatches than specified because you have paired-end reads. During pairing, it often happens that one end maps within the specified mismatch threshold and the other does not. The end that failed is then aligned with the Smith-Waterman algorithm to the region where it would be expected given its mate's position, which sometimes produces quite a few mismatches, indels, or even clipping.

      Some mappers let you specify that only "independently mapped" reads should be paired, which would prevent this. There may be a workaround in BWA, but I would just filter out reads with more mismatches than normal (note that if pair concordance is important to you, the correct approach is to accept that some reads will contain more mismatches/indels than specified).
      Last edited by n00c; 03-15-2011, 07:22 AM.
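The "filter out reads with more mismatches than normal" step can be sketched on plain SAM text using the standard NM tag. This is a minimal sketch, assuming SAM input, that records without an NM tag (e.g. unmapped reads) should be kept, and that a cutoff of 2 is only illustrative. Note that NM is the edit distance, so it counts inserted and deleted bases as well as mismatches, and filtering single records like this can leave orphaned mates, which is exactly the pair-concordance trade-off mentioned above.

```python
def nm_of(sam_line):
    """Return the NM:i (edit distance) value of a SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional tags start after the 11 mandatory columns
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def filter_by_nm(sam_lines, max_nm):
    """Yield header lines, plus records whose NM is absent or <= max_nm."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        nm = nm_of(line)
        if nm is None or nm <= max_nm:
            yield line


# Usage with two hypothetical records: r2 has 5 differences and is dropped.
records = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t75M\t=\t300\t275\t*\t*\tNM:i:1",
    "r2\t147\tchr1\t300\t60\t75M\t=\t100\t-275\t*\t*\tNM:i:5",
]
kept = list(filter_by_nm(records, 2))
print(len(kept))  # 2 (the header line and r1)
```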



      • #4
        If you really want to disable the sensitive mate-mapping feature, you can do that in bwa sampe with the -s option. You could then filter your reads to keep only those where both mates are still mapped, using `-F 12` in samtools view (-F excludes reads that have any of the given flag bits set, and 12 = 0x4 + 0x8, the flags for "read unmapped" and "mate unmapped"). I think that would leave only mapped mate pairs with at most 2 mismatches if you set -n 2. I don't think you need to mess with seeding or anything else.
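The flag arithmetic behind `-F 12`, and the two commands described above, can be illustrated as follows. This is a hedged sketch: the file names (`ref.fa`, `r1.sai`, etc.) are hypothetical, the commands are only assembled as argument lists and not executed, and `-S` is included because older samtools releases need it to read SAM input (newer ones auto-detect the format).

```python
import shlex

FLAG_UNMAPPED = 0x4       # SAM flag bit: this read is unmapped
FLAG_MATE_UNMAPPED = 0x8  # SAM flag bit: the mate is unmapped
EXCLUDE_MASK = FLAG_UNMAPPED | FLAG_MATE_UNMAPPED  # 4 + 8 = 12, the value given to -F


def kept_by_exclude(flag, mask=EXCLUDE_MASK):
    """Mirror `samtools view -F mask`: keep a record only if none of the mask bits are set."""
    return flag & mask == 0


def pipeline(index, fq1, fq2, sai1, sai2):
    """Return the two commands as argument lists (file names are hypothetical)."""
    # -s disables the Smith-Waterman rescue of the unmapped mate; SAM goes to stdout.
    sampe = ["bwa", "sampe", "-s", index, sai1, sai2, fq1, fq2]
    # Read SAM from stdin ("-") and drop records with flag bit 0x4 or 0x8 set.
    view = ["samtools", "view", "-S", "-F", str(EXCLUDE_MASK), "-"]
    return sampe, view


# 99 and 147 are a properly paired, both-mapped pair; 73 has an unmapped mate
# (64 + 8 + 1) and 133 is itself unmapped (128 + 4 + 1).
for flag in (99, 147, 73, 133):
    print(flag, kept_by_exclude(flag))  # True, True, False, False

sampe, view = pipeline("ref.fa", "r1.fq", "r2.fq", "r1.sai", "r2.sai")
print(shlex.join(sampe), "|", shlex.join(view))
```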



        • #5
          BWA searches for mappings with up to N+1 differences, where N is the number of differences you specified (or, if you give no value, the number calculated from the read length). The extra difference is there to guarantee that all hits with up to N differences are found. You could map each end of the read independently ("bwa aln" then "bwa samse") to check whether any alignments still have more than N+1 differences, and report them here.
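To get a feel for how N is "calculated by read length", here is a rough sketch. It assumes, following the bwa aln help text for -n (an integer is a maximum number of differences, a fraction is a missing-alignment probability under a 0.02 error rate), that the fractional case picks the smallest k whose Poisson tail probability falls below the threshold. This is an illustration of that model, not BWA's own code, and the function names are hypothetical.

```python
import math


def max_diff_from_fraction(read_len, err=0.02, thres=0.04):
    """Smallest k with P(more than k errors) < thres, errors ~ Poisson(read_len * err)."""
    lam = read_len * err
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    k = 0
    while 1.0 - cdf >= thres:
        k += 1
        term *= lam / k    # P(X = k) computed from P(X = k - 1)
        cdf += term
    return k


def resolve_n(n, read_len):
    """Interpret a -n value: values >= 1 are used as-is, fractions go through the Poisson model."""
    return int(n) if n >= 1 else max_diff_from_fraction(read_len, thres=n)


# Under these assumptions, 75 bp reads with the default 0.04 give N = 4,
# so hits with up to N + 1 = 5 differences may be searched for.
n = resolve_n(0.04, 75)
print(n, n + 1)
```

With an explicit integer such as -n 2, N is simply 2, and the N+1 search explains how reads with 3 differences can still appear even before any mate rescue.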



          • #6
            Thank you so much for all your answers! I guess given the reasons for more mismatches than specified, I will keep those alignments. It doesn't seem to affect too many reads (from what I see in IGV), so I think I can live with that.
            Does anyone also have an example of the parameters they use for mapping of 75 bp mate-pair Illumina reads? That would be very helpful!
