  • Illumina mapping with bwa

    I've been searching the forum for an answer to this but couldn't find exactly what I was looking for. If a thread answering my question already exists, I'd be grateful if someone could point me to it.
    I've been trying out bwa for mapping 75 bp mate-pair Illumina data. It works fine so far, but I have been testing various parameters in bwa aln to find the best settings, and I think I need a little help.
    I've been playing with the parameters -n, -q, -k and -l. What I basically want is to allow only one or two mismatches per read. I tried to disable seeding by setting a very high value for -l and then setting -n 1 or 2, but in IGV I can still see some reads with many more mismatches.
    The last setting I tried was:
    -n 2 -k 1 -l 10 -q 15
    but I still see reads with more than 2 mismatches...
    I'd be very interested to know why my reads still contain more mismatches than I thought I allowed. I'd also appreciate examples of the parameters you use for bwa Illumina mapping. I know it depends on the project (for example, using -q 15 only if many of your reads have poor quality towards the end), but it would be good to get an idea of what other people use for seeding vs. no seeding, the number of mismatches, etc.
    Thanks in advance!

  • #2
    I have the same problem. Hope someone can help us!



    • #3
      The reason you see reads with more mismatches than specified is that you have paired-end reads. With paired-end resolution you often have a situation where one end maps within the specified mismatch threshold and the other doesn't. The other end is then aligned with the Smith-Waterman algorithm to the region where one would expect to find it, which sometimes produces quite a few mismatches, indels, or even clipping.
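      To illustrate the mate-rescue explanation above, here is a minimal Smith-Waterman local aligner in Python that aligns a mate against a short window of reference standing in for "the region where one would expect to find it". The function name, the scoring values (match +1, mismatch -1, gap -2) and the sequences are made up for illustration; this is not BWA's scoring or code. The point is only that a local alignment inside the expected window readily accepts more mismatches than an -n 2 search would allow.

```python
def smith_waterman(read, ref, match=1, mismatch=-1, gap=-2):
    """Local alignment of `read` against `ref` with a linear gap penalty.

    Returns (best_score, mismatches, gaps) for the highest-scoring local
    alignment. Scoring values are illustrative assumptions, not BWA's.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # match or mismatch
                          H[i - 1][j] + gap,    # read base against a gap
                          H[i][j - 1] + gap)    # reference base against a gap
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the local alignment starts (score 0).
    i, j = best_pos
    mismatches = gaps = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            if read[i - 1] != ref[j - 1]:
                mismatches += 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            gaps += 1
            i -= 1
        else:
            gaps += 1
            j -= 1
    return best, mismatches, gaps


# Hypothetical mate with 3 mismatches relative to the reference window.
mate = "GATTACAGGCTTAACCGTAG"
window = "CCCCGATTAGAGGCTTTACCGCAGCCCC"
print(smith_waterman(mate, window))  # (14, 3, 0)
```

      The rescued mate still aligns end to end, with 3 mismatches, because nothing in the local alignment itself enforces the mismatch limit used for the independent search.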

      Some mappers let you specify that only independently mapped reads should be paired, which would prevent this. There may be a workaround with BWA, but I would just filter out reads with more mismatches than normal. (Note that if pair concordance is important to you, the correct approach is simply to accept that some reads will contain more mismatches/indels than specified.)
      Last edited by n00c; 03-15-2011, 07:22 AM.
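      One way to do the filtering suggested in post #3 is on the SAM NM tag, which holds the edit distance to the reference (mismatches plus inserted and deleted bases, so it is slightly stricter than a pure mismatch count). The sketch below works on plain SAM text lines; `edit_distance` and `filter_by_nm` are hypothetical helper names, and it assumes the aligner wrote an NM:i tag on mapped records (BWA does). Records without NM, such as unmapped reads, are dropped.

```python
def edit_distance(sam_line):
    """Return the NM:i value of a SAM alignment line, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields follow the 11 mandatory ones
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def filter_by_nm(sam_lines, max_nm):
    """Yield header lines plus records whose edit distance is <= max_nm."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header unchanged
            continue
        nm = edit_distance(line)
        if nm is not None and nm <= max_nm:
            yield line
```

      In practice you would feed it the output of `samtools view -h` and write the surviving lines back out.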



      • #4
        If you really want to disable the sensitive mate-mapping feature, you can do that in bwa sampe with the -s option. Then you could filter for reads where both mates are still mapped using `-F 12` in samtools view (-F skips reads that have any of the given flag bits set, and 12 = 0x4 + 0x8, the flags for read unmapped and mate unmapped). I think that would leave only mapped mate pairs with up to 2 mismatches if you set -n 2. I don't think you need to mess with seeding or anything else.
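        The `-F 12` arithmetic above can be checked with a few lines of Python. This is a sketch that mimics the exclusion logic of `samtools view -F` on SAM text lines; the helper names are hypothetical, while the bit values 0x4 and 0x8 come from the SAM specification.

```python
# SAM FLAG bits mentioned in the post (values from the SAM specification).
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # its mate is unmapped


def passes_exclude_filter(flag, exclude=UNMAPPED | MATE_UNMAPPED):
    """Like `samtools view -F`: keep a record only if none of the bits
    in `exclude` are set in its FLAG."""
    return (flag & exclude) == 0


def keep_both_mates_mapped(sam_lines):
    """Yield header lines plus records that pass the -F 12 test."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
        elif passes_exclude_filter(int(line.split("\t")[1])):  # FLAG is column 2
            yield line
```

        For example, flag 99 (paired, proper pair, mate reverse, first in pair) passes, while 73 (mate unmapped) and 133 (read unmapped) are dropped.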



        • #5
          BWA searches for hits with up to N+1 differences, where N is the number of differences you specified (or, if you gave no -n option, the number calculated from the read length). The extra difference is there to guarantee that all hits with N differences are found. You could map each end of the pair independently ("bwa aln" then "bwa samse") to check whether you still see more than N+1 differences, and report them here.
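          For the "calculated by read length" case: bwa aln's usage text describes a fractional -n (default 0.04) as the missing probability under a 0.02 error rate. The sketch below is my assumption of how such a fraction could be turned into an integer limit, modelling the number of differences in a read as Poisson with mean read_len * 0.02 and picking the smallest k whose upper tail falls below the fraction. `max_diff` is a hypothetical name and this is not BWA's source.

```python
import math


def max_diff(read_len, missing_prob=0.04, err_rate=0.02, max_k=1000):
    """Smallest k such that P(X > k) < missing_prob, with X ~ Poisson(read_len * err_rate).

    Assumed model for illustration only; not taken from BWA's code.
    """
    lam = read_len * err_rate
    term = math.exp(-lam)  # P(X = 0)
    cumulative = term
    for k in range(1, max_k + 1):
        term *= lam / k    # P(X = k) from P(X = k - 1)
        cumulative += term
        if 1.0 - cumulative < missing_prob:
            return k
    raise ValueError("read too long for this simple Poisson sketch")


print(max_diff(75))  # 4
```

          Under these assumptions a 75 bp read gets a limit of 4 differences when no -n is given.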



          • #6
            Thank you so much for all your answers! I guess given the reasons for more mismatches than specified, I will keep those alignments. It doesn't seem to affect too many reads (from what I see in IGV), so I think I can live with that.
            Does anyone also have an example of the parameters they use for mapping of 75 bp mate-pair Illumina reads? That would be very helpful!
