I've been using bwa mem to align Illumina reads to contaminant genomic databases (ribosomal RNAs, gbbct, etc.). I've noticed unusually high percentages of reads mapping to my contaminants (10% or more, when I expected relatively little contamination) and suspiciously high percentages of reads mapping to multiple contaminant databases (human as well as gbbct, for instance). I BLASTed a few of these "contaminant" bacterial reads online, and they showed relatively poor alignments to bacterial sequences (e.g., only 30-40% of the read aligning, with a few mismatches).
Right now I'm using the default parameters, which clearly aren't working. Could someone give me some advice on how to tweak things so that I keep only the better alignments?
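To make "better alignments" concrete: bwa mem does local alignment, so a read can be reported as mapped even when a large part of it is soft-clipped, which would be consistent with the 30-40% partial hits described above. Here is a minimal sketch of the kind of filter that would exclude those hits, applied to the SAM output after alignment. The function names and all thresholds (90% of the read aligned, at most 5% edit distance, optional MAPQ cutoff) are illustrative assumptions, not recommended values, and the code assumes standard SAM fields plus the `NM` edit-distance tag.

```python
import re

# Matches one CIGAR operation, e.g. "40M" or "60S".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_fraction(cigar):
    """Fraction of the read's bases that fall in aligned (M/=/X) columns."""
    if cigar == "*":
        return 0.0
    aligned = 0
    read_len = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            aligned += n
        # Operations that consume read bases, counting clipped bases too.
        if op in "MIS=XH":
            read_len += n
    return aligned / read_len if read_len else 0.0


def keep_alignment(sam_line, min_fraction=0.9, max_edit_rate=0.05, min_mapq=0):
    """Return True if a SAM record passes the (hypothetical) thresholds."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq, cigar = int(fields[1]), int(fields[4]), fields[5]
    if flag & 0x4:  # read is unmapped
        return False
    if mapq < min_mapq:
        return False
    if aligned_fraction(cigar) < min_fraction:
        return False
    # NM:i:<n> is the edit distance to the reference, if the aligner wrote it.
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            aligned = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X")
            if int(tag[5:]) / aligned > max_edit_rate:
                return False
    return True


def filter_sam(lines, **thresholds):
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, **thresholds):
            yield line
```

The MAPQ cutoff defaults to 0 here because reads that hit several near-identical sequences (rRNA copies, for instance) tend to get low mapping quality even when the alignment itself is good; whether to raise it depends on whether multi-mapping reads should count as contamination.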