Hi all,
According to the bowtie manual and some posts I've read, the -e/--maqerr <int> option sets the maximum sum of quality scores allowed at the mismatched bases over the entire alignment, and so indirectly limits the total number of mismatches over the whole read length.
I understand that the higher this value is, the more alignments I will obtain. But I still have trouble understanding the logic behind this parameter. Say I set -e 70 with --nomaqround.
A read of overall high quality (e.g. every base has a Phred score of 38) with 3 mismatches to the reference sequence will be excluded from the alignment, since 38 * 3 = 114 > 70, while a read of overall poor quality (e.g. a Phred score of 10 at every base) with 5 mismatches will be kept, since 10 * 5 = 50 < 70. But if we assume that low-quality bases are more likely to be sequencing errors than true variants, I would rather exclude the latter read and keep the former... (No?)
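To make the arithmetic concrete, here is a minimal Python sketch of the rule as I understand it from the manual. It is not bowtie's actual code: the function names, the 36-base read length and the mismatch positions are made up for illustration, and I am assuming a read is kept when the sum is less than or equal to the -e value, with no rounding of qualities (as with --nomaqround).

```python
def mismatch_quality_sum(quals, mismatch_positions):
    """Sum the Phred qualities at the mismatched read positions."""
    return sum(quals[i] for i in mismatch_positions)


def passes_maqerr(quals, mismatch_positions, max_err=70):
    """Hypothetical check: keep the read if the summed quality at
    mismatches does not exceed max_err (assumed to mirror -e)."""
    return mismatch_quality_sum(quals, mismatch_positions) <= max_err


# High-quality read: Phred 38 everywhere, 3 mismatches (positions arbitrary).
high_q = [38] * 36
high_q_mm = [4, 17, 30]

# Low-quality read: Phred 10 everywhere, 5 mismatches (positions arbitrary).
low_q = [10] * 36
low_q_mm = [2, 9, 15, 22, 33]

print(mismatch_quality_sum(high_q, high_q_mm), passes_maqerr(high_q, high_q_mm))
print(mismatch_quality_sum(low_q, low_q_mm), passes_maqerr(low_q, low_q_mm))
```

Under these assumptions the first read sums to 114 and is rejected, while the second sums to 50 and is kept, which is exactly the behaviour I find counter-intuitive.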
If anyone could help me understand this parameter and its usage I would be very grateful.
Cheers