Novoalign offers the following strategies to deal with multireads (reads mapping to multiple locations):
None: No alignment locations are reported. The read is reported with status R and a count of the number of alignments.
Random: A single alignment location is randomly chosen from amongst the alignment results. The choice is made using posterior alignment probabilities.
All: All alignment locations are reported. Note that this means all alignments with a score within 5 points of the best alignment, unless you use the R99 option to extend the range.
Exhaustive: This option bypasses the iterative alignment process and the normal repeat alignment detection. It finds all alignments with a score no worse than the threshold (t 99 option) and reports all the locations.
0.99: Sets a posterior probability threshold. Any alignment with a posterior probability, P(Ai | R, G), greater than this value will be reported. Example: r 0.01 will report all alignments with a probability greater than 0.01.
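To make the difference between "Random" and the probability threshold concrete, here is a minimal Python sketch. It is not Novoalign's code: the function names, the candidate list and the locations are all made up for illustration, and it only assumes that each candidate alignment of a read carries a posterior probability.

```python
import random

def report_random(alignments, rng=random):
    """Pick one location, weighted by posterior probability (hypothetical helper)."""
    locations = [loc for loc, _ in alignments]
    weights = [p for _, p in alignments]
    return rng.choices(locations, weights=weights, k=1)[0]

def report_above_threshold(alignments, threshold):
    """Keep every location whose posterior probability exceeds the threshold."""
    return [loc for loc, p in alignments if p > threshold]

# Made-up candidate alignments for one read: (location, posterior probability)
candidates = [("chr1:1000", 0.7), ("chr5:2000", 0.295), ("chr9:3000", 0.005)]

kept = report_above_threshold(candidates, 0.01)      # threshold-style reporting
picked = report_random(candidates, random.Random(0))  # "Random"-style reporting
```

With a threshold of 0.01 the first two locations are kept and the third is dropped, while the random strategy returns exactly one of the three locations, most often the one with probability 0.7.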
Which of the options should be used for SNP detection?
If "None" is used then SNPs in some repetitive regions will be completely omitted.
Using the "All" option avoids this, but it can introduce false SNPs (if reads from slightly different repetitive regions are mapped to the same location).
The -R option controls which alignments are treated as "identical", based on the difference in alignment score. "This score difference is set by the 'R99' option and defaults to 5 which corresponds to the best alignment being approximately 3 times more probable than the next best alignment. For example, two alignments with probabilities 0.7 (score 1) and 0.3 (score 5) would be considered as multiple alignments to the read. Two alignments with probabilities 0.8 (score 0) and 0.2 (score 7) would be treated as a unique alignment to the location with the higher probability."
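The "5 points is about 3 times more probable" figure can be checked with a little arithmetic, assuming the scores are Phred-scaled (score = -10 * log10(probability)). That scaling is my assumption; it is roughly consistent with the numbers in the quote, but the quote does not state it. The helper names below are hypothetical, and whether a difference of exactly 5 counts as "within" the range is also an assumption.

```python
import math

def phred_score(p):
    """Phred-style score for a probability (assumed scaling: -10 * log10(p))."""
    return -10.0 * math.log10(p)

def probability_ratio(score_difference):
    """How many times more probable the better alignment is, under that scaling."""
    return 10 ** (score_difference / 10.0)

def is_multiread(best_score, next_score, max_difference=5):
    """Treat the read as a multiread if the next-best score is within max_difference."""
    return (next_score - best_score) <= max_difference

ratio = probability_ratio(5)        # about 3.16, i.e. "approximately 3 times"
first_case = is_multiread(1, 5)     # scores from the quote: difference of 4
second_case = is_multiread(0, 7)    # scores from the quote: difference of 7
```

Under these assumptions the first pair from the quote (scores 1 and 5) is treated as a multiread, and the second pair (scores 0 and 7) as a unique alignment, which matches the quoted description.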
So is using "All" with "-R 1" the best setting for SNP detection?