#1 |
Junior Member
Location: Switzerland Join Date: Jan 2017
Posts: 1
|
Several questions regarding BBMap/BBSplit:

1. The ambig flag -- Brian Bushnell states: "Set behavior on ambiguously mapped reads (with multiple top-scoring mapping locations)" --> How exactly are "ambiguously mapped reads" and "top-scoring" defined? Is it about reads that map "good enough" to be placed at several positions in the first place, even though the mapping quality can differ? Or what is the rule/statistic you use? Do the top hits have to be "significantly better" than some lower-scoring hits (which would still map)? Is there exact documentation, or where in the code would I look to see exactly what is going on? :-)

2. BBSplit: Are the criteria for ambig2 the same as for ambig (except that we are talking about different reference genomes)? What would happen in the following three scenarios? We have three top-scoring hits for one read (say score1 to score3; score1 is the best, but all are "very good" hits): two hits to ref1 with score1 and score3, and one hit to ref2 with score2.
Scenario 1: I have ambig=best and ambig2=best --> which alignments get reported?
Scenario 2: I have ambig=all and ambig2=best --> which alignments get reported?
Scenario 3: I have ambig=best and ambig2=all --> which alignments get reported?

3. How to interpret the "XT" flag in the SAM file (as shown in IGV):
- What does "XT = R" mean? Repeat?
- What does the flag "AM" mean?

Many thanks for this good tool!
Michael

Last edited by MSchm; 01-18-2017 at 11:51 PM.
#2 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
|
Hi Michael,
1) BBMap's scoring is based on an affine-transformed alignment. It's similar to calculating % identity, except that insertions, deletions, and substitutions have different weights, and extending an event (like going from a length-1 deletion to a length-2 deletion) has a diminishing penalty. A bonus is also added to the score of reads that are mapped in a properly-paired configuration. The top-scoring site is the one with the top score given the weight matrix (which is hard-coded). Generally, a site with one mismatch scores better than a site with one deletion, or a site with two mismatches, etc.

Whether a read is ambiguous depends on the "clearzone", which by default is roughly the penalty you get from 2 mismatches. So, if the best site A has 1 mismatch and the second-best site B has 2 mismatches, they will both be considered top-scoring sites and the read will be classified as ambiguous. If site A has 1 mismatch and site B has 5 mismatches, the read will not be considered ambiguous. The clearzone is variable, though. Reads where the best site is perfect use a smaller clearzone (1.6*substitution penalty), while reads where the best site is a very poor match have a bigger clearzone (up to 8*substitution penalty). So if site A has 0 mismatches and site B has 2 mismatches, that would be unambiguous; but if site A has 20 mismatches and site B has 25 mismatches, that would be ambiguous.

2) The definition of an ambiguous read is identical for "ambig" and "ambig2", score-wise. However, ambig2 only considers alignments to different references. If the top site was on ref X with 1 mismatch, and the second-best was on ref Y with 2 mismatches, that would be considered ambig and ambig2. But if both sites were on ref X, that would be considered ambig but not ambig2.

Scenario 1: Only the top-scoring site will be reported. If there are multiple sites within the clearzone with different scores, it will use the best only. If there is a tie, it will use the reference you specified first (so it would go to ref1).

Scenarios 2-3: ambig2 overrides ambig. I don't recommend setting ambig if you are using ambig2; just leave it as default.

Actually, I don't recommend using BBSplit to produce sam output. The output is always valid, but it can lead to unexpected results, like a sam file that you expect to be full of alignments to the mouse genome, but in which the alignments reported are actually to the human genome. This will happen for reads that map ambiguously to human and mouse. You will get two sam files; for reads that map uniquely to one organism, the alignments are fine, but a read that maps ambiguously appears in both files with the same alignment. So, I suggest people only use BBSplit for fasta / fastq output, then remap the output if needed with BBMap. With ambig2=best or ambig2=toss it doesn't really matter, since a read will go to at most one file, but with ambig2=all or ambig2=split, the output is not what you are expecting.

3) I copied XT:A:R/XT:A:U from some other tool... probably bowtie2 or TopHat, when I was trying to make my output compatible with the Tuxedo pipeline. XT:A:R means the read was considered ambiguous, and XT:A:U means it was considered unambiguous.
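For readers who want to play with the rule, here is a rough Python sketch of the clearzone logic described in this post. It is not BBMap's actual code. It assumes a site's score is just its mismatch count times one substitution penalty (the real scoring also weights indels and pairing), and the way the clearzone grows from 1.6x to 8x is a made-up linear ramp chosen only so that the four examples above come out as described. The names `clearzone` and `classify` are hypothetical.

```python
from typing import List, Tuple

# Assumption: one substitution costs 1 unit. BBMap's real weights are hard-coded
# in its source and are not reproduced here.
SUB_PENALTY = 1.0


def clearzone(best_mismatches: int) -> float:
    """Hypothetical clearzone: 1.6x the substitution penalty when the best site
    is perfect, growing to at most 8x for poor best sites. The 0.4-per-mismatch
    slope is invented for illustration."""
    return min(8.0, 1.6 + 0.4 * best_mismatches) * SUB_PENALTY


def classify(sites: List[Tuple[str, int]]) -> Tuple[bool, bool]:
    """Take (reference name, mismatch count) pairs for one read and return
    (ambig, ambig2) under the simplified model above."""
    best = min(mm for _, mm in sites)
    zone = clearzone(best)
    # Sites whose extra penalty over the best site fits inside the clearzone
    # are all treated as top-scoring.
    top = [(ref, mm) for ref, mm in sites if (mm - best) * SUB_PENALTY <= zone]
    ambig = len(top) > 1                          # more than one top-scoring site
    ambig2 = len({ref for ref, _ in top}) > 1     # top sites span several references
    return ambig, ambig2


if __name__ == "__main__":
    print(classify([("refX", 1), ("refY", 2)]))   # two references inside the clearzone
    print(classify([("refX", 1), ("refX", 2)]))   # same reference twice
    print(classify([("refX", 0), ("refY", 2)]))   # perfect best site, small clearzone
```

Under these assumptions, a read with sites on two different references inside the clearzone is both ambig and ambig2, while two sites on the same reference make it ambig only.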
#3 | |
Member
Location: Midwest, USA Join Date: Jan 2016
Posts: 14
|
Thanks, MCMC |