I've come across a situation using bbduk to mask the k-mers from one genome assembly in another, where the number of masked bases in some sequences is smaller than the k value.
In mygenome_masked.fa (a multi-sequence FASTA file), a sizable number of sequences have a total of 0 < bases_masked < k (k=15). It seems strange to have only 3 nucleotides masked when k=15, and I am wondering if anyone can point out what options I should be using to prevent this from happening.
I am running the command like this:
Code:
# version 37.something
bbduk.sh in=mygenome.fa out=mygenome_masked.fa ref=ecoli.fasta k=15 qkmask=X maskfullycovered=t maskmiddle=f
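For anyone wanting to check their own output for the same pattern, here is a minimal Python sketch of how the per-sequence masked-base totals could be tallied. It is not necessarily how the counts above were obtained. It assumes the mask symbol is an uppercase X, as in the command, and the function names count_masked and partially_masked are made up for illustration.

```python
def count_masked(lines, mask_char="X"):
    """Return {sequence_name: count of mask_char} for an iterable of FASTA lines."""
    counts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            counts[name] = 0
        elif name is not None:
            # Case-sensitive count: a lowercase 'x' is not treated as masked.
            counts[name] += line.count(mask_char)
    return counts


def partially_masked(counts, k=15):
    """Keep only sequences whose masked total satisfies 0 < total < k."""
    return {name: n for name, n in counts.items() if 0 < n < k}


# Small inline demonstration with made-up sequences.
demo = [">seq1 example", "ACGTXXXACGT", ">seq2", "ACGTACGT", ">seq3", "X" * 10, "X" * 10]
for name, n in partially_masked(count_masked(demo), k=15).items():
    print(name, n, sep="\t")
```

On a real file, the same functions could be called as partially_masked(count_masked(open("mygenome_masked.fa")), k=15). Note that this counts the total number of masked bases per sequence, not the length of each contiguous masked run.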