Old 08-10-2017, 10:14 AM   #1
Location: Florida, US

Join Date: May 2017
Posts: 14
masking shorter than k-mer size

I've come across a situation using bbduk to mask the k-mers from one genome assembly in another, where the resulting number of masked bases in some sequences is smaller than the k value.

I am running the command like this:
# version 37.something. in=mygenome.fa out=mygenome_masked.fa ref=ecoli.fasta k=15 
qkmask=X maskfullycovered=t maskmiddle=f
In mygenome_masked.fa (a multisequence fasta file), there are a sizable number of sequences with a total of 0 < bases_masked < k (=15). It seems strange to have 3 nucleotides masked when k=15, and I am wondering if anyone can point out what options I should be using to prevent this from happening.
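
For anyone wanting to reproduce the observation, here is a minimal Python sketch that counts mask characters per FASTA record and reports the records where 0 < bases_masked < k. It is not part of BBTools; the function names `masked_counts` and `short_masks` are made up for illustration, and it assumes the mask symbol is `X` and that `X` does not otherwise occur in the sequences.

```python
def masked_counts(fasta_lines, symbol="X"):
    """Count occurrences of the mask symbol in each FASTA record.

    fasta_lines: any iterable of lines (e.g. an open file handle).
    Returns a dict mapping the full header text (without '>') to the count.
    """
    counts = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            counts[name] = 0
        elif name is not None:
            counts[name] += line.count(symbol)
    return counts


def short_masks(counts, k):
    """Keep only records with some masking, but fewer masked bases than k."""
    return {name: n for name, n in counts.items() if 0 < n < k}


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass open("mygenome_masked.fa").
    demo = [">seq1", "ACGTXXXACGT", ">seq2", "ACGTACGT"]
    print(short_masks(masked_counts(demo), k=15))
```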
Old 08-10-2017, 12:40 PM   #2
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,701

That would be the "maskfullycovered" flag. That means only bases covered entirely by reference kmers will be masked.

For example, take this sequence you want to mask:

And this reference:

CGTTG

The ref kmers (ignoring reverse complement, at K=3) are CGT, GTT, and TTG. They line up like:
Every base is covered by 3 kmers, but only the first T is "fully covered" - covered by 3 ref kmers. So it's the only one masked. Without "maskfullycovered", the entire CGTTG would be masked.
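
The coverage logic described above can be sketched in a few lines of Python. This is only a toy model of the explanation, not BBDuk's actual implementation: the function name `mask_sketch` and the query `AACGTTGAA` are hypothetical, reverse complements are ignored, and "fully covered" is assumed to mean "covered by k matching ref kmers", as in the description.

```python
def mask_sketch(seq, ref, k, fully_covered=False, symbol="X"):
    """Toy model of kmer masking (illustrative only, not BBDuk's code).

    Counts, for each base of seq, how many of the kmers overlapping it
    also occur in ref. Reverse complements are ignored.
    """
    ref_kmers = {ref[i:i + k] for i in range(len(ref) - k + 1)}

    # hits[j] = number of matching kmers that cover base j of seq
    hits = [0] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in ref_kmers:
            for j in range(i, i + k):
                hits[j] += 1

    # Assumption: "fully covered" means covered by k matching kmers;
    # otherwise a single matching kmer is enough to mask a base.
    needed = k if fully_covered else 1
    return "".join(symbol if h >= needed else base
                   for base, h in zip(seq, hits))


if __name__ == "__main__":
    query = "AACGTTGAA"   # hypothetical sequence containing CGTTG
    ref = "CGTTG"
    print(mask_sketch(query, ref, k=3, fully_covered=True))
    print(mask_sketch(query, ref, k=3, fully_covered=False))
```

Under this model, a shared stretch of m bases (m >= 2k - 1) leaves m - 2k + 2 bases fully covered, so a 5-base match at K=3 masks 1 base, and 3 masked bases at k=15 would correspond to a 31-base match.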

Incidentally, it depends on what your goal is, but normally I find K=15 to be very short for masking... typically I use K=31.