Because your literal adapter also matches the beginning of the read, the entire sequence to the right of the match is removed (ktrim=r). If you change the sequence at the beginning by one base, you will see that the initial part is retained as you expect.
-
Hi Brian,
I would like to switch from SolexaQA to BBDuk for read trimming and filtering. However, since we are working on bacterial strain typing, we would like the option to keep only trimmed reads in which no individual base has a quality below a defined threshold (instead of using average region quality). Could this be done with BBDuk?
Thanks!
-
Hi cuencam,
It is currently not possible to do this, other than discarding all reads that have any undefined (quality 0) bases with "maxns=0". I never saw a reason to discard all reads with a single base below a specified cutoff. It would be simple enough to add (or implement via a custom script), but can you explain why you're doing it? The "average region quality" method is the best at maximizing coverage while minimizing the total number of errors.
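For anyone who wants the custom-script route in the meantime, here is a minimal sketch of a per-base quality filter. It is not BBDuk code: the function names are made up for illustration, records are assumed to be (header, sequence, quality) tuples that have already been parsed, and the quality encoding is assumed to be Phred+33 unless a different offset is passed.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout for this sketch: (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def min_base_quality(qual: str, offset: int = 33) -> int:
    """Return the lowest Phred score in a quality string (ASCII offset assumed)."""
    return min(ord(c) - offset for c in qual)


def filter_by_min_base_quality(
    records: Iterable[FastqRecord], threshold: int, offset: int = 33
) -> Iterator[FastqRecord]:
    """Yield only the reads in which every base has quality >= threshold."""
    for header, seq, qual in records:
        # Empty reads are dropped because they have no bases to evaluate.
        if qual and min_base_quality(qual, offset) >= threshold:
            yield header, seq, qual


if __name__ == "__main__":
    reads = [
        ("@a", "ACGT", "IIII"),  # all bases Q40 under Phred+33
        ("@b", "ACGT", "II#I"),  # one base at Q2
    ]
    for header, _, _ in filter_by_min_base_quality(reads, threshold=20):
        print(header)
```

With a threshold of 20, only the first read survives, because a single low-quality base is enough to discard the second.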
Edit - anyway, it was quick to add, so it will be in the next release as the "mbq" ("minbasequality") flag.
Last edited by Brian Bushnell; 08-14-2017, 10:44 AM.
-
Hi Brian,
Thanks for such a quick response, and implementing this so fast!
Our interest is that during SNV calling in low-coverage regions or low-abundance taxa (in metagenomics), base quality can be more important than coverage. This way we can properly assess different alleles and avoid creating artifacts.
Cheers!
Edit - Do you have an estimated next release date?
-
Hi Brian,
Along the same lines as my previous question, what is the rationale for using maq=10? We are interested in de novo assembly of metagenomic data, and we were worried that low-quality bases at the ends of the reads might feed artificial k-mers into the assembler (SPAdes). I read that you recommend read normalization, but since our coverage is highly unequal (due to unequal species abundance, not because of sequencing artifacts), we are worried that this might introduce more biases than it solves.
We were thinking of using your newly implemented "mbq" option to ensure that all bases have a minimum quality of 20. Do you believe that this is a good alternative?
-
"maq=10" is to throw away really junky reads. The only way to really verify whether a setting is beneficial is to actually test it, unfortunately. But personally, I think "mbq=20" would be too aggressive (particularly if your sequencing run had a single low-quality cycle, in which case it would discard all of the data)... if you really want to get rid of the low-quality trailing bases, I'd suggest quality-trimming instead (qtrim=r trimq=14 or something like that). Spades is pretty robust with respect to low-quality data anyway; the biggest problem is that low-quality reads balloon the k-mer space, which can make it run out of memory.
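To make the difference concrete, here is a rough sketch contrasting a per-base cutoff with right-end trimming. It is only an illustration under stated assumptions: `naive_right_trim` simply strips trailing bases below the cutoff and is a simplified stand-in, not the algorithm BBDuk uses for qtrim; the function names are hypothetical; and qualities are assumed to be Phred+33.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a quality string into Phred scores (ASCII offset assumed)."""
    return [ord(c) - offset for c in qual]


def passes_min_base_quality(qual: str, threshold: int, offset: int = 33) -> bool:
    """True only if every base meets the threshold (the mbq-style rule)."""
    return all(q >= threshold for q in phred_scores(qual, offset))


def naive_right_trim(
    seq: str, qual: str, trimq: int, offset: int = 33
) -> Tuple[str, str]:
    """Strip trailing bases below trimq; a simplified stand-in for right-end trimming."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < trimq:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # One bad cycle in the middle of an otherwise good read.
    print(passes_min_base_quality("IIII#III", 20))          # read would be discarded
    print(naive_right_trim("ACGTACGT", "IIII#III", 14))     # read is kept whole
    # A read with a low-quality tail.
    print(naive_right_trim("ACGTACGT", "IIIII###", 14))     # only the tail is removed
```

A read with one bad cycle in the middle fails the per-base cutoff outright, while right-end trimming leaves it intact and only shortens reads whose tails are poor.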
The main advantage of normalization with metagenomes, in fact, is that it removes a lot of data, which allows Spades to run on datasets that it can't otherwise handle. It's not strictly beneficial, and if you can assemble a metagenome without normalization, that may be better - sometimes normalization improves the assembly, sometimes it doesn't.
-
Thanks for this response! I'm pretty sure that your excellent user support is only comparable to the high quality of your tools!
I will implement quality-trimming at a higher threshold and then test. I do agree that mbq=20 is too harsh for assembly (but probably useful for SNV calling).
Cheers
-
Hi Brian,
I tried to filter out reads longer than 10 bp. I used the following command:
Code:
bbduk.sh -in=input.fq -out=output.fq -maxlength=10
However, nothing happens: I get the same number of reads as in the input, even though all reads are longer than 10 bp.
I used the latest version of BBDuk, 37.53.
Test Input:
Code:
@test
ACTGGACTTGGAGTCAGAAGGC
+
b\\[\ZZ[][a]_]]cbbbabc
Output:
Code:
Input: 1 reads 22 bases.
Total Removed: 0 reads (0.00%) 0 bases (0.00%)
Result: 1 reads (100.00%) 22 bases (100.00%)
-
Hi EssigSchurke
The flag is minlength=10
The whole command is
bbduk.sh in=input.fq out=output.fq minlength=10
Edit:
I misread your question. The command provided by jazz710 is the appropriate one, and it works on my computer. You want to remove the big reads, correct?
Last edited by cuencam; 09-15-2017, 05:33 AM.
-
Actually, all the BBTools strip off the leading "-", so you can put as many of them as you want.
This is a bug. Thanks for the report! It looks like BBDuk only removes reads under minlen or over maxlen if they were trimmed; untrimmed sequences will pass regardless of their length. Sorry about that! Reformat actually works correctly in this case, though:
Code:
reformat.sh in=x.fq out=y.fq minlen=A maxlen=B
I'll fix BBDuk ASAP. Thanks again!
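For reference, the behaviour the reporter expected (length limits applied to every read, whether or not it was trimmed) can be sketched in a few lines. This is an illustration only, not BBTools code: the function names are hypothetical and records are assumed to be (header, sequence, quality) tuples.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout for this sketch: (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def within_length(seq: str, minlen: int = 0, maxlen: Optional[int] = None) -> bool:
    """True if the sequence length lies in [minlen, maxlen]; maxlen=None means no upper limit."""
    if len(seq) < minlen:
        return False
    if maxlen is not None and len(seq) > maxlen:
        return False
    return True


def filter_by_length(
    records: List[FastqRecord], minlen: int = 0, maxlen: Optional[int] = None
) -> List[FastqRecord]:
    """Keep reads whose length is within the limits, trimmed or not."""
    return [r for r in records if within_length(r[1], minlen, maxlen)]


if __name__ == "__main__":
    # The 22-base test read from the report above.
    test_read = ("@test", "ACTGGACTTGGAGTCAGAAGGC", "b\\\\[\\ZZ[][a]_]]cbbbabc")
    print(len(filter_by_length([test_read], maxlen=10)))  # read exceeds maxlen
    print(len(filter_by_length([test_read], minlen=10)))  # read meets minlen
```

Applied to the 22-base test read, a maximum length of 10 removes the read, while a minimum length of 10 keeps it.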