SEQanswers

Old 09-12-2017, 05:50 AM   #261
cuencam
Junior Member
 
Location: Zurich

Join Date: Aug 2017
Posts: 5

Hi Brian,
Along the same lines as my previous question, what is the rationale for using maq=10? We are interested in de novo assembly of metagenomic data, and we were worried that low-quality bases at the ends of the reads might feed artificial k-mers into the assembler (SPAdes). I read that you recommend read normalization, but since our coverage is highly unequal (due to unequal species abundance, not sequencing artifacts), we are worried that this might introduce more biases than it solves.

We were thinking of using your newly implemented option "mbq" to ensure that all bases have a minimum quality of 20. Do you think this is a good alternative?
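
The concern about artificial k-mers can be made concrete with a small Python sketch. It is only an illustration: the sequence, the error positions, the value of k, and the helper names `kmers` and `novel_kmers` are all made up for this example and have nothing to do with how SPAdes or BBDuk work internally.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def novel_kmers(read, pos, wrong_base, k):
    """k-mers that appear only after the base at pos is miscalled as wrong_base."""
    miscalled = read[:pos] + wrong_base + read[pos + 1:]
    return kmers(miscalled, k) - kmers(read, k)

read = "ACTGGACTTGGAGTCAGAAGGC"  # arbitrary 22-base example sequence
k = 5                            # small k for readability; assemblers use larger values

mid_error = novel_kmers(read, 10, "C", k)  # hypothetical miscall in the middle of the read
end_error = novel_kmers(read, 19, "T", k)  # hypothetical miscall near the 3' end

print(len(mid_error), sorted(mid_error))
print(len(end_error), sorted(end_error))
```

A single miscalled base can add up to k spurious k-mers when it sits in the interior of a read, and fewer when it sits close to the read end, because fewer k-mer windows overlap it there.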
Old 09-12-2017, 12:43 PM   #262
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668

"maq=10" is to throw away really junky reads. The only way to really verify whether a setting is beneficial is to actually test it, unfortunately. But personally, I think "mbq=20" would be too aggressive (particularly if your sequencing run had a single low-quality cycle, in which case it would discard all of the data)... if you really want to get rid of the low-quality trailing bases, I'd suggest quality-trimming instead (qtrim=r trimq=14 or something like that). Spades is pretty robust with respect to low-quality data anyway; the biggest problem is that low-quality reads balloon the kmer space, which can make it run out of memory.

The main advantage of normalization with metagenomes, in fact, is that it removes a lot of data, which allows Spades to run on datasets that it can't otherwise handle. It's not strictly beneficial, and if you can assemble a metagenome without normalization, that may be better - sometimes normalization improves the assembly, sometimes it doesn't.
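
To illustrate the difference between the maq, mbq, and qtrim/trimq settings discussed in this reply, here is a rough Python sketch on a made-up read. It assumes a plain arithmetic mean of Phred scores for the average-quality filter and a naive "drop trailing bases below the threshold" trim; BBDuk may compute both differently, and the function names and quality values are hypothetical.

```python
def passes_maq(quals, maq):
    """Keep the read if its mean quality is at least maq (arithmetic mean; an assumption)."""
    return sum(quals) / len(quals) >= maq

def passes_mbq(quals, mbq):
    """Keep the read only if every base has quality at least mbq."""
    return min(quals) >= mbq

def trim_right(quals, trimq):
    """Return how many bases remain after dropping trailing bases below trimq (naive trim)."""
    keep = len(quals)
    while keep > 0 and quals[keep - 1] < trimq:
        keep -= 1
    return keep

# Made-up 10-base read: one bad cycle at index 4 and two low-quality trailing bases.
quals = [35, 35, 35, 35, 2, 35, 35, 35, 8, 8]

print(passes_maq(quals, 10))   # mean quality filter
print(passes_mbq(quals, 20))   # minimum base quality filter
print(trim_right(quals, 14))   # bases kept after right-trimming
```

On this read the mean-quality filter keeps the read, the minimum-base-quality filter discards it because of the single bad cycle, and right-trimming keeps the first 8 bases while removing only the low-quality tail.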
Old 09-13-2017, 03:19 AM   #263
cuencam
Junior Member
 
Location: Zurich

Join Date: Aug 2017
Posts: 5

Thanks for this response! I'm pretty sure that your excellent user support is only comparable to the high quality of your tools!

I will implement quality-trimming at a higher threshold and then test. I do agree that mbq=20 is too harsh for assembly (but probably useful for SNV calling).
Cheers
Old 09-15-2017, 04:45 AM   #264
EssigSchurke
Junior Member
 
Location: Germany

Join Date: Jul 2013
Posts: 5

Hi Brian,

I tried to filter out reads longer than 10 bp. I used the following command:

Code:
bbduk.sh -in=input.fq -out=output.fq -maxlength=10
However, nothing happens: I get the same number of reads as in the input, even though all reads are longer than 10 bp.
I used the latest version of BBDuk, 37.53.

Test Input:

Code:
@test
ACTGGACTTGGAGTCAGAAGGC
+
b\\[\ZZ[][a]_]]cbbbabc
Code:
Input:                  	1 reads 		22 bases.
Total Removed:          	0 reads (0.00%) 	0 bases (0.00%)
Result:                 	1 reads (100.00%) 	22 bases (100.00%)
Old 09-15-2017, 05:11 AM   #265
jazz710
Member
 
Location: Iowa

Join Date: Oct 2012
Posts: 41

The BBDuk commands don't have '-' before them. Your command should read:

bbduk.sh in=input.fq out=output.fq maxlength=10

Give that a shot?
Old 09-15-2017, 05:15 AM   #266
EssigSchurke
Junior Member
 
Location: Germany

Join Date: Jul 2013
Posts: 5

With or without "-" does not matter; I get the same results.
Old 09-15-2017, 05:30 AM   #267
cuencam
Junior Member
 
Location: Zurich

Join Date: Aug 2017
Posts: 5

Hi EssigSchurke
The flag is minlength=10

The whole command is
bbduk.sh in=input.fq out=output.fq minlength=10

Edit:

I misread your question. The command provided by jazz710 is the appropriate one, and it works on my computer. You want to remove the long reads, correct?

Last edited by cuencam; 09-15-2017 at 05:33 AM.
Old 09-15-2017, 05:35 AM   #268
EssigSchurke
Junior Member
 
Location: Germany

Join Date: Jul 2013
Posts: 5

Hi cuencam,

minlength=10 only filters out reads shorter than 10 bp. I want to filter out reads longer than 10 bp; the value 10 is just a placeholder for my test case.
EssigSchurke is offline   Reply With Quote
Old 09-15-2017, 06:31 AM   #269
EssigSchurke
Junior Member
 
Location: Germany

Join Date: Jul 2013
Posts: 5

Yes, I want to exclude long reads. I tested the command provided by jazz710, but it produces the same result: the test read is still in the output.
Old 09-15-2017, 09:31 AM   #270
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668

Actually, all the BBTools strip off the leading "-", so you can put as many of them as you want.
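
As a rough illustration of what "stripping the leading dashes" means for key=value arguments, here is a minimal Python sketch. The `parse_arg` helper is hypothetical and is not taken from the BBTools source; it only shows why `-in=input.fq` and `in=input.fq` would end up equivalent under that behavior.

```python
def parse_arg(arg):
    """Split a 'key=value' argument, ignoring any leading dashes on the key."""
    key, _, value = arg.lstrip("-").partition("=")
    return key, value

# All three spellings yield the same key and value.
for arg in ("maxlength=10", "-maxlength=10", "--maxlength=10"):
    print(parse_arg(arg))
```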

This is a bug. Thanks for the report! It looks like BBDuk only removes reads under minlen or over maxlen if they were trimmed; untrimmed sequences will pass regardless of their length. Sorry about that! Reformat actually works correctly in this case, though:

Code:
reformat.sh in=x.fq out=y.fq minlen=A maxlen=B
I'll fix BBDuk ASAP. Thanks again!
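
The behavior described in this reply can be sketched as a toy model in Python. This is not BBDuk's code: the function names are hypothetical, inclusive bounds are an assumption, and `keep_as_reported` simply mimics the reported symptom (the length check being skipped for untrimmed reads).

```python
def in_length_range(length, minlen=0, maxlen=None):
    """True if minlen <= length <= maxlen (no upper bound when maxlen is None)."""
    return length >= minlen and (maxlen is None or length <= maxlen)

def keep_as_reported(length, was_trimmed, minlen=0, maxlen=None):
    """Toy model of the reported bug: untrimmed reads bypass the length check."""
    return (not was_trimmed) or in_length_range(length, minlen, maxlen)

read = "ACTGGACTTGGAGTCAGAAGGC"  # the 22-base test read posted earlier in the thread

# Intended behavior: a 22-base read fails maxlen=10 and is discarded.
print(in_length_range(len(read), maxlen=10))

# Reported behavior: the untrimmed read passes regardless of its length.
print(keep_as_reported(len(read), was_trimmed=False, maxlen=10))
```

The first call returns False (the read should be removed) and the second returns True (the read is kept), which matches the "0 reads removed" output in the bug report.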
Old 09-17-2017, 10:01 PM   #271
EssigSchurke
Junior Member
 
Location: Germany

Join Date: Jul 2013
Posts: 5

Thanks for the fast response. I will try reformat.
Tags
adapter, bbduk, bbtools, cutadapt, trimmomatic