SEQanswers

Old 08-26-2021, 11:54 PM   #341
emortiz
Junior Member
 
Location: Germany

Join Date: Feb 2015
Posts: 3

I have repeated Brian Bushnell's comparison among adaptor trimmers, this time including the most recent versions of Cutadapt, Trimmomatic, and fastp. Here are my commands:

Code:
# Cutadapt 3.4:
time cutadapt -m 21 -j 0 -b "file:gruseq.fa" -B "file:gruseq.fa" -o cutadapt_R1.fq.gz -p cutadapt_R2.fq.gz dirty_R1.fq.gz dirty_R2.fq.gz

# Trimmomatic 0.39:
time trimmomatic PE -phred33 dirty_R1.fq.gz dirty_R2.fq.gz trimmomatic_R1.fq.gz trimmomatic_U1.fq.gz trimmomatic_R2.fq.gz trimmomatic_U2.fq.gz ILLUMINACLIP:gruseq.fa:2:28:10:2:keepBothReads MINLEN:21

# fastp 0.22.0:
time fastp -w 8 -Q -l 21 --adapter_fasta gruseq.fa --detect_adapter_for_pe --in1 dirty_R1.fq.gz --in2 dirty_R2.fq.gz --out1 fastp_R1.fq.gz --out2 fastp_R2.fq.gz

# bbduk.sh 38.92:
time bbduk.sh in=dirty_R#.fq.gz out=bbduk_R#.fq.gz ref=gruseq.fa ktrim=r mink=12 hdist=1 minlen=21 tpe tbo

# bbduk.sh 38.92 (x2):
time bbduk.sh ktrim=r minlength=21 interleaved=f tpe tbo ref=gruseq.fa in=dirty_R#.fq.gz out=stdout.fq k=21 mink=11 hdist=2 | bbduk.sh ktrim=r minlength=21 interleaved=f tpe tbo ref=gruseq.fa in=stdin.fq out=bbduk_x2_R#.fq.gz k=19 mink=9 hdist=1
And these were the results:
Metric                        dirty     Cutadapt   Trimmomatic  fastp      bbduk     bbduk(x2)
Time to clean                 NA        3m42.848s  1m11.250s    1m42.455s  0m9.574s  0m15.249s
Reads retained                100.000   93.345     92.514       92.997     93.002    92.994
Bases retained                100.000   74.53      74.436       73.970     74.268    74.186
Perfectly correct (Reads)     49.970    97.35      80.922       96.256     94.849    95.784
Perfectly correct (Bases)     49.970    96.92      86.426       96.035     93.900    95.099
Incorrect (Reads)             50.030    2.65       19.078       3.744      5.151     4.216
Incorrect (Bases)             50.030    3.08       13.574       3.965      6.100     4.901
Adaptors remaining (Reads)    50.030    2.41       5.846        1.830      3.866     2.798
Adaptors remaining (Bases)    25.182    0.28       0.422        0.049      0.193     0.105
Non-adaptor removed (Reads)   0.000     1.53       13.231       1.914      1.285     1.418
Non-adaptor removed (Bases)   0.000     0.04       0.218        0.566      0.308     0.325

I still prefer bbduk.sh for its speed and high accuracy. fastp was slightly more accurate, but it sometimes mistakes genomic sequence for adaptor (see this post). Cutadapt, however, is now clearly the most accurate, though it is by far the slowest. Can anybody recommend settings that would increase bbduk's accuracy a little more?
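
To put the speed difference in numbers, the wall-clock times reported in the "Time to clean" row can be converted to seconds and expressed relative to the single bbduk.sh run. This is only a small sketch: the helper names `to_seconds` and `slowdown_vs` are made up for illustration, and the times are copied from the results in this post.

```python
import re

# Wall-clock times copied from the "Time to clean" row of the results.
TIMES = {
    "Cutadapt": "3m42.848s",
    "Trimmomatic": "1m11.250s",
    "fastp": "1m42.455s",
    "bbduk": "0m9.574s",
    "bbduk(x2)": "0m15.249s",
}


def to_seconds(stamp: str) -> float:
    """Convert a shell `time` string such as '3m42.848s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown_vs(baseline: str, times=TIMES) -> dict:
    """Return each tool's run time as a multiple of the baseline tool's."""
    base = to_seconds(times[baseline])
    return {tool: to_seconds(stamp) / base for tool, stamp in times.items()}


if __name__ == "__main__":
    ratios = slowdown_vs("bbduk")
    # Print tools from fastest to slowest relative to bbduk.
    for tool, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{tool:12s} {ratio:5.1f}x")
```

These ratios reflect a single timed run per tool on one machine, with different thread settings per command, so they are best read as a rough guide.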
Old 09-27-2021, 10:52 PM   #342
lituan
Junior Member
 
Location: shenzhen,china

Join Date: Jan 2017
Posts: 2
bbduk hdist=0 does not work

Hi

I want to search for CCGG in reads, so I tried the following command:


bbduk.sh in=test.fq literal=CCGG k=4 hdist=0 overwrite=t out=test.result.fq

test.fq contains only one read, and it has no CCGG:

@A00582:707:HJ3NTDSX2:2:1101:5267:1658 1:N:0:AACCTC
TATTAAGCAGAAGGGCAGGCTGGAAAATCCTCTTCAGCAGAACGGTGGACTGAGGCTCACTGCTATCAAGGTGGACAGGCTTCTCTGCTCAGCAAACCAGGCTCACCCAGGGGTGCTCTACACAGACTCGGGCTCGCTAA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

But the result shows that this read was matched:

BBDuk version 37.62
Initial:
Memory: max=210091m, free=200226m, used=9865m

Added 1 kmers; time: 0.151 seconds.
Memory: max=210091m, free=192553m, used=17538m

Input is being processed as unpaired
Started output streams: 0.269 seconds.
Processing time: 0.116 seconds.

Input: 1 reads 140 bases.
Contaminants: 1 reads (100.00%) 140 bases (100.00%)
Total Removed: 1 reads (100.00%) 140 bases (100.00%)
Result: 0 reads (0.00%) 0 bases (0.00%)

Time: 0.564 seconds.
Reads Processed: 1 0.00k reads/sec
Bases Processed: 140 0.00m bases/sec

It seems hdist does not work.

Could you give some advice?
Old 09-28-2021, 04:13 AM   #343
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,127

@lituan you will need to use k=2 or less with such a short pattern (CCGG). Even then it may not work well.

This may be a case where you would want to use a different package called Seqkit. The specific tool would be "seqkit grep".
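
For readers who want to see what an exact (zero-mismatch) motif search over a FASTQ file amounts to, here is a minimal Python sketch. It is not how Seqkit or BBDuk are implemented; the function names `parse_fastq`, `reverse_complement` and `grep_exact` are invented for this illustration, and it assumes plain 4-line FASTQ records.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)  # the '+' separator line
        qual = next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError(f"truncated FASTQ record: {header!r}")
        yield header, seq, qual


def reverse_complement(seq: str) -> str:
    """Reverse-complement an A/C/G/T string (other symbols are kept)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def grep_exact(records: Iterable[Record], pattern: str,
               both_strands: bool = True) -> Iterator[Record]:
    """Keep records whose sequence contains the pattern with no mismatches."""
    patterns = {pattern.upper()}
    if both_strands:
        patterns.add(reverse_complement(pattern.upper()))
    for header, seq, qual in records:
        upper = seq.upper()
        if any(p in upper for p in patterns):
            yield header, seq, qual


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python grep_exact.py CCGG < test.fq
    for header, seq, qual in grep_exact(parse_fastq(sys.stdin), sys.argv[1]):
        print(header, seq, "+", qual, sep="\n")
```

A plain substring test like this has no k-mer length or mismatch parameters to tune, which is why it suits a 4 bp motif better than a k-mer matcher.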
Old 09-28-2021, 05:59 PM   #344
lituan
Junior Member
 
Location: shenzhen,china

Join Date: Jan 2017
Posts: 2

Quote:
Originally Posted by GenoMax View Post
@lituan you will need to use k=2 or less with such a short pattern (CCGG). Even then it may not work well.

This may be a case where you would want to use a different package called Seqkit. The specific tool would be "seqkit grep".


Thank you. I tried seqkit grep and it works as expected.
Old 10-13-2021, 07:58 AM   #345
dhrpat
Junior Member
 
Location: UK

Join Date: Oct 2021
Posts: 2

Hi Brian, thank you for the detailed post; it is very helpful. I have a basic question: can bbduk.sh be used for adapter trimming and host contamination removal at the same time? The manual lists only one ref option. Can we provide one adapter file and one host contaminant database at the same time?
Also, can we use the same Illumina Nextera adapter file that is used for Trimmomatic?

Any help would be appreciated.
DP
Old 10-13-2021, 12:47 PM   #346
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,127

DP: You should use `bbsplit.sh` to do read-binning to remove host data contamination. There is a thread here that describes how to use that tool.

Use bbduk just for adapter removal. Using it in filter mode may work, but you may still need two runs (one to remove adapters and another to filter).
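
The two-step approach could be scripted roughly as follows. This is a sketch under assumptions, not a tested pipeline: all file names are placeholders, the bbduk.sh trimming options are the ones used earlier in this thread, and the bbsplit.sh parameters (`ref`, `outu1`, `outu2`) should be checked against the tool's own help output for the installed version. The script only builds and prints the command lines; running them is left commented out.

```python
import shlex


def adapter_trim_cmd(r1, r2, adapters, out1, out2):
    """Build a bbduk.sh call that right-trims adapters from a read pair.

    Trimming options mirror the single bbduk.sh run shown earlier in the thread.
    """
    return [
        "bbduk.sh",
        f"in1={r1}", f"in2={r2}",
        f"out1={out1}", f"out2={out2}",
        f"ref={adapters}",
        "ktrim=r", "mink=12", "hdist=1", "minlen=21", "tpe", "tbo",
    ]


def host_removal_cmd(r1, r2, host_ref, clean1, clean2):
    """Build a bbsplit.sh call that keeps only reads not mapping to the host.

    Assumption: outu1/outu2 receive the unmapped (non-host) pairs.
    """
    return [
        "bbsplit.sh",
        f"in1={r1}", f"in2={r2}",
        f"ref={host_ref}",
        f"outu1={clean1}", f"outu2={clean2}",
    ]


if __name__ == "__main__":
    # Placeholder file names, chosen only for this example.
    step1 = adapter_trim_cmd("raw_R1.fq.gz", "raw_R2.fq.gz", "adapters.fa",
                             "trimmed_R1.fq.gz", "trimmed_R2.fq.gz")
    step2 = host_removal_cmd("trimmed_R1.fq.gz", "trimmed_R2.fq.gz", "host.fa",
                             "clean_R1.fq.gz", "clean_R2.fq.gz")
    for cmd in (step1, step2):
        print(shlex.join(cmd))
        # import subprocess; subprocess.run(cmd, check=True)  # to execute
```

Keeping the steps separate means the adapter file and the host reference never share one `ref=` setting, so each run can use parameters suited to its own task.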
Old 10-13-2021, 02:41 PM   #347
popo55
Junior Member
 
Location: US

Join Date: May 2016
Posts: 1
ecco=t trims reads?

Why does the ecco option trim the reads? I thought it would just change the sequence and quality scores. For example this command:

bbduk.sh in1=<read1> in2=<read2> out1=<outread1> out2=<outread2> ecco=t kmask=lc ref="phix"

Without ecco=t, no reads are trimmed, as expected. With ecco=t, however, some reads are trimmed. Why? Does it have to do with ecco changing bases to Ns when the two reads disagree and their quality is the same? If so, I am not sure how to prevent this.
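
The hypothesis in this question can be made concrete with a toy consensus rule for the overlapping part of a read pair. To be clear, this is not BBDuk's code and may not match its behaviour; it only illustrates the idea described above (agreeing bases are kept, disagreements go to the higher-quality base, ties become N). The function names are invented, and the two overlap segments are assumed to be already aligned and in the same orientation.

```python
from typing import List, Tuple


def consensus(b1: str, q1: int, b2: str, q2: int) -> Tuple[str, int]:
    """Toy per-position rule for two overlapping base calls."""
    if b1 == b2:
        return b1, max(q1, q2)   # agreement: keep the base
    if q1 > q2:
        return b1, q1            # disagreement: trust the better call
    if q2 > q1:
        return b2, q2
    return "N", 0                # tie: no way to choose, mask the base


def correct_overlap(seq1: str, quals1: List[int],
                    seq2: str, quals2: List[int]) -> Tuple[str, List[int]]:
    """Apply the toy rule across two aligned, equal-length overlap segments."""
    if not (len(seq1) == len(quals1) == len(seq2) == len(quals2)):
        raise ValueError("overlap segments must have equal lengths")
    pairs = [consensus(b1, q1, b2, q2)
             for b1, q1, b2, q2 in zip(seq1, quals1, seq2, quals2)]
    return "".join(b for b, _ in pairs), [q for _, q in pairs]


if __name__ == "__main__":
    seq, quals = correct_overlap("ACGT", [30, 30, 10, 20],
                                 "ACCA", [30, 30, 30, 20])
    print(seq, quals)
```

Under this toy rule the corrected segment always has the same length as the input, so a consensus step of this kind would not shorten reads by itself; any trimming would have to come from a separate step.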
Old 10-14-2021, 02:02 AM   #348
dhrpat
Junior Member
 
Location: UK

Join Date: Oct 2021
Posts: 2

Thank you, GenoMax. I will try using bbduk for adapter removal and bbsplit.sh for host contaminant removal. Is it okay to run bbsplit for host contaminant removal against a database of host sequences, rather than the individual sequences seen in the example?

Would you be able to explain why bbsplit works better than bbduk for removing host contaminants?

Many thanks,
DP

Tags
adapter, bbduk, bbtools, cutadapt, trimmomatic
