SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Quality-, adapter- and RRBS-trimming with Trim Galore! fkrueger Bioinformatics 132 04-18-2017 02:04 AM
Adapter trimming figo1019 RNA Sequencing 1 04-07-2014 11:58 AM
Adapter trimming and trimming by quality question alisrpp Bioinformatics 5 04-08-2013 05:55 PM
adapter trimming - help a_mt Bioinformatics 6 11-12-2012 08:36 PM
3' Adapter Trimming caddymob Bioinformatics 0 05-27-2009 01:53 PM

Reply
 
Thread Tools
Old 01-12-2017, 06:45 AM   #181
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,583
Default

Quote:
Originally Posted by SylvainL View Post
I am using BBDuk version 36.84, I guess that's the main difference...
Well .. let us remove that difference then

Code:
$ bbmap/bbduk.sh in=nn.fq out=stdout.fq mm=f hdist=0 edist=0 ktrim=l rcomp=f k=29 literal=CAACAGCAATATACCTTCTCGAGAGGTCT
java -Djava.library.path=/path_to/bbmap/jni/ -ea -Xmx19498m -Xms19498m -cp /path_to/bbmap/current/ jgi.BBDukF in=nn.fq out=stdout.fq mm=f hdist=0 edist=0 ktrim=l rcomp=f k=29 literal=CAACAGCAATATACCTTCTCGAGAGGTCT
Executing jgi.BBDukF [in=nn.fq, out=stdout.fq, mm=f, hdist=0, edist=0, ktrim=l, rcomp=f, k=29, literal=CAACAGCAATATACCTTCTCGAGAGGTCT]

BBDuk version 36.84
Initial:
Memory: max=19594m, free=19185m, used=409m

Added 1 kmers; time:    0.043 seconds.
Memory: max=19594m, free=18469m, used=1125m

Input is being processed as unpaired
Started output streams: 0.018 seconds.

@test
CTGTCCACAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTTTCGGAATCTC
+
FGGGCEGGGGIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIGIIGFIIFIIIIIII
Processing time:                0.006 seconds.

Input:                          1 reads                 100 bases.
KTrimmed:                       1 reads (100.00%)       45 bases (45.00%)
Total Removed:                  0 reads (0.00%)         45 bases (45.00%)
Result:                         1 reads (100.00%)       55 bases (55.00%)

Time:                           0.076 seconds.
Reads Processed:           1    0.01k reads/sec
Bases Processed:         100    0.00m bases/sec
GenoMax is offline   Reply With Quote
Old 01-12-2017, 06:55 AM   #182
SylvainL
Senior Member
 
Location: Geneva

Join Date: Feb 2012
Posts: 174
Default

Ok, thanks for all this effort. I really don't catch it. Now, it's running with maxlength=1 and I get the expected results.
Quite weird
SylvainL is offline   Reply With Quote
Old 01-12-2017, 07:00 AM   #183
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,583
Default

Just for the record what OS/Java version are you using?

I am not using maxlength=1 and still get the correct answer. Strange indeed.
GenoMax is offline   Reply With Quote
Old 01-12-2017, 07:50 AM   #184
SylvainL
Senior Member
 
Location: Geneva

Join Date: Feb 2012
Posts: 174
Default

Ubuntu 12.04 server...
java version "1.7.0_121"
OpenJDK Runtime Environment (IcedTea 2.6.8) (7u121-2.6.8-1ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 24.121-b00, mixed mode)
SylvainL is offline   Reply With Quote
Old 01-12-2017, 09:22 AM   #185
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Sounds like, possibly, an intermittent filesystem problem / system problem? Please let me know if it happens again.
Brian Bushnell is offline   Reply With Quote
Old 01-12-2017, 09:27 AM   #186
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,583
Default

Quote:
Originally Posted by Brian Bushnell View Post
Sounds like, possibly, an intermittent filesystem problem / system problem? Please let me know if it happens again.
It appears to go away for @SylvainL by using maxlength=1 option, which is odd.

PS: I get the correct answer without the need of maxlength.

Last edited by GenoMax; 01-12-2017 at 09:49 AM.
GenoMax is offline   Reply With Quote
Old 01-12-2017, 09:43 AM   #187
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Quote:
Originally Posted by GenoMax View Post
It appears to go away by using maxlength=1 option, which is odd.
Oh - my impression was that the problem occurred once, but then was not replicable either with or without "maxlength=1" (which, actually, should make it so that there is no output at all in this case).

@SylvainL Sorry for the confusion, can you please clarify what output you are currently getting with and without "maxlength=1"? Currently, I got:

Code:
bbduk.sh in=nn.fq out=stdout.fq mm=f hdist=0 edist=0 ktrim=l rcomp=f k=29 literal=CAACAGCAATATACCTTCTCGAGAGGTCT

output:
@test
CTGTCCACAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTTTCGGAATCTC
+
FGGGCEGGGGIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIGIIGFIIFIIIIIII
Code:
bbduk.sh in=nn.fq out=stdout.fq mm=f hdist=0 edist=0 ktrim=l rcomp=f k=29 literal=CAACAGCAATATACCTTCTCGAGAGGTCT maxlen=1

output:
...which is what I expect.
Brian Bushnell is offline   Reply With Quote
Old 01-12-2017, 10:11 PM   #188
SylvainL
Senior Member
 
Location: Geneva

Join Date: Feb 2012
Posts: 174
Default

Hi Brian,

if I do NOT set maxlength=1, the output is only reads shorter than 10 after trimming
if I set maxlength=1, I can get the normal output, i-e reads around 40-50bp after trimming...

I also set trimq=0 to be sure that it wasn't a problem of quality trimming ...

Here is my command:

Code:
~/Applications/bbmap/bbduk.sh in=./out/${MissMatch}_MM_${Indel}_Indel/${Samplename}.fastq outm=./out/${MissMatch}_MM_${Indel}_Indel/${Samplename}_left.fastq literal=$(cat ./out/Barcodes_with_adapters/${Samplename}) k=$(wc -m ./out/Barcodes_with_adapters/${Samplename} | awk '{print $1-1}') mm=f hdist=${MissMatch} edist=${Indel} ktrim=l rcomp=f maxlength=1 trimq=0;

Last edited by SylvainL; 01-12-2017 at 10:14 PM.
SylvainL is offline   Reply With Quote
Old 02-02-2017, 07:53 AM   #189
netasha
Junior Member
 
Location: USA

Join Date: Jul 2011
Posts: 8
Default

Hi Brian,

Do you think it is possible for bbduk2.sh to trim both 5' end and 3' end with each mapping to a 22 nt reference and and a 3 nt reference, respectively?
To be more specific, the 5' end primer is: GCGGAGATCTACCTACGTACTT and the 3' end primer is: TGT

Thanks for your great tool!
netasha is offline   Reply With Quote
Old 02-02-2017, 07:57 AM   #190
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,583
Default

Quote:
Originally Posted by netasha View Post
Hi Brian,

Do you think it is possible for bbduk2.sh to trim both 5' end and 3' end with each mapping to a 22 nt reference and and a 3 nt reference, respectively?
To be more specific, the 5' end primer is: GCGGAGATCTACCTACGTACTT and the 3' end primer is: TGT

Thanks for your great tool!
Pretty sure the answer is yes but you may want to do following instead. A similar question recently came up on Biostars and this was @Brian's recommendation.

Code:
bbduk.sh in=file.fq out=stdout.fq ktrim=r k=3 mm=f literal=TGT rcomp=f ktrimexclusive | bbduk.sh in=stdin.fq out=trimmed.fq ktrim=l k=22 mm=f literal=GCGGAGATCTACCTACGTACTT rcomp=f ktrimexclusive
GenoMax is offline   Reply With Quote
Old 02-02-2017, 08:16 AM   #191
netasha
Junior Member
 
Location: USA

Join Date: Jul 2011
Posts: 8
Default

Quote:
Originally Posted by GenoMax View Post
Pretty sure the answer is yes but you may want to do following instead. A similar question recently came up on Biostars and this was @Brian's recommendation.

Code:
bbduk.sh in=file.fq out=stdout.fq ktrim=r k=3 mm=f literal=TGT rcomp=f ktrimexclusive | bbduk.sh in=stdin.fq out=trimmed.fq ktrim=l k=22 mm=f literal=GCGGAGATCTACCTACGTACTT rcomp=f ktrimexclusive

Thanks for your quick reply!
I was thinking doing it sequentially as well. But what is the "ktrimexclusive"?
netasha is offline   Reply With Quote
Old 02-02-2017, 09:40 AM   #192
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Quote:
Originally Posted by netasha View Post
Thanks for your quick reply!
I was thinking doing it sequentially as well. But what is the "ktrimexclusive"?
Hi Netasha,

BBDuk2 can trim the left and right end at the same time, but it can only use a single kmer length, and as a result it won't work in your case. So, 2 passes with BBDuk using 2 different kmer lengths is better. "TGT" is super short, though, which will lead to overtrimming due to coincidental matches. Is there any more fixed sequence following that?

BBDuk's normal trimming behavior when matching a kmer is to trim the kmer itself and everything to the right/left of it. "ktrimexclusive" tells BBDuk to only trim to the right/left, but not to trim the matched kmer itself (so TGT or whatever would still remain in the read). Whether or not you should use that flag depends on whether the sequences you want to trim are genomic. For adapters, which are artificial, the ktrimexclusive flag should not be used, but in some cases it should.
Brian Bushnell is offline   Reply With Quote
Old 02-02-2017, 10:42 AM   #193
netasha
Junior Member
 
Location: USA

Join Date: Jul 2011
Posts: 8
Default

That's why I was looking for a parameter which can restrict this short 3mer to be anchored at the rightmost position. I thought you provided such parameters already: restrictright=3. Am I right?

Because I was using cutadapt and it can manage to trim it by adding a "$" at the end of the 3 mer: TGT$. If I'm wrong, would it be a lot of efforts to add such function to bbduk2.sh?

Thanks for the explanation of the "ktrimexclusive".
netasha is offline   Reply With Quote
Old 02-02-2017, 11:33 AM   #194
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Quote:
Originally Posted by netasha View Post
That's why I was looking for a parameter which can restrict this short 3mer to be anchored at the rightmost position. I thought you provided such parameters already: restrictright=3. Am I right?
Oh, yes, if you know the position then use restrictright=3. If you already know that 100% of the time you want to trim the last 3 bases, then you can just use "ftr2=3" instead of kmer-matching.
Brian Bushnell is offline   Reply With Quote
Old 02-26-2017, 12:15 PM   #195
JVGen
Member
 
Location: East Coast

Join Date: Jul 2016
Posts: 37
Default

Hi Brian,

I started using an HPC (36 CPUs @ 3.5 GHz each & 60 GB RAM) to processes NGS data. I noticed that while using BBDuk, neither the RAM nor processors are being challenged, yet BBDuk takes quite awhile to process the reads (about 10 minutes for 90,000 150bp x 2 paired reads). BBDuk Parameters:

Adapters
Trim: Right End Only
Kmer Length: 27
Max Substitutions: 3
Max Substitutions + INDELs: 0
Trim partial adapaters with kmer length: Yes, 7

Trim Low Quality - Yes
Both Ends
Minimum Quality: 20

Discard Short Reads - Yes
Minimum Length: 75 bp (changed to 75 because I found that primer dimers contributed to assembled reads when cutoff was set at 50)

Keep Original Order - Yes


I tried using the t=36 flag, but still don't get all of my processors utilized, and the RAM is set to 45 GB and only about 10 GB is utilized. BBMap on the other hand can and does cap out the processors during mapping on the HPC. I'm using BBDuk inside of Geneious, so if you think this is abnormal performance for BBDuk, I can reach out to them to inquire.

Thanks
Jake
JVGen is offline   Reply With Quote
Old 02-26-2017, 05:28 PM   #196
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Hi Jake,

BBDuk takes exponential time to process the reference when you increase the maximum number of substitutions. Specifically, allowing 3 substitutions with K=27 requires 82^3 or 551,368 times as long as normal for loading the reference, so may be the issue (it depends how big the reference is). During that time it will use roughly 7 threads. Processing 90,000 reads after the reference is loaded would take under 0.1 seconds.

I don't know all of the flags exposed by Geneious, but from what you've posted, I recommend these settings:

Adapters
Trim: Right End Only
Kmer Length: 23
Max Substitutions: 1 (I only go to 2 if the reads are very low quality, but never 3 for adapter trimming of paired reads)
Max Substitutions + INDELs: 0
Trim partial adapaters with kmer length: Yes, 11

Trim Low Quality - Yes
Right End Only
Minimum Quality: 12 (This varies; with very high coverage, this can be higher)

Discard Short Reads - Yes
Minimum Length: 75 bp (Again, this varies)

Keep Original Order - Yes


Additionally, you should add the flags "tbo" "tpe" to the custom flags area if they are not exposed in the GUI. So, please try that and let me know if the speed is still odd; with those settings it should take ~1 second, and result in less overtrimming.
Brian Bushnell is offline   Reply With Quote
Old 02-27-2017, 09:31 AM   #197
JVGen
Member
 
Location: East Coast

Join Date: Jul 2016
Posts: 37
Default

Quote:
Originally Posted by Brian Bushnell View Post
Hi Jake,

BBDuk takes exponential time to process the reference when you increase the maximum number of substitutions. Specifically, allowing 3 substitutions with K=27 requires 82^3 or 551,368 times as long as normal for loading the reference, so may be the issue (it depends how big the reference is). During that time it will use roughly 7 threads. Processing 90,000 reads after the reference is loaded would take under 0.1 seconds.

I don't know all of the flags exposed by Geneious, but from what you've posted, I recommend these settings:

Adapters
Trim: Right End Only
Kmer Length: 23
Max Substitutions: 1 (I only go to 2 if the reads are very low quality, but never 3 for adapter trimming of paired reads)
Max Substitutions + INDELs: 0
Trim partial adapaters with kmer length: Yes, 11

Trim Low Quality - Yes
Right End Only
Minimum Quality: 12 (This varies; with very high coverage, this can be higher)

Discard Short Reads - Yes
Minimum Length: 75 bp (Again, this varies)

Keep Original Order - Yes


Additionally, you should add the flags "tbo" "tpe" to the custom flags area if they are not exposed in the GUI. So, please try that and let me know if the speed is still odd; with those settings it should take ~1 second, and result in less overtrimming.
Hi Brian,

Thanks for the explanation. In general, I have pretty high coverage which is why I had the trimming parameters set so strict - a minimum of ~1000x coverage in 150 x 2 PE reads (I'm sequencing PCR amplicons). The only worry would be trimming non-adapter sequence from my reads by allowing too many substitutions.

What's a good number for minimum overlap when using the "tbo" flag?

I'll give this a try later today and see how it performs.

Thanks again!
Jake
JVGen is offline   Reply With Quote
Old 02-27-2017, 11:22 AM   #198
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

The default minoverlap for "tbo" is 14bp, and the default mininsert is 40bp, which are fine except for uRNA (which needs mininsert=17) or a handful of other rare uses.
Brian Bushnell is offline   Reply With Quote
Old 02-28-2017, 04:18 AM   #199
JVGen
Member
 
Location: East Coast

Join Date: Jul 2016
Posts: 37
Default

Thanks Brian.

If I intend to merge reads using BBMerge, should I only Adapter Trim, Merge, and then Trim with the other parameters?

I'm hoping to use "mix=t" with BBMerge, if downstream assemblers (Tadpole, Spades, CLC) are capable of handling the mix of merged and unmerged reads as input. I assume reads that don't merge aren't poor quality - is that true? If unmerged reads are poor quality then I'll leave them unmixed and throw the unmerged reads out...

Jake
JVGen is offline   Reply With Quote
Old 02-28-2017, 10:19 AM   #200
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Hi Jake,

"mix" is primarily for error-correct via overlap (ecco) mode when you want both overlapping and non-overlapping pairs output in the same file. I don't recommend it for merging because then you get singletons and pairs in the same file which messes up pairing. This won't affect Tadpole's assembly, but it will affect other tools like Spades.

If you assemble using a pipeline like this:

Code:
#Trim adapters
bbduk.sh in=r1.fq in2=r2.fq out=trimmed.fq ktrim=r k=23 mink=11 hdist=1 ref=/bbmap/resources/adapters.fa tbo tpe maxns=0 trimq=10 qtrim=r maq=12

#Remove small contaminants
bbduk.sh in=trimmed.fq out=filtered.fq k=31 ref=/bbmap/resources/sequencing_artifacts.fa.gz,/bbmap/resources/phix174_ill.ref.fa.gz

#Error-correct 1
bbmerge.sh in=filtered.fq out=ecco.fq ecco mix vstrict adapters=default

#Error-correct 2
clumpify.sh in=ecco.fq out=eccc.fq ecc passes=6

#Error-correct 3
tadpole.sh in=eccc.fq out=ecct.fq ecc

#Merge
bbmerge.sh in=eccc.fq out=merged.fq outu=unmerged.fq rem k=62 extend2=50 adapters=default

#Assemble with Spades
spades.py -k 21,41,71,101,127 -o out -12 unmerged.fq -s merged.fq
#Or Tadpole
tadpole.sh in=merged.fq,unmerged.fq k=93 out=tadpole.fa

#Evaluate
stats.sh in=/out/scaffolds.fasta

#Quantify coverage
bbmap.sh ref=/out/scaffolds.fasta in=trimmed.fq out=mapped.sam covstats=covstats.txt
"mix" was used for the first error-correction phase with BBMerge. When BBMerge was used the second time, for merging, "mix" was not used, and instead the output was split into merged.fq (singletons) and unmerged.fq (pairs). These can be fed in to Spades or Tadpole as shown. In this case, if you want to quality-trim to Q20, that should be done only unmerged.fq because quality-trimming the raw reads to a high level prior to merging will reduce the merge rate.

Reads that don't merge might be low quality. But they might also have a large insert size so that they don't overlap, or come from a region that is low-complexity so it's not clear whether they overlap. Therefore, I recommend NOT throwing away the non-overlapping reads.

Last edited by Brian Bushnell; 02-28-2017 at 10:21 AM.
Brian Bushnell is offline   Reply With Quote
Reply

Tags
adapter, bbduk, bbtools, cutadapt, trimmomatic

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 10:46 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO