SEQanswers

Old 03-01-2017, 05:25 AM   #201
JVGen
Member
 
Location: East Coast

Join Date: Jul 2016
Posts: 37
Default

Thanks Brian. It doesn't look like clumpify has been built into Geneious as of yet, so I'll have to skip that trimming step.

Also, I was wondering why there were 71 sequences in the Nextera Adapters set? The only adapter sequence that I know of for Nextera is "CTGTCTCTTATACACATCT". Will BBDuk search for the adapter and trim everything upstream (5') of it? That would take care of the indexes as well.

Jake
Old 03-01-2017, 03:55 PM   #202
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Nextera adapters are much longer than what you've posted, which I have listed as the "Nextera Transposon". Longer sequences are more specific. The reason I have so many Nextera sequences is that all of Illumina's different indexes are included as well. And adapter trimming goes toward the 3' end, not the 5' end. But yes, once BBDuk identifies an adapter kmer, it trims that and everything downstream.
Old 03-02-2017, 05:42 AM   #203
JVGen
Member
 
Location: East Coast

Join Date: Jul 2016
Posts: 37
Default

Hi Brian,

I always mix the placement of the adapters up. I'm not sure what the location I'm referring to is called, but it is the sequence that is added by the transposase, and the location where the primers bind during library amplification. If we use that sequence, shouldn't we be able to trim that and everything 5' of it (that should remove the indexes and the adapter sequence)?

Illumina expanded their indexes. You can now run 384 samples per flow cell using Nextera XT Indexes with the original Nextera enzyme. I can share the sequences if you're interested.

Jake
Old 03-02-2017, 05:51 AM   #204
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550
Default

Just use the adapters.fa file included in the "bbmap/resources" directory. That should cover all common adapters.
Old 03-02-2017, 09:53 AM   #205
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Quote:
Originally Posted by JVGen View Post
If we use that sequence, shouldn't we be able to trim that and everything 5' of it (that should remove the indexes and the adapter sequence)?
To be clear, I like to use "left" and "right" rather than 3' and 5'. In a fastq file, reads have a left and right end, so...

A read's adapter is to the left of its left end. It does not get sequenced and thus does not need to be trimmed. When the insert size is shorter than read length, a read will run out of genomic sequence and go into the adapter on the opposite end of the molecule. This shows up as adapter sequence on its right side. So, adapter-trimming finds adapter sequence on the right end of a read and trims to the right.
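
As a toy illustration of the paragraph above (not BBDuk's k-mer matching), the sketch below simulates a read that runs past a short insert into the adapter, then cuts at the first exact adapter match and discards everything to its right. The adapter string is the Nextera sequence quoted earlier in the thread; the insert, the trailing bases, and the function names are made up for the example. Partial adapters at the very end of a read, which is what the mink flag addresses, are ignored here.

```python
ADAPTER = "CTGTCTCTTATACACATCT"  # Nextera sequence quoted earlier in the thread


def simulate_read(insert, adapter, tail, read_length):
    """Sequence left to right: genomic insert, then adapter, then whatever follows."""
    return (insert + adapter + tail)[:read_length]


def trim_right(read, adapter):
    """Cut at the first exact adapter match and discard everything to its right."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


insert = "ACGTTGCAAGGCTTAACCGG"  # hypothetical 20 bp insert, shorter than the read
read = simulate_read(insert, ADAPTER, tail="GGGGGGGGGG", read_length=45)
trimmed = trim_right(read, ADAPTER)

print(read)     # insert, then adapter, then junk on the right
print(trimmed)  # only the genomic insert remains
```

A read whose insert is longer than the read length never reaches the adapter, so trim_right returns it unchanged.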

Quote:
Illumina expanded their indexes. You can now run 384 samples per flow cell using Nextera XT Indexes with the original Nextera enzyme. I can share the sequences if you're interested.
They're constantly expanding... sigh. Thanks for your offer, but I'll download their "customer service letter" and manually add the new sequences. If Illumina was truly interested in customer service they would provide all of their adapters in a fasta file, but they really are not - they're much more interested in protecting their IP (note that Illumina makes most of its money from selling reagents), so they make it as difficult as possible.
Old 04-11-2017, 08:44 AM   #206
cnicolas
Junior Member
 
Location: leipzig

Join Date: Aug 2016
Posts: 4
Default

Dear Brian,

I would like to try your bioinformatics tools, but I am having some trouble with the files they generate.

I just want to quality-filter my MiSeq reads at a score of >30 with BBDuk,
so I ran this command:

/data/umb/cichocki/bbmap/bbduk.sh in=/data/umb/cichocki/project2/bbduckdu11avril/project2_R2.fastq in=/data/umb/cichocki/project2/bbduckdu11avril/project2_R2.fastq out1=clean11avril.fastq out2=clean211avril.fastq qtrim=rl trimq=30

The result was that 6% of the sequences were removed. Fine!

The next steps have to be done in MOTHUR. How should I proceed?

My first command in MOTHUR would be:

make.contigs(ffastq=clean11avril30.fastq, rfastq=clean211avril30.fastq, oligos=myko.txt)

The problem appears a few seconds later, with a not-so-nice message:

M04654_34_000000000-AVVJL_1_1102_21707_1905 is in your forward fastq file and not in your reverse file, please remove it using the remove.seqs command before proceeding.
Making contigs...
....(core dumped)
and it stops there.

I thought that your software cleaned both files, R1 and R2, together?

Am I right or not?
Is there a bug, or did I do something wrong?

Thank you in advance for your answer.

Best regards,

Nicolas
Old 04-11-2017, 08:56 AM   #207
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550
Default

That is strange. Since you trimmed the R1/R2 files together, they should never go out of sync.

It is possible that the files you originally got from your sequence provider are out of sync (since they were likely trimmed on the sequencer or in BaseSpace). I suggest that you first run "repair.sh" on your original files and then quality-trim.

By the way, trimming at trimq=30 is very severe. You should not need to do that.
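
To make "out of sync" concrete, here is a minimal Python sketch that walks two FASTQ files in parallel and reports the first record whose read names differ. It only detects the problem and is not how repair.sh works. It assumes plain 4-line FASTQ records; the function names and the tiny in-memory example reads are hypothetical.

```python
import io


def read_names(handle):
    """Yield the read name from the header line of every 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            name = line.rstrip("\n")[1:].split()[0]  # drop '@' and any comment
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]  # drop old-style mate suffixes
            yield name


def first_mismatch(r1, r2):
    """Return (record_index, name1, name2) for the first out-of-sync pair, else None.

    A read that is missing on one side is reported as None for that side.
    """
    it1, it2 = read_names(r1), read_names(r2)
    index = 0
    while True:
        n1, n2 = next(it1, None), next(it2, None)
        if n1 is None and n2 is None:
            return None
        if n1 != n2:
            return (index, n1, n2)
        index += 1


# Hypothetical example: the second record differs between the two files.
r1 = io.StringIO("@a 1:N:0\nACGT\n+\nIIII\n@b 1:N:0\nACGT\n+\nIIII\n")
r2 = io.StringIO("@a 2:N:0\nTTTT\n+\nIIII\n@c 2:N:0\nTTTT\n+\nIIII\n")
result = first_mismatch(r1, r2)
print(result)
```

For real files, open both with open(path) (or gzip.open(path, "rt") for .gz) and pass the handles in the same way.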
Old 04-11-2017, 09:02 AM   #208
cnicolas
Junior Member
 
Location: leipzig

Join Date: Aug 2016
Posts: 4
Default

Dear GenoMax,

Thanks for your answer.

I'll do it and report back here ASAP.

Best regards,

Nicolas

P.S. Yes, I am actually trying 25, 27 and 30.
Old 04-11-2017, 09:59 AM   #209
cnicolas
Junior Member
 
Location: leipzig

Join Date: Aug 2016
Posts: 4
Default

Results:
java -ea -Xmx13352m -cp /data/umb/cichocki/bbmap/current/ jgi.SplitPairsAndSingles rp in1=/data/umb/cichocki/project2/bbduckdu11avril/project2_R1.fastq in2=/data/umb/cichocki/project2/bbduckdu11avril/project2_R2.fastq out1=fixed1.fastq out2=fixed2.fastq outsingle=singletons.fastq
Executing jgi.SplitPairsAndSingles [rp, in1=/data/umb/cichocki/project2/bbduckdu11avril/project2_R1.fastq, in2=/data/umb/cichocki/project2/bbduckdu11avril/project2_R2.fastq, out1=fixed1.fastq, out2=fixed2.fastq, outsingle=singletons.fastq]

Set INTERLEAVED to false
Started output stream.

Input: 43045566 reads 12956715366 bases.
Result: 43045566 reads (100.00%) 12956715366 bases (100.00%)
Pairs: 43045566 reads (100.00%) 12956715366 bases (100.00%)
Singletons: 0 reads (0.00%) 0 bases (0.00%)

No errors, if I understand correctly!

I re-trimmed at 27, and now it apparently works in MOTHUR. Let's see if it works well!

Thank you anyway for this command, which may be useful for me in the future, or perhaps for someone else!

Enjoy your working afternoon on the East Coast!

Nico
Old 04-11-2017, 10:08 AM   #210
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550
Default

Interesting. Well as long as it is working. You did not lose any reads in the process either :-)
Old 04-11-2017, 10:25 AM   #211
cnicolas
Junior Member
 
Location: leipzig

Join Date: Aug 2016
Posts: 4
Default

Just 2.4% lost by trimming at 27. I do not know whether that is a reasonable figure for a 2x300 bp MiSeq run with 16S and metaprofiling of another marker.
Old 04-11-2017, 11:16 PM   #212
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

Was the command you posted (the one that first gave the error) copied and pasted from your terminal? If so, you simply made the mistake of loading only the reverse reads.
That would also fit the error message you got, as you would have lost all mates. Interesting, however, that bbduk didn't complain and even produced two fastqs as output?!

Quote:
Originally Posted by cnicolas View Post
/data/umb/cichocki/bbmap/bbduk.sh in=/data/umb/cichocki/project2/bbduckdu11avril/project2_R2.fastq in=/data/umb/cichocki/project2/bbduckdu11avril/project2_R2.fastq out1=clean11avril.fastq out2=clean211avril.fastq qtrim=rl trimq=30
Old 04-12-2017, 04:35 PM   #213
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Quote:
Originally Posted by WhatsOEver View Post
Was the command you posted (the one that first gave the error) copied and pasted from your terminal? If so, you simply made the mistake of loading only the reverse reads.
That would also fit the error message you got, as you would have lost all mates. Interesting, however, that bbduk didn't complain and even produced two fastqs as output?!
Ah! Good eye...

OK, so here's what's happening:

Code:
in=x.fq in=x.fq
In1 is set as x.fq. Then, in1 is set as x.fq again (you can do this as many times as you want; BBTools all just overwrite the previous setting with the latest setting). Then, since 2 output files are specified, BBDuk assumes that the input file is interleaved and forces interleaved mode to true. That's a feature, by the way! But, I guess one that could potentially cause problems.
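
The behavior described above can be mimicked with a small Python sketch. This is not BBTools code: parse_args and the exact inference rule are assumptions written only to mirror the description (a repeated key overwrites the earlier value, "in" means in1, and one input with two outputs is treated as interleaved).

```python
def parse_args(args):
    """Parse key=value arguments; a repeated key overwrites the earlier value."""
    settings = {}
    for arg in args:
        key, _, value = arg.partition("=")
        if key == "in":   # 'in' is treated as in1, as described in the post
            key = "in1"
        if key == "out":  # assumed to mirror the same aliasing for outputs
            key = "out1"
        settings[key] = value
    # Assumed rule, following the post: one input plus two outputs
    # means the single input is taken to be interleaved.
    settings["interleaved"] = (
        "in1" in settings and "in2" not in settings and "out2" in settings
    )
    return settings


# The mistaken command: 'in=' given twice, so in2 is never set.
wrong = parse_args(["in=R2.fastq", "in=R2.fastq", "out1=a.fastq", "out2=b.fastq"])
# The intended command: in1 and in2 named explicitly.
right = parse_args(["in1=R1.fastq", "in2=R2.fastq", "out1=a.fastq", "out2=b.fastq"])

print(wrong)
print(right)
```

With the mistaken command, consecutive reverse reads get paired with each other, which is consistent with mates going missing downstream.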
Old 04-18-2017, 04:31 AM   #214
mslider
Junior Member
 
Location: france

Join Date: Sep 2010
Posts: 24
Default

Hi,

I see a big difference between bbduk.sh and Trimmomatic when trimming single-end reads. With the commands below, Trimmomatic kept 99.78% of the reads, whereas BBDuk kept 91.76%. I don't know which result to trust.
Which parameters do you recommend for trimming single-end reads properly?

Thank you.

java -Xmx10g -jar trimmomatic-0.36.jar SE -threads 8 -phred33 D3_464_S2_L001_R1_001.fastq.gz Out_D3_464_S2_L001_R1_001.fastq.gz ILLUMINACLIP:TruSeq3-SE.fa:2:40:15:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

TrimmomaticSE: Started with arguments:
-threads 8 -phred33 D3_464_S2_L001_R1_001.fastq.gz Out_D3_464_S2_L001_R1_001.fastq.gz ILLUMINACLIP:/home/jtazi/save/Trimmomatic-0.36/adapters/TruSeq3-SE.fa:2:40:15:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 49512700 Surviving: 49401731 (99.78%) Dropped: 110969 (0.22%)

bbduk.sh Xmx8g in=D3_464_S2_L001_R1_001.fastq.gz out=D3_464_S2_L001_R1_001_trimmed.fastq.gz ref=resources/adapters.fa threads=8 k=13 ktrim=r useshortkmers=t mink=5 qtrim=rl minlength=36 trimq=27

BBDuk version 36.11
Set threads to 8
maskMiddle was disabled because useShortKmers=true
Initial:
Memory: max=8232m, free=7974m, used=258m

Added 2017 kmers; time: 0.570 seconds.
Memory: max=8232m, free=7545m, used=687m

Input is being processed as unpaired
Started output streams: 0.449 seconds.
Processing time: 118.446 seconds.

Input: 49512700 reads 2475635000 bases.
QTrimmed: 7288815 reads (14.72%) 218452414 bases (8.82%)
KTrimmed: 3125475 reads (6.31%) 23413723 bases (0.95%)
Total Removed: 4077445 reads (8.24%) 241866137 bases (9.77%)
Result: 45435255 reads (91.76%) 2233768863 bases (90.23%)

Time: 119.548 seconds.
Reads Processed: 49512k 414.16k reads/sec
Bases Processed: 2475m 20.71m bases/sec
Old 04-18-2017, 11:08 AM   #215
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

The difference is primarily because you are quality-trimming to Q27, which is too high for almost any purpose. I'd suggest a command more like this:

Code:
bbduk.sh -Xmx8g in=D3_464_S2_L001_R1_001.fastq.gz out=D3_464_S2_L001_R1_001_trimmed.fastq.gz ref=resources/adapters.fa threads=8 k=19 mink=5 hdist=1 hdist2=0 ktrim=r qtrim=r minlength=36 trimq=14
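
For context on the thresholds mentioned in this thread, a Phred score Q corresponds to an error probability of 10^(-Q/10). The short Python sketch below just evaluates that standard definition for the values discussed (14, 27 and 30); the function name is made up.

```python
def error_probability(q):
    """Standard Phred definition: probability that a base call with score q is wrong."""
    return 10 ** (-q / 10)


for q in (14, 27, 30):
    print(f"Q{q}: about 1 error in {round(1 / error_probability(q))} bases")
```

So Q14 is roughly a 4% chance of error per base, Q27 about 0.2%, and Q30 about 0.1%.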
Old 04-18-2017, 01:51 PM   #216
mslider
Junior Member
 
Location: france

Join Date: Sep 2010
Posts: 24
Default

Hi,

Thank you for your answer. Just one question about the quality check:
does trimq=14 refer to an average quality in a sliding window, as in Trimmomatic with SLIDINGWINDOW:4:15, or not?

Best
Old 04-18-2017, 03:10 PM   #217
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

BBDuk supports a sliding window; the flags "qtrim=w,4 trimq=15" will give similar behavior to Trimmomatic. But I don't recommend that; the Phred trimming method used by default is optimal, whereas sliding-window trimming is non-optimal.
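
The difference between the two approaches can be sketched in a few lines of Python. The first function is a simplified version of what a 4-base window at Q15 describes. The second is a Mott-style maximal-segment trimmer working on error probabilities, which is assumed here to be the idea behind the "Phred" method. Neither is taken from the Trimmomatic or BBDuk sources, and the quality values are invented for the example.

```python
def sliding_window_trim(quals, window, threshold):
    """Return how many bases are kept from the left.

    Simplified: cut at the start of the first window whose mean quality
    falls below the threshold.
    """
    for i in range(max(len(quals) - window + 1, 0)):
        if sum(quals[i:i + window]) / window < threshold:
            return i
    return len(quals)


def phred_style_trim(quals, threshold):
    """Return (start, end) of the segment maximising sum(p_cutoff - p_error).

    This is a maximum-scoring-segment search (Kadane's algorithm) on
    error probabilities, p = 10 ** (-q / 10).
    """
    cutoff = 10 ** (-threshold / 10)
    best_score, best = 0.0, (0, 0)
    score, start = 0.0, 0
    for i, q in enumerate(quals):
        score += cutoff - 10 ** (-q / 10)
        if score < 0:
            score, start = 0.0, i + 1   # segment cannot profitably include this base
        elif score > best_score:
            best_score, best = score, (start, i + 1)
    return best


# Invented read: three good bases, three bad ones, then eight good bases.
quals = [30, 30, 30, 2, 2, 2, 30, 30, 30, 30, 30, 30, 30, 30]

kept_window = sliding_window_trim(quals, window=4, threshold=15)
kept_segment = phred_style_trim(quals, threshold=15)
print(kept_window)   # number of bases kept from the left
print(kept_segment)  # half-open interval of the kept segment
```

On this invented read the window approach keeps only the first 2 bases, while the maximal-segment approach keeps the 8 good bases after the bad stretch.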
Old 04-19-2017, 04:30 AM   #218
mslider
Junior Member
 
Location: france

Join Date: Sep 2010
Posts: 24
Default

okay good, thanks for your help.
Old 05-05-2017, 12:10 PM   #219
BrianS
Junior Member
 
Location: East Coast, USA

Join Date: May 2017
Posts: 2
Default

I am using bbduk.sh to trim fastqs to a given length using the force trim capability. I noticed that the character # is being changed to ! in the Q score line of the trimmed fastq. I was unable to find documentation describing whether this is expected behavior. Would you be able to provide some insight into this? I ran the following command:

../../tools/bbmap/bbduk.sh in=<sample>.fastq.gz out=<trimmed_sample>.fastq.gz ftr=50 ordered=t

Original fastq:
@SN1131:915:HFYN7ADXY:1:1101:21364:2052 1:N:0: CAACCACA
TTTCNCCACCACCACGTCGTTCTTGCGCCTCTTCTTGGCTTTCCGCTTGCGCTTGGGTATCTGGCTTGGGGGGCGGAGTGGATCCTGCTTTCTGGCGGAAA
+
@@@B#2=BFHFHHII<GHIIIHIIIBHIIIIIIIIIIIIGIIIGIIIIIIIHHEEEB;[email protected]>BBBBBB?BCCCCCCCCCCCCCCCB<9>B


bbduk.sh output:
@SN1131:915:HFYN7ADXY:1:1101:21364:2052 1:N:0: CAACCACA
TTTCNCCACCACCACGTCGTTCTTGCGCCTCTTCTTGGCTTTCCGCTTGCG
+
@@@B!2=BFHFHHII<GHIIIHIIIBHIIIIIIIIIIIIGIIIGIIIIIII


Thank you,
Brian
Old 05-05-2017, 12:19 PM   #220
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Hi Brian,

That's intentional. The 5th base call is an N, which means the quality score should be 0 (!) not 2 (#). Some versions of Illumina software have bugs causing some Ns to be assigned quality scores above 0, or called bases to be assigned a quality score of 0. Neither of these cases should happen as they are mathematically contradictory, and can cause problems with downstream tools, so BBDuk automatically fixes both of them.
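
A minimal Python sketch of the two things that happened to the read posted above: the force-trim and the quality fix for N. It is an illustration, not BBDuk's code. The function names are made up, the ftr semantics (keep 0-based positions 0 through ftr) are inferred from the 51-base output shown above, only the first 57 bases of the posted read are used, and the opposite fix (called bases with quality 0) is left out.

```python
def force_trim_right(seq, qual, ftr):
    """Keep 0-based positions 0..ftr inclusive, dropping everything to the right."""
    return seq[:ftr + 1], qual[:ftr + 1]


def zero_quality_for_n(seq, qual):
    """Set the Phred+33 quality of every N base call to '!' (Q0)."""
    return "".join("!" if base == "N" else q for base, q in zip(seq, qual))


# First 57 bases and qualities of the read posted above.
seq = "TTTCNCCACCACCACGTCGTTCTTGCGCCTCTTCTTGGCTTTCCGCTTGCGCTTGGG"
qual = "@@@B#2=BFHFHHII<GHIIIHIIIBHIIIIIIIIIIIIGIIIGIIIIIIIHHEEEB"

trimmed_seq, trimmed_qual = force_trim_right(seq, qual, ftr=50)
fixed_qual = zero_quality_for_n(trimmed_seq, trimmed_qual)

print(trimmed_seq)
print(fixed_qual)
```

In Phred+33 encoding '#' is Q2 and '!' is Q0, so only the fifth quality character changes.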

You can add the flag "changequality=f" to disable this behavior, but I don't recommend it.

Tags
adapter, bbduk, bbtools, cutadapt, trimmomatic
