SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Quality-, adapter- and RRBS-trimming with Trim Galore! fkrueger Bioinformatics 136 01-18-2019 01:24 AM
Adapter trimming figo1019 RNA Sequencing 2 07-17-2018 04:00 AM
Adapter trimming and trimming by quality question alisrpp Bioinformatics 5 04-08-2013 04:55 PM
adapter trimming - help a_mt Bioinformatics 6 11-12-2012 07:36 PM
3' Adapter Trimming caddymob Bioinformatics 0 05-27-2009 12:53 PM

Reply
 
Thread Tools
Old 02-07-2019, 03:46 AM   #321
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982
Default

We are making progress!

How much memory do you have on this Mac? I suggest using 85% of the maximum memory you have available with BBTools. You also need to specify "in1= and out1=" to go with in2= out2= etc. Since you are using IUPAC bases in your literal sequence you also need to run this option on

copyundefined=f (cu) Process non-AGCT IUPAC reference bases by making all
possible unambiguous copies.

You are also trimming to the left side of the read. Is that correct?

Can you try this command?

Code:
bbduk.sh -Xmx4g in1=sample14_S14_R1.fastq.gz in2=sample14_S14_R2.fastq.gz out1=sample14_S14_R1_btrimmed.fastq.gz out2=sample14_S14_R1_btrimmed.fastq.gz literal=GTACACAMCGCCCGTCGC,TGATCCTTCTGCVGGTTCWCCTACG k=10 ordered=t mink=2 ktrim=l rcomp=f minlength=50 maxlength=155 copyundefined=t tbo tpe
GenoMax is offline   Reply With Quote
Old 02-25-2019, 11:01 AM   #322
aushev
Member
 
Location: Europe

Join Date: Nov 2009
Posts: 21
Default

Hello,
I was trying to filter out rRNA reads using the command like this:
Quote:
bbduk.sh -Xmx3g in=in.fastq out=nonribo.fastq outm=ribo.fastq ref=ribokmers.fa.gz k=31 minlen=3
where ribokmers.fa.gz is taken from Brian's googledrive link posted at https://www.biostars.org/p/159959/
I noticed that my most abundant rRNA reads (CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTT) are not filtered. Can anyone explain how this ribokmers.fa was created? What will be the difference if I use, for example, "Human ribosomal DNA complete repeating unit" from GenBank (U13369)? Is there any other recommended source of rRNA sequences for this purpose?

Last edited by aushev; 02-25-2019 at 11:08 AM.
aushev is offline   Reply With Quote
Old 02-25-2019, 11:12 AM   #323
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982
Default

@aushev: That k-mers file is likely for non-human genomes since it was made from SILVA database.

You could use U13369 fasta sequence and then bin the reads that map to it using bbsplit.sh.
GenoMax is offline   Reply With Quote
Old 02-25-2019, 01:30 PM   #324
aushev
Member
 
Location: Europe

Join Date: Nov 2009
Posts: 21
Default

Quote:
Originally Posted by GenoMax View Post
@aushev: That k-mers file is likely for non-human genomes since it was made from SILVA database.

You could use U13369 fasta sequence and then bin the reads that map to it using bbsplit.sh.
Thank you @GenoMax!
What would be the main advantage of using bbsplit instead of bbduk? As I understand, BBSplit internally uses BBMap, unlike BBDuk - but what would it practically mean? In my scenario, I want to filter out all rRNA reads before doing any further mapping.
aushev is offline   Reply With Quote
Old 02-25-2019, 04:11 PM   #325
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982
Default

Any reads that align to the ribosomal repeat will be identified and separated in a file. Isn't that what you are looking to do?
GenoMax is offline   Reply With Quote
Old 02-25-2019, 04:14 PM   #326
aushev
Member
 
Location: Europe

Join Date: Nov 2009
Posts: 21
Default

Quote:
Originally Posted by GenoMax View Post
Any reads that align to the ribosomal repeat will be identified and separated in a file. Isn't that what you are looking to do?
yes, that's what I wanted - but I just wanted also to understand what is the difference between bbduk and bbsplit for this purpose.
aushev is offline   Reply With Quote
Old 02-25-2019, 05:32 PM   #327
aushev
Member
 
Location: Europe

Join Date: Nov 2009
Posts: 21
Default

sorry for another dummy question, but I really want to understand how bbduk works and currently I'm having troubles with that... Below I list example of 11 reads containing adapter sequence which I all expected to be detected with the following parameters:
Code:
bbduk.sh in=falseneg.fastq literal=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC ktrim=r mink=10 hdist=1 edist=1 hdist2=1 edist2=1
So, reference adapter sequence is `AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC`, and all those reads have a match of at least 10 nt (mink=10) and no more than 1 mismatch:

***_(ref)_______________________________AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
1_______________________________________AGATCGGAAGAG_ACACGTCTGAACTCCAGTCACTCGAAGATCTCGTATGC
2_______________AGCAGCATTGTACAGGGCTATGACAGATCGGAAGAGCACACGTC_GAACTC
3________AGCAGTTGAACATGGGTCAGTCGGTCCTGAGAGATCGGAAGAGCACACAT
4______________________________CCTGAGGCTAGATCGGAAGAGCACACGTCTGAAC_CCAGTCACTCGAAGAT
5__CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTTAGATCGGAAGAGT
6_________GCATGGGTGGTTCAGTGGTAGAATTCTCGCAGATCGGAAGAGCACACCGT
7_______GCATTGGTGGTTCAGTGGTAGAATTCTCGCCTAGATCGGAAGAGCACTCG
8________________TAGCTTATCAGACTGATGTTGACAGATCGGAAGAGCACACGTCTGA_CTCC
9______TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCTAGATCGGAAGAGCACAG
10______TCCCTGTGGTCTAGTGGTTAGGATTCGGCGCTAGATCGGAAGAGCACGCG
11_______________TCGGATCCGTCTGAGCTTGGCTAAGATCGGAAGAGCACACGTCTGGACTC
***_(ref)_______________________________AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC


Can you give a hint why none of those reads are matched?
Thanks in advance!

P.S. Adding qhdist=1 made correct matching, but I still don't understand why edist=1 did not work...
Attached Files
File Type: txt falseneg.fastq.txt (1.7 KB, 1 views)

Last edited by aushev; 02-25-2019 at 05:47 PM.
aushev is offline   Reply With Quote
Old 02-26-2019, 04:42 AM   #328
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982
Default

@aushev: Unfortunately Brian no longer has time to participate on this forum. He would really be the only person who can authoritatively answer your questions. You could try to create a ticket on SF site to see if he responds.

"edist" directive is for indels so perhaps that is the reason it did not work. I have never had a need to use that directive. Many options for BBTools programs may be applicable in very specific use cases so unless you know for sure you need that option I would go with the defaults. That is all I can offer.

Use the smallest/core sequence you are trying to match if you intend to remove all sequence to the right of where that core sequence is found.
GenoMax is offline   Reply With Quote
Old 09-09-2019, 07:41 PM   #329
necrosnake91
Junior Member
 
Location: Mexico City

Join Date: Sep 2019
Posts: 2
Default

Hi to everyone.

I have an issue with bbduk.sh and I,m looking for some help. At this moment I'm calculating the percentage of reads with a quality equal or higher than 30. Using bbduk.sh, I'm running this command:

```../bbmap/bbduk.sh in1=../t01-R1.fastq in2=../t01-R2.fastq trimq=30 qtrim=rl ````

and the output looks like:

````

Max memory cannot be determined. Attempting to use 1400 MB.
If this fails, please add the -Xmx flag (e.g. -Xmx24g) to your command,
or run this program qsubbed or from a qlogin session on Genepool, or set ulimit to an appropriate value.
java -ea -Xmx1400m -Xms1400m -cp /Users/rodolfochavez/RNASeq/bbmap/current/ jgi.BBDuk in1=../t01-R1.fastq in2=../t01-R2.fastq trimq=30 qtrim=rl
Executing jgi.BBDuk [in1=../t01-R1.fastq, in2=../t01-R2.fastq, trimq=30, qtrim=rl]
Version 38.59

No output stream specified. To write to stdout, please specify 'out=stdout.fq' or similar.
0.019 seconds.
Initial:
Memory: max=1468m, total=1468m, free=1439m, used=29m

Input is being processed as paired
Processing time: 2.250 seconds.

Input: 3000000 reads 225505684 bases.
QTrimmed: 2167934 reads (72.26%) 96158343 bases (42.64%)
Total Removed: 444188 reads (14.81%) 96158343 bases (42.64%)
Result: 2555812 reads (85.19%) 129347341 bases (57.36%)

Time: 2.252 seconds.
Reads Processed: 3000k 1331.91k reads/sec
Bases Processed: 225m 100.12m bases/sec

```
And the result of the analysis is printed in the screen of my terminal. My question is, how can I save the statistics Results from this analysis in a file?

Thanks a lot
necrosnake91 is offline   Reply With Quote
Old 09-09-2019, 07:57 PM   #330
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982
Default

You can simply capture the STDERR output (where this is bring written) to a file.
GenoMax is offline   Reply With Quote
Old 09-11-2019, 07:10 PM   #331
necrosnake91
Junior Member
 
Location: Mexico City

Join Date: Sep 2019
Posts: 2
Default

GenoMax,

Thanks a lot, you solved my issue!
necrosnake91 is offline   Reply With Quote
Reply

Tags
adapter, bbduk, bbtools, cutadapt, trimmomatic

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 04:49 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO