SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Quality-, adapter- and RRBS-trimming with Trim Galore! fkrueger Bioinformatics 136 01-18-2019 01:24 AM
Adapter trimming figo1019 RNA Sequencing 2 07-17-2018 04:00 AM
Adapter trimming and trimming by quality question alisrpp Bioinformatics 5 04-08-2013 04:55 PM
adapter trimming - help a_mt Bioinformatics 6 11-12-2012 07:36 PM
3' Adapter Trimming caddymob Bioinformatics 0 05-27-2009 12:53 PM

Reply
 
Thread Tools
Old 02-07-2019, 03:46 AM   #321
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,888
Default

We are making progress!

How much memory do you have on this Mac? I suggest using 85% of the maximum memory you have available with BBTools. You also need to specify "in1= and out1=" to go with in2= out2= etc. Since you are using IUPAC bases in your literal sequence you also need to run this option on

copyundefined=f (cu) Process non-AGCT IUPAC reference bases by making all
possible unambiguous copies.

You are also trimming to the left side of the read. Is that correct?

Can you try this command?

Code:
bbduk.sh -Xmx4g in1=sample14_S14_R1.fastq.gz in2=sample14_S14_R2.fastq.gz out1=sample14_S14_R1_btrimmed.fastq.gz out2=sample14_S14_R1_btrimmed.fastq.gz literal=GTACACAMCGCCCGTCGC,TGATCCTTCTGCVGGTTCWCCTACG k=10 ordered=t mink=2 ktrim=l rcomp=f minlength=50 maxlength=155 copyundefined=t tbo tpe
GenoMax is online now   Reply With Quote
Old 02-25-2019, 11:01 AM   #322
aushev
Member
 
Location: Europe

Join Date: Nov 2009
Posts: 18
Default

Hello,
I was trying to filter out rRNA reads using the command like this:
Quote:
bbduk.sh -Xmx3g in=in.fastq out=nonribo.fastq outm=ribo.fastq ref=ribokmers.fa.gz k=31 minlen=3
where ribokmers.fa.gz is taken from Brian's googledrive link posted at https://www.biostars.org/p/159959/
I noticed that my most abundant rRNA reads (CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTT) are not filtered. Can anyone explain how this ribokmers.fa was created? What will be the difference if I use, for example, "Human ribosomal DNA complete repeating unit" from GenBank (U13369)? Is there any other recommended source of rRNA sequences for this purpose?

Last edited by aushev; 02-25-2019 at 11:08 AM.
aushev is offline   Reply With Quote
Old 02-25-2019, 11:12 AM   #323
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,888
Default

@aushev: That k-mers file is likely for non-human genomes since it was made from SILVA database.

You could use U13369 fasta sequence and then bin the reads that map to it using bbsplit.sh.
GenoMax is online now   Reply With Quote
Old 02-25-2019, 01:30 PM   #324
aushev
Member
 
Location: Europe

Join Date: Nov 2009
Posts: 18
Default

Quote:
Originally Posted by GenoMax View Post
@aushev: That k-mers file is likely for non-human genomes since it was made from SILVA database.

You could use U13369 fasta sequence and then bin the reads that map to it using bbsplit.sh.
Thank you @GenoMax!
What would be the main advantage of using bbsplit instead of bbduk? As I understand, BBSplit internally uses BBMap, unlike BBDuk - but what would it practically mean? In my scenario, I want to filter out all rRNA reads before doing any further mapping.
aushev is offline   Reply With Quote
Old 02-25-2019, 04:11 PM   #325
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,888
Default

Any reads that align to the ribosomal repeat will be identified and separated in a file. Isn't that what you are looking to do?
GenoMax is online now   Reply With Quote
Old 02-25-2019, 04:14 PM   #326
aushev
Member
 
Location: Europe

Join Date: Nov 2009
Posts: 18
Default

Quote:
Originally Posted by GenoMax View Post
Any reads that align to the ribosomal repeat will be identified and separated in a file. Isn't that what you are looking to do?
yes, that's what I wanted - but I just wanted also to understand what is the difference between bbduk and bbsplit for this purpose.
aushev is offline   Reply With Quote
Old 02-25-2019, 05:32 PM   #327
aushev
Member
 
Location: Europe

Join Date: Nov 2009
Posts: 18
Default

sorry for another dummy question, but I really want to understand how bbduk works and currently I'm having troubles with that... Below I list example of 11 reads containing adapter sequence which I all expected to be detected with the following parameters:
Code:
bbduk.sh in=falseneg.fastq literal=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC ktrim=r mink=10 hdist=1 edist=1 hdist2=1 edist2=1
So, reference adapter sequence is `AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC`, and all those reads have a match of at least 10 nt (mink=10) and no more than 1 mismatch:

***_(ref)_______________________________AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
1_______________________________________AGATCGGAAGAG_ACACGTCTGAACTCCAGTCACTCGAAGATCTCGTATGC
2_______________AGCAGCATTGTACAGGGCTATGACAGATCGGAAGAGCACACGTC_GAACTC
3________AGCAGTTGAACATGGGTCAGTCGGTCCTGAGAGATCGGAAGAGCACACAT
4______________________________CCTGAGGCTAGATCGGAAGAGCACACGTCTGAAC_CCAGTCACTCGAAGAT
5__CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTTAGATCGGAAGAGT
6_________GCATGGGTGGTTCAGTGGTAGAATTCTCGCAGATCGGAAGAGCACACCGT
7_______GCATTGGTGGTTCAGTGGTAGAATTCTCGCCTAGATCGGAAGAGCACTCG
8________________TAGCTTATCAGACTGATGTTGACAGATCGGAAGAGCACACGTCTGA_CTCC
9______TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCTAGATCGGAAGAGCACAG
10______TCCCTGTGGTCTAGTGGTTAGGATTCGGCGCTAGATCGGAAGAGCACGCG
11_______________TCGGATCCGTCTGAGCTTGGCTAAGATCGGAAGAGCACACGTCTGGACTC
***_(ref)_______________________________AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC


Can you give a hint why none of those reads are matched?
Thanks in advance!

P.S. Adding qhdist=1 made correct matching, but I still don't understand why edist=1 did not work...
Attached Files
File Type: txt falseneg.fastq.txt (1.7 KB, 1 views)

Last edited by aushev; 02-25-2019 at 05:47 PM.
aushev is offline   Reply With Quote
Old 02-26-2019, 04:42 AM   #328
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,888
Default

@aushev: Unfortunately Brian no longer has time to participate on this forum. He would really be the only person who can authoritatively answer your questions. You could try to create a ticket on SF site to see if he responds.

"edist" directive is for indels so perhaps that is the reason it did not work. I have never had a need to use that directive. Many options for BBTools programs may be applicable in very specific use cases so unless you know for sure you need that option I would go with the defaults. That is all I can offer.

Use the smallest/core sequence you are trying to match if you intend to remove all sequence to the right of where that core sequence is found.
GenoMax is online now   Reply With Quote
Reply

Tags
adapter, bbduk, bbtools, cutadapt, trimmomatic

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 08:21 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO