SEQanswers

07-16-2019, 12:00 PM   #1
tamu_anand

Has anyone used the BBTools package to remove contaminants?

I would like to remove mitochondrial, chloroplast and rDNA contaminants from read data using BBTools. Has anyone used BBTools to do this, and could you share a generic pipeline?

Thanks in advance.
07-17-2019, 03:04 AM   #2
GenoMax

There is no pipeline needed. You can provide the contaminants you want to remove as FASTA sequences in a file.

While you could use `bbduk.sh`, you may want to use `bbsplit.sh` in this case to bin the reads into the ones you don't want (based on the list above) and the rest. You can find the bbsplit thread here.
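For reference, the `bbduk.sh` route mentioned above could look roughly like the sketch below. It is a k-mer filter: reads that share a k-mer with the contaminant FASTA go to `outm`, and everything else goes to `out`. The file names, the output prefix and the `k=31 hdist=1` values are assumptions for illustration, not tuned recommendations. The helper only prints the command so it can be checked before running.

```shell
# Sketch: k-mer based contaminant filtering with bbduk.sh.
# All file names and the k/hdist values are illustrative assumptions.

# Print a bbduk.sh command line for the given reads, contaminant FASTA and prefix.
bbduk_filter_cmd() {
    # $1 = input reads, $2 = contaminant FASTA, $3 = prefix for output files
    printf 'bbduk.sh in=%s ref=%s k=31 hdist=1 out=%s_clean.fq.gz outm=%s_contam.fq.gz stats=%s_stats.txt\n' \
        "$1" "$2" "$3" "$3" "$3"
}

# Dry run: show the command without executing it.
bbduk_filter_cmd read_1.fq.gz rRNA_Chlor_Mito.fa sample
```

To actually run it, pipe the printed line to a shell (`bbduk_filter_cmd ... | bash`) with BBTools on the PATH. The `stats=` file lists which reference sequences were hit.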

Ask if you need information/run into issues.
07-17-2019, 08:30 PM   #3
tamu_anand

Thanks @GenoMax for pointing me to the bbsplit thread.

To elaborate, I would like to remove chloroplast/rDNA/mito contaminants, and I was thinking I would do something like:

Code:
bbmap.sh in=read_1.fq.gz ref=rRNA_Chlor_Mito.fa maxindel=1 minid=0.95 outu=clean_read_1.fq.gz nodisk
I am basing the above command on a post by Brian here: https://www.biostars.org/p/143019/#210890

The strategy here is to use the rRNA+Mito+Chloroplast file and map the reads using bbmap, then collect the unmapped reads (clean_read_1.fq.gz) for my downstream analysis.
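If the data are paired (the `read_1` name suggests so, but that is an assumption), both mates can be filtered in a single run so the cleaned files stay in sync. The sketch below reuses the same options as the command above and adds `in1`/`in2` and `outu1`/`outu2`. The `read_2.fq.gz` file name is hypothetical, and the helper assumes the input files sit in the current directory because it simply prefixes the names with `clean_`. As far as the `bbmap.sh` help describes it, `outu` does not include an unmapped read whose mate mapped, so a pair is kept only when neither mate hits the contaminant reference.

```shell
# Sketch: paired-end variant of the bbmap.sh contaminant filter.
# read_2.fq.gz is a hypothetical mate file; options mirror the single-end command.

# Print a bbmap.sh command line for two mate files and a contaminant FASTA.
bbmap_paired_cmd() {
    # $1 = mate 1 reads, $2 = mate 2 reads, $3 = contaminant FASTA
    printf 'bbmap.sh in1=%s in2=%s ref=%s maxindel=1 minid=0.95 outu1=clean_%s outu2=clean_%s nodisk\n' \
        "$1" "$2" "$3" "$1" "$2"
}

# Dry run: show the command without executing it.
bbmap_paired_cmd read_1.fq.gz read_2.fq.gz rRNA_Chlor_Mito.fa
```

Piping the printed line to `bash` runs it, provided BBTools is installed.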

I would appreciate any input or suggestions on the above bbmap command-line options.

Also, is it better to use bbsplit instead of bbmap?
07-21-2019, 01:44 PM   #4
tamu_anand

Hi all

Any input or recommendations on the bbmap command line I am using in post #3?

Thanks in advance
07-21-2019, 04:02 PM   #5
GenoMax

You can certainly use Brian's recommendation above. If you wish to find out how many reads actually map to each individual reference (e.g. rRNA, mito), then `bbsplit.sh` would be useful, since it reports those statistics.
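A minimal sketch of that `bbsplit.sh` approach follows. It assumes each contaminant class is kept in its own FASTA file (`rRNA.fa`, `mito.fa` and `chloroplast.fa` are hypothetical names), because the per-reference counts are reported per reference file. Reads that match no reference go to `outu`, binned reads go to the `basename` files (`%` is replaced by the reference name), and `refstats` holds the per-reference summary. The output names are also assumptions, and the helper only prints the command.

```shell
# Sketch: bin reads per contaminant reference with bbsplit.sh and keep the rest.
# Reference and output file names are illustrative assumptions.

# Join arguments with commas, the list format bbsplit.sh takes for ref=.
join_refs() (
    IFS=,
    printf '%s' "$*"
)

# Print a bbsplit.sh command line: first argument is the reads file,
# the remaining arguments are the reference FASTA files.
bbsplit_cmd() {
    reads="$1"
    shift
    printf 'bbsplit.sh in=%s ref=%s basename=bin_%%.fq.gz outu=clean.fq.gz refstats=refstats.txt\n' \
        "$reads" "$(join_refs "$@")"
}

# Dry run: show the command without executing it.
bbsplit_cmd read_1.fq.gz rRNA.fa mito.fa chloroplast.fa
```

After a real run, `clean.fq.gz` would hold the reads for downstream analysis and `refstats.txt` the read counts per reference.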
Tags
bbduk, bbsplit, bbtools

All times are GMT -8.

