SEQanswers

Old 07-13-2016, 12:08 PM   #1
JamesSeward
Member
 
Location: Boone, NC

Join Date: Jul 2016
Posts: 10
BBMap dedupe help

Greetings,

I am currently running the dedupe command, and while I have not had too much trouble with it, I am having what seems to be an input reading error. Putting in my files one at a time, separated by commas, works but takes a lot of time if I'm running a lot of files. How can I run it on a directory that contains all my files? I have used ${line}, but it doesn't seem to give me the correct output. It doesn't seem to read through all the files in the folder.

Thanks!
Old 07-13-2016, 12:44 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,989

What is the reason for deduplicating the data, if I may ask? What kind of data is this? Generally you should not need to dedupe the data upfront.

I know @Brian had allowed ref= to be a directory for BBSplit but I don't know if a similar option exists for dedupe.sh.
Old 07-13-2016, 03:51 PM   #3
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Hi... sorry, there's no such option right now. But if you want to deduplicate a bunch of files together, you can do this:

cat *.fasta | dedupe.sh in=stdin.fasta out=deduped.fasta

I'm also curious as to the nature of the data; are you deduplicating multiple assemblies?
Old 07-14-2016, 08:53 AM   #4
JamesSeward
Member
 
Location: Boone, NC

Join Date: Jul 2016
Posts: 10

Thank you both for the replies. Brian, I will give that a try and see if it works! I am working with raw peatland microbial data, but for now I am using just a few of the files to get some practice. I am also looking to see the differences in output between BBMap and pandaseq.
Old 07-14-2016, 09:23 AM   #5
JamesSeward
Member
 
Location: Boone, NC

Join Date: Jul 2016
Posts: 10

Hello again, I still seem to be running into some problems. When I attempt to run my files all at once using ${line} (which is a loop, correct?), my output file is only 4.1MB. When I put my fasta files in one at a time, separated by commas, my dedupe file is 12MB, which is roughly the size it should be. This way of doing it works well for the moment, but I am only using 4 files for practice and will be using many more in the future, so putting them in one at a time may not be a great way of doing it. @Brian, I have tried the method you suggested earlier, but I may be formatting my command line incorrectly. I'd be happy to show you if that helps in any way.
Any advice is appreciated!

Thank you!
Old 07-14-2016, 03:08 PM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Hi James,

Please post the exact command you used and the complete error message.

Also, I'm not really very good at command-line one-liners, but I'm sure it's possible to combine "ls *.fasta" with sed or awk to get a comma-delimited list of files.
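
One possible way to build such a list, using plain shell globbing rather than sed or awk, is sketched below. This is only an illustration: `fasta_list` is a hypothetical helper name, not part of BBTools, and the sketch assumes that no file path contains a comma and that dedupe.sh accepts the comma-separated in= list described earlier in the thread.

```shell
# fasta_list DIR: print every .fasta file in DIR as one comma-delimited line.
# Hypothetical helper for illustration; not part of BBTools.
# The function body runs in a subshell so the changes to the positional
# parameters and to IFS do not leak into the calling shell.
fasta_list() (
    # Replace the positional parameters with the matching files.
    set -- "$1"/*.fasta
    # If the glob matched nothing, the pattern is left unexpanded: fail.
    [ -e "$1" ] || exit 1
    # "$*" joins the positional parameters with the first character of IFS.
    IFS=,
    printf '%s\n' "$*"
)

# Usage (assumes dedupe.sh is on the PATH):
# dedupe.sh in="$(fasta_list /path/to/fasta_dir)" out=deduped.fasta
```

Because the list is passed in a single in= argument, dedupe.sh runs only once and sees all of the files together.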
Old 07-15-2016, 08:23 AM   #7
JamesSeward
Member
 
Location: Boone, NC

Join Date: Jul 2016
Posts: 10

for line in $(cat /Users/jamesseward/Desktop/Canada/MappingFiles/plate10_map2.txt);do sh /Users/jamesseward/Desktop/Canada/bbmap/dedupe.sh -Xmx1g in=/Users/jamesseward/Desktop/Canada/bbDuk/Final_Fastq/Merge/${line}_Merge.fasta out=/Users/jamesseward/Desktop/Canada/bbDuk/dedupe/Dereplicated.fasta; done

While I am not getting an error message, this way produces an output file that is much smaller than when I use commas for my input.

Thank you very much for the help!

James
Old 07-15-2016, 11:20 PM   #8
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Since you are outputting all of the files to the same destination, the output keeps getting overwritten, so the final result is just the deduplicated version of the last file.

Note that even if you appended subsequent output to the file instead of overwriting it (with the flags "ow=f append=t"), you'd still get a different output than using all of the files at once with commas. Running dedupe on multiple files at once will deduplicate them together; you are deduplicating them independently.
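
A rough sketch of one way to deduplicate the files together while still driving the run from the mapping file: use the loop only to assemble the comma-separated input list, then call dedupe.sh a single time. `merged_inputs` is a hypothetical helper name, and the sketch assumes the mapping file holds one sample name per line and that no path contains a comma. The paths and the -Xmx1g flag in the usage comment are taken from the command posted above.

```shell
# merged_inputs MAP_FILE FASTA_DIR: read sample names from MAP_FILE and print
# the matching FASTA_DIR/<name>_Merge.fasta paths joined by commas.
# Hypothetical helper for illustration; not part of BBTools.
# Assumes one sample name per line. Runs in a subshell to keep variables local.
merged_inputs() (
    map_file="$1"
    fasta_dir="$2"
    list=""
    # The "|| [ -n "$name" ]" part also handles a last line with no newline.
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue    # skip blank lines
        # Add a comma before every entry except the first.
        list="${list:+$list,}$fasta_dir/${name}_Merge.fasta"
    done < "$map_file"
    printf '%s\n' "$list"
)

# Usage: a single dedupe.sh run over all files, so nothing is overwritten.
# map=/Users/jamesseward/Desktop/Canada/MappingFiles/plate10_map2.txt
# dir=/Users/jamesseward/Desktop/Canada/bbDuk/Final_Fastq/Merge
# sh /Users/jamesseward/Desktop/Canada/bbmap/dedupe.sh -Xmx1g \
#     in="$(merged_inputs "$map" "$dir")" \
#     out=/Users/jamesseward/Desktop/Canada/bbDuk/dedupe/Dereplicated.fasta
```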

All times are GMT -8.