02-27-2014, 09:57 AM   #1
Brian Bushnell
Introducing BBSplit: Read Binning Tool for Metagenomes and Contaminated Libraries

BBSplit is a tool that bins reads by mapping to multiple references simultaneously, using BBMap. The reads go to the bin of the reference they map to best. There are also disambiguation options, such that reads that map to multiple references can be binned with all of them, none of them, one of them, or put in a special "ambiguous" file for each of them. Paired reads will always be kept together.

For example, if you had a library that was contaminated with E. coli and Salmonella, you could run BBSplit with these arguments:
in=reads.fq ref=ecoli.fa,salmonella.fa basename=out_%.fq outu=clean.fq int=t

This will produce 3 output files:
out_ecoli.fq (E. coli reads)
out_salmonella.fq (Salmonella reads)
clean.fq (unmapped reads)
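
The output naming above can be sketched in Python. This is only an illustration of the pattern seen in the example, assuming the "%" in basename is replaced by each reference file's name without its extension. The function name output_names is hypothetical and is not part of BBSplit.

```python
from pathlib import Path

def output_names(refs, basename, outu=None):
    """Expand '%' in basename once per reference (illustrative only)."""
    # Assumption: '%' becomes the reference file name minus its extension.
    names = [basename.replace("%", Path(ref).stem) for ref in refs]
    if outu is not None:
        names.append(outu)  # file for unmapped reads, if requested
    return names

print(output_names(["ecoli.fa", "salmonella.fa"], "out_%.fq", outu="clean.fq"))
# prints ['out_ecoli.fq', 'out_salmonella.fq', 'clean.fq']
```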

In this case, "int=t" means that the input file is paired and interleaved. For single-end reads you would leave that out. For paired reads in 2 files, you would use these arguments instead:
in1=reads1.fq in2=reads2.fq ref=ecoli.fa,salmonella.fa basename=out_%.fq outu1=clean1.fq outu2=clean2.fq
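
If you build these argument lists from a script, the three input layouts (single-end, interleaved, paired in 2 files) could be handled roughly like this. It is a minimal sketch that only assembles the key=value arguments shown in this post. The helper name bbsplit_args and its layout parameter are hypothetical, and it does not run BBSplit itself.

```python
def bbsplit_args(refs, basename, reads, layout="single", unmapped=None):
    """Assemble the key=value arguments used in this post (illustrative).

    layout: "single", "interleaved" (one file), or "paired" (two files).
    reads / unmapped: list with one file name, or two for layout="paired".
    """
    if layout == "paired":
        in1, in2 = reads
        args = [f"in1={in1}", f"in2={in2}"]
    else:
        (single,) = reads
        args = [f"in={single}"]
    args += ["ref=" + ",".join(refs), f"basename={basename}"]
    if unmapped:
        if layout == "paired":
            u1, u2 = unmapped
            args += [f"outu1={u1}", f"outu2={u2}"]
        else:
            args.append(f"outu={unmapped[0]}")
    if layout == "interleaved":
        args.append("int=t")  # input is paired and interleaved
    return args

refs = ["ecoli.fa", "salmonella.fa"]
print(" ".join(bbsplit_args(refs, "out_%.fq", ["reads.fq"], "interleaved", ["clean.fq"])))
print(" ".join(bbsplit_args(refs, "out_%.fq", ["reads1.fq", "reads2.fq"], "paired",
                            ["clean1.fq", "clean2.fq"])))
```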

You can get more information about parameters by running BBSplit with no arguments, or by reading /bbmap/docs/readme.txt. But I will mention here the inter-reference ambiguity modes, which decide what to do with reads that map to multiple references, and with pairs where each read maps to a different reference:

Default. Ambiguous reads go to the first best site.

Ambiguous reads are considered unmapped.

Write a copy to the output for each reference to which it maps.

Write a copy to the AMBIGUOUS_ output file for each reference to which it maps.
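
To make the four behaviors concrete, here is a toy Python sketch of the decision for a single read. It is not BBSplit's actual code, and it rests on several assumptions: a read counts as ambiguous when two or more references tie for the top score, "first" means first in reference order, and the mode names (best, toss, all, split) are the ambiguous2 values as I recall them from the BBMap documentation. The function name assign_bins is hypothetical.

```python
def assign_bins(scores, mode="best"):
    """Pick output bins for one read (toy model of the four modes).

    scores: dict of reference name -> mapping score, in reference order;
            higher is better, and an empty dict means the read did not map.
    """
    if not scores:
        return ["unmapped"]
    top = max(scores.values())
    tied = [ref for ref, s in scores.items() if s == top]
    if len(tied) == 1:
        return tied                      # unambiguous: one best reference
    if mode == "best":
        return tied[:1]                  # first best site
    if mode == "toss":
        return ["unmapped"]              # treat as unmapped
    if mode == "all":
        return tied                      # one copy per tied reference
    if mode == "split":
        return [f"AMBIGUOUS_{ref}" for ref in tied]
    raise ValueError(f"unknown mode: {mode}")

read = {"ecoli": 90, "salmonella": 90}   # ties between both references
for mode in ("best", "toss", "all", "split"):
    print(mode, assign_bins(read, mode))
```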

If your OS cannot run bash shellscripts, replace the shellscript name in the commands above with "java -Xmx29g -cp /path/to/current align2.BBSplitter", where /path/to/current is the location of the 'current' directory (a subdirectory of bbmap), and -Xmx29g specifies the amount of memory to use (so this would be the command line for a 32GB computer). The -Xmx value should be set to about 85% of physical memory.
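
A small Python sketch of how that Java command line could be assembled, for anyone scripting it. The helper names suggested_heap_gb and java_command are hypothetical. The class name and flags are the ones given above, and the 85% figure is applied literally, which is an assumption about how strictly "about 85%" should be read.

```python
def suggested_heap_gb(physical_gb, fraction=0.85):
    """Heap size in whole GB, rounded down (literal reading of 'about 85%')."""
    return int(physical_gb * fraction)

def java_command(current_dir, heap_gb, args):
    """Return the argv list that stands in for the shellscript name."""
    return ["java", f"-Xmx{heap_gb}g", "-cp", current_dir,
            "align2.BBSplitter", *args]

args = ["in=reads.fq", "ref=ecoli.fa,salmonella.fa",
        "basename=out_%.fq", "outu=clean.fq", "int=t"]
print(" ".join(java_command("/path/to/current", 29, args)))
print(suggested_heap_gb(32))  # prints 27
```

Taken literally, 85% of 32 GB rounds down to 27g, a little under the 29g used in the example, so treat the percentage as a rough guide and leave enough memory for the OS.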

BBSplit is extremely fast and highly sensitive, using BBMap for the mapping. So, all flags and features supported by BBMap can be used with BBSplit (aside from sam output).

BBSplit is available here:

P.S. Some people have asked why BBSplit has a lower alignment rate than BBMap. That is because it has a lower default sensitivity, as the original intent was to bin reads using known assemblies. The sensitivity can be raised to be equivalent to BBMap with these flags: "minratio=0.56 minhits=1 maxindel=16000"

Last edited by Brian Bushnell; 09-16-2014 at 09:29 AM.