View Single Post
Old 03-13-2015, 02:10 PM   #20
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Well... the easiest way to use BBSplit is to have it split input into one output file per reference file. Once you concatenate them, you lose that ability. It's technically possible to specify sets of named sequences within a file but that way is too complicated. Using a tiered approach is possible (though a pain):

for references a, b, c, d, e, f:

abc=cat a b c
def=cat d e f

bbsplit ref=abc,def

then re-split the abc pile between a, b, and c, and the def pile between d, e, and f. That will basically cut the command line lengths (and open file handles) by a factor of the square root of the number of references.

But, I will add this feature quickly since it's hard to do otherwise.

Note that if all you care about are statistics of which reads mapped to which scaffolds, then if all the ref sequences have unique names, you can just run BBMap on the concatenated file with the "scafstats" output flag. But for actually splitting the files easily, the references currently need to be in individual files.
Brian Bushnell is offline   Reply With Quote