04-15-2014, 09:30 AM   #20
HGV
Junior Member
 
Location: Bremen, Germany

Join Date: Nov 2011
Posts: 4
problems with bbmap switches

Hi Brian,
That is a very decent mapper you wrote, and it is great that it is finally
available.

I have been playing around with bbmap: so many cool features and an
impressive speed! But I could not figure out a couple of things:
1) How to point to directories with path=
I would like to use a set of indices that I created previously and that are
stored in a specific 'databases' path mounted on all nodes of our cluster.
I could not get this to work by setting path= during the mapping run,
because it also changes where bbmap searches for the reads. My second
attempt was to call bbmap from the folder holding the references and set
path= to the folder of the reads, but then I always got an error that the
read file was not found. The only thing that worked for me was calling
bbmap from a folder that also contains the /ref folder, but this means
copying both reads and refs across the filesystem to wherever I need them.
We have mostly been using bowtie2 up to now, and in bowtie2 I can give
absolute paths for references, reads and outputs. It would be cool to be
able to handle files in bbmap similarly.
2) Using outm=
I am mapping shotgun metagenomic Illumina paired-end reads to references
that are gene databases. I was expecting different output for out= and
outm=, but the files produced are identical. I would expect to see some
read pairs where only one of the two reads maps to the gene database. As
far as I understood, out= gives me the mapping pairs, while outm= gives me
the mapping pairs plus the reads that map on their own, together with
their mates.
3) How can I get sorted unmapped pairs written to a file?
While outm1=reads.f.fq and outm2=reads.r.fq give me the mapped pairs,
outu= always writes everything to a single file (no outu2= possible).
4) I was trying to limit the insert sizes allowed in the paired-end mapping
with e.g. pairlen=1000, but the output still reported exactly the same
mappings, with insert sizes far above that limit, often in the multi-kb
range. This also greatly affects the average insert size reported. What am
I doing wrong? It would also be very cool to have the standard deviation
and the median reported. One can calculate these from the very useful
histogram files that inserthistogram=file writes, but that is not as
convenient.
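
For reference, this is roughly how I calculate those values from the
histogram at the moment. It is only a minimal sketch: it assumes the file
has two whitespace-separated columns (insert size, count) and that header
lines start with '#', which may not match the real output exactly. The
function names are my own and not part of bbmap.

```python
import math


def read_histogram(path):
    """Read (insert_size, count) pairs from a two-column text file.

    Assumed layout (not confirmed against bbmap): '#' starts a header or
    comment line, data lines hold an integer size and an integer count.
    """
    rows = []
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            size, count = line.split()[:2]
            rows.append((int(size), int(count)))
    return rows


def histogram_stats(rows):
    """Return (mean, median, standard deviation) of a size histogram."""
    rows = sorted((int(s), int(c)) for s, c in rows if int(c) > 0)
    total = sum(count for _, count in rows)
    if total == 0:
        raise ValueError("histogram is empty")

    mean = sum(size * count for size, count in rows) / total
    # Population variance: every counted pair is one observation.
    variance = sum(count * (size - mean) ** 2 for size, count in rows) / total

    # Median: the smallest size at which the cumulative count reaches
    # half of all observations (the lower median for an even total).
    half = total / 2
    cumulative = 0
    median = rows[-1][0]
    for size, count in rows:
        cumulative += count
        if cumulative >= half:
            median = size
            break

    return mean, median, math.sqrt(variance)
```

Usage would be something like
mean, median, sd = histogram_stats(read_histogram("ihist.txt")), where
ihist.txt stands for whatever file name was given to inserthistogram=.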

Keep up the good work
Harald