
07-25-2018, 04:24 PM   #641
Junior Member
Location: San Francisco, CA

Join Date: Apr 2014
Posts: 1
Add hg19 masked reference to distribution

I'm using BBTools via bioconda and the corresponding docker container. The image has the necessary resources, e.g. the adapters fasta file:

 Wed 25 Jul - 17:10  ~/code/tick-genome/reflow   origin ☊ master 9☀ 1● 
  docker run -it -v $PWD:/data bash
bash-4.2# find . -name adapters.fa
bash-4.2# cd ./usr/local/opt/bbmap-38.06/resources
bash-4.2# ll
bash: ll: command not found
bash-4.2# ls 
adapters.fa                          blacklist_silva_species_500.sketch   lambda.fa.gz                         nextera_LMP_linker.fa.gz             primes.txt.gz                        sequencing_artifacts.fa.gz
adapters_no_transposase.fa.gz        contents.txt                         lfpe.linker.fa.gz                    pJET1.2.fa                           remote_files.txt                     short.fa
blacklist_img_species_300.sketch     crelox.fa.gz                         mtst.fa                              phix174_ill.ref.fa.gz                remote_files_old.txt                 truseq.fa.gz
blacklist_nt_species_1000.sketch     favicon.ico                          nextera.fa.gz                        phix_adapters.fa.gz                  sample1.fq.gz                        truseq_rna.fa.gz
blacklist_refseq_species_250.sketch  kapatags.L40.fa                      nextera_LMP_adapter.fa.gz            polyA.fa.gz                          sample2.fq.gz
However, the script uses a hardcoded path for the masked human genome posted in the RemoveHuman thread.

	local CMD="java -Djava.library.path=$NATIVELIBDIR $EA $z -cp $CP align2.BBMap minratio=0.9 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 path=/global/projectb/sandbox/gaag/bbtools/hg19 pigz unpigz zl=6 qtrim=r trimq=10 untrim idtag usemodulo printunmappedcount usejni ztd=2 kfilter=25 maxsites=1 k=14 $@"
Can the masked genome be included in the distribution?

Thank you!
olgabot
08-07-2018, 07:45 AM   #642
Location: Canada

Join Date: Apr 2013
Posts: 17

Hello Brian,
After running, how can I combine the sequences that share the same ID?
For example, I want to combine the following sequences:
m151006_234406_42219_c100867912550000001823195203031665_s1_p0/110457/57769_70466 id=3_0_part_2_6
m151006_234406_42219_c100867912550000001823195203031665_s1_p0/110457/57769_70466 id=3_0_part_3
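
One possible approach outside BBTools, as a minimal sketch only. It assumes the records are in FASTA format, that the read name is everything before the first space in the header, and that the pieces should simply be joined in the order they appear in the file. None of this is confirmed by the post, and the function names are made up for illustration.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_by_read_name(records):
    """Concatenate sequences whose headers share the same first token.

    Assumption: pieces are joined in file order; the 'id=' suffix is ignored.
    """
    merged = {}
    for header, seq in records:
        name = header.split()[0]
        merged[name] = merged.get(name, "") + seq
    return merged


# Toy input with shortened, hypothetical read names.
example = [
    ">m1/110457/57769_70466 id=3_0_part_2_6", "ACGT",
    ">m1/110457/57769_70466 id=3_0_part_3", "TTGA",
    ">m2/5/0_100 id=4_0_part_1", "GGCC",
]
merged = merge_by_read_name(parse_fasta(example))
for name, seq in merged.items():
    print(f">{name}\n{seq}")
```

If the pieces need to be ordered by the part number rather than by file position, the `id=` field would have to be parsed as well; its exact meaning is not stated in the post.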

sunnycqcn
08-09-2018, 05:56 AM   #643
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
pull out sequences with matching primers

Hi Brian,
I was wondering if bbmap has a tool that will pull out reads matching particular primer sequences? I have fastq files with amplicons from 12 different primers in the same file, so I want to make subsets of the reads that have specific primers of interest.

I have used your tool for other tasks, so I figured I would ask whether it also has this capability.

Thank you,
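
To make the request concrete, here is a rough sketch of the idea in plain Python, not a BBTools feature. It assumes the primer sits at the very start of each read and allows a small number of substitutions; the primer names and sequences below are hypothetical.

```python
def hamming_prefix_match(seq, primer, max_mismatches=1):
    """True if seq starts with primer, allowing up to max_mismatches substitutions."""
    if len(seq) < len(primer):
        return False
    mismatches = sum(1 for a, b in zip(seq, primer) if a != b)
    return mismatches <= max_mismatches


def read_fastq(lines):
    """Yield (name, seq, qual) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for name in it:
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield name.rstrip("\n"), seq, qual


def split_by_primer(records, primers, max_mismatches=1):
    """Bin reads by the first primer (name -> sequence) matching the read start."""
    bins = {name: [] for name in primers}
    bins["unassigned"] = []
    for rec in records:
        for pname, pseq in primers.items():
            if hamming_prefix_match(rec[1], pseq, max_mismatches):
                bins[pname].append(rec)
                break
        else:
            bins["unassigned"].append(rec)
    return bins


# Hypothetical primers and reads, for illustration only.
primers = {"P1": "ACGTAC", "P2": "TTGGCA"}
fastq = [
    "@r1", "ACGTACGGGG", "+", "IIIIIIIIII",
    "@r2", "TTGGCTAAAA", "+", "IIIIIIIIII",
    "@r3", "GGGGGGGGGG", "+", "IIIIIIIIII",
]
bins = split_by_primer(read_fastq(fastq), primers)
```

Each bin can then be written to its own file. Reads where the primer is offset from the start, or present only as a reverse complement, would need extra handling that this sketch leaves out.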
JenBarb
08-09-2018, 06:08 AM   #644
Senior Member
Location: Bethesda MD

Join Date: Oct 2009
Posts: 498

@JenBarb see this thread in Biostars.
HESmith
08-09-2018, 07:09 AM   #645
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47

Thank you! Love the tool!
JenBarb
08-14-2018, 09:05 PM   #646
Location: Japan

Join Date: Sep 2017
Posts: 24

Hoping somebody can help me with this.

I used BBMap and now I would like to extract the reads from my .bam file that are split (chimeric?), i.e. reads that indicate a deletion.

I tried to use samblaster, but it doesn't recognize any reads as split...
(samtools view -h in.bam | samblaster -a -s split.sam -o /dev/null)
Are the split reads marked differently in BBMap compared to other aligners, causing samblaster to fail?

IGV shows a good number of reads with deletions, and I can also call deletions using BBTools, so I know they are in there. I just have a feeling callvariants is calling fewer deletions, and with lower coverage, than what IGV suggests, so I want to check up on it.
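
For checking by hand: in the SAM format, a deletion reported within a single alignment shows up as a D operation in the CIGAR string (column 6), which is a different representation from a read split across several alignment records. The following is a minimal sketch for pulling out such reads from SAM text; the function names are invented here, and it says nothing about how BBMap or samblaster mark reads.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def deletion_lengths(cigar):
    """Return the lengths of all D (deletion from reference) operations."""
    return [int(n) for n, op in CIGAR_OP.findall(cigar) if op == "D"]


def reads_with_deletions(sam_lines, min_len=1):
    """Yield SAM alignment lines whose CIGAR has a deletion of >= min_len bases."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete alignment record
        if any(d >= min_len for d in deletion_lengths(fields[5])):
            yield line


# Toy SAM records (hypothetical reads, only the first columns filled in).
sam = [
    "@HD\tVN:1.4",
    "r1\t0\tchr1\t100\t40\t50M3D47M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t200\t40\t100M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t300\t40\t30M2I20M10D40M\t*\t0\t0\t*\t*",
]
hits = list(reads_with_deletions(sam, min_len=5))
```

In practice the input could come from `samtools view in.bam` piped to a script that reads `sys.stdin`. Counting the selected reads per position would give something to compare against what IGV displays.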
Meyana
08-15-2018, 09:45 AM   #647
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
mkf argument in (bbmap tool)

I am trying to use the flag mkf (minkmerfraction), but I get an error saying that the argument does not exist.
sh /data/barbj/bbmap/ in=./../Stool_001-01.fastq outm=v2fstoolfq.fa literal=CTCAAACTTGGGTAATTAAACC k=17 mkf=0.8
java -Djava.library.path=/data/barbj/bbmap/jni/ -ea -Xmx39767m -Xms39767m -cp /data/barbj/bbmap/current/ jgi.BBDukF in=./../Stool_001-01.fastq outm=v2fstoolfq.fa literal=CTCAAACTTGGGTAATTAAACC k=17 mkf=0.8
Executing jgi.BBDukF [in=./../Stool_001-01.fastq, outm=v2fstoolfq.fa, literal=CTCAAACTTGGGTAATTAAACC, k=17, mkf=0.8]

Exception in thread "main" java.lang.RuntimeException: Unknown parameter mkf=0.8
at jgi.BBDukF.<init>(
Any ideas why this is not working?
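
For readers unfamiliar with the idea behind a "minimum k-mer fraction" threshold, here is a conceptual sketch. It assumes the threshold means the fraction of a read's k-mers that must be found in the reference; it is not BBDuk's implementation, it ignores reverse complements, and it does not explain the error above.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_hit_fraction(read, ref_kmers, k):
    """Fraction of the read's k-mers that are present in ref_kmers."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    hits = sum(1 for km in read_kmers if km in ref_kmers)
    return hits / len(read_kmers)


def passes_min_fraction(read, ref_kmers, k, min_fraction):
    """True if at least min_fraction of the read's k-mers hit the reference."""
    return kmer_hit_fraction(read, ref_kmers, k) >= min_fraction


# The literal and k from the command above; the read itself is made up.
k = 17
literal = "CTCAAACTTGGGTAATTAAACC"
ref_kmers = set(kmers(literal, k))
read = literal + "ACGTACGT"  # primer followed by 8 unrelated bases
fraction = kmer_hit_fraction(read, ref_kmers, k)
```

Under this reading, a read that is much longer than the reference sequence has many k-mers that cannot match, so its hit fraction falls as the read gets longer.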

JenBarb
08-23-2018, 06:44 AM   #648
Junior Member
Location: Europe

Join Date: Oct 2016
Posts: 2
bbmap aborts after mapping some reads

Hello Brian,

We are using bbmap to see how far it is possible to quantify gene expression by mapping Illumina RNA-seq reads to the genome of a closely related species, e.g. mapping chimpanzee reads to human or, as in this example, macaque reads.

To this end, we generated macaque Illumina SE reads using flux-simulator and mapped them to
hg38; for comparison we also tried Mmul8, downloaded from Ensembl (wget

Everything mapped fine to hg38, but not to Mmul8.

Exception in thread "Thread-12" java.lang.AssertionError
at align2.BBIndex.extendScore(
at align2.BBIndex.slowWalk3(
at align2.BBIndex.find(
at align2.BBIndex.find(
at align2.BBIndex.findAdvanced(
at align2.AbstractMapThread.quickMap(
at align2.BBMapThread.processRead(

I tried to run on one thread, increased memory to 101G, removed small contigs of <100kb ... but the error message remains the same.

We are running a Debian system with java version "1.8.0_181" and have BBMap version 38.02 -- the detailed error output is in the attached file.

The false mapping rates of bbmap are so much better than those of STAR and GSNAP that we definitely want to use bbmap for our paper. We are nearly done with all the other species (marmoset, gorilla, chimpanzee and orangutan), and the simulations ran through; the only missing piece is the mapping to Mmul8.

Any help would be greatly appreciated.

Best, Ines
Attached Files
File Type: txt Mmul1.701837.txt (4.0 KB, 0 views)
ellybelly
09-07-2018, 09:24 AM   #649
Location: BC

Join Date: Aug 2010
Posts: 18
bbmap for demultiplexing dual barcodes.

I need to use dual indexes, if possible.

For example (dual barcode in bold):

#R1 read

#R2 read

Here are 16 possible in the file I am working on.

The first four nt of each read are the barcode; in our example above that would be:

But you would need both reads to tell you that it's GACT-CTGA and not something else.
What would the command look like for this? Does this demux script do the dual barcoding?
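
To illustrate the logic being asked for (not an existing BBTools command), here is a small sketch. It assumes the barcode is the first four bases of each mate and that a pair is assigned only when the combined R1-R2 key is one of the expected combinations; the second barcode pair and the read names are hypothetical.

```python
def barcode_pair(r1_seq, r2_seq, length=4):
    """Build the dual-barcode key from the first `length` bases of each mate."""
    return f"{r1_seq[:length]}-{r2_seq[:length]}"


def demultiplex(pairs, known_pairs, length=4):
    """Bin read pairs by dual barcode.

    pairs: iterable of (name, r1_seq, r2_seq)
    known_pairs: expected keys such as "GACT-CTGA"
    """
    bins = {bc: [] for bc in known_pairs}
    bins["undetermined"] = []
    for name, r1, r2 in pairs:
        key = barcode_pair(r1, r2, length)
        bins[key if key in known_pairs else "undetermined"].append(name)
    return bins


# GACT-CTGA is the pair named in the post; the rest is made up.
known = ["GACT-CTGA", "GACT-TTAG"]
pairs = [
    ("p1", "GACTAAAA", "CTGACCCC"),
    ("p2", "GACTAAAA", "TTAGCCCC"),
    ("p3", "NNNNAAAA", "CTGACCCC"),
]
bins = demultiplex(pairs, known)
```

Exact matching is used here for simplicity; tolerating a mismatch in either barcode would need a distance check against every expected pair.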
raw937
09-25-2018, 08:01 AM   #650
Junior Member
Location: Europe

Join Date: Sep 2018
Posts: 2
ref input for BBMap and paired ends

I am sorry if this question is very basic, but I am getting a low percentage of reads mapping to the reference genome: only about 36% of reads mapped. Any clue why this is the case?

I am using a scaffold-level genome as the reference, and paired-end reads...
juanita
09-25-2018, 09:47 AM   #651
Registered Vendor
Location: Eugene, OR

Join Date: May 2013
Posts: 451

Originally Posted by juanita
I am sorry if this question is very basic, but I am getting a low percentage of reads mapping to the reference genome: only about 36% of reads mapped. Any clue why this is the case?

I am using a scaffold-level genome as the reference, and paired-end reads...
Have you trimmed adapters away from the reads? (Short fragments will create reads that are part genomic and part adapter, and these may not map.) You could use the related BBMap tool sendsketch to get a sense of what is in your reads (after trimming). When we genotype samples, many of them have contamination, and using sendsketch can help figure out what is in there. You can input the entire fastq file with sendsketch, or go to read mode and get a result on a per-read basis.

You can also grab 100 reads, turn them into fasta format and do blastn with them (if online use the blastn rather than megablast option) and see read by read what is in there.
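
The "grab 100 reads and turn them into fasta" step could look like the sketch below. It assumes plain 4-line FASTQ records and takes the first 100 reads rather than a random sample; the function name is made up for illustration.

```python
from itertools import islice


def fastq_to_fasta(fastq_lines, max_reads=100):
    """Return FASTA text for the first max_reads records of a 4-line FASTQ stream."""
    out = []
    lines = iter(fastq_lines)
    for _ in range(max_reads):
        record = list(islice(lines, 4))
        if len(record) < 4:
            break  # end of input or truncated record
        name, seq = record[0].strip(), record[1].strip()
        out.append(">" + name[1:])  # drop the leading '@'
        out.append(seq)
    return "\n".join(out) + ("\n" if out else "")


# Toy input; with a real file, pass the open file handle instead.
fq = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "GGCC\n", "+\n", "IIII\n"]
fasta_text = fastq_to_fasta(fq)
```

The resulting text can be pasted into a blastn form or saved to a file for a local search. Reads near the start of a file are not always representative, so sampling from deeper in the file may be worth the extra step.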

Other possibilities: your sample is not closely related to the reference; the reference may be incomplete and missing regions; or the reference lacks high-copy content such as mtDNA or chloroplast sequence, and many reads come from those.
Providing nextRAD genotyping and PacBio sequencing services.
SNPsaurus
10-11-2018, 07:36 AM   #652
Junior Member
Location: Denver, CO

Join Date: Mar 2009
Posts: 1
usejni and compiled C code in BBTools

I just installed the latest version of BBTools (38.26), and I notice that the C code enabled by the usejni=t flag for some tools has been deprecated / disabled.

I found this in the changelog:
Removed JNI path flag from BBMerge, BBMap, and RQCFilter shell scripts.
and this in docs/compiling.txt:
3) C code. This was developed by Jonathan Rood to accelerate BBMap, BBMerge, and Dedupe, but is currently disabled.
Sure enough, it is commented out in the code:
        #local CMD="java -Djava.library.path=$NATIVELIBDIR $EA $z -cp $CP align2.BBMap build=1 overwrite=true fastareadlen=500 $@"
        local CMD="java $EA $z -cp $CP align2.BBMap build=1 overwrite=true fastareadlen=500 $@"
If I revert to the previous version of the CMD, with the java.library.path set, then the command runs with the compiled C code just fine.

Why was this disabled? Does this affect previous analyses that used this C code? That is, does the C code contain an error that means usejni=t in previous versions will produce different output than the java-only code? Or was this purely a performance or compatibility issue, or something else?

Sorry if I've missed this already posted somewhere, and thanks in advance for any help.

csmiller

bbmap, metagenomics, rna-seq aligners, short read alignment
