SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
BBMap (aligner for DNA/RNAseq) is now open-source and available for download. Brian Bushnell Bioinformatics 674 06-03-2019 12:46 AM
BBMap for BitSeq dietmar13 Bioinformatics 1 04-30-2015 08:40 AM
Please help my BBMap cannot remove Illumina adapter TofuKaj Bioinformatics 4 04-28-2015 08:53 AM
BBMap Error Phage Hunter Bioinformatics 5 01-14-2015 04:34 AM
Introducing BBMap, a new short-read aligner for DNA and RNA Brian Bushnell Bioinformatics 24 07-07-2014 09:37 AM

Reply
 
Thread Tools
Old 01-22-2018, 12:57 PM   #201
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438
Default

Hello-

I'm trying to understand this example in the original post:

Code:
bbmap.sh in=reads.fq outu=unmapped.fq int=f
repair.sh in=unmapped.fq out=paired.fq fint outs=singletons.fq
It is provided as an example of how to deal with the non-deterministic behavior of bbmap when aligning paired-end reads. However what is described (map reads as single-end and then re-pair them using repair.sh) doesn't seem to be what these code lines show. If the goal is to get bbmap to produce deterministic paired-end alignments wouldn't the final outcome of the method be an alignment file and not just a FASTQ? Why would I be using the un-aligned reads as part of the final product?
__________________
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
sdriscoll is offline   Reply With Quote
Old 01-23-2018, 12:46 AM   #202
milan.molbio
Junior Member
 
Location: Switzerland

Join Date: Jan 2018
Posts: 3
Default

Hi guys,

I'm trying to get bbmap to find all hits up to some identity threshold for a 20bp read, but it only finds the perfect hit. This should be possible right? So I must be doing something wrong. Here's the test scenario:

Code:
# file: sample.genome.fa
> 10 GRCz10
TTTCCTGAGGATGTGAGATTACTTCCGGGTTTTACAACAACATGGCAGCGCCCTTAGGTCTGCATCTGGAGTTTGGGTTTGTATTTATTGTTTCCTTTATCATTAATCTATTTTTTTTTTCTTTCTCGTGTTTTTTAAGAACATGGCAGTTTTGTTTTACATCAACTTGGCAGTAAT

# file sample.reads.fa:
> r1
GTTTTACAACAACATGGCAG
GTTTTTTAAGAACATGGCAG

# run.bash
bbmap.sh ref=sample.genome.fa k=3
bbmap.sh minid=0.7 k=3 maxindel=0 strictmaxindel  ref=sample.genome.fa ambiguous=all  in=sample.read.fa out=mapped.sam vslow

# results: mapped.sam
mapped.sam
@HD	VN:1.4	SO:unsorted
@SQ	SN: 10 GRCz10	LN:177
@PG	ID:BBMap	PN:BBMap	VN:37.85	CL:java -Djava.library.path=/Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/jni/ -ea -Xmx3200m align2.BBMap build=1 overwrite=true fastareadlen=500 minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow
 r1	0	 10 GRCz10	29	40	20=	*	0	0	GTTTTACAACAACATGGCAG	*	NM:i:0	AM:i:40	NH:i:1
expected output:
  1. 1 perfect hit (position 28)
  2. 1 hit with 2 substitutions (position 153, 8:T>A,13:T>A - GTTTTACATCAACTTGGCAG)
  3. 1 hit with 3 substitutions (position 129, 5:T>A,6:T>C,9:G>C - GTTTTTTAAGAACATGGCAG)


Code:
# stdout:

Max memory cannot be determined.  Attempting to use 3200 MB.
If this fails, please add the -Xmx flag (e.g. -Xmx24g) to your command,
or run this program qsubbed or from a qlogin session on Genepool, or set ulimit to an appropriate value.
java -Djava.library.path=/Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/jni/ -ea -Xmx3200m -cp /Users/milan/Projects/geneassassin/code/ga-pipeline/bin/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 minid=0.7 k=3 maxindel=0 strictmaxindel ref=sample.genome.fa ambiguous=all in=sample.read.fa out=mapped.sam vslow
Executing align2.BBMap [tipsearch=150, minhits=1, minratio=0.25, rescuemismatches=50, rescuedist=3000, build=1, overwrite=true, fastareadlen=500, minid=0.7, k=3, maxindel=0, strictmaxindel, ref=sample.genome.fa, ambiguous=all, in=sample.read.fa, out=mapped.sam]
Version 37.85 [tipsearch=150, minhits=1, minratio=0.25, rescuemismatches=50, rescuedist=3000, build=1, overwrite=true, fastareadlen=500, minid=0.7, k=3, maxindel=0, strictmaxindel, ref=sample.genome.fa, ambiguous=all, in=sample.read.fa, out=mapped.sam]

Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.250
Retaining all best sites for ambiguous mappings.
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.449
NOTE:	Ignoring reference file because it already appears to have been processed.
NOTE:	If you wish to regenerate the index, please manually delete ref/genome/1/summary.txt
Set genome to 1

Loaded Reference:	0.075 seconds.
Loading index for chunk 1-1, build 1
Generated Index:	0.004 seconds.
Analyzed Index:   	0.002 seconds.
Started output stream:	0.021 seconds.
Cleared Memory:    	0.116 seconds.
Processing reads in single-ended mode.
Started read stream.
Started 8 mapping threads.
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7

   ------------------   Results   ------------------

Genome:                	1
Key Length:            	3
Max Indel:             	0
Minimum Score Ratio:  	0.4486
Mapping Mode:         	normal
Reads Used:           	1	(20 bases)

Mapping:          	0.208 seconds.
Reads/sec:       	4.81
kBases/sec:      	0.10


Read 1 data:      	pct reads	num reads 	pct bases	   num bases

mapped:          	100.0000% 	        1 	100.0000% 	          20
unambiguous:     	100.0000% 	        1 	100.0000% 	          20
ambiguous:       	  0.0000% 	        0 	  0.0000% 	           0
low-Q discards:  	  0.0000% 	        0 	  0.0000% 	           0

perfect best site:	100.0000% 	        1 	100.0000% 	          20
semiperfect site:	100.0000% 	        1 	100.0000% 	          20

Match Rate:      	      NA 	       NA 	100.0000% 	          20
Error Rate:      	  0.0000% 	        0 	  0.0000% 	           0
Sub Rate:        	  0.0000% 	        0 	  0.0000% 	           0
Del Rate:        	  0.0000% 	        0 	  0.0000% 	           0
Ins Rate:        	  0.0000% 	        0 	  0.0000% 	           0
N Rate:          	  0.0000% 	        0 	  0.0000% 	           0

Total time:     	0.571 seconds.
milan.molbio is offline   Reply With Quote
Old 01-23-2018, 11:46 AM   #203
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438
Default

@milan.molbio-

bbmap looks for the best alignment and reports only that one by default. If there are multiple "best" then you can get them all by setting 'ambig=all'.

What you're looking for should be possible by enabling 'secondary=t' and then setting both 'sssr' and 'maxsites' to your liking. "secondary" alignments are those with lower mapping scores relative to the best alignment. "sssr" controls the ratio of the best score that is acceptable (default is 0.95 to allow secondary alignments with mapping scores as low as 95% of the best score). Finally "maxsites" sets a cap on the number of secondary alignment sites to report. You can probably just set that to a very high value to be sure you have them "all" - or at least all that bbmap can find.
__________________
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
sdriscoll is offline   Reply With Quote
Old 01-24-2018, 05:03 AM   #204
milan.molbio
Junior Member
 
Location: Switzerland

Join Date: Jan 2018
Posts: 3
Default

@sdriscoll thanks for the tip! with these two options bbmap does report one 100% semiperfect site read but it's not in the output file. Anything else I'm missing?

the exact command was:
Code:
bbmap.sh secondary=t sssr=0.80  k=3 maxindel=0 strictmaxindel  ref=sample.genome.fa ambiguous=all   maxsites=10  in=sample.read.fa out=mapped.sam vslow
milan.molbio is offline   Reply With Quote
Old 01-24-2018, 05:41 AM   #205
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

I am not sure why BBMap is not reporting the two sites in the output SAM. Perhaps that is by design.

Unfortunately @Brian has been missing from online forums for last couple of months. I pinged him again yesterday with ref to this question. We will need him to chime in on these questions.
GenoMax is offline   Reply With Quote
Old 01-24-2018, 11:31 AM   #206
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438
Default

Yeah I've come to terms with the fact that BBMap wasn't really designed for this type of use. When I want to work with an alignment strategy where the goal is to find "all" alignments of a read up to some maximum error threshold I turn to bowtie (1 or 2) or BWA.
__________________
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
sdriscoll is offline   Reply With Quote
Old 01-25-2018, 12:13 AM   #207
milan.molbio
Junior Member
 
Location: Switzerland

Join Date: Jan 2018
Posts: 3
Default

Quote:
Originally Posted by sdriscoll View Post
Yeah I've come to terms with the fact that BBMap wasn't really designed for this type of use. When I want to work with an alignment strategy where the goal is to find "all" alignments of a read up to some maximum error threshold I turn to bowtie (1 or 2) or BWA.
i did give them both a try. bowtie seems to reliably find and report them but it's got a limit of maximum 3 mismatches (and I need 4 bwa is also struggling to find all the matches, and i've filed an issue: https://github.com/lh3/bwa/issues/176
milan.molbio is offline   Reply With Quote
Old 03-07-2018, 02:35 PM   #208
Alex Lee
Junior Member
 
Location: N. Cal

Join Date: Apr 2014
Posts: 10
Default

Hi Brian, I have a problem whereby I have a about 12 fastq files and I'm not completely convinced that the barcode for each of the samples are as named in the key. If I have 12 unique 6-mer barcodes eg GTCCGC, etc is there a way to definitively say which samples have the following barcode with bbmerge? thanks!
Alex Lee is offline   Reply With Quote
Old 03-08-2018, 03:20 AM   #209
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

I assume you are referring to a standard barcode/index from Illumina tech. If so those are never part of actual read. If they are present in the fastq header then you could use "demuxbyname.sh" to separate the reads you are interested in.

If your index is "inline" (and at a specific location in the read e.g. beginning) then you could use bbduk.sh to match/separate those reads.

I am not sure bbmerge.sh is going to help in your case.
GenoMax is offline   Reply With Quote
Old 03-21-2018, 03:18 PM   #210
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by Alex Lee View Post
Hi Brian, I have a problem whereby I have a about 12 fastq files and I'm not completely convinced that the barcode for each of the samples are as named in the key. If I have 12 unique 6-mer barcodes eg GTCCGC, etc is there a way to definitively say which samples have the following barcode with bbmerge? thanks!
BBMerge might be able to help in this case, if you have paired reads with a sufficient number of short inserts. You can run it like this:

Code:
bbmerge.sh in1=read1.fq in2=read2.fq outa=adapters.fa
This will indicate the adapter sequence for the library, based on read-through from short-insert reads. The barcode is typically part of the adapter, so if you compare the adapters from 2 multiplexed libraries, the location where they differ is the barcode region.
Brian Bushnell is offline   Reply With Quote
Old 03-21-2018, 03:21 PM   #211
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by milan.molbio View Post
Hi guys,

I'm trying to get bbmap to find all hits up to some identity threshold for a 20bp read, but it only finds the perfect hit. This should be possible right? So I must be doing something wrong.
While BBMap is not originally designed for this purpose; I made a version that does a much better job at finding all mappings above some identity threshold, bbmapskimmer.sh. The usage is the same as BBMap; just add the ambig=all flag.
Brian Bushnell is offline   Reply With Quote
Old 03-21-2018, 04:59 PM   #212
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

Quote:
Originally Posted by Brian Bushnell View Post
While BBMap is not originally designed for this purpose; I made a version that does a much better job at finding all mappings above some identity threshold, bbmapskimmer.sh. The usage is the same as BBMap; just add the ambig=all flag.
Does this mean that bbmap functions identically but does not report all the alignments in the output file but includes them in the statistics for alignment?
GenoMax is offline   Reply With Quote
Old 06-01-2018, 12:14 PM   #213
darthsequencer
Member
 
Location: california

Join Date: Feb 2012
Posts: 35
Default

Is there a multi-sample pileup? I have a bunch of samples mapped to the same reference. It would be awesome to have that as a table with the coverage for each sample per reference sequence.
darthsequencer is offline   Reply With Quote
Old 10-16-2018, 11:06 PM   #214
SDH
Junior Member
 
Location: Western Australia

Join Date: Sep 2016
Posts: 5
Default

Hello

My sequencing output largely consists of Illumina (NextSeq and Miseq) NON-overlapping paired end reads (the occasional reads will overlap, as the DNA is not all fragmented evenly). For ease of storage and to simplify input data at the start of a workflow, I would like to combine the files.

If I understand correctly, bbmerge is specifically for merging overlapping reads? In which case I am best off simply using the bbmap reformat.sh to combine the reads in to one interleaved file?

Code:
reformat.sh in1=read1.fq in2=read2.fq out=reads.fq

Last edited by SDH; 10-16-2018 at 11:14 PM. Reason: typo.
SDH is offline   Reply With Quote
Old 02-21-2019, 01:06 PM   #215
Grammatonotus
Junior Member
 
Location: California

Join Date: Aug 2016
Posts: 4
Default shredding and mapping long reads to reference error

Hello,

I have a reference and I have an additional genome sequenced via 10x that contains 150k contigs from 1k to 7.5Mb. I am attempting to align the second to the first with BBMap by using the flag maxlen=2000 as there are extensive rearrangements. This morning I found this error message. Can anyone tell me what might have happened? Thanks.
Code:
'BBMap failed with error code 1
Executing align2.BBMapPacBio [maxlen=2000, maxindel=4000, ambiguous=random, k=15, saa=f, maxlen=6000, minratio=0.40, minscaf=100, startpad=10000, stoppad=10000, midpad=6000, ref=ref.fasta, out=bbmap.sam, in=reads1.fasta]

BBMap version 37.64
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.400
Choosing a site randomly for ambiguous mappings.
Writing reference.
Executing dna.FastaToChromArrays2 [ref.fasta, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=100, midpad=6000, startpad=10000, stoppad=10000, nodisk=false]

Set genScaffoldInfo=true
Writing chunk 1
Writing chunk 2
Writing chunk 3
Writing chunk 4
Writing chunk 5
Writing chunk 6
Writing chunk 7
Set genome to 1

Loaded Reference: 0.007 seconds.
Loading index for chunk 1-7, build 1
No index available; generating from reference genome: /Users/wilfordbrimley/Geneious Prime/temp/17/ref/index/1/chr1-3_index_k15_c2_b1.block
No index available; generating from reference genome: /Users/wilfordbrimley/Geneious Prime/temp/17/ref/index/1/chr4-7_index_k15_c2_b1.block
Indexing threads started for block 0-3
Indexing threads started for block 4-7
Indexing threads finished for block 0-3
Indexing threads finished for block 4-7
Generated Index: 249.271 seconds.
Analyzed Index: 71.551 seconds.
Started output stream: 0.096 seconds.
Cleared Memory: 0.195 seconds.
Processing reads in single-ended mode.
Started read stream.
Started 24 mapping threads.
Exception in thread "Thread-29" java.lang.AssertionError: -30, -30
89764, 2,0,149134394,149142098,0,00,971,475770,460506,0,389847,149134394~149139756~149140058~149142098,mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmSmmmmmmmmmmmSmmmSmmmmmmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmSmmmmmmmmmmSmmmmmmmmmmmmmmmSmmmmmmmmmmmmmmmmmmmmmmmmmmSmmmSmmmmmmmmmmmmmmmSm
...
GCATTCAGACAAGGGTAGCAATATCAATCAACTCTATCTGCTTACAGGAATTGAGCTAGCATAAAGTGCTAAACTCATACTATCAGGCATAACCTAGTCAGGCTATCCTACTTACTGTCTACTCAATATAGGGGTGAGCATTTGGACCGGTGGACCGGACCGGACCGGTGAACCGGACCGGACCGAACCGGATCGGACCGTTTTTTTGGACCCCCGGGCCGGTTCTCGGTTCTAAGATTTCTCGCCGGCCCGGTCTGGTTTTTTTCCCGGTCCGGTCCGGTTTTTACCGGGTTTAGAACCGGTTGGACATGATTAAGATTTTTGTGACGTTTTTTAAACGCACATAACTCATTTGTTTTAAGTTGGATTGACCTGCGGTTTTTCCAACATTTAGTTTACAATTTGTAGATGAATTTAGACTATAATTTTGTTTATTTTGGTATTATATTCACCGTGTTACAATCCTTCAAAGTAGACCTAACAAAGCTGAAAAAATTCAGTTTTTCAGCAACATTTTGTAAGCTGTGACGTTTTTTAAACATCCATAACTCATTCGTTTTAATTCGGATTGACCTCCGATTTTTTCCAACATTTAGTTTACAGTTTGTAGATGAGTTTAGACTATAATTTTGTTTATTTTGGTATTATATTCACTGAGCTACAATCCTTTAAAGTAGAGCTATCAAAGTTGAAAAAATTTAGCTTTTTAGCAAAATTTTATAATCTGTGACGTTTTTTAAACACCCATAACTCATTCGTTTTAAGTTGGATTGACCTACGGTTTTTTCCAACATTTAGTTTACAATTTGTAGATGAATTTAGACTATAATTTTGTTTATTTTGGTATTATATGCATTGAGTTACAATCCTTCAAAGTAGAGCTATCAAAGCTGAAATATTTCAGCTTTTTAGTATTATTTTATACTATAATCCCGAGTCCGCGTTTAGCACTTGTGGTAGAGTTGTAAGTGAGTATCAAACTAGTTTATCGACATTGATAGTTGAAGCTTTGTTATGCACCCAAGATTGGGTGATAAAGTTTACAAATCCGATAGTTGACAACGTTGGTGACATCTTGAACGATGATGATATAACTTTGGGTAAAAAAATTACAATCTTTATTCTTTTTTTATACAATTATTACTTTTTATTAAATACTAACAACATTTTCATTTGTGTTTATGTAGAGATTGCACAAGCTTTAAATATGTTACATATGGATGACAATGATATGGGGAAGAGGCCAATAGAAGATTAGATTATGAGTTTATGAAATTTGTTAGTTTGATTGATTTTAAAGTTTAAACTATGATTTTTGTTGTTACGGTAATACTTTTGTATGTTTGTCTTAAAATTGTTTAACTTTTGTTGGTGAACTTGATTTATATGTTAGTTGCTTTTATTTGGAAAATTAATTTTTGCAAATTGCAAGTTATAATATTATACATCAATGTAGATTAGTTAGGCATGAAACGGAAACATATTAATTTTGCAAGTTATAATATTATACATCAATGTAGATTACTTATTATACTCTATGCAATTAAAGTTGTACCGGTCCAAACGAGTCCAAAAACCGGAAAACCCGGACCTAGACCCGGACACGGACACGGACACGGACACGACTAATCCGGTCCGGTCCATGGTCCACAAAATTCTCCATAGCCGGTCCGGTCCGATTCGGTCCGGGCCTGTCCAAATGCTCACCCCTAACTTAATATCTAGCATGCAATTCTTATACAACTGAAACACATAAAGCAGGCACATAAGGCATCACTCCTAGATCCTTAGTCCTATTCTAGCATGCTGTTCTACAAAAGCTGATAAACATAACATAAACTTGTAAGGGTACTTTGGGGAATACTTACTTGAGCTCGGCCGGTCGCGCACATCACACACTTCGTTCTTTCTCGAAACTCTTTTCTTAGTTTTTTAGAAAACATTTTCTTTTTAGAAAATCTTTTAATCCCTTGATTTGAGTTCAGACACCCCCGAAGGTGTGTCCGAATCCCTCAAACCAGGGCTCTGATACCAACTTGTAACGACCCAAATTTCACGTTCAAAAATTTCGTTTTAAAATGTTACTTTAGAAAACATTAATAATTAAAACATTGTTTGATCAAACCATAACACAACGTAAACCATGTCTAAAAGCCAAATCACACAACCAATCAAACATCAGAGTATAAAACCCAAAGAATCTCG
at align2.MultiStateAligner9PacBio.makeGref(MultiStateAligner9PacBio.java:1388)
at align2.MultiStateAligner9PacBio.fillLimited(MultiStateAligner9PacBio.java:96)
at align2.TranslateColorspaceRead.realign_new(TranslateColorspaceRead.java:565)
at align2.TranslateColorspaceRead.realign_new(TranslateColorspaceRead.java:641)
at align2.AbstractMapThread.genMatchStringForSite(AbstractMapThread.java:1006)
at align2.AbstractMapThread.genMatchString(AbstractMapThread.java:902)
at align2.BBMapThreadPacBio.processRead(BBMapThreadPacBio.java:563)
at align2.AbstractMapThread.run(AbstractMapThread.java:502)
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23

**************************************************************************
Warning! 2 mapping threads did not terminate normally.
Check the error log; the output may be corrupt or incomplete.
Please submit the full stderr output as a bug report, not just this message.
**************************************************************************




------------------ Results ------------------

Genome: 1
Key Length: 15
Max Indel: 4000
Minimum Score Ratio: 0.4
Mapping Mode: normal
Reads Used: 581616 (2969452913 bases)

Mapping: 177826.639 seconds.
Reads/sec: 3.27
kBases/sec: 16.70


Read 1 data: pct reads num reads pct bases num bases

mapped: 81.2921% 472808 78.5245% 2331747518
unambiguous: 76.0459% 442295 75.7328% 2248849209
ambiguous: 5.2462% 30513 2.7917% 82898309
low-Q discards: 12.7684% 74263 15.0015% 445463533

perfect best site: 4.9337% 28695 4.2346% 125743838
semiperfect site: 5.0009% 29086 4.2700% 126794871

Match Rate: NA NA 84.9205% 2173341879
Error Rate: 87.8894% 442044 13.2182% 338289025
Sub Rate: 84.8676% 426846 1.8940% 48472068
Del Rate: 67.4695% 339341 8.8901% 227520440
Ins Rate: 68.5198% 344624 2.4342% 62296517
N Rate: 30.3789% 152792 1.8614% 47637054
Exception in thread "main" java.lang.AssertionError:
The number of reads out does not add up to the number of reads in.
This may indicate that a mapping thread crashed.
If you submit a bug report, include the entire console output, not just this error message.
238640+234168+0+34543+74263 = 581614 != 581616
at align2.AbstractMapper.printOutputStats(AbstractMapper.java:1892)
at align2.AbstractMapper.printOutput(AbstractMapper.java:1022)
at align2.BBMapPacBio.testSpeed(BBMapPacBio.java:462)
at align2.BBMapPacBio.main(BBMapPacBio.java:37)
'

Last edited by Grammatonotus; 02-22-2019 at 08:10 PM.
Grammatonotus is offline   Reply With Quote
Old 02-21-2019, 01:40 PM   #216
Grammatonotus
Junior Member
 
Location: California

Join Date: Aug 2016
Posts: 4
Default

A little bit more information: the DNA sequence in the error message maps to the middle of a contig about 2.8 Mb, starts 402,000 bases in. I tried running this both locally and on a cluster and got the same error message with almost exactly the same sequence in each-- 2045 bases in one, 2205 bases in the other, with ~99% overlap. Any insights appreciated.
Grammatonotus is offline   Reply With Quote
Old 02-21-2019, 02:45 PM   #217
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

@grammtonotus: How much memory are you allocating for this job? With that large sized fragments you are likely going to need plenty.

I also suggest that you look into using `minimap2` from Heng Li as an alternate option to BBMap.
GenoMax is offline   Reply With Quote
Old 02-21-2019, 03:47 PM   #218
Grammatonotus
Junior Member
 
Location: California

Join Date: Aug 2016
Posts: 4
Default

When I run it locally about 52 Gb (I ran out of system memory when I had it set to 58), on the cluster 439 Gb. I hope that's not the problem!
Grammatonotus is offline   Reply With Quote
Old 02-21-2019, 05:26 PM   #219
Grammatonotus
Junior Member
 
Location: California

Join Date: Aug 2016
Posts: 4
Default

a colleague suggests that the mapper may have come to a read that hits too many places in the reference. I have it set to place this at Random, would changing to First Best, None, or All be likely to give a different result?

Also wondering if setting maxlen=4000 might change anything. I don't know how diverged the two are so kind of just guessing really. Thanks.
Grammatonotus is offline   Reply With Quote
Old 02-22-2019, 06:21 AM   #220
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

Let us know if using more memory made a difference.

You should really be using a different tool designed for long read alignments. Even LASTZ may be appropriate since you have query sequences as long as 7.5Mb.
GenoMax is offline   Reply With Quote
Reply

Tags
bbmap

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 10:38 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO