Seqanswers Leaderboard Ad

**junaruga** · 11-30-2019, 12:44 PM

Hi Brain,

I am trying to run bbmap.sh with your data like this on my laptop (Linux Fedora 30, memory 8GB). Then I got following error "/usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...". Do you have any idea to fix it? Thank you.

Code:

$ bbmap.sh --version
java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx1124m -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 --version
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, --version]

BBMap version 37.10

$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

$ bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
...
Loaded Reference:	0.020 seconds.
Loading index for chunk 1-7, build 1
No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr1-3_index_k13_c2_b1.block
Indexing threads started for block 0-3
Indexing threads finished for block 0-3
No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr4-7_index_k13_c2_b1.block
Indexing threads started for block 4-7
/usr/local/bin/bbmap.sh: line 349:  9039 Killed                  java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx24G -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz

$ echo $?
137

Detail: https://gist.github.com/junaruga/902...file-bbmap-L42

**junaruga** · 11-30-2019, 12:50 PM

/usr/local/bin/bbmap.sh: line 349: 9039 Killed

Hi Brain,

I am trying to run bbmap.sh with your data like this on my laptop (Linux Fedora 30, memory 8GB). Then I got following error "/usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...". Do you have any idea to fix it? Thank you.

Code:

$ bbmap.sh --version
java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx1124m -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 --version
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, --version]

BBMap version 37.10

$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

$ bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
...
Loaded Reference: 0.020 seconds.
Loading index for chunk 1-7, build 1
No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr1-3_index_k13_c2_b1.block
Indexing threads started for block 0-3
Indexing threads finished for block 0-3
No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr4-7_index_k13_c2_b1.block
Indexing threads started for block 4-7
/usr/local/bin/bbmap.sh: line 349:  9039 Killed                  java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx24G -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz

$ echo $?
137

Detail: https://gist.github.com/junaruga/902...file-bbmap-L42

**junaruga** · 12-02-2019, 03:35 PM

/usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...

Hi Brain,

I am trying to run bbmap.sh with your data like this on my laptop (Linux Fedora 30, memory 8GB). Then I got following error "/usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...". Do you have any idea to fix it? Thank you.

Code:

$ bbmap.sh --version
java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx1124m -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 --version
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, --version]

BBMap version 37.10

$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

$ bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
...
Loaded Reference: 0.020 seconds.
Loading index for chunk 1-7, build 1
No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr1-3_index_k13_c2_b1.block
Indexing threads started for block 0-3
Indexing threads finished for block 0-3
No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr4-7_index_k13_c2_b1.block
Indexing threads started for block 4-7
/usr/local/bin/bbmap.sh: line 349:  9039 Killed                  java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx24G -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz

$ echo $?
137

Detail: https://gist.github.com/junaruga/902...file-bbmap-L42

**uloeber** · 01-05-2020, 03:10 AM

how may I create a masked genome reference for mice?

Hi Brian,
I would love to have a masked version of Mus musculus genome. Could you probably share the details how you created the human masked genome, such that I can replicate this for mice?
Bests,
Ulrike

**uloeber** · 01-05-2020, 03:27 AM

Originally posted by Brian Bushnell View Post

... and the entire silva ribosomal database. .

Since then there were some SILVA updates, latest (V138) end of last year. Would you mind to update the human masked hg19? Much appreciated your effort in the community!
Thank you so much!

**GenoMax** · 01-05-2020, 05:01 AM

@uloeber: @Brian described the method of masking in first post of this thread. It uses "bbmask.sh" which is a component of BBMap suite. Take a look at in-line help for bbmask.sh to get an idea of what parameters you can use.

1) Areas containing multiple repeat short kmers.
2) Windows of low entropy, calculated using pentamer frequencies.
3) Via mapping from sam files:
The mapping is the more interesting approach. I used every fungal assembly on JGI's Mycocosm, every plant genome on JGI's phytozome, and a handful of others, including zebra danio (the closest relative to human I included), and the entire silva ribosomal database. First, I completely removed the assemblies that contained human contamination (a handful). Then, I shredded the remaining data into 70-80bp pieces, mapped it to human requiring a minimum of ~85% identity, and used BBMask to mask everything the shreds hit.

**hjruscheweyh** · 04-15-2020, 04:25 AM

Hi Everyone!

Thank you for existing masked databases and the guidance on how to create your own database. I think it would be good if there would be a more formalised guide that includes the actual commands to execute so that there are no misunderstandings. I will try to write it up and would be happy if you would correct me if something is missing or wrong.

The goal is to mask all bases from a host that could also come from the microbiome. I will focus only on Archaea and Prokaryotes.

1. Input:
- A genome/gene database with a as complete as possible representation of bacterial/archaeal species. E.g. 100k species from Progenomes2 (now referred to as progrenomes.fasta)
- The host genome (now referred to as host.fasta)
- The silva 138 database (now referred to as silva.fasta)
- bbmap.sh/bbmask.sh

2. Output:
- A masked host genome (now referred to as host.masked.fasta)

How to get there:

- Take progenomes.fasta + silva.fasta and split all sequences into subsequences of 80bp. Should they be overlapping sequences extracted like in a sliding window or do you just chop each sequence into subsequences of equal length? (Output now referred to as progrenomes.silva.chopped.fasta). Does bbmap have a tool for this shredding?
- Align the progenomes.silva.chopped.fasta against the host genome and store as sam file. command: bbmap.sh ref=host.fasta in=progrenomes.silva.chopped.fasta minid=0.85 out=progenomes.silva.chopped.sam mappedonly=t secondary=t idfilter=0.85 cigar=t
- Use bbmask.sh with standard parameters and the progenomes.silva.chopped.sam as input. command: bbmask.sh in=host.fasta sam=progenomes.silva.chopped.sam out=host.masked.fasta

Does this sound about right?

Thanks a lot,

Hans

**tejus.shinde** · 01-05-2023, 11:42 AM

Hello everyone,

I have a small problem in understanding the working of 'removehuman.sh' script from the BBTools (Version 39.01). Maybe I did something wrong.

I was trying to remove human reads from paired-end fastq.gz files with a cmd similar to this:

bash $BBMAP_dir/removehuman.sh in1=${R1_File} in2=${R2_File} out1=${sample_name}_cleaned_1.fastq.gz out2=${sample_name}_cleaned_2.fastq.gz t=12

It invoked various commands in the backend and I got 2 output files as expected. The output files are lesser in size than the original files, so I assumed the difference is due to human read removal.

For eg. one of the file sizes were as was such:

266K SRR1X_1.fastq.gz 252K SRR1X_cleaned_1.fastq.gz
249K SRR1X_2.fastq.gz 235K SRR1X_cleaned_2.fastq.gz

But when I looked into the read count of all 4 files (2 input & 2 output files) with FASTQC (Version 11.9) s/w, they were all the same.
I confirmed the read count with a simple cmd to count the number of lines and divide it by 4. It was the same read count even then.

Cmd to count reads: echo $(zcat ${FILENAME}.fastq.gz|wc -l)/4|bc

So my question is,
a. Is this normal behavior? (Are the reads not removed but retained & just masked in the original file somehow?) OR
b. Did the code not work the way I was expecting it to work? (i.e, to remove human reads and give me all other reads only)

Clearly, I am overthinking this thing and ran the code in a wrong way. But still, can you help me understand what went wrong?

The only difference that I saw that
1. The reads in the final files were in a different order than before
2. The id variable and other things (from the 3rd line of each read) were removed from the output file.

For eg. The green highlighted part was missing form the output file.
One read from the original file:
@SRR13867754.1 SN640:172:HT7GYBCX2:1:1108:7955:3638 length=34
GAACGCTGGGCAGGGGGGGCGTCATGAGGCTACC
+SRR13867754.1 SN640:172:HT7GYBCX2:1:1108:7955:3638 length=34
DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIII

The same read from the output file:
@SRR13867754.1 SN640:172:HT7GYBCX2:1:1108:7955:3638 length=34
GAACGCTGGGCAGGGGGGGCGTCATGAGGCTACC
+
DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIII

I think the code ran fine without any errors. But maybe it was the wrong way of running it.

java -ea -Xmx15000m -Xms15000m -cp /home/gpfs/o_shinde/Softwares_Tejus/bbmap/current/ align2.BBMap minratio=0.9 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 path=$path_human_genome/GRCh38_human_genome pigz unpigz zl=6 qtrim=r trimq=10 untrim idtag usemodulo printunmappedcount ztd=2 kfilter=25 maxsites=1 k=14 bloomfilter in1=$path/SRR1X_1.fastq.gz
in2=$path/SRR1X_2.fastq.gz
out1=SRR1X_cleaned_1.fastq.gz out2=SRR1X_cleaned_2.fastq.gz t=12

Executing align2.BBMap [tipsearch=20, maxindel=80, minhits=2, bwr=0.18, bw=40, minratio=0.65, midpad=150, minscaf=50, quickmatch=t, rescuemismatches=15, rescuedist=800, maxsites=3, maxsites2=100, minratio=0.9, maxindel=3, bwr=0.16, bw=12, quickmatch, minhits=2, path=/home/gpfs/o_shinde/References_local/GRCh38_human_genome, pigz, unpigz, zl=6, qtrim=r, trimq=10, untrim, idtag, usemodulo, printunmappedcount, ztd=2, kfilter=25, maxsites=1, k=14, bloomfilter,
in1=$path/SRR1X_1.fastq.gz,
in2=$path/SRR1X_2.fastq.gz,
out1=SRR1X_cleaned_1.fastq.gz, out2=SRR1X_cleaned_2.fastq.gz, t=12]

Version 39.00

P.S. Attaching the fast files for reference.

Attached Files

**fconstancias** · 06-01-2023, 11:20 AM

Any chance someone generated a bacterial genome masked human genome database using the latest human genome?

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 25 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 29 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 24 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News