SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Contaminant in dscDNA sample? vanillasky Sample Prep / Library Generation 2 06-19-2014 10:17 PM
Heuristic search for contaminant motifs k-gun12 Bioinformatics 0 03-14-2013 09:50 AM
rRNA contaminant screening aligning small RNA honey Bioinformatics 0 05-31-2012 07:32 PM
18bp DNA contaminant in ChIP input sample? sinapius Sample Prep / Library Generation 0 02-09-2011 11:39 AM
BWA and screening illumina reads for contaminant jmartin Bioinformatics 3 04-22-2010 03:55 PM

Reply
 
Thread Tools
Old 08-27-2017, 11:23 AM   #21
yang zhang
Junior Member
 
Location: Corvallis, Oregon, USA

Join Date: Aug 2017
Posts: 2
Default

Hi Brain,

This might be a silly question, but do we need to index the reference every time for every sample?

Thanks!

Last edited by yang zhang; 08-27-2017 at 11:27 AM.
yang zhang is offline   Reply With Quote
Old 08-28-2017, 03:10 AM   #22
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,022
Default

Quote:
Originally Posted by yang zhang View Post
Hi Brain,

This might be a silly question, but do we need to index the reference every time for every sample?

Thanks!
No. You can create an index of the reference up-front by doing
Code:
 bbmap.sh in=ref.fa
that will create a "ref" directory, which will contain all index files. Do not worry about the contents of the folder since they are arranged in a way BBMap requires them.

In future when you want to use this index replace "ref=" with "path=/path_to_directory_containing_ref_folder" in your command line.
GenoMax is offline   Reply With Quote
Old 08-28-2017, 07:53 AM   #23
yang zhang
Junior Member
 
Location: Corvallis, Oregon, USA

Join Date: Aug 2017
Posts: 2
Smile

I see. Thank you for the clarification, GenoMax!
yang zhang is offline   Reply With Quote
Old 08-11-2018, 08:18 PM   #24
Kristian_Andersen
Junior Member
 
Location: La Jolla, CA

Join Date: Jul 2015
Posts: 4
Default

Hi Brian,

Just reviving an old thread here. I have been testing out a lot of different methods to clean human reads and I really love BBMap because it's such a well thought-out program. However, when I try to clean human reads with the settings you have specified, I routinely get a ton of reads remaining - upwards of 70% (so only 30% are cleaned). I have tried to adjust the various parameters, but the only thing that seems to make a difference for depletion is the 'minid' setting. Setting that at 0.50 (which is *very* low) depletes around 95% of reads. As a comparison, a default run with bwa mem depletes 100%.

Any idea how I might get BBMap to more accurately deplete human reads?
Kristian_Andersen is offline   Reply With Quote
Old 08-12-2018, 05:01 AM   #25
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,022
Default

Have you tried using "bbsplit.sh" with human genome to see if that works better. If you are interested in non-human data then I would use the non-masked genome and risk losing a few additional reads.
GenoMax is offline   Reply With Quote
Old 08-12-2018, 08:33 AM   #26
Kristian_Andersen
Junior Member
 
Location: La Jolla, CA

Join Date: Jul 2015
Posts: 4
Default

I have not - however, bbsplit doesn't really seem to be the right tool for removing human reads?
Kristian_Andersen is offline   Reply With Quote
Old 08-12-2018, 03:27 PM   #27
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,022
Default

"bbsplit.sh" is a general purpose tool that will bin reads into any number of bins (depending on the reference sequences provided, you can provide as many as you want). In this case you would provide human_genome.fa (and any other reference you want to use). If you only use human then reads not mapping to human genome will be collected in other bin.
GenoMax is offline   Reply With Quote
Old 06-11-2019, 06:54 PM   #28
jylee
Junior Member
 
Location: East Lansing, MI

Join Date: Jun 2019
Posts: 3
Default

Hello Brian,

I am trying to use removehuman.sh on MSU HPCC.

Inputs (*_filtered.fastq.gz files) are phix filtered R1 and R2 files using BBduk as following: [leejooy5@dev-intel18 filtered_reads]$ bbduk.sh -Xmx10g in1=NFW_R1_trimmed.fastq.gz in2=NFW_R2_trimmed.fastq.gz out1=NFW_R1_filtered.fastq.gz out2=NFW_R2_filtered.fastq.gz ref=/opt/software/BBMap/37.93-foss-2018a/resources/phix174_ill.ref.fa.gz k=31 hdist=1 stats=GR25_stats.txt threadS=8)

As shown in below, error message "Exception in thread "main" java.lang.RuntimeException: Can't find file /global/projectb/sandbox/gaag/bbtools/hg19/ref/genome/1/summary.txt
" popped up when I tried to run "removehuman.sh". I tried without additional parameters such as -Xmx and threads, but same error happened. Also, I tried to find the find the
file "/global/projectb/sandbox/gaag/bbtools/hg19/ref/genome/1/summary.txt", but I couldn't. Could you tell me what mistake I did or let me know where I can find a solution? Thank you for your time and consideration.

Cheers,
Joo-Young

=======================
[leejooy5@dev-intel18 filtered_reads]$ removehuman.sh -Xmx10g in1=NFW_R1_filtered.fastq.gz in2=NFW_R2_filtered.fastq.gz out1=NFW_R1_clean.fastq.gz out2=NFW_R2_clean.fastq.gz threads=8

removehuman.sh -Xmx10g in1=NFW_R1_filtered.fastq.gz in2=NFW_R2_filtered.fastq.gz out1=NFW_R1_clean.fastq.gz out2=NFW_R2_clean.fastq.gz threads=8
java -Djava.library.path=/opt/software/BBMap/37.93-foss-2018a/jni/ -ea -Xmx10g -cp /opt/software/BBMap/37.93-foss-2018a/current/ align2.BBMap minratio=0.9 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 path=/global/projectb/sandbox/gaag/bbtools/hg19 pigz unpigz zl=6 qtrim=r trimq=10 untrim idtag usemodulo printunmappedcount usejni ztd=2 kfilter=25 maxsites=1 k=14 -Xmx10g in1=NFW_R1_filtered.fastq.gz in2=NFW_R2_filtered.fastq.gz out1=NFW_R1_clean.fastq.gz out2=NFW_R2_clean.fastq.gz threads=8
Executing align2.BBMap [tipsearch=20, maxindel=80, minhits=2, bwr=0.18, bw=40, minratio=0.65, midpad=150, minscaf=50, quickmatch=t, rescuemismatches=15, rescuedist=800, maxsites=3, maxsites2=100, minratio=0.9, maxindel=3, bwr=0.16, bw=12, quickmatch, minhits=2, path=/global/projectb/sandbox/gaag/bbtools/hg19, pigz, unpigz, zl=6, qtrim=r, trimq=10, untrim, idtag, usemodulo, printunmappedcount, usejni, ztd=2, kfilter=25, maxsites=1, k=14, -Xmx10g, in1=NFW_R1_filtered.fastq.gz, in2=NFW_R2_filtered.fastq.gz, out1=NFW_R1_clean.fastq.gz, out2=NFW_R2_clean.fastq.gz, threads=8]
Version 37.93 [tipsearch=20, maxindel=80, minhits=2, bwr=0.18, bw=40, minratio=0.65, midpad=150, minscaf=50, quickmatch=t, rescuemismatches=15, rescuedist=800, maxsites=3, maxsites2=100, minratio=0.9, maxindel=3, bwr=0.16, bw=12, quickmatch, minhits=2, path=/global/projectb/sandbox/gaag/bbtools/hg19, pigz, unpigz, zl=6, qtrim=r, trimq=10, untrim, idtag, usemodulo, printunmappedcount, usejni, ztd=2, kfilter=25, maxsites=1, k=14, -Xmx10g, in1=NFW_R1_filtered.fastq.gz, in2=NFW_R2_filtered.fastq.gz, out1=NFW_R1_clean.fastq.gz, out2=NFW_R2_clean.fastq.gz, threads=8]

Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.650
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.900
Set threads to 8
Retaining first best site only for ambiguous mappings.
Exception in thread "main" java.lang.RuntimeException: Can't find file /global/projectb/sandbox/gaag/bbtools/hg19/ref/genome/1/summary.txt
at fileIO.ReadWrite.getRawInputStream(ReadWrite.java:906)
at fileIO.ReadWrite.getInputStream(ReadWrite.java:871)
at fileIO.TextFile.open(TextFile.java:227)
at fileIO.TextFile.<init>(TextFile.java:71)
at dna.Data.setGenome2(Data.java:822)
at dna.Data.setGenome(Data.java:768)
at align2.BBMap.loadIndex(BBMap.java:313)
at align2.BBMap.main(BBMap.java:32)
jylee is offline   Reply With Quote
Old 06-12-2019, 07:13 AM   #29
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,022
Default

@jylee: "/global/projectb/sandbox/gaag/bbtools/hg19/ref/genome/1/summary.txt" appears to refer to a location on JGI servers (if that is not your own). You will need to download and provide hg19 reference sequence. You can pre-index the genome with BBMap to use with path= or use ref= option to point to the genome sequence multi-fasta file location.
GenoMax is offline   Reply With Quote
Old 12-02-2019, 02:35 PM   #30
junaruga
Junior Member
 
Location: Brno, Czech Republic

Join Date: Nov 2019
Posts: 1
Smile /usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...

Hi Brain,

I am trying to run bbmap.sh with your data like this on my laptop (Linux Fedora 30, memory 8GB). Then I got following error "/usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...". Do you have any idea to fix it? Thank you.

Code:
$ bbmap.sh --version
java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx1124m -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 --version
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, --version]

BBMap version 37.10

$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

$ bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
...
Loaded Reference: 0.020 seconds.
Loading index for chunk 1-7, build 1
No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr1-3_index_k13_c2_b1.block
Indexing threads started for block 0-3
Indexing threads finished for block 0-3
No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr4-7_index_k13_c2_b1.block
Indexing threads started for block 4-7
/usr/local/bin/bbmap.sh: line 349:  9039 Killed                  java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx24G -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz

$ echo $?
137
Detail: https://gist.github.com/junaruga/902...file-bbmap-L42
junaruga is offline   Reply With Quote
Old 01-05-2020, 02:10 AM   #31
uloeber
Member
 
Location: Germany

Join Date: Mar 2013
Posts: 42
Default how may I create a masked genome reference for mice?

Hi Brian,
I would love to have a masked version of Mus musculus genome. Could you probably share the details how you created the human masked genome, such that I can replicate this for mice?
Bests,
Ulrike
uloeber is offline   Reply With Quote
Old 01-05-2020, 02:27 AM   #32
uloeber
Member
 
Location: Germany

Join Date: Mar 2013
Posts: 42
Default

Quote:
Originally Posted by Brian Bushnell View Post
... and the entire silva ribosomal database. .
Since then there were some SILVA updates, latest (V138) end of last year. Would you mind to update the human masked hg19? Much appreciated your effort in the community!
Thank you so much!
uloeber is offline   Reply With Quote
Old 01-05-2020, 04:01 AM   #33
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,022
Default

@uloeber: @Brian described the method of masking in first post of this thread. It uses "bbmask.sh" which is a component of BBMap suite. Take a look at in-line help for bbmask.sh to get an idea of what parameters you can use.

Quote:
1) Areas containing multiple repeat short kmers.
2) Windows of low entropy, calculated using pentamer frequencies.
3) Via mapping from sam files:
The mapping is the more interesting approach. I used every fungal assembly on JGI's Mycocosm, every plant genome on JGI's phytozome, and a handful of others, including zebra danio (the closest relative to human I included), and the entire silva ribosomal database. First, I completely removed the assemblies that contained human contamination (a handful). Then, I shredded the remaining data into 70-80bp pieces, mapped it to human requiring a minimum of ~85% identity, and used BBMask to mask everything the shreds hit.
GenoMax is offline   Reply With Quote
Old 01-16-2020, 09:15 PM   #34
Theodre
Junior Member
 
Location: Madurai, TN, India

Join Date: Dec 2019
Posts: 2
Default

Well crap! Good Work. Great info and Iím trying to take it all in. Thank you for sharing.
Theodre is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 10:45 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO