Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #31
    Hi Brain,

    I am trying to run bbmap.sh with your data like this on my laptop (Linux Fedora 30, memory 8GB). Then I got following error "/usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...". Do you have any idea to fix it? Thank you.

    Code:
    $ bbmap.sh --version
    java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx1124m -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 --version
    Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, --version]
    
    BBMap version 37.10
    
    $ java -version
    openjdk version "1.8.0_232"
    OpenJDK Runtime Environment (build 1.8.0_232-b09)
    OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
    
    $ bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
    ...
    Loaded Reference:	0.020 seconds.
    Loading index for chunk 1-7, build 1
    No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr1-3_index_k13_c2_b1.block
    Indexing threads started for block 0-3
    Indexing threads finished for block 0-3
    No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr4-7_index_k13_c2_b1.block
    Indexing threads started for block 4-7
    /usr/local/bin/bbmap.sh: line 349:  9039 Killed                  java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx24G -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
    
    $ echo $?
    137
    Detail: https://gist.github.com/junaruga/902...file-bbmap-L42

    Comment


    • #32
      /usr/local/bin/bbmap.sh: line 349: 9039 Killed

      Hi Brain,

      I am trying to run bbmap.sh with your data like this on my laptop (Linux Fedora 30, memory 8GB). Then I got following error "/usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...". Do you have any idea to fix it? Thank you.

      Code:
      $ bbmap.sh --version
      java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx1124m -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 --version
      Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, --version]
      
      BBMap version 37.10
      
      $ java -version
      openjdk version "1.8.0_232"
      OpenJDK Runtime Environment (build 1.8.0_232-b09)
      OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
      
      $ bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
      ...
      Loaded Reference: 0.020 seconds.
      Loading index for chunk 1-7, build 1
      No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr1-3_index_k13_c2_b1.block
      Indexing threads started for block 0-3
      Indexing threads finished for block 0-3
      No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr4-7_index_k13_c2_b1.block
      Indexing threads started for block 4-7
      /usr/local/bin/bbmap.sh: line 349:  9039 Killed                  java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx24G -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
      
      $ echo $?
      137
      Detail: https://gist.github.com/junaruga/902...file-bbmap-L42

      Comment


      • #33
        /usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...

        Hi Brain,

        I am trying to run bbmap.sh with your data like this on my laptop (Linux Fedora 30, memory 8GB). Then I got following error "/usr/local/bin/bbmap.sh: line 349: 9039 Killed java ...". Do you have any idea to fix it? Thank you.

        Code:
        $ bbmap.sh --version
        java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx1124m -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 --version
        Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, --version]
        
        BBMap version 37.10
        
        $ java -version
        openjdk version "1.8.0_232"
        OpenJDK Runtime Environment (build 1.8.0_232-b09)
        OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
        
        $ bbmap.sh -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
        ...
        Loaded Reference: 0.020 seconds.
        Loading index for chunk 1-7, build 1
        No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr1-3_index_k13_c2_b1.block
        Indexing threads started for block 0-3
        Indexing threads finished for block 0-3
        No index available; generating from reference genome: /home/jaruga/data/yamp/ref/index/1/chr4-7_index_k13_c2_b1.block
        Indexing threads started for block 4-7
        /usr/local/bin/bbmap.sh: line 349:  9039 Killed                  java -Djava.library.path=/home/root/local/bbmap/jni/ -ea -Xmx24G -cp /home/root/local/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx24G ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz
        
        $ echo $?
        137
        Detail: https://gist.github.com/junaruga/902...file-bbmap-L42

        Comment


        • #34
          how may I create a masked genome reference for mice?

          Hi Brian,
          I would love to have a masked version of Mus musculus genome. Could you probably share the details how you created the human masked genome, such that I can replicate this for mice?
          Bests,
          Ulrike

          Comment


          • #35
            Originally posted by Brian Bushnell View Post
            ... and the entire silva ribosomal database. .
            Since then there were some SILVA updates, latest (V138) end of last year. Would you mind to update the human masked hg19? Much appreciated your effort in the community!
            Thank you so much!

            Comment


            • #36
              @uloeber: @Brian described the method of masking in first post of this thread. It uses "bbmask.sh" which is a component of BBMap suite. Take a look at in-line help for bbmask.sh to get an idea of what parameters you can use.

              1) Areas containing multiple repeat short kmers.
              2) Windows of low entropy, calculated using pentamer frequencies.
              3) Via mapping from sam files:
              The mapping is the more interesting approach. I used every fungal assembly on JGI's Mycocosm, every plant genome on JGI's phytozome, and a handful of others, including zebra danio (the closest relative to human I included), and the entire silva ribosomal database. First, I completely removed the assemblies that contained human contamination (a handful). Then, I shredded the remaining data into 70-80bp pieces, mapped it to human requiring a minimum of ~85% identity, and used BBMask to mask everything the shreds hit.

              Comment


              • #37
                Hi Everyone!

                Thank you for existing masked databases and the guidance on how to create your own database. I think it would be good if there would be a more formalised guide that includes the actual commands to execute so that there are no misunderstandings. I will try to write it up and would be happy if you would correct me if something is missing or wrong.

                The goal is to mask all bases from a host that could also come from the microbiome. I will focus only on Archaea and Prokaryotes.

                1. Input:
                - A genome/gene database with a as complete as possible representation of bacterial/archaeal species. E.g. 100k species from Progenomes2 (now referred to as progrenomes.fasta)
                - The host genome (now referred to as host.fasta)
                - The silva 138 database (now referred to as silva.fasta)
                - bbmap.sh/bbmask.sh

                2. Output:
                - A masked host genome (now referred to as host.masked.fasta)

                How to get there:

                - Take progenomes.fasta + silva.fasta and split all sequences into subsequences of 80bp. Should they be overlapping sequences extracted like in a sliding window or do you just chop each sequence into subsequences of equal length? (Output now referred to as progrenomes.silva.chopped.fasta). Does bbmap have a tool for this shredding?
                - Align the progenomes.silva.chopped.fasta against the host genome and store as sam file. command: bbmap.sh ref=host.fasta in=progrenomes.silva.chopped.fasta minid=0.85 out=progenomes.silva.chopped.sam mappedonly=t secondary=t idfilter=0.85 cigar=t
                - Use bbmask.sh with standard parameters and the progenomes.silva.chopped.sam as input. command: bbmask.sh in=host.fasta sam=progenomes.silva.chopped.sam out=host.masked.fasta

                Does this sound about right?

                Thanks a lot,

                Hans

                Comment


                • #38
                  Hello everyone,

                  I have a small problem in understanding the working of 'removehuman.sh' script from the BBTools (Version 39.01). Maybe I did something wrong.

                  I was trying to remove human reads from paired-end fastq.gz files with a cmd similar to this:

                  bash $BBMAP_dir/removehuman.sh in1=${R1_File} in2=${R2_File} out1=${sample_name}_cleaned_1.fastq.gz out2=${sample_name}_cleaned_2.fastq.gz t=12

                  It invoked various commands in the backend and I got 2 output files as expected. The output files are lesser in size than the original files, so I assumed the difference is due to human read removal.

                  For eg. one of the file sizes were as was such:

                  266K SRR1X_1.fastq.gz 252K SRR1X_cleaned_1.fastq.gz
                  249K SRR1X_2.fastq.gz 235K SRR1X_cleaned_2.fastq.gz
                  But when I looked into the read count of all 4 files (2 input & 2 output files) with FASTQC (Version 11.9) s/w, they were all the same.
                  I confirmed the read count with a simple cmd to count the number of lines and divide it by 4. It was the same read count even then.

                  Cmd to count reads: echo $(zcat ${FILENAME}.fastq.gz|wc -l)/4|bc
                  So my question is,
                  a. Is this normal behavior? (Are the reads not removed but retained & just masked in the original file somehow?) OR
                  b. Did the code not work the way I was expecting it to work? (i.e, to remove human reads and give me all other reads only)

                  Clearly, I am overthinking this thing and ran the code in a wrong way. But still, can you help me understand what went wrong?


                  The only difference that I saw that
                  1. The reads in the final files were in a different order than before
                  2. The id variable and other things (from the 3rd line of each read) were removed from the output file.

                  For eg. The green highlighted part was missing form the output file.
                  One read from the original file:
                  @SRR13867754.1 SN640:172:HT7GYBCX2:1:1108:7955:3638 length=34
                  GAACGCTGGGCAGGGGGGGCGTCATGAGGCTACC
                  +
                  SRR13867754.1 SN640:172:HT7GYBCX2:1:1108:7955:3638 length=34
                  DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIII

                  The same read from the output file:
                  @SRR13867754.1 SN640:172:HT7GYBCX2:1:1108:7955:3638 length=34
                  GAACGCTGGGCAGGGGGGGCGTCATGAGGCTACC
                  +
                  DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                  I think the code ran fine without any errors. But maybe it was the wrong way of running it.
                  java -ea -Xmx15000m -Xms15000m -cp /home/gpfs/o_shinde/Softwares_Tejus/bbmap/current/ align2.BBMap minratio=0.9 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 path=$path_human_genome/GRCh38_human_genome pigz unpigz zl=6 qtrim=r trimq=10 untrim idtag usemodulo printunmappedcount ztd=2 kfilter=25 maxsites=1 k=14 bloomfilter in1=$path/SRR1X_1.fastq.gz
                  in2=$path/SRR1X_2.fastq.gz
                  out1=SRR1X_cleaned_1.fastq.gz out2=SRR1X_cleaned_2.fastq.gz t=12

                  Executing align2.BBMap [tipsearch=20, maxindel=80, minhits=2, bwr=0.18, bw=40, minratio=0.65, midpad=150, minscaf=50, quickmatch=t, rescuemismatches=15, rescuedist=800, maxsites=3, maxsites2=100, minratio=0.9, maxindel=3, bwr=0.16, bw=12, quickmatch, minhits=2, path=/home/gpfs/o_shinde/References_local/GRCh38_human_genome, pigz, unpigz, zl=6, qtrim=r, trimq=10, untrim, idtag, usemodulo, printunmappedcount, ztd=2, kfilter=25, maxsites=1, k=14, bloomfilter,
                  in1=$path/SRR1X_1.fastq.gz,
                  in2=$path/SRR1X_2.fastq.gz,
                  out1=SRR1X_cleaned_1.fastq.gz, out2=SRR1X_cleaned_2.fastq.gz, t=12]

                  Version 39.00
                  P.S. Attaching the fast files for reference.
                  Attached Files

                  Comment


                  • #39
                    Any chance someone generated a bacterial genome masked human genome database using the latest human genome?

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Current Approaches to Protein Sequencing
                      by seqadmin


                      Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                      04-04-2024, 04:25 PM
                    • seqadmin
                      Strategies for Sequencing Challenging Samples
                      by seqadmin


                      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                      03-22-2024, 06:39 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 04-11-2024, 12:08 PM
                    0 responses
                    25 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 10:19 PM
                    0 responses
                    29 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 09:21 AM
                    0 responses
                    24 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-04-2024, 09:00 AM
                    0 responses
                    52 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X