
  • Originally posted by GenoMax
    BBMap is one of the aligners that uses the full header present in your fasta file when creating the index and passes it along to the alignment file. If there are spaces in the header name, they are written to the alignment file. Some downstream programs have a problem with this.

    You can use the option "trd=t" to truncate the fasta header names after the first space in the name. There is an option for reformat.sh that can do this after the fact for aligned data. I can look this up later if you don't find it.
    Thank you. I had found this too, sometime after my post and before yours, but it's really great to get a helpful reply!
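
For a FASTA file that already exists, the truncation described above can be sketched in a few lines of Python. This only illustrates what "cut the header at the first space" means; it is not BBMap's own code, and the function name `truncate_fasta_headers` is made up for the example.

```python
def truncate_fasta_headers(lines):
    """Yield FASTA lines with each header cut at the first whitespace."""
    for line in lines:
        if line.startswith(">"):
            # Keep only the first whitespace-delimited token of the header.
            tokens = line[1:].split(None, 1)
            name = tokens[0] if tokens else ""
            yield ">" + name + "\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


if __name__ == "__main__":
    # Made-up records, for illustration only.
    example = [">contig_1 length=500 cov=12.3\n", "ACGT\n", ">contig_2\n", "GGCC\n"]
    print("".join(truncate_fasta_headers(example)), end="")
```

Doing this before indexing keeps the reference names in the index and in the alignments consistent with each other.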

    Comment


    • Hi Brian,

      Does bbmap callvariants.sh ignore duplicates marked by picard MarkDuplicates by default (or is there an option to ignore duplicates) or do duplicates have to be deleted?

      Best,
      Gopo
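
Whether callvariants.sh honors the duplicate flag is not answered in this post. Independently of that, Picard MarkDuplicates marks duplicates by setting bit 0x400 in the SAM FLAG field, so marked records can be dropped from a SAM stream before variant calling. A minimal Python sketch, assuming uncompressed SAM text as input (the function name is hypothetical):

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def drop_marked_duplicates(sam_lines):
    """Yield header lines and all alignment records not flagged as duplicates."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through untouched.
            yield line
            continue
        # FLAG is the second tab-separated column of an alignment record.
        flag = int(line.split("\t", 2)[1])
        if not flag & DUPLICATE_FLAG:
            yield line
```

This filters rather than deletes in place, so the original file with the duplicate marks is kept.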

      Comment


      • Hi Brian,
        I'm using trimrname=t from reformat.sh on a sam file. It looks like it trims the ref names correctly on the alignment lines, but not in header lines (the ones that start with @).
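
As a possible workaround until the header is handled by the tool itself, the reference names on the @SQ lines can be trimmed the same way as on the alignment lines. The sketch below is not part of BBTools and the function name is made up; it assumes uncompressed SAM text and cuts each `SN:` value at the first whitespace.

```python
def trim_sq_names(sam_lines):
    """Yield SAM lines with @SQ reference names cut at the first whitespace."""
    for line in sam_lines:
        if not line.startswith("@SQ"):
            # Other header lines and alignment records are left alone.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        for i, field in enumerate(fields):
            if field.startswith("SN:"):
                tokens = field[3:].split(None, 1)
                if tokens:
                    fields[i] = "SN:" + tokens[0]
        yield "\t".join(fields) + "\n"
```

After this, the names in the header match the trimmed names in the RNAME column, which is what most downstream tools check.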

        Comment


        • Dear Brian,

          I am using bbmap for mapping and callvariants.sh for variant calling on PE Illumina reads.
          I am comparing two mouse strains. I am therefore downsampling my .sam files to have the same number of mapped reads going into the alignment.

          When I input the original file containing ~680k mapped PE reads I get 2283 variants. When I run the downsampled file containing ~200k mapped PE reads I get 3375 variants.

          I would expect to get a lower number of variants when I put in fewer reads. Could you explain this to me? I am clearly missing something that might be important for my analysis.

          (I am using the exact same criteria for variant calling and filtering for the two samples)

          Comment


          • I am trying to run BBMap on the cluster, but got the error below. Can anyone help me resolve it?
            Thanks,
            [hgx080@quser10 DNA_all]$ /home/hgx080/bbmap/bbmap.sh ref=/projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/assemble/final/idba/DNA-all/DNA-all-contig.fa in=/projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/clean_reads/Wells02/filter_reads/RNA-Ac-1_S7_filter.fa out=RNA-Ac-1_S7_filter.test.sam minid=0.95 ambig=random reads=100000 -Xmx100g -eoom
            java -Djava.library.path=/home/hgx080/bbmap/jni/ -ea -Xmx100g -cp /home/hgx080/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=/projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/assemble/final/idba/DNA-all/DNA-all-contig.fa in=/projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/clean_reads/Wells02/filter_reads/RNA-Ac-1_S7_filter.fa out=RNA-Ac-1_S7_filter.test.sam minid=0.95 ambig=random reads=100000 -Xmx100g -eoom
            Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/assemble/final/idba/DNA-all/DNA-all-contig.fa, in=/projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/clean_reads/Wells02/filter_reads/RNA-Ac-1_S7_filter.fa, out=RNA-Ac-1_S7_filter.test.sam, minid=0.95, ambig=random, reads=100000, -Xmx100g, -eoom]
            Version 38.11

            Choosing a site randomly for ambiguous mappings.
            Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.908
            NOTE: Ignoring reference file because it already appears to have been processed.
            NOTE: If you wish to regenerate the index, please manually delete ref/genome/1/summary.txt
            Max reads: 100000
            Set genome to 1

            Exception in thread "Thread-0" java.lang.RuntimeException: java.lang.RuntimeException: java.io.EOFException: Unexpected end of ZLIB input stream
            at align2.ChromLoadThread.run(ChromLoadThread.java:79)
            Caused by: java.lang.RuntimeException: java.io.EOFException: Unexpected end of ZLIB input stream
            at fileIO.ReadWrite.readObject(ReadWrite.java:806)
            at fileIO.ReadWrite.read(ReadWrite.java:1246)
            at dna.ChromosomeArray.read(ChromosomeArray.java:65)
            at align2.ChromLoadThread.run(ChromLoadThread.java:76)
            Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
            at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
            at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
            at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
            at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2620)
            at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:3031)
            at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:3061)
            at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1914)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
            at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
            at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
            at fileIO.ReadWrite.readObject(ReadWrite.java:802)
            ... 3 more
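
The trace above ends in `GZIPInputStream` with "Unexpected end of ZLIB input stream", which is what Java reports when a gzip-compressed file stops before its end-of-stream marker. One way to check whether a file on disk is truncated is to decompress it to the end. The Python sketch below does that; the helper name is made up, and which index files to check (for example the compressed files under the `ref/genome/1/` directory mentioned in the log) is an assumption.

```python
import gzip
import sys
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly to its end."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data; only the absence of errors matters.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ended early; OSError: bad header or CRC mismatch.
        return False
    return True


if __name__ == "__main__":
    # Usage: python check_gzip.py FILE [FILE ...]
    for path in sys.argv[1:]:
        print(path, "ok" if gzip_is_intact(path) else "truncated or corrupt")
```

If a file fails the check, deleting `ref/genome/1/summary.txt` as suggested in the log output and rerunning should make BBMap regenerate the index.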

            Comment


            • Hello! Thanks for all of the wonderful bbmap scripts. Today I was trying to use bbfakereads.sh, but the script cannot locate or open the jgi/FakeReads file. Any thoughts? I noticed a FakeReads file in the current/jgi directory. Do you think the path in the script is incorrect?

              Thanks for your time and help!

              bbfakereads.sh in=scaffolds.fasta out=fakePE_R1.fasta out2=fakePE.R2.fasta length=150
              java -ea -Xmx600m -cp /media/bioinformaticprograms/BBMap/sh/current/ jgi.FakeReads in=scaffolds.fasta out=fakePE_R1.fasta out2=fakePE.R2.fasta length=150
              Error: Could not find or load main class jgi.FakeReads

              Comment


              • Is your bbmap installed correctly? Have you moved any files around? I am able to run "bbfakereads.sh" and generate fastq and fasta files without a problem.

                Comment


                • You're right. It wasn't downloaded correctly. At first I used git to download the bbmap package, but when I downloaded it with wget from https://sourceforge.net/projects/bbm...p_38.12.tar.gz instead, everything was organized correctly.

                  Thanks for the help.

                  Comment


                  • Add hg19 masked reference to distribution

                    Hello,
                    I'm using BBTools via bioconda and the corresponding quay.io docker container. The image has the necessary resources, e.g. the adapters fasta file:

                    Code:
                    $ docker run -it -v $PWD:/data quay.io/biocontainers/bbmap:38.06--2 bash
                    bash-4.2# find . -name adapters.fa
                    ./usr/local/opt/bbmap-38.06/resources/adapters.fa
                    bash-4.2# cd ./usr/local/opt/bbmap-38.06/resources
                    bash-4.2# ll
                    bash: ll: command not found
                    bash-4.2# ls 
                    adapters.fa                          blacklist_silva_species_500.sketch   lambda.fa.gz                         nextera_LMP_linker.fa.gz             primes.txt.gz                        sequencing_artifacts.fa.gz
                    adapters_no_transposase.fa.gz        contents.txt                         lfpe.linker.fa.gz                    pJET1.2.fa                           remote_files.txt                     short.fa
                    blacklist_img_species_300.sketch     crelox.fa.gz                         mtst.fa                              phix174_ill.ref.fa.gz                remote_files_old.txt                 truseq.fa.gz
                    blacklist_nt_species_1000.sketch     favicon.ico                          nextera.fa.gz                        phix_adapters.fa.gz                  sample1.fq.gz                        truseq_rna.fa.gz
                    blacklist_refseq_species_250.sketch  kapatags.L40.fa                      nextera_LMP_adapter.fa.gz            polyA.fa.gz                          sample2.fq.gz
                    However, the removehuman.sh script uses a hardcoded path for the masked human genome posted in the RemoveHuman thread.


                    Code:
                    	local CMD="java -Djava.library.path=$NATIVELIBDIR $EA $z -cp $CP align2.BBMap minratio=0.9 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 path=/global/projectb/sandbox/gaag/bbtools/hg19 pigz unpigz zl=6 qtrim=r trimq=10 untrim idtag usemodulo printunmappedcount usejni ztd=2 kfilter=25 maxsites=1 k=14 $@"
                    Can the masked genome be included in the distribution?

                    Thank you!
                    Warmest,
                    Olga

                    Comment


                    • Hello Brian,
                      After running mapPacBio.sh, how can I combine the sequences that share the same ID?
                      For example, I want to combine the following sequences:
                      m151006_234406_42219_c100867912550000001823195203031665_s1_p0/110457/57769_70466 id=3_0_part_2_6
                      m151006_234406_42219_c100867912550000001823195203031665_s1_p0/110457/57769_70466 id=3_0_part_3

                      Thanks,
                      Fuyou

                      Comment


                      • pull out sequences with matching primers

                        Hi Brian,
                        I was wondering if bbmap has a tool that will pull out reads matching particular primer sequences. I have fastq files with amplicons from 12 different primers in the same file, so I want to make subsets of the reads having specific primers of interest.

                        I have used your tool for other tasks, so I figured I would ask if it also has this capability.

                        Thank you,
                        Jen
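
To make the request concrete, here is a minimal Python sketch of the idea: bin FASTQ reads by the primer their sequence starts with. It is not a BBTools feature and makes several assumptions: plain four-line FASTQ records, primers at the very start of the read, exact matches only (no mismatches, no reverse complements). The function name and the primer names and sequences in the usage are invented.

```python
def split_by_primer(fastq_lines, primers):
    """Group FASTQ reads by the first primer that their sequence starts with.

    `primers` maps a primer name to its sequence. Returns a dict mapping each
    primer name to a list of (header, sequence, quality) tuples.
    """
    bins = {name: [] for name in primers}
    it = iter(fastq_lines)
    # zip over the same iterator four times walks the file one record at a time.
    for header, seq, _plus, qual in zip(it, it, it, it):
        for name, primer in primers.items():
            if seq.startswith(primer):
                bins[name].append((header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")))
                break
    return bins


if __name__ == "__main__":
    # Hypothetical primers and reads, for illustration only.
    primers = {"primerA": "ACGTAC", "primerB": "TTGGCC"}
    reads = ["@r1\n", "ACGTACGGGG\n", "+\n", "IIIIIIIIII\n",
             "@r2\n", "TTGGCCAAAA\n", "+\n", "IIIIIIIIII\n"]
    for name, records in split_by_primer(reads, primers).items():
        print(name, len(records))
```

Reads that start with none of the primers are simply left out of every bin.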

                        Comment


                        • @JenBarb see this thread in Biostars.

                          Comment


                          • Thank you! Love the tool!

                            Comment


                            • Hi,
                              Hoping somebody can help me with this.

                              I used BBMap and now I would like to extract the reads from my .bam file that are split (chimeric?), i.e. reads that indicate a deletion.

                              I tried to use samblaster, but it doesn't recognize any reads as split...
                              (samtools view -h in.bam | samblaster -a -s split.sam -o /dev/null)
                              Are the split reads marked differently in BBMap compared to other aligners causing samblaster to fail?

                              IGV shows a good amount of reads with deletions and I can also call deletions using BBTools callvariants.sh - so I know they are in there. I just have a feeling callvariants is calling fewer deletions and with lower coverage than what IGV suggests, so I want to check up on it.
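
If the deletions are encoded inside a single alignment's CIGAR string rather than as separate split alignments (an assumption about this file, suggested by IGV showing deletions within reads), the records can be picked out by their CIGAR. A minimal Python sketch, meant to read the text output of `samtools view -h in.bam`; the function names and the `min_length` and `ops` parameters are made up for the example.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_deletion(cigar, ops="D"):
    """Return the length of the longest CIGAR element whose op is in `ops`."""
    lengths = [int(n) for n, op in CIGAR_ELEMENT.findall(cigar) if op in ops]
    return max(lengths, default=0)


def reads_with_deletions(sam_lines, min_length=1, ops="D"):
    """Yield alignment records whose CIGAR has a deletion of at least `min_length`."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        # CIGAR is the sixth tab-separated column of an alignment record.
        if len(fields) > 5 and longest_deletion(fields[5], ops) >= min_length:
            yield line
```

Counting the selected records per position gives an independent number to compare with the coverage that callvariants.sh reports for each deletion.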

                              Comment


                              • mkf argument in bbduk.sh (bbmap tool)

                                Hello,
                                I am trying to use the flag mkf (minkmerfraction), but I get an error saying that the argument does not exist.
                                sh /data/barbj/bbmap/bbduk.sh in=./../Stool_001-01.fastq outm=v2fstoolfq.fa literal=CTCAAACTTGGGTAATTAAACC k=17 mkf=0.8
                                java -Djava.library.path=/data/barbj/bbmap/jni/ -ea -Xmx39767m -Xms39767m -cp /data/barbj/bbmap/current/ jgi.BBDukF in=./../Stool_001-01.fastq outm=v2fstoolfq.fa literal=CTCAAACTTGGGTAATTAAACC k=17 mkf=0.8
                                Executing jgi.BBDukF [in=./../Stool_001-01.fastq, outm=v2fstoolfq.fa, literal=CTCAAACTTGGGTAATTAAACC, k=17, mkf=0.8]

                                Exception in thread "main" java.lang.RuntimeException: Unknown parameter mkf=0.8
                                at jgi.BBDukF.<init>(BBDukF.java:402)
                                Any ideas why this is not working?

                                Jen

                                Comment
