Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Wow, that's actually... a lot nicer. Thanks! I'll put it in the next release (and credit you of course).

    Comment


    • Hi,

      sorry, the Javva 6 version is not working for me either. If I check my version I get:

      $ java -version
      java version "1.6.0_27"
      OpenJDK Runtime Environment (IcedTea6 1.12.6) (6b27-1.12.6-1~deb7u1)
      OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
      all29c@aahl-02-mel:~/data/019/19.5$

      The error is:
      Exception in thread "main" java.lang.UnsupportedClassVersionError: align2/BBMap : Unsupported major.minor version 51.0
      at java.lang.ClassLoader.defineClass1(Native Method)
      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
      at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
      at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
      at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
      at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
      Could not find the main class: align2.BBMap. Program will exit.

      Comment


      • Cool! I did bbduk.sh as well. I've been using those two the most the last couple of days. FYI I had to make sure there wasn't any mixture of tabs and spaces (fortunately Sublime Text 2 can make that swap for me). Also it's a lot easier to write those tedious usage things in a single echo instead of multiple.

        Code:
        usage(){
        echo "
        Written by Brian Bushnell
        Last modified October 27, 2014
        
        Description:  Compares reads to the kmers in a reference dataset, optionally 
        allowing an edit distance. Splits the reads into two outputs - those that 
        match the reference, and those that don't. Can also trim (remove) the matching 
        parts of the reads rather than binning the reads.
        
        Usage:  bbduk.sh in=<input file> out=<output file> ref=<contaminant files>
        
        Input may be stdin or a fasta, fastq, or sam file, compressed or uncompressed.
        Output may be stdout or a file.
        If you pipe via stdin/stdout, please include the file type; e.g. for gzipped 
        fasta input, set in=stdin.fa.gz
        
        Optional parameters (and their defaults)
        
        Input parameters:
        
        in=<file>           Main input. in=stdin.fq will pipe from stdin.
        in2=<file>          Input for 2nd read of pairs in a different file.
        ref=<file,file>     Comma-delimited list of reference files.
        touppercase=f       (tuc) Change all bases upper-case.
        interleaved=auto    (int) t/f overrides interleaved autodetection.
        qin=auto            Input quality offset: 33 (Sanger), 64, or auto.
        reads=-1            If positive, quit after processing X reads or pairs.
        
        Output parameters:
        
        out=<file>          (outnonmatch) Write reads here that do not contain 
                            kmers matching the database.  'out=stdout.fq' will pipe 
                            to standard out.
        out2=<file>         (outnonmatch2) Use this to write 2nd read of pairs to a 
                            different file.
        outmatch=<file>     (outm or outb) Write reads here that contain kmers 
                            matching the database.
        outmatch2=<file>    (outm2 or outb2) Use this to write 2nd read of pairs 
                            to a different file.
        outsingle=<file>    (outs) Use this to write singleton reads whose mate was 
                            trimmed shorter than minlen.
        stats=<file>        Write statistics about which contamininants were detected.
        duk=<file>          Write statistics in duk's format.
        overwrite=t         (ow) Grant permission to overwrite files.
        showspeed=t         (ss) 'f' suppresses display of processing speed.
        ziplevel=2          (zl) Compression level; 1 (min) through 9 (max).
        fastawrap=80        Length of lines in fasta output.
        qout=auto           Output quality offset: 33 (Sanger), 64, or auto.
        
        Histogram output parameters:
        
        bhist=<file>        Base composition histogram by position.
        qhist=<file>        Quality histogram by position.
        aqhist=<file>       Histogram of average read quality.
        bqhist=<file>       Quality histogram designed for box plots.
        lhist=<file>        Read length histogram.
        gchist=<file>       Read GC content histogram.
        
        Histograms for sam files only (requires sam format 1.4 or higher):
        
        ehist=<file>        Errors-per-read histogram.
        qahist=<file>       Quality accuracy histogram of error rates versus quality 
                            score.
        indelhist=<file>    Indel length histogram.
        mhist=<file>        Histogram of match, sub, del, and ins rates by read location.
        idhist=<file>       Histogram of read count versus percent identity.
        
        Processing parameters:
        
        threads=auto        (t) Set number of threads to use; default is number of 
                            logical processors.
        k=27                Kmer length used for finding contaminants.  Contaminants 
                            shorter than k will not be found.  k must be at least 1.
        rcomp=t             Look for reverse-complements of kmers in addition to 
                            forward kmers.
        maskmiddle=t        (mm) Treat the middle base of a kmer as a wildcard, to 
                            increase sensitivity in the presence of errors.
        maxbadkmers=0       (mbk) Reads with more than this many contaminant kmers 
                            will be discarded.
        hammingdistance=0   (hdist) Maximum Hamming distance from ref kmers (subs only).  
                            Memory use is proportional to (3*K)^hdist.
        editdistance=0      (edist) Maximum edit distance from ref kmers (subs and indels).  
                            Memory use is proportional to (8*K)^edist.
        forbidn=f           (fn) Forbids matching of read kmers containing N.  By default, 
                            these will match a reference 'A' if hdist>0 or edist>0, to 
                            increase sensitivity.
        maxns=-1            If non-negative, reads with more Ns than this 
                            (before trimming) will be discarded.
        minskip=1           (mns) Force minimal skip interval when indexing reference 
                            kmers.  1 means use all, 2 means use every other kmer, etc.
        maxskip=99          (mxs) Restrict maximal skip interval when indexing 
                            reference kmers. Normally all are used for scaffolds<100kb, 
                            but with longer scaffolds, up to K-1 are skipped.
        removeifeitherbad=t (rieb) Paired reads get sent to 'outmatch' if either is 
                            match (or either is trimmed shorter than minlen).  
                            Set to false to require both.
        findbestmatch=f     (fbm) If multiple matches, associate read with sequence 
                            sharing most kmers.
        prealloc=f          Preallocate memory in table.  Allows faster table loading 
                            and more efficient memory usage, for a large reference.
        
        Trimming/Masking parameters:
        
        ktrim=f             Trim reads to remove bases matching reference kmers.
                            Values: 
                                     t (trim)
                                     f (don't trim), 
                                     r (trim right end), 
                                     l (trim left end), 
                                     n (convert to N instead of trimming).
                            Any non-whitespace character other than t, f, r, l, n: 
                            convert to that symbol rather than trimming, and process 
                            short kmers on both ends.
        useshortkmers=f     (usk) Look for shorter kmers at read tips (only for 
                            k-trimming).  Enabling this will disable maskmiddle.
        mink=6              Minimum length of short kmers.  Setting this automatically 
                            sets useshortkmers=t.
        qtrim=f             Trim read ends to remove bases with quality below minq.
                            Performed AFTER looking for kmers.
                            Values: 
                                    t (trim both ends), 
                                    f (neither end), 
                                    r (right end only), 
                                    l (left end only).
        trimq=6             Trim quality threshold.
        minlength=10        (ml) Reads shorter than this after trimming will be 
                            discarded.  Pairs will be discarded only if both are shorter.
        maxlength=          Reads longer than this after trimming will be discarded.  Pairs 
                            will be discarded only if both are longer.
        minavgquality=0     (maq) Reads with average quality (before trimming) below 
                            this will be discarded.
        otm=f               (outputtrimmedtomatch) Output reads trimmed to shorter than 
                            minlength to outm rather than discarding.
        tp=0                (trimpad) Trim this much extra around matching kmers.
        tbo=f               (trimbyoverlap) Trim adapters based on where paired 
                            reads overlap.
        minoverlap=24       Require this many bases of overlap for overlap detection.
        mininsert=25        Require insert size of at least this much for overlap 
                            detection (will automatically set minoverlap too).
        tpe=f               (trimpairsevenly) When kmer right-trimming, trim both reads 
                            to the minimum length of either.
        forcetrimleft=0     (ftl) If positive, trim bases to the left of this position 
                            (exclusive, 0-based).
        forcetrimright=0    (ftr) If positive, trim bases to the right of this position 
                            (exclusive, 0-based).
        restrictleft=0      If positive, only look for kmer matches in the 
                            leftmost X bases.
        restrictright=0     If positive, only look for kmer matches in the 
                            rightmost X bases.
        
        Java Parameters:
        
        -Xmx                This will be passed to Java to set memory usage, overriding 
                            the program's automatic memory detection. -Xmx20g will specify 
                            20 gigs of RAM, and -Xmx200m will specify 200 megs.  
                            The max is typically 85% of physical memory.
        
        There is a changelog at 
        /global/projectb/sandbox/gaag/bbtools/docs/changelog_bbduk.txt
        Please contact Brian Bushnell at [email protected] if you encounter any problems.
        "	
        }
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */

        Comment


        • Originally posted by susanklein View Post
          Hi,

          sorry, the Javva 6 version is not working for me either. If I check my version I get:

          $ java -version
          java version "1.6.0_27"
          OpenJDK Runtime Environment (IcedTea6 1.12.6) (6b27-1.12.6-1~deb7u1)
          OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
          all29c@aahl-02-mel:~/data/019/19.5$
          Your sys admins must hate making any changes

          For Java 6 .. last release is 45. Looks like you are on release 27.

          Comment


          • Originally posted by susanklein View Post
            Hi,

            sorry, the Javva 6 version is not working for me either. If I check my version I get:

            $ java -version
            java version "1.6.0_27"
            OpenJDK Runtime Environment (IcedTea6 1.12.6) (6b27-1.12.6-1~deb7u1)
            OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
            all29c@aahl-02-mel:~/data/019/19.5$

            The error is:
            Exception in thread "main" java.lang.UnsupportedClassVersionError: align2/BBMap : Unsupported major.minor version 51.0
            at java.lang.ClassLoader.defineClass1(Native Method)
            at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
            at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
            at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
            at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
            at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
            at java.security.AccessController.doPrivileged(Native Method)
            at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
            Could not find the main class: align2.BBMap. Program will exit.
            Susan,

            Sorry, that was my fault; I was able to replicate that error. The wrong binaries were somehow in the java6 package. I've fixed it for real this time with BBMap_33.89b_java6.tar.gz which is now uploaded.

            -Brian

            Comment


            • Originally posted by sdriscoll View Post
              Cool! I did bbduk.sh as well. I've been using those two the most the last couple of days. FYI I had to make sure there wasn't any mixture of tabs and spaces (fortunately Sublime Text 2 can make that swap for me). Also it's a lot easier to write those tedious usage things in a single echo instead of multiple.
              Thanks again! I'll have to try out Sublime Text 2.

              Comment


              • Hi,

                this is very wierd.. it must be a machine problem, but bbmap (and bowtie1/2) are not working for me for certain file sizes (~200k). The error is below. The file is there, looks normal. If I double or half its size, then the index builds. Very strange.

                Thanks for the help though!


                5$ ~/bin/bbmap/bbmap.sh ref=/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna
                java -Djava.library.path=/OSM/HOME-MEL/all29c/bin/bbmap/jni/ -ea -Xmx32955m -cp /OSM/HOME-MEL/all29c/bin/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna
                Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna]

                BBMap version 33.89
                Retaining first best site only for ambiguous mappings.
                No output file.
                Writing reference.
                Executing dna.FastaToChromArrays2 [/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gzip=true, chromc=false, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

                Set genScaffoldInfo=true
                Writing chunk 1
                Set genome to 1

                Loaded Reference: 0.012 seconds.
                Loading index for chunk 1-1, build 1
                No index available; generating from reference genome: /OSM/HOME-MEL/all29c/data/019/19.5/ref/index/1/chr1_index_k13_c13_b1.block
                Indexing threads started for block 0-1
                Indexing threads finished for block 0-1
                Generated Index: 2.162 seconds.
                No reads to process; quitting.
                Attached Files

                Comment


                • Although.. not sure now.. it seems to have made two files chr1_index_k13_c13_b1.block and chr1_index_k13_c13_b1.block2.gz
                  is this right? Maybe it has worked.. The 'no reads to process' threw me.

                  Thanks.

                  S

                  Originally posted by susanklein View Post
                  Hi,

                  this is very wierd.. it must be a machine problem, but bbmap (and bowtie1/2) are not working for me for certain file sizes (~200k). The error is below. The file is there, looks normal. If I double or half its size, then the index builds. Very strange.

                  Thanks for the help though!


                  5$ ~/bin/bbmap/bbmap.sh ref=/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna
                  java -Djava.library.path=/OSM/HOME-MEL/all29c/bin/bbmap/jni/ -ea -Xmx32955m -cp /OSM/HOME-MEL/all29c/bin/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna
                  Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna]

                  BBMap version 33.89
                  Retaining first best site only for ambiguous mappings.
                  No output file.
                  Writing reference.
                  Executing dna.FastaToChromArrays2 [/OSM/HOME-MEL/all29c/data/019/19.5/plasmids.fna, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gzip=true, chromc=false, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

                  Set genScaffoldInfo=true
                  Writing chunk 1
                  Set genome to 1

                  Loaded Reference: 0.012 seconds.
                  Loading index for chunk 1-1, build 1
                  No index available; generating from reference genome: /OSM/HOME-MEL/all29c/data/019/19.5/ref/index/1/chr1_index_k13_c13_b1.block
                  Indexing threads started for block 0-1
                  Indexing threads finished for block 0-1
                  Generated Index: 2.162 seconds.
                  No reads to process; quitting.

                  Comment


                  • Originally posted by susanklein View Post
                    Although.. not sure now.. it seems to have made two files chr1_index_k13_c13_b1.block and chr1_index_k13_c13_b1.block2.gz
                    is this right? Maybe it has worked.. The 'no reads to process' threw me.

                    Thanks.

                    S
                    It has worked. One could do the reference generation and mapping at the same time or pre-create the indexes (as you did). In the latter case the program outputs the "no reads to process" message.

                    Use path=/OSM/HOME-MEL/all29c/data/019/19.5 when you do the mapping.

                    Comment


                    • how do you control the quality filtering threshold in bbmap? i see the maq threshold setting in reformat.sh - is that same setting usable in bbmap? thanks.
                      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                      Salk Institute for Biological Studies, La Jolla, CA, USA */

                      Comment


                      • Reformat and BBDuk both support 'maq' (minAverageQuality), but it's not in BBMap - I didn't really think it would be useful. I will add it, though. You can also pipe input:

                        reformat.sh in=reads.fq out=stdout.fq maq=10 | bbmap.sh in=stdin.fq ...

                        ...though that's not really ideal.

                        Comment


                        • I must be misunderstanding something then. I aligned some PE 100 reads from 2011 that had some crappy qualities with bbmap and in the output bam it had flagged many reads with x200 (qc failed). Is bbmap not filtering by base qualities?
                          Last edited by sdriscoll; 11-14-2014, 10:17 AM. Reason: typo
                          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                          Salk Institute for Biological Studies, La Jolla, CA, USA */

                          Comment


                          • Oh... that's a heuristic that cannot be adjusted, though it CAN be disabled (with the "usequality=f" or "uq=f" flag). I'll add that to the help menu.

                            Not every kmer in a read is used to search the index. Most are discarded, and they are discarded based on several factors:
                            1) How close they are to other retained kmers (to get a good distribution across the read).
                            2) The probability that the kmer is correct, based on base qualities; by default, kmers with a 94%+ chance of containing an error are usually discarded, and kmers with a 99.99%+ chance of containing an error are always discarded - this includes any kmer that spans an 'N'.
                            3) Kmers that are more common in that specific genome are more likely to be discarded.

                            If a read ends up with fewer than "minhits" kmers (default 1) then it will be discarded as "low quality".

                            "uq=f" will turn off rule 2, so quality scores will not be considered. However, a read will still be discarded as low quality if there are so many Ns in it that there are not K consecutive defined bases anywhere.

                            Comment


                            • Ah thanks for the explanation! Very interesting. I'm not used to aligners making those kind of decisions but I think it's useful. I have another question for you.

                              Originally posted by sdriscoll View Post
                              OK one more question. What would the correct alignment option setting be to get alignments with 0 mismatches and still allow splices? If it's not possible I can just do downstream filtering.

                              Thanks.
                              Originally posted by Brian Bushnell View Post
                              There is no option for that, unfortunately.
                              I'm wondering how the setting of minid changes the reporting of intron spanning alignments in RNA-seq reads. At what point will minid cause bbmap to NOT report introns? For example I may want to be very restrictive about mismatches (minid=0.98) but I'd still want my RNA-seq reads to map across junctions. Is there a measurable identity penalty when reads map across introns?

                              Thanks.
                              /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                              Salk Institute for Biological Studies, La Jolla, CA, USA */

                              Comment


                              • First, let me describe "minid" and "idfilter", which are slightly different:

                                "minid" sets a threshold for the minimum allowable alignment score, which is based on a custom hard-coded affine-transform weight matrix, and only loosely correlated with the actual percent identity, depending on the types and lengths of mutation events. It directly affects the mapping process (and speed), as sites that can be proven to have a score below the minimum ratio are not further considered.

                                "idfilter" does not affect mapping at all, but rather, sites that have an true percent identity below "idfilter" are not printed. You can use both or either; the difference is subtle, but important in some cases.

                                Now, as for scoring - the affine score matrix is deeply embedded in BBMap. A 100bp read that maps perfectly with only matches has a 100% identity and BBMap score of 9970 (100 points per match, but only 70 points for the first match). If a 100bp read can map somewhere with 1 mismatch (99% identity, BBMap score of 9713: 9970+1*(-127-100-100+70)), that site will always be chosen over a site with 50bp match, 20k intron, 50bp match (4.7% identity, BBMap score of roughly 8841). But if the 100bp read can map somewhere with 5 mismatches (95% identity, BBMap score of 8685: 9970 +5*(-127-100-100+70)), the spliced alignment will be chosen, because 8841>8685. However they are very close so it would get a low mapping score.

                                These site-selection priorities will be made no matter what you set for idfilter or minid; they could only be changed by a new weight matrix, which is nontrivial because it affects a lot of the proofs scattered through the code. A couple mismatches will always be preferred over a really long splice.

                                So, you can post-filter based on the number of mismatches, but there's currently no way to adjust what BBMap thinks is the best alignment beyond using the "maxindel" flag. However, I could make a new weight matrix that penalizes substitutions more and deletions less, as long as a single deletion strictly has a greater score penalty than a single substitution. Would that be helpful?

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM
                                • seqadmin
                                  Strategies for Sequencing Challenging Samples
                                  by seqadmin


                                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                  03-22-2024, 06:39 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                24 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                25 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 09:21 AM
                                0 responses
                                21 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-04-2024, 09:00 AM
                                0 responses
                                52 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X