Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #16
    Woody,

    It sounds like Java is not installed on your cluster. Try running this command:

    java -version

    It should print some kind of message, like this:

    java version "1.6.0_07"
    Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
    Java HotSpot(TM) Client VM (build 10.0-b23, mixed mode)


    If it just says "command not found" then java is not installed. You should contact the system administrator, or download it from here:

    Subscribe to Java SE and get the most comprehensive Java support available, with 24/7 global access to the experts.


    ...and put 'java' in your path. Or explicitly change the path of 'java' in my shellscript to wherever you install it.

    Comment

    • woodydon
      Member
      • Jan 2010
      • 52

      #17
      Hi Brian,

      Thank you for the quick reply. It indeed was the problem of not finding java. I have consulted with the administrator about this since I used java before without a problem.

      Sincerely,
      Woody

      Comment

      • woodydon
        Member
        • Jan 2010
        • 52

        #18
        Originally posted by Brian Bushnell View Post
        Woody,

        Did you try error-correcting and using the 'vloose' setting? Both of those can substantially improve the merge rate.

        If you want to see what the actual, unbiased insert size distribution of your data, then you can generate a histogram by mapping:

        bbmap.sh in1=read1.fq in2=read2.fq ref=reference.fa ihist=ihist.txt nodisk

        Then plot a graph of 'ihist.txt', so you can see how many reads should be expected to assemble.
        I can't find error-correcting option in bbmerge.sh. Also, in bbmap.sh, "out=" seems outputs reads instead of sam files. Could you please give me a hint? Thanks!
        Last edited by woodydon; 06-22-2014, 11:33 PM.

        Comment

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #19
          Originally posted by woodydon View Post
          I can't find error-correcting option in bbmerge.sh. Also, in bbmap.sh, "out=" seems outputs reads instead of sam files. Could you please give me a hint? Thanks!
          With bbmap.sh out=filename.sam should produce a sam file (http://seqanswers.com/forums/showpos...69&postcount=1).

          Comment

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #20
            Originally posted by woodydon View Post
            I can't find error-correcting option in bbmerge.sh. Also, in bbmap.sh, "out=" seems outputs reads instead of sam files. Could you please give me a hint? Thanks!
            Woody,

            bbmerge does not do error-correction; for that, you need to use the script "ecc.sh", e.g.

            ecc.sh in=reads.fq out=corrected.fq

            And as Genomax implied, all of my programs are sensitive to file name extensions. If you say "out=reads.fasta" you'll get fasta; if you say "out=reads.sam.gz", you'll get gzipped sam, and so forth.

            Comment

            • woodydon
              Member
              • Jan 2010
              • 52

              #21
              Originally posted by Brian Bushnell View Post
              Woody,

              bbmerge does not do error-correction; for that, you need to use the script "ecc.sh", e.g.

              ecc.sh in=reads.fq out=corrected.fq

              And as Genomax implied, all of my programs are sensitive to file name extensions. If you say "out=reads.fasta" you'll get fasta; if you say "out=reads.sam.gz", you'll get gzipped sam, and so forth.
              I see~ thanks for GenoMax's and your explanations!

              Woody

              Comment

              • woodydon
                Member
                • Jan 2010
                • 52

                #22
                Originally posted by Brian Bushnell View Post
                Woody,

                bbmerge does not do error-correction; for that, you need to use the script "ecc.sh", e.g.

                ecc.sh in=reads.fq out=corrected.fq

                And as Genomax implied, all of my programs are sensitive to file name extensions. If you say "out=reads.fasta" you'll get fasta; if you say "out=reads.sam.gz", you'll get gzipped sam, and so forth.
                Hi Brain,

                I used ecc.sh to correct my raw .fq files and then use bbmerge.sh again. I found the assemble rate is much lower:

                Code:
                Reads:          136095860
                Joined:         13084199        9.614%
                Ambiguous:      31319           0.023%
                No Solution:    122968112       90.354%
                Too Short:      12230           0.009%
                Avg Insert:                     120.9
                Standard Deviation:             12.9
                
                Insert range:           26 - 138
                90th percentile:        134
                50th percentile:        123
                10th percentile:        107
                Without ecc.sh, the joined reads were about 30~40%. Now, it is only 9%...

                Woody

                Comment

                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  #23
                  Woody,

                  That's very strange. The only explanation I can come up with is that the order of the reads is getting changed so read pairs are no longer always together.

                  If you ran ecc.sh twice on different files, that would make sense; unless you add the "ordered" flag, the reads will not necessarily come out in the same order. But it's best to correct all the reads together so that there is more coverage. So, assuming your reads are in two files (rather than interleaved), you should run it like this:

                  ecc.sh in1=read1.fq in2=read2.fq out1=corrected1.fq out2=corrected2.fq

                  And if your reads are interleaved in 1 file, you can add the "int=t" flag to ensure the reads are treated as interleaved. This is autodetected but the autodetection will fail if the read names are different than expected.

                  ecc.sh in=reads.fq out=corrected.fq int=t

                  If you do not think that was the problem, then please post the command line and standard out / standard error output of ecc. Thanks!

                  Comment

                  • woodydon
                    Member
                    • Jan 2010
                    • 52

                    #24
                    My command was:

                    Code:
                    ~/bbmap/ecc.sh threads=8 -Xmx14g in=~/reads/n_R1
                    .fastq out=~/reads/n_eccR1.fastq
                    Ecc.sh's log:

                    Code:
                    Settings:
                    threads:                8
                    k:                      31
                    deterministic:          false
                    toss error reads:       false
                    passes:                 1
                    bits per cell:          16
                    cells:                  3147.19M
                    hashes:                 3
                    prefilter bits:         2
                    prefilter cells:        13.56B
                    prefilter hashes:       2
                    base min quality:       5
                    kmer min prob:          0.5
                    
                    target depth:           40
                    min depth:              6
                    max depth:              40
                    min good kmers:         15
                    depth percentile:       54.0
                    ignore dupe kmers:      true
                    fix spikes:             false
                    
                    Changed from ASCII-33 to ASCII-64 on input quality 94 while prescanning.
                    Made prefilter:         hashes = 2       mem = 3.15 GB          cells = 13.54B 
                           used = 3.739%
                    Made hash table:        hashes = 3       mem = 5.86 GB          cells = 3144.33M   
                         used = 6.197%
                    
                    Estimated kmers of depth 1-3:   203133322
                    Estimated kmers of depth 4+ :   54945575
                    Estimated unique kmers:         258078898
                    
                    Table creation time:            1887.618 seconds.
                    Started output threads.
                    Table read time:                1510.451 seconds.       6757.71 kb/sec
                    Total reads in:                 136095860       100.000% Kept
                    Total bases in:                 10207189500     100.000% Kept
                    Error reads in:                 12059846        8.861%
                    Error type 1:                   3939140         2.894%
                    Error type 2:                   6350473         4.666%
                    Error type 3:                   9031666         6.636%
                    
                    During Error Correction:
                    Errors Suspected:               31378616
                    Errors Corrected:               15385594
                    Errors Marked:                  0
                    
                    Total kmers counted:            6085312047
                    Total unique kmer count:        408064601
                    
                    Includes forward kmers only.
                    The unique kmer estimate can be more accurate than the unique count, if the tables are
                     very full.
                    The most accurate value is the greater of the two.
                    
                    Percent unique:                  6.71%
                    Depth average:                  14.91   (unique kmers)
                    Depth median:                   1       (unique kmers)
                    Depth standard deviation:       535.10  (unique kmers)
                    
                    Depth average:                  6532.20 (all kmers)
                    Depth median:                   2141    (all kmers)
                    Depth standard deviation:       26317.70        (all kmers)
                    
                    Approx. read depth median:      3568.33
                    
                    Total time:                     3399.627 seconds.       3002.44 kb/sec
                    Hope it helps.

                    Woody

                    Comment

                    • Brian Bushnell
                      Super Moderator
                      • Jan 2014
                      • 2709

                      #25
                      Woody,

                      Yep, the problem was error-correcting the two files independently, which will reorder the reads on a multithreaded machine. So, this command will solve it:

                      ecc.sh threads=8 -Xmx14g in=~/reads/n_R1.fastq in2=~/reads/n_R2.fastq out=~/reads/n_eccR1.fastq out2=~/reads/n_eccR2.fastq

                      Comment

                      • salamay
                        Member
                        • May 2014
                        • 20

                        #26
                        Hi Brian,

                        Could you try benchmarking against SeqPreP (https://github.com/jstjohn/SeqPrep)?

                        Comment

                        • Brian Bushnell
                          Super Moderator
                          • Jan 2014
                          • 2709

                          #27
                          salamay,

                          I will certainly add it to my list of things to do

                          Comment

                          • Elsie
                            Member
                            • Mar 2011
                            • 85

                            #28
                            This is a very nice, very useful thread, thank you, and thank you Brian for the details on BBmerge. I have just started using PEAR so it was good to read these comments.

                            Comment

                            • zhangjiajie
                              Junior Member
                              • Jul 2012
                              • 1

                              #29
                              Hi,
                              I am one of the authors of PEAR. I must say this is a very unfair benchmark for PEAR. If you really want to show your program is better, please run PEAR with the default parameters like this:
                              pear -f r1.fq -r r2.fq -o outpear

                              The statistical test in an integrated part of PEAR, it makes no sense to switch it off. Also, you should not limit the minimal merge length in PEAR.

                              Comment

                              • Brian Bushnell
                                Super Moderator
                                • Jan 2014
                                • 2709

                                #30
                                I tried to make the comparison as fair as possible, but I will retest this with your suggested parameters and post the results sometime later this week.

                                Comment

                                Latest Articles

                                Collapse

                                • SEQadmin2
                                  From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                  by SEQadmin2


                                  Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                  The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                  ...
                                  06-02-2026, 10:05 AM
                                • SEQadmin2
                                  Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                  by SEQadmin2


                                  With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                  Introduction

                                  Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                  05-22-2026, 06:42 AM
                                • SEQadmin2
                                  Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                                  by SEQadmin2

                                  Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                                  Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                                  05-06-2026, 09:04 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by SEQadmin2, Yesterday, 08:59 AM
                                0 responses
                                14 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-02-2026, 12:03 PM
                                0 responses
                                22 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-02-2026, 11:40 AM
                                0 responses
                                19 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 05-28-2026, 11:40 AM
                                0 responses
                                32 views
                                0 reactions
                                Last Post SEQadmin2  
                                Working...