  • #16
    Woody,

    It sounds like Java is not installed on your cluster. Try running this command:

    java -version

    It should print some kind of message, like this:

    java version "1.6.0_07"
    Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
    Java HotSpot(TM) Client VM (build 10.0-b23, mixed mode)


    If it just says "command not found" then java is not installed. You should contact the system administrator, or download it from here:

    [Oracle Java SE download page]


    ...and put 'java' in your path, or explicitly change the path of 'java' in my shell script to wherever you installed it.
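
    For example, if you unpack a JDK into ~/jdk (a hypothetical location, just for illustration), something like this will put it in your path:

    # assuming Java was unpacked to ~/jdk (hypothetical path)
    export PATH="$HOME/jdk/bin:$PATH"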



    • #17
      Hi Brian,

      Thank you for the quick reply. It was indeed a problem of Java not being found. I have consulted the administrator about this, since I had used Java before without a problem.

      Sincerely,
      Woody



      • #18
        Originally posted by Brian Bushnell View Post
        Woody,

        Did you try error-correcting and using the 'vloose' setting? Both of those can substantially improve the merge rate.

        If you want to see the actual, unbiased insert size distribution of your data, you can generate a histogram by mapping:

        bbmap.sh in1=read1.fq in2=read2.fq ref=reference.fa ihist=ihist.txt nodisk

        Then plot a graph of 'ihist.txt', so you can see how many reads should be expected to assemble.
        I can't find an error-correction option in bbmerge.sh. Also, in bbmap.sh, "out=" seems to output reads instead of SAM files. Could you please give me a hint? Thanks!
        Last edited by woodydon; 06-22-2014, 11:33 PM.



        • #19
          Originally posted by woodydon View Post
          I can't find an error-correction option in bbmerge.sh. Also, in bbmap.sh, "out=" seems to output reads instead of SAM files. Could you please give me a hint? Thanks!
          With bbmap.sh, out=filename.sam should produce a SAM file (http://seqanswers.com/forums/showpos...69&postcount=1).
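
          For example, reusing the file names from Brian's earlier mapping command, something like this should write SAM:

          bbmap.sh in1=read1.fq in2=read2.fq ref=reference.fa out=mapped.sam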



          • #20
            Originally posted by woodydon View Post
            I can't find an error-correction option in bbmerge.sh. Also, in bbmap.sh, "out=" seems to output reads instead of SAM files. Could you please give me a hint? Thanks!
            Woody,

            bbmerge does not do error-correction; for that, you need to use the script "ecc.sh", e.g.

            ecc.sh in=reads.fq out=corrected.fq

            And as Genomax implied, all of my programs are sensitive to file name extensions. If you say "out=reads.fasta" you'll get fasta; if you say "out=reads.sam.gz", you'll get gzipped sam, and so forth.
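
            For example (the file names here are just illustrations), the same command can produce different output formats purely by extension:

            bbmap.sh in=reads.fq ref=ref.fa out=mapped.sam     # plain SAM
            bbmap.sh in=reads.fq ref=ref.fa out=mapped.sam.gz  # gzipped SAM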



            • #21
              Originally posted by Brian Bushnell View Post
              Woody,

              bbmerge does not do error-correction; for that, you need to use the script "ecc.sh", e.g.

              ecc.sh in=reads.fq out=corrected.fq

              And as Genomax implied, all of my programs are sensitive to file name extensions. If you say "out=reads.fasta" you'll get fasta; if you say "out=reads.sam.gz", you'll get gzipped sam, and so forth.
              I see~ thanks for GenoMax's and your explanations!

              Woody



              • #22
                Originally posted by Brian Bushnell View Post
                Woody,

                bbmerge does not do error-correction; for that, you need to use the script "ecc.sh", e.g.

                ecc.sh in=reads.fq out=corrected.fq

                And as Genomax implied, all of my programs are sensitive to file name extensions. If you say "out=reads.fasta" you'll get fasta; if you say "out=reads.sam.gz", you'll get gzipped sam, and so forth.
                Hi Brian,

                I used ecc.sh to correct my raw .fq files and then ran bbmerge.sh again. I found the merge rate is much lower:

                Code:
                Reads:          136095860
                Joined:         13084199        9.614%
                Ambiguous:      31319           0.023%
                No Solution:    122968112       90.354%
                Too Short:      12230           0.009%
                Avg Insert:                     120.9
                Standard Deviation:             12.9
                
                Insert range:           26 - 138
                90th percentile:        134
                50th percentile:        123
                10th percentile:        107
                Without ecc.sh, the joined reads were about 30-40%. Now it is only 9%...

                Woody



                • #23
                  Woody,

                  That's very strange. The only explanation I can come up with is that the order of the reads is getting changed, so read pairs are no longer always together.

                  If you ran ecc.sh twice on different files, that would make sense; unless you add the "ordered" flag, the reads will not necessarily come out in the same order. But it's best to correct all the reads together so that there is more coverage. So, assuming your reads are in two files (rather than interleaved), you should run it like this:

                  ecc.sh in1=read1.fq in2=read2.fq out1=corrected1.fq out2=corrected2.fq

                  And if your reads are interleaved in one file, you can add the "int=t" flag to ensure they are treated as interleaved. This is autodetected, but the autodetection will fail if the read names are different from what is expected.

                  ecc.sh in=reads.fq out=corrected.fq int=t

                  If you do not think that was the problem, then please post the command line and standard out / standard error output of ecc. Thanks!
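
                  If it helps, plain shell redirection will capture both streams to files (the log file names here are just examples):

                  ecc.sh in1=read1.fq in2=read2.fq out1=corrected1.fq out2=corrected2.fq 1>ecc.out 2>ecc.err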



                  • #24
                    My command was:

                    Code:
                    ~/bbmap/ecc.sh threads=8 -Xmx14g in=~/reads/n_R1.fastq out=~/reads/n_eccR1.fastq
                    Ecc.sh's log:

                    Code:
                    Settings:
                    threads:                8
                    k:                      31
                    deterministic:          false
                    toss error reads:       false
                    passes:                 1
                    bits per cell:          16
                    cells:                  3147.19M
                    hashes:                 3
                    prefilter bits:         2
                    prefilter cells:        13.56B
                    prefilter hashes:       2
                    base min quality:       5
                    kmer min prob:          0.5
                    
                    target depth:           40
                    min depth:              6
                    max depth:              40
                    min good kmers:         15
                    depth percentile:       54.0
                    ignore dupe kmers:      true
                    fix spikes:             false
                    
                    Changed from ASCII-33 to ASCII-64 on input quality 94 while prescanning.
                    Made prefilter:         hashes = 2       mem = 3.15 GB          cells = 13.54B          used = 3.739%
                    Made hash table:        hashes = 3       mem = 5.86 GB          cells = 3144.33M        used = 6.197%
                    
                    Estimated kmers of depth 1-3:   203133322
                    Estimated kmers of depth 4+ :   54945575
                    Estimated unique kmers:         258078898
                    
                    Table creation time:            1887.618 seconds.
                    Started output threads.
                    Table read time:                1510.451 seconds.       6757.71 kb/sec
                    Total reads in:                 136095860       100.000% Kept
                    Total bases in:                 10207189500     100.000% Kept
                    Error reads in:                 12059846        8.861%
                    Error type 1:                   3939140         2.894%
                    Error type 2:                   6350473         4.666%
                    Error type 3:                   9031666         6.636%
                    
                    During Error Correction:
                    Errors Suspected:               31378616
                    Errors Corrected:               15385594
                    Errors Marked:                  0
                    
                    Total kmers counted:            6085312047
                    Total unique kmer count:        408064601
                    
                    Includes forward kmers only.
                    The unique kmer estimate can be more accurate than the unique count, if the tables are very full.
                    The most accurate value is the greater of the two.
                    
                    Percent unique:                  6.71%
                    Depth average:                  14.91   (unique kmers)
                    Depth median:                   1       (unique kmers)
                    Depth standard deviation:       535.10  (unique kmers)
                    
                    Depth average:                  6532.20 (all kmers)
                    Depth median:                   2141    (all kmers)
                    Depth standard deviation:       26317.70        (all kmers)
                    
                    Approx. read depth median:      3568.33
                    
                    Total time:                     3399.627 seconds.       3002.44 kb/sec
                    Hope it helps.

                    Woody



                    • #25
                      Woody,

                      Yep, the problem was error-correcting the two files independently, which will reorder the reads on a multithreaded machine. So, this command will solve it:

                      ecc.sh threads=8 -Xmx14g in=~/reads/n_R1.fastq in2=~/reads/n_R2.fastq out=~/reads/n_eccR1.fastq out2=~/reads/n_eccR2.fastq
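
                      As a quick sanity check afterwards, reformat.sh can verify that the corrected files are still properly paired (assuming your copy of BBTools includes the verifypairing flag):

                      # assumes reformat.sh with the verifypairing flag is available
                      reformat.sh in1=~/reads/n_eccR1.fastq in2=~/reads/n_eccR2.fastq verifypairing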



                      • #26
                        Hi Brian,

                        Could you try benchmarking against SeqPrep (https://github.com/jstjohn/SeqPrep)?



                        • #27
                          salamay,

                          I will certainly add it to my list of things to do.



                          • #28
                            This is a very nice, very useful thread; thank you, and thank you Brian for the details on BBMerge. I have just started using PEAR, so it was good to read these comments.



                            • #29
                              Hi,
                              I am one of the authors of PEAR. I must say this is a very unfair benchmark for PEAR. If you really want to show that your program is better, please run PEAR with the default parameters, like this:
                              pear -f r1.fq -r r2.fq -o outpear

                              The statistical test is an integral part of PEAR; it makes no sense to switch it off. Also, you should not limit the minimal merge length in PEAR.



                              • #30
                                I tried to make the comparison as fair as possible, but I will retest this with your suggested parameters and post the results sometime later this week.
