  • #16
    There are several parameters you can use to affect the table fullness, such as using the prefilter or reducing the number of bits per cell, or manually specifying the -Xmx parameter. Can you post the command line you used, and the standard error log? And, if possible, attach a kmer frequency histogram (from the full dataset). Even when the table gets really full, the accuracy is generally quite high.

    Also, quality-trimming and adapter-trimming the data before analyzing it can have a huge impact, because the vast majority of the kmers that get counted are error kmers rather than genomic kmers; so anything that reduces the number of errors and amount of non-genomic sequence will be beneficial.
    Last edited by Brian Bushnell; 02-06-2015, 01:24 PM.



    • #17
      The command was:
      ./bbnorm.sh in=all_spurge_genomic_R1.fastq out=single_copy_R1.fastq target=80 min=5 passes=1
      Below is a link to a screenshot and the histogram of the combined R1 and R2 reads (I set the first two kmers to 0 since the numbers for these were very large). When I try this with both the R1 and R2 files, I get a hash table that is 100% full.

      http://de.iplantcollaborative.org/dl...2/Untitled.jpg



      • #18
        OK, it looks like there are some good possibilities here:

        First, it does not look like BBNorm is using all available memory, for whatever reason, so that can be configured manually. Try this:

        ./bbnorm.sh -Xmx11g in=all_spurge_genomic_R1.fastq out=single_copy_R1.fastq target=80 min=5 passes=1 prefilter=t cbits=16 minprob=0.8

        -Xmx11g will force it to use 11GB of RAM; prefilter will put low-count kmers in a more efficient table; and cbits=16 will double the number of cells in the hashtable by capping the counts at 65k (rather than 2 billion).
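The arithmetic behind the cbits suggestion can be sketched roughly as follows. This assumes the default cell width is 32 bits (inferred from the "2 billion" cap, which matches the largest signed 32-bit value), and it treats the whole 11 GB as table space purely for illustration; the function names are made up for this example.

```shell
# Largest count a cell of N bits can hold (unsigned).
max_count() { echo $(( (1 << $1) - 1 )); }
# Number of cells of N bits that fit in a given number of bytes.
cells_in() { echo $(( $1 * 8 / $2 )); }

# Illustrative budget only: the thread does not say how much of the
# 11 GB heap BBNorm actually assigns to the count table.
budget=$(( 11 * 1024 * 1024 * 1024 ))

echo "16-bit cells: max count $(max_count 16), $(cells_in "$budget" 16) cells"
echo "32-bit cells: signed max $(max_count 31), $(cells_in "$budget" 32) cells"
```

Whatever the real budget is, halving the bits per cell doubles the number of cells that fit in it, which is why the table ends up less full.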

        Secondly, BBNorm will keep read 1 and read 2 together if you use them both at the same time (with in1= and in2=), which is generally a good idea. Thirdly, you can further reduce the error rate (and thus kmer count), as I mentioned, by quality-trimming and adapter-trimming prior to this process. Also (or alternatively), you can ignore kmers that have a high chance of being incorrect, by adjusting the "minprob" flag upward from its default 0.5 to, say, 0.8 (meaning, only count kmers with at least an 80% chance of being error-free according to the quality values).
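To get a feel for what the minprob threshold means, here is a rough sketch that takes the chance of a kmer being error-free as the product of the per-base probabilities 1 - 10^(-Q/10). That formula is the usual reading of Phred scores and is an assumption about how the probability is derived, not something stated in this thread; the uniform quality and k=31 are also just illustrative.

```shell
# Probability that a kmer is error-free, assuming every base in it has
# the same Phred quality Q: (1 - 10^(-Q/10))^k.
# kmer_prob is a hypothetical helper, not part of BBTools.
kmer_prob() {   # $1 = Phred quality, $2 = kmer length
    awk -v q="$1" -v k="$2" 'BEGIN { printf "%.4f\n", (1 - 10^(-q/10))^k }'
}

kmer_prob 30 31   # all bases at Q30
kmer_prob 20 31   # all bases at Q20
```

Under these assumptions a kmer made entirely of Q20 bases would be counted at minprob=0.5 but ignored at minprob=0.8, while an all-Q30 kmer passes either threshold.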

        Normalization and other kmer-counting-based processes use a lot of memory when working with large genomes, and 12GB is not very much in the world of high-throughput sequencing.
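Putting the suggestions above together with paired input, the command might look something like the sketch below. The R2 input name and both output names for the second read are assumed (the thread only gives the R1 names), and out2= is used on the understanding that BBNorm writes the second read of each pair there. It is written as a dry run so nothing executes until you are happy with it.

```shell
# Dry run: prints the command rather than executing it.
# all_spurge_genomic_R2.fastq and single_copy_R2.fastq are assumed names.
run="echo"      # set run="" to execute for real
$run ./bbnorm.sh -Xmx11g \
    in1=all_spurge_genomic_R1.fastq in2=all_spurge_genomic_R2.fastq \
    out=single_copy_R1.fastq out2=single_copy_R2.fastq \
    target=80 min=5 passes=1 prefilter=t cbits=16 minprob=0.8
```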
        Last edited by Brian Bushnell; 02-06-2015, 04:01 PM.



        • #19
          Awesome! Thanks again- look me up at PAG next year and I'll buy the beer :-). Incidentally, I already quality trimmed and adapter trimmed the dataset (and checked the qc). It's the first thing I do with any new set of reads before attempting any other sort of analysis. However, I am sure some primers might make it through the trimming programs, so another pass sounds like a good idea so long as it does not add a lot of time. I'll have to wait until Monday to give it a go- unless I can get access to a hotter virtual machine from iPlant. Hopefully you won't hear from me again (until I ask you how you want to be cited :-).



          • #20
            Sadly, I am still having an issue (I think). The program has been running for three days now (and does not appear to have quit) and shows the following error messages:
            Exception in thread "Thread-45" java.lang.RuntimeException: java.io.IOException: Protocol error at stream.ReadStreamByteWriter.run (ReadStreamByteWriter.java:31)
            caused by: ... a series of java.io.IOExceptions, FileOutputStream, ReadStreamByteWriter, and BufferedOutputStream.write

            Should I kill it or let it ruminate?



            • #21
              What size is your input file? Is the bbnorm process actively running (check using "top")?



              • #22
                I just ran top to see if it was still thinking, and java flashes between 0% and 97% memory use, so I am guessing it is still crunching output. The file size is about 68 GB, but I only have 12 GB of RAM on my machine. The developer gave me a few tricks to allow it to run with minimal memory, so I will just be patient.



                • #23
                  Hopefully you have an adequate amount of swap space configured. With 12G of RAM, swap must be in use. If it has become full (check whether any partitions are at 100%, though swap may be configured differently on your server), that may not be good.
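One way to check this on a Linux machine is sketched below. It assumes /proc/meminfo is available and reports SwapTotal and SwapFree in kB; swap_pct_used is a helper name made up for this example.

```shell
# Percent of swap in use, computed from /proc/meminfo-style input on stdin.
swap_pct_used() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END {
             if (total > 0) printf "%d\n", (total - free) * 100 / total
             else print "no swap"
         }'
}

# Report swap usage if the file is readable (Linux only).
if [ -r /proc/meminfo ]; then
    swap_pct_used < /proc/meminfo
fi

# Also check whether the partition holding the output is at 100%.
df -h .
```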
                  Last edited by GenoMax; 02-10-2015, 07:33 AM.



                  • #24
                    Not sure if I have what I need or not. Looking at the top output, I have 12G in KiB Mem (11.9G used, 34M free, 7M buffers) and, in KiB Swap, 52M total, 28M used, 23M free, and 21M cached.



                    • #25
                      That is a relatively small amount of swap for a machine with 12G of RAM. Let us see what Brian makes of that error message.



                      • #26
                        BBNorm is pretty fast; I've never seen an input that made it take more than a few hours. If it throws an exception, that's fatal, so kill it.

                        The specific exception, though, is rather strange. It seems like for some reason it was unable to write a file - like, perhaps, the disk was full, or the path was invalid, or you did not have write permission, or something. Perhaps you could post the entire error message? And also, the output you get when you run "java -version", and the exact command line...
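The three possible causes mentioned above (invalid path, no write permission, full disk) can be checked quickly before rerunning. This is only a sketch: check_output_path is a hypothetical helper, and it relies on the POSIX output format of df -P, where the fourth column is the available space.

```shell
# Check an intended output file for a missing directory, missing write
# permission, or a full disk. Returns non-zero on the first problem found.
check_output_path() {   # $1 = intended output file
    dir=$(dirname "$1")
    [ -d "$dir" ] || { echo "missing directory: $dir"; return 1; }
    [ -w "$dir" ] || { echo "no write permission: $dir"; return 1; }
    # Available space in KB on the filesystem holding the directory.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -gt 0 ] || { echo "disk full: $dir"; return 1; }
    echo "ok: ${avail_kb} KB free in $dir"
}

check_output_path single_copy_R1.fastq || echo "fix the path before rerunning"
```

Keep in mind that the normalized output of a 68 GB input can itself be large, so a non-zero amount of free space is necessary but may not be sufficient.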
