  • Is a server with 512 GB RAM enough for denovo genome assembly?

    Hi,
    We are in the process of procuring a server with the following configuration:
    512 GB RAM
    10 TB internal storage
    40 cores

    I wanted to get a sense of which genomes (particularly mammalian) we will be able to assemble on a system with this configuration, given 50X - 100X read depth on the Illumina platform. We will be using assembly algorithms such as Velvet and SOAPdenovo.

    Thanks much!

  • #2
    According to the paper Li, R., H. Zhu, et al. (2010). "De novo assembly of human genomes with massively parallel short read sequencing." Genome Res 20(2): 265-272, assembling 50X of human genome sequencing data with SOAPdenovo has a peak memory usage of about 140 GB.


    • #3
      It probably depends on the size of the target genome, the assembly algorithm, and the size, type, and quality of the input data. The following places might be of help:

      Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc



      HTH


      • #4
        512 GB ought to be enough for anybody, except maybe people who work with gigantic plant and protozoan genomes. Anyway, thanks to longer reads, assembly-related RAM requirements should decrease in the future.
        Last edited by rhinoceros; 01-17-2014, 04:39 AM.
        savetherhino.org


        • #5
          Diginorm may also help with memory requirements, though I haven't run benchmarks recently.


          • #6
            Thank you all for the responses!


            • #7
              Dear Apexy,
              Thanks a lot for sharing all the very useful links!


              • #8
                If it's a vertebrate-sized genome (i.e. 2-3 Gb), you'll probably have issues with AllPaths unless you do some diginorm.

                SOAPdenovo and ABySS should be fine, though the error corrector for SOAPdenovo likes a lot of memory.

                If you run out of RAM and money is an issue, I'd suggest finding a cluster with 1 TB nodes to run those parts of the pipeline that need more than 512 GB rather than buying the whole 1 TB yourself. That much RAM is crazy expensive to only need for a handful of jobs in a machine's lifetime.


                • #9
                  I have used ALLPATHS-LG to assemble a ~1.6 Gb vertebrate genome. It used up all of our 760 GB of RAM and took about a week.

                  I don't think you'd want to run Allpaths-LG on a cluster because it is only able to use 1 CPU.


                  • #10
                    SGA and Ray have much lower memory requirements than other de novo assembly programs, but with 512 GB you can use almost anything.


                    • #11
                      Processors vs RAM using Velvet

                      I have another question. We have a server that has 20 motherboards and 4 processors per motherboard (80 processors in total). Each motherboard has 8 GB of RAM, for a total of 160 GB. RAM can be bumped up to 32 GB per motherboard if needed.

                      My question is this: is it possible to assign more processors to run a large assembly (genome size = 2.8 Gb) using Velvet, or does one simply have to have ginormous amounts of RAM with 8-10 processors?

                      The reason I ask is that the IT person running our server raised this question, as the programs he typically runs (large data, but not genome assembly) run just fine with all processors and just 8 GB of RAM for each processor. He said threading via parallel processing should be able to do this with Velvet, if indeed Velvet runs on multiple threads.
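                      To make the arithmetic behind this question concrete, here is a small sketch. It assumes (this is not stated in the post) that Velvet's threading is shared-memory only, via OpenMP, so a single run can address only the RAM on one motherboard, whereas an MPI-based assembler can pool RAM across motherboards. The `Cluster` class and `usable_ram_gb` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical description of the machine from the post."""
    nodes: int               # motherboards
    cpus_per_node: int       # processors per motherboard
    ram_gb_per_node: float   # RAM per motherboard, in GB

    @property
    def total_cpus(self) -> int:
        return self.nodes * self.cpus_per_node

    @property
    def total_ram_gb(self) -> float:
        return self.nodes * self.ram_gb_per_node


def usable_ram_gb(cluster: Cluster, distributed: bool) -> float:
    """RAM one assembly job can address.

    distributed=True  -> MPI-style job that spreads memory over all nodes.
    distributed=False -> shared-memory job confined to a single node
                         (assumed here to be the case for Velvet).
    """
    return cluster.total_ram_gb if distributed else cluster.ram_gb_per_node


current = Cluster(nodes=20, cpus_per_node=4, ram_gb_per_node=8)
upgraded = Cluster(nodes=20, cpus_per_node=4, ram_gb_per_node=32)

for name, c in (("current", current), ("upgraded", upgraded)):
    print(name,
          "cpus:", c.total_cpus,
          "total GB:", c.total_ram_gb,
          "shared-memory job GB:", usable_ram_gb(c, distributed=False),
          "MPI job GB:", usable_ram_gb(c, distributed=True))
```

                      Under that assumption, adding processors does not help a shared-memory assembler once the job no longer fits in a single motherboard's RAM; only a distributed assembler benefits from the 160 GB or 640 GB totals.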


                      • #12
                        I don't think Velvet is a good choice for vertebrate genomes. If you're looking for something that can use MPI (run across different motherboards) with relatively low RAM, ABySS is a good choice, though there are others.

                        That total of 160 GB of RAM might be cutting it pretty close, but for ABySS, just doubling it would probably be plenty. You might still need one node with more than 32 GB for a few things though.


                        • #13
                          I haven't tried other assemblers yet, being quite UNIX challenged. It took me quite a while to get things set up for Velvet, so I would like to complete that first before moving on to another assembler. ABySS and SOAPdenovo are on my list though!!

                          I can bump up the RAM for the departmental server to a maximum of 640 GB if needed. The Velvet memory calculator says that my assembly should take ~350 GB of RAM in a perfect world. I know that previous runs (attempting to find the best k-mer size and several other factors) have bumped up to 600 GB of RAM.
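                          For a rough sanity check on figures like these, here is a minimal sketch of the empirical velvetg memory estimate that circulated on the velvet-users mailing list (commonly attributed to Simon Gladman). The coefficients are reproduced as I recall them from that post and should be treated as an assumption, as should the example inputs (100 bp reads, 500 million reads, k = 31), none of which come from this thread. It is a linear fit to observed runs, not a guarantee, and real runs can exceed it.

```python
def estimate_velvetg_ram_gb(read_len_bp: float,
                            genome_size_mb: float,
                            num_reads_millions: float,
                            k: int) -> float:
    """Rough velvetg RAM estimate in GB.

    Empirical linear fit (coefficients assumed, see note above).
    The fit itself returns kilobytes, so convert to GB at the end.
    """
    ram_kb = (-109635
              + 18977 * read_len_bp
              + 86326 * genome_size_mb
              + 233353 * num_reads_millions
              - 51092 * k)
    return ram_kb / 1024 ** 2


# Hypothetical inputs: only the 2.8 Gb genome size is taken from the thread.
estimate = estimate_velvetg_ram_gb(read_len_bp=100,
                                   genome_size_mb=2800,
                                   num_reads_millions=500,
                                   k=31)
print(f"Estimated velvetg RAM: {estimate:.0f} GB")
```

                          Under this fit, smaller k values and more reads both push the estimate up, which may be part of why a k-mer sweep can peak above the figure for any single k.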


                          • #14
                            Ray runs as an MPI process, and can distribute workload and memory requirements across multiple different machines.


                            • #15
                              I'll look into Ray. I don't want to spend too much money on buying hundreds of gigabytes of RAM if I don't have to. I figure it's reasonable to bump up my department's server to around 350 GB of RAM (an additional ~190 GB).
