  • System requirements for a Linux computer for off-machine assembly/analysis

    Dear all,

    I (we) would like to assemble 454 reads (50-100 Mb, possibly 200-400 Mb) on a computer separate from the FLX sequencing machine itself. For this, Roche proposes a 64-bit dual-processor (dual x86 CPU) computer with 8 GB of RAM running Linux (brochure, October 2008).

    1. Is this requirement still valid?
    2. Should I get a computer with a quad-core CPU and 8 or 16 GB of memory?
    3. Are there other people running the Roche 454-software off-machine?

    thanks,

    Richard

  • #2
    Requirements seem to have changed

    BioTeam has recently been setting up a decent-sized off-rig analysis cluster for a client. The 454 software comes with a script, valTool.sh, which will report whether your rig is large enough. I was quite surprised to see that it wanted 16 GB on the master node, since we were following the same guidelines you mentioned. We had plenty of extra nodes, but all were configured with 8 GB of RAM. During initial testing on one of these 8 GB machines, it was thrashing hard. Long before base calling finished, we found some MPI environment variables that allowed the work to run across the cluster quite quickly, but I'd be very wary of a single-node analysis rig with only 8 GB.

    Other notable requirements include:
    • Master: Linux kernel >= 2.6.9-34, SMP, 64-bit
    • Disk space accessible from the master: >= 1 TB available
    • Compute nodes: >= 4 GB RAM and the same CPU/architecture/OS specs as the head/master node
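valTool.sh is what actually validates a rig. As a rough pre-purchase check, the master-node minimums listed above can be tested with a short Python sketch. The thresholds come from this post; the helper names, the 5% RAM margin, ignoring the "-34" build suffix, and checking free space on `/` are assumptions made for illustration, and architecture and compute-node checks are not covered.

```python
import os
import platform
import re
import shutil

# Minimums quoted in this thread from one site's valTool.sh experience,
# not an official Roche specification (assumption: master node only).
MIN_MASTER_RAM_GB = 16
MIN_FREE_DISK_BYTES = 10**12   # ">= 1 TB available"
MIN_KERNEL = (2, 6, 9)         # the "-34" distro build suffix is ignored here


def mem_total_gb(meminfo_text):
    """Return MemTotal from /proc/meminfo text (reported in kB) in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1)) / (1024 * 1024)


def kernel_tuple(release):
    """Turn a release string such as '2.6.9-34.ELsmp' into (2, 6, 9)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


def check_master(meminfo_text, kernel_release, free_bytes):
    """Return a list of shortfalls; an empty list means every check passed."""
    problems = []
    # MemTotal sits a little below installed RAM (kernel reservations),
    # so allow a 5% margin before flagging a nominal 16 GB machine.
    if mem_total_gb(meminfo_text) < 0.95 * MIN_MASTER_RAM_GB:
        problems.append(f"RAM below {MIN_MASTER_RAM_GB} GB")
    if kernel_tuple(kernel_release) < MIN_KERNEL:
        problems.append(f"kernel {kernel_release} older than 2.6.9")
    if free_bytes < MIN_FREE_DISK_BYTES:
        problems.append("less than 1 TB of free disk")
    return problems


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):  # Linux only
        with open("/proc/meminfo") as fh:
            meminfo = fh.read()
        # "/" is a placeholder; point it at the volume that will hold run data.
        found = check_master(meminfo, platform.release(), shutil.disk_usage("/").free)
        print("\n".join(found) if found else "master meets the quoted minimums")
    else:
        print("no /proc/meminfo found; run this on the Linux master node")
```

Keeping the checks as pure functions over text and numbers makes them easy to try against another machine's /proc/meminfo output before buying hardware.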

    Last edited by cariaso; 02-12-2009, 11:44 AM. Reason: base calling, not assembly

    • #3
      The last time I ran top while using gsMapper or gsAssembler, it was only using one core. The image analysis/base caller for Titanium is MPI/multi-core aware, but I don't think the other tools are, so the only thing that will help you there is additional memory. On the other hand, we are using an 8-core, 32 GB machine for image processing/base calling, at ~14 hours per full plate. So you may want to take that into consideration if it is in your plans.

      Slow to post, so adding:

      I think cariaso is talking about base calling, not assembly. I think. FLX is easy either way; it is just Titanium that taxes everything.
      Last edited by Tom Bair; 02-12-2009, 09:20 AM. Reason: Slow to post

      • #4
        True, I did intend base calling; corrected. It seems I've been doing too many assemblies this week.

        • #5
          runAssembly run times

          I didn't see any examples of run times for various sizes of assembly, so I thought I would post some here. Apologies if this isn't the right place.

          We're running Roche's "runAssembly" wrapper, version 2.0.00.20

          The interesting discovery that prompts this post is the "-large" flag. If you provide this flag to runAssembly, it "shortcuts some of the computationally expensive tasks" in the algorithm.

          Here are some runtimes, for single threads running on dedicated x86_64 linux machines with 8GB of RAM.

          1 data directory: 9.5M "seeds". 15 min; 9 min with -large.
          2 data directories: 14M "seeds". 31 min; 21 min with -large.
          3 data directories: 23M "seeds". 85 min; 21 min with -large.
          4 data directories: 31M "seeds". Still running; 30 min with -large.
          ...
          10 data directories: 78M "seeds". Killed; 42 min with -large.

          These are sequences from a prokaryote. Your mileage may vary.
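Putting the posted numbers side by side makes the effect of -large easier to see. A minimal Python sketch using only the runtimes listed above; the tuple layout and the `large_speedups` helper are illustrative and not part of the Roche software, and runs whose default assembly never finished are left out of the ratio.

```python
# Runtimes (minutes) posted above for runAssembly 2.0.00.20, single-threaded
# on 8 GB x86_64 Linux machines. None marks a default run that never finished.
RUNS = [
    # (data directories, seeds in millions, default minutes, -large minutes)
    (1, 9.5, 15, 9),
    (2, 14, 31, 21),
    (3, 23, 85, 21),
    (4, 31, None, 30),
    (10, 78, None, 42),
]


def large_speedups(runs):
    """Default/-large runtime ratio for each size where both runs finished."""
    return {
        dirs: round(default / large, 2)
        for dirs, _seeds, default, large in runs
        if default is not None
    }


if __name__ == "__main__":
    for dirs, ratio in large_speedups(RUNS).items():
        print(f"{dirs} data dir(s): -large ran {ratio}x faster")
```

On this one prokaryotic data set the ratio grows from about 1.7x with one data directory to about 4x with three. It measures wall-clock time only; the thread does not report how -large affects the resulting assembly.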

          • #6
            Originally posted by cdwan
            10 data directories: 78M "seeds". killed. 42 min with LARGE flag.
            What do you mean by "killed"? Did the software fail? I've had newbler assembler fail with large amounts of data as well.

            • #7
              De novo assembly of larger genomes (50 Mb to 100 Mb), i.e. the insect range, beyond fungal genomes:

              It requires lots of memory. A machine with 8 GB of memory is not enough.

              We are using a 4-core, 32 GB, 64-bit machine. It works for GS assembly of fungal genomes, but insect assembly is tough. A fungal runAssembly of a single run takes only an hour or two on this machine, whereas an insect assembly I did earlier from 35 FLX runs took about 10 days to finish.

              The -large flag for the GS assembler helps with speed. Still, I would prefer a beefy machine with as much memory as possible.

              Assembly is a memory-hungry computation.

              • #8
                Originally posted by erimar77
                What do you mean by "killed"? Did the software fail? I've had newbler assembler fail with large amounts of data as well.
                We have no idea whether it would eventually have succeeded. It seemed to be progressing, slowly, through the all-vs-all comparison stage. We ran out of time to mess with it.
