  • What is the best and RAM efficient pipeline for de novo assembly of...

    Hello everyone,

    What is the best RAM-efficient pipeline for de novo assembly with two datasets: about 6 million Illumina paired-end reads of 36 bp and 44 million Illumina paired-end reads of 100 bp? It's a bacterial genome of 6.5 Mbp.

    I tried Velvet, ABySS, SOAPdenovo... but it seems impossible with 8 GB of RAM.

    Would mapping the Illumina reads to a backbone be a solution? If so, which pipeline?

    Thanks in advance for your help.

    Diego

  • #2
    What backbone would you have access to?

    8 GB is going to be tough. To lower memory usage I typically use longer k-mers and Velvet's -create_binary (though I am using MiSeq data), but I'm not sure that will work within an 8 GB limit. What is your memory footprint when assembling only the 36 bp paired-end reads?

    Edit: C. Titus Brown has some workflows which may help. They are intended for very large metagenomic assemblies, but may be useful.

    Last edited by winsettz; 02-05-2013, 08:46 AM.


    • #3
      In addition to digital normalization, you might try the Minia assembler, which is intended to be very memory efficient.
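For readers unfamiliar with the idea, here is a minimal sketch of digital normalization, assuming a plain in-memory counter rather than the memory-efficient counting structure a real tool would use. The function names `kmers` and `normalize` and the default `k` and `cutoff` values are illustrative only and are not taken from any tool mentioned in this thread. A read is kept only if the median abundance of its k-mers, among the reads kept so far, is still below the cutoff.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a read (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=20, cutoff=20):
    """Keep a read only while its median k-mer count is below `cutoff`.

    Simplified illustration: k-mers are counted in an exact Counter and
    reverse complements are ignored, so this does not save memory itself;
    it only shows how redundant high-coverage reads get discarded.
    """
    counts = Counter()
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read shorter than k
        if median(counts[km] for km in ks) < cutoff:
            counts.update(ks)  # only kept reads contribute to the counts
            kept.append(read)
    return kept


# Toy input: one read repeated ten times plus one unrelated read.
toy_reads = ["ACGTACGTAC"] * 10 + ["TTTTGGGGCC"]
toy_kept = normalize(toy_reads, k=4, cutoff=3)
```

With very deep coverage, most reads carry k-mers that have already been seen many times, so the assembler is handed far fewer reads while low-coverage regions are left intact.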



      Also, Amazon EC2 is quite cheap as a source of compute power. You should be able to assemble this on EC2 with Ray for <10 euros -- one Quad Extra Large High Memory instance can devour much larger datasets in an hour or so.
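As a rough check on why reducing the data should help, the expected coverage can be estimated from the numbers in the first post. This sketch assumes the quoted figures count individual reads (if they count pairs, the coverage doubles), and the 100x target is only an illustrative value, not a recommendation from this thread.

```python
def coverage(n_reads, read_len, genome_size):
    """Expected average depth: total sequenced bases / genome length."""
    return n_reads * read_len / genome_size


def keep_fraction(current_cov, target_cov):
    """Fraction of reads to keep at random to reach roughly target_cov."""
    return min(1.0, target_cov / current_cov)


GENOME_SIZE = 6_500_000  # 6.5 Mbp, from the original question

short_cov = coverage(6_000_000, 36, GENOME_SIZE)    # 36 bp dataset
long_cov = coverage(44_000_000, 100, GENOME_SIZE)   # 100 bp dataset

# Hypothetical target depth of 100x for the 100 bp dataset.
fraction = keep_fraction(long_cov, 100)

print(f"36 bp dataset:  ~{short_cov:.0f}x")
print(f"100 bp dataset: ~{long_cov:.0f}x")
print(f"keep ~{fraction:.1%} of the 100 bp reads for ~100x")
```

Under these assumptions the 100 bp dataset alone is several hundred fold deeper than the genome, so normalization or random subsampling would discard most of it.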


      • #4
        Thanks for your reply winsettz.

        I have access to a working draft sequence (11x coverage); that is why I would prefer de novo assembly.
        I am testing Gossamer right now, so I can't give the memory footprint yet, but if I remember correctly it did not exceed 8 GB of RAM with the 36 bp reads. With the second dataset the memory footprint was 8 GB of RAM plus 5 GB of swap.

        Thanks for the link, I will look into it.

        Best regards,

        Diego
        Last edited by Diegodescarpates; 02-05-2013, 09:28 AM.


        • #5
          Originally posted by krobison
          In addition to digital normalization, you might try the Minia assembler, which is intended to be very memory efficient.



          Also, Amazon EC2 is quite cheap as a source of compute power. You should be able to assemble this on EC2 with Ray for <10 euros -- one Quad Extra Large High Memory instance can devour much larger datasets in an hour or so.
          Thanks for the information. I am trying Minia...

          Best regards,

          Diego
          Last edited by Diegodescarpates; 02-05-2013, 10:26 AM.
