  • De novo assembler for 300 million Solexa reads

    Hi~ all ^^

    I'm currently trying to assemble a genome sequence from about 300 million Solexa reads (paired-end; 220 bp insert size).

    When I run Velvet for the assembly, I get the error below:
    ---------------------------------------------------------------------------------
    velvetg: Can't calloc 18446744072010747658 InsertionMarkers totalling
    18446744046528688288 bytes: Cannot allocate memory
    Reading roadmap file ./Roadmaps
    301083362 roadmaps reads
    Creating insertion markers
    ---------------------------------------------------------------------------------

    So I am trying other assemblers. If anybody has successfully assembled paired-end Solexa reads at this scale, please tell me which assembler you used and the command you ran.

    Server specification: 12 CPUs, 72 GB RAM

    Thanks for any comments.
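
A quick sanity check on the numbers in the error message above. The two huge values look like negative numbers printed as unsigned 64-bit integers. The sketch below reinterprets them and then assumes (this is an interpretation, not something the log confirms) that the marker count wrapped around a signed 32-bit counter. Under that assumption it estimates how much memory the insertion markers alone would have needed.

```python
# Values copied verbatim from the velvetg error message.
reported_count = 18446744072010747658
reported_bytes = 18446744046528688288


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    return value - 2**64 if value >= 2**63 else value


signed_count = as_signed64(reported_count)
signed_bytes = as_signed64(reported_bytes)

# Both values are negative; their ratio gives the size of one marker.
bytes_per_marker = signed_bytes // signed_count

# ASSUMPTION: the real count overflowed a signed 32-bit integer,
# so adding 2**32 recovers the intended number of markers.
true_count = signed_count + 2**32
true_bytes = true_count * bytes_per_marker

print("signed count:     ", signed_count)
print("bytes per marker: ", bytes_per_marker)
print("estimated markers:", true_count)
print("estimated GiB:    ", round(true_bytes / 2**30, 1))
```

If the assumption holds, the request is well within 64-bit range but already takes a large share of a 72 GB machine before the rest of the graph is built, which is consistent with the replies suggesting fewer reads or a different assembler.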

  • #2
    You might need almost 10x that amount of memory (72 GB) to handle 300M reads.
    Use a subset of the reads, trim aggressively, and use stringent k-mer and cvCut settings.
    --
    Jeremy Leipzig
    Bioinformatics Programmer
    --
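
The "use a subset of the reads" suggestion can be sketched as follows. This is a minimal illustration, not a tool from the thread: it assumes the pairs are stored in two uncompressed FASTQ files with mates in the same order and four lines per record, and the function name `subsample_pairs` and its parameters are hypothetical. Each pair is kept with probability `fraction`, so mates always stay together.

```python
import random
from itertools import islice


def read_records(handle):
    """Yield FASTQ records as lists of four lines each."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_path, r2_path, out1_path, out2_path, fraction, seed=0):
    """Keep each read pair with probability `fraction`.

    Returns (kept, total). A fixed seed makes the subset reproducible.
    If one input is shorter, iteration stops at the shorter file.
    """
    rng = random.Random(seed)
    kept = total = 0
    with open(r1_path) as r1, open(r2_path) as r2, \
            open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for rec1, rec2 in zip(read_records(r1), read_records(r2)):
            total += 1
            if rng.random() < fraction:
                # Write both mates so the pairing is preserved.
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept, total


# Example call (file names are placeholders):
# subsample_pairs("reads_1.fq", "reads_2.fq", "sub_1.fq", "sub_2.fq", 0.25)
```

Sampling by probability rather than taking the first N reads avoids biasing the subset toward whatever part of the run was written first.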

    • #3
      I think a solution to your problem is to use SOAPdenovo or ABySS, which have lower memory requirements.

      Francesco


      • #4
        I second giving ABySS a try. I find its memory requirements are much more reasonable.

        However, if you do want to get Velvet working you could look at Curtain, which might help you out.


        • #5
          Hi natstreet
          There is one thing I don't understand about Curtain. Is it a reference-assisted assembler (it uses maq to align reads against a reference and then improves the reference assembly)? Or, after assembling with Velvet or another assembler, does it map the reads onto the contigs with maq and then use the paired-read information to improve the assembly?

          Francesco


          • #6
            Sorry - I forgot to say that Curtain only works if you have a reference as a starting point. I don't have hands-on experience with it - I just came across it and thought it might be worth a pointer.

            Personally, I would give ABySS a try as a starting point.


            • #7
              Thanks for all the comments everyone. I'm going to use ABySS.


              • #8
                How was your experience with ABySS?

                Thanks,
                Jason


                • #9
                  In my experience, SOAPdenovo is better than ABySS for assembling large Solexa read sets.
                  I successfully completed the assembly using SOAPdenovo. :-)


                  • #10
                    Originally posted by odysseus
                    In my experience, SOAPdenovo is better than ABySS for assembling large Solexa read sets.
                    I successfully completed the assembly using SOAPdenovo. :-)
                    what genome?
                    -drd


                    • #11
                      Can anyone tell me how much RAM is needed for 300M PE reads with SOAPdenovo? I tried SOAPdenovo and found that 264 GB of RAM was not enough to meet its requirements.
