  • Large genome assembly using paired-end and mate-pair reads

    Which assembler should I use to assemble a genome of 1 to 1.5 Gbp? I have Illumina paired-end and mate-pair reads, 101 bp long. How can I use the mate-pair reads for scaffolding?

  • #2
    I'd try Allpaths-LG (http://www.broadinstitute.org/softwa...paths-lg/blog/) if your paired-end reads mostly overlap (i.e. a fragment size of ~180 bp). It will use your mate pairs for scaffolding.



    • #3
      Thank you, sarvidsson.

      I have access to a machine with 96 GB of memory and 24 cores. Is that sufficient for Allpaths-LG?



      • #4
        96 GB can be a bit tight; 24 cores should be fine. I'd expect the assembly to run for up to 3 days. If you error-correct and normalize the paired-end reads before assembly (e.g. with BBNorm, http://seqanswers.com/forums/showthread.php?t=49763), you typically reduce the memory the assembly needs.
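To illustrate why normalization lowers memory use, here is a minimal sketch of k-mer-based digital normalization: a read is kept only while the median count of its k-mers (among reads already kept) is below a target depth, so highly redundant reads are discarded before they reach the assembler. This is a simplified streaming variant written for illustration, not BBNorm's actual algorithm; the function names, `k=21`, and `target=20` are hypothetical choices, and it counts forward-strand k-mers only.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize(reads, k=21, target=20):
    """Hypothetical single-pass normalization: keep a read only if the median
    count of its k-mers, among reads kept so far, is below `target`."""
    counts = Counter()
    kept = []
    for read in reads:
        kms = list(kmers(read, k))
        if not kms:
            continue  # read shorter than k: nothing to count
        if median([counts[km] for km in kms]) < target:
            kept.append(read)
            counts.update(kms)  # only kept reads contribute to the depth estimate
    return kept


if __name__ == "__main__":
    # 50 identical reads plus one distinct read: most duplicates are dropped.
    reads = ["ACGTACGTACGT"] * 50 + ["TTTTGGGGCCCCAAAA"]
    print(len(normalize(reads, k=5, target=3)))
```

Because the depth estimate only grows from kept reads, the output coverage is capped near `target`, which is what shrinks the k-mer graph the assembler has to hold in memory.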



        • #5
          We have 100-200 Mbp fungal assemblies that run out of memory with AllPaths-LG on 128 GB nodes but complete on 256 GB nodes. I'm guessing memory may be a serious problem; you will probably need more.

          Megahit is fast and seems to have relatively low memory consumption, and Minia was designed for low memory consumption, so if AllPaths-LG fails you might try those. Or buy more memory, which will be essential if you plan to routinely assemble large genomes.
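One reason Minia's memory footprint is small is that it stores the de Bruijn graph's k-mers in a Bloom filter rather than a hash table. The sketch below shows only that basic idea, a bit array queried through several hash positions, which can give false positives but never false negatives. Minia's published design adds further structures to handle false positives; those are not shown here, and the class name and parameters are hypothetical.

```python
import hashlib


class KmerBloomFilter:
    """Minimal Bloom filter for k-mer membership (illustrative sketch)."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)  # 1 bit per slot

    def _positions(self, kmer):
        # Derive n_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # True only if every position is set; may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(kmer))


if __name__ == "__main__":
    bf = KmerBloomFilter(n_bits=10_000, n_hashes=3)  # about 1.25 KB of bits
    read, k = "ACGTACGTTGCA", 5
    for i in range(len(read) - k + 1):
        bf.add(read[i:i + k])
    print("ACGTA" in bf, "GGGGG" in bf)
```

The memory cost is fixed by `n_bits`, independent of k-mer length, which is the trade-off a hash table storing full k-mers cannot make.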



          • #6
            With that amount of memory I'd recommend SGA. Minia is great too, but it has no scaffolding option.



            • #7
              Allpaths-LG is a good option if you have enough RAM and CPUs. I also wonder whether one of your PE libraries is overlapping; from the Allpaths-LG documentation: "average separation size must be slightly less than twice the read size, such that the reads from a pair will likely overlap".
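The overlap requirement quoted above is simple arithmetic. Assuming "separation size" means the fragment (insert) size, two reads of length L from a fragment of size F overlap by 2L − F bases. With the 101 bp reads in this thread and a ~180 bp fragment, that gives 22 bp of overlap; any fragment of 202 bp or more gives none. The function names below are hypothetical.

```python
def pair_overlap(fragment_size, read_len):
    """Expected overlap (bp) between the two reads of a pair.
    Zero or negative means the reads do not overlap."""
    return 2 * read_len - fragment_size


def reads_overlap(fragment_size, read_len):
    # Assumes "separation size" in the Allpaths-LG docs is the fragment size.
    return pair_overlap(fragment_size, read_len) > 0


if __name__ == "__main__":
    for frag in (180, 202, 300):
        print(frag, pair_overlap(frag, 101), reads_overlap(frag, 101))
```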



              • #8
                Thank you all.

                I think I have to go with SGA or Minia due to lack of memory. Is it a good approach to assemble with the paired-end reads and then scaffold with the mate-pair data?
                Which tool would be suitable for scaffolding with 101 bp mate-pair data?



                • #9
                  Originally posted by Pinal: "Which tool would be suitable for scaffolding with 101 bp mate-pair data?"
                  You can start with SSPACE.
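The core idea behind mate-pair scaffolding is counting read pairs whose two mates align to different contigs; contig pairs supported by enough such links become candidate joins. The sketch below shows only that counting step. It is not SSPACE's implementation: real scaffolders also use read orientation and the expected insert size to order, orient, and space contigs. The `PairHit` record and the `min_links=5` threshold are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class PairHit(NamedTuple):
    """Hypothetical record: the contig each mate of one pair aligned to."""
    contig_a: str
    contig_b: str


def contig_links(hits, min_links=5):
    """Count pairs whose mates land on different contigs and keep contig
    pairs supported by at least `min_links` pairs."""
    counts = Counter()
    for hit in hits:
        if hit.contig_a != hit.contig_b:  # same-contig pairs carry no join info
            # Sort so (A, B) and (B, A) are counted as one link.
            counts[tuple(sorted((hit.contig_a, hit.contig_b)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_links}


if __name__ == "__main__":
    hits = ([PairHit("ctg1", "ctg2")] * 3 + [PairHit("ctg2", "ctg1")] * 3
            + [PairHit("ctg2", "ctg3")] * 2 + [PairHit("ctg1", "ctg1")] * 10)
    print(contig_links(hits))
```

Requiring several independent links before joining contigs guards against chimeric or mis-mapped pairs, which are relatively common in mate-pair libraries.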