  • Microbiome assembly and its normalization

    Hello,

    I am working on 24 human gut microbiome samples generated on an Illumina HiSeq, almost 800 GB of data in total.
    I have optimized and assembled the data with Velvet at several different k-mer sizes, giving assembly sizes varying from 35 to 115 Mb. My question: is there any need to justify assembling all the data at different k-mers, or should I go with a single k-mer assembly? Also, how do I proceed with assembly normalization?

  • #2
    For your assembly, do you mean you pooled all the data together and then assembled it with Velvet? I think this paper may be a good reference



    • #3
      My suggestion is to try a slew of different assemblers and different k-mer values.

      In terms of genome assembly, a larger k-mer generally gives a better assembly because it reduces ambiguity from repeats in the sequence.

      At the same time, it all depends on how much depth of coverage you want. A smaller k-mer value gives you more k-mer depth of coverage, but at the same time a higher chance of mis-assembling the reads.
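      To make the depth trade-off concrete, the standard relation between read coverage and k-mer coverage (the one Velvet's documentation uses) can be sketched in a few lines of Python; the 100x / 100 bp numbers below are made up for illustration:

```python
# Trade-off between k-mer size and effective depth of coverage:
# C_k = C * (L - k + 1) / L, where C is read coverage, L is read
# length, and k is the k-mer size. Larger k means fewer k-mers
# per read, hence lower k-mer coverage.

def kmer_coverage(read_coverage: float, read_length: int, k: int) -> float:
    """Effective k-mer coverage for a given read coverage, read length and k."""
    return read_coverage * (read_length - k + 1) / read_length

# Example: 100x read coverage with 100 bp HiSeq reads.
for k in (31, 51, 71, 91):
    print(k, kmer_coverage(100, 100, k))
```

      So at k=91 the same library only provides 10x k-mer coverage where k=31 still gives 70x, which is why small-k assemblies look "deeper" but collapse repeats more readily.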

      So to overcome this, I would suggest taking all your assembled contigs from the different k-mers and trying to re-assemble them to form "super contigs".

      This will help you achieve greater coverage (or reduce redundant coverage of a given genome by collapsing regions that were assembled multiple times) and reduce any bias that each assembler might have.
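      As a toy illustration of that merging step: pool contigs from multiple k-mer assemblies and drop any contig contained in a longer one. Real tools (CD-HIT, for instance) also handle reverse complements and near-identical sequences, which this sketch deliberately ignores; all sequences below are made up:

```python
def merge_contig_sets(assemblies):
    """Pool contigs from several assemblies, dropping exact duplicates
    and contigs fully contained in a longer contig (a crude stand-in
    for a proper redundancy-removal tool)."""
    # Longest first, so containment checks only look at already-kept contigs.
    contigs = sorted({c for asm in assemblies for c in asm}, key=len, reverse=True)
    kept = []
    for c in contigs:
        if not any(c in longer for longer in kept):
            kept.append(c)
    return kept

k31_contigs = ["ACGTACGTAA", "TTGGCC"]
k51_contigs = ["ACGTACGTAACCGG", "TTGGCC", "GATTACA"]
print(merge_contig_sets([k31_contigs, k51_contigs]))
```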

      Hope this helps a bit.

      -Zapages



      • #4
        Yes, thanks for the great help!



        • #5
          IMO there are several newer assemblers out there that typically outperform Velvet in terms of both accuracy and speed. I typically use SPAdes for bacterial de novo assembly:


          see these as well for reference:

          De novo genome assembly is the process of reconstructing a complete genomic sequence from countless small sequencing reads. Due to the complexity of this task, numerous genome assemblers have been developed to cope with different requirements and the different kinds of data provided by sequencers within the fast evolving field of next-generation sequencing technologies. In particular, the recently introduced generation of benchtop sequencers, like Illumina's MiSeq and Ion Torrent's Personal Genome Machine (PGM), popularized the easy, fast, and cheap sequencing of bacterial organisms to a broad range of academic and clinical institutions. With a strong pragmatic focus, here, we give a novel insight into the line of assembly evaluation surveys as we benchmark popular de novo genome assemblers based on bacterial data generated by benchtop sequencers. Therefore, single-library assemblies were generated, assembled, and compared to each other by metrics describing assembly contiguity and accuracy, and also by practice-oriented criteria as for instance computing time. In addition, we extensively analyzed the effect of the depth of coverage on the genome assemblies within reasonable ranges and the k-mer optimization problem of de Bruijn Graph assemblers. Our results show that, although both MiSeq and PGM allow for good genome assemblies, they require different approaches. They not only pair with different assembler types, but also affect assemblies differently regarding the depth of coverage where oversampling can become problematic. Assemblies vary greatly with respect to contiguity and accuracy but also by the requirement on the computing power. Consequently, no assembler can be rated best for all preconditions. Instead, the given kind of data, the demands on assembly quality, and the available computing infrastructure determines which assembler suits best. 
The data sets, scripts and all additional information needed to replicate our results are freely available at ftp://ftp.cebitec.uni-bielefeld.de/pub/GABenchToB.



          • #6
            I've actually used Minia for metagenome assembly with good success.

            Minia
            http://minia.genouest.org/

            Determining an optimal kmer size for a metagenome is tough. My suggestion would be to try several.

            KmerGenie
            http://arxiv.org/pdf/1304.5665.pdf
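
            Trying several k values empirically can be sketched like this toy (KmerGenie does the real thing by fitting full k-mer abundance histograms; this just counts k-mers seen at least twice, and all the reads are made up):

```python
from collections import Counter

def solid_kmers(reads, k, min_count=2):
    """Count distinct k-mers seen at least min_count times: a very
    rough proxy for the 'genomic' k-mers that KmerGenie estimates
    from full abundance histograms."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return sum(1 for c in counts.values() if c >= min_count)

reads = ["ACGTACGTACGT", "CGTACGTACGTA", "GTACGTACGTAC"]
for k in (4, 6, 8):
    print(k, solid_kmers(reads, k))
```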



            • #7
              Originally posted by fanli View Post
              IMO there are several newer assemblers out that typically outperform velvet in terms of both accuracy and speed. I typically use SPAdes for bacterial de novo assembly:


              see these as well for reference:

              http://journals.plos.org/plosone/art...l.pone.0107014
              Hi fanli,
              All of these seem to be genome assemblers, developed and tested on isolate genomic data, I think.



              • #8
                I have to say, SPAdes is slow even on a single microbe; I doubt you could run it on 800 Gbp of metagenomic reads.

                We had been using SOAPdenovo and sometimes Ray for our metagenomes, but now we are using MEGAHIT, which is faster and uses less memory than SOAPdenovo.

                also how do I proceed to assembly normalization
                Can you clarify? I have written a normalization program to reduce high-depth reads prior to assembly, but I'm not sure that's what you are looking for.
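
                If digital normalization is what you mean, the core idea fits in a few lines. This is only a memory-hungry toy of the approach (real tools use compact probabilistic k-mer counters, which this ignores); the reads and parameters are made up:

```python
from collections import Counter

def normalize(reads, k=5, target=3):
    """Digital normalization sketch: keep a read only if the median
    count of its k-mers, among reads kept so far, is still below
    `target`. High-depth regions stop accepting new reads early."""
    counts = Counter()
    kept = []
    for r in reads:
        kmers = [r[i:i + k] for i in range(len(r) - k + 1)]
        median = sorted(counts[km] for km in kmers)[len(kmers) // 2]
        if median < target:
            kept.append(r)
            counts.update(kmers)
    return kept

# 10 identical high-depth reads collapse to a few; the rare read survives.
kept = normalize(["ACGTACGTAC"] * 10 + ["TTTTTAAAAA"], k=5, target=3)
print(kept)
```

                Note the counts only include reads already kept, so rare reads are never thrown away just because they arrive late.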
