
  • Bacterial genome assembly on MiSeq

    Hello!
    I am planning to make a Nextera XT library prep (a regular bacterial genome, a couple of megabases), sequence it with a 300-cycle kit (2x150 bp), and then assemble it. As far as I understand, the MiSeq has built-in software that does everything automatically (including Velvet assembly for small genomes). Do I need to launch the process after sequencing, or is everything done automatically?

  • #2
    Everything should be done automatically if you set it up on your sample sheet. The assembly is carried out in BaseSpace, although if you go into the Run Options screen on your MiSeq, you can replicate the analysis locally, which you might want to do if you are not using BaseSpace.
    We've found the Velvet assembly on MiSeq to be rather hit and miss in terms of quality, ranging from acceptable to very poor. We got much better assemblies using an OLC assembler than we ever got with Velvet.



    • #3
      I'll echo Tony's statements about assemblies coming straight off the MiSeq as being very hit or miss. For one genome we once got a 2.8Mbp contig that was nearly perfect out of a 3.5Mbp genome, but we've also gotten assemblies with N50s of 2Kbp and no contigs larger than 50Kbp. A large part of the problem is that the data doesn't appear to be pre-processed in any way to trim off low quality regions or look for PCR duplicates.

      I'd suggest setting up your run to produce the assembly, but also do the work yourself to compare. Most likely you'll find that you can do a much better job and be glad you didn't just rely on the system to give you an assembly.
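As a rough illustration of the kind of pre-processing mentioned above, here is a minimal sketch of trimming low-quality bases from the 3' end of a read. It assumes Phred+33 quality encoding and a cutoff of Q20; both are assumptions, and the function name `trim_3prime` is made up for this example. A dedicated trimming tool does considerably more (adapter removal, sliding windows, pair handling).

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose Phred quality is below min_q.

    seq and qual are the sequence and quality lines of one FASTQ record.
    offset=33 assumes Phred+33 encoding (an assumption, check your data).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    end = len(seq)
    # Walk back from the 3' end until a base meets the quality cutoff.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(trim_3prime("ACGTACGT", "IIIII###"))
```

Reads that end up very short after trimming are usually discarded before assembly; the length cutoff is a judgment call.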



      • #4
        We once got an N50 shorter than the read length (251 bp PE).

        <AssemblyStatistics>
        <NumberOfContigs>59444</NumberOfContigs>
        <MeanContigLength>56.10188</MeanContigLength>
        <MedianContigLength>46</MedianContigLength>
        <MinContigLength>31</MinContigLength>
        <MaxContigLength>560</MaxContigLength>
        <BaseCount>3334920</BaseCount>
        <N50>62</N50>
        </AssemblyStatistics>

        No idea what was going on there. All quality stats suggested it was good sequencing (12 million reads from a v2 500-cycle kit with 93% >Q30). We assembled the data offline without problems. N50s went up to 150 kb.
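For anyone comparing the on-instrument assembly with an offline one, N50 can be computed directly from the contig lengths. This is a minimal sketch using the common definition (the contig length at which the running total, taken from the longest contig down, first reaches half of all assembled bases). The function name and the example lengths are hypothetical, and the MiSeq software may compute its reported value slightly differently.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and summed until the
    running total reaches at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, not taken from any run in this thread.
    print(n50([80, 70, 50, 40, 30, 20]))
```

Because N50 depends only on the length distribution, an assembly made of tens of thousands of tiny contigs, like the one above, will have a very low N50 even when the total base count looks reasonable.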

        If you want to use Velvet, Nick Loman has a good guide about how to pre-process



        • #5
          Thanks a lot! Well, anyway, the MiSeq stores the unaligned FASTQ data, so I will be able to have a look at the automatic assembly and then, if the quality is lacking, try other software or run Velvet again with pre-processing.



          • #6
            I've had trouble with Velvet before. The issue was not the read quality but rather that the sequencing was too deep (above ~50x coverage, Velvet falls apart). I am now almost exclusively using SPAdes (http://bioinf.spbau.ru/spades/), which does the read correction and assembly in one go and doesn't mind very deep coverage. SPAdes also gives me better results than CLC.
            Last edited by nucleus; 12-13-2013, 09:01 AM.
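To check whether a run is deeper than the ~50x figure mentioned above, a back-of-the-envelope coverage estimate is enough. This is a minimal sketch: the function names and example numbers are hypothetical, and the 50x target is the poster's rule of thumb for Velvet, not a documented limit.

```python
def estimated_coverage(n_reads, read_length, genome_size):
    """Average depth = total sequenced bases / genome size.

    n_reads counts every read, so both mates of a pair are included.
    """
    return n_reads * read_length / genome_size


def subsample_fraction(coverage, target):
    """Fraction of reads to keep to reach roughly the target depth.

    Returns 1.0 when the data is already at or below the target.
    """
    return min(1.0, target / coverage)


if __name__ == "__main__":
    # Hypothetical run: 2 million 150 bp reads on a 3 Mbp genome.
    depth = estimated_coverage(2_000_000, 150, 3_000_000)
    print(depth, subsample_fraction(depth, 50.0))
```

When subsampling paired-end data, both mates of a pair need to be kept or dropped together so the files stay in sync.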
