  • 10 kb plasmid de novo assembly

    I have what should be the simplest task for a de novo assembler: assembling a short 10 kb bacterial plasmid without repeats from 90 bp Illumina reads. The plasmid was sequenced to an average depth of about 1,000x. I used default parameters in Velvet and SeqMan NGen and couldn't get it assembled. Could anyone suggest an assembler and parameters that I could use for the task?

  • #2
    First throw away 90% of your reads. 1000x is too high.
    --
    Phillip
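
The subsampling suggested above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended tool: it assumes unwrapped 4-line FASTQ records, and the function name `subsample_fastq`, the seed, and the synthetic demo reads are all made up for the example.

```python
import random
from typing import Iterable, Iterator, List


def subsample_fastq(lines: Iterable[str], fraction: float, seed: int = 0) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records, keeping each one with probability `fraction`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:  # header, sequence, '+', qualities
            if rng.random() < fraction:
                yield record
            record = []


# Tiny synthetic input: 1,000 dummy reads (hypothetical, not real data).
demo: List[str] = []
for i in range(1000):
    demo += [f"@read{i}", "ACGT" * 5, "+", "I" * 20]

# Keep roughly 10% of the reads, i.e. throw away about 90%.
kept = list(subsample_fastq(demo, fraction=0.1, seed=42))
print(f"kept {len(kept)} of 1000 reads")
```

For paired-end data, running the same function with the same seed over both files should keep mates together, provided the two files list the reads in the same order.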

    • #3
      Originally posted by pmiguel:
      First throw away 90% of your reads. 1000x is too high.
      --
      Phillip
      I have a similar project to this one, and I'm having trouble assembling my viral sequences, for which I'm expecting 10,000-100,000x coverage. Why does too much coverage cause a problem for de novo assembly?

      • #4
        Originally posted by e.dobbs:
        I have a similar project to this one, and I'm having trouble assembling my viral sequences, for which I'm expecting 10,000-100,000x coverage. Why does too much coverage cause a problem for de novo assembly?
        I think it is because random errors recur often enough at very high depth that they start to look like real base calls. This complicates the solution path through the assembly graph, and you get many highly related but separate contigs. Phillip is correct: take a sub-sample of your data and do the assembly. You know what your solution should look like (1 contig), so run an experiment at 10X, 20X, 30X, 50X and 100X and see what you get. The N50 value should improve as you add reads and approach your largest contig size (which hopefully is close to 10 kb). Beyond some level of coverage the N50 will fall and your largest contig will get shorter. With a 10 kb plasmid you'll probably peak at 30X or 50X.

        Travis
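
To make the coverage experiment above concrete, here is a small sketch of the two numbers involved: how many reads each depth corresponds to, and how N50 is computed from a list of contig lengths. The 10 kb plasmid size and 90 bp read length come from the original question; the function names and the example contig lengths are hypothetical, and the N50 definition used is the common one (the contig length at which the running total of the longest contigs first reaches half of the assembled bases).

```python
import math
from typing import Sequence


def reads_for_depth(depth: float, genome_len: int, read_len: int) -> int:
    """Number of reads needed so that depth ~= reads * read_len / genome_len."""
    return math.ceil(depth * genome_len / read_len)


def n50(contig_lengths: Sequence[int]) -> int:
    """Contig length at which the longest contigs together reach half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Read counts for the depth series, for a 10 kb target and 90 bp reads.
for depth in (10, 20, 30, 50, 100):
    print(f"{depth}X -> {reads_for_depth(depth, genome_len=10_000, read_len=90)} reads")

# Hypothetical assemblies: a single 10 kb contig versus a fragmented one.
print(n50([10_000]))
print(n50([4_000, 3_000, 2_000, 1_000]))
```

Running an assembler on each subsample and comparing N50 and the largest contig across depths would then show where the assembly peaks.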

        • #5
          Originally posted by e.dobbs:
          I have a similar project to this one, and I'm having trouble assembling my viral sequences, for which I'm expecting 10,000-100,000x coverage. Why does too much coverage cause a problem for de novo assembly?
          The guys writing the code probably did not see >100X coverage as a common use case, so it is not optimized for that read depth. It's kind of like ordering a dump truck full of mulch for your landscaping: with that much, you can landscape your lawn. But if an aircraft carrier load of mulch gets dumped on you, it crushes your house and smothers you.

          --
          Phillip

          • #6
            Thanks for the answers, guys! I've re-tried my assembly with 1/6000th of my data, and the assembly looks much better.
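
As a rough sanity check on that fraction, expected depth scales linearly with the fraction of reads kept. The sketch below takes the 10,000-100,000x expectation quoted earlier in the thread at face value; the function names are made up for illustration, and the actual depth of the dataset is not known from the thread.

```python
def depth_after_subsampling(original_depth: float, fraction: float) -> float:
    """Expected depth once only `fraction` of the reads is kept."""
    return original_depth * fraction


def fraction_for_target(original_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep to land near `target_depth` (capped at 1.0)."""
    return min(1.0, target_depth / original_depth)


# Both ends of the expected 10,000-100,000x range, keeping 1/6000th of the reads.
for original in (10_000, 100_000):
    depth = depth_after_subsampling(original, 1 / 6000)
    print(f"{original}x -> about {depth:.1f}x")

# Fraction to keep to reach 50x from the 1,000x plasmid dataset in the first post.
print(fraction_for_target(1_000, 50))
```

Under those assumptions, 1/6000th of the reads would correspond to somewhere between roughly 2x and 17x.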

            • #7
              Given your abundance of data, some really aggressive trimming of read ends might help. Another approach would be to use a tool such as MUSKET that trims/corrects reads to eliminate ultra-rare k-mers.
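
The idea of eliminating ultra-rare k-mers can be illustrated with a toy filter. This is not MUSKET's algorithm (which corrects reads rather than simply discarding them); it is only a sketch of the underlying k-mer counting idea, with hypothetical function names, a hypothetical threshold, and made-up reads.

```python
from collections import Counter
from typing import Iterator, List


def kmers(seq: str, k: int) -> Iterator[str]:
    """All overlapping substrings of length k in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers(reads: List[str], k: int) -> Counter:
    """Count every k-mer across all reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def reads_without_rare_kmers(reads: List[str], k: int, min_count: int) -> List[str]:
    """Keep reads whose k-mers all occur at least min_count times in the dataset.

    Reads shorter than k contain no k-mers and are kept unchanged.
    """
    counts = count_kmers(reads, k)
    return [r for r in reads if all(counts[km] >= min_count for km in kmers(r, k))]


# Made-up example: five identical reads plus one with a single substitution.
reads = ["ACGTACGTAC"] * 5 + ["ACGTTCGTAC"]
clean = reads_without_rare_kmers(reads, k=4, min_count=2)
print(len(reads), "->", len(clean))
```

With very deep data a fixed low threshold is less useful, since the same error can recur many times, so a cutoff would need to be chosen relative to the observed k-mer count distribution.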
