  • contig assembly

    Hello,

    We've de novo assembled our RNA-Seq reads (about 50 million 2×75 bp reads) into contigs with several de novo assemblers using different parameters. Most of the contigs we got were very short because of the poor sequencing quality and the low sequencing depth. The contigs from each assembler and parameter set varied from each other, but some of them overlapped, so they could probably be assembled into longer contigs. The problem is that we can't assemble millions of contigs into supercontigs manually, and our computing resources are very limited (12 GB RAM, 8-core CPU, 500 GB disk space). Does anyone know how to assemble these contigs into longer contigs with such limited resources, and which software can handle the task?

    Thanks

    YH-GU
    Last edited by yh_gu; 08-07-2010, 01:57 AM.

  • #2
    If you have enough memory to assemble reads into contigs then you clearly have enough memory to assemble contigs into supercontigs, as that is an easier feat.

    When an assembler produces contigs that ostensibly overlap and yet remain separate there is some likely path ambiguity that has not been resolved. It sounds like you'll need more paired-end sequence for your assembly to coalesce.
    --
    Jeremy Leipzig
    Bioinformatics Programmer



    • #3
      The last point is certainly true, but can anyone recommend the most suitable software tool for the task? I am currently looking to do something similar and so far have only tried PAVE (which died with an error message that I'm tracking down). In my case I have performed de novo assembly on a number of genotypes of the same species, and now I want to merge those assemblies together, identify SNPs, and see whether longer ESTs can be made by merging contigs across the per-genotype assemblies.

      So, what are the current favourite tools for merging large numbers of contigs coming from de novo transcript assemblies of short read data?
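      One inexpensive first step when pooling per-genotype (or per-assembler) contig sets is to drop contigs that are exact substrings of a longer contig, on either strand, before attempting any merging. Below is a minimal Python sketch, assuming the contigs are already loaded as plain strings (FASTA parsing omitted). `revcomp` and `drop_contained` are hypothetical helper names, and exact matching is a simplification: contigs from different genotypes will differ at SNPs, so a real pipeline would need to tolerate mismatches.

```python
# Hypothetical helpers: exact-containment redundancy removal across pooled contig sets.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (IUPAC codes other than N are not handled)."""
    return seq.translate(COMPLEMENT)[::-1]


def drop_contained(contigs: list[str]) -> list[str]:
    """Keep only contigs that are not exact substrings (on either strand) of a kept contig.

    Contigs are visited longest first, so any possible container has already been
    kept by the time a shorter contig is checked. This does O(n^2) substring tests,
    fine for a sketch but far too slow for millions of contigs.
    """
    kept: list[str] = []
    for seq in sorted(set(contigs), key=len, reverse=True):
        rc = revcomp(seq)
        if not any(seq in k or rc in k for k in kept):
            kept.append(seq)
    return kept


if __name__ == "__main__":
    pooled = ["ACGTACGTAA", "GTAC", "TTACG", "GGGG"]
    print(drop_contained(pooled))  # ['ACGTACGTAA', 'GGGG']
```

      Here `GTAC` is contained directly and `TTACG` is contained as its reverse complement (`CGTAA`), so only `ACGTACGTAA` and `GGGG` survive.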



      • #4
        Originally posted by Zigster
        When an assembler produces contigs that ostensibly overlap and yet remain separate there is some likely path ambiguity that has not been resolved. It sounds like you'll need more paired-end sequence for your assembly to coalesce.
        The overlaps I mentioned mainly refer to contigs produced by different assemblers, so we want to find suitable software to assemble them into longer contigs.
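        To make the idea concrete, here is a minimal Python sketch of merging contigs from different assemblers by exact suffix/prefix overlap. It joins two contigs only when the join is unambiguous (the left contig has exactly one overlapping successor and the right contig exactly one predecessor), which mirrors the path-ambiguity point quoted above: where overlaps branch, the contigs are left separate. Assumptions: exact matches only, forward strand only (no reverse complements), and an all-pairs comparison every round, so this illustrates the logic rather than scaling to millions of contigs. `overlap` and `merge_unambiguous` are hypothetical names, not functions from any assembler.

```python
from collections import Counter


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def merge_unambiguous(contigs: list[str], min_len: int = 20) -> list[str]:
    """Greedily join contigs on their longest exact overlap, skipping branching joins."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    seqs = list(contigs)
    while True:
        # All directed overlaps (i -> j with length k) of at least min_len.
        edges = []
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k:
                        edges.append((i, j, k))
        out_deg = Counter(i for i, _, _ in edges)
        in_deg = Counter(j for _, j, _ in edges)
        # Take the longest overlap whose endpoints do not branch, then rescan.
        for i, j, k in sorted(edges, key=lambda e: -e[2]):
            if out_deg[i] == 1 and in_deg[j] == 1:
                joined = seqs[i] + seqs[j][k:]
                seqs = [s for n, s in enumerate(seqs) if n not in (i, j)] + [joined]
                break
        else:
            # No unambiguous join left (or no overlaps at all).
            return seqs


if __name__ == "__main__":
    # Three contigs tiling one transcript: merged into one.
    print(merge_unambiguous(["ACGTTGCAAGGC", "AAGGCTTACCGA", "TTACCGATTT"], min_len=5))
    # One contig whose end overlaps two others: left unmerged.
    print(merge_unambiguous(["ACGTACGTTT", "GTTTCCAA", "GTTTGGAA"], min_len=4))
```

        The first call returns `['ACGTTGCAAGGCTTACCGATTT']`; in the second, `ACGTACGTTT` overlaps both other contigs by `GTTT`, so nothing is joined.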



        • #5
          Velvet seems to work fairly well with contig assemblies in my hands, though, as Zigster pointed out, assembly path ambiguity will ultimately prevent you from using as many productive overlaps as you'd expect, because the assemblers will disagree in just the wrong places on each contig.

          It may be interesting to "go conservative" with different assemblers' contigs by trimming away their weak points. For example, try trimming away low-quality ends of contigs to minimize the ambiguous sequence spans included in your secondary assembly; otherwise you'd expect good overlaps in the middle of the contigs but poor alignments at the ends. Each assembler has its own problem areas, though, so you may want to handle each one in its own way. At some point we (the collective) should put together some cross-assembler lessons learned, and maybe preconfigurations that help tools like Velvet make more native use of each assembler's strengths.
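          Since contigs written as FASTA carry no per-base quality values, the simplest version of "going conservative" is to clip a fixed number of bases from each end and discard contigs that become too short, using a different clip length per assembler if its ends are known to be less reliable. A minimal Python sketch; the function name, the clip lengths, and the assembler labels in the usage block are hypothetical placeholders to be tuned, not recommended values.

```python
def trim_contig_ends(contigs: dict[str, str], clip: int, min_len: int) -> dict[str, str]:
    """Clip `clip` bases from both ends of every contig and drop those left shorter than min_len."""
    if clip < 0:
        raise ValueError("clip must be non-negative")
    trimmed = {}
    for name, seq in contigs.items():
        core = seq[clip:len(seq) - clip]  # empty if the contig is shorter than 2 * clip
        if len(core) >= min_len:
            trimmed[name] = core
    return trimmed


if __name__ == "__main__":
    # Hypothetical per-assembler clip lengths; tune to each assembler's weak spots.
    clips = {"assembler_A": 2, "assembler_B": 0}
    contig_sets = {
        "assembler_A": {"a1": "NNACGTACGTNN", "a2": "ACGT"},
        "assembler_B": {"b1": "TTGCAAGG"},
    }
    pooled = {}
    for assembler, contigs in contig_sets.items():
        for name, seq in trim_contig_ends(contigs, clips[assembler], min_len=5).items():
            # Prefix names with the source assembler so pooled IDs stay unique.
            pooled[f"{assembler}|{name}"] = seq
    print(pooled)
```

          The trimmed, pooled set would then be the input to the secondary assembly step (Velvet or otherwise).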

          Velvet does have numerous options to tweak, though, which I think gives it promise, and you can also try Oases, a layer on top of Velvet that is intended to handle splice variants. Marcel Schulz and Daniel Zerbino seem to have put together a very useful (and timely) toolsuite for this type of work. Kudos to them, and thanks as well for providing it as they continue perfecting it.
