  • de novo 454 assembly w/ newbler ... how long?

    I'm having some issues with a newbler assembly that I posted about in another forum (but probably should have posted here ... hopefully this isn't a ghost town!). Essentially, my concern is this: can someone give me an idea of how long their assembly runs with newbler have taken? I've got ~1.7 million reads (N50 ~250 bp) from a plant genome, and I have an assembly run that's going on ~65 hours now ... is that normal, or excessive?

    Any comments would be appreciated.

    ~Joe

  • #2
    Hi Joe,

    The difficulty you're having assembling plant 454 data is expected.

    Plant sequences are highly repetitive, and 454 assembly running time is proportional to the degree of repeats in the data set.

    Typically, for bacterial data of your size, it takes only a couple of hours to finish. For plants, it can go on for several days, or never finish at all and crash when it runs out of memory.

    If you pre-process the data to remove the repetitive reads, you may get results faster and maybe better contigs to start with. Generally, plants are tough on bioinformatics for de novo assembly.
    Last edited by hlu; 01-02-2009, 02:27 PM.

    • #3
      Hi,
      I've been working with much smaller genomes: bacterial, approx. 4.5 Mb in size, 1.6 million assembled reads. Using Newbler version 2.0, 64-bit, with the 'complex large genome' tab checked, it took approx. 40 min to perform the de novo assembly.

      As mentioned in the previous post, plant genomes are a lot more of a headache bioinformatically and require a hefty amount of processing time. But 65 h+ does seem a lot compared to the bacterial genome. Check with Roche, as newbler may be RAM dependent; upping the RAM may speed up the assembly?

      • #4
        Thanks Raj -- I should have noted that I think I sounded the alarm too soon; my runs are finishing in several days ... it just appeared for a while that there was no progress and I was unfamiliar with newbler's behavior. I'm using the '-m' flag to keep all reads in memory, which should speed up the runs ... and they appear to be maxing out at ~10G.

        I've also removed reads that blasted well to RepBase's various plant libraries, and am re-assembling, but unfortunately haven't been timing the assembly runs exactly ... if I get a chance to benchmark raw and no-repeat assemblies against each other, I'll try to post results here.
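A minimal sketch of the repeat-filtering step described above, assuming the reads are in FASTA format and were already searched against a repeat library with BLAST tabular output (`-outfmt 6`, standard 12 columns). The function names and the identity and e-value thresholds are illustrative assumptions, not the settings actually used in this thread.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)

def repeat_read_ids(blast_tab, max_evalue=1e-10, min_identity=80.0):
    """Collect IDs of reads with a strong hit in BLAST tabular output."""
    ids = set()
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Standard outfmt 6 columns: 0 = query ID, 2 = % identity, 10 = e-value
        if float(cols[2]) >= min_identity and float(cols[10]) <= max_evalue:
            ids.add(cols[0])
    return ids

def filter_reads(fasta, blast_tab, **thresholds):
    """Return the (header, sequence) records without a strong repeat hit."""
    drop = repeat_read_ids(blast_tab, **thresholds)
    return [(h, s) for h, s in read_fasta(fasta) if h.split()[0] not in drop]

# Toy data (hypothetical read and repeat names), standing in for real files.
demo_fasta = io.StringIO(
    ">read1 length=8\nACGTACGT\n>read2\nTTTTGGGG\n>read3\nCCCCAAAA\n"
)
demo_blast = io.StringIO(
    "read2\trepeatA\t95.0\t8\t0\t0\t1\t8\t1\t8\t1e-20\t50\n"
    "read3\trepeatB\t70.0\t8\t2\t0\t1\t8\t1\t8\t1e-3\t20\n"
)
kept = filter_reads(demo_fasta, demo_blast)
print([h.split()[0] for h, _ in kept])
```

With real data, the two `io.StringIO` objects would be replaced by open file handles. The thresholds decide how aggressively reads are discarded, so they are worth tuning against the repeat library in use.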

        • #5
          Hi,

          I recently got some 454 transcriptome sequencing data and want to analyze it with CLC Workbench. For the de novo assembly, I got hugely different results just by changing the minimum contig length, which has left me confused about how to proceed. Has anyone used the same setup, and what are the recommendations?

          • #6
            Originally posted by AAWT
            Hi,

            I recently got some 454 transcriptome sequencing data and want to analyze it with CLC Workbench. For the de novo assembly, I got hugely different results just by changing the minimum contig length, which has left me confused about how to proceed. Has anyone used the same setup, and what are the recommendations?
            Well, first, please don't hijack threads; open a new one.

            Concerning your concern :-) ... if you raise the minimum contig length to something longer than the length of a substantial number of contigs in your assembly, then of course this influences the overall result.
            E.g. if you have a lot of very short fragments of 300 bp and you raise the minimum contig length to 350 bp, then you will lose a lot of contigs.
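To make the effect of the cutoff concrete, here is a small sketch that counts how many contigs and bases survive a given minimum length. The contig lengths are made-up illustrative values, and `contig_summary` is a hypothetical helper, not part of CLC or any other assembler.

```python
def contig_summary(lengths, min_length):
    """Count contigs and total bases that survive a minimum-length cutoff."""
    kept = [n for n in lengths if n >= min_length]
    return {"min_length": min_length, "contigs": len(kept), "bases": sum(kept)}

# Hypothetical contig lengths in bp: many short 300 bp fragments, a few longer ones.
lengths = [300] * 6 + [320, 340, 400, 800, 1500]

for cutoff in (200, 350):
    print(contig_summary(lengths, cutoff))
```

In this toy set, raising the cutoff from 200 bp to 350 bp drops 8 of the 11 contigs, which is the kind of jump in summary statistics described above.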

            Did I get you right?

            hth, Sven

            • #7
              Originally posted by sklages
              Well, first, please don't hijack threads; open a new one.

              Concerning your concern :-) ... if you raise the minimum contig length to something longer than the length of a substantial number of contigs in your assembly, then of course this influences the overall result.
              E.g. if you have a lot of very short fragments of 300 bp and you raise the minimum contig length to 350 bp, then you will lose a lot of contigs.

              Did I get you right?

              hth, Sven
              Yes, losing some data is one concern. The other thing I did was a local BLAST of the de novo assembled contigs against an already available reference sequence. What I got was many contigs aligning to the same reference unigene, which suggests that many contigs contain the same sequence. So why is the same sequence included in so many contigs during de novo assembly? Does it mean that the assembly is very poor, or something else?

              • #8
                Originally posted by AAWT
                Yes, losing some data is one concern. The other thing I did was a local BLAST of the de novo assembled contigs against an already available reference sequence. What I got was many contigs aligning to the same reference unigene, which suggests that many contigs contain the same sequence. So why is the same sequence included in so many contigs during de novo assembly? Does it mean that the assembly is very poor, or something else?
                It does not necessarily mean that your contig sequences are identical;
                probably they are very similar, *almost* identical. Depending on the
                kind of assembler these are put together or, in your case, not.
                CLC is not really a cDNA de novo assembler and quality of the results
                obtained may vary.
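One way to look at this, sketched below under assumptions: group the BLAST hits by reference unigene and check whether different contigs overlap on the same stretch of the reference (possibly near-identical contigs that were not merged) or cover different stretches (a fragmented transcript). The input is assumed to be standard BLAST tabular output (`-outfmt 6`) with contigs as queries; the function names and toy IDs are hypothetical.

```python
import io
from collections import defaultdict

def hits_by_reference(blast_tab):
    """Map each reference ID to a list of (contig ID, start, end) on the reference."""
    groups = defaultdict(list)
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # outfmt 6: 0 = query (contig), 1 = subject (reference),
        # 8 and 9 = subject start/end (reversed for minus-strand hits)
        start, end = sorted((int(cols[8]), int(cols[9])))
        groups[cols[1]].append((cols[0], start, end))
    return groups

def overlapping_pairs(hits):
    """Return pairs of different contigs whose reference intervals overlap."""
    pairs = []
    for i, (c1, s1, e1) in enumerate(hits):
        for c2, s2, e2 in hits[i + 1:]:
            if c1 != c2 and s1 <= e2 and s2 <= e1:
                pairs.append((c1, c2))
    return pairs

# Toy hits: contigA and contigB overlap on uni1, contigC covers a separate region.
demo = io.StringIO(
    "contigA\tuni1\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-90\t500\n"
    "contigB\tuni1\t98.5\t351\t5\t0\t1\t351\t250\t600\t1e-95\t520\n"
    "contigC\tuni1\t99.5\t201\t1\t0\t1\t201\t900\t700\t1e-60\t350\n"
)
groups = hits_by_reference(demo)
for ref, hits in groups.items():
    print(ref, len(hits), overlapping_pairs(hits))
```

Overlap on the reference alone does not prove the contigs are redundant; aligning the overlapping contigs against each other would be the next check.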

                And, did you trim your data (polyA, potential adaptors)? This will influence
                your assembly as well.
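As a rough illustration of the polyA trimming mentioned above, the sketch below strips an exact run of A's from the 3' end of a read, or a run of T's from the 5' end. The minimum run length of 10 is an arbitrary assumption, and real tails often contain sequencing errors that an exact match like this will miss, so dedicated trimming tools are usually a better choice for production data.

```python
import re

def trim_poly_a(seq, min_run=10):
    """Remove a trailing run of at least `min_run` A's (case-insensitive)."""
    return re.sub(r"[Aa]{%d,}$" % min_run, "", seq)

def trim_poly_t(seq, min_run=10):
    """Remove a leading run of at least `min_run` T's (case-insensitive)."""
    return re.sub(r"^[Tt]{%d,}" % min_run, "", seq)

read = "ACGTGC" + "A" * 12
print(trim_poly_a(read))
print(trim_poly_a("ACGTAAA"))  # tail shorter than min_run, left unchanged
```

Adaptor removal would need the actual adaptor sequences used for the library, which are not given in this thread.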

                Last but not least, to give you a kind of feeling for your dataset,
                try to use another assembler, at least as a "reference assembly",
                e.g. Roche's Newbler or MIRA.
                However, if your dataset is huge and the library is not normalised you may
                run into problems with most straightforward assembly approaches.

                hth, Sven
