  • jnfass
    Member
    • Aug 2008
    • 88

    de novo 454 assembly w/ newbler ... how long?

    I'm having some issues with a newbler assembly that I posted about in another forum (but probably should have posted here ... hopefully this isn't a ghost town!). Essentially, though, my concern is this: can someone give me an idea of how long their assembly runs with newbler have taken? I've got ~1.7 million reads (N50 ~250 bp) from a plant genome, and I have an assembly run that's been going for ~65 hours now ... is that normal, or excessive?

    Any comments would be appreciated.

    ~Joe
  • hlu
    Member
    • Jan 2009
    • 32

    #2
    Hi Joe,

    The difficulty you're having assembling plant 454 data is expected.

    Plant sequences are highly repetitive, and the 454 assembly running time roughly scales with the degree of repetitiveness in the data set.

    Typically, bacterial data of your size takes only a couple of hours to finish. But for plants, it can go on for several days, or never finish at all and crash with an out-of-memory error.

    If you pre-process the data to remove the repetitive reads, it may help you get results faster, and maybe better contigs to start with. Generally, plants are tough for bioinformatics when it comes to de novo assembly.
    Last edited by hlu; 01-02-2009, 02:27 PM.
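The post doesn't say how to find the repetitive reads. One generic approach, swapped in here for illustration and not something hlu described, is to flag reads dominated by high-frequency k-mers. A minimal Python sketch, assuming the reads are already loaded into a dict of read ID to sequence; `k`, `max_count`, and `max_fraction` are hypothetical defaults that would need tuning to the genome and coverage. It counts k-mers on one strand only (no reverse-complement canonicalization) and keeps every k-mer in memory, so treat it as a toy for small read sets rather than something to run on 1.7 million reads.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of seq (uppercased), skipping any that contain N."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            yield kmer


def split_repetitive(
    reads: Dict[str, str],
    k: int = 15,               # hypothetical default
    max_count: int = 100,      # k-mers seen more often than this are "high-frequency"
    max_fraction: float = 0.5, # hypothetical cutoff on the share of such k-mers
) -> Tuple[List[str], List[str]]:
    """Return (kept_ids, repetitive_ids).

    A read is flagged as repetitive when more than max_fraction of its
    k-mers occur more than max_count times across the whole read set.
    """
    # Pass 1: count k-mers over all reads (forward strand only).
    counts: Counter = Counter()
    for seq in reads.values():
        counts.update(kmers(seq, k))

    # Pass 2: score each read by its share of high-frequency k-mers.
    kept: List[str] = []
    repetitive: List[str] = []
    for read_id, seq in reads.items():
        read_kmers = list(kmers(seq, k))
        if not read_kmers:  # read shorter than k: nothing to judge, keep it
            kept.append(read_id)
            continue
        high = sum(1 for km in read_kmers if counts[km] > max_count)
        if high / len(read_kmers) > max_fraction:
            repetitive.append(read_id)
        else:
            kept.append(read_id)
    return kept, repetitive
```

The IDs in `kept` would then be written back out as the reduced read set for the assembler.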


    • Raj
      Member
      • Jan 2009
      • 15

      #3
      Hi,
      I've been working with much smaller genomes: bacterial, approx. 4.5 Mb in size, 1.6 million assembled reads. Using Newbler version 2.0 (64-bit) with the 'complex large genome' option checked, the de novo assembly took approx. 40 min.

      As mentioned in the previous post, plant genomes are a lot more of a headache bioinformatically and require a hefty amount of processing time. But 65+ hours does seem a lot when compared to the bacterial genome. Check with Roche: newbler may be RAM-dependent, so upping the RAM may speed up the assembly?


      • jnfass
        Member
        • Aug 2008
        • 88

        #4
        Thanks Raj. I should have noted that I think I sounded the alarm too soon; my runs are finishing in several days ... it just appeared for a while that there was no progress, and I was unfamiliar with newbler's behavior. I'm using the '-m' flag to keep all reads in memory, which should speed up the runs ... and they appear to be maxing out at ~10 GB.

        I've also removed reads that blasted well to RepBase's various plant libraries, and am re-assembling, but unfortunately haven't been timing the assembly runs exactly ... if I get a chance to benchmark raw and no-repeat assemblies against each other, I'll try to post results here.
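The RepBase filter Joe describes could look something like this. A sketch assuming the reads were searched with BLAST+ using tabular output (`-outfmt 6`, whose default 12 columns put the alignment length in column 4 and the e-value in column 11). What counts as "blasted well" isn't stated, so the `max_evalue` and `min_align_len` cutoffs are hypothetical placeholders. Read IDs are matched on the first whitespace-delimited word of the FASTA header, which is how BLAST+ normally reports query IDs for plain FASTA input.

```python
from typing import Iterable, Iterator, Set, Tuple


def repeat_hit_ids(blast_lines: Iterable[str],
                   max_evalue: float = 1e-10,  # hypothetical cutoff
                   min_align_len: int = 50     # hypothetical cutoff
                   ) -> Set[str]:
    """Collect query IDs with at least one qualifying hit in -outfmt 6 output."""
    hits: Set[str] = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (-outfmt 7 style)
        fields = line.rstrip("\n").split("\t")
        query_id = fields[0]
        align_len = int(fields[3])  # column 4: alignment length
        evalue = float(fields[10])  # column 11: e-value
        if evalue <= max_evalue and align_len >= min_align_len:
            hits.add(query_id)
    return hits


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header_without_gt, sequence) pairs from FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_repeat_reads(fasta_lines: Iterable[str],
                      blast_lines: Iterable[str],
                      max_evalue: float = 1e-10,
                      min_align_len: int = 50) -> Iterator[Tuple[str, str]]:
    """Yield only the reads whose ID has no qualifying repeat-library hit."""
    hits = repeat_hit_ids(blast_lines, max_evalue, min_align_len)
    for header, seq in read_fasta(fasta_lines):
        if header.split()[0] not in hits:
            yield header, seq
```

Timing the assembler on the full read set and on the output of `drop_repeat_reads` would give the raw vs. no-repeat comparison mentioned above.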


        • AAWT
          Junior Member
          • Jun 2011
          • 6

          #5
          Hi,

          I recently got some 454 transcriptome sequencing data and want to analyze it with CLC Workbench. For the de novo assembly, I got hugely different results just by changing the minimum contig length, and this has confused me for further analysis. Has anyone done the same? What are the recommendations?


          • sklages
            Senior Member
            • May 2008
            • 628

            #6
            Originally posted by AAWT
            Hi,

            I recently got some 454 transcriptome sequencing data and want to analyze it with CLC Workbench. For the de novo assembly, I got hugely different results just by changing the minimum contig length, and this has confused me for further analysis. Has anyone done the same? What are the recommendations?
            Well, first, please don't hijack threads; open a new one.

            Concerning your concern :-) ... if you raise the minimum contig length to something longer than the length of a substantial number of contigs in your assembly, then of course this influences the overall result.
            E.g. if you have a lot of very short fragments of 300 bp and you raise the minimum contig length to 350 bp, then you will lose a lot of contigs.
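To put Sven's point in numbers, here is a small sketch that summarizes a set of contig lengths at different minimum-length cutoffs. The contig lengths in the example are made up for illustration, and N50 is computed the usual way: the contig length at which contigs of that size or longer cover at least half of the assembled bases.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs >= L cover at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths: List[int], min_len: int) -> Dict[str, int]:
    """Contig count, total bases and N50 after discarding contigs < min_len."""
    kept = [length for length in lengths if length >= min_len]
    return {"min_len": min_len, "contigs": len(kept),
            "bases": sum(kept), "n50": n50(kept)}


if __name__ == "__main__":
    # Hypothetical assembly: six short 300 bp contigs plus three longer ones.
    lengths = [300] * 6 + [800, 1200, 2500]
    for cutoff in (200, 350):
        print(summarize(lengths, cutoff))
```

Moving the cutoff from 200 to 350 bp here drops the contig count from 9 to 3 and the assembled bases from 6300 to 4500, while N50 jumps from 1200 to 2500. The assembly itself didn't change, only which contigs get reported.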

            Did I get you right?

            hth, Sven


            • AAWT
              Junior Member
              • Jun 2011
              • 6

              #7
              Originally posted by sklages
              Well, first, please don't hijack threads; open a new one.

              Concerning your concern :-) ... if you raise the minimum contig length to something longer than the length of a substantial number of contigs in your assembly, then of course this influences the overall result.
              E.g. if you have a lot of very short fragments of 300 bp and you raise the minimum contig length to 350 bp, then you will lose a lot of contigs.

              Did I get you right?

              hth, Sven
              Yes, losing some data is one concern. The other thing I did was a local BLAST of the de novo assembled contigs against an already available reference sequence. What I got: many contigs aligned to the same reference unigene, which is confusing because it suggests many contigs have the same sequence. Why would the same sequence be included in so many contigs during de novo assembly? Does it mean the assembly is very poor, or something else?


              • sklages
                Senior Member
                • May 2008
                • 628

                #8
                Originally posted by AAWT
                Yes, losing some data is one concern. The other thing I did was a local BLAST of the de novo assembled contigs against an already available reference sequence. What I got: many contigs aligned to the same reference unigene, which is confusing because it suggests many contigs have the same sequence. Why would the same sequence be included in so many contigs during de novo assembly? Does it mean the assembly is very poor, or something else?
                It does not necessarily mean that your contig sequences are identical;
                probably they are very similar, *almost* identical. Depending on the
                assembler, such sequences are merged into one contig or, in your case,
                kept apart. CLC is not really a cDNA de novo assembler, and the quality
                of the results obtained may vary.
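One way to check how many contigs land on each reference unigene is to group the local BLAST results by reference. A sketch assuming the contigs were searched with BLAST+ tabular output (`-outfmt 6`); it keeps the best hit per contig by bit score (column 12) and reports only the references hit by more than one contig. The input format is an assumption about how the local BLAST was run.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def best_hits(blast_lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each contig (query) to its highest-scoring subject and bit score."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        contig, subject = fields[0], fields[1]
        bitscore = float(fields[11])  # column 12: bit score
        if contig not in best or bitscore > best[contig][1]:
            best[contig] = (subject, bitscore)
    return best


def contigs_per_reference(blast_lines: Iterable[str]) -> Dict[str, List[str]]:
    """References that are the best hit of more than one contig."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for contig, (subject, _score) in best_hits(blast_lines).items():
        groups[subject].append(contig)
    return {ref: sorted(contigs) for ref, contigs in groups.items()
            if len(contigs) > 1}
```

For each reference it flags, comparing the subject coordinates (`sstart`/`send`, columns 9 and 10) of its contigs shows whether they cover the same region (near-identical contigs that were not merged) or different stretches of the reference (fragments of one transcript).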

                And, did you trim your data (polyA, potential adaptors)? This will influence
                your assembly as well.
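A minimal poly(A) trimming sketch in Python, assuming exact runs: it removes a 3' run of at least `min_run` A's and a 5' run of at least `min_run` T's (the same tail read from the other strand). `min_run = 10` is an arbitrary placeholder. It doesn't tolerate sequencing errors inside the tail and doesn't touch adaptors, whose sequences depend on the library kit; a dedicated trimming tool is the safer choice for real data.

```python
import re


def trim_poly_a(seq: str, min_run: int = 10) -> str:
    """Strip a 3' poly(A) tail and a 5' poly(T) head of at least min_run bases."""
    seq = seq.upper()
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)  # 3' poly(A)
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)  # 5' poly(T), reverse-strand tail
    return seq
```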

                Last but not least, to give you a kind of feeling for your dataset,
                try to use another assembler, at least as a "reference assembly",
                e.g. Roche's Newbler or MIRA.
                However, if your dataset is huge and the library is not normalised, you may
                run into problems with most straightforward assembly approaches.

                hth, Sven
