SEQanswers

12-22-2008, 09:57 AM   #1
jnfass
Member

Location: Davis, CA

Join Date: Aug 2008
Posts: 88
de novo 454 assembly w/ newbler ... how long?

I'm having some issues with a newbler assembly that I posted about in another forum (but probably should have posted here ... hopefully this isn't a ghost town!); essentially, though, my concern is this: can someone give me an idea of how long their assembly runs with newbler have taken? I've got ~1.7 million reads (N50 ~250 bp) from a plant genome, and I have an assembly run that's been going for ~65 hours now ... is that normal, or excessive?
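For readers unfamiliar with the N50 figure quoted above, here is a minimal sketch of how it is usually computed from a list of sequence lengths. The function name and the toy lengths are illustrative assumptions, not values from this data set.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy read lengths, made up for illustration only.
example_lengths = [100, 200, 250, 250, 300]
print(n50(example_lengths))
```

The same function works for contig lengths after assembly, which is the more common use of the statistic.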

Any comments would be appreciated.

~Joe
01-02-2009, 01:25 PM   #2
hlu
Member

Location: Branford, Connecticut

Join Date: Jan 2009
Posts: 32

Hi Joe,

Your difficulty assembling plant 454 data is expected.

Plant sequences are highly repetitive, and the 454 assembly running time is proportional to the degree of repeats in the data set.

Typically, bacterial data of your size takes only a couple of hours to finish. For plants, though, a run can go on for several days, or never finish at all and crash when it runs out of memory.

If you pre-process the data to remove the repetitive reads, you may get results faster, and maybe better contigs to start with. Generally, plants are tough on bioinformatics for de novo assembly.

Last edited by hlu; 01-02-2009 at 01:27 PM.
01-07-2009, 07:36 AM   #3
Raj
Member

Location: UK

Join Date: Jan 2009
Posts: 15

Hi,
I've been working with much smaller genomes: bacterial, approx. 4.5 Mb in size, 1.6 million assembled reads. Using Newbler version 2.0 (64-bit) with the 'complex large genome' option checked, the de novo assembly took approx. 40 min.

As mentioned in the previous post, plant genomes are a lot more of a headache bioinformatically and require a hefty amount of processing time. But 65+ hours does seem like a lot compared to the bacterial genome. Check with Roche, as newbler may be RAM dependent; increasing the RAM might speed up the assembly?
01-07-2009, 08:43 AM   #4
jnfass
Member

Location: Davis, CA

Join Date: Aug 2008
Posts: 88

Thanks Raj -- I should have noted that I think I sounded the alarm too soon; my runs are finishing in several days ... it just appeared for a while that there was no progress and I was unfamiliar with newbler's behavior. I'm using the '-m' flag to keep all reads in memory, which should speed up the runs ... and they appear to be maxing out at ~10G.

I've also removed reads that blasted well to RepBase's various plant libraries, and am re-assembling, but unfortunately haven't been timing the assembly runs exactly ... if I get a chance to benchmark raw and no-repeat assemblies against each other, I'll try to post results here.
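The repeat-filtering step described above could be sketched roughly as follows. This assumes the reads were exported to FASTA and searched against the repeat libraries with BLAST tabular output (12 standard columns); the thresholds, function names, and toy records are hypothetical, and the exact criteria used in this thread are not stated.

```python
def repeat_read_ids(blast_lines, min_identity=90.0, min_align_len=50):
    """Collect query IDs with at least one hit passing both thresholds.

    Expects BLAST tabular lines (12 standard columns): query ID in
    column 1, percent identity in column 3, alignment length in column 4.
    The default thresholds are illustrative assumptions.
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if float(fields[2]) >= min_identity and int(fields[3]) >= min_align_len:
            hits.add(fields[0])
    return hits


def filter_fasta(fasta_lines, drop_ids):
    """Yield FASTA lines, skipping records whose ID is in drop_ids."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            read_id = header[0] if header else ""
            keep = read_id not in drop_ids
        if keep:
            yield line


# Toy data, made up for illustration only.
blast = [
    "read1\tRepeatA\t95.0\t120\t6\t0\t1\t120\t1\t120\t1e-50\t200",
    "read2\tRepeatB\t80.0\t40\t8\t0\t1\t40\t1\t40\t1e-3\t40",
]
fasta = [">read1 some description\n", "ACGT\n", ">read2\n", "GGCC\n"]
kept = list(filter_fasta(fasta, repeat_read_ids(blast)))
print("".join(kept), end="")
```

In practice both inputs would be open file handles rather than lists; the functions only iterate over lines, so either works.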
06-20-2011, 05:39 AM   #5
AAWT
Junior Member

Location: Germany

Join Date: Jun 2011
Posts: 6

Hi,

I recently got some 454 transcriptome sequencing data and want to analyze it with CLC workbench. For the de novo assembly, I get hugely different results just by changing the minimum contig length, which confuses me for further analysis. Has anyone used the same setup, and what are the recommendations?
06-20-2011, 11:26 AM   #6
sklages
Senior Member

Location: Berlin, DE

Join Date: May 2008
Posts: 628

well, first, please don't hijack threads, open a new one.

Concerning your concern :-) ... if you raise the minimum contig length to something longer than the length of a substantial number of contigs in your assembly, then of course this influences the overall result.
E.g. if you have a lot of very short fragments of 300 bp and you raise the minimum contig length to 350 bp, then you will lose a lot of contigs.
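The effect of the cutoff can be checked directly on the contig lengths. The following is a small sketch with made-up lengths; the function name and thresholds are illustrative assumptions and have nothing to do with CLC's own reporting.

```python
def summarize_by_min_length(contig_lengths, thresholds):
    """For each threshold, report how many contigs and bases survive."""
    rows = []
    for threshold in thresholds:
        kept = [n for n in contig_lengths if n >= threshold]
        rows.append((threshold, len(kept), sum(kept)))
    return rows


# Toy contig lengths, made up for illustration only.
lengths = [280, 300, 300, 310, 320, 340, 400, 900, 1500]
for threshold, count, bases in summarize_by_min_length(lengths, [200, 300, 350]):
    print(f"min length {threshold}: {count} contigs, {bases} bp")
```

With many contigs sitting just above 300 bp, moving the cutoff from 300 to 350 removes most of them while the total number of bases drops far less, which is why contig counts swing so much.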

Did I get you right?

hth, Sven
06-20-2011, 11:58 PM   #7
AAWT
Junior Member

Location: Germany

Join Date: Jun 2011
Posts: 6

Yes, losing some data is one concern. The other thing I did was a local BLAST of the de novo assembled contigs against an already available reference sequence. What I got was many contigs aligning to the same reference unigene, which is confusing because it suggests that many contigs have the same sequence. Why would the same sequence be included in so many contigs during de novo assembly? Does it mean that the assembly is very poor, or something else?
06-21-2011, 12:13 AM   #8
sklages
Senior Member

Location: Berlin, DE

Join Date: May 2008
Posts: 628

It does not necessarily mean that your contig sequences are identical;
probably they are very similar, *almost* identical. Depending on the
kind of assembler, these are merged or, in your case, not.
CLC is not really a cDNA de novo assembler, and the quality of the results
obtained may vary.
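One way to look into this is to check where on the reference each contig aligns. Contigs that cover the same stretch are likely near-identical copies, while contigs that cover different stretches point to a fragmented transcript. Below is a rough sketch, assuming BLAST tabular output (12 standard columns) with contigs as queries; the function names, the 50 bp overlap cutoff, and the toy hits are hypothetical.

```python
from collections import defaultdict


def subject_intervals(blast_lines):
    """Map reference ID -> list of (contig ID, start, end) on the reference.

    Expects BLAST tabular lines (12 standard columns): subject ID in
    column 2, subject start/end in columns 9 and 10.
    """
    by_ref = defaultdict(list)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Minus-strand hits are reported with start > end, so sort them.
        start, end = sorted((int(fields[8]), int(fields[9])))
        by_ref[fields[1]].append((fields[0], start, end))
    return by_ref


def overlapping_pairs(intervals, min_overlap=50):
    """Return contig pairs whose reference intervals share >= min_overlap bases."""
    pairs = []
    for i in range(len(intervals)):
        for j in range(i + 1, len(intervals)):
            a_id, a_start, a_end = intervals[i]
            b_id, b_start, b_end = intervals[j]
            shared = min(a_end, b_end) - max(a_start, b_start) + 1
            if a_id != b_id and shared >= min_overlap:
                pairs.append((a_id, b_id, shared))
    return pairs


# Toy hits, made up for illustration only.
hits = [
    "contig1\tunigeneX\t99.0\t400\t4\t0\t1\t400\t1\t400\t0.0\t700",
    "contig2\tunigeneX\t98.5\t331\t5\t0\t1\t331\t350\t20\t1e-170\t580",
    "contig3\tunigeneX\t99.2\t300\t2\t0\t1\t300\t600\t899\t1e-150\t540",
]
for ref, intervals in subject_intervals(hits).items():
    for a_id, b_id, shared in overlapping_pairs(intervals):
        print(f"{ref}: {a_id} and {b_id} overlap by {shared} bp")
```

The pairwise comparison is quadratic in the number of contigs per reference, which is fine for a handful of hits per unigene but would need sorting by start position for large hit lists.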

And, did you trim your data (polyA, potential adaptors)? This will influence
your assembly as well.
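As a very simplified illustration of the polyA part of that trimming, the sketch below removes only an exact trailing run of A's. The function name and the 10-base minimum are assumptions; dedicated trimming tools also handle mismatches inside the tail, poly-T at the 5' end of reverse-strand reads, and adaptor sequences.

```python
def trim_poly_a(seq, min_run=10):
    """Remove a trailing run of A's (3' end) if it is at least min_run long."""
    stripped = seq.rstrip("Aa")
    if len(seq) - len(stripped) >= min_run:
        return stripped
    return seq


# Toy sequences, made up for illustration only.
print(trim_poly_a("ACGTACGT" + "A" * 15))  # long tail is removed
print(trim_poly_a("ACGTAAA"))              # short run is left alone
```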

Last but not least, to give you a kind of feeling for your dataset,
try to use another assembler, at least as a "reference assembly",
e.g. Roche's Newbler or MIRA.
However, if your dataset is huge and the library is not normalised you may
run into problems with most straightforward assembly approaches.

hth, Sven

Tags
454, assembly, de novo, newbler
