**hlu** · 01-02-2009, 02:25 PM

Hi Joe,

Your difficulty assembling plant 454 data is expected.

Plant sequences are highly repetitive. The 454 assembly running time is proportional to the degree of repeats in the data set.

Typically, for bacterial data of your size, it takes only a couple of hours to finish. But for plants, it can run for several days, or never finish at all and crash when it runs out of memory.

If you pre-process your data to remove the repetitive reads, it may help you get results faster and perhaps better contigs to start with. Generally, plants are tough on bioinformatics for de novo assembly.

**Raj** · 01-07-2009, 08:36 AM

Hi,
I've been working with much smaller genomes: bacterial, approx. 4.5 Mb in size, 1.6 million assembled reads. Using Newbler version 2.0 (64-bit) with the 'complex large genome' option selected, the de novo assembly took approx. 40 min.

As mentioned in the previous post, plant genomes are a lot more of a headache bioinformatically and require a hefty amount of processing time. But 65+ h does seem like a lot compared to the bacterial genome. Check with Roche, as Newbler may be RAM dependent; upping the RAM may speed up the assembly?

**jnfass** · 01-07-2009, 09:43 AM

Thanks Raj -- I should have noted that I think I sounded the alarm too soon; my runs are finishing in several days ... it just appeared for a while that there was no progress and I was unfamiliar with newbler's behavior. I'm using the '-m' flag to keep all reads in memory, which should speed up the runs ... and they appear to be maxing out at ~10G.

I've also removed reads that blasted well to RepBase's various plant libraries, and am re-assembling, but unfortunately haven't been timing the assembly runs exactly ... if I get a chance to benchmark raw and no-repeat assemblies against each other, I'll try to post results here.
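For anyone wanting to try the same repeat filtering, here is a minimal sketch of the idea, not the exact procedure I used. It assumes the reads were searched against the repeat library with BLAST tabular output (query ID, subject ID, percent identity, and alignment length in the first four columns) and that the reads are in FASTA format. The function names and the 90% identity / 50 bp cutoffs are placeholders made up for illustration.

```python
import io


def repeat_read_ids(blast_tab, min_identity=90.0, min_aln_len=50):
    """Collect IDs of reads with at least one strong hit in BLAST tabular output.

    Columns assumed: qseqid, sseqid, pident, length, ... (standard tabular order).
    The thresholds are illustrative assumptions, not recommended values.
    """
    ids = set()
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        if float(cols[2]) >= min_identity and int(cols[3]) >= min_aln_len:
            ids.add(cols[0])
    return ids


def filter_fasta(fasta_in, fasta_out, drop_ids):
    """Copy FASTA records to fasta_out, skipping reads whose ID is in drop_ids.

    Returns the number of records kept.
    """
    keep = True
    kept = 0
    for line in fasta_in:
        if line.startswith(">"):
            parts = line[1:].split()
            read_id = parts[0] if parts else ""
            keep = read_id not in drop_ids
            if keep:
                kept += 1
        if keep:
            fasta_out.write(line)
    return kept


if __name__ == "__main__":
    # Tiny made-up example: r1 hits a repeat strongly, r2 only weakly.
    blast = io.StringIO(
        "r1\trepA\t95.0\t120\t6\t0\t1\t120\t10\t129\t1e-50\t200\n"
        "r2\trepB\t80.0\t30\t6\t0\t1\t30\t5\t34\t1e-03\t40\n"
    )
    reads = io.StringIO(">r1 len=120\nACGT\n>r2 len=90\nGGCC\n>r3\nTTAA\n")
    out = io.StringIO()
    kept = filter_fasta(reads, out, repeat_read_ids(blast))
    print(kept, "reads kept")
    print(out.getvalue(), end="")
```

The filtered FASTA would then go into the assembler in place of the raw reads. How strict the cutoffs should be depends on the data, so treat them as something to tune.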

**AAWT** · 06-20-2011, 05:39 AM

Hi,

I recently got some 454 transcriptome sequencing data and want to analyze it with CLC workbench. For the de novo assembly, I get hugely different results just by changing the minimum contig length, which leaves me unsure how to proceed with further analysis. Has anyone used the same setup, and what settings would you recommend?

**sklages** · 06-20-2011, 11:26 AM

Well, first, please don't hijack threads; open a new one.

Concerning your concern :-) ... if you raise the minimum contig length to something longer than the length of a substantial number of contigs in your assembly, then of course this influences the overall result.
E.g. if you have a lot of very short fragments of 300 bp and you raise the minimum contig length to 350 bp, then you will lose a lot of contigs.
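To see the effect on your own assembly, something like the following rough sketch may help. It takes a list of contig lengths and reports how many contigs and bases survive a given cutoff, plus the N50. The helper names are made up for this example, and it assumes the cutoff keeps contigs whose length is greater than or equal to the minimum; check how your assembler defines it.

```python
def contigs_surviving(lengths, min_len):
    """Return (number of contigs, total bases) with length >= min_len."""
    kept = [n for n in lengths if n >= min_len]
    return len(kept), sum(kept)


def n50(lengths):
    """Length of the contig at which the running total, summed from the
    longest contig downwards, first reaches half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for n in ordered:
        running += n
        if running >= half:
            return n
    return 0


if __name__ == "__main__":
    # Made-up assembly: eight 300 bp fragments and two longer contigs.
    lengths = [300] * 8 + [1200, 2500]
    for cutoff in (200, 350):
        count, bases = contigs_surviving(lengths, cutoff)
        kept = [n for n in lengths if n >= cutoff]
        print(cutoff, count, bases, n50(kept))
```

With these toy numbers, moving the cutoff from 200 bp to 350 bp drops the assembly from 10 contigs to 2, while the N50 goes up because only the long contigs remain. Contig count and N50 are therefore only comparable between runs that use the same cutoff.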

Did I get you right?

hth, Sven

**AAWT** · 06-20-2011, 11:58 PM

Yes, losing some data is one concern. The other thing I did was a local BLAST of the de novo assembled contigs against an already available reference sequence. What I found is that many contigs align to the same reference unigene, which suggests that many contigs contain the same sequence. Why would the same sequence end up in so many contigs during de novo assembly? Does it mean that the assembly is very poor, or something else?

**sklages** · 06-21-2011, 12:13 AM

It does not necessarily mean that your contig sequences are identical;
probably they are very similar, *almost* identical. Depending on the
kind of assembler, such sequences are merged or, as in your case, kept apart.
CLC is not really a cDNA de novo assembler, and the quality of the results
obtained may vary.
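
One way to look at this more closely is to group the BLAST hits by reference sequence and check whether contigs hitting the same unigene cover the same stretch of it or different stretches. The sketch below assumes BLAST tabular output with the contig as query (column 1), the reference as subject (column 2), and subject start/end in columns 9 and 10. The function names are invented for this example. Overlapping hits would be consistent with near-identical contigs that were not merged; hits that sit side by side without overlap may simply be fragments of the same transcript.

```python
from collections import defaultdict


def contigs_per_reference(blast_tab):
    """Group hits by reference ID as sorted (start, end, contig) tuples."""
    hits = defaultdict(list)
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        contig, ref = cols[0], cols[1]
        # Subject coordinates are reversed for minus-strand hits, so sort them.
        start, end = sorted((int(cols[8]), int(cols[9])))
        hits[ref].append((start, end, contig))
    return {ref: sorted(intervals) for ref, intervals in hits.items()}


def overlapping_pairs(intervals):
    """Return pairs of different contigs whose hits overlap on the reference.

    Expects intervals sorted by start, as returned by contigs_per_reference.
    """
    pairs = []
    for i, (_, end1, contig1) in enumerate(intervals):
        for start2, _, contig2 in intervals[i + 1:]:
            if start2 <= end1 and contig1 != contig2:
                pairs.append((contig1, contig2))
    return pairs


if __name__ == "__main__":
    import io

    # Made-up hits: c1 and c2 overlap on ref1, c3 covers a separate region.
    blast = io.StringIO(
        "c1\tref1\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-90\t500\n"
        "c2\tref1\t98.0\t351\t7\t0\t1\t351\t600\t250\t1e-90\t520\n"
        "c3\tref1\t99.5\t201\t1\t0\t1\t201\t700\t900\t1e-60\t350\n"
    )
    for ref, intervals in contigs_per_reference(blast).items():
        print(ref, len(intervals), "hits", overlapping_pairs(intervals))
```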

And, did you trim your data (polyA, potential adaptors)? This will influence
your assembly as well.
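
As a very rough illustration of what polyA trimming does, here is a minimal sketch using a regular expression. It only removes an uninterrupted run of A at the 3' end or of T at the 5' end. The function names and the minimum run length of 10 are assumptions for this example, and dedicated trimming tools handle sequencing errors inside the tail and adaptor sequences far better.

```python
import re


def trim_poly_a_tail(seq, min_run=10):
    """Remove a trailing run of at least min_run A's (case-insensitive)."""
    match = re.search(r"A{%d,}$" % min_run, seq.upper())
    return seq[:match.start()] if match else seq


def trim_poly_t_head(seq, min_run=10):
    """Remove a leading run of at least min_run T's, as seen on reads
    sequenced from the opposite strand."""
    match = re.match(r"T{%d,}" % min_run, seq.upper())
    return seq[match.end():] if match else seq


if __name__ == "__main__":
    read = "ACGTACGT" + "A" * 12
    print(trim_poly_a_tail(read))        # tail of 12 A's is removed
    print(trim_poly_a_tail("ACGTAAAA"))  # run shorter than min_run is kept
```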

Last but not least, to give you a feel for your dataset,
try to use another assembler, at least as a "reference assembly",
e.g. Roche's Newbler or MIRA.
However, if your dataset is huge and the library is not normalised, you may
run into problems with most straightforward assembly approaches.

hth, Sven

de novo 454 assembly w/ newbler ... how long?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News