SEQanswers

07-10-2009, 03:14 PM   #1
RajAgainstTheMachine
Junior Member
Location: San Francisco, CA
Join Date: Jul 2009
Posts: 6
Newbler Assembly on Chloroplast Genome

We have sequenced a chloroplast genome of approximately 150 kbp. We generated about 100k 454 Titanium reads, for approximately 30 million bases and 200x depth. However, when we run the assembly in Newbler (default parameters) we get roughly 10k contigs, and the largest contig is ~15 kbp. My questions are:

1. What can we do to improve this assembly? What are the typical steps in this kind of situation?

2. Are these results typical? With this kind of depth, we were expecting a complete, or nearly complete, assembly result.

Thanks very much in advance for your help.
07-10-2009, 03:18 PM   #2
ECO
--Site Admin--
Location: SF Bay Area, CA, USA
Join Date: Oct 2007
Posts: 1,358

Winner of the SEQanswers username competition!
07-10-2009, 07:21 PM   #3
kmcarr
Senior Member
Location: USA, Midwest
Join Date: May 2008
Posts: 1,177

RATM,

The answers to both of your questions are interrelated. The answer to question 1 is to reduce the amount of data you are putting into the assembly. Assembly of 454 reads with newbler works best when the coverage is ~25-35X. 30-fold coverage of a 150 kbp Cp genome would be ~4,500 kbp of sequence. I am assuming that you have the SFF file from the run and have access to the sff tools which come with the Roche 454 software. There is a program called sfffile which can manipulate, merge, split and subset sff files. To create a random subset of your data the command would be:
Code:
%sfffile -pick 4500k -o my_subset.sff my_input.sff
This will take your sff file containing all of your data (my_input.sff), and randomly select a number of reads from that file such that the total number of bases is approximately 4,500kbp; this subset will be saved in a new sff file named my_subset.sff.
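The arithmetic behind the 4500k figure can be sketched in a few lines of Python. This is only an illustration: the helper names target_bases and pick_argument are made up for this example, and the "k" suffix simply mirrors the style of the command above.

```python
def target_bases(genome_size_bp: int, coverage: float) -> int:
    """Number of bases needed to reach the given fold coverage."""
    return round(genome_size_bp * coverage)


def pick_argument(bases: int) -> str:
    """Format a base count in the 'NNNNk' style used in the command above."""
    return f"{round(bases / 1000)}k"


genome_size_bp = 150_000   # approximate Cp genome size from the first post
total_bases = 30_000_000   # approximate yield from the first post

# Depth of the full data set (about 200x, as stated in the first post).
print(total_bases / genome_size_bp)

# Value to pass to -pick for ~30x coverage.
print(pick_argument(target_bases(genome_size_bp, 30)))
```

The same two functions give the value for any other coverage in the 25-35X range by changing the second argument of target_bases.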

Try making a number of these random subsets of data and running them through newbler. You can also try varying the size of the subsets from 20X-40X coverage to see what effect that has. I'll bet that the assemblies you get from these smaller data sets will be better than from the whole data set.
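One way to organise such a series is to generate the sfffile command lines with a small script and then run them by hand or from a shell loop. The sketch below only builds the command strings; it uses just the -pick and -o options shown above. The output file names and the function subset_commands are hypothetical, and it assumes that each sfffile run draws a fresh random subset, which is worth checking on your own installation.

```python
import shlex


def subset_commands(input_sff, genome_size_bp, coverages, replicates):
    """Build one sfffile command line per (coverage, replicate) pair."""
    commands = []
    for cov in coverages:
        # Bases needed for this coverage, in the 'NNNNk' style used above.
        pick = f"{round(genome_size_bp * cov / 1000)}k"
        for rep in range(1, replicates + 1):
            out = f"subset_{cov}x_rep{rep}.sff"  # hypothetical naming scheme
            commands.append(
                shlex.join(["sfffile", "-pick", pick, "-o", out, input_sff])
            )
    return commands


# 20X to 40X in steps of 5, three random subsets at each depth.
for cmd in subset_commands("my_input.sff", 150_000, range(20, 45, 5), 3):
    print(cmd)
```

Each printed line can then be assembled separately in newbler and the contig counts and largest contig sizes compared across depths.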

You will probably never get to a single contig in the first pass of assembly. Most Cp genomes have some repetitive sequences and any assembler will break an assembly when it cannot unambiguously place reads which lie in repetitive regions. This is where finishing comes in.
07-12-2009, 10:44 PM   #4
Tuxido
Member
Location: Nijmegen, Netherlands
Join Date: Jun 2009
Posts: 22

We also assembled a bacterial genome once, with more than 100x coverage. The initial assemblies produced many small fragments. The solution was, as kmcarr says, to split the data into random sets, assemble these first, and then use the contigs from these assemblies as the input for another assembly.
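A rough sketch of the splitting step, assuming you already have a list of read names (how you extract them from the SFF file is not shown here). The function name partition_reads is made up for this example, and this is one possible way to do it rather than the exact procedure we used. Unlike repeated random picks, it puts every read into exactly one set.

```python
import random


def partition_reads(read_ids, n_subsets, seed=0):
    """Shuffle read names and deal them into n_subsets disjoint groups."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(read_ids)
    rng.shuffle(shuffled)
    # Every n-th read goes to the same group, so group sizes differ by at most 1.
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


# Placeholder read names; a 200x data set split six ways gives roughly 33x each.
reads = [f"read{i}" for i in range(100)]
for subset in partition_reads(reads, 6):
    print(len(subset))
```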

However, at that time we could not perform the last step with Newbler, because the maximum length for a "read" (in this case a contig from a subset) was 2000 bp. Maybe they've changed this with the last software update.
07-13-2009, 09:59 AM   #5
RajAgainstTheMachine
Junior Member
Location: San Francisco, CA
Join Date: Jul 2009
Posts: 6

Thanks very much for the advice. I am trying a series of assemblies with smaller data sets as we speak.

If this works, my plan is to use Arachne to assemble the larger contigs that Newbler creates. Any thoughts on this strategy? Also, what are the better finishing programs out there? I am familiar with Consed and Autofinish. Anything better than that?
07-13-2009, 11:47 PM   #6
sklages
Senior Member
Location: Berlin, DE
Join Date: May 2008
Posts: 628

Arachne is probably not a good choice for assembling contig sequences created by newbler. In general it is a good idea, as mentioned by others, to create a good initial assembly of the data. This can be done by either reducing the overall coverage of your input data, as recommended, or using a different assembler for your data.

If you want to use Arachne, use it on your input data (SFF), but I don't know if it can handle NGS data; an alternative to Arachne would be Celera Assembler [1], which can handle Titanium data.

A very good alternative for a project of this size would be the MIRA assembler [2]; you should give it a try ...

For finishing we are using either Consed (in most cases, large projects) or Gap4 (smaller projects like fosmids and/or BACs).
Gap5 is already available in a very early release; it is good for testing/playing but not yet for production use. [3]

IMHO there is no good alternative to these two packages; if I am wrong, let me know.

cheers,
Sven

[1] = http://wgs-assembler.sourceforge.net/
[2] = http://chevreux.org/projects_mira.html
[3] = http://sourceforge.net/projects/staden/files/
07-14-2009, 10:41 AM   #7
RajAgainstTheMachine
Junior Member
Location: San Francisco, CA
Join Date: Jul 2009
Posts: 6

Thanks to everyone for the assistance. All this help is almost as valuable as winning the SEQanswers username competition.

Tags
454, assembly, chloroplast, newbler
