SEQanswers

12-13-2010, 11:52 AM   #1
azroger
Junior Member
 
Location: Tucson, AZ

Join Date: Oct 2010
Posts: 7
Plantagora

Hi All-

I'm new here, but I thought it would be good to let people know about Plantagora, a project I've been part of for the past year. Its purpose is to find the best approaches to sequencing a new genome using next-gen sequencing and whole-genome assembly. It is oriented towards plant genomes, but for the most part the information and tools apply to all species. Its inspiration was the realization that even with a lot of good sequencing coverage, it can still be difficult or impossible to come up with a good genome sequence.

For the Plantagora project, we created simulated reads modeling those from the Illumina or 454 sequencing platforms. The source of the sequences was primarily rice chromosome 1, but we also used some whole plant genomes. We used several different assemblers, depending on the data, e.g. Newbler, ABySS, and SOAPdenovo. The resulting assemblies are evaluated using a very long list of metrics: some are statistics about the contigs and scaffolds, and others are derived by aligning the assembly back to the original sequence to measure its fidelity.

The results of these studies, of which there are thousands, are entered into a database that is available for download. There is also a graphing tool, so that you can generate custom graphs from the data. The tools used to create the data are also posted. All of this is more or less available now on our new website, plantagora.org (http://www.plantagora.org/). We hope people will make use of it, because that's what it's there for! The project was funded by NSF, but is now being taken over by the iPlant Collaborative, another NSF-funded project. It should be of great use to anyone considering a new genome sequencing project or working on whole-genome assembly.
04-14-2011, 12:13 PM   #2
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76
Nice tool

Plantagora is a really useful tool for simulations. I want to use the scripts for de novo assembly of a genome, though. If I provide the assembly_run.sh script with my own sets of reads, would it produce a regular assembly?

Also, does Plantagora use ABySS for hybrid Illumina+454 assembly? How do I set it up for that?

Thanks
Flobpf
04-18-2011, 11:27 AM   #3
azroger
Junior Member
 
Location: Tucson, AZ

Join Date: Oct 2010
Posts: 7

Hi-
Thanks for the comment. The assembly_run.sh script (which I didn't write) was designed to work with the Plantagora datasets to run multiple assembly runs. It may be most useful to look at the script and (if you can; I'm not exactly an expert at this) either edit it to your needs or try it on some of your own datasets with your own inputs and settings. In the end, though, if you're not doing a lot of different runs, you can take the commands as they are written in the script, put in your own settings, and run the assemblies directly.

For example, for ABySS the command in there is: time mpirun -np 4 abyss-pe $params name=$header. You can leave out the time command if you don't want to time the run, and in some cases you may not want or need to use MPI for a parallel run (which in this example is set to run on 4 processors). You have to have Open MPI installed to do it. I have been studying ABySS, and it uses a lot of subprograms, one of which is abyss-pe. ABySS has to be installed and abyss-pe has to be in your PATH to run the command. Beyond that, you can try running it with just --help and it will tell you about the options. The options set in the file as it is distributed on the website are (I think) -j2 n=2 k=$k, where k is the k-mer size, which is something you may want to optimize because it can make a big difference. You may already know a lot of this, but some of it is not too obvious when you first look at ABySS.
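As a rough sketch of running the commands directly while trying several k-mer sizes, something like the following could work. The read file names, the output prefix, and the k values are placeholders rather than Plantagora settings, and the helper functions build_abyss_cmd and sweep_k are made up for illustration. The script only prints the abyss-pe commands, so nothing is assembled until you choose to run them.

```shell
#!/bin/sh
# Sketch only: print one abyss-pe command per k-mer size.
# File names, prefix and k values below are placeholders (assumptions),
# not the settings used by Plantagora's assembly_run.sh.

build_abyss_cmd() {
    # $1 = k-mer size, $2 = output name, $3 = space-separated read files
    printf "abyss-pe -j2 n=2 k=%s name=%s in='%s'\n" "$1" "$2" "$3"
}

sweep_k() {
    # $1 = name prefix, $2 = read files, remaining arguments = k values to try
    prefix=$1
    reads=$2
    shift 2
    for k in "$@"; do
        build_abyss_cmd "$k" "${prefix}_k${k}" "$reads"
    done
}

# Prints three commands, one per k; nothing is executed here.
sweep_k myasm "reads_1.fq reads_2.fq" 25 31 41
```

Piping the output to sh would run the assemblies one after another. Giving each run its own name keeps the output files of different k values apart, so the resulting assemblies can be compared afterwards.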

The interesting thing about abyss-pe is that it is a makefile. It can be edited, and you can also run the commands it uses independently, because it really just runs through a series of commands that invoke other subprograms. Those also have to be in your PATH for abyss-pe to run properly; they are in a bunch of subfolders of the ABySS install. I believe the default command series will be printed out if you give it the option --dry-run. You can break down the commands and even replace some of them with other aligners or mappers, like Bowtie. At this point I'm still trying to figure out how best to use this.
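To see which subprograms a given abyss-pe invocation would call, the dry-run output can be summarized with a small filter. This is a sketch under assumptions: list_programs is a hypothetical helper, the k value and file names are placeholders, and it simply takes the first word of each printed command, which will not be exact for lines that are shell constructs or continuations.

```shell
#!/bin/sh
# Sketch only: summarize which programs a dry run of abyss-pe would invoke.
# list_programs is a made-up helper; k, name and in= values are placeholders.

list_programs() {
    # Reads command lines on stdin and prints the distinct first words, sorted.
    awk 'NF { print $1 }' | LC_ALL=C sort -u
}

# Only attempt the dry run if abyss-pe is actually on the PATH.
if command -v abyss-pe >/dev/null 2>&1; then
    abyss-pe --dry-run k=31 name=test in='reads_1.fq reads_2.fq' | list_programs
else
    echo "abyss-pe not found in PATH; install ABySS first" >&2
fi
```

Each program name in the list has to be findable in the PATH for the real run to get through, so this is also a quick way to check an install before starting a long assembly.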

In any case, Plantagora uses ABySS for the hybrid Illumina+454 assemblies, and some of them produce scaffolds longer than 100,000 bp, although the scaffold N50s are considerably lower than this. ABySS is one of the few assemblers that can readily make use of the combined data. I have been told by another group that you can convert Illumina reads to .sff files and use them with Newbler. They have had trouble running the combination so far, because its memory usage is really heavy. I don't know how efficiently Newbler can use the shorter reads, either. It does not use a de Bruijn graph or k-mers, as the short-read assemblers generally do. But it may work fine under some conditions.
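To illustrate why the largest scaffold can be much longer than the scaffold N50, here is a minimal sketch that computes N50 from a list of sequence lengths. The function names fasta_lengths and n50 and the example lengths are invented for illustration; this is not the Plantagora metrics code. N50 is taken here as the length of the sequence at which the running total, going from longest to shortest, first reaches half of the total assembly length.

```shell
#!/bin/sh
# Sketch only: N50 from sequence lengths. The example numbers are made up.

fasta_lengths() {
    # Prints one length per FASTA record (reads files given as arguments, or stdin).
    awk '/^>/ { if (seq != "") print length(seq); seq = ""; next }
         { seq = seq $0 }
         END { if (seq != "") print length(seq) }' "$@"
}

n50() {
    # Reads one length per line on stdin and prints the N50.
    sort -rn | awk '{ len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                run += len[i]
                if (run * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Seven scaffolds totalling 250,000 bp; the longest is 100,000 bp.
printf '%s\n' 100000 40000 30000 30000 20000 20000 10000 | n50   # prints 40000
```

For a real assembly the two helpers can be chained, for example fasta_lengths scaffolds.fa | n50, where scaffolds.fa stands for whatever scaffold file the assembler wrote.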
04-18-2011, 11:34 AM   #4
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76

Thanks Roger. That answers a lot of my questions.
Tags
abyss, genome assembly, metrics, next gen sequencing, plant
