SEQanswers

Old 08-06-2010, 06:09 AM   #1
yh_gu
Junior Member
 
Location: china

Join Date: Apr 2010
Posts: 4
contig assembly

Hello,

We have de novo assembled our RNA-Seq reads (about 50 million 275 reads) into contigs using several de novo assemblers with different parameters. Most of the contigs we got were very short because of poor sequencing quality and low sequencing depth. The contigs from each assembler and parameter set differ from each other, but some of them overlap, so they might be assembled into longer contigs. The problem is that we cannot merge millions of contigs into supercontigs manually. Moreover, our computing resources are very limited (12 GB RAM, 8 CPU cores, 500 GB of disk space). Does anyone know how to assemble these contigs into longer contigs with such limited resources, and which software could handle the task?

Thanks

YH-GU

Last edited by yh_gu; 08-07-2010 at 01:57 AM.
Old 08-06-2010, 06:20 AM   #2
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116

If you have enough memory to assemble reads into contigs then you clearly have enough memory to assemble contigs into supercontigs, as that is an easier feat.

When an assembler produces contigs that ostensibly overlap and yet remain separate there is some likely path ambiguity that has not been resolved. It sounds like you'll need more paired-end sequence for your assembly to coalesce.
__________________
--
Jeremy Leipzig
Bioinformatics Programmer
--
Old 08-06-2010, 12:56 PM   #3
natstreet
Member
 
Location: Sweden

Join Date: Nov 2009
Posts: 83

The last point is certainly true, but can anyone recommend the most suitable software tool for the task? I am currently looking to do something similar and so far have only tried PAVE (which died with an error message that I'm tracking down). In my case I have performed de novo assembly on a number of genotypes of the same species, and now I want to merge those assemblies, identify SNPs, and see whether longer ESTs can be made by merging contigs across the per-genotype assemblies.

So, what are the current favourite tools for merging large numbers of contigs coming from de novo transcript assemblies of short read data?
Old 08-07-2010, 02:06 AM   #4
yh_gu
Junior Member
 
Location: china

Join Date: Apr 2010
Posts: 4

Quote:
Originally Posted by Zigster
When an assembler produces contigs that ostensibly overlap and yet remain separate there is some likely path ambiguity that has not been resolved. It sounds like you'll need more paired-end sequence for your assembly to coalesce.
The overlaps I mentioned are mainly between contigs produced by different assemblers. So we want to find suitable software to assemble them into longer contigs.
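
As a rough illustration of the first step this implies, here is a minimal Python sketch that pools contigs from several assemblies into one set before any secondary assembly. The labels (`asmA`, `asmB`), the `min_len` cutoff, and the function names are hypothetical, not part of any assembler. It only removes contigs that are exactly identical on either strand; it does not merge overlapping ones.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def canonical(seq):
    """Return the smaller of a sequence and its reverse complement,
    so that both strands of the same contig map to one key."""
    rc = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, rc)


def pool_contigs(sources, min_len=100):
    """Pool contigs from several assemblies.

    sources: dict mapping an assembly label (hypothetical) to an open
    FASTA handle. Contigs shorter than min_len are dropped and exact
    duplicates (either strand) are kept only once, under the label of
    the first assembly that produced them.
    """
    seen = set()
    pooled = []
    for label, handle in sources.items():
        for name, seq in read_fasta(handle):
            if len(seq) < min_len:
                continue
            key = canonical(seq)
            if key in seen:
                continue
            seen.add(key)
            pooled.append((f"{label}|{name}", seq))
    return pooled
```

With millions of contigs, storing whole sequences in `seen` costs memory; storing a hash digest of each canonical sequence instead would be one way to reduce that.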
Old 08-08-2010, 08:08 AM   #5
jmw86069
Member
 
Location: RTP, NC, USA

Join Date: Jun 2009
Posts: 28

Velvet seems to work fairly well with contig assemblies in my hands, though as Zigster pointed out, assembly path ambiguity will ultimately prevent you from using as many productive overlaps as you'd expect, because there will be discrepancies across assemblers in just the wrong places in each contig.
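
To make the "discrepancies in just the wrong places" point concrete, here is a toy Python sketch (not how Velvet works internally; the function name and sequences are made up for illustration). It looks for the longest exact suffix-prefix overlap between two contigs and shows that a single differing base at the end of one contig removes the overlap entirely.

```python
def suffix_prefix_overlap(a, b, min_overlap=3):
    """Length of the longest suffix of `a` that exactly equals a
    prefix of `b`; 0 if no overlap of at least min_overlap exists."""
    max_len = min(len(a), len(b))
    # Try the longest candidate first and stop at the first exact match.
    for k in range(max_len, max(min_overlap, 1) - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


contig_a = "GGATTACAGATTC"      # ends with CAGATTC
contig_b = "CAGATTCGGTA"        # starts with CAGATTC
contig_a_err = "GGATTACAGATTA"  # same contig, last base differs

clean = suffix_prefix_overlap(contig_a, contig_b)
broken = suffix_prefix_overlap(contig_a_err, contig_b)
```

Here `clean` is 7 and `broken` is 0: the seven-base overlap disappears because of one mismatch at the contig end.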

It may be interesting to "go conservative" with different assemblers' contigs by trimming away their weak points. For example, try trimming away low-quality ends of contigs to avoid including ambiguous sequence spans in your secondary assembly. Otherwise you'd expect to get good overlaps in the middle of the contigs but poor alignments at the ends. Each assembler has its own problem areas, though, so you may want to deal with each one in its own way. At some point we (the collective) should put together some cross-assembler lessons learned, and maybe pre-configurations that help tools like Velvet use each assembler's strengths more natively.
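
A minimal Python sketch of the trimming idea above, under a simplifying assumption: FASTA contigs usually carry no per-base quality, so this clips a fixed number of bases from both ends instead of trimming by quality. The `trim` and `min_len` values and the function names are hypothetical and would need tuning per assembler.

```python
def trim_contigs(contigs, trim=10, min_len=100):
    """Clip `trim` bases from both ends of each contig and drop
    contigs that end up shorter than min_len.

    contigs: iterable of (name, sequence) pairs.
    """
    for name, seq in contigs:
        if trim > 0:
            seq = seq[trim:len(seq) - trim]
        if len(seq) >= min_len:
            yield name, seq


def write_fasta(contigs, handle, width=60):
    """Write (name, sequence) pairs as FASTA, wrapping lines at `width`."""
    for name, seq in contigs:
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

The trimmed output could then be given to a secondary assembler as long-read input. Trimming too aggressively also shortens genuine overlaps, so the clip length is a trade-off.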

Velvet does have numerous options to tweak, though, which I think gives it promise. You can also try Oases, a layer on top of Velvet that is intended to handle splice variants. Marcel Schulz and Daniel Zerbino seem to have put together a very useful (and timely) tool suite for this type of work. Kudos to them, and thanks as well for providing it while they continue perfecting it.
Tags
de novo assembly, rna-seq
