SEQanswers

Old 04-07-2010, 10:28 PM   #1
strob
Member
 
Location: Belgium

Join Date: Nov 2008
Posts: 79
Default velvet test

Dear all,

I just received one flow cell of Illumina plant sequencing data (75 bp PE reads, ~2 x 26 million reads/lane). As in most cases we don't have a reference sequence available, I will have to do some de novo assembly.
I started doing some tests using Velvet. First I did an assembly on 1 lane, then I combined 2 lanes. In the attachment you can see the virtual memory used during this process (I only show the velvetg part, as this was the hardest step). The horizontal axis shows the time needed and the vertical axis the RAM used, in GB. As you can see, the more datasets you combine, the more memory you need (the requirement is additive). As I have 14 sets to combine, I will never be able to perform this assembly using Velvet (2 lanes already required 91 GB of RAM).
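
To put a rough number on that, here is a minimal sketch that extrapolates linearly from the 91 GB observed for 2 lanes. The function name `estimate_ram_gb` is made up for illustration, and the calculation assumes that memory grows strictly in proportion to the number of datasets and that each of the 14 sets behaves like one lane; neither assumption has been verified.

```shell
# Linear extrapolation of velvetg memory use (assumption: strictly additive).
estimate_ram_gb() {
    # $1 = observed RAM in GB, $2 = datasets in that run, $3 = datasets planned
    echo $(( $1 * $3 / $2 ))
}

# 91 GB for 2 lanes, scaled to 14 sets
estimate_ram_gb 91 2 14
```

Under those assumptions the full run would need several hundred GB, which is why splitting the work or switching tools is worth considering.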

My questions:
- Is there another way to use Velvet (to reduce this memory issue)?
- Are there other (well performing) assembly tools that use less memory? (I tested the CLCBio assembly tool and this one requires much less memory. But, this is of course a commercial tool)
- All suggestions are welcome

Thanks

Steven
Attached Images
File Type: jpg Untitled-1.jpg (69.2 KB, 71 views)
Old 04-08-2010, 11:44 AM   #2
wangchy
Junior Member
 
Location: Auburn

Join Date: Mar 2010
Posts: 3
Default

What kind of data are you working with: transcriptome or random genome sequencing?
Old 04-11-2010, 11:32 PM   #3
strob
Member
 
Location: Belgium

Join Date: Nov 2008
Posts: 79
Default

Genome sequencing.
Old 04-12-2010, 01:31 PM   #4
Jonathan
Member
 
Location: Germany

Join Date: Jun 2009
Posts: 36
Default

Well, you could potentially assemble each lane de novo on its own,
and then add the resulting contigs as 'long' sequence type for subsequent runs.

Just an idea; I have yet to try it myself.
The only problem I see: you lose coverage information for resolving 'bubbles'.
I'm not sure exactly how 'long' type sequence data is handled in the Velvet algorithm...
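
A rough sketch of what this two-stage idea could look like on the command line. It is untested on real data; the lane names, the `asm_*` directory names, the k-mer length of 31 and the 200 bp insert length are all placeholders, and the `run` helper only prints the commands unless `DRY_RUN=0` is set.

```shell
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 also executes them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

K=31   # hypothetical k-mer length

assemble_lane() {
    # $1 = lane name; expects interleaved paired-end reads in <lane>.fastq
    run velveth "asm_$1" "$K" -fastq -shortPaired "$1.fastq"
    run velvetg "asm_$1" -exp_cov auto -ins_length 200
}

merge_lanes() {
    # Feed each lane's contigs.fa back into Velvet as 'long' reads.
    contigs=""
    for lane in "$@"; do
        contigs="$contigs asm_$lane/contigs.fa"
    done
    # $contigs is left unquoted on purpose so it splits into one word per file
    run velveth asm_merged "$K" -fasta -long $contigs
    run velvetg asm_merged
}

assemble_lane lane1
assemble_lane lane2
merge_lanes lane1 lane2
```

Each per-lane run only has to hold one lane in memory, at the cost of the coverage information mentioned above.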

best
-Jonathan
Old 04-12-2010, 11:09 PM   #5
jvhaarst
Member
 
Location: Netherlands

Join Date: Sep 2008
Posts: 13
Default Curtain

You could give Curtain a try.
From the wiki:
Curtain is a Java wrapper around next-generation assemblers such as Velvet which allows the incremental introduction of read-pair information into the assembly process. This enables the assembly of larger genomes than would otherwise be possible within existing memory constraints.
Old 04-25-2010, 10:06 PM   #6
Haneko
Member
 
Location: Singapore

Join Date: Jan 2010
Posts: 36
Default

Hi,

I'm also trying out velvet with Illumina paired-end reads. A few problems I'm facing right now:

1. What should the input be like? I have read 1 and read 2 for the paired-end reads. Do I combine them into one file?

2. How do I run velvetg? The manual states that if the insert size is not specified, it will attempt to measure it for me. If that's the case, do I still need to put -ins_length in my command? I know neither the expected coverage nor the insert size.

3. While testing, I can't seem to direct the console output of velvet into any file. So for example:
velvetg velvet_data/ -exp_cov auto -min_contig_lgth 100 &> velvetg.out

This gives no output whatsoever, either in the velvetg.out file or on the console.
Old 04-26-2010, 10:16 PM   #7
Jonathan
Member
 
Location: Germany

Join Date: Jun 2009
Posts: 36
Default

Quote:
Originally Posted by Haneko
1. What should the input be like? I have read 1 and read 2 for the paired-end reads. Do I combine them into one file?
If you had read the manual, you'd know:
Velvet expects paired-end data to be in an interleaved format,
i.e.:
read1
read1pe
read2
read2pe
....

There's a tool/script for this shipped with velvet.
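
For illustration only, here is a minimal sketch of what interleaving does, written with plain awk rather than the script shipped with Velvet. The function name `interleave_fastq` and the file names are made up, and it assumes both files are plain 4-line FASTQ with the mates in the same order.

```shell
interleave_fastq() {
    # $1 = forward reads (FASTQ), $2 = reverse reads (FASTQ), same read order
    awk -v mate="$2" '{
        rec = $0
        # collect the remaining three lines of the forward record
        for (i = 1; i < 4; i++) { getline line; rec = rec "\n" line }
        print rec
        # then emit the four lines of the matching reverse record
        for (i = 0; i < 4; i++) {
            if ((getline line < mate) > 0) print line
        }
    }' "$1"
}
```

It would be called as `interleave_fastq reads_1.fastq reads_2.fastq > reads_interleaved.fastq`, and the result passed to velveth with `-fastq -shortPaired`.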

Quote:
Originally Posted by Haneko
2. How do I run velvetg? The manual states that if the insert size is not specified, it will attempt to measure it for me. If that's the case, do I still need to put -ins_length in my command? I know neither the expected coverage nor the insert size.
a) Expected coverage can be left to Velvet with the '-exp_cov auto' switch.
b) Insert size is usually 200 bp, +/- 10%, unless you have some other library prep; 10% is what Velvet uses as the default, as far as I recall. Just set it to 200 and see if it works out.
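
Put together, a command along these lines might be a starting point. This is only a sketch: the helper `velvetg_cmd` is made up and just prints the command line, the 200 bp value is the guess from above, and `-ins_length_sd` is spelled out at 10% purely to make the default visible.

```shell
velvetg_cmd() {
    # $1 = assembly directory, $2 = expected insert length in bp
    sd=$(( $2 / 10 ))   # 10% of the insert length
    echo "velvetg $1 -exp_cov auto -ins_length $2 -ins_length_sd $sd -min_contig_lgth 100"
}

velvetg_cmd velvet_data/ 200
```

If the library prep turns out to be different, only the second argument needs to change.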

Quote:
Originally Posted by Haneko
3. While testing, I can't seem to direct the console output of velvet into any file. So for example:
velvetg velvet_data/ -exp_cov auto -min_contig_lgth 100 &> velvetg.out

This gives no output whatsoever, either in the velvetg.out file or on the console.
Hm. Have you tried redirecting the two output streams separately?
It works on my end:
velvetg velvet_data/ -exp_cov auto -min_contig_lgth 100 2> velvetg.err.out 1> velvetg.std.out
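
One possible explanation, though only a guess for this particular case: `&>` is a bash extension, and other shells can parse it differently. The sketch below shows the portable spellings, using a made-up stand-in function `noisy` in place of velvetg so it can be tried without any data.

```shell
# Stand-in for a program that writes to both streams.
noisy() {
    echo "progress message"          # goes to stdout
    echo "warning message" >&2       # goes to stderr
}

# Both streams into one file; works in any POSIX shell.
noisy > combined.log 2>&1

# One file per stream, as in the command above.
noisy > stdout.log 2> stderr.log
```

Note that the order matters in the first form: `2>&1` has to come after the `> combined.log` redirection.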

Best
-Jonathan
Old 04-26-2010, 10:22 PM   #8
Haneko
Member
 
Location: Singapore

Join Date: Jan 2010
Posts: 36
Default

Thanks! I think the '&' somehow didn't work the way it usually does. I don't know why, though.
Old 04-26-2010, 10:31 PM   #9
francesco.vezzi
Member
 
Location: Udine (Italy)

Join Date: Jan 2009
Posts: 50
Default

Hi
I think that the best way to deal with this huge amount of data is to use either SOAPdenovo or ABySS. With them I am able to assemble 16 Illumina lanes with less than 80 GB of RAM.


Francesco
Old 04-29-2010, 12:38 AM   #10
isharon
Junior Member
 
Location: Haifa, Israel

Join Date: Feb 2010
Posts: 4
Default

Hi Francesco,

Could you please provide an estimate of how long it took you? Some more details about the genome size etc. would also be great. I need to assemble 6 Illumina lanes and was wondering whether SOAPdenovo would be a reasonable choice for that.
Old 04-29-2010, 12:44 AM   #11
francesco.vezzi
Member
 
Location: Udine (Italy)

Join Date: Jan 2009
Posts: 50
Default

I'm assembling a grapevine clone. The reference genome length is 400 Mb. SOAPdenovo is divided into several steps. The read correction takes approximately 1 day. The de novo step takes 5 hours, while the scaffolding step takes half a day... I'm working on a server with 120 GB of RAM and 8 CPUs. The RAM peak is more or less 60 GB.
ABySS takes 7-8 hours using 8 machines with 8 CPUs and 30 GB of RAM each.
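
As a rough illustration of how such a multi-machine ABySS run might be launched, the sketch below builds an `abyss-pe` command line with one MPI process per CPU. The helper `abyss_cmd`, the assembly name, the k value and the file names are placeholders, not the settings actually used here, and the MPI host setup is left out entirely.

```shell
abyss_cmd() {
    # $1 = number of machines, $2 = CPUs per machine, remaining args = read files
    np=$(( $1 * $2 ))   # one MPI process per CPU
    shift 2
    echo "abyss-pe name=grapevine k=31 np=$np in='$*'"
}

abyss_cmd 8 8 reads_1.fastq reads_2.fastq
```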

The moral is: you need a lot of RAM, a lot of CPUs, and patience.

Hope this can help

Francesco