SEQanswers

Old 04-17-2012, 11:15 PM   #1
moinul
Member
 
Location: Bangladesh

Join Date: May 2011
Posts: 10
Default De novo assembly: raw data type & volume

Hi,
I'm trying to assemble 454 raw reads (gDNA) with Newbler 2.6. Can anyone tell me the maximum volume of raw data (in nucleotides) Newbler can take in, either in one step or incrementally, given that it can reportedly assemble large genomes of up to 3 Gb? Also, what proportion of shotgun to mate-paired reads should we use to get a better assembly?
Old 04-19-2012, 04:18 AM   #2
bioBob
Member
 
Location: Virginia

Join Date: Mar 2011
Posts: 72
Default

I have done some assemblies with some pretty large data sets, in the 40-50 Gb range. With the large and het options I can get an assembly; without them, the runs simply never finish. By "never" I mean that after six weeks of processing, there had been no updates to the status files for several weeks. I ran these on a machine with 1 TB of RAM. I tried various incremental assemblies and different parameters and essentially ended up in the same place as when I presented Newbler with all the data at once. I didn't see any improvement with the CIO options.
Old 04-22-2012, 08:50 PM   #3
moinul
Member
 
Location: Bangladesh

Join Date: May 2011
Posts: 10
Default

Thanks, Bob. Did you use the '-m' option or any other advanced options? And did you also trim the dataset? I ran into trouble when I tried to feed in trimmed and split 454 mate-paired reads, because Newbler could no longer detect them as mate-paired; that problem doesn't occur with Illumina paired-end reads after trimming.

And for a large eukaryotic genome, say 3 Gb in size, the 40-50 Gb dataset you used covers the whole genome only 16-17x, which might not be quite enough, whereas for a 300 Mb genome the figure reaches up to 167x! So, is there a rule of thumb for what coverage we should initially aim for when trying to assemble a large genome?
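For reference, the coverage arithmetic above can be sketched as follows (a minimal check; the 50 Gb figure used here is simply the top of the range quoted earlier in the thread):

```python
def coverage(total_bases, genome_size):
    """Expected sequencing depth = total bases sequenced / genome size."""
    return total_bases / genome_size

# 50 Gb of 454 reads against a 3 Gb genome vs. a 300 Mb genome
print(round(coverage(50e9, 3e9)))    # ~17x
print(round(coverage(50e9, 300e6)))  # ~167x
```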
Old 04-23-2012, 04:17 AM   #4
bioBob
Member
 
Location: Virginia

Join Date: Mar 2011
Posts: 72
Default

Hi,
yes, we do quality and contaminant trimming. Newbler looks for the linker itself, so if you mean you are splitting the reads and removing the linker yourself, that doesn't work; or at least, it didn't the last time I tried it.
As for rules of thumb, I always refer to the Broad's guidelines, which often don't work, but you have to start somewhere.
And yes, we did try -m and other options. I probably tried about 20-30 different combinations.