Which assembler should I use to assemble a genome of 1-1.5 Gb? I have Illumina paired-end and mate-pair reads of 101 bp length. How can I use the mate-pair reads for scaffolding?
-
I'd try Allpaths-LG (http://www.broadinstitute.org/softwa...paths-lg/blog/) if your paired end reads mostly overlap (i.e. fragment size of ~180 bp). It will use your mate pairs for scaffolding.
-
96 GB can be a bit tight; 24 cores should be fine. I'd expect the assembly to run for up to 3 days. If you error-correct and normalize the paired-end reads prior to assembly (e.g. with BBNorm, http://seqanswers.com/forums/showthread.php?t=49763), you typically reduce memory usage for the assembly.
-
We have 100-200 Mbp fungal assemblies that run out of memory (with AllPaths-LG) on 128 GB nodes, but complete on 256 GB nodes. I'm guessing memory may be a serious problem; you are probably going to need more.
Megahit is fast and seems to have a relatively low memory consumption, and Minia was designed for low memory consumption, so if AllPaths fails you might try those. Or, buy more memory, which will be essential if you plan to routinely assemble large genomes.
-
Allpaths-LG is a good option if you have enough RAM and CPUs. Also, I wonder whether one of your PE libraries is overlapping, i.e. from the Allpaths-LG documentation: "average separation size must be slightly less than twice the read size, such that the reads from a pair will likely overlap".
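To make the overlap rule concrete, here is a minimal sketch of the arithmetic, assuming "separation size" means the full fragment (insert) length measured from the outer ends of the two reads. The function name `expected_overlap` and the 202 bp and 300 bp fragment sizes are illustrative choices, not values from the thread or from Allpaths-LG.

```python
def expected_overlap(read_length: int, fragment_size: int) -> int:
    """Return the number of bases shared by the two reads of a pair.

    Assumes both reads have the same length and that fragment_size is the
    outer distance between the pair. A negative result is the size of the
    unsequenced gap between the reads.
    """
    # If the fragment is shorter than a read, both reads cover the whole
    # fragment, so the overlap cannot exceed the fragment size.
    return min(2 * read_length - fragment_size, fragment_size)


read_length = 101  # read length stated in the original question
for fragment_size in (180, 202, 300):  # 180 bp is the size suggested above
    overlap = expected_overlap(read_length, fragment_size)
    status = "overlap" if overlap > 0 else "no overlap"
    print(f"fragment {fragment_size} bp: {overlap} bp ({status})")
```

With 101 bp reads, a ~180 bp fragment gives about 22 bp of overlap, whereas a fragment of 202 bp or more leaves none.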
-
Thank you all.
I think I have to go for SGA or Minia due to lack of memory. Is it a good option to use the paired-end reads for assembly and then scaffold with the mate-pair data?
Which tool would be suitable for scaffolding with 101 bp mate-pair data?