**jnfass** · 10-07-2009, 09:17 AM

@geschickten: I only know of ABySS, and as for parallelizing velvet, there was a post a little while ago on the velvet-users list by a Jeffrey Cook (http://www.jeffreycook.info/research) ...

@beelu: according to my sys admin guy, they're Sun X4600M2 systems .. the ones with 8 processor board slots (and quad-core, with 8 RAM slots per board) ... Intel might be a viable option within months or next year .. you might check out the Nehalem processors.

**geschickten** · 10-07-2009, 11:05 AM

Jnfass,

You say that you have done Velvet assemblies with > 100M reads (some paired-end) on a 512G machine. We know that Velvet is not a parallel assembler, and you say that the Sun box (I assume you ran your assembly on the Sun machines you mentioned) is multi-processor/multi-core. So my question is: how do you, or anybody for that matter, use non-parallel software like this on a cluster or on multi-core/multi-processor machines? Do you know whether all 4/8 cores are being used by the software during assembly, or are you simply not using multi-core machines?

**jnfass** · 10-07-2009, 11:13 AM

@geschickten: You're right - Velvet isn't running in parallel, whether multi-threaded, over MPI, or anything like that. So the number of processors is irrelevant. The total memory comes from having eight 8G RAM sticks on each of 8 boards (I think), so 8^3 = 512G ...

**Zigster** · 10-28-2009, 08:40 AM

Originally posted by yvan.wenger

Hello,
As an alternative, I am thinking of merging several assemblies, comparing those that merge together (if any), and maybe keeping a contig only if it appears in at least two different assemblies, and so on... but all of this still needs to be done.

Yes, actually I believe the best contigs are scattered all over the parameter space, across several assemblies. Not sure how to retrieve them.
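
One rough way to approach the "keep a contig only if it appears in at least two assemblies" idea is sketched below. It assumes each assembly is already loaded as a list of contig strings, and it only counts exact matches on either strand. That is a strong simplification: contigs from runs with different parameters rarely match base for base, so a real merge would need overlap or alignment-based comparison. The names `canonical` and `shared_contigs` are made up for this example.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(seq):
    """Return the smaller of a sequence and its reverse complement (strand-neutral key)."""
    seq = seq.upper()
    reverse_complement = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, reverse_complement)

def shared_contigs(assemblies, min_support=2):
    """Keep contigs found (exact match, either strand) in at least min_support assemblies."""
    support = Counter()
    for contigs in assemblies:
        # Count each contig once per assembly, even if it is duplicated inside it.
        for key in {canonical(c) for c in contigs}:
            support[key] += 1
    return sorted(key for key, n in support.items() if n >= min_support)

# Toy data: three hypothetical assemblies of the same reads with different parameters.
assemblies = [
    ["ACGTTGCA", "GGGCCCAT"],
    ["ATGGGCCC", "TTTTAAAC"],  # first contig is the reverse complement of GGGCCCAT
    ["ACGTTGCA"],
]
print(shared_contigs(assemblies))  # ['ACGTTGCA', 'ATGGGCCC']
```

Contigs are reported in their canonical orientation, so a contig may come back reverse-complemented relative to the assembly it came from.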

**francesco.vezzi** · 10-29-2009, 12:19 AM

Hi all,
I read this interesting topic. There are two main discussions: the first is about the definition of a proper tool able to assemble the transcriptome; the second is about the memory requirements when the data set is extremely big.

I'm really curious about the first part... Why do you say that assembling a transcriptome is different from assembling a genome? Why do current tools like Velvet fail at transcriptome assembly?

Regarding the second part, I think there is a general solution to this problem. From my experience, and from what I have read on the velvet-users mailing list, assemblers like Velvet don't work well with extremely large data sets. The usual trick is to work with a subset of 10% of the reads: make multiple assemblies from several random subsets and then merge the results together.
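
The subsampling step can be sketched as follows. This is only an illustration, assuming the reads fit in memory as a Python list; `random_subsets`, its defaults and the fixed seed are hypothetical and not part of Velvet or any other tool mentioned here. For paired-end data the list items would have to be read pairs, so that mates stay together.

```python
import random

def random_subsets(reads, fraction=0.10, n_subsets=5, seed=0):
    """Draw n_subsets independent random samples, each holding `fraction` of the reads."""
    rng = random.Random(seed)  # fixed seed so the subsets can be reproduced
    size = max(1, int(len(reads) * fraction))
    # Each subset is sampled without replacement; different subsets may overlap.
    return [rng.sample(reads, size) for _ in range(n_subsets)]

# Toy data: read names stand in for real sequences.
reads = [f"read_{i}" for i in range(1000)]
subsets = random_subsets(reads)
print(len(subsets), len(subsets[0]))  # 5 100
```

Each subset would then be written out and assembled separately before the merge step.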

The main reason to do that (in my opinion) is that tools like Velvet and ABySS build a de Bruijn graph that is based on the number of different k-mers present in the subset. Enormous data sets imply the presence of a high number of errors. The errors make the de Bruijn graph sparse, and this is why we end up with thousands of little contigs.
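
A toy simulation can illustrate the point about errors and k-mers. It is a sketch under loud assumptions: a random 2 kb "genome", uniform substitution errors at 1%, 36 bp reads and k = 21, all chosen only for illustration. It counts distinct k-mers, which is roughly the number of nodes a de Bruijn graph assembler has to store, and does not build the graph itself.

```python
import random

def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def distinct_kmers(reads, k):
    """Count distinct k-mers over all reads."""
    found = set()
    for read in reads:
        found |= kmers(read, k)
    return len(found)

def simulate_reads(genome, n_reads, read_len, error_rate, rng):
    """Sample reads from random positions, substituting bases at the given error rate."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads

rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(2000))
k = 21
clean = simulate_reads(genome, 2000, 36, 0.0, rng)
noisy = simulate_reads(genome, 2000, 36, 0.01, rng)
print("error-free reads:", distinct_kmers(clean, k))
print("1% error reads:  ", distinct_kmers(noisy, k))
```

Without errors the count is capped by the genome (here at most 1980 distinct 21-mers), while every miscalled base can add up to k new k-mers, so the count keeps growing with the number of reads.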

Best regards
Francesco

**scalabrin** · 11-02-2009, 02:36 PM

You might read this:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648799/?tool=pubmed

Parallel short sequence assembly of transcriptomes
Benjamin G Jackson, Patrick S Schnable, and Srinivas Aluru

The tool is now named YAGA (yet another genome assembler), and the current version handles not only transcriptomic data but also genomic data. The best run was on 1024 nodes (each node with only 512 MB of memory available).
The authors are trying to assemble the Mo17 line of maize (B73, sequenced clone-by-clone via Sanger, just appeared online).

Simone

**bioinfosm** · 11-02-2009, 03:03 PM

I tried working with this iterative approach as well: taking a small subset, assembling it, and then merging. But the results did not scale that easily, as a lot of reads from the random sampling were left unassembled!

Did you happen to make much headway?

sm

**francesco.vezzi** · 11-02-2009, 11:44 PM

So far I have had good results using a subset of the generated reads. Your poor results could have several causes.

I would suggest filtering out the low-quality reads and trimming the last bases of each read.
Another option is to reuse the unassembled reads: first generate a random set and assemble it; then generate another random set from the remaining reads and add to it all the unassembled reads from the first set.
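
The reuse strategy could be sketched roughly like this. It is an untested idea in code form, not a known-good pipeline: `assemble` is a hypothetical callable that would wrap whatever assembler is used and return `(contigs, unassembled_reads)`, and `toy_assemble` is only a stand-in so the example runs.

```python
import random

def iterative_rounds(reads, assemble, subset_size, seed=0):
    """Assemble random batches in turn, carrying unassembled reads into the next batch."""
    rng = random.Random(seed)
    remaining = list(reads)
    rng.shuffle(remaining)  # slicing a shuffled list gives disjoint random subsets
    leftovers = []
    all_contigs = []
    while remaining:
        batch, remaining = remaining[:subset_size], remaining[subset_size:]
        # Add the reads the previous round could not place.
        contigs, leftovers = assemble(batch + leftovers)
        all_contigs.extend(contigs)
    return all_contigs, leftovers

def toy_assemble(batch):
    """Stand-in for a real assembler run: reads containing 'N' stay unassembled."""
    used = [r for r in batch if "N" not in r]
    unused = [r for r in batch if "N" in r]
    contigs = ["".join(used)] if used else []
    return contigs, unused

reads = ["ACGT", "ANGT", "GGCC", "TTNA", "CCAA"]
contigs, leftovers = iterative_rounds(reads, toy_assemble, subset_size=2)
print(contigs, sorted(leftovers))
```

Reads that no round can place end up in `leftovers` after the last batch; the contigs from all rounds would still need the merge step discussed earlier in the thread.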

I don't know if this works, but it is the only strategy that comes to mind.

Francesco

**lbkoerich** · 08-25-2010, 06:17 AM

Hi,

This forum is really interesting and I have the same doubts about the best assembler. I'm really inclined to use Velvet or ABySS, but I'm curious about MIRA. Have you heard of or used this assembler?

The thing is that I have 454 and Solexa reads. I'm planning to assemble each alone and to do a hybrid assembly. I've heard that MIRA is a good assembler for hybrid reads and is a good transcriptome assembler. However, the manual is 190 pages long and, before I read that, I would like to hear the opinion of someone who has actually used this assembler.

I'm starting to think that I should do the assemblies in all three assemblers and then compare the results...

**francesco.vezzi** · 08-25-2010, 06:23 AM

Hi,
MIRA is probably a good solution to your problem. Two months ago I attended a conference at which one of the speakers was MIRA's author.
The tool is really good; the only downside is the length of the manual!

**Zigster** · 08-25-2010, 06:27 AM

Newbler is very good for 454 assembly

You can try to feed those Newbler contigs into Velvet as though they were reference seqs, along with your Illumina reads - perhaps the new Columbus module will work better than -longReads used to (which worked very poorly).

MIRA seems to demand all these calibration files that the sequencing people usually throw away. Finding the way to turn off these demands requires a good bit of study.

**lbkoerich** · 08-25-2010, 07:08 AM

Unfortunately I must rely on an open-source assembler.

**BaCh** · 09-01-2010, 11:57 AM

Originally posted by Zigster

Newbler is very good for 454 assembly
MIRA seems to demand all these calibration files that the sequencing people usually throw away. Finding the way to turn off these demands requires a good bit of study.

Ummm ... no, you don't need these calibration files (whichever file you have in mind) as MIRA does not read them.

B.
