SEQanswers

Old 08-03-2009, 10:46 AM   #1
geschickten
Member
 
Location: India

Join Date: Jul 2009
Posts: 31
VELVET or ABYSS for Transcriptome

We are planning to use ABySS and Velvet for de novo assembly of transcriptome data. Just wondering if the group can share their experience with either tool. Also, how do the two compare, and which is the best tool available for assembling transcriptome data? Thank you...
Old 08-10-2009, 07:59 AM   #2
yvan.wenger
Member
 
Location: Switzerland

Join Date: Aug 2009
Posts: 30
De novo transcriptome assembly?

Hello,

I am wondering if you had any reply to your question about the best tool for transcriptome assembly... I am about to evaluate the tools, but it seems that our draft genome gives an advantage to assemblies that produce short contigs, as it consists of roughly 130,000 (genomic) contigs. As a consequence, the assembly with the best mapping to the genome is one with short contigs (otherwise large assembled contigs would jump from one genomic contig to another, because the genomic contigs are quite short).

As N50 does not seem to be a good metric for transcriptomes, I was wondering what other measures or manipulations to use to rank the different assemblies. Also, I noted that both correct and wrong contigs can be found in all assemblies and that they often differ between assemblies (for example, you can find a correct contig that is only represented in a rather "bad" assembly). Given this, I am wondering if somebody in this forum has seen data on alternative methods to obtain good contigs without a good genome? For instance, I just had another look at the ABySS paper (De novo Transcriptome Assembly with ABySS, Birol et al., Bioinformatics Advance Access, published June 15, 2009) and see that they still assess their transcriptome assembly using the human genome. As an alternative, I am thinking of merging several assemblies, comparing those that merge together (if any), maybe keeping a contig only if it appears in at least two different assemblies, and so on... but everything remains to be done.
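The "keep a contig only if it appears in at least two different assemblies" idea can be sketched roughly as below. This is only a minimal illustration under strong assumptions: contigs count as "the same" only if they are identical up to strand, whereas real assemblies would need alignment or containment checks. The function names and the `min_support` parameter are made up for this example.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq):
    """Strand-independent form: the smaller of a sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def supported_contigs(assemblies, min_support=2):
    """Return {canonical contig: sorted assembly names} for contigs that occur,
    as exact strand-independent matches, in at least `min_support` assemblies.

    `assemblies` maps a (hypothetical) assembly name to its list of contigs.
    """
    seen = defaultdict(set)
    for name, contigs in assemblies.items():
        for contig in contigs:
            seen[canonical(contig)].add(name)
    return {c: sorted(names) for c, names in seen.items() if len(names) >= min_support}


if __name__ == "__main__":
    toy = {
        "k25": ["AAACCC", "ACGTAC"],
        "k31": ["GGGTTT", "TTTTAA"],  # GGGTTT is the reverse complement of AAACCC
    }
    print(supported_contigs(toy))
```

Exact matching is the simplest possible criterion; it will miss contigs that overlap only partially, so treat it as a starting point rather than a merging method.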

Any thoughts on all that? Or otherwise, is there a forum dedicated to this topic?

Best,

Yvan

Original message:
We are planning to use ABySS and Velvet for de novo assembly of transcriptome data. Just wondering if the group can share their experience with either tool. Also, how do the two compare, and which is the best tool available for assembling transcriptome data? Thank you...
Old 08-10-2009, 08:54 PM   #3
geschickten
Member
 
Location: India

Join Date: Jul 2009
Posts: 31
Default

Yvan,

Well, I haven't received any replies from the forum. I must admit I am new to the world of genomics, so I may not be able to comment on your observations.

I have not come across any forum dedicated to this topic.

Do let me know about your evaluation and if required we can even take this offline...
Old 08-13-2009, 08:48 AM   #4
wjeck
Member
 
Location: Chapel Hill, NC

Join Date: Mar 2009
Posts: 39
Default

all,

I believe that the short answer is: The proper tools are not publicly available yet. There is a wrong way to do this: assembling transcriptome data like it's genomic, and a right way: yet to be determined. I'm looking for pretty much the same thing and I can't seem to find it. The primary problem with assembling transcriptome data like it's genomic is that most transcriptome data sets have some genomic contamination, and they have alternative splicings. Both of these facts run counter to the assumptions of the genome assemblers, in which there is no alternative splicing (or at most two haplotype alternatives). If anyone is thinking about working on new assemblers for these new data sets PM me; I'm very interested in exploring the topic and maybe sitting down to write one.
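To make the alternative-splicing point concrete, here is a small sketch of how two isoforms of one gene create a branch in a de Bruijn graph, the kind of structure a genome assembler tends to treat as an error or a repeat. The toy "exons", the value of k, and the function names are all invented for illustration; this is not how Velvet or ABySS are implemented.

```python
from collections import defaultdict


def de_bruijn_edges(sequences, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in any sequence."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor, i.e. places where the path is ambiguous."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


if __name__ == "__main__":
    # Three toy exons; isoform B skips the middle one (exon skipping).
    exon1, exon2, exon3 = "ATGGCA", "TTCGAG", "CCTAAG"
    isoform_a = exon1 + exon2 + exon3
    isoform_b = exon1 + exon3

    print(branching_nodes(de_bruijn_edges([isoform_a], k=4)))
    print(branching_nodes(de_bruijn_edges([isoform_a, isoform_b], k=4)))
```

With one isoform the graph is a simple path; adding the second isoform introduces a fork at the end of the first exon, even though both transcripts are real and error-free.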

Cheers,
--Will
Old 08-13-2009, 08:59 AM   #5
geschickten
Member
 
Location: India

Join Date: Jul 2009
Posts: 31
Default

Will,

I am willing to work on this and if you are okay then we can work together to design/develop an assembler for transcriptome data!

prahalad
Old 08-14-2009, 12:28 AM   #6
NSTbioinformatics
Member
 
Location: netherlands

Join Date: Apr 2009
Posts: 24
Default

I have used Velvet and ABySS to assemble genomic sequences from Illumina reads. However, Velvet runs very slowly and cannot process 36507944 reads x 36 nt + 95398944 reads x 76 nt on a computer with 32 GB of memory; it stopped due to a memory problem. I don't know how to solve it.

According to the paper De novo Transcriptome Assembly with ABySS (Birol et al.), ABySS can assemble shotgun and paired-end runs together. I am wondering how this works. The ABySS manual only shows how to assemble shotgun runs and paired-end runs separately.

I would like to hear about others' experience with them.
Old 08-14-2009, 12:37 AM   #7
geschickten
Member
 
Location: India

Join Date: Jul 2009
Posts: 31
Default

NST, do you think you can share this paper, "De novo Transcriptome Assembly with ABySS" by Birol et al.?

-p
Old 08-14-2009, 01:09 AM   #8
jts
Member
 
Location: Cambridge

Join Date: Feb 2009
Posts: 22
Default

Hi NSTbioinformatics,

If you post the details of your problem on the abyss-users mailing list (http://www.bcgsc.ca/mailman/listinfo/abyss-users) Shaun Jackman or I can help you set up abyss for your data set. You will be able to assemble both single-end and paired-end reads in the same run but some care must be taken when choosing the assembly parameters.

Regards,
Jared Simpson
Old 08-14-2009, 05:08 AM   #9
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

Here are the references for Abyss:

Simpson et al. ABySS: A parallel assembler for short read sequence data. Genome Res (2009) vol. 19 (6) pp. 1117-23

Birol et al. De novo Transcriptome Assembly with ABySS. Bioinformatics (2009) pp.
Old 08-17-2009, 09:02 AM   #10
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88
Default

Quote:
Originally Posted by NSTbioinformatics View Post
I have used Velvet and ABySS to assemble genomic sequences from Illumina reads. However, Velvet runs very slowly and cannot process 36507944 reads x 36 nt + 95398944 reads x 76 nt on a computer with 32 GB of memory; it stopped due to a memory problem. I don't know how to solve it.

According to the paper De novo Transcriptome Assembly with ABySS (Birol et al.), ABySS can assemble shotgun and paired-end runs together. I am wondering how this works. The ABySS manual only shows how to assemble shotgun runs and paired-end runs separately.

I would like to hear about others' experience with them.
I've done velvet assemblies with > 100M reads (some paired-end) on a 512G machine ... yes, it does take a lot of memory ... but I'd be interested in hearing if ABySS is any better. My understanding is that these assemblers like to have the whole assembly graph in memory at once, and that's the roadblock to assembling in smaller RAM spaces (though, I've seen a few comments from people working on parallelizing one or the other program).

Before I had access to a large memory machine, I ran the single ended assembly first, then used those contigs as "long" reads to add to an assembly of the paired reads.
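For what it's worth, the two-stage approach described above could be laid out roughly as follows. This sketch only builds the command lines and does not run anything. The flags (`-short`, `-shortPaired`, `-long`, `-fastq`, `-fasta`, `-ins_length`) and the `contigs.fa` output name are as I recall them from the Velvet manual, so check them against your installed version; the file names, directory names, k, and insert length are placeholders.

```python
def staged_velvet_commands(single_fq, paired_fq, k=31, insert_length=200,
                           stage1_dir="stage1", stage2_dir="stage2"):
    """Build argv lists for a two-stage Velvet run.

    Stage 1 assembles the single-end reads on their own.
    Stage 2 assembles the paired reads, adding the stage-1 contigs as "long" reads.
    All defaults are placeholders; paired_fq is assumed to be an interleaved file.
    """
    stage1_contigs = f"{stage1_dir}/contigs.fa"  # velvetg's contig output file
    return [
        ["velveth", stage1_dir, str(k), "-short", "-fastq", single_fq],
        ["velvetg", stage1_dir],
        ["velveth", stage2_dir, str(k),
         "-shortPaired", "-fastq", paired_fq,
         "-long", "-fasta", stage1_contigs],
        ["velvetg", stage2_dir, "-ins_length", str(insert_length)],
    ]


if __name__ == "__main__":
    for argv in staged_velvet_commands("single.fq", "paired.fq"):
        print(" ".join(argv))
```

Each list could be handed to `subprocess.run(argv, check=True)` once the paths and parameters are adjusted to the real data.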

Velvet can definitely do single and paired reads together, and if you change a parameter before compiling, you can have an unlimited number of different paired read sets, each with different insert lengths.
Old 08-20-2009, 04:03 PM   #11
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116
Default

Quote:
Originally Posted by NSTbioinformatics View Post
However, Velvet runs very slowly and cannot process 36507944 reads x 36 nt + 95398944 reads x 76 nt on a computer with 32 GB of memory; it stopped due to a memory problem. I don't know how to solve it.
I have had better luck with Velvet running at longer kmers and, to a lesser extent, higher coverage cutoffs. This may seem counter-intuitive, given that there are 16x as many possible kmers of length 31 as of length 29, but velvetg is much more likely to hit the wall at the shorter kmers.
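The arithmetic behind the "16x" remark, plus one thing that pulls the other way, can be checked with a few lines. This is only a back-of-the-envelope sketch with made-up function names: it shows that the space of possible kmers grows by 4 per extra base, while each read of fixed length contributes fewer kmers as k grows. It does not claim to explain velvetg's actual memory behaviour.

```python
def possible_kmers(k):
    """Size of the space of DNA k-mers over the alphabet A, C, G, T."""
    return 4 ** k


def kmers_per_read(read_length, k):
    """Number of k-mer positions in a single read of the given length."""
    return max(read_length - k + 1, 0)


def distinct_kmers(reads, k):
    """Number of distinct k-mers actually observed in a collection of reads."""
    return len({read[i:i + k] for read in reads for i in range(len(read) - k + 1)})


if __name__ == "__main__":
    print(possible_kmers(31) // possible_kmers(29))       # 16
    print(kmers_per_read(72, 29), kmers_per_read(72, 31))  # 44 42
    print(distinct_kmers(["ACGTACGT"], 4))
```

The number of kmers that can actually occur is bounded by the data (reads times positions per read), not by 4^k, which is one reason the size of the theoretical kmer space says little about memory use on its own.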

I recently did a de novo transcriptome assembly of 100,425,440 72bp paired end reads totaling over 7,034,311,658 bp on a 256G machine but could not get below kmer 29 without crashing.

Fortunately velvet now accepts very large kmer lengths, so I would try those before giving up.
Old 10-05-2009, 05:59 AM   #12
beelu
Junior Member
 
Location: taiwan

Join Date: Mar 2008
Posts: 7
Default

Hi jnfass and Zigster, how did you build your machines up to 512G/256G? How many CPUs do you have, and what's your RAM-to-core ratio? Thanks.

Beelu
Old 10-05-2009, 06:08 AM   #13
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116
Default

We use a Dell PowerEdge something-or-other with 4 x X7350 (16 cores total)
__________________
--
Jeremy Leipzig
Bioinformatics Programmer
--
Old 10-05-2009, 10:02 AM   #14
geschickten
Member
 
Location: India

Join Date: Jul 2009
Posts: 31
Default

Quote:
Originally Posted by jnfass View Post
I've done velvet assemblies with > 100M reads (some paired-end) on a 512G machine ... yes, it does take a lot of memory ... but I'd be interested in hearing if ABySS is any better. My understanding is that these assemblers like to have the whole assembly graph in memory at once, and that's the roadblock to assembling in smaller RAM spaces (though, I've seen a few comments from people working on parallelizing one or the other program).

Before I had access to a large memory machine, I ran the single ended assembly first, then used those contigs as "long" reads to add to an assembly of the paired reads.

Velvet can definitely do single and paired reads together, and if you change a parameter before compiling, you can have an unlimited number of different paired read sets, each with different insert lengths.

Hi jnfass,

Can you please share some information on who is working on parallelizing assemblers? Also, kindly point me to some good open-source parallel assemblers if you know of any. Thank you.
Old 10-05-2009, 10:06 AM   #15
geschickten
Member
 
Location: India

Join Date: Jul 2009
Posts: 31
Default

Quote:
Originally Posted by Zigster View Post
I have had better luck with Velvet running at longer kmers and, to a lesser extent, higher coverage cutoffs. This may seem counter-intuitive, given that there are 16x as many possible kmers of length 31 as of length 29, but velvetg is much more likely to hit the wall at the shorter kmers.

I recently did a de novo transcriptome assembly of 100,425,440 72bp paired end reads totaling over 7,034,311,658 bp on a 256G machine but could not get below kmer 29 without crashing.

Fortunately velvet now accepts very large kmer lengths, so I would try those before giving up.
Hi Zigster,

Can you please share the exact configuration of the machine that you used for this run? Also, what's your take on running this in the cloud, if somebody let you? Would you go for it?
Old 10-07-2009, 09:17 AM   #16
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88
Default

@geschickten: I only know of ABySS, and as for parallelizing velvet, there was a post a little while ago on the velvet-users list by a Jeffrey Cook (http://www.jeffreycook.info/research) ...

@beelu: according to my sys admin guy, they're Sun X4600M2 systems .. the ones with 8 processor board slots (and quad-core, with 8 RAM slots per board) ... Intel might be a viable option within months or next year .. you might check out the Nehalem processors.
Old 10-07-2009, 11:05 AM   #17
geschickten
Member
 
Location: India

Join Date: Jul 2009
Posts: 31
Default

Jnfass,

You say that you have done Velvet assemblies with > 100M reads (some paired-end) on a 512G machine. We know that Velvet is not a parallel assembler, and you say that the Sun box (I assume you run your assemblies on the Sun machines you mentioned) is multi-processor/multi-core. My question is: how do you, or anybody for that matter, use this non-parallel software on a cluster or on multi-core/multi-processor machines? Do you know whether all the 4/8 cores are being used by the software during assembly, or is it just that you are not making use of the multiple cores?
Old 10-07-2009, 11:13 AM   #18
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88
Default

@geschickten: You're right - velvet isn't running parallel, either multi-threaded, or over MPI, or anything like that. So the number of processors is irrelevant. The total memory depends on the fact that there are eight 8G RAM sticks on each of 8 boards (I think), so 8^3 = 512G ...
Old 10-28-2009, 08:40 AM   #19
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116
Default

Quote:
Originally Posted by yvan.wenger View Post
Hello,
As an alternative, I am thinking of merging several assemblies, comparing those that merge together (if any), maybe keeping a contig only if it appears in at least two different assemblies, and so on... but everything remains to be done.
Yes, actually I believe the best contigs are scattered all over the parameter space, across several assemblies. Not sure how to retrieve them.
__________________
--
Jeremy Leipzig
Bioinformatics Programmer
--
Old 10-29-2009, 12:19 AM   #20
francesco.vezzi
Member
 
Location: Udine (Italy)

Join Date: Jan 2009
Posts: 50
Default

Hi all,
I read this interesting topic. There are two main discussions: the first is about defining a proper tool able to assemble the transcriptome; the second is about memory requirements when the data set is extremely big.

I'm really curious about the first part... Why do you say that assembling a transcriptome is different from assembling a genome? Why do current tools like Velvet fail at assembling transcriptomes?

As for the second part, I think there is a general solution to this problem. From my experience, and from what I have read on the velvet-users mailing list, assemblers like Velvet don't work well with extremely large data sets. The usual trick is to work with a subset of 10% of the reads: make multiple assemblies of several random subsets and then merge the results together.
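Drawing the random 10% subsets might look something like the sketch below. It assumes the reads fit in memory as a Python list, which real data sets of this size would not; there you would stream the FASTQ file and keep paired reads together. The function name, the number of subsets, and the seed are arbitrary choices for this example.

```python
import random


def random_subsets(reads, fraction=0.1, n_subsets=3, seed=0):
    """Draw `n_subsets` independent random samples, each holding `fraction` of the reads.

    Sampling is without replacement within a subset; different subsets may overlap.
    """
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    size = max(1, round(len(reads) * fraction))
    return [rng.sample(reads, size) for _ in range(n_subsets)]


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(100)]  # stand-ins for real read records
    for subset in random_subsets(reads):
        print(len(subset), subset[:3])
```

Each subset would then be assembled separately, and the resulting contig sets merged in a later step.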

The main reason to do that (in my opinion) is that tools like Velvet and ABySS build a de Bruijn graph whose size depends on the number of distinct k-mers present in the data set. Enormous data sets imply the presence of a high number of errors. The errors make the de Bruijn graph sparse, and this is why we end up with thousands of little contigs.
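The effect of errors on the number of distinct k-mers is easy to see on a toy sequence: a single substituted base can add up to k k-mers that were never in the original. The sketch below, with an invented sequence and invented function names, just counts them.

```python
def kmer_set(seq, k):
    """All distinct k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def new_kmers_from_error(reference, pos, base, k):
    """k-mers that appear only because position `pos` was misread as `base`."""
    erroneous = reference[:pos] + base + reference[pos + 1:]
    return kmer_set(erroneous, k) - kmer_set(reference, k)


if __name__ == "__main__":
    reference = "ATGGCATTCGAGCCTAAG"  # toy "transcript"
    extra = new_kmers_from_error(reference, pos=9, base="T", k=4)
    print(sorted(extra), len(extra))
```

Every such spurious k-mer becomes an extra node or edge in the graph, so the number of errors, which grows with the number of reads, drives memory use more than the size of the underlying transcriptome does.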

Best regards
Francesco