SEQanswers

Old 02-02-2010, 12:01 PM   #1
lcollado
Member
 
Location: Baltimore, MD

Join Date: Jun 2009
Posts: 65
Default Oases: De novo transcriptome assembly of very short reads

Hello,

I thought that this might be interesting to share. It was sent to the Velvet users' list:

Quote:
Dear Velvet users,

Marcel Schulz (Max Planck Institute for Molecular Genetics) and I are very
pleased to announce the beta release of the Oases transcriptome assembler.

Many researchers wish to use their powerful next-gen sequencing machines
to study the transcriptomes of new species. Unfortunately, Velvet is not
designed for that task, as the repeat resolution modules rely explicitly
on assumptions of linearity and uniform coverage distribution. This means
that Velvet only produces fragmented transcriptome assemblies.

This is why we jointly developed Oases. This new program takes in a
preliminary assembly produced by Velvet, and exploits the read sequence
and pairing information to produce transcript isoforms. When possible, it
also detects and reports standard alternative splicing events. It is
specifically designed to get around the issues of unequal expression
levels and alternative splicing breakpoints.

The code is still quite new, but it has already been thoroughly tried out
by Marcel. He observed some very promising results on both simulated and
experimental datasets.

If you wish to try out Oases, simply consult the webpage at
www.ebi.ac.uk/~zerbino/oases . All feedback and suggestions are more than
welcome!

Best regards,

Daniel
In the manual they refer to this paper: http://dx.doi.org/10.1371/journal.pcbi.1000147


I have yet to try Oases or read the paper, but I bet that those in my lab will be interested to know if it works with bacterial transcriptomes.

Greetings,
Leonardo
__________________
L. Collado Torres, Ph.D. student in Biostatistics.
Old 02-05-2010, 07:56 PM   #2
MarcelS
Junior Member
 
Location: Berlin, Germany

Join Date: Feb 2010
Posts: 5
Default Oases paper and oases mailing list

Hi Leonardo,
first of all thanks for posting the initial announcement of Oases in this forum.

>In the manual they refer to this paper: http://dx.doi.org/10.1371/journal.pcbi.1000147
We do refer to the paper by Sammeth et al., but not because Oases or anything about transcriptome assembly is described there. That paper defines a nomenclature for alternative splicing events, which we have adapted for parts of the Oases output. But it's worth reading anyhow.

The paper about Oases is not published yet. Daniel and I put the software online because many people requested it. However, it is still in beta.
For people who are interested in the details, the best option is to subscribe to the new mailing list that Daniel set up, where we post all improvements and information:
http://listserver.ebi.ac.uk/mailman/...fo/oases-users

>I have yet to try Oases or read the paper, but I bet that those in my lab will be interested to know if it works with bacterial transcriptomes.
Although I have never tried it, Oases should work really well for bacterial transcriptomes. It would be great if you could give us some feedback once you have tried it.

Kind regards,
Marcel
Old 02-15-2010, 12:39 PM   #3
jjohnson
Member
 
Location: Washington DC Metro Area

Join Date: Aug 2009
Posts: 20
Default

Hi Marcel,

Does Oases support the use of SOLiD data? Can I run velvetg_de and feed this into Oases?

Justin
__________________
Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio
Old 02-15-2010, 01:22 PM   #4
MarcelS
Junior Member
 
Location: Berlin, Germany

Join Date: Feb 2010
Posts: 5
Default

Quote:
Originally Posted by jjohnson View Post
Hi Marcel,

Does Oases support the use of SOLiD data? Can I run velvetg_de and feed this into Oases?

Justin
Hi Justin,
Yes, it does. Daniel has already tried it with one single-end SOLiD data set. We would be very interested in feedback about the quality of the results on other, possibly paired-end, SOLiD data. Below I copy an email from the user list with a discussion and explanation by Daniel about using SOLiD data with Oases:

>Hi Daniel,
>I think the point that Titus and other SOLiD users are trying to drive at is
>that if you convert to sequence space earlier in the analysis pipeline, you
>lose the potential benefit of sequencing in color space.
>If you convert to sequence space at any point in the pipeline, I believe
>there is no sequence analysis program that will refuse SOLiD data.

>>Hello Kevin,
>>Absolutely, but velvet_de (double encoded) does not function in sequence space.
>>Double encoding, although it uses letters {ATCG}, is in fact colorspace in disguise.
>>To summarize, the correct pipeline for SOLiD genomic assembly is:
>> Colorspace data => pre-conversion to double encoding* => velvet_de => post-conversion to sequence space*
>>By analogy, the appropriate pipeline for SOLiD transcriptome assembly is:
>> Colorspace data => pre-conversion to double encoding* => velvet_de => oases => post-conversion to sequence space*
>> In effect, all velvet/oases operations are made in colorspace, under cover of double encoding.
>> Just to repeat myself, although the Velvet step needs to be performed with the specific velvet_de executable (under penalty of seriously mixed-up sequences), Oases works equally well on double encoding or on sequence space (it just sees strings of letters). You can therefore use the same Oases executable as you would use in a sequence-space pipeline.

>>Best regards,
>>Daniel

>>(* programs available at http://solidsoftwaretools.com/gf/project/denovo/ , developed by Craig Cummings and Vrunda Sheth at ABI)
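
To make "colorspace in disguise" a bit more concrete, here is a minimal Python sketch. It assumes the conventional relabelling 0/1/2/3 -> A/C/G/T for double encoding and the usual SOLiD two-base code, where a color is the XOR of two adjacent bases numbered A=0, C=1, G=2, T=3. The function names are made up for illustration, and this is not what the ABI pre/post-processors do internally (those also go back to the original color reads).

```python
# Illustrative sketch only; function names are hypothetical and this is
# NOT the ABI denovo pre/post-processor.
COLOR_TO_LETTER = str.maketrans("0123", "ACGT")  # assumed conventional mapping
BASES = "ACGT"  # A=0, C=1, G=2, T=3


def colors_to_double_encoding(colors: str) -> str:
    """Relabel a string of SOLiD colors (digits 0-3) as letters.

    The result looks like DNA but is still colorspace, which is why
    the assembled contigs need a post-conversion step.
    """
    return colors.translate(COLOR_TO_LETTER)


def decode_colors(first_base: str, colors: str) -> str:
    """Naively translate colors to bases, given the base before the first color.

    Each color is the XOR of the codes of two adjacent bases, so the
    next base is (previous base XOR color). A single color error
    corrupts every base after it.
    """
    code = BASES.index(first_base)
    decoded = []
    for color in colors:
        code ^= int(color)
        decoded.append(BASES[code])
    return "".join(decoded)


double_encoded = colors_to_double_encoding("0120")
bases = decode_colors("T", "0120")
```

Note that the naive decoding needs a known starting base and propagates any single color error to the end of the sequence, which is one reason a dedicated post-conversion tool is used rather than a simple translation.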


Good luck!
Marcel
Old 03-01-2010, 09:42 PM   #5
lcollado
Member
 
Location: Baltimore, MD

Join Date: Jun 2009
Posts: 65
Default

I was checking the huge sea of blogs and posts on AGBT 2010 and found this one to be related: http://www.fejes.ca/2010/02/agbt-201...aas-broad.html

@jjohnson
Velvet can run with SOLiD data and so can Oases.
__________________
L. Collado Torres, Ph.D. student in Biostatistics.
Old 07-09-2010, 01:20 PM   #6
cerca
Junior Member
 
Location: College Park, MD

Join Date: Jul 2010
Posts: 1
Default

Quote:
Originally Posted by MarcelS View Post
>>By analogy, the appropriate pipeline for SOLiD transcriptome assembly is:
>> Colorspace data => pre-conversion to double encoding* => velvet_de => oases => post-conversion to sequence space*

>>(* programs available at http://solidsoftwaretools.com/gf/project/denovo/ , developed by Craig Cummings and Vrunda Sheth at ABI)

Hi Marcel and Daniel,
I'm trying to reconstruct transcripts from SOLiD RNA-seq data.

* I have converted my reads to double encoding
* executed velvet_de/Oases
* Now I have the transcripts in double-encoded format and I'm trying to convert them to base space.

Can you recommend a tool for this last step?

I checked the link
http://solidsoftwaretools.com/gf/project/denovo/

But I was not able to figure out the best course of action.

Thanks

Last edited by cerca; 07-09-2010 at 01:24 PM.
Old 07-31-2010, 11:14 PM   #7
xuying
Member
 
Location: Shanghai, China

Join Date: Mar 2008
Posts: 16
Default

Hi guys,
I am also very interested in how to convert from double-encoded format to base space using the denovo (or denovo2) tools once Oases has finished its job.
Is there a manual for this? I am using a SOLiD fragment library.

Best,
Ying
Old 08-04-2010, 11:03 PM   #8
mucku
Member
 
Location: Berlin

Join Date: Jan 2009
Posts: 14
Default

Hello,
One question.
Is Velvet mandatory for Oases or can the contigs be derived from a different short read assembler such as ABySS?

Cheers

Markus
Old 10-04-2010, 01:36 AM   #9
stefane
Junior Member
 
Location: Uppsala, Sweden

Join Date: Jun 2008
Posts: 2
Default

Quote:
Originally Posted by xuying View Post
Hi guys,
I am also very interested in how to convert from double-encoded format to base space using the denovo (or denovo2) tools once Oases has finished its job.
Is there a manual for this? I am using a SOLiD fragment library.

Best,
Ying
Hi,

I also got stuck on this part, did you solve it?

/Stefan
Old 10-04-2010, 01:49 AM   #10
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 317
Default

In the past, I've managed to convert Velvet/Oases output to base space using this package:

http://solidsoftwaretools.com/gf/project/denovotools/

Note that it's a different package from the previously mentioned http://solidsoftwaretools.com/gf/project/denovo/
Old 10-04-2010, 04:25 AM   #11
stefane
Junior Member
 
Location: Uppsala, Sweden

Join Date: Jun 2008
Posts: 2
Default

Quote:
Originally Posted by kopi-o View Post
In the past, I've managed to convert Velvet/Oases output to base space using this package:

http://solidsoftwaretools.com/gf/project/denovotools/

Note that it's a different package from the previously mentioned http://solidsoftwaretools.com/gf/project/denovo/
Thanks!

The pre/post-processor around Velvet/Oases worked nicely, and after running 'denovoadp' I think I'm back in base space.

But from what I've gathered, the default input to the post-processor is the afg output from Velvet. Can I apply the same procedure to the "transcripts.fa" file produced by Oases?

/Stefan
Old 10-21-2010, 11:37 PM   #12
AronaldJ
Junior Member
 
Location: whole transcriptome

Join Date: Oct 2010
Posts: 8
Default

Thanks to kopi-o for pointing out denovo2. I used solid_denovo_preprocessor.pl and then ran Oases, but the result is transcripts.fa, which I cannot convert with solid_denovo_postprocessor.pl. I looked at the command "java -cp $denovo2/utils/miniAssembler.jarcom.lifetech.miniAssembler.util.FormatsTranslator <conversion_type> <sequence_file> <out_converted_file>" on page 32 of DeNovoAssemblyProtocol0060810.pdf. However, my denovo2 folder does not contain miniAssembler.jarcom.lifetech.miniAssembler.util.FormatsTranslator, only miniAssembler, so I cannot run the conversion.

I look forward to your reply.
Thank you
Old 11-17-2010, 07:29 AM   #13
blackgore
Member
 
Location: UK

Join Date: Sep 2009
Posts: 19
Default

In terms of the practical memory limitations faced by Velvet, how big/complex a transcriptome can likely be assembled using Oases? I understand Curtain was brought out to get around the memory issues of Velvet itself, in order to assemble progressively larger genomes, but can it be applied here too?
Old 12-20-2010, 08:20 AM   #14
neXtGen seq
Junior Member
 
Location: USA

Join Date: Dec 2010
Posts: 2
Default question about Abyss

Hi

Question about Velvet/Oases

1. Can I use Velvet/Oases to assemble Illumina/Solexa paired-end data that was pre-paired prior to assembly using an in-house script? Paired-end information may have been lost due to the pre-pairing. What is the best way to assemble such data de novo?

2. Also, what is the best way to get summary statistics from the run, such as N50 contig size, the percentage of reads used in the assembly, the number of contigs/transcripts, etc.?

3. How do I decide on the hash length and k-mer size?
I am new to next-gen sequencing assembly, hence the many questions.
If you have commands that would do this, it would be very helpful.
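
For the N50 and contig-count part of question 2, a minimal Python sketch might look like the following. It assumes a plain FASTA file such as contigs.fa or transcripts.fa; the function names are hypothetical, and it does not cover the percentage of reads used, which has to come from the assembler's own log or a read mapping.

```python
# Illustrative sketch; function names are hypothetical, not part of Velvet/Oases.
def fasta_lengths(lines):
    """Return the sequence length of every record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


example = [">a", "ACGT", "AC", ">b", "ACG"]
example_lengths = fasta_lengths(example)
example_n50 = n50(example_lengths)

# Usage on a real file (the path is an assumption):
# with open("workdir/contigs.fa") as handle:
#     lengths = fasta_lengths(handle)
#     print(len(lengths), n50(lengths))
```

Keep in mind that N50 was designed for genome assemblies; for a transcriptome it mostly reflects transcript length distribution, so it is best read alongside the number of transcripts.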

Thanks

Andy

Last edited by neXtGen seq; 12-20-2010 at 09:10 AM. Reason: wrong title
Old 01-24-2011, 08:56 PM   #15
Brett_CCG
Member
 
Location: Perth

Join Date: Jan 2011
Posts: 11
Default

Hi

I'm having problems with Oases.

I am performing a de novo assembly of a plant transcriptome. There is no reference genome available.

I am getting lots of Ns throughout the transcriptome.

I'd like to know how and why Oases inserts Ns and possible ways to avoid them?

I'm also interested in estimating expression levels. I have aligned the reads to the transcripts using BWA and used SAMtools to calculate the mean coverage of each transcript. I am not confident in this strategy because BWA picks a position at random when a read aligns equally well to two positions. Any suggestions for possible strategies? I've seen some tools that can estimate expression levels, but they all require a reference genome.

Thanks for your help.
Old 02-11-2011, 01:44 AM   #16
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 68
Default

I also have my problems with Oases, and it is still kind of a black box because there is no paper about it. If you want fewer Ns you could probably just reduce the insert length. Of course this is the tricky way and probably not that reasonable. ;-)

Let me know when you have figured out how exactly Oases does scaffolding. I tried to get some information out of the code; sadly it is not easy to "read".

If you have a close relative whose proteome is known, you could try this sort of scaffolding: http://genome.cshlp.org/content/20/10/1432 .

What you could try for expression levels is mapping back the Velvet contigs that are used by Oases; since these have the coverage in their IDs, you will also get an idea of the expression levels.
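
As a rough sketch of reading the coverage out of the contig IDs, assuming the usual Velvet header layout `NODE_<id>_length_<length>_cov_<coverage>` in contigs.fa (function names are hypothetical). Velvet reports both length and coverage in k-mer units, so the nucleotide length is length + k - 1.

```python
import re

# Assumes Velvet-style headers such as ">NODE_12_length_340_cov_15.250000".
HEADER_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def parse_velvet_header(header):
    """Return (node_id, kmer_length, kmer_coverage), or None if the header does not match."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    node_id, length, coverage = match.groups()
    return int(node_id), int(length), float(coverage)


def nucleotide_length(kmer_length, k):
    """Convert a Velvet length given in k-mers to a length in nucleotides."""
    return kmer_length + k - 1


parsed = parse_velvet_header(">NODE_12_length_340_cov_15.250000")
```

The k-mer coverage is only a crude proxy for expression: it depends on k and read length, and contigs shared between isoforms pool the reads of all of them.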
Old 03-02-2011, 07:44 AM   #17
schmima
Member
 
Location: Zürich

Join Date: Apr 2010
Posts: 54
Default ad problem with solid

[second edit]
[third edit: seems to work fine. Used BLAT to check the contigs and transcripts - results as expected. However - the difference between asid -merge and the postprocessor script are still unknown.]

should work:

sample: arbitrary samplename
workdir: directory with everything inside

### run the preprocessor. In case you want to filter something, do it before this step... (before, I piped directly into velvet with a filter - reason for the previous mess)

perl denovo_preprocessor_solid_v2.2.1.pl -run fragment -f3 sample.csfasta -dir workdir/


### run velvet. The -amos_file yes option is only required if you use the denovo_postprocessor_solid_v1.6.pl script

velveth_de workdir 31 -short -fasta workdir/de_fragment_input.de
velvetg_de workdir -read_trkg yes -amos_file yes


### if you want the velvet contigs in nucleotide space: either you use the script and asid_light:

perl denovo_postprocessor_solid_v1.6.pl --afgfile workdir/velvet_asm.afg --csfasta workdir/cs_fragment_input.csfasta --output workdir/color_reads.ma
asid_light -convert workdir/color_reads.ma [number of characters per line] > workdir/nt_contigs.fa


### or only asid_light:

asid_light -merge workdir/LastGraph workdir/contigs.fa workdir/cs_fragment_input.csfasta workdir/cs_fragment_input.idx workdir/gap_reads/ workdir/color_reads.ma workdir/asid_scaffolds.de graph2ma [mincontiglength] [library type]
asid_light -convert workdir/color_reads.ma [number of characters per line] > workdir/nt_contigs.fa

Comparing the output after running Velvet only (using the script plus asid_light, or asid_light alone), I saw that the results differ. Reason?
Partly this is due to the min contig length of 80 in asid_light -merge, but there must be another reason...
The color_reads.ma files are already slightly different, and the contigs look similar but are definitely not identical. Runtimes are also very different...

any ideas?

haven't got the time to look at it yet in detail... update may follow


[third edit - adding the oases stuff]

### run oases (-amos_file yes is required for the script)

oases_de workdir [whatever options] -amos_file yes


### again - to convert the transcripts, use either the script and asid_light:

perl denovo_postprocessor_solid_v1.6.pl --afgfile workdir/oases_asm.afg --csfasta workdir/cs_fragment_input.csfasta --output workdir/color_reads.ma
asid_light -convert workdir/color_reads.ma [number of characters per line] > workdir/nt_transcripts.fa

### or only asid_light:

asid_light -merge workdir/LastGraph workdir/transcripts.fa workdir/cs_fragment_input.csfasta workdir/cs_fragment_input.idx workdir/gap_reads/ workdir/color_reads.ma workdir/asid_scaffolds.de graph2ma [mincontiglength] [library type]
asid_light -convert workdir/color_reads.ma [number of characters per line] > workdir/nt_transcripts.fa

## done

Last edited by schmima; 03-06-2011 at 11:52 PM.
Old 03-06-2011, 10:29 PM   #18
Symphysodon
Junior Member
 
Location: Australia

Join Date: Mar 2011
Posts: 5
Default Oases: segmentation fault

Hi all,

I'm trying to use Oases to do a de novo transcriptome assembly of single-end Illumina reads, but am getting the following error message:

[0.000000] Reading graph file velvet_assembly//Graph2
[0.001334] Graph has 120201 nodes and 22312981 sequences
Segmentation fault


I am using Oases 0.1.18 and Velvet 1.0.18. I assembled the single reads with Velvet 1.0.18 and kept -read_trkg on.

The size of my Velvet output files are:
37M contigs.fa
491M Graph2
491M LastGraph
611 Log
76M PreGraph
1009M Roadmaps
3.2G Sequences
10M stats.txt

I am trying to do the job on a 128 GB RAM machine.

Appreciate any suggestions for a solution. Thanks!
Old 03-06-2011, 11:47 PM   #19
schmima
Member
 
Location: Zürich

Join Date: Apr 2010
Posts: 54
Default

Hm, sounds like an uncaught/unexpected error.

Any errors during the compilation? Did you pass all the Velvet settings on when compiling Oases? E.g.:

velvet compilation:
make 'CATEGORIES=1' 'MAXKMERLENGTH=31'
oases compilation:
make 'VELVET_DIR=/blabla/velvet_1.0.18' 'MAXKMERLENGTH=31' 'CATEGORIES=1'

[I don't think that RAM is a problem. As long as the reads are not total crap, it should be fine. In my case I filtered out the low-quality and low-complexity reads. With 20 million reads (in color space, however) I was doing well on my 12 GB desktop computer (k = 31).]

Last edited by schmima; 03-06-2011 at 11:49 PM.
Old 03-07-2011, 12:17 AM   #20
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 68
Default

If you can reproduce this error, you should tell Zerbino about it: join the mailing list at http://www.ebi.ac.uk/~zerbino/oases/ ;-) Zerbino is active there and you get a response quite fast.

There is not much you can do when you get "segmentation fault" as an error. What k-mer length do you use? Maybe try again with a higher hash value?