SEQanswers

Old 02-24-2012, 04:36 AM   #1
pbrand
Member
 
Location: Bochum, Germany

Join Date: Feb 2012
Posts: 13
Default Knowledge about SOAPdenovo-trans

Hi all,
I found an unpublished tool for de novo transcriptome assembly, SOAPdenovo-Trans (http://soap.genomics.org.cn/SOAPdenovo-Trans.html), but I haven't found anything about its performance in comparison with, e.g., Trinity or Oases. I have heard that it does a good job.

Can anyone say something about its performance?

Thanks in advance,
Philipp
Old 06-12-2012, 01:57 PM   #2
dongilbert
Junior Member
 
Location: indiana

Join Date: Jun 2012
Posts: 9
Default Knowledge about SOAPdenovo-trans

Here is a summary of my use of RNA transcript assemblers,
comparing what I see as three good and improving programs for
this: Velvet/Oases, Trinity and SOAPdenovo-Trans.

http://arthropods.eugenes.org/EvidentialGene/evigene/evigene_rnaseq_2012_stats.txt

Very briefly, the three de novo assemblers tested here are closely ranked,
and ranking depends on the particular species and data set used.
Locust insect: Velvet/O > Trinity > SOAPTrans
Cacao plant: SOAPTrans > Trinity > Velvet/O >> Cufflinks
Daphnia waterflea: Velvet/O > SOAPTrans > Trinity >> Cufflinks

SOAPTrans in particular can assemble better and faster, with less memory use, than the other two. It can also fail inexplicably, or do worse than the others.
My recommendation is to try all three and see which works for you; if possible, use them all and extract the best subset by some gene evidence criteria (such as homology or a high coding ratio).
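
As a rough illustration of "extract the best subset by gene evidence", here is a minimal Python sketch that keeps, for each locus, the candidate transcript with the highest coding ratio. Everything in it is an assumption made for illustration: the grouping of transcripts by locus, the toy sequences, and the scoring itself, which takes the longest complete ATG-to-stop ORF on the forward strand only and divides by transcript length. It is not the method used for the stats file linked above.

```python
# Hypothetical sketch: pick the best transcript per locus by coding ratio.
# Simplifications (assumptions): forward strand only, ORF must start with ATG
# and end with a stop codon; incomplete ORFs are ignored.
STOPS = {"TAA", "TAG", "TGA"}

def longest_orf_len(seq):
    """Length in nt (stop codon included) of the longest complete forward ORF."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                best = max(best, i + 3 - start)
                start = None
    return best

def coding_ratio(seq):
    """Fraction of the transcript covered by its longest complete ORF."""
    return longest_orf_len(seq) / len(seq) if seq else 0.0

def best_per_locus(candidates):
    """candidates: iterable of (locus, assembler, sequence) tuples."""
    best = {}
    for locus, assembler, seq in candidates:
        score = (coding_ratio(seq), len(seq))  # ties broken by length
        if locus not in best or score > best[locus][0]:
            best[locus] = (score, assembler, seq)
    return {locus: (asm, seq) for locus, (_, asm, seq) in best.items()}

# Toy input: two assemblies of the same (made-up) locus.
toy = [
    ("locus1", "trinity", "CCCATGTAACCCCCC"),
    ("locus1", "soaptrans", "ATGAAACCCGGGTAA"),
]
print(best_per_locus(toy))
```

In practice the score would more likely combine several lines of evidence (homology hits, coding ratio, completeness); the point of the sketch is only the "pool all assemblies, then keep the best per locus" step.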
Old 06-21-2012, 12:15 AM   #3
pbrand
Member
 
Location: Bochum, Germany

Join Date: Feb 2012
Posts: 13
Default

Hi Don,
thanks for your post, it was really helpful.

Regarding SOAPdenovo-Trans, which k-mer size seems best to you, given that it has no multi-k function like Oases? I am interested in both the lowly and the highly expressed genes in my transcriptome.

Cheers,
Philipp
Old 03-01-2016, 12:15 PM   #4
Rahul shelke
Junior Member
 
Location: Guwahati,India

Join Date: Mar 2015
Posts: 8
Default

A k-mer size between 25 and 31 would be good for RNA-seq.
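
Since the question was about the lack of a multi-k mode, one possible workaround is simply to run the assembler once per k-mer size and compare the results afterwards. The Python sketch below only builds the command lines and does not run them. The `all`, `-s`, `-K` and `-o` arguments are the ones that appear elsewhere in this thread; the binary path, the config file name and the output prefix are placeholders, and the maximum k your build accepts is something to check for your installation.

```python
# Hypothetical sketch: build one SOAPdenovo-Trans command per k-mer size.
# binary, config and prefix are placeholders, not recommended values.
import shlex

def build_commands(config, k_values, binary="./SOAPdenovo-Trans", prefix="asm"):
    """Return a list of argument lists, one assembly run per k."""
    commands = []
    for k in k_values:
        commands.append(
            [binary, "all", "-s", config, "-K", str(k), "-o", f"{prefix}_k{k}"]
        )
    return commands

for cmd in build_commands("config_file", [25, 27, 29, 31]):
    # Print for inspection; pass cmd to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Each run writes to its own output prefix, so the per-k assemblies can be pooled and filtered afterwards.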
Old 03-01-2016, 03:11 PM   #5
dongilbert
Junior Member
 
Location: indiana

Join Date: Jun 2012
Posts: 9
Default SOAPdenovo-trans and kmer size for best gene assembly

This is an old thread but still relevant, as there is much misinformation about this. Given the great improvements in Illumina read quality since the early generation of short 35 bp reads, we need to revisit how best to assemble them. The k-mer size shreds reads into smaller pieces so that they assemble better, but when reads are accurate, shredding introduces errors by allowing mis-mated reads to be assembled together.

For highly expressed genes that are long and somewhat repetitive (e.g. muscle genes), small k-mers lead to inaccurate gene assembly, even though their use can improve the technical measure of "more reads assembled". We should care more about "more accurately assembled genes". When I use k-mer sizes up to the read size (e.g. 100 bp or longer), I get the most accurate gene assemblies for some of the loci that are well expressed. On average, the most accurate gene assemblies come from k-mers above 35, ranging up to 95. This holds for SOAPtrans, Velvet/Oases and idba-trans, and is why these do better than Trinity, since the latter is restricted to k-mers of 25 or 31.
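
The effect of k-mer size on repetitive genes can be shown with a toy example. In the Python sketch below, two made-up sequences from different "loci" share a 12 bp repeat. With k at or below the repeat length they have k-mers in common, so a de Bruijn graph would connect them; with k longer than the repeat they share nothing. The sequences are invented for illustration and are not from any real data set.

```python
# Toy illustration: a shared repeat links two unrelated sequences only
# when k is no longer than the repeat. All sequences are made up.
def kmers(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmers(a, b, k):
    """k-mers present in both sequences (potential false joins in a graph)."""
    return kmers(a, k) & kmers(b, k)

REPEAT = "ACGTTGCATGCA"                        # 12 bp repeat common to both loci
gene_a = "A" * 10 + REPEAT + "C" * 10          # locus A with its own flanks
gene_b = "G" * 10 + REPEAT + "T" * 10          # locus B with different flanks

for k in (8, 12, 13):
    n = len(shared_kmers(gene_a, gene_b, k))
    print(f"k={k}: {n} shared k-mers")
```

Real transcripts are of course messier, but the same logic is why longer k-mers can keep long, repetitive, highly expressed genes from being joined incorrectly, at the cost of needing enough coverage to span each k-mer.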

Here is a recent example from the malaria mosquito Anopheles: for the longest 10,000 (and longest 1,000) genes assembled, the k-mer size that gave the best assembly:
kmer   10k_longest   1k_long
k05            226        18
k25           1224        92
k35           2912       335
k45           1852       197
k55           1553       182
k65           1069       106
k75            522        28
k85            414        21
k95            228        21

Best assembler:
assembler      10k_longest   1k_long
velvet/oases          4684       580
idba-trans            3675       275
soapdenovo            1306       116
trinity                335        29
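
To make the counts above easier to compare, here is a small Python sketch that turns the 10k_longest column into percentages. The numbers are copied from the two tables in this post; the helper function and its name are just for illustration.

```python
# Percent shares computed from the 10k_longest counts posted above.
best_kmer_10k = {
    "k05": 226, "k25": 1224, "k35": 2912, "k45": 1852, "k55": 1553,
    "k65": 1069, "k75": 522, "k85": 414, "k95": 228,
}
best_assembler_10k = {
    "velvet/oases": 4684, "idba-trans": 3675, "soapdenovo": 1306, "trinity": 335,
}

def share(counts, keep):
    """Percentage of the total falling on keys for which keep(key) is true."""
    total = sum(counts.values())
    return 100.0 * sum(n for key, n in counts.items() if keep(key)) / total

def k_of(key):
    """Numeric k-mer size from a label such as 'k35'."""
    return int(key[1:])

print("total genes:", sum(best_kmer_10k.values()))
print("best k >= 35: %.1f%%" % share(best_kmer_10k, lambda key: k_of(key) >= 35))
print("best k >  35: %.1f%%" % share(best_kmer_10k, lambda key: k_of(key) > 35))
for name in best_assembler_10k:
    print("%-13s %.1f%%" % (name, share(best_assembler_10k, lambda key: key == name)))
```

Both columns sum to the stated totals (10,000 and 1,000), so the shares can be read directly as "fraction of genes for which this k-mer size or assembler was best".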

That is, Velvet/Oases remains the most capable and accurate gene assembler, in part because it does well with k-mer > 30 gene assemblies. SOAPdenovo remains good, but idba-trans has surpassed it in producing the second most accurate assemblies. Trinity is still in last place (and this is with the most recent 2014/2015 version).

Another important note is that these genes assembled from mRNA-seq are more accurate and more orthology-complete than the gene models that MAKER predicted on the mosquito genome assemblies. The RNA-seq and MAKER genes are reported in "Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes", 2015, doi: 10.1126/science.1258522.
Old 02-07-2017, 09:25 PM   #6
smurmu
Junior Member
 
Location: new delhi

Join Date: Oct 2016
Posts: 6
Default

Hi...

I'm using SOAPdenovo-Trans to assemble SOLiD single-end reads of 50 bp length. The input FASTQ file contains 112537370 reads. My config file is as follows:

#maximal read length
max_rd_len=50
[LIB]
#in which part(s) the reads are used
asm_flags=3
#fastq file for single reads
q=/path/gm.fastq

Then I ran the following command:
./SOAPdenovo-Trans all -s config_file -o outputGraph -R -L 300

It has been two days since the process started and it is still running, but there have been no changes in the output directory. This makes me wonder whether the process is stuck somewhere or the command I gave is incorrect. The only thing I can see in my terminal is this:

The version 1.03: released on July 19th, 2013

pregraph -s soap.config -K 23 -o outputGraph
In soap.config, 1 libs, max seq len 50, max name len 256
8 thread created
read from file:
/run/media/nfb/data/sneha_nf/GLYCINE_MAX/DATA/basespace/glycine_max.fastq
--- 100000000th reads
--- 200000000th reads
--- 300000000th reads
--- 400000000th reads
--- 500000000th reads
--- 600000000th reads
--- 700000000th reads
--- 800000000th reads
--- 900000000th reads
--- 1000000000th reads
--- 1100000000th reads
--- 1200000000th reads
--- 1300000000th reads
--- 1400000000th reads
And it is still continuing. This puzzles me, because the counter already exceeds the number of reads in the input file.
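
One way to double-check the input size independently of the assembler log is to count the records in the FASTQ file directly. The Python sketch below assumes a plain, uncompressed FASTQ file with exactly four lines per record; the path in the usage comment is a placeholder. It only verifies the read count and does not by itself explain the counter in the log.

```python
# Hypothetical sketch: count records in an uncompressed four-line FASTQ file.
def count_fastq_records(path):
    """Return the number of records, assuming exactly 4 lines per record."""
    lines = 0
    with open(path, "rb") as handle:
        for _ in handle:
            lines += 1
    if lines % 4 != 0:
        raise ValueError(f"{path}: {lines} lines is not a multiple of 4")
    return lines // 4

# Usage (placeholder path):
# print(count_fastq_records("/path/gm.fastq"))
```

If the count matches the expected 112537370 reads, the file itself is at least the size you think it is, and the question becomes what the assembler is counting.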

Tags
rna-seq, soap denovo, transcriptome assembly
