**gringer** · 01-23-2012, 09:53 AM

If you know someone else who has done RNA-Seq on your organism, you could test the assembly with their data. Any de novo assembly you do should have a really good mapping rate for your own reads back to the assembly, but it might not do so well with someone else's reads.

**arvid** · 01-24-2012, 01:35 AM

I have had good results with Trinity for Illumina data. I guess it won't be too happy about 454 reads though, unless you pre-process them to correct homopolymer errors.

Metrics for de novo transcriptomes are difficult to define. We have tried mapping the transcript contigs to the transcripts of similar organisms to get an idea of the completeness. You could also look at the contig length distribution and compare it to that of a similar organism.
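As a rough illustration of the contig length comparison, here is a minimal Python sketch that summarizes the lengths in a FASTA file. The function names (`fasta_lengths`, `n50`, `summarize`) are made up for this example and are not part of Trinity or MIRA. Running the same summary on your assembly and on a related organism's transcript set gives numbers you can put side by side.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence found in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # stays None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Basic length statistics; 'median' is the upper median for even counts."""
    ordered = sorted(lengths)
    return {
        "contigs": len(ordered),
        "total_bases": sum(ordered),
        "longest": ordered[-1] if ordered else 0,
        "median": ordered[len(ordered) // 2] if ordered else 0,
        "n50": n50(ordered),
    }
```

To use it, open each FASTA file and pass the file handle to `fasta_lengths`, then print `summarize(...)` for both. Keep in mind that N50 was designed for genome assemblies, so for transcriptomes it is only a coarse indicator.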

For MIRA, I suggest you ask on the mailing list; Bastien is quite fast in helping out new users there. It might choke on big Illumina sets though, so make sure you have lots of RAM and time for your analysis, or subset your dataset to get a manageable run.
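One way to subset a read set without loading it all into memory is reservoir sampling. The sketch below is only an illustration: `read_fastq` and `reservoir_sample` are hypothetical helper names, and it assumes plain 4-line FASTQ records (no wrapped sequence lines) and single-end reads.

```python
import random
from typing import Iterable, Iterator, List, Tuple

FastqRecord = Tuple[str, ...]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []
    # Any incomplete trailing record is silently dropped.


def reservoir_sample(records: Iterable[FastqRecord], k: int,
                     seed: int = 0) -> List[FastqRecord]:
    """Pick k records uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sample: List[FastqRecord] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample
```

For example, `reservoir_sample(read_fastq(open("reads.fastq")), 1000000)` would keep one million reads (the filename is a placeholder). Only the sampled records are held in memory, so the size of the input file does not matter much.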

**aliceb** · 05-08-2012, 06:41 PM

Trinity vs MIRA / de novo assembly

Hi all,

I'm keen to see how others are getting on with de novo assemblies, particularly with Trinity. It's interesting to me that their Nature Biotechnology paper doesn't mention MIRA, and I was wondering if anyone has compared the two programs.

I'm doing de novo assemblies using 50 bp single-read Illumina data, with a little 454 data thrown in. When Trinity first came out, it crashed pretty quickly. But now that they have different options for the first step, Inchworm (I've been trying Jellyfish), I've been able to assemble 100 million reads on my local machine (24 GB RAM) in less than a day. This has been the case for the Illumina data alone and with the 454 data pooled in. I suspect, however, that the 454 data had little impact on the outcome, because I only have about 200,000 454 reads!

So far, Trinity gives me more long contigs and less redundancy (according to TGICL). But it's always difficult to assess these assemblies. In particular, I can't find out how much of my data is being used by Trinity. Is there a handy report file with this information somewhere? The webpage suggests using bowtie to figure out what has gone into the assembly, but this will throw out anything that aligns ambiguously. Is there an easier way? Does anyone else have experience with Trinity that they can share?
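One possible workaround, sketched here under some assumptions: if the reads are mapped back to the contigs with an aligner set up to report ambiguous hits too, and the output is SAM with unmapped reads included, then the fraction of reads with at least one alignment can be counted directly from the FLAG column. The function name `mapped_read_counts` is made up for this example, and it assumes single-end reads with unique names.

```python
from typing import Iterable, Tuple


def mapped_read_counts(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (reads with >= 1 alignment, total reads seen) from SAM text.

    Assumes single-end reads with unique names, and that unmapped reads
    are present in the file. A read reported at several contigs is
    counted once.
    """
    seen = set()
    mapped = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a valid alignment record
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & 0x4:  # bit 0x4 set means "segment unmapped"
            mapped.add(name)
    return len(mapped), len(seen)
```

Dividing the first number by the second gives the share of reads represented in the assembly. Note that this keeps every read name in memory, so for 100 million reads it may be better to run it on a random subset.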

Also, Trinity is able to assemble all of my data at once, whereas MIRA was crashing when I tried to assemble it all together (even on a cluster with 96 GB RAM). I was getting around this by partitioning my data for MIRA, which worked. But doing it all in one assembly is a plus.

And Liz, you might have found this already, but the example inputs in the MIRA HTML guide are quite useful: http://mira-assembler.sourceforge.ne...ideToMIRA.html

Thanks!
-Alice

Pipeline for de novo RNA sequencing, and Galaxy
