SEQanswers


Vickenstein 03-01-2011 06:44 AM

Assembling De Novo 454 Transcriptome Contigs and Singletons with Illumina Short Reads
 
Background: I have assembled de novo 4 million non-normalized 454 Titanium transcriptome reads using Newbler 2.5 and have gotten amazing results: 88,000 contigs with an N50 of 951 bp, 35,000 isotigs with an N50 of 1,500 bp (the longest being 7,900 bp), and 276,564 singletons. Now I am about to sequence the same transcriptome on the Illumina 75+ bp platform.
Problem: I am contemplating which assembly software I should use.
I really like Newbler's isoform prediction, and I am not sure whether it is possible to merge both sets of raw reads into a good assembly.
I haven't seen any software that is able to use a transcriptome reference for assembling new reads.

aparna 03-02-2011 05:58 AM

I worked with Newbler on transcriptome data. While the concept of isotigs is good, 88k contigs with an N50 of 951 is horrible.
With Illumina data I would advise you to look at Oases/Velvet and Scripture.

cram 03-02-2011 08:43 AM

Quote:

I worked with Newbler on transcriptome data. While the concept of isotigs is good, 88k contigs with an N50 of 951 is horrible.
Actually, an N50 of 951 sounds very good to me if you're doing a de novo transcript assembly of a reasonably complex eukaryote. Remember, too, that Newbler uses a different definition of contig than most other assemblers, and the isotig N50 is probably the better value to use when comparing with other tools.
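To make the contig-versus-isotig point concrete, here is a minimal Python sketch of the usual N50 definition (the length at which the longest sequences together cover at least half of the assembled bases). The `n50` function and the toy lengths are illustrative assumptions, not Newbler output; they only show that the same bases, counted as several short contigs or as one longer isotig, give very different N50 values.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Toy lengths (hypothetical, not from the thread): the same 1,000 bases
# reported either as five short contigs or as one isotig built from them.
contigs = [300, 250, 200, 150, 100]
isotigs = [1000]
print(n50(contigs))  # 250
print(n50(isotigs))  # 1000
```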

flxlex 03-03-2011 05:23 AM

Try adding the Illumina reads to Newbler 2.5! See my blog: http://contig.wordpress.com/2011/01/...her-platforms/

Vickenstein 03-03-2011 06:31 AM

Thanks for the reply. From the look of it, I might have to reassemble the raw 454 reads together with the new Illumina reads using both Oases/Velvet and Newbler. I will compare the results of the two methods.
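For a side-by-side comparison like this, a minimal Python sketch of basic assembly statistics computed from FASTA text might look like the following. The function names and the example records are hypothetical; it assumes plain FASTA output from each assembler and only looks at sequence lengths, so it says nothing about whether the transcripts are correct or complete.

```python
def read_fasta_lengths(lines):
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may wrap over several lines
    if length is not None:
        yield length


def summarize(lengths):
    """Return record count, total bases, longest record and N50."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if running * 2 >= total:  # reached half of all assembled bases
            n50 = length
            break
    return {
        "count": len(ordered),
        "total": total,
        "longest": ordered[0] if ordered else 0,
        "N50": n50,
    }


# Hypothetical two-record FASTA, standing in for an assembler's output file.
example = """>isotig1
ACGTACGTAC
GTAC
>isotig2
ACGTAC
""".splitlines()
print(summarize(read_fasta_lengths(example)))
# {'count': 2, 'total': 20, 'longest': 14, 'N50': 14}
```

To compare real runs, open each assembler's output with `with open(path) as handle:` and call `summarize(read_fasta_lengths(handle))`; the paths are whatever the assemblers wrote.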

BaCh 03-04-2011 04:43 AM

Quote:

Originally Posted by Vickenstein (Post 36089)
[...] Now I am about to sequence the same transcriptome on the Illumina 75+ bp platform. [...]
I am contemplating which assembly software I should use.

I use the current development version of MIRA (V3.2.1.8) and just went through a de novo RNA-Seq assembly of 22 million 100 bp reads.

Quote:

Originally Posted by Vickenstein (Post 36089)
I really like Newbler's isoform prediction, and I am not sure whether it is possible to merge both sets of raw reads into a good assembly.

It is. I regularly use MIRA for de novo genome assembly with 454 and Illumina reads (ranging from 36- to 100-mers). It should also work with a mixed transcriptome dataset.

Quote:

Originally Posted by Vickenstein (Post 36089)
I haven't seen any software that is able to use a transcriptome reference for assembling new reads.

MIRA. No problem if your reference is a transcriptome, but stay away from trying to map RNA-Seq reads to a genome; that will fail miserably at intron/exon boundaries.

B.

Disclaimer 1: I'm the author of MIRA; your mileage may vary (but then I'd like to hear about it).
Disclaimer 2: for data sets with more than 40 million reads, you probably want to wait for the next version.
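A toy Python illustration of why a genome reference causes trouble: a read from spliced mRNA that crosses an exon/exon junction matches the transcript as one contiguous string, but in the genome its two halves are separated by the intron. The sequences below are made up, and Python's `in` test stands in for a simple ungapped alignment; it is not how MIRA or any real mapper aligns reads, and splice-aware aligners exist precisely to handle this case.

```python
# Made-up sequences: two exons separated by an intron that starts with the
# canonical GT donor and ends with the AG acceptor.
exon1 = "ATGGCCATTGTAATG"
intron = "GTAAGTCTTTCCTTTTAG"
exon2 = "GGCCGCTGAAAGGGTGCCCGA"

genome = exon1 + intron + exon2  # what a genome reference contains
transcript = exon1 + exon2       # what a transcriptome reference contains

# A read sequenced from the mRNA, spanning the exon1/exon2 junction.
read = exon1[-10:] + exon2[:10]

print(read in transcript)  # True: one contiguous match in the transcript
print(read in genome)      # False: the intron splits the read in the genome
```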

sklages 03-04-2011 12:22 PM

Quote:

Originally Posted by Vickenstein (Post 36300)
Thanks for the reply. From the look of it, I might have to reassemble the raw 454 reads together with the new Illumina reads using both Oases/Velvet and Newbler. I will compare the results of the two methods.

Depending on the library itself, I suspect that coverage issues might also prevent a "good assembly" with non-normalized data. I had a dataset of 3 million Titanium reads from non-normalized human cDNA; one fourth of the whole library represented the same gene... Newbler even failed to map and assemble this set (it crashed during consensus calculation). Non-normalized libraries are not the best option for de novo assembly with NGS data, but as I said, it depends on the library.

Sven

BaCh 03-05-2011 12:43 AM

Quote:

Originally Posted by sklages (Post 36436)
I had a dataset of 3 million Titanium reads from non-normalized human cDNA; one fourth of the whole library represented the same gene... Newbler even failed to map and assemble this set (it crashed during consensus calculation).

*cough* 750k times the same gene? I would not expect any program to really assemble that de novo unless specially primed. In mapping, too, the coverage might be somewhat on the unexpected side.

B.
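The arithmetic behind the "750k" figure, plus a rough depth estimate, as a small Python sketch. The read count and the one-fourth fraction come from sklages' post; the roughly 400 bp mean read length and the 2,000 bp transcript length are assumptions for illustration only.

```python
def fold_coverage(n_reads, mean_read_length, target_length):
    """Average per-base depth: total sequenced bases divided by target length."""
    return n_reads * mean_read_length / target_length


total_reads = 3_000_000   # from sklages' post
fraction_one_gene = 0.25  # "one fourth of the whole library"
reads_for_gene = int(total_reads * fraction_one_gene)

# Assumed values (not from the thread): an approximate Titanium read length
# and a hypothetical length for the over-represented transcript.
mean_read_length = 400
transcript_length = 2_000

depth = fold_coverage(reads_for_gene, mean_read_length, transcript_length)
print(reads_for_gene)  # 750000
print(depth)           # 150000.0
```

Even if the real transcript were several times longer, the depth would stay in the tens of thousands, far beyond what assemblers are typically tuned for.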


All times are GMT -8.
