SEQanswers

10-01-2010, 06:09 AM   #1
bioben
Junior Member
 
Location: IL

Join Date: Sep 2010
Posts: 6
problem with de novo assembly of ESTs

Hi,

I am trying to assemble ~10 million 454 ESTs and ~1 million Sanger ESTs. I tried newbler, CAP3 and TGICL. They all output more or less identical sequences in the contigs and singlets files.

For example, I found that the 454Isotigs.fna file contains many sequences that are 100% identical but of different lengths (i.e. one sequence contains another, shorter one). Isn't this supposed not to happen? I mean, shouldn't they be assembled as one?
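To make the problem concrete, here is a minimal Python sketch of how such contained sequences could be listed. It is an illustration only, not part of any assembler: the function names `read_fasta` and `contained_pairs` are hypothetical, it only checks exact containment on the same strand (no reverse complement, no mismatches), and the all-against-all comparison is quadratic, so it suits an isotig file rather than millions of raw reads.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {sequence id: uppercase sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The id is taken to be the first word of the header line.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def contained_pairs(records: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (shorter id, longer id) pairs where the shorter sequence
    is an exact substring of a strictly longer one (same strand only)."""
    # Longest first, so each sequence is only compared against longer ones.
    ordered = sorted(records.items(), key=lambda kv: len(kv[1]), reverse=True)
    pairs: List[Tuple[str, str]] = []
    for i, (short_id, short_seq) in enumerate(ordered):
        if not short_seq:
            continue  # skip empty records, which would match everything
        for long_id, long_seq in ordered[:i]:
            if len(long_seq) > len(short_seq) and short_seq in long_seq:
                pairs.append((short_id, long_id))
    return pairs
```

Calling `contained_pairs(read_fasta(open("454Isotigs.fna")))` would then list each shorter sequence together with a longer one that fully contains it.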

I also tried CAP3 and TGICL. Again, they output more or less identical sequences in the contigs and singlets files.

Does anyone know why? Thanks ...
10-03-2010, 10:23 PM   #2
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70

Maybe you could try Mira, but I warn you that no assembler will give you perfect results. Most assemblers are focused on genomic sequences (no splicing, and even coverage). They can be tweaked a little to assemble transcriptomes, but most of the time the results are not nice.
In theory newbler has a transcriptome mode, but the last time I tried it, the result was quite poor. After much testing I stuck with Mira, despite its notable problems with transcriptomes.
10-04-2010, 06:27 AM   #3
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

The phenomenon of the almost identical sequences in the 454Isotigs.fna file is explained here and here (my blog on newbler, where I suggested running cd-hit on the isotigs for each isogroup to collapse them). At least newbler is trying...
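As a rough illustration of the "collapse the isotigs per isogroup" idea, here is a minimal Python sketch. It is not cd-hit and not the procedure from the blog: it swaps in plain exact-substring removal, which only covers the case from the first post where one isotig fully contains another, whereas cd-hit clusters by sequence similarity. It also assumes that each header in 454Isotigs.fna carries a `gene=isogroupNNNNN` field. The function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed header layout: ">isotig00001  gene=isogroup00001  length=..."
ISOGROUP_RE = re.compile(r"gene=(\S+)")


def read_isotigs(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Parse FASTA lines into (isotig id, isogroup, sequence) tuples."""
    records: List[List[str]] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            isotig_id = fields[0] if fields else ""
            match = ISOGROUP_RE.search(line)
            # Isotigs without a gene= field end up in a group of their own.
            group = match.group(1) if match else isotig_id
            records.append([isotig_id, group, ""])
        elif line and records:
            records[-1][2] += line.upper()
    return [(r[0], r[1], r[2]) for r in records]


def collapse_by_isogroup(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, List[str]]:
    """For each isogroup, keep only isotigs that are not an exact
    substring of a longer (or identical) isotig kept in the same group."""
    groups = defaultdict(list)
    for isotig_id, group, seq in records:
        groups[group].append((isotig_id, seq))

    kept: Dict[str, List[str]] = {}
    for group, members in groups.items():
        # Longest first, so a contained isotig always meets its container.
        members.sort(key=lambda m: len(m[1]), reverse=True)
        survivors: List[Tuple[str, str]] = []
        for isotig_id, seq in members:
            if not any(seq in kept_seq for _, kept_seq in survivors):
                survivors.append((isotig_id, seq))
        kept[group] = [isotig_id for isotig_id, _ in survivors]
    return kept
```

Isotigs that differ by a real splice variant are not substrings of each other, so they survive this filter. Merging near-identical ones would need a similarity-based tool such as cd-hit.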
All times are GMT -8.