SEQanswers

10-13-2010, 01:58 AM   #1
nicolallias
Member
 
Location: France

Join Date: Jan 2010
Posts: 23
gsAssembly (Newbler) de novo behaviour, inputs and outputs

Hello there,
It appears that the content of the ACE file is quite inconsistent with the content of the fasta files and of the 454NewblerMetrics file:
Sometimes the fna files are empty and 454NewblerMetrics reports neither contigs nor isotigs, but...
the corresponding ACE file has a hundred contigs/isotigs with a mean depth of 10 000 and a minimum size of 120.
Looking into the ACE file shows that these sequences are filled with "*".
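To put a number on "filled with '*'", one can compute the fraction of pad characters in each contig's padded consensus. Below is a minimal Python sketch, assuming the standard ACE layout in which a `CO <name> <bases> <reads> <segments> <U|C>` line is immediately followed by the padded consensus, terminated by a blank line; the function and script names are hypothetical.

```python
"""Fraction of '*' pads in each contig consensus of an ACE file (sketch)."""
import sys
from typing import Dict, Iterable, List, Optional


def _fraction(parts: List[str]) -> float:
    """Fraction of '*' characters in the joined consensus lines."""
    seq = "".join(parts)
    return seq.count("*") / len(seq) if seq else 0.0


def pad_fractions(lines: Iterable[str]) -> Dict[str, float]:
    """Map contig name -> fraction of '*' in its padded consensus sequence."""
    fractions: Dict[str, float] = {}
    name: Optional[str] = None
    parts: List[str] = []
    for line in lines:
        text = line.strip()
        if line.startswith("CO "):
            # CO <contig name> <padded bases> <reads> <base segments> <U|C>
            name, parts = line.split()[1], []
        elif name is not None:
            if text:
                parts.append(text)
            else:
                # A blank line terminates the padded consensus.
                fractions[name] = _fraction(parts)
                name = None
    if name is not None:  # file ended directly after a consensus
        fractions[name] = _fraction(parts)
    return fractions


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python ace_pads.py path/to/file.ace
    with open(sys.argv[1]) as handle:
        for contig, frac in pad_fractions(handle).items():
            print(f"{contig}\t{frac:.3f}")
```

A fraction close to 1 for a contig would correspond to a consensus made almost entirely of pads.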

So I wonder: is the ACE file a temporary (unfiltered) one?

Another point: it seems that running a de novo assembly starting from fasta rather than sff improves the assembly quality (fewer "*" in the sequences).

Has anyone seen this? Are there Newbler options that would fix it?
Cheers
10-13-2010, 10:51 PM   #2
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Based on the fact that you state that the 454NewblerMetrics file reports no contigs, I would assume there is something that prevents assembly (or is this mapping?). Any contigs in the ace file are probably artifacts.

What kind of project/sample/reads do you have?

The '*' symbols represent gaps introduced to optimize the alignment, and 454 ace files have tons of them due to homopolymer errors (or rather, variation in homopolymer length between reads). I would not take the number of '*' as a measure of quality. If you want to know the effect of using fasta input instead of sff, I would take reads from a known genome and check the correctness of the contigs relative to the reference sequence. Although I expect sff input to give better results, you never know...
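The origin of these pads can be reproduced in miniature. The toy Python sketch below assumes (unrealistically) that reads differ only in homopolymer lengths, and pads each run with '*' up to the longest run seen at that position; it illustrates where pads come from and is not Newbler's alignment algorithm. Function names are hypothetical.

```python
"""Toy illustration of '*' pads caused by homopolymer-length variation."""
from itertools import groupby
from typing import List, Tuple


def runs(read: str) -> List[Tuple[str, int]]:
    """Run-length encode a read, e.g. 'AAC' -> [('A', 2), ('C', 1)]."""
    return [(base, len(list(group))) for base, group in groupby(read)]


def pad_homopolymers(reads: List[str]) -> List[str]:
    """Pad every homopolymer run with '*' up to the longest run at that position.

    Toy assumption: all reads share the same succession of bases and differ
    only in homopolymer lengths, so no real alignment is needed.
    """
    if not reads:
        return []
    encoded = [runs(r) for r in reads]
    skeleton = [base for base, _ in encoded[0]]
    if any([base for base, _ in enc] != skeleton for enc in encoded):
        raise ValueError("reads differ by more than homopolymer lengths")
    longest = [max(enc[i][1] for enc in encoded) for i in range(len(skeleton))]
    return ["".join(base * n + "*" * (longest[i] - n)
                    for i, (base, n) in enumerate(enc))
            for enc in encoded]


if __name__ == "__main__":
    for row in pad_homopolymers(["ACGGGT", "ACGGT", "ACGGGGT"]):
        print(row)
```

Running it prints `ACGGG*T`, `ACGG**T` and `ACGGGGT`: the reads with shorter G runs simply receive pads. With deep coverage over many homopolymers such pads accumulate quickly, which is why their count alone says little about assembly quality.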

If somebody has done this already, let me know :-)
10-14-2010, 03:40 AM   #3
nicolallias
Member
 
Location: France

Join Date: Jan 2010
Posts: 23

Hi flxlex,
I'm considering de novo assemblies only. These observations are based on a study of dozens of assemblies done with Newbler 2.3, and the problem occurs:
- on gDNA from bacteria
- on gDNA from eukarya
- on cDNA (option -cdna) from plantae
But in all cases I also have some counterexamples.

Using the same raw data, we obtained different behaviour from Newbler.
If we provide an sff file, the contigs are mainly composed of '*', which is not the case if we simply convert this file to fasta and assemble it with Newbler. Done that way, the assembly looks just fine.
Where could this difference in behaviour come from?

Still investigating...
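For reference, an sff-to-fasta conversion keeps only the called bases (after clipping) and drops the per-flow signal values stored in the sff. The stdlib-only Python sketch below follows the published SFF layout (big-endian common header, per-read header, then flowgram values, flow indices, bases and qualities, each section padded to 8 bytes). It assumes flowgram format code 1 and an index block, if any, stored after the last read; function and script names are hypothetical. For real conversions, Roche's `sffinfo` or Biopython's `sff-trim` reader are the usual tools.

```python
"""Minimal SFF -> FASTA sketch using only the standard library."""
import struct
import sys
from typing import Iterator, Tuple

SFF_MAGIC = 0x2E736666  # the bytes ".sff"


def _pad8(n: int) -> int:
    """Round n up to a multiple of 8; SFF sections are 8-byte aligned."""
    return (n + 7) // 8 * 8


def read_sff(data: bytes) -> Iterator[Tuple[str, str]]:
    """Yield (read name, clipped bases) for every read in an SFF file.

    Assumes flowgram format code 1 and that any index block follows the
    last read, so reads can be walked sequentially from the header end.
    """
    (magic, version, _index_offset, _index_length, n_reads, header_len,
     _key_len, n_flows, fmt) = struct.unpack(">I4sQIIHHHB", data[:31])
    if magic != SFF_MAGIC or version != b"\x00\x00\x00\x01":
        raise ValueError("not an SFF version 1 file")
    if fmt != 1:
        raise ValueError(f"unsupported flowgram format code {fmt}")
    pos = header_len  # header_len already includes the 8-byte padding
    for _ in range(n_reads):
        (read_header_len, name_len, n_bases, clip_qual_left, clip_qual_right,
         clip_adapter_left, clip_adapter_right) = struct.unpack(
            ">HHIHHHH", data[pos:pos + 16])
        name = data[pos + 16:pos + 16 + name_len].decode("ascii")
        pos += read_header_len
        # Read data: n_flows uint16 flow values (dropped here), n_bases flow
        # indices, n_bases bases, n_bases quality scores.
        bases_start = pos + 2 * n_flows + n_bases
        bases = data[bases_start:bases_start + n_bases].decode("ascii")
        pos += _pad8(2 * n_flows + 3 * n_bases)
        # Clip points are 1-based and inclusive; 0 means "not set".
        left = max(1, clip_qual_left, clip_adapter_left)
        right = min([n_bases] + [c for c in (clip_qual_right, clip_adapter_right) if c])
        yield name, bases[left - 1:right]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python sff2fasta.py reads.sff > reads.fna
    with open(sys.argv[1], "rb") as handle:
        for read_name, seq in read_sff(handle.read()):
            print(f">{read_name}\n{seq}")
```

The flow values skipped in the read-data section are exactly what a fasta file cannot carry; whether that explains the behaviour difference reported here is not established by this sketch.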

Last edited by nicolallias; 10-15-2010 at 12:02 AM.
10-16-2010, 06:26 AM   #4
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Quote:
Originally Posted by nicolallias
Hi flxlex,
I'm considering de novo assemblies only. These observations are based on a study of dozens of assemblies done with Newbler 2.3, and the problem occurs:
- on gDNA from bacteria
- on gDNA from eukarya
- on cDNA (option -cdna) from plantae
But in all cases I also have some counterexamples.
I don't understand: you get empty fna files, and ace files with many '*' contigs, for all these assemblies?

Quote:
Using the same raw data, we obtained different behaviour from Newbler.
If we provide an sff file, the contigs are mainly composed of '*', which is not the case if we simply convert this file to fasta and assemble it with Newbler. Done that way, the assembly looks just fine.
Where could this difference in behaviour come from?
I am not sure, but what I would be more interested in is how correct the contigs are in assemblies from sff versus fasta input. If you do this for a known genome (e.g. E. coli), what do you find?
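A first, crude pass at this comparison can be done without any aligner: check whether each contig occurs verbatim in the reference on either strand. The Python sketch below does only that, under the assumption of a single circular chromosome record as reference; all names are hypothetical. Any homopolymer indel makes a contig fail the exact test, so a proper evaluation would align contigs to the reference and count mismatches and indels.

```python
"""Crude exact-match check of assembled contigs against a known reference."""
import sys
from typing import Dict, Iterable, List, Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record id: upper-case sequence}."""
    chunks: Dict[str, List[str]] = {}
    current: Optional[List[str]] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current = chunks.setdefault((line[1:].split() or [""])[0], [])
        elif line and current is not None:
            current.append(line.upper())
    return {name: "".join(parts) for name, parts in chunks.items()}


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def exact_hits(contigs: Dict[str, str], reference: str,
               circular: bool = True) -> Dict[str, bool]:
    """Map contig name -> True if it occurs verbatim in the reference (either strand).

    For a circular genome the reference is doubled so that contigs spanning
    the coordinate origin can still be found.
    """
    ref = reference.upper()
    if circular:
        ref += ref
    return {name: seq.upper() in ref or revcomp(seq.upper()) in ref
            for name, seq in contigs.items()}


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): python contig_check.py reference.fna 454AllContigs.fna
    with open(sys.argv[1]) as ref_handle, open(sys.argv[2]) as contig_handle:
        # Assumes the reference file holds a single chromosome record.
        reference = next(iter(read_fasta(ref_handle).values()), "")
        contigs = read_fasta(contig_handle)
    hits = exact_hits(contigs, reference)
    print(f"{sum(hits.values())} of {len(hits)} contigs match the reference exactly")
```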
10-18-2010, 01:21 AM   #5
nicolallias
Member
 
Location: France

Join Date: Jan 2010
Posts: 23

Quote:
I don't understand: you get empty fna files, and ace files with many '*' contigs, for all these assemblies?
Yes
Quote:
I am not sure, but what I would be more interested in is how correct the contigs are in assemblies from sff versus fasta input. If you do this for a known genome (e.g. E. coli), what do you find?
I will do that.

Edit: Both assemblies, from sff and from fasta, have been run on E. coli, and both gave ace and fna files full of contigs. But I won't look into which input (fasta or sff) is better for assembly: the main question here is why
we sometimes get nothing (ace filled with "*", fna files empty) with the sff, while we get something (ace and fna files with contigs) when starting from a fasta.
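When one run gives an empty fna file and the other a full one the difference is obvious, but when both produce contigs (as in the E. coli test) a quick summary makes the runs comparable. A minimal Python sketch, assuming the contigs are in FASTA files such as Newbler's 454AllContigs.fna (pass whichever paths apply); it reports contig count, total length and N50, where N50 is the largest length L such that contigs of length at least L together cover at least half of the total assembled length. Names are hypothetical.

```python
"""Contig count, total length and N50 for one or more FASTA files."""
import sys
from typing import Iterable, List


def contig_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every FASTA record (headers start with '>')."""
    lengths: List[int] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)
        elif line and lengths:
            lengths[-1] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest L such that contigs of length >= L cover >= 50% of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


if __name__ == "__main__":
    # Usage (hypothetical): python fasta_stats.py sff_run/454AllContigs.fna fasta_run/454AllContigs.fna
    for path in sys.argv[1:]:
        with open(path) as handle:
            lengths = contig_lengths(handle)
        print(f"{path}\tcontigs={len(lengths)}\ttotal={sum(lengths)}\tN50={n50(lengths)}")
```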

We're getting in touch with Roche...

Last edited by nicolallias; 10-19-2010 at 12:42 AM.
10-22-2010, 01:45 AM   #6
nicolallias
Member
 
Location: France

Join Date: Jan 2010
Posts: 23

The problem seems to be the sff file; software developers at 454 are working on this.
Hoping for some good news soon ;-)
10-29-2010, 12:16 AM   #7
nicolallias
Member
 
Location: France

Join Date: Jan 2010
Posts: 23

Some news: the problem seems to lie in the sff file generation...
All times are GMT -8.

