SEQanswers

11-11-2011, 07:13 AM   #1
dsenalik
Carrot Scientist
 
Location: Madison WI USA

Join Date: Nov 2009
Posts: 42
Bug in gsAssembler 2.6, some contigs missing

tl;dr: Some large contigs are missing from the output contig file, 454AllContigs.fna.
-----
I believe there is a bug in the current version of gsAssembler, 2.6 (20110517_1502).
I contacted Roche three weeks ago but still have not heard back. Maybe they have everyone making kits.
Has anyone else here come across this bug? Any solutions?

I am assembling one plate plus titrations of an 8 kb paired-end run.
Here are the exact parameters for the assembly, run from the command line:

runProject -siod -het -urt -cpu 28 -info -m -noace -a 1 -l 2000 -large -scaffold /454/assemblydir

Notice the parameter -a 1, which sets the minimum contig length to 1. I should get ALL contigs, no matter how short (and I do get ones that short).

Here are a few FASTA headers (omitting the sequence lines) from 454AllContigs.fna. Notice that some contigs are missing, such as 75:
...
>contig00068 length=2566 numreads=160
>contig00071 length=2081 numreads=130
>contig00072 length=776 numreads=27
>contig00073 length=1145 numreads=310
>contig00074 length=187 numreads=41
>contig00076 length=1834 numreads=456
>contig00077 length=1922 numreads=219
>contig00078 length=432 numreads=45
>contig00080 length=128 numreads=17
>contig00081 length=3488 numreads=454
>contig00082 length=2433 numreads=353
>contig00083 length=4226 numreads=351

...
>contig109403 length=1 numreads=7
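For anyone who wants to check their own assembly, here is a minimal Python sketch that lists the gaps in the contig numbering. It assumes headers of the form ">contigNNNNN ..." as shown above; the function name missing_contig_numbers is made up for illustration, and the sample list is just a few of the headers from this post.

```python
import re

def missing_contig_numbers(header_lines):
    """Return contig numbers absent between the lowest and highest number seen."""
    seen = set()
    for line in header_lines:
        # Assumes FASTA headers that start with ">contig" followed by digits.
        m = re.match(r">contig(\d+)", line)
        if m:
            seen.add(int(m.group(1)))
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]

# A few of the headers quoted in this post.
headers = [
    ">contig00073 length=1145 numreads=310",
    ">contig00074 length=187 numreads=41",
    ">contig00076 length=1834 numreads=456",
    ">contig00077 length=1922 numreads=219",
    ">contig00078 length=432 numreads=45",
    ">contig00080 length=128 numreads=17",
]
gaps = missing_contig_numbers(headers)
print(gaps)  # [75, 79]
```

On a real assembly, the header lines could be collected by reading 454AllContigs.fna and keeping only the lines that start with ">".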


Here is the corresponding section of 454ContigGraph.txt. Note that contig00075 IS there, but out of order:
...
68 contig00068 2566 11.2
71 contig00071 2081 11.0
72 contig00072 776 6.0
73 contig00073 1145 43.3
74 contig00074 187 20.3
76 contig00076 1834 45.1
77 contig00077 1922 16.5
78 contig00078 432 12.2
80 contig00080 128 12.4
81 contig00081 3488 23.4
82 contig00082 2433 26.1
75 contig00075 187 22.4
79 contig00079 18 7.3
83 contig00083 4226 15.8

...
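The comparison between the two files can be automated. This is only a rough Python sketch, assuming the node lines of 454ContigGraph.txt look like the ones above (number, contig name, length, coverage, separated by whitespace) and that every other line in the file can be skipped. The function names are invented for this example, and the sequence line "ACGT" is a placeholder.

```python
def graph_contigs(graph_lines):
    """Map contig name -> length for lines that look like graph node lines."""
    nodes = {}
    for line in graph_lines:
        fields = line.split()
        # Assumed node layout: number, name, length, coverage.
        if (len(fields) >= 4 and fields[0].isdigit()
                and fields[1].startswith("contig") and fields[2].isdigit()):
            nodes[fields[1]] = int(fields[2])
    return nodes

def fasta_contigs(fasta_lines):
    """Collect the contig names found on FASTA header lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names

def contigs_missing_from_fasta(graph_lines, fasta_lines):
    """Return graph contigs (name -> length) that have no FASTA record."""
    in_fasta = fasta_contigs(fasta_lines)
    return {name: length
            for name, length in graph_contigs(graph_lines).items()
            if name not in in_fasta}

graph_lines = [
    "74\tcontig00074\t187\t20.3",
    "76\tcontig00076\t1834\t45.1",
    "75\tcontig00075\t187\t22.4",
    "79\tcontig00079\t18\t7.3",
    "...",  # any line that is not a node line is ignored
]
fasta_lines = [
    ">contig00074 length=187 numreads=41",
    "ACGT",  # placeholder sequence line
    ">contig00076 length=1834 numreads=456",
]
missing = contigs_missing_from_fasta(graph_lines, fasta_lines)
print(missing)  # {'contig00075': 187, 'contig00079': 18}
```

Because the node lines are keyed by name rather than position, it does not matter that contig00075 appears out of order in the graph file.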

Later on in that same file is the connection information. Here is a summary:
$ bb.454contiginfo --in=../assembly --contig=75 --out=-
>contig75
Length 187
Average Coverage 22.4
Edge 5' Connects to contig 73 3' with 28 reads
Edge 3' Connects to contig 76 5' with 25 reads
28 reads flow from 5' end of contig75 and terminate in contig 73
25 reads flow from 3' end of contig75 and terminate in contig 76
2 paired end reads flow from 5' end of contig75 and terminate in contig 105881 after passing through 7605.0 b.p. in other contig(s)
No paired end reads flow from 3' end of contig75


I want that contig! It goes between 73 and 76. Where is it?
I tried without the -scaffold parameter; the contig numbers change, but contigs are still missing.
11-11-2011, 07:48 AM   #2
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

Not sure, but it does look like a bug. What happens if you try without "-large"?

Also, if you output ACE files instead, can you extract the contigs from there?
11-11-2011, 10:47 AM   #3
dsenalik
Carrot Scientist
 
Location: Madison WI USA

Join Date: Nov 2009
Posts: 42

Those are excellent suggestions. I will report back when the assembly is finished.

Edit: No, the contig is not in the .ace file either. Trying without -large now.
I have a sudden fear: what if it is a memory error? I once had a bad chip that gave crazy errors.
So I am replicating the run, assembling the same data in exactly the same way on a different computer, to find out.

Last edited by dsenalik; 11-11-2011 at 12:02 PM.
11-12-2011, 02:20 AM   #4
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

I think if it were a memory error you'd be more likely to see segfaults or intermittent problems. If it's reproducible between runs, then I think a logic error is more likely...

I have some developer contacts at Roche I can send this link to if you are still struggling.
11-12-2011, 10:45 AM   #5
dsenalik
Carrot Scientist
 
Location: Madison WI USA

Join Date: Nov 2009
Posts: 42

The replication on a different computer also shows missing contigs. So I think I can rule out memory problems.

The assembly without -large was taking forever for some reason, so nothing to report on that.

So, yes, please let your developer contacts know about this. Thanks ever so much!
12-28-2011, 06:45 AM   #6
dsenalik
Carrot Scientist
 
Location: Madison WI USA

Join Date: Nov 2009
Posts: 42

Just an update on this bug: it was officially submitted to Roche back on Nov. 23, but I have not heard a word back.

For anyone else who encounters it, here is another aspect of this bug (or a different bug?): the contig numbers in the graphical environment do NOT correspond to the contig numbers in the generated FASTA file.
For example, contig 00008 in the graphical environment is the contig numbered 00009 in the FASTA file, as can be seen from the sequence lengths.
01-10-2012, 11:26 AM   #7
vamosia
Member
 
Location: New York

Join Date: Mar 2009
Posts: 15

My suspicion is that using -large causes the algorithm to take shortcuts and not completely traverse the entire contig graph (since it is too large). It may be that the missing contigs are the ones that are too large, so it doesn't bother traversing the graph or generating the actual contigs.
02-12-2012, 11:44 PM   #8
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190

This is probably relevant to the topic of this thread (bugs in v2.6) but not relevant to the original poster.
I performed a cDNA assembly using version 2.6 and also found what appeared to be missing contigs in 454AllContigs.fna. But when I look in 454ContigGraph.txt, they are contigs that were not used in the assembly and that somehow have zero length but a read depth greater than zero, some of them over 100.
I think they are safe to ignore, but it seems strange.
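If anyone wants to look for the same thing, a small Python sketch along these lines would pull out such entries. It assumes the node lines of 454ContigGraph.txt are whitespace-separated as number, name, length, depth; the function name and the sample rows are hypothetical, not taken from my assembly.

```python
def zero_length_nodes(graph_lines):
    """Return (name, depth) for node lines with length 0 and depth above 0."""
    hits = []
    for line in graph_lines:
        fields = line.split()
        # Skip anything that does not look like a node line.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        try:
            length, depth = int(fields[2]), float(fields[3])
        except ValueError:
            continue
        if length == 0 and depth > 0:
            hits.append((fields[1], depth))
    return hits

# Hypothetical rows, for illustration only.
rows = [
    "12\tcontig00012\t0\t104.5",
    "13\tcontig00013\t950\t8.1",
    "14\tcontig00014\t0\t0.0",
]
odd = zero_length_nodes(rows)
print(odd)  # [('contig00012', 104.5)]
```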