SEQanswers

Old 08-20-2009, 07:49 AM   #1
DNAjunk
Member
 
Location: Western Europe

Join Date: Jun 2009
Posts: 61
454 runAssembly contig lengths

Hi

I've made assemblies of several 454 reads (in sff and fasta format) by using two different programs: runAssembly (reads in sff format) and phrap (reads in fasta format).

The contigs produced by runAssembly are much shorter than the ones produced by phrap.

Does anybody know how to set the parameters/options of the runAssembly program in order to modify the contig length?
I have tried the "notrim" option, but it didn't have any effect on the contig length.

Thanks for your advice!
Old 08-20-2009, 10:48 PM   #2
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

I would be really careful about using phrap for assembly of 454 reads. The program was designed for assembly of Sanger reads, which have very different kinds of errors than 454 reads. Even though the contigs may be longer in the phrap assembly, I would do some checks to see if they are correct (do you have a reference genome to compare the contigs to? PCR?).

runAssembly (newbler) is so far (one of) the best assembly programs for 454 reads (some people might disagree?). Be sure to feed it sff files, as they contain more information (flowgrams) than just fasta and quality files.
Old 08-20-2009, 11:11 PM   #3
Tuxido
Member
 
Location: Nijmegen, Netherlands

Join Date: Jun 2009
Posts: 22

We have good experience with the performance of the 454 Newbler assembler. If you're doing de novo assembly, however, it might be a good idea to use an iterative approach: divide your reads into bins, assemble these separately, and then assemble the contigs that you got. This is especially useful if you have a bit too much coverage.
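
A minimal sketch of the binning step, assuming reads are assigned to bins at random (the post does not say how the bins were chosen). The function name bin_reads and the read IDs are made up for illustration; each bin would then be written to its own file and assembled separately.

```python
import random


def bin_reads(read_ids, n_bins, seed=0):
    """Shuffle read IDs and deal them round-robin into n_bins lists.

    Random assignment is an assumption made for this sketch.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    shuffled = list(read_ids)
    # A fixed seed keeps the binning reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Slice with a stride so the bin sizes differ by at most one read.
    return [shuffled[i::n_bins] for i in range(n_bins)]


if __name__ == "__main__":
    reads = [f"read{i:04d}" for i in range(10)]  # hypothetical read IDs
    for number, members in enumerate(bin_reads(reads, n_bins=3), start=1):
        print(f"bin {number}: {len(members)} reads")
```

Every read ends up in exactly one bin, so no data is lost before the second round of assembly.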

The notrim option turns off the additional read trimming in the assembly (See the 454TrimStatus.txt that is generated with the assembly).
Old 08-21-2009, 08:09 AM   #4
DNAjunk
Member
 
Location: Western Europe

Join Date: Jun 2009
Posts: 61

Thank you for your kind replies!

I appreciate any advice and suggestions because I am very new to bioinformatics and have just run my first assembly.

>flxlex:
I have compared the contigs to a reference genome. In some cases the phrap contig, and in others the 454 contig, is in agreement with the reference.
Of course, I have also used quality scores for phrap (extracted from the sff file). What information do the "flowgrams" contain?

>Tuxido:
In another study, I am doing a de novo assembly. The sequencer made 800'000 reads. I have executed runAssembly on the whole set of reads. The output was contigs ranging up to 16'000bp. Would you still bin the reads, and if so, how many into each bin?
Old 08-23-2009, 10:46 PM   #5
Tuxido
Member
 
Location: Nijmegen, Netherlands

Join Date: Jun 2009
Posts: 22

That would depend on your genome size and your average read length (i.e. it depends on your coverage).
So starting from, I guess, 50x coverage it might be worthwhile to give it a try. I would just try a few different bin settings and see which one gives you the best contigs. (I only did this once myself and used 18 bins on 500Mb of sequence data for a 4.5Mb genome, which improved the assembly considerably.)
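
For reference, the coverage arithmetic behind those numbers can be sketched like this. It assumes "500Mb" means 500 million sequenced bases and that the reads were spread evenly over the 18 bins; the function name is made up for illustration.

```python
def coverage(total_bases, genome_size):
    """Average fold coverage: sequenced bases divided by genome length."""
    return total_bases / genome_size


if __name__ == "__main__":
    total_bases = 500_000_000   # 500Mb of sequence data (from the post)
    genome_size = 4_500_000     # 4.5Mb genome (from the post)
    n_bins = 18                 # number of bins (from the post)

    overall = coverage(total_bases, genome_size)
    # Assumption: bins are equal in size, so each gets 1/n_bins of the bases.
    per_bin = overall / n_bins
    print(f"overall coverage: {overall:.1f}x")
    print(f"coverage per bin: {per_bin:.1f}x")
```

So that data set was at roughly 111x overall, well above the 50x rule of thumb, and each bin on its own held about 6x.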

There's also a paper about this from Bas Dutilh
http://bioinformatics.oxfordjournals...bstract/btp377 but he uses it when working with metagenomes.
Old 08-24-2009, 06:23 AM   #6
DNAjunk
Member
 
Location: Western Europe

Join Date: Jun 2009
Posts: 61
estimate genome size

>Tuxido:
Thanks for the link and advice!

It's a de novo assembly, and I don't have any reference or knowledge of the genome size. Is there a way to estimate the genome size after a 454 run with 800k reads and an average read length of 354bp?
Old 08-26-2009, 01:14 AM   #7
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Quote:
Originally Posted by DNAjunk
>flxlex:
I have compared the contigs to a reference genome. In some cases the phrap contig, and in others the 454 contig, is in agreement with the reference.
Of course, I have also used quality scores for phrap (extracted from the sff file). What information do the "flowgrams" contain?

The flowgram contains the signal intensity for each flow obtained during sequencing. If you run the command

sffinfo yourfile.sff

you will see among the output something like this:

Flowgram: 1.01 0.04 1.01 0.06 0.06 0.98 0.05 1.04
2.33 1.16 1.10 0.06 0.21 0.89 0.07 2.00 0.12 0.08
1.89 0.10 0.45 1.02 1.84 0.17 0.92 0.34 0.09 0.99

With flow order TACG, this means that at the first flow, T, the signal intensity was just over 1, meaning most likely 1 T. The next flow, A, gave no signal, and so on.

newbler uses this information for the assembly and quality scoring.
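
To make that concrete, here is a rough sketch that turns the flowgram above into bases by rounding each signal to the nearest whole number. This is only an illustration of how to read a flowgram; it is an assumption of the sketch, not a description of how the 454 software or newbler actually calls bases.

```python
FLOW_ORDER = "TACG"

# Flowgram values copied from the sffinfo output shown above.
FLOWGRAM = [
    1.01, 0.04, 1.01, 0.06, 0.06, 0.98, 0.05, 1.04,
    2.33, 1.16, 1.10, 0.06, 0.21, 0.89, 0.07, 2.00, 0.12, 0.08,
    1.89, 0.10, 0.45, 1.02, 1.84, 0.17, 0.92, 0.34, 0.09, 0.99,
]


def call_bases(flowgram, flow_order=FLOW_ORDER):
    """Naive base calling: round each signal to a homopolymer length."""
    bases = []
    for index, signal in enumerate(flowgram):
        # Flows cycle through the flow order: T, A, C, G, T, A, ...
        nucleotide = flow_order[index % len(flow_order)]
        # Round half up; a signal near 0 means the base was not incorporated.
        count = int(signal + 0.5)
        bases.append(nucleotide * count)
    return "".join(bases)


if __name__ == "__main__":
    print(call_bases(FLOWGRAM))
```

For the values above this gives TCAGTTACAGGCCACCTG. Signals such as 0.45 or 2.33 sit far from a whole number, which is the kind of uncertainty that is lost once the read is reduced to fasta and quality files.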
Old 08-26-2009, 01:17 AM   #8
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Quote:
Originally Posted by Tuxido
That would depend on your genome size and your average read length (i.e. it depends on your coverage).
So starting from, I guess, 50x coverage it might be worthwhile to give it a try. I would just try a few different bin settings and see which one gives you the best contigs. (I only did this once myself and used 18 bins on 500Mb of sequence data for a 4.5Mb genome, which improved the assembly considerably.)

Tuxido, what happens to repeated regions from the genome when you do the binning approach? How do you place them in your assembly? Newbler collapses these regions into single contigs.
Old 08-30-2009, 10:55 PM   #9
Tuxido
Member
 
Location: Nijmegen, Netherlands

Join Date: Jun 2009
Posts: 22

I understood that with the old Newbler, repeat reads were simply not used in the assembly, while with the upgrade they are now used once. I have no idea how these end up in the final contigs. We only did such an experiment once, and purely looking at contig number and contig length, binning seemed to work very well.
All times are GMT -8.