I am working on de novo sequencing of six bacterial strains, each with a genome size of ~5.5 Mb. I had them sequenced using 454 but only got about 15x coverage, which left a lot of contigs. I would like to complete each of the genomes, so I'm doing paired-end Illumina sequencing to increase the coverage depth and improve the overall sequence quality. I'm a microbiologist with limited computer skills, and I'm slowly learning the bioinformatics. My question: is there an assembler that can easily take the 454 data and combine it with the Illumina data to generate as intact a genome as the data will allow? Or, more generally, what is your expert opinion on the best way to combine the data? Thanks in advance, I could really use the help.
-
I use MIRA to do this. But I must warn you that coverage depth, certainly above a certain threshold, is not the major factor in getting better genome assemblies. Assembly quality is more often limited by the number and length of repeats and by the read length you have. So adding Illumina paired-end reads may or may not improve the assembly massively. Certainly aim for the longest reads and a decent insert size, perhaps 2 x 150 bp reads from a 500 bp fragment. You might be better off with 454 paired-end data to scaffold the contigs.
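For planning a run, expected depth can be estimated with the usual back-of-the-envelope formula: total sequenced bases divided by genome size. The sketch below is only an illustration of that arithmetic; the function names and the 100x target are assumptions of mine, not recommendations from this thread.

```python
import math


def coverage_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Estimated average depth: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def read_pairs_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Read pairs needed to reach target_depth, counting both mates of each pair."""
    bases_per_pair = 2 * read_length
    return math.ceil(target_depth * genome_size / bases_per_pair)


if __name__ == "__main__":
    genome_size = 5_500_000  # ~5.5 Mb, as in the original question
    # Hypothetical target of 100x with 2 x 150 bp reads.
    pairs = read_pairs_needed(100, 150, genome_size)
    print(f"read pairs needed for 100x: {pairs}")
```

Keep in mind the point made above: past a certain depth, extra coverage does not resolve repeats, so this number only tells you how much to sequence, not how contiguous the assembly will be.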
-
MIRA is probably the best option if you can get it to work.
Another, probably less optimal, method is to build contigs from your paired-end reads with Velvet, ABySS, etc. and then align them to your 454 contigs using BLAST or similar.
A third option is the scaffolder SSPACE, which is apparently good.
I wouldn't expect massive improvements; the only data I've seen showed a 10% reduction in contigs following exactly this approach.
I wouldn't aim for completion if you have "a lot of contigs" and no close reference.
For visualisation, an easy-to-use web server called Circoletto may be helpful.
-
I actually have a similar question:
I have a 454-sequenced genome, and the same genome has also been sequenced with Illumina PE. I was wondering whether anyone out there has experience with good strategies for merging the two assemblies to get an improved genome.
What are good open-source tools for merging assemblies?
Or is it better to take the 454 scaffolds and add the Illumina reads directly? Are there any open-source tools out there to do that?
Or would it be better to throw all the reads (454 and Illumina PE) into one pot and let an assembler that can handle both deal with it?
Thanks in advance for any insight.
-
With the new 2.6 software, 454 is saying Newbler can accept fastq files. They're also telling me (via tech support) that it can take Illumina reads as input.
Has anyone tried this? Using Newbler to de novo assemble from combined 454 and Illumina reads?
-
MIRA is a nice assembler, but what I understand is that you used 454 first and now want to improve your assembly with Illumina. What you will get is maybe better resolution of point mutations. I do not think you can improve your N50 using Illumina reads. You could try 454 paired-end reads, or maybe mate-pair reads. Remember that what you want is to resolve the repeats that prevent better assemblies.
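For anyone new to the term: N50 is the contig length at which contigs of that length or longer contain at least half of the total assembled bases. A minimal sketch of that calculation, assuming you already have the contig lengths as a list (the function name is mine):

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted longest first; the N50 is the length of the contig
    at which the running total first reaches half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Toy lengths, not real data.
    print(n50([80, 70, 50, 40, 30, 20]))
```

Comparing this value before and after adding a new library is a quick way to check whether the extra data actually joined contigs.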
-
We have had exactly the same problem as the original poster:
- 454 reads giving about 18x coverage of the 6.8 Mb bacterial genome
- Illumina 3 kb mate pairs
- Illumina 300 bp paired end
We tried Newbler for assembly of the 454 reads: 109 contigs.
Adding ABySS de novo assemblies of the 300 bp and 3 kb libraries to Newbler didn't achieve much: ~99-103 contigs.
This wasn't too satisfactory.
Alternatively, we used SSPACE with the 2 Illumina libraries to scaffold the 454 contigs:
- 300 bp library only + 454 contigs: ~80 scaffolds
- 3 kb library only + 454 contigs: ~31 scaffolds
- both 3 kb and 300 bp libraries + 454 contigs: ~29 scaffolds
Really, the 29 scaffolds are 6 scaffolds with 23 singleton contigs. We're happy with SSPACE for scaffolding.
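To put the counts above on one scale, here is a small sketch that expresses each result as a percent reduction relative to the 109 Newbler contigs. The counts are the approximate ones quoted in this post; the function and variable names are mine, and comparing contig counts with scaffold counts is only a rough yardstick.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percent drop in sequence count going from `before` to `after`."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    baseline = 109  # Newbler assembly of the 454 reads alone
    # Approximate scaffold counts reported for each SSPACE run.
    sspace_runs = {
        "300 bp library only": 80,
        "3 kb library only": 31,
        "both libraries": 29,
    }
    for label, count in sspace_runs.items():
        print(f"{label}: {percent_reduction(baseline, count):.1f}% fewer sequences")
```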
-
Hi Colindaven,
I'm the developer of SSPACE, and that is certainly a great result; thank you for posting it here. I've seen the same reduction in my benchmark tests for SSPACE. Paired-end plus mate-pair data is a strong combination for reducing the number of contigs. For all of our test sets of bacterial genomes, the number of scaffolds was below 20 when using a PE and MP dataset. It is great that others see this too.
Regards,
Boetsie