  • Improving 454 assembly with Illumina

    I am working on de novo sequencing of 6 bacterial strains, each with a genome size of ~5.5 Mb. I have had them sequenced using 454, but only got about 15x coverage, which left a lot of contigs. I would like to complete each of the genomes, so I'm doing paired-end Illumina sequencing to increase the coverage depth and improve the overall sequence quality of each. I'm a microbiologist with limited computer skills and am slowly learning the bioinformatics. My question: is there an assembler that can easily take the 454 data and combine it with the Illumina data to generate as intact a genome as the data will allow? Or, more generally, what are your expert opinions on the best way to combine the data? Thanks in advance, I could really use the help.

  • #2
    I use MIRA to do this. But I must warn you that coverage depth, at least above a certain threshold, is not the major factor in getting better genome assemblies. It's more often related to the number and length of repeats and the read length you have. So adding Illumina paired-end reads may or may not improve the assembly massively. Certainly aim for the longest reads and a decent insert size, perhaps 2 x 150 bp reads off a 500 bp fragment. You might be better off with 454 paired-end data to scaffold the contigs.
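To put the coverage figures in this thread into numbers, here is a minimal Python sketch of the usual depth arithmetic (average coverage = total sequenced bases / genome size). The function names and the 50x target are illustrative assumptions, not something recommended by the posters; only the ~5.5 Mb genome size and the 2 x 150 bp read layout come from the posts above.

```python
import math


def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def read_pairs_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of read pairs for a target depth, counting two reads per pair."""
    return math.ceil(target_coverage * genome_size / (2 * read_length))


genome_size = 5_500_000  # ~5.5 Mb, from the original post
read_length = 150        # 2 x 150 bp, as suggested in post #2
target = 50              # hypothetical target depth, chosen only for illustration

pairs = read_pairs_needed(target, read_length, genome_size)
print(pairs)  # 916667 read pairs
print(round(expected_coverage(2 * pairs, read_length, genome_size), 2))  # 50.0
```

As the post above points out, extra depth beyond some threshold mostly stops helping; this calculation says nothing about repeats, which are usually what keeps contigs apart.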

    • #3
      MIRA is probably the best option if you can get it to work.

      Another, probably less optimal, method is building contigs from your paired-end reads with Velvet, ABySS, etc. and then aligning them to your 454 contigs using BLAST or similar.

      Perhaps a third method is the scaffolder SSPACE, which is apparently good.

      I wouldn't expect massive improvements; the only data I've seen showed a 10% reduction in contigs following exactly this approach.

      I wouldn't aim for completion if you have "a lot of contigs" and no close reference.

      For visualisation, an easy webserver called Circoletto may be helpful.
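The second method in the post above (aligning Illumina contigs to the 454 contigs) can be followed up with a small script. This is only a sketch, assuming the alignment was written in BLAST+ tabular format (`-outfmt 6`, where the first four columns are query ID, subject ID, percent identity, and alignment length). The contig names, example rows, and thresholds are hypothetical. It lists Illumina contigs that hit two or more different 454 contigs, which are candidates for joining them.

```python
from collections import defaultdict


def bridging_contigs(blast_lines, min_identity=95.0, min_length=200):
    """Return {query: [subjects]} for queries hitting >= 2 distinct subjects.

    Expects BLAST tabular rows: qseqid, sseqid, pident, length, ...
    Hits below the identity or length thresholds are ignored.
    """
    hits = defaultdict(set)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if identity >= min_identity and length >= min_length:
            hits[query].add(subject)
    return {q: sorted(s) for q, s in hits.items() if len(s) >= 2}


# Hypothetical rows, made up for illustration only.
example = [
    "velvet_1\tc454_07\t99.1\t850\t7\t1\t1\t850\t4200\t5049\t0.0\t1500",
    "velvet_1\tc454_12\t98.7\t600\t8\t0\t900\t1499\t1\t600\t0.0\t1050",
    "velvet_2\tc454_03\t99.5\t1200\t6\t0\t1\t1200\t300\t1499\t0.0\t2150",
    "velvet_3\tc454_03\t88.0\t150\t18\t0\t1\t150\t10\t159\t1e-30\t140",
]
print(bridging_contigs(example))  # {'velvet_1': ['c454_07', 'c454_12']}
```

A hit to two contigs is only a hint: a repeat present in both 454 contigs produces the same pattern, so each candidate join would still need checking (for example against hit coordinates at the contig ends).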

      • #4
        I actually have a similar question:
        I have a genome sequenced with 454, and the same genome has also been sequenced with Illumina PE. I was wondering whether anyone out there has experience with good strategies for merging the two assemblies to get an improved genome.

        What are good (open source) tools to merge genome assemblies?

        Or is it better to take the 454 scaffolds and add the Illumina reads directly? Are there any open source tools out there to do that?

        Or would it be better to throw all the reads (454 and Illumina PE) into one pot and let an assembler that can handle this deal with it?

        Thanks in advance for any insight.

        • #5
          With the new 2.6 software, 454 is saying Newbler can accept fastq files. They're also telling me (via tech support) that it can take Illumina reads as input.

          Has anyone tried this? Using Newbler to de novo assemble from combined 454 and Illumina reads?

          • #6
            MIRA is a nice assembler, but what I understand is that you used 454 first and now want to improve your assembly with Illumina. What you will obtain is maybe a better resolution of point mutations. I do not think you could improve your N50 using Illumina reads. You could try paired-end 454 reads, or maybe mate-pair reads. Remember that you want to resolve the repeats that preclude better assemblies.
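For readers new to the N50 statistic mentioned above: it is the length of the contig at which, going from longest to shortest, the running total first reaches half of the total assembly length. A minimal Python sketch, with made-up contig lengths for illustration:

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), not data from this thread.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))  # 70
```

Here the total is 290, and the two longest contigs (80 + 70 = 150) are the first to cover at least half of it, so the N50 is 70. Joining contigs across repeats raises this number; extra depth on the same contigs does not.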

            • #7
              Originally posted by ssully
              With the new 2.6 software, 454 is saying Newbler can accept fastq files. They're also telling me (via tech support) that it can take Illumina reads as input.

              Has anyone tried this? Using Newbler to de novo assemble from combined 454 and Illumina reads?

              I can confirm that Newbler will accept Illumina reads (FASTQ) and assemble them, with or without 454 reads. I have not done a thorough analysis of the assemblies yet, but they look OK...

              • #8
                We have had exactly the same problem as the original poster:

                - 454 reads giving about 18x coverage of the 6.8 Mb bacterial genome
                - Illumina 3 kb mate pairs
                - Illumina 300 bp paired end

                We tried Newbler for assembly of the 454 reads:
                = 109 contigs

                Adding ABySS de novo assemblies of the 300 bp and 3 kb libraries to Newbler didn't achieve much:
                ~ 99-103 contigs
                This wasn't too satisfactory.

                Alternatively, we used SSPACE with the 2 Illumina libraries to scaffold the 454 contigs:
                - 300 bp library only + 454 contigs: ~80 scaffolds
                - 3 kb library only + 454 contigs: ~31 scaffolds
                - both 3 kb and 300 bp libraries + 454 contigs: ~29 scaffolds

                Really, the 29 scaffolds are 6 scaffolds plus 23 singleton contigs. We're happy with SSPACE for scaffolding.
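The scaffolding numbers reported above are easier to compare as relative reductions from the 109 Newbler contigs. The short Python sketch below just does that arithmetic on the approximate counts from the post; the variable and function names are illustrative.

```python
baseline_contigs = 109  # Newbler assembly of the 454 reads alone

# Approximate scaffold counts after SSPACE, as reported in the post above.
scaffold_counts = {
    "300 bp PE only": 80,
    "3 kb MP only": 31,
    "PE + MP": 29,
}


def percent_reduction(before: int, after: int) -> float:
    """Relative drop in the number of sequences, in percent."""
    return 100.0 * (before - after) / before


for label, count in scaffold_counts.items():
    drop = percent_reduction(baseline_contigs, count)
    print(f"{label}: {count} scaffolds ({drop:.1f}% fewer sequences)")
# 300 bp PE only: 80 scaffolds (26.6% fewer sequences)
# 3 kb MP only: 31 scaffolds (71.6% fewer sequences)
# PE + MP: 29 scaffolds (73.4% fewer sequences)
```

On these figures the 3 kb mate-pair library accounts for most of the gain, which fits the earlier comments that long-range pairing information, not extra depth, is what joins contigs.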

                • #9
                  Hi Colindaven,

                  I'm the developer of SSPACE and this is certainly a great result; thank you for posting it here. I've seen the same reduction in my benchmark tests for SSPACE. Paired-end plus mate-pair data is a strong combination for reducing the number of contigs. For all of our test sets of bacterial genomes, using a PE and an MP dataset gave fewer than 20 scaffolds. It is great that others see this too.

                  Regards,
                  Boetsie

                  • #10
                    Thanks for the recommendation, I will definitely try SSPACE and see if we can close some of the gaps in the genome.
