  • Current status de novo assembly 454 vs Solexa

    Hi All,
    We are fairly new to next-gen sequencing technology. When we started on our bacterial genomes, the companies offering sequencing were unable to do de novo assembly of bacterial genomes with Solexa (shorter read) data, so we chose 454 at that time.

    1) What is the current status of this? We are planning new sequencing of strains (where we might get away with a resequencing approach), but for one bacterium there is already so much variation between strains... I guess we would need de novo for that.

    2) What kind of system requirements (and software) would we need to be able to do this? Can we use a single Linux machine? We will not be doing many of these, so waiting a day is not a problem... till now.

    Thanks for your views on this.

  • #2
    Originally posted by AlexB
    1) What is the current status of this? We are planning new sequencing of strains (where we might get away with a resequencing approach), but for one bacterium there is already so much variation between strains... I guess we would need de novo for that.
    2) What kind of system requirements (and software) would we need to be able to do this? Can we use a single Linux machine? We will not be doing many of these, so waiting a day is not a problem... till now.
    We have been sequencing bacterial genomes (less than 7 Mbp each) on Illumina for 18 months now. We originally used 36bp paired-end (200 bp insert) and would usually get 50-300 scaffolds depending on repeat structure. We are moving to 80bp mate-pair (5000 bp insert) which will be very competitive with 454 FLX. Our simulations show it will get 5-30 scaffolds for the same data sets.

    You don't gain much by going above 100x coverage. You only need 1 lane (of the 8 available on a flowcell) for a bacterial genome. With 80bp reads and the higher yields of the GAIIx we will probably be able to multiplex 2 or even 3 bacterial genomes per lane. That's 24 bugs a fortnight!
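
The arithmetic behind these figures can be sketched roughly as below. Only the 8 lanes per flowcell, the roughly 100x coverage target and the 7 Mbp genome size come from the post; the lane yield is a made-up placeholder and the function names are hypothetical.

```python
def coverage(num_reads, read_length_bp, genome_size_bp):
    """Average depth of coverage: total sequenced bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp


def genomes_per_lane(lane_yield_bp, genome_size_bp, target_coverage=100):
    """How many genomes of this size fit in one lane at the target depth."""
    return int(lane_yield_bp // (genome_size_bp * target_coverage))


def genomes_per_flowcell(per_lane, lanes=8):
    """Genomes per run when every lane carries the same multiplex level."""
    return per_lane * lanes


# Hypothetical lane yield, for illustration only (not a figure from the thread).
ASSUMED_LANE_YIELD_BP = 2_000_000_000

per_lane = genomes_per_lane(ASSUMED_LANE_YIELD_BP, genome_size_bp=7_000_000)
print(per_lane, genomes_per_flowcell(per_lane))

# 3 genomes per lane on all 8 lanes gives the "24 bugs" quoted above.
print(genomes_per_flowcell(3))
```

Swapping in the real yield of your own instrument is the only change needed to see how many genomes a lane could carry at a given depth.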

    We use Velvet to assemble mainly, and Shrimp to map reads. Although we have an 8 core 64 GB RAM machine, a single assembly process only needs about 8 GB and 1 core to de novo assemble in about 15 minutes say. A $1000 PC can achieve that. The GAII comes with a beefy computer anyway, which you can use while prepping your samples.
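
For readers who have not run Velvet before, a minimal sketch of how the two Velvet programs (`velveth`, then `velvetg`) might be invoked for one paired-end library is shown below. The file name, output directory and k-mer length are assumptions for illustration, not the settings used in the post; the 200 bp insert length matches the paired-end library described above. The helper only builds the command lines and does not run anything.

```python
import shlex


def velvet_commands(reads_fastq, out_dir="assembly", k=25, insert_length=200):
    """Build velveth/velvetg argument lists for one interleaved paired-end file.

    k, out_dir and the file name are illustrative assumptions.
    """
    if k % 2 == 0:
        raise ValueError("Velvet uses odd k-mer (hash) lengths")
    # velveth hashes the reads; -shortPaired expects interleaved read pairs.
    velveth = ["velveth", out_dir, str(k), "-shortPaired", "-fastq", reads_fastq]
    # velvetg builds the graph; -exp_cov auto lets Velvet estimate coverage.
    velvetg = ["velvetg", out_dir, "-ins_length", str(insert_length),
               "-exp_cov", "auto"]
    return velveth, velvetg


for cmd in velvet_commands("reads_interleaved.fastq"):
    print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where Velvet is installed.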


    • #3
      Thanks Torst
      this was exactly the information we were looking for.
      Although we have a concern facility with an older and a new 454 machine... pricing seems very competitive at other core labs, but they are using Solexa and couldn't do good de novo at that time.
      I will discuss this with the guys over there.
      The computer demands aren't that big indeed. We should be able to do that.

      Alex


      • #4
        Hello,

        A few labs in our region are interested in de novo sequencing of bacterial genomes. I understand that longer inserts help resolve bigger repeats, though shouldn't paired-end reads, say from a 300 bp insert library with a read length of around 72 bp, do the job? I think that mate-pair is more expensive than paired-end, and even then some labs here don't have the resources to pay for paired reads. I've also been using Velvet with some simulated data, but I'm interested in checking Edena and then joining the assemblies using Minimus.

        Multiplexing de novo bacteria will be interesting ^^.

        Greetings,
        Leonardo
        L. Collado Torres, Ph.D. student in Biostatistics.


        • #5
          Leonardo,

          Originally posted by lcollado
          A few labs in our region are interested in de novo sequencing of bacterial genomes. I understand that longer inserts help resolve bigger repeats, though shouldn't paired-end reads, say from a 300 bp insert library with a read length of around 72 bp, do the job? I think that mate-pair is more expensive than paired-end, and even then some labs here don't have the resources to pay for paired reads. Multiplexing de novo bacteria will be interesting.
          No, an insert library of 300bp will NOT do the job in most bacterial genomes. Most insertion sequences, transposases, ribosomal RNAs etc are all larger than 300bp so you can NOT disambiguate those repeats.

          Mate pair is only slightly more expensive than paired-end. It adds about 10% to the cost only. And you can choose 2k, 5k and soon 10k and 20k insert libraries. 5k will be good for most bacterial genomes.

          We are multiplexing some bacterial genomes now, using 80bp mate pair 5k insert. It will be interesting to see if it lives up to the expectations.
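
The reasoning above (a repeat can only be bridged when the library insert is longer than the repeat) can be written as a rough rule of thumb. This is a simplified model that requires both reads of a pair to sit entirely in unique flanking sequence; real assemblers are more nuanced. The repeat names and lengths in the example are illustrative placeholders, not measurements from this thread.

```python
def spannable(repeat_length_bp, insert_size_bp, read_length_bp):
    """True if a read pair can straddle the repeat with both reads
    anchored in unique flanking sequence (simplified model)."""
    # Gap between the two reads; it must cover the whole repeat.
    inner_distance = insert_size_bp - 2 * read_length_bp
    return inner_distance >= repeat_length_bp


# Illustrative repeat lengths (assumptions, not data from the thread).
repeats = {"short_IS": 800, "long_IS": 1400, "rRNA_operon": 5000}

# Libraries mentioned in the thread: 300 bp paired-end, 5 kb mate-pair.
libraries = {"PE 300bp / 72bp reads": (300, 72),
             "MP 5kb / 80bp reads": (5000, 80)}

for lib_name, (insert, read_len) in libraries.items():
    resolved = [name for name, length in repeats.items()
                if spannable(length, insert, read_len)]
    print(lib_name, "->", resolved)
```

Under this model the 300 bp library bridges none of the example repeats, while the 5 kb library bridges the two shorter ones, which matches the direction of the argument in the post.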


          • #6
            Wow! A clear no. Thank you for the explanation! As I understand it, it's not that it won't work but that it will generate more contigs, similar to the numbers you had before.

            I guess that if the price difference is too big (single vs mate-pair) small labs would not mind losing those genomic elements from the assembly. I just emailed those in charge of the prices here to check this point.

            I'll be looking forward to the results from multiplexing!

            Thanks again,
            Leonardo
            Last edited by lcollado; 10-06-2009, 04:30 PM. Reason: Making it clearer ^^
            L. Collado Torres, Ph.D. student in Biostatistics.


            • #7
              Torst,

              What kind of contigs are you getting? How many, N50 size, N50 count?


              • #8
                There is a new paper from the Broad on their assembler ALLPATHS in which they claim very high success at de novo assembly from Illumina data with a mix of 36 nt reads from short fragment libraries & 26 base reads (after subtracting mate-pair tag) from 4Kb fragment libraries.


                • #9
                  Originally posted by kmcarr
                  What kind of contigs are you getting? How many, N50 size, N50 count?
                  Our first mate-pair run is about to commence, so I can only comment on paired-end runs. The results we get depend entirely on the genome being assembled, as one would expect. They range from 20 contigs (> 500 bp) and an N50 of 220 kbp for simple species like Pasteurella (2.2 Mbp) all the way up to 600 contigs and an N50 of 20 kbp for complex species like Mycobacteria (5.6 Mbp) with 400 copies of insertion sequences like IS2404 and IS2606. These IS elements are longer than 250 bp so cannot be spanned by paired-end reads, so the best you can hope for is 400 contigs.
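
For anyone unfamiliar with the metrics being discussed, here is a small sketch of how N50 size and N50 count are commonly computed from a list of contig lengths. The 500 bp minimum mirrors the "> 500bp" cutoff mentioned above; the function name and the example lengths are hypothetical.

```python
def n50_stats(contig_lengths, min_length=500):
    """Return (N50 size, N50 count) for contigs longer than min_length.

    N50 size: length of the contig at which the running total, taken from
    the longest contig down, first reaches half of the assembly size.
    N50 count: how many contigs were needed to get there.
    """
    lengths = sorted((n for n in contig_lengths if n > min_length), reverse=True)
    if not lengths:
        return 0, 0
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count
    return lengths[-1], len(lengths)


# Toy example with made-up contig lengths (cutoff disabled).
print(n50_stats([100, 80, 50, 40, 30], min_length=0))
```

In the toy example the total is 300, so half is 150; the two longest contigs (100 + 80) pass that mark, giving an N50 size of 80 and an N50 count of 2.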


                  • #10
                    Hello and happy new year ^^

                    Torst, I was wondering if you can tell us more about your mate-pair runs. It's been a few months already, so I'm hoping that you can comment on them. Or maybe you have a paper on the way already?

                    Greetings,
                    Leonardo
                    L. Collado Torres, Ph.D. student in Biostatistics.


                    • #11
                      Originally posted by lcollado
                      Torst, I was wondering if you can tell us more about your mate-pair runs. It's been a few months already, so I'm hoping that you can comment on them. Or maybe you have a paper on the way already?
                      Things haven't quite gone to plan; we've had some issues with the sequencing machine. However, I will have some mate-pair runs in a month or so. Some other collaborators used them to great effect with a 1.2 Mbp archaeon - the MP:PE ratio was high.
