  • WGS of ~200 bacteria: recommended sequencing type and size

    Hello,

    We are going to sequence about 180 different strains of bacteria. These strains are all from the same genus but cover about 7 different species.

    The idea is to study phylogeny within this genus, compare whole genome data with other typing/identification systems and to find genomic markers of virulence and markers for species identification.

    We may want to map these genomes to a reference. However, there are no good reference genomes for these particular species, some strains can barely be assigned to a species using current methods, and we want to compare several different species with one another, so we may also want to do de novo assembly of the genomes.

    Mean genome size is about 2.5 Mb. We asked for a mean coverage of 50x. The sequencing center suggested a couple of options:

    a. Paired-end libraries, sequencing on MiSeq for 2x250bp or 2x300bp reads, two sequencing runs for each group of 96 strains (or four runs of 48 strains each, I guess).

    b. Paired-end libraries, sequencing on HiSeq for 2x150bp reads (they say longer reads are possible but could have some quality issues); a single run for each group of 96 strains would suffice for the requested coverage.
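
The yield behind these options can be sketched as a back-of-the-envelope calculation. This is only a rough estimate using the figures from the post (2.5 Mb genomes, 50x mean coverage, 96 strains per group); the function names are hypothetical, and it assumes perfectly even pooling with no loss to duplicates, adapters or quality trimming.

```python
def required_yield_bases(genome_size_bp, mean_coverage, n_strains):
    """Total bases needed so that every strain reaches the mean coverage."""
    return genome_size_bp * mean_coverage * n_strains


def read_pairs_needed(total_bases, read_length_bp):
    """Number of read pairs (2 x read_length_bp) needed, rounded up."""
    bases_per_pair = 2 * read_length_bp
    return -(-total_bases // bases_per_pair)  # ceiling division


GENOME_SIZE_BP = 2_500_000  # mean genome size from the post
MEAN_COVERAGE = 50          # requested mean coverage
N_STRAINS = 96              # strains per group

total = required_yield_bases(GENOME_SIZE_BP, MEAN_COVERAGE, N_STRAINS)
print(f"Bases needed per group of {N_STRAINS}: {total / 1e9:.1f} Gb")

for label, read_len in [("2x150", 150), ("2x250", 250), ("2x300", 300)]:
    pairs = read_pairs_needed(total, read_len)
    print(f"{label}: about {pairs / 1e6:.0f} million read pairs")
```

Comparing these numbers with the yield the sequencing center actually quotes per run gives a quick check on how many runs each option needs.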

    I guess that for mapping purposes it won't make a difference. But bearing in mind that we may want to do de novo assemblies, I have some questions:

    1) What would be the best option? Longer reads with lower coverage or shorter reads with higher coverage? Does 2x150bp to 2x300bp make a real difference during assembly?

    2) How "essential" would it be to sequence the same genomes with different library sizes? Again, considering it would be within Illumina's range of possibilities (no PacBio, 454 etc.).

    Thank you in advance for suggestions :-)

  • #2
    You should consider sequencing one representative from each species using PacBio to generate a good reference genome. Having a reference would simplify many of the downstream analyses you want to do. With libraries of the right quality this should need only one SMRT Cell (or two) per species. It would be worth the additional expense.

    With HiSeq you are going to get much more data and if you have a reference then you do not need to worry about assemblies any more.


    • #3
      Originally posted by sebl
      1) What would be the best option? Longer reads with lower coverage or shorter reads with higher coverage? Does 2x150bp to 2x300bp make a real difference during assembly?
      Yes, you'll get (possibly much) larger contigs, at least if you can make an appropriately sized library (you'd want 400-700 bp inserts).
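
How read length interacts with insert size can be illustrated with a small sketch. It assumes "insert" means the sequenced fragment between the adapters and that both mates are read inward from the fragment ends; `pair_layout` is a hypothetical helper, not part of any assembler.

```python
def pair_layout(fragment_bp, read_bp):
    """Return (overlap_bp, gap_bp) for one read pair on a fragment.

    overlap_bp: bases covered by both mates (0 if they do not meet).
    gap_bp: unsequenced bases between the mates (0 if they overlap).
    """
    # A read cannot yield more fragment sequence than the fragment holds.
    usable = min(read_bp, fragment_bp)
    overlap = max(0, 2 * usable - fragment_bp)
    gap = max(0, fragment_bp - 2 * usable)
    return overlap, gap


for fragment in (400, 700):
    for read_len in (150, 300):
        overlap, gap = pair_layout(fragment, read_len)
        print(f"fragment {fragment} bp, 2x{read_len}: "
              f"overlap {overlap} bp, gap {gap} bp")
```

Under these assumptions, 2x300 reads on the shorter fragments overlap substantially, while 2x150 reads leave an unsequenced gap that the assembler has to bridge using the pairing information alone.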

      Originally posted by sebl
      2) How "essential" would it be to sequence the same genomes with different library sizes? Again, considering it would be within Illumina's range of possibilities (no PacBio, 454 etc.).
      With Illumina-only data, you'd need long-insert (3 kb or, better, 5-8 kb) mate pairs to resolve repetitive regions. You could possibly skip those for the kind of application you have in mind, however: they are expensive and laborious.

      I'd second GenoMax's suggestion to PacBio the representative species.


      • #4
        Thank you for the reply.

        With Illumina-only data, you'd need long-insert (3 kb or better 5-8 kb) mate pairs to resolve repetitive regions.
        We may disregard these regions for now and work on draft genomes instead. I agree that if we want good reference genomes from our collection we will need more than one library type/size. For now we are wondering what kind of run would give us the best quality draft assemblies.

        Either the HiSeq or the MiSeq option gives us enough data for mapping, but we are wondering about de novo assembly.

        The sequencing center told us that a mean coverage of 50X in fact means that about 90% of the data will be at <10X coverage and the rest at high coverage. That is a bit disturbing, because we could miss entire genome regions without knowing it. Can anyone confirm that?
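
For comparison, the coverage spread expected under an idealized model can be sketched as follows. The sketch assumes reads land uniformly at random, so per-base depth is roughly Poisson-distributed around the mean; real libraries deviate from this (GC bias, for example), so it is a baseline rather than a prediction. `poisson_cdf` is a hypothetical helper written here with the standard library only.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson-distributed depth X with the given mean."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(X = i) from P(X = i - 1)
        total += term
    return total


MEAN_COVERAGE = 50  # requested mean coverage from the post

# Fraction of positions with depth below 10x, i.e. P(X <= 9).
below_10x = poisson_cdf(9, MEAN_COVERAGE)
print(f"Fraction of bases below 10x under the Poisson model: {below_10x:.2e}")
```

If the real data show a large fraction of the genome below 10X at a 50X mean, that would point to strongly uneven coverage rather than to what random sampling alone produces.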


        • #5
          For high-quality de novo assemblies of bacterial genomes, PacBio (with their HGAP assembly protocol) is the best option at this time. If you don't have access to a local provider, there are several authorized commercial/academic sequencing providers that do PacBio sequencing.
