  • Bacterial genomes with Illumina - what is the best option for good assembly?

    Hi,

    I've got about 100 strains of E. Coli that should be sequenced and assembled. What is the best option for getting semi-complete genomes? My current thougts are to use >=100bp single read + 5kb mate pair.

    As E. coli strains can differ quite a bit and we want to look at the differences between these strains both at genome (organizational) level and more local differences, I would guess that de novo assembly would be preferable if it can be done. Does anyone have experience with assembly of something like this? How is the quality of the genome compared to what you get with 454?

    Cheers,
    --
    Einar Ryeng

  • #2
    I would do PE 2x100 and a Matepair of 5Kb.

    In my experience it is at least as good as 454, if not better (I have only done one 454 genome, though).

    rgds
    Mads Albertsen


    • #3
      Originally posted by MadsAlbertsen:
      I would do PE 2x100 and a Matepair of 5Kb.
      Thanks for the input. I was on the same track (paired end) for a moment. Guess I should go back there then.

      Do you have any thoughts on coverage as well? I'm thinking that 100x would do for the PE library, but I don't really know what to go for on the MP library.

      Originally posted by MadsAlbertsen:
      In my experience it is at least as good as 454, if not better (I have only done one 454 genome, though).
      Sounds good.

      Thanks,
      --
      Einar Ryeng


      • #4
        We did 3 Salmonella strains using v3 HiSeq chemistry. These were resequencing projects, so assembly was not required, but we did one anyway. ABySS-PE v1.3.0, with scaffolding turned on, assembled these 2x101 reads. Contig N50 lengths (counting only contigs > 1 kb) were 248K-283K, with 60 or fewer scaffolds (again, counting only scaffolds longer than 1 kb) in each assembly. Oh, and this was with the k-mer size set at 80.

        Actually, we had some difficulty controlling the number of reads, so we were probably out beyond 200x. The high k-mer setting may therefore have had the effect of reducing the effective input coverage to something the assembler would handle better.

        Anyway, PE only. Not even that large inserts (~250 bp).

        --
        Phillip
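The two quantities in the post above (N50 restricted to sequences over 1 kb, and the way a large k lowers effective coverage) can be sketched as follows. This is an illustrative sketch, not the poster's pipeline: the contig lengths are made up, and the k-mer coverage relation `C * (L - k + 1) / L` is the commonly used approximation, assumed here rather than taken from the thread.

```python
def n50(lengths, min_length=1000):
    """N50 of the sequences strictly longer than min_length (bp).

    N50 is the length of the sequence at which the running total,
    taken from longest to shortest, first reaches half the total size.
    """
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    if not kept:
        return 0
    half = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        if running >= half:
            return n


def kmer_coverage(base_coverage, read_length, k):
    """Approximate k-mer coverage for a given base coverage.

    Each read of length L contributes L - k + 1 k-mers, so a large k
    reduces the coverage the assembler effectively sees.
    """
    return base_coverage * (read_length - k + 1) / read_length


# Hypothetical contig lengths, for illustration only.
contigs = [300_000, 250_000, 200_000, 150_000, 100_000, 900, 500]
print(n50(contigs))  # contigs of 900 bp and 500 bp are ignored

# 200x base coverage, 101 bp reads, k = 80 (values mentioned in the post)
print(round(kmer_coverage(200, 101, 80), 1))
```

With 101 bp reads and k = 80, 200x base coverage corresponds to roughly 44x k-mer coverage under this approximation, which is consistent with the suggestion that the high k setting tamed the excess input.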


        • #5
          We've done anything from 200x to 2000x (depending on whether we can fill the machine), and I do not see much difference in the assembly above 300x.

          We do not normally use mate pairs, as we are rarely interested in "complete" genomes. Using 2x100 PE with an insert size of approximately 300 bp and the new HiSeq chemistry, the assembly quality (number of contigs, N50) is more or less proportional to the repeat content of the genome.

          I guess 200x PE and a low-coverage mate-pair library (25-50x?) would be fine for de novo assembly.

          If you have loads of DNA (~5 ug/sample) you could keep the PCR cycles very low and go for even lower coverage due to less GC variation.

          However, a single HiSeq flowcell is around 300 Gb, and if it were me I would just fill that with 96 genomes, e.g. 12 per lane = average coverage of 625x. Then you'll have plenty of room for concentration variation between your samples.

          rgds
          Mads
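The multiplexing arithmetic in the post above can be checked with a short sketch. The post gives the flowcell yield (about 300 Gb), 96 genomes, and 12 samples per lane; the 5 Mb genome size and the 8 lanes are assumptions made here because they reproduce the quoted 625x figure.

```python
def mean_coverage(total_bases, n_samples, genome_size):
    """Average per-sample coverage if the yield is split evenly."""
    return total_bases / (n_samples * genome_size)


def samples_for_target(total_bases, target_coverage, genome_size):
    """How many samples fit on the run at a given target coverage."""
    return total_bases // (target_coverage * genome_size)


FLOWCELL_BASES = 300_000_000_000  # ~300 Gb, from the post
SAMPLES_PER_LANE = 12             # from the post
LANES = 8                         # assumption: 96 samples / 12 per lane
GENOME_SIZE = 5_000_000           # assumption: ~5 Mb genome

n_samples = LANES * SAMPLES_PER_LANE
print(n_samples)                                               # 96
print(mean_coverage(FLOWCELL_BASES, n_samples, GENOME_SIZE))   # 625.0

# For comparison: samples that would fit at the 200x suggested earlier
print(samples_for_target(FLOWCELL_BASES, 200, GENOME_SIZE))    # 300
```

Real runs will not split evenly between barcodes, which is the point of the headroom mentioned in the post: at an average of 625x, a sample that receives only a third of its share still sits above 200x.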


          • #6
            We just finished a 4.2 Mb bacterial genome with 91x coverage of 50 bp paired-end reads and 60x coverage of 75 bp reads from a 5 kb mate-pair library. De novo assembly of the paired-end reads gave 480 contigs (>200 bp). We then scaffolded with the mate pairs using SSPACE2 Premium with the bwa aligner and obtained 56 scaffolds at 150x average coverage. Finally, we used SOAP GapCloser to fill the gaps and did a second round of SSPACE2/GapCloser, ending up with 15 scaffolds: min contig size 0.5 Mb, max contig size 1.25 Mb.
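As a rough check on the numbers in the post above, the read counts implied by each library can be worked out from coverage, genome size, and read length. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and reading the quoted "150x" as the sum of the two libraries is an assumption.

```python
import math


def reads_for_coverage(coverage, genome_size, read_length):
    """Number of reads needed to reach a given base coverage."""
    return math.ceil(coverage * genome_size / read_length)


GENOME_SIZE = 4_200_000  # 4.2 Mb, from the post

# Paired-end library: 91x with 50 bp reads
pe_reads = reads_for_coverage(91, GENOME_SIZE, 50)
# Mate-pair library: 60x with 75 bp reads
mp_reads = reads_for_coverage(60, GENOME_SIZE, 75)

print(pe_reads)   # 7644000 reads, i.e. 3822000 pairs
print(mp_reads)   # 3360000 reads, i.e. 1680000 pairs
print(91 + 60)    # 151, close to the ~150x average quoted
```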
