  • E coli de novo sequencing

    We want to do a pathogenomic study of some E. coli strains. The idea is to compare the genomes of these strains with each other and against reference genomes, and to analyse them for virulence factors.

    Among local NGS service providers I found two options: Illumina HiSeq 2000 paired-end, or 454. The 454 provider says the expected coverage is about 11-17x. I wonder if that is too low? That provider does, however, offer a more complete analysis of the genomes than the Illumina provider (I'm an "end user", not a bioinformatics expert). The problem, of course, is cost: 454 is 3 times more expensive.

    Any suggestions or ideas? Is anyone dealing with a similar project who would like to share a bit of their experience?

    Thanks to all and have a nice day

  • #2
    We have done >20 genomes on 454 Titanium but also have a lot of Illumina data. The advantage with 454 is that you can do either de novo or reference-based assembly.

    With Illumina reads, the reference-based approach would most likely be more relevant for E. coli, because you'd end up with a lot of small contigs after de novo assembly.

    However, reference-based assembly means you can have difficulty finding new components of the accessory genome. SNP detection against a reference is very nice, though.

    • #3
      We are really looking for possible new components in the accessory genome and less for SNPs, so we are going for de novo sequencing followed by genome comparison between the strains and reference genomes.

      • #4
        Another issue here is that, of late, I have seen some astounding improvements in de novo assemblers that are real game changers for small genome assembly. Using 10% of a lane of sequence from a HiScanSQ 2x100 run on a simple fragment (PE) TruSeq library, assembled with ABySS-PE using a k-mer size of 70, we get a reasonable draft sequence.

        By "reasonable" I mean that for 3 Salmonella strains our N50 was >220 kb, with 50% of their respective genomes in 8 or 9 contigs. Each assembly had between 60 and 70 contigs of 1 kb or larger.

        This is without gap filling or mate-pair libraries. Also, these are completely de novo assemblies. (Although, obviously, reference-based assemblies could have been undertaken.)
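
For readers unfamiliar with the N50 figure quoted above, here is a minimal Python sketch of how it is usually computed from a list of contig lengths. The function name and the toy lengths are illustrative assumptions, not data from these Salmonella assemblies. It also returns the number of contigs needed to reach half the total (often called L50), which corresponds to the "8 or 9 contigs" statement. Note that this conventional N50 uses the total assembly length, not the true genome size.

```python
def n50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bases.

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of
    the total assembly length. L50 is how many contigs that took.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Toy example with made-up contig lengths (total = 300 bases).
print(n50([100, 80, 50, 30, 20, 20]))
```

In practice the lengths would come from the assembler's FASTA output, often after dropping contigs below a cutoff such as 1 kb.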

        --
        Phillip

        • #5
          Originally posted by pmiguel
          Another issue here is that, of late, I have seen some astounding improvements in de novo assemblers that are real game changers for small genome assembly. Using 10% of a lane of sequence from a HiScanSQ 2x100 run on a simple fragment (PE) TruSeq library, assembled with ABySS-PE using a k-mer size of 70, we get a reasonable draft sequence.
          About what fold coverage of reads does this work out to?

          • #6
            Generally >100X base coverage. In some cases we have overshot and ended up with >200X on smaller (1 megabase) bacterial genomes, which leads to an "embarrassment of riches" for the assembler (ABySS-PE).
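
As a rough guide to where fold-coverage numbers like these come from, the usual estimate is total sequenced bases divided by genome size. The sketch below assumes paired-end reads of equal length. The function names, the read-pair counts, and the 5 Mb genome size are illustrative assumptions, not figures from the runs described in this thread.

```python
import math


def fold_coverage(read_pairs, read_length, genome_size):
    """Estimated base coverage: total sequenced bases / genome size."""
    return read_pairs * 2 * read_length / genome_size


def read_pairs_needed(target_coverage, read_length, genome_size):
    """Read pairs required to reach a target fold coverage."""
    return math.ceil(target_coverage * genome_size / (2 * read_length))


# Hypothetical 2x100 run: 2.5 million read pairs on a 5 Mb genome.
print(fold_coverage(2_500_000, 100, 5_000_000))

# The same number of pairs on a 1 Mb genome overshoots considerably.
print(fold_coverage(2_500_000, 100, 1_000_000))

# Pairs needed for 100x on a 5 Mb genome with 2x100 reads.
print(read_pairs_needed(100, 100, 5_000_000))
```

This is only an estimate of raw coverage. Quality trimming, adapter removal, and contaminating reads all lower the coverage that actually reaches the assembler.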

            --
            Phillip
