  • Finding *new* regions of DNA in genome assemblies

    I am working on an experimental evolution project. We sequenced ~50 strains of bacteria that were isolated along a time course. I want to see if these strains acquired NEW genes during the experiment.

    One option is to assemble each genome de novo, then inspect gene content in all 50 genomes. The problem with this approach is that, even after optimizing Velvet, my assemblies only contain 4-4.5 Mb (the actual genome size is 5.2 Mb), so I am missing a lot of data.

    Is there another solution to this problem? I was thinking I could make a "super assembly" by combining ALL of the reads, then use a short read alignment tool (like Maq or Bowtie) to estimate each genome's coverage at every position of this "super assembly." If certain positions of the assembly are covered in only a handful of genomes, these are likely to be *new* DNA.
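A rough sketch of the presence/absence step in that idea, assuming the mean read depth of every "super assembly" contig has already been computed per strain from the alignments. The function name, the `min_depth` and `max_strains` thresholds, and the nested-dict input layout are illustrative assumptions, not part of Maq, Bowtie, or Velvet.

```python
from typing import Dict, List


def candidate_new_regions(
    depth: Dict[str, Dict[str, float]],
    min_depth: float = 5.0,
    max_strains: int = 5,
) -> Dict[str, List[str]]:
    """Return contigs that are covered in only a few strains.

    depth[contig][strain] is the mean read depth of `strain` on `contig`
    (hypothetical layout; fill it from whatever aligner output you have).
    """
    candidates: Dict[str, List[str]] = {}
    for contig, per_strain in depth.items():
        # A strain "has" the contig if its depth reaches the threshold.
        present = sorted(s for s, d in per_strain.items() if d >= min_depth)
        # Keep contigs seen in at least one, but only a handful of, strains.
        if 0 < len(present) <= max_strains:
            candidates[contig] = present
    return candidates


if __name__ == "__main__":
    example = {
        "contig_1": {"t0": 30.0, "t1": 28.0, "t2": 31.0},  # core genome
        "contig_2": {"t0": 0.0, "t1": 0.5, "t2": 25.0},    # only in t2
    }
    print(candidate_new_regions(example, min_depth=5.0, max_strains=1))
```

Both thresholds would need tuning against the real depth distribution; low-depth contigs can also reflect contamination or uneven coverage rather than newly acquired DNA.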

    If anyone can offer some advice I would very much appreciate it. Thank you.

  • #2
    I've never done a project like this, but it sounds interesting.

    I would map each sample's reads against the reference and eliminate the reads that map. Then use Velvet (or another de novo assembler) to assemble the remaining reads for each sample, and use Glimmer (or another gene finder) to detect the genes.

    Your idea of a 'super assembly' is a good one; however, you might get better results by first eliminating the reads that already map to the reference.
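The "eliminate the mapped reads" step can be sketched as follows, assuming the aligner writes SAM output. The code relies only on the SAM convention that FLAG bit 0x4 marks an unmapped read; the function names are hypothetical, and paired-end handling is left out.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for reads that did not map."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if int(fields[1]) & UNMAPPED:
            yield fields[0], fields[9], fields[10]


def to_fastq(reads: Iterable[Tuple[str, str, str]]) -> str:
    """Format reads as FASTQ text, ready to hand to a de novo assembler."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in reads)


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.0\tSO:unsorted",
        "r1\t0\tchr\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tHHHH",
    ]
    print(to_fastq(unmapped_reads(sam)), end="")
```

In practice many aligners can write the unaligned reads out directly, which avoids this post-processing step.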


    • #3
      Have you read the paper below? The paper and its supplementary material describe an interesting pipeline that might be useful to you.



      This links to the software pipeline they employed. We're still trying to get it to work properly in-house, but we have a much larger genome than you do.


      • #4
        An alternative approach is to assemble a "graph" of all of your samples simultaneously, and then look either at the accumulation of new variants or at which contigs are shared by which strains. Alternatively, you could build an assembly of your first strain by standard means, compare it with your joint assembly of all strains, and pull out "novel" contigs that differ from your original assembly. All of these are supported by this software (disclosure: I am an author):
        cortexassembler.sourceforge.net
        You might take a look at this paper


        which does something similar, assembling 164 human genomes and looking for novel sequence that differs from the human reference.
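A toy illustration of the "which sequence is shared by which strains" idea, assuming each strain is given as a list of read or contig strings. It simply labels every canonical k-mer with the strains it occurs in and reports k-mers missing from a baseline strain. This is not how the software linked above is implemented or invoked; all names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def colour_kmers(samples: Dict[str, Iterable[str]], k: int) -> Dict[str, Set[str]]:
    """Map each canonical k-mer to the set of samples ("colours") containing it."""
    colours: Dict[str, Set[str]] = defaultdict(set)
    for name, sequences in samples.items():
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                colours[canonical(seq[i:i + k])].add(name)
    return dict(colours)


def novel_kmers(colours: Dict[str, Set[str]], baseline: str) -> Dict[str, Set[str]]:
    """Return k-mers that never occur in the baseline sample."""
    return {km: names for km, names in colours.items() if baseline not in names}


if __name__ == "__main__":
    samples = {"t0": ["ACGTAC"], "t5": ["ACGTACGGG"]}
    for kmer, strains in sorted(novel_kmers(colour_kmers(samples, k=4), "t0").items()):
        print(kmer, sorted(strains))
```

Real data would need sequencing-error filtering (for example, dropping k-mers seen only once), since every read error otherwise looks like novel sequence.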


        • #5
          Oops, signed off too quickly. I hope that made sense; feel free to email me if not (zam AT well.ox.ac.uk).


          • #6
            Hi everyone, thanks for the responses! Zam, great link and interesting paper (I was actually just thinking about this in the human population).
