  • green tree
    Junior Member
    • Jun 2008
    • 6

    Finding *new* regions of DNA in genome assemblies

    I am working on an experimental evolution project. We sequenced ~50 strains of bacteria that were isolated along a time course. I want to see if these strains acquired NEW genes during the experiment.

    One option is to assemble each genome de novo, then inspect the gene content of all 50 genomes. The problem with this approach is that, even after optimizing Velvet, my assemblies only contain 4-4.5 Mb (the actual genome size is 5.2 Mb). So I am missing a lot of data.

    Is there another solution to this problem? I was thinking I could make a "super assembly" by combining ALL of the reads, then using a short read alignment tool (like Maq or Bowtie) to estimate the coverage of each genome, in all positions of this "super assembly." Then, if certain positions of the assembly have matches in only a handful of genomes, these are likely to be *new* DNA.

    If anyone can offer some advice I would very much appreciate it. Thank you.
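
A minimal sketch of the coverage comparison proposed in this post, assuming the per-position read depth over the combined assembly has already been computed for each strain (for example from Maq or Bowtie alignments). The function names, the thresholds `min_depth`, `max_strains` and `min_len`, and the toy depth values are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Tuple


def strains_covering(coverage: Dict[str, List[int]], min_depth: int) -> List[int]:
    """For each position, count the strains whose depth is >= min_depth."""
    n_positions = len(next(iter(coverage.values())))
    counts = [0] * n_positions
    for depths in coverage.values():
        for i, depth in enumerate(depths):
            if depth >= min_depth:
                counts[i] += 1
    return counts


def candidate_regions(counts: List[int], max_strains: int, min_len: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs covered by 1..max_strains strains."""
    regions = []
    start = None
    # The trailing 0 is a sentinel that closes a run reaching the last position.
    for i, count in enumerate(counts + [0]):
        inside = 1 <= count <= max_strains
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions


# Toy depths (assumed values): positions 2-4 are covered only in strain "t2".
coverage = {
    "t0": [10, 10, 0, 0, 0, 10],
    "t1": [12, 9, 0, 0, 0, 11],
    "t2": [11, 10, 8, 9, 7, 10],
}
counts = strains_covering(coverage, min_depth=5)
regions = candidate_regions(counts, max_strains=1, min_len=3)
print(counts, regions)
```

With these toy values the per-position strain counts are `[3, 3, 1, 1, 1, 3]` and the only candidate region is `(2, 5)`. On real data the thresholds would need tuning, since low-coverage or repetitive positions can also look strain-specific.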
  • westerman
    Rick Westerman
    • Jun 2008
    • 1104

    #2
    I've never done such a project, but it is an interesting one.

    I would map each sample's reads against the reference and eliminate the reads that map. Then use Velvet (or another de novo assembler) to assemble the remaining reads for each sample, and use Glimmer (or another gene finder) to detect the genes.

    Your idea of a 'super assembly' is a good one; however, you might get better results by first eliminating the reads that already map to the reference.
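
The first step suggested in this post, keeping only the reads that do not map to the reference, could be sketched as below, assuming the aligner writes SAM output. In SAM, bit 0x4 of the FLAG column marks an unmapped read. The function name and the toy records are made up for illustration; the resulting FASTQ records would then be passed to Velvet or another assembler.

```python
from typing import Iterable, Iterator


def unmapped_fastq_records(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield a FASTQ record for every SAM alignment line flagged as unmapped."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & 0x4:  # 0x4 = read is unmapped
            yield f"@{qname}\n{seq}\n+\n{qual}\n"


# Toy SAM input (assumed): r1 maps to the reference, r2 does not.
sam = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr\t1\t30\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tFFFF\n",
]
records = list(unmapped_fastq_records(sam))
print("".join(records), end="")
```

This sketch treats every read independently. For paired-end data one would probably want to keep both mates whenever either of them is unmapped, which is not handled here.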


    • rghan
      Junior Member
      • Mar 2011
      • 9

      #3
      Have you read the paper below? The paper and its supplementary material describe an interesting pipeline that might be useful to you.



      This links to the software pipeline they employed. We're still trying to get it to work properly in-house, but we have a much larger genome than you do.


      • Zam
        Member
        • Apr 2010
        • 51

        #4
        An alternative approach is to assemble a "graph" of all of your samples simultaneously, and then look either at the accumulation of new variants or at which contigs are shared by which strains. Alternatively, you could build an assembly of your first strain by standard means, compare it with your joint assembly of all strains, and pull out "novel" contigs that differ from your original assembly. All of these are supported by this software (disclosure: I am an author):
        cortexassembler.sourceforge.net
        You might take a look at this paper


        which does something similar, assembling 164 human genomes and looking for novel sequence that differs from the human reference.
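
As a toy illustration of the idea of comparing a first strain against a joint set of strains, the sketch below records which samples contain each k-mer and reports the k-mers absent from a chosen baseline strain. It is not how Cortex is implemented: it ignores reverse complements, sequencing errors and graph structure, and the function names, sample names and sequences are all hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def colour_kmers(samples: Dict[str, List[str]], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the set of sample names ("colours") that contain it."""
    colours: Dict[str, Set[str]] = {}
    for name, seqs in samples.items():
        for seq in seqs:
            for kmer in kmers(seq, k):
                colours.setdefault(kmer, set()).add(name)
    return colours


def novel_kmers(colours: Dict[str, Set[str]], baseline: str) -> Dict[str, Set[str]]:
    """Keep only the k-mers that never occur in the baseline sample."""
    return {kmer: names for kmer, names in colours.items() if baseline not in names}


# Toy sequences (assumed): strain02 carries an extra fragment, TTGGA.
samples = {
    "strain01": ["ACGTAC"],
    "strain02": ["ACGTAC", "TTGGA"],
}
colours = colour_kmers(samples, k=4)
novel = novel_kmers(colours, baseline="strain01")
for kmer in sorted(novel):
    print(kmer, sorted(novel[kmer]))
```

Here the two k-mers `TGGA` and `TTGG` are reported as present only in `strain02`. With real reads, k-mers created by sequencing errors would also look novel, so some filtering on k-mer abundance would be needed before drawing conclusions.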


        • Zam
          Member
          • Apr 2010
          • 51

          #5
          Oops, signed off too quickly - hope that made sense - feel free to email me if not (zam AT well.ox.ac.uk)


          • green tree
            Junior Member
            • Jun 2008
            • 6

            #6
            Hi everyone, thanks for the responses! Zam, great link and interesting paper (I was actually just thinking about this in the human population).
