I am working on an experimental evolution project. We sequenced ~50 strains of bacteria that were isolated along a time course. I want to see if these strains acquired NEW genes during the experiment.
One option is to assemble each genome de novo and then inspect gene content across all 50 genomes. The problem with this approach is that even after optimizing Velvet, my assemblies only contain 4 to 4.5 Mb (the actual genome size is 5.2 Mb), so I am missing a lot of data.
Is there another solution to this problem? I was thinking I could make a "super assembly" by combining ALL of the reads, then use a short-read alignment tool (like Maq or Bowtie) to estimate each strain's coverage at every position of this "super assembly." Positions of the assembly that are covered in only a handful of strains would then be candidates for *new* DNA.
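To make the idea concrete, here is a minimal sketch of the last step, assuming you have per-strain, per-position depth tables (e.g. the `contig / position / depth` output of `samtools depth` after mapping each strain's reads to the super assembly). The thresholds `MIN_DEPTH` and `MAX_STRAINS` are illustrative assumptions, not established values:

```python
from collections import defaultdict

MIN_DEPTH = 5    # depth needed to call a position "present" in a strain (assumed)
MAX_STRAINS = 3  # positions present in <= this many strains are novel-DNA candidates (assumed)

def novel_positions(per_strain_depths, min_depth=MIN_DEPTH, max_strains=MAX_STRAINS):
    """Return super-assembly positions covered in only a few strains.

    per_strain_depths: dict mapping strain name -> list of
    (contig, position, depth) tuples, as parsed from samtools depth output.
    """
    presence = defaultdict(set)  # (contig, pos) -> set of strains covering it
    for strain, rows in per_strain_depths.items():
        for contig, pos, depth in rows:
            if depth >= min_depth:
                presence[(contig, pos)].add(strain)
    return {loc: strains for loc, strains in presence.items()
            if len(strains) <= max_strains}

# Toy example: four strains; position ctg1:5000 is covered only by strain D,
# so it comes back as a candidate for newly acquired DNA.
per_strain = {
    "A": [("ctg1", 100, 30), ("ctg1", 101, 28)],
    "B": [("ctg1", 100, 25), ("ctg1", 101, 22)],
    "C": [("ctg1", 100, 27), ("ctg1", 101, 26)],
    "D": [("ctg1", 100, 31), ("ctg1", 101, 29), ("ctg1", 5000, 40)],
}
print(novel_positions(per_strain))  # {('ctg1', 5000): {'D'}}
```

In practice you would want to merge adjacent candidate positions into intervals before annotating them, rather than working position by position.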
If anyone can offer some advice I would very much appreciate it. Thank you.