SEQanswers

Old 02-17-2012, 03:21 PM   #1
green tree
Junior Member
 
Location: USA

Join Date: Jun 2008
Posts: 6
Finding *new* regions of DNA in genome assemblies

I am working on an experimental evolution project. We sequenced ~50 strains of bacteria that were isolated along a time course. I want to see if these strains acquired NEW genes during the experiment.

One option is to assemble each genome de novo and then inspect the gene content of all 50 genomes. The problem with this approach is that, even after optimizing Velvet, my assemblies contain only 4-4.5 Mb (the actual genome size is 5.2 Mb), so I am missing a lot of data.

Is there another solution to this problem? I was thinking I could make a "super assembly" by combining ALL of the reads, then using a short read alignment tool (like Maq or Bowtie) to estimate the coverage of each genome, in all positions of this "super assembly." Then, if certain positions of the assembly have matches in only a handful of genomes, these are likely to be *new* DNA.
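
A minimal sketch of the presence/absence step described above, assuming the mean read coverage of every "super assembly" contig has already been computed per strain from the alignments. The function name `rare_contigs`, the thresholds, and the toy numbers are hypothetical and only illustrate the idea; they are not part of any of the tools mentioned.

```python
from typing import Dict, List


def rare_contigs(coverage: Dict[str, Dict[str, float]],
                 min_cov: float = 5.0,
                 max_strains: int = 5) -> Dict[str, List[str]]:
    """Return contigs that are covered in at least one strain but in no
    more than `max_strains` strains.

    coverage maps contig name -> {strain name -> mean read depth}.
    A strain counts as carrying a contig when its depth is >= min_cov.
    """
    result = {}
    for contig, per_strain in coverage.items():
        present = sorted(s for s, c in per_strain.items() if c >= min_cov)
        if 0 < len(present) <= max_strains:
            result[contig] = present
    return result


# Toy data (made up): contig_2 is only covered in the last time point.
coverage = {
    "contig_1": {"t0": 40.0, "t1": 38.5, "t2": 41.2},
    "contig_2": {"t0": 0.0, "t1": 0.3, "t2": 35.0},
}
candidates = rare_contigs(coverage, min_cov=5.0, max_strains=1)
print(candidates)
```

Contigs flagged this way are only candidates: low coverage in a strain can also come from uneven sequencing depth, so the thresholds would need tuning against the real data.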

If anyone can offer some advice I would very much appreciate it. Thank you.
Old 02-20-2012, 06:58 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

I've never done such a project, but it is an interesting one.

I would map each sample's reads against the reference and discard the reads that map. Then use Velvet (or another de novo assembler) to assemble the remaining unmapped reads for each sample, and use Glimmer (or another gene finder) to detect the genes.

Your idea of a 'super assembly' is a good one; however, you might get better results by first eliminating the reads that already map to the reference.
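
A rough sketch of the "discard the mapped reads" step, assuming the aligner writes SAM output. It relies only on the SAM flag bits (0x4 = read unmapped, 0x100 = secondary, 0x800 = supplementary); the function name `unmapped_fastq` and the example records are made up for illustration. For paired-end data you would probably also want to keep the mate of an unmapped read, which this sketch does not do.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4         # SAM flag bit: read is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def unmapped_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield a FASTQ record for every unmapped read in SAM text."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:      # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue              # avoid emitting a read twice
        if flag & UNMAPPED:
            name, seq, qual = fields[0], fields[9], fields[10]
            yield f"@{name}\n{seq}\n+\n{qual}\n"


# Made-up records: r1 maps to the reference, r2 does not.
example = [
    "@SQ\tSN:ref\tLN:1000",
    "r1\t0\tref\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
]
records = list(unmapped_fastq(example))
print("".join(records), end="")
```

The resulting per-sample FASTQ is what would then go into the assembler.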
Old 02-20-2012, 08:01 AM   #3
rghan
Junior Member
 
Location: Reno

Join Date: Mar 2011
Posts: 9

Have you read the paper below? The paper and its supplementary material describe an interesting pipeline that might be useful to you.

http://www.nature.com/nature/journal...ture10414.html

This links to the software pipeline they employed. We're still trying to get it to work properly in house, but we have a much larger genome than you do.

http://mus.well.ox.ac.uk/19genomes/IMR-DENOM/
Old 02-20-2012, 12:13 PM   #4
Zam
Member
 
Location: Oxford

Join Date: Apr 2010
Posts: 51

An alternative approach is to assemble a "graph" of all of your samples simultaneously, and then look either at the accumulation of new variants or at which contigs are shared by which strains. Alternatively, you could build an assembly of your first strain by standard means, compare it with your joint assembly of all strains, and pull out "novel" contigs that differ from your original assembly. All of these are supported by this software (disclosure - I am an author):
cortexassembler.sourceforge.net
You might take a look at this paper
http://dx.doi.org/10.1038/ng.1028

which does something similar by assembling 164 human genomes and looking for novel sequence that differs from the human reference.
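
To illustrate the "pull out novel contigs" idea in the simplest possible terms, here is a sketch that compares k-mer sets between a first-strain assembly and a joint assembly. This is NOT how Cortex works internally; it is a plain set comparison, and the names `novel_contigs`, `k`, and `min_novel`, as well as the toy sequences, are hypothetical.

```python
from typing import Dict, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int) -> Set[str]:
    """Set of canonical k-mers in a sequence (empty if len(seq) < k)."""
    seq = seq.upper()
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def novel_contigs(reference: Dict[str, str],
                  joint: Dict[str, str],
                  k: int = 5,
                  min_novel: float = 0.5) -> Dict[str, float]:
    """Map joint-assembly contigs to the fraction of their k-mers that
    are absent from the reference assembly, keeping those >= min_novel."""
    ref_kmers: Set[str] = set()
    for seq in reference.values():
        ref_kmers |= kmers(seq, k)
    out = {}
    for name, seq in joint.items():
        ks = kmers(seq, k)
        if not ks:
            continue  # contig shorter than k
        frac = len(ks - ref_kmers) / len(ks)
        if frac >= min_novel:
            out[name] = frac
    return out


# Toy sequences (made up); a real run would use a much larger k.
first_strain = {"ref1": "ACGTACGTACGGATCCA"}
joint_assembly = {"shared": "ACGTACGTACG", "new": "TTTTTGGGGGCCCAA"}
result = novel_contigs(first_strain, joint_assembly, k=5)
print(sorted(result))
```

Canonical k-mers are used so that a contig assembled on the opposite strand is not mistaken for novel sequence.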
Old 02-20-2012, 12:14 PM   #5
Zam
Member
 
Location: Oxford

Join Date: Apr 2010
Posts: 51

Oops, signed off too quickly - hope that made sense - feel free to email me if not (zam AT well.ox.ac.uk)
Old 02-20-2012, 03:19 PM   #6
green tree
Junior Member
 
Location: USA

Join Date: Jun 2008
Posts: 6

Hi everyone, thanks for the responses! Zam, great link and interesting paper (I was actually just thinking about this in the human population).