09-10-2009, 11:01 PM   #4
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Your extra-high-coverage contig most likely represents a repeated piece of the genome: the 454 assembly program (newbler) collapses repeats, so the reads from every copy pile up in a single contig. To find out where it belongs, you can have newbler generate the contig graph file (using the -g option on the command line). This file describes the nodes, i.e. the contigs (length, coverage), and the edges, i.e. the reads that cross the boundary between two contigs (start in one contig and end in another), including how many such reads there are. For your high-coverage contig, you will find that it has several neighbors at each end. Your next task is to figure out which neighbors belong together (with the high-coverage contig in between), for instance by PCR. There might be sequence variants among the different copies of the repeated region, and sequencing the PCR products will tell you which variant belongs where. Of course, this works best for repeats short enough that a PCR product spanning them can be sequenced completely with Sanger reads...
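To make the graph reasoning concrete, here is a minimal Python sketch of the bookkeeping. It does not parse newbler's actual contig graph file (its exact layout is not reproduced here); instead it assumes the graph has already been loaded into a hypothetical in-memory form, with made-up toy numbers, and that all contigs lie in forward orientation. The function names `repeat_candidates`, `neighbours` and `pcr_pairs` are illustrative only. It flags contigs whose depth is well above the median, estimates a rough copy number from the depth ratio, groups the contigs attached to each end of the repeat, and lists every left/right flank combination a PCR could test.

```python
from itertools import product
from statistics import median

# Toy contig graph with made-up numbers (NOT real newbler output).
# contig name -> (length in bp, average read depth)
CONTIGS = {
    "contig01": (52000, 20.1),
    "contig02": (1400, 60.3),   # roughly 3x the typical depth
    "contig03": (33000, 19.7),
    "contig04": (41000, 20.4),
    "contig05": (27000, 19.9),
    "contig06": (60000, 20.6),
    "contig07": (38000, 20.0),
}

# Edges: (contig A, end of A, contig B, end of B, reads spanning the junction)
# Simplification: every contig is assumed to be in forward orientation.
EDGES = [
    ("contig01", "3'", "contig02", "5'", 12),
    ("contig03", "3'", "contig02", "5'", 10),
    ("contig05", "3'", "contig02", "5'", 11),
    ("contig02", "3'", "contig04", "5'", 9),
    ("contig02", "3'", "contig06", "5'", 13),
    ("contig02", "3'", "contig07", "5'", 8),
]


def repeat_candidates(contigs, factor=2.0):
    """Contigs with depth >= factor x median depth, mapped to a rough copy number."""
    baseline = median(depth for _, depth in contigs.values())
    return {
        name: round(depth / baseline)
        for name, (_, depth) in contigs.items()
        if depth >= factor * baseline
    }


def neighbours(edges, name):
    """Group the contigs linked to the 5' and 3' ends of `name`, with read counts."""
    ends = {"5'": [], "3'": []}
    for a, a_end, b, b_end, reads in edges:
        if a == name:
            ends[a_end].append((b, reads))
        elif b == name:
            ends[b_end].append((a, reads))
    return ends


def pcr_pairs(ends):
    """Every (left flank, right flank) combination that a PCR could test."""
    return [(left, right) for (left, _), (right, _) in product(ends["5'"], ends["3'"])]


if __name__ == "__main__":
    for repeat, copies in repeat_candidates(CONTIGS).items():
        ends = neighbours(EDGES, repeat)
        print(f"{repeat}: ~{copies} copies")
        print("  5' neighbours:", [n for n, _ in ends["5'"]])
        print("  3' neighbours:", [n for n, _ in ends["3'"]])
        for left, right in pcr_pairs(ends):
            print(f"  test: {left} -> {repeat} -> {right}")
```

With three flanking contigs on each side of a roughly three-copy repeat, there are nine primer combinations to try; the products that amplify, plus any sequence variants inside them, show which flanks belong together.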

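Or, you might want to invest in paired-end sequencing to place the repeats: a mate pair whose two reads land in the unique contigs on either side of a repeat links those contigs directly, as long as the repeat is shorter than the insert. Apparently, one run from an 8 kb jumping paired-end library will get you a single scaffold (ordered, oriented contigs with estimates of gap sizes) for a bacterial genome (we are about to test this for one of our strains):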
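The "estimates of gap sizes" in a scaffold come from simple arithmetic on mate pairs. Below is a minimal sketch, assuming the two contigs are already ordered and oriented (A before B), that each linking pair has been reduced to two distances, and that the library's insert size is known; the nominal 8 kb is used here, although real libraries have a spread around that value. The function name `estimate_gap` and the distances are hypothetical. Because the fragment covers the part on A, the gap, and the part on B, each pair gives gap = insert size − (bases on A) − (bases on B), and the median over pairs damps the insert-size variation.

```python
from statistics import median


def estimate_gap(insert_size, spans):
    """Estimate the gap (bp) between two oriented contigs A -> B from mate pairs.

    Each span is (a, b): a = bases of the fragment on contig A, from the
    mapped read's outer end to A's 3' end; b = bases of the fragment on
    contig B, from B's 5' end to the mate's outer end. Since
    insert_size = a + gap + b, each pair gives gap = insert_size - a - b.
    """
    if not spans:
        raise ValueError("need at least one linking mate pair")
    return median(insert_size - a - b for a, b in spans)


if __name__ == "__main__":
    # Made-up distances for three mate pairs from a nominal 8 kb library.
    pairs = [(3000, 2500), (3500, 2100), (2800, 2900)]
    print(estimate_gap(8000, pairs))  # 2400
```

A negative estimate suggests the two contigs overlap (or that the assumed insert size is off), which is worth checking before trusting the scaffold.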

Good luck!
flxlex