SEQanswers: Bioinformatics

sdmoore 07-08-2014 02:31 PM

Possible to force contig builds from a selected region?
Hello SEQers,

We have a series of 10 bacterial genomes sequenced with Illumina (300-base PE reads, before read processing). We want to find the SNPs/InDels responsible for a phenotype in 7 of them.

Using different mappers (BWA-mem, bowtie2, cushaw2) with reads processed to different extents (through trim, trimmomatic, BBduk), I can get maps that provide variant calls (mpileup or FreeBayes) for several established/known mutations (relative to the reference). However, some mappers give maps that show a change in certain genes and others don't (I can detail the differences in the maps if needed). One simple approach to interrogate the promising variant calls is to compare the same loci to those found in de novo assemblies from the same reads (an unbiased sequence).

Unfortunately, the genome assemblers we have used repeatedly spit out contigs from other regions, not the 2 or 3 specific locations we are interested in.

I don't need to build a complete genome, and general gap-filling strategies do not guarantee a contig build covering the regions of interest.

Is it possible to "force" a contig builder (Velvet, BBmap, A5, etc.) to build from seeded region(s) adjacent to a locus of interest? Say, maybe start 500 bases to the left or right, then let the contig grow across the mutant locus.

All we need is some independent way (not reference-influenced) of looking at the consensus contig that covers that particular locus to strengthen our variant lists before we start Sanger sequencing potential hits.

Thanks for any help.

kopi-o 07-09-2014 02:26 AM

I believe that the PRICE assembler is constructed for this use case. You might want to try it. I haven't had time to test it yet, but I plan to apply it to a similar scenario where we want to "flesh out" an existing contig.

sdmoore 07-09-2014 07:52 AM

Thanks kopi-o.
The description seems right, thanks! It won't compile here at work (and I'm not in the mood to deal with yet another install project), so I'll check it out when I get home.


sdmoore 07-14-2014 05:27 AM

the PRICE was right
I wanted to update the thread to say that PRICE did the trick.

At first, I had trouble getting contigs to extend into certain regions (which were ambiguous in the maps as well). After consulting with the author (who was more than helpful), I ended up combining my reads into a single file and using the input option "-spfp [file.fastq]", which basically cuts each read in half and uses the halves as their own read pair during assembly. This strategy avoids stalling caused by read pairs that overlap substantially with their partner (these were 300-base reads from a ~400-base fragment library before processing, so there was likely to be a lot of overlap).
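To make the "cut each read in half" idea concrete, here is a minimal Python sketch of what such a split might look like for one FASTQ record. This is an illustration only, not PRICE's actual implementation: the function names are hypothetical, and reverse-complementing the second half (so the two halves face each other like a conventional paired-end pair) is an assumption on my part.

```python
# Illustrative sketch only; not PRICE's code.
# Assumption: the second half is reverse-complemented so the two halves
# point toward each other, like a normal inward-facing read pair.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_read(seq: str, qual: str):
    """Split one read (sequence plus quality string) into a pseudo read pair.

    Returns ((left_seq, left_qual), (right_seq, right_qual)).
    The right half is reverse-complemented and its qualities are reversed
    so they stay aligned with the bases.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    mid = len(seq) // 2
    left = (seq[:mid], qual[:mid])
    right = (reverse_complement(seq[mid:]), qual[mid:][::-1])
    return left, right


if __name__ == "__main__":
    left, right = split_read("AAAACCGT", "ABCDEFGH")
    print(left)
    print(right)
```

Because both halves come from the same read, they cannot overlap each other, which is the property that matters for avoiding the stalling described above.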

Once optimized (adjusting kmer, cut size, percent match, etc.), I could feed in a ~300 nt seed sequence from a flanking region and let the contig grow across the region of interest. It resolved several ambiguous sites and also identified an IS insertion in one of the genes of interest (which was missed by BWA-mem, but was "noisy" in Bowtie2).
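For anyone wanting to prepare seeds the same way, a rough Python sketch of cutting a ~300 nt seed from a reference a fixed distance to the left or right of a locus follows. The function names, the 500-base default offset, and the 0-based coordinate convention are all assumptions for illustration. The reference is treated as linear, so a window that would wrap around a circular chromosome raises an error instead of wrapping.

```python
# Illustrative sketch: cut a seed sequence flanking a locus of interest.
# Assumptions: 0-based locus coordinate, linear reference (no wrap-around).


def extract_seed(reference: str, locus: int, offset: int = 500,
                 length: int = 300, side: str = "left") -> str:
    """Return `length` bases whose near edge lies `offset` bases from `locus`."""
    if side == "left":
        end = locus - offset
        start = end - length
    elif side == "right":
        start = locus + offset
        end = start + length
    else:
        raise ValueError("side must be 'left' or 'right'")
    if start < 0 or end > len(reference):
        raise ValueError("seed window falls outside the reference")
    return reference[start:end]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    toy_reference = "ACGT" * 500  # 2000-base toy sequence, not real data
    seed = extract_seed(toy_reference, locus=1000, side="left")
    print(to_fasta("seed_left_of_locus_1000", seed), end="")
```

The resulting FASTA record can then be supplied to the assembler as the initial "contig" to extend.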

I also easily assembled a plasmid that was in one of the strains using seed "contigs" from known regions. We had never fully sequenced it, so this was bonus data.

Thanks again for the pointer.
