SEQanswers

07-08-2014, 02:31 PM   #1
sdmoore
Member
 
Location: Florida

Join Date: Jun 2014
Posts: 11
Possible to force contig builds from a selected region?

Hello SEQers,

We have a series of 10 bacterial genomes sequenced with Illumina (300-base PE reads, before read processing). We want to find the SNPs/InDels responsible for a phenotype in 7 of them.

Using different mappers (BWA-mem, bowtie2, cushaw2) with reads processed to different extents (through trim, trimmomatic, BBduk), I can get maps that provide variant calls (mpileup or FreeBayes) for several established/known mutations (relative to the reference). However, some mappers give maps that show a change in certain genes and others don't (I can detail the differences in the maps if needed). One simple approach to interrogate the promising variant calls is to compare the same loci to those found in de novo assemblies from the same reads (an unbiased sequence).
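To make the disagreement between mappers concrete, one option is to tabulate which mapper's alignments support each variant call. The following is only a minimal sketch: the function names, the tuple layout (contig, position, ref, alt), and the example calls are all hypothetical, and parsing real VCF output is left out.

```python
from collections import defaultdict


def variant_support(calls_by_mapper):
    """Map each variant to the sorted list of mappers whose calls include it."""
    support = defaultdict(set)
    for mapper, variants in calls_by_mapper.items():
        for variant in variants:
            support[variant].add(mapper)
    return {variant: sorted(mappers) for variant, mappers in support.items()}


def discordant(calls_by_mapper):
    """Return only the variants that are missing from at least one mapper."""
    n_mappers = len(calls_by_mapper)
    return {
        variant: mappers
        for variant, mappers in variant_support(calls_by_mapper).items()
        if len(mappers) < n_mappers
    }


# Hypothetical calls, keyed by mapper: (contig, 1-based position, ref, alt)
example_calls = {
    "bwa-mem": {("chr", 1200, "A", "G"), ("chr", 5400, "T", "TA")},
    "bowtie2": {("chr", 1200, "A", "G")},
    "cushaw2": {("chr", 1200, "A", "G"), ("chr", 5400, "T", "TA")},
}
```

Calling `discordant(example_calls)` keeps only the loci where the mappers disagree, which are the ones worth checking against an independent assembly.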

Unfortunately, the genome assemblers we have used repeatedly spit out contigs from other regions, not the 2 or 3 specific locations we are interested in.

I don't need to build a complete genome, and general gap-filling strategies do not guarantee a contig build covering the regions of interest.

Is it possible to "force" a contig builder (Velvet, BBmap, A5, etc.) to build from seeded region(s) adjacent to a locus of interest? Say, maybe start 500 bases to the left or right, then let the contig grow across the mutant locus.
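As an illustration of the seeding idea, the sketch below cuts a seed sequence out of the reference on each side of a locus, starting 500 bases away so that the seed itself does not contain the site in question. The function name, the 0-based coordinates, and the default seed length of 300 are assumptions made for the example, not part of any assembler's interface.

```python
def flanking_seeds(reference, locus_start, locus_end, offset=500, seed_len=300):
    """Return (left_seed, right_seed) taken from the reference around a locus.

    reference   : reference sequence as a plain string
    locus_start : 0-based index of the first base of the locus
    locus_end   : 0-based index one past the last base of the locus
    offset      : how far from the locus the outer edge of each seed lies
    seed_len    : seed length; must not exceed offset, so that the seeds
                  never overlap the locus itself
    """
    if seed_len > offset:
        raise ValueError("seed_len must be <= offset to keep seeds off the locus")
    # Left seed begins `offset` bases upstream of the locus (clamped to 0).
    left_start = max(0, locus_start - offset)
    left_seed = reference[left_start:left_start + seed_len]
    # Right seed ends `offset` bases downstream of the locus (clamped to the end).
    right_end = min(len(reference), locus_end + offset)
    right_seed = reference[max(0, right_end - seed_len):right_end]
    return left_seed, right_seed
```

An assembler that accepts seed sequences would then be asked to grow the left seed rightward (or the right seed leftward) across the locus, so the sequence at the locus comes from the reads rather than from the reference.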

All we need is some independent way (not reference-influenced) of looking at the consensus contig that covers that particular locus to strengthen our variant lists before we start Sanger sequencing potential hits.

Thanks for any help.
07-09-2014, 02:26 AM   #2
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

I believe that the PRICE assembler (http://derisilab.ucsf.edu/software/price/) is constructed for this use case. You might want to try it. I haven't had time to test it yet, but plan to apply it to a similar scenario where we want to "flesh out" an existing contig.
07-09-2014, 07:52 AM   #3
sdmoore
Member
 
Location: Florida

Join Date: Jun 2014
Posts: 11

Thanks kopi-o.
The description seems right. It won't compile here at work (and I'm not in the mood to deal with yet another install project), so I'll check it out when I get home.

S
07-14-2014, 05:27 AM   #4
sdmoore
Member
 
Location: Florida

Join Date: Jun 2014
Posts: 11
The PRICE was right

I wanted to update the thread to say that PRICE did the trick.

At first, I had trouble getting contigs to extend into certain regions (which were ambiguous in the maps as well). After consulting with the author (who was more than helpful), I ended up combining my reads into a single file and using the input option "-spfp [file.fastq]", which basically cuts each read in half and uses the halves as their own read pair during assembly. This strategy avoids stalling caused by read pairs that contain substantial overlap with the partner (these were 300-base reads from a ~400-base fragment library before processing, so there was likely to be a lot of overlap).
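For scale, two 300-base mates from a ~400-base fragment overlap by roughly 2 × 300 − 400 = 200 bases. The read-halving idea can be sketched in a few lines. This is not PRICE's code: the function names are hypothetical, and reverse-complementing the second half (so the two halves face each other like a conventional paired-end read) is an assumption about how such pseudo-pairs would be oriented.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_read(seq, qual):
    """Cut one read in half and return the halves as a pseudo read pair.

    The first half is kept as-is. The second half is reverse-complemented
    (assumed orientation), and its quality string is reversed to match.
    Each half is returned as a (sequence, quality) tuple.
    """
    mid = len(seq) // 2
    first = (seq[:mid], qual[:mid])
    second = (reverse_complement(seq[mid:]), qual[mid:][::-1])
    return first, second
```

Because the two halves come from the same read, they cannot overlap each other, which is the property the strategy above relies on.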

Once optimized (adjusting k-mer size, cut size, percent match, etc.), I could feed in a ~300 nt seed sequence from a nearby region and let it grow across the region of interest. It resolved several ambiguous sites and also identified an IS insertion in one of the genes of interest (which was missed by BWA-mem, but "noisy" in Bowtie2).
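The general idea of growing a contig outward from a seed can be shown with a toy greedy extension. This is a deliberately simplified sketch and not how PRICE works internally: it uses exact matching on one strand, ignores pairing, sequencing errors, and repeats, and the function name and parameters are hypothetical.

```python
def extend_seed(seed, reads, min_overlap=20, max_rounds=1000):
    """Greedily extend `seed` to the right using exact suffix/prefix overlaps.

    In each round, every read is checked for its longest prefix (at least
    min_overlap bases) that matches the current end of the contig. The read
    that adds the most new bases is appended. Extension stops when no read
    overlaps the contig end or after max_rounds rounds.
    """
    contig = seed
    for _ in range(max_rounds):
        best_extension = ""
        for read in reads:
            # Leave at least one base of the read to add as new sequence.
            longest = min(len(contig), len(read) - 1)
            for overlap in range(longest, min_overlap - 1, -1):
                if contig.endswith(read[:overlap]):
                    extension = read[overlap:]
                    if len(extension) > len(best_extension):
                        best_extension = extension
                    break  # longest overlap for this read found
        if not best_extension:
            break
        contig += best_extension
    return contig
```

With error-free reads tiled across a short sequence, the seed grows until the reads run out. Real data needs mismatch tolerance and both strands, which is where tunable settings such as percent match come in.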

I also easily assembled a plasmid that was in one of the strains using seed "contigs" from known regions. We had never fully sequenced it and this was bonus data.

Thanks again for the pointer.
All times are GMT -8.