  • Possible to force contig builds from a selected region?

    Hello SEQers,

    We have a series of 10 bacterial genomes sequenced with Illumina (300-base PE reads, before read processing). We want to find the SNPs/InDels responsible for a phenotype in 7 of them.

    Using different mappers (BWA-mem, bowtie2, cushaw2) with reads processed to different extents (through trim, trimmomatic, BBduk), I can get maps that provide variant calls (mpileup or FreeBayes) for several established/known mutations (relative to the reference). However, some mappers give maps that show a change in certain genes and others don't (I can detail the differences in the maps if needed). One simple approach to interrogate the promising variant calls is to compare the same loci to those found in de novo assemblies from the same reads (a sequence not biased by the reference).

    Unfortunately, the genome assemblers we have used repeatedly spit out contigs from other regions, not the 2 or 3 specific locations we are interested in.

    I don't need to build a complete genome, and general gap-filling strategies do not guarantee a contig build covering the regions of interest.

    Is it possible to "force" a contig builder (Velvet, BBmap, A5, etc.) to build from seeded region(s) adjacent to a locus of interest? Say, maybe start 500 bases to the left or right, then let the contig grow across the mutant locus.
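
For anyone wanting to try the seeding idea above, here is a minimal sketch of cutting a seed sequence from the reference that sits next to, but does not cover, the locus of interest. The function names (`seed_window`, `extract_seed`, `to_fasta`), the 0-based coordinates, and the default 300-base seed length are all assumptions made for illustration; only the 500-base offset comes from the question. It treats the reference as a linear string, so a locus near the origin of a circular chromosome would need wrap-around handling that is not shown.

```python
def seed_window(locus, ref_len, offset=500, seed_len=300, side="left"):
    """Return (start, end) of a seed window, 0-based and half-open.

    The far edge of the seed lies `offset` bases from the locus, and the
    seed never includes the locus itself, so the assembler has to extend
    across the variant site using reads alone.
    """
    if seed_len > offset:
        raise ValueError("seed_len must not exceed offset, or the seed would cover the locus")
    if side == "left":
        start = locus - offset
    elif side == "right":
        start = locus + 1 + offset - seed_len
    else:
        raise ValueError("side must be 'left' or 'right'")
    end = start + seed_len
    if start < 0 or end > ref_len:
        raise ValueError("seed window falls outside the reference")
    return start, end


def extract_seed(reference, locus, offset=500, seed_len=300, side="left"):
    """Slice the seed sequence out of a reference given as a plain string."""
    start, end = seed_window(locus, len(reference), offset, seed_len, side)
    return reference[start:end]


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy reference; a real run would read the chromosome from a FASTA file.
    reference = "ACGT" * 500
    seed = extract_seed(reference, locus=1000, side="left")
    print(to_fasta("seed_left_of_1000", seed), end="")
```

The seed is still reference-derived, so only the sequence the assembler grows beyond it can be treated as independent evidence for the variant.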

    All we need is some independent way (not reference-influenced) of looking at the consensus contig that covers that particular locus to strengthen our variant lists before we start Sanger sequencing potential hits.

    Thanks for any help.

  • #2
    I believe that the PRICE assembler (http://derisilab.ucsf.edu/software/price/) is constructed for this use case. You might want to try it. I haven't had time to test it yet, but plan to apply it to a similar scenario where we want to "flesh out" an existing contig.

    • #3
      Thanks kopi-o.
      The description seems right, thanks! It won't compile here at work (and I'm not in the mood to deal with yet another install project), so I'll check it out when I get home.

      S

      • #4
        the PRICE was right

        I wanted to update the thread to say that PRICE did the trick.

        At first, I had trouble getting contigs to extend into certain regions (which were ambiguous in the maps as well). After consulting with the author (who was more than helpful), I ended up combining my reads into a single file and using the input option "-spfp [file.fastq]", which basically cuts each read in half and uses the two halves as their own read pair during assembly. This strategy avoids stalling caused by read pairs in which one read overlaps substantially with its partner (these were 300-base reads from a ~400-base fragment library before processing, so there was likely to be a lot of overlap).
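
To make the overlap problem and the read-halving idea concrete, here is a small sketch. It only illustrates the concept described above and is not PRICE's own implementation. The function names are hypothetical, and reverse-complementing the second half (so the halves face each other like an ordinary pair) is an assumption about orientation, which is why it can be switched off.

```python
# Translation table for complementing DNA, upper and lower case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def expected_overlap(read_len, fragment_len):
    """Bases shared by the two mates of a pair, ignoring adapters and trimming."""
    overlap = 2 * read_len - fragment_len
    # Overlap cannot be negative, and cannot exceed the fragment itself.
    return max(0, min(fragment_len, overlap))


def split_read(seq, qual, revcomp_second=True):
    """Cut one read in half and return the halves as a pseudo read pair.

    Each half is returned as (sequence, quality). For odd-length reads the
    extra base goes to the second half.
    ASSUMPTION: the second half is reverse-complemented so the pair is
    inward-facing; set revcomp_second=False to keep it as read.
    """
    half = len(seq) // 2
    first = (seq[:half], qual[:half])
    second_seq, second_qual = seq[half:], qual[half:]
    if revcomp_second:
        second_seq = second_seq.translate(COMPLEMENT)[::-1]
        second_qual = second_qual[::-1]
    return first, (second_seq, second_qual)


if __name__ == "__main__":
    # 300-base reads from a ~400-base fragment, as in this thread.
    print(expected_overlap(300, 400))
    print(split_read("AAAACCGT", "IIIIABCD"))
```

With the numbers from this thread, the two 300-base mates of a ~400-base fragment would share roughly 200 bases, whereas the two halves of a single read share none.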

        Once the run was optimized (adjusting k-mer size, cut size, percent match, etc.), I could feed in a ~300 nt sequence from a region and let it grow across a region of interest. It resolved several ambiguous sites and also identified an IS insertion in one of the genes of interest (which was missed by BWA-mem, but "noisy" in Bowtie2).

        I also easily assembled a plasmid that was in one of the strains using seed "contigs" from known regions. We had never fully sequenced it and this was bonus data.

        Thanks again for the pointer.
