  • Possible to force contig builds from a selected region?

    Hello SEQers,

    We have a series of 10 bacterial genomes sequenced with Illumina (300-base PE reads, before read processing). We want to find the SNPs/InDels responsible for a phenotype seen in 7 of them.

    Using different mappers (BWA-mem, bowtie2, cushaw2) with reads processed to different extents (through trim, trimmomatic, BBduk), I can get maps that support variant calls (mpileup or FreeBayes) for several established/known mutations relative to the reference. However, some mappers give maps that show a change in certain genes and others don't (I can detail the differences in the maps if needed). One simple way to interrogate the promising variant calls is to compare the same loci with those found in de novo assemblies built from the same reads, since those sequences are not biased by the reference.
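To make the mapper-by-mapper comparison concrete, here is a minimal Python sketch (not something posted in this thread) that collects (chrom, pos, ref, alt) tuples from each pipeline's VCF and lists the calls that only some pipelines report. It assumes plain, uncompressed VCF text and ignores genotype and quality fields; the pipeline names and records in the demo are made-up toy inputs. Different callers often write the same indel differently, so it also assumes the calls were normalized beforehand (for example with `bcftools norm`).

```python
from collections import defaultdict


def load_calls(vcf_lines):
    """Collect (chrom, pos, ref, alt) tuples from VCF-formatted text lines."""
    calls = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # one tuple per ALT allele
            calls.add((chrom, pos, ref, alt))
    return calls


def support_by_call(calls_by_pipeline):
    """Map each call to the set of pipeline names that reported it."""
    support = defaultdict(set)
    for pipeline, calls in calls_by_pipeline.items():
        for call in calls:
            support[call].add(pipeline)
    return dict(support)


def discordant_calls(calls_by_pipeline):
    """Return calls reported by some pipelines but not by all of them."""
    n = len(calls_by_pipeline)
    return {call: names
            for call, names in support_by_call(calls_by_pipeline).items()
            if len(names) < n}


if __name__ == "__main__":
    # Toy records (CHROM, POS, ID, REF, ALT); real VCFs have more columns.
    bwa = ["#CHROM\tPOS\tID\tREF\tALT", "chr\t100\t.\tA\tG", "chr\t250\t.\tT\tTA"]
    bt2 = ["#CHROM\tPOS\tID\tREF\tALT", "chr\t100\t.\tA\tG"]
    calls = {"bwa-mem": load_calls(bwa), "bowtie2": load_calls(bt2)}
    print(discordant_calls(calls))  # {('chr', 250, 'T', 'TA'): {'bwa-mem'}}
```

The discordant calls are exactly the "some maps show it, others don't" loci that are worth checking against an independent assembly.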

    Unfortunately, the genome assemblers we have used repeatedly spit out contigs from other regions, not the 2 or 3 specific locations we are interested in.

    I don't need to build a complete genome and general gap-filling strategies do not guarantee a contig build covering the regions of interest.

    Is it possible to "force" a contig builder (Velvet, BBmap, A5, etc.) to build from seeded region(s) adjacent to a locus of interest? Say, maybe start 500 bases to the left or right, then let the contig grow across the mutant locus.
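The idea of growing a contig outward from a seed can be illustrated with a toy greedy k-mer extension. This is a hypothetical sketch, not how Velvet, PRICE or any other assembler actually works: it counts k-mers from the reads on both strands, then repeatedly appends the base whose k-mer is best supported, stopping at a dead end, at a tie (a possible SNP, repeat or sequencing error), or at a loop. Real assemblers also deal with sequencing errors, uneven coverage and pair information.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(reads, k):
    """Count every k-mer in the reads and in their reverse complements."""
    counts = Counter()
    for read in reads:
        for seq in (read, revcomp(read)):
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
    return counts


def extend_right(seed, counts, k, min_count=2, max_len=10_000):
    """Greedily append one base at a time to the right end of the seed.

    The last k-1 bases are the overlap; the next base is the one whose
    k-mer has the highest count. Extension stops at a dead end (best count
    below min_count), at a tie between two bases (possible SNP, repeat or
    error), on revisiting a k-mer it already added (loop), or at max_len.
    """
    if k < 2 or len(seed) < k - 1:
        raise ValueError("need k >= 2 and a seed of at least k-1 bases")
    contig, seen = seed, set()
    while len(contig) < max_len:
        suffix = contig[-(k - 1):]
        options = sorted(((counts[suffix + b], b) for b in "ACGT"), reverse=True)
        (best, base), (second, _) = options[0], options[1]
        if best < min_count or best == second:
            break
        kmer = suffix + base
        if kmer in seen:
            break
        seen.add(kmer)
        contig += base
    return contig


def extend_both(seed, counts, k, **kwargs):
    """Extend right, then extend left by extending the reverse complement."""
    right = extend_right(seed, counts, k, **kwargs)
    return revcomp(extend_right(revcomp(right), counts, k, **kwargs))


if __name__ == "__main__":
    genome = "AAAACAACCACACCCCAAA"  # toy sequence whose 4-mers are all unique
    reads = [genome[i:i + 10] for i in range(10)] * 2  # overlapping toy reads
    print(extend_both("CCACAC", kmer_counts(reads, 5), 5) == genome)  # True
```

Because the k-mer table includes both strands, extending the reverse complement to the right is the same as extending the original seed to the left, so one function covers both directions. The `min_count` threshold is only a crude stand-in for the error filtering real tools do.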

    All we need is some independent way (not reference-influenced) of looking at the consensus contig that covers that particular locus to strengthen our variant lists before we start Sanger sequencing potential hits.
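One reference-independent check is to ask which allele, together with some flanking sequence on each side, actually occurs in a de novo contig. The sketch below assumes the variant comes from a VCF-style record (1-based position, REF, ALT), that the reference flanks match the strain, and that exact string matching is good enough; the function name and the default flank of 20 bases are illustrative choices, not part of any tool.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def allele_in_contig(reference, pos, ref, alt, contig, flank=20):
    """Report which flanked allele occurs in a contig, on either strand.

    reference: reference sequence (one chromosome or plasmid) as a string
    pos: 1-based position of the variant, as in a VCF record
    ref/alt: REF and ALT alleles; ref must match the reference at pos
    Returns "ref", "alt", "both" or "neither".
    """
    start = pos - 1
    if reference[start:start + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference at pos")
    left = reference[max(0, start - flank):start]
    right = reference[start + len(ref):start + len(ref) + flank]
    found = {}
    for label, allele in (("ref", ref), ("alt", alt)):
        haplotype = left + allele + right
        found[label] = haplotype in contig or revcomp(haplotype) in contig
    if found["ref"] and found["alt"]:
        return "both"
    if found["ref"]:
        return "ref"
    if found["alt"]:
        return "alt"
    return "neither"


if __name__ == "__main__":
    reference = "TTTTTAGGGGG"   # toy reference; A at position 6
    contig = "ACTTTTTCGGGGGCA"  # toy contig carrying C at that site
    print(allele_in_contig(reference, 6, "A", "C", contig, flank=5))  # alt
```

A result of "neither" is not evidence against the variant: a nearby difference or a sequencing error in the flank is enough to break an exact match, so those loci still deserve a proper alignment.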

    Thanks for any help.

  • #2
    I believe the PRICE assembler (http://derisilab.ucsf.edu/software/price/) was designed for this use case. You might want to try it. I haven't had time to test it yet, but I plan to apply it to a similar scenario where we want to "flesh out" an existing contig.



    • #3
      Thanks kopi-o.
      The description looks right, thanks! It won't compile here at work (and I'm not in the mood to deal with yet another install project), so I'll check it out when I get home.

      S



      • #4
        the PRICE was right

        I wanted to update the thread to say that PRICE did the trick.

        At first, I had trouble getting contigs to extend into certain regions (which were ambiguous in the maps as well). After consulting with the author (who was more than helpful), I ended up combining my reads into a single file and using the input option "-spfp [file.fastq]", which basically cuts each read in half and uses the two halves as a read pair during assembly. This strategy avoids the stalling caused by read pairs that overlap substantially with their partner (these were 300-base reads from a ~400-base fragment library before processing, so there was likely to be a lot of overlap).
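The read-splitting idea described above can be sketched in Python. This only illustrates turning each read into a pseudo-pair; it is not PRICE's code. The thread does not say whether PRICE reverse-complements the second half, so `revcomp_second` is a hypothetical switch, and the parser assumes unwrapped 4-line FASTQ records. A 300-base read becomes two 150-base mates.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fastq(lines):
    """Yield (name, seq, qual) from unwrapped 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:].split()[0], seq, qual


def split_into_pseudo_pairs(records, revcomp_second=False):
    """Cut each read in half and emit the halves as mate 1 and mate 2.

    revcomp_second is a hypothetical option: True reverse-complements the
    second half (and reverses its qualities) to mimic an FR-oriented pair.
    """
    for name, seq, qual in records:
        mid = len(seq) // 2
        seq2, qual2 = seq[mid:], qual[mid:]
        if revcomp_second:
            seq2, qual2 = revcomp(seq2), qual2[::-1]
        yield (f"{name}/1", seq[:mid], qual[:mid]), (f"{name}/2", seq2, qual2)


def to_fastq(record):
    """Format one (name, seq, qual) record as FASTQ text."""
    name, seq, qual = record
    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    fastq = ["@r1 demo\n", "ACGTAC\n", "+\n", "IIIHHH\n"]  # toy record
    for mate1, mate2 in split_into_pseudo_pairs(read_fastq(fastq)):
        print(to_fastq(mate1) + to_fastq(mate2), end="")
```

Since both halves come from the same read, they can never overlap each other, which is the point of the trick when the original mates overlapped heavily.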

        Once it was optimized (adjusting k-mer size, cut size, percent match, etc.), I could feed in a ~300 nt seed sequence from a region and let it grow over a region of interest. It resolved several ambiguous sites and also identified an IS insertion in one of the genes of interest (which BWA-mem missed and which looked "noisy" in the Bowtie2 maps).
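A seed like the ~300 nt sequence mentioned above can be cut from the reference. This hypothetical sketch takes `seed_len` bases ending `gap` bases upstream of a 1-based locus (the 300 and 500 defaults simply echo the numbers in this thread) and formats the result as FASTA. It assumes a linear sequence with no wrap-around for circular replicons, and that the seed region itself is not in doubt.

```python
def read_fasta(lines):
    """Return {record_id: sequence} from FASTA text lines (wrapped or not)."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def upstream_seed(seqs, record_id, locus, seed_len=300, gap=500):
    """Return seed_len bases, leaving `gap` bases between seed and locus.

    locus is 1-based. Assumes a linear sequence: no wrap-around for
    circular chromosomes or plasmids.
    """
    seq = seqs[record_id]
    end = locus - 1 - gap  # 0-based, exclusive end of the seed
    start = end - seed_len
    if start < 0 or end > len(seq):
        raise ValueError("seed window falls outside the sequence")
    return seq[start:end]


def to_fasta(name, seq, width=60):
    """Format a sequence as FASTA with fixed-width lines."""
    rows = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{name}"] + rows) + "\n"


if __name__ == "__main__":
    toy = {"chr": "A" * 200 + "C" * 300 + "G" * 500 + "T"}  # toy locus at 1001
    print(to_fasta("seed_chr_1001", upstream_seed(toy, "chr", 1001)), end="")
```

Seeding from a stretch that maps cleanly, rather than from the locus itself, keeps the questionable site out of the seed so the assembler has to rebuild it from the reads.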

        I also easily assembled a plasmid that was in one of the strains, using seed "contigs" from known regions. We had never fully sequenced it, so this was bonus data.

        Thanks again for the pointer.
