  • Determine similarity from NGS data

    Hi,

    I have 250bp paired-end sequencing data (Illumina MiSeq, k-mer coverage ~18) of four E. coli strains. What I want to determine is how similar the strains are.

    I imagine this could be done on the basis of the raw data alone, thus without trying to assemble the individual genomes (at least for a first rough approximation of similarity). Can anyone suggest to me what would be the best approach for this?

    Another option would be to take the largest scaffold currently available for one strain, map the reads of each strain onto it, and compare. The data are all from the same sequencing run, and the strains cannot be told apart by eye on the basis of the FASTQ quality metrics. I think it is reasonable to assume that technical errors are equally distributed across the strains. Thus, after trimming and quality filtering with the same settings, dissimilarities can be assessed. To determine the number of SNPs I would need to take the sequencing error rate into account (0.80%, http://bmcgenomics.biomedcentral.com...71-2164-13-341). However, since many sequencing errors are discarded during assembly, I don't know how to disentangle the true SNPs from sequencing errors. Any suggestions on how to tackle this issue are appreciated.

  • #2
    You may be able to do this by finding strain-specific k-mers: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4005670/
    The BBMap suite has k-mer identification programs you can use, in addition to the programs in the paper above.
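
To make the strain-specific k-mer idea concrete, here is a minimal Python sketch. It is not what BBMap or the linked paper implement; it is a toy, in-memory illustration. The function names, the `min_count` threshold (used here as a crude filter against sequencing-error k-mers) and the Jaccard summary are all assumptions made for the example. Real 250bp MiSeq data would need a dedicated k-mer counter rather than Python sets.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]

def canonical_kmers(reads, k, min_count=2):
    """Return the set of canonical k-mers seen at least min_count times.

    min_count is a hypothetical threshold: k-mers seen only once are
    more likely to come from sequencing errors than from the genome.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N etc.
                # Count a k-mer and its reverse complement as one.
                counts[min(kmer, revcomp(kmer))] += 1
    return {kmer for kmer, c in counts.items() if c >= min_count}

def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def strain_specific(kmer_sets):
    """Map each strain name to the k-mers found in no other strain."""
    result = {}
    for name, kmers in kmer_sets.items():
        others = set().union(*(s for n, s in kmer_sets.items() if n != name))
        result[name] = kmers - others
    return result
```

With one k-mer set per strain, `jaccard` gives a rough pairwise similarity and `strain_specific` lists the k-mers that separate each strain from the rest.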



    • #3
      You can substantially reduce sequencing error rate by error-correcting the data, which can be done for example using Tadpole. 18x is pretty low for good error-correction or assembly, though. If the reads mostly overlap, you can also achieve some degree of error-correction by merging them using e.g. BBMerge.

      I would probably assemble each strain (using adapter-trimmed, error-corrected, merged [if they mostly overlap] reads), and then do all 16 mappings of reads to assemblies (4 read sets x 4 assemblies) to estimate SNP rates from pairwise error rates. For example, if strain 1 has a 0.1% substitution rate when mapped to its own assembly and strain 2 has a 0.7% substitution rate when mapped to strain 1's assembly, then there is probably about a 0.6% SNP rate between strain 1 and strain 2.
      Last edited by Brian Bushnell; 03-15-2016, 09:25 PM.
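
The subtraction in the example above can be written as a short Python sketch. The function name and the nested-dictionary layout are assumptions made for illustration, and only the 0.1% and 0.7% figures come from the post; the other two rates are invented placeholders. The substitution rates themselves would come from whatever mapper you use.

```python
def estimate_snp_rates(sub_rates):
    """Estimate pairwise SNP rates from mapping substitution rates.

    sub_rates[reads][assembly] is the substitution rate observed when
    the reads of one strain are mapped to the assembly of another.
    The self-mapping rate of the assembly's own strain is treated as
    the technical baseline and subtracted, as in the post's example.
    """
    estimates = {}
    for reads, row in sub_rates.items():
        for assembly, rate in row.items():
            if reads == assembly:
                continue  # self-mappings only provide the baseline
            baseline = sub_rates[assembly][assembly]
            # Clamp at zero: noise can push the difference negative.
            estimates[(reads, assembly)] = max(0.0, rate - baseline)
    return estimates

# 0.001 and 0.007 are the figures from the post; the strain2 self rate
# and the strain1-to-strain2 rate are made-up placeholders.
rates = {
    "strain1": {"strain1": 0.001, "strain2": 0.0068},
    "strain2": {"strain1": 0.007, "strain2": 0.0012},
}
snp = estimate_snp_rates(rates)
print(f"{snp[('strain2', 'strain1')]:.4f}")
```

For four strains the same dictionary simply holds all 16 entries, and the two estimates for each pair (A on B, B on A) can be compared as a sanity check.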
