  • Best way to detect SNV / InDels against reference genome?

    I've done a bit of homework, but I'm still a little confused about the best way to go, so I'm hoping someone can point me in the right direction.

    I recently sequenced 11 mutant strains and 1 reference strain of Bacillus (~4 Mb) by NextSeq Mid 2 x 75. I'm ultimately looking to compare the mutants to the reference to detect SNV/MNV as well as InDels.

    So far, my understanding is to start with a de novo assembly using something like SPAdes. Once I have the contigs, I can use OSLay to map these back to a reference strain, either the one I sequenced or a previously available (master) one, of which there are several. I still have a few questions though:

    1) What application should I be looking at for detecting the mutations? I imagine I'd essentially need a large alignment tool, though one that can take into account coverage and probability of mutation would be a great feature to have (assuming not all reads give the same SNV, etc).

    2) When detecting mutations, is it better to do so against the reference strain that I sequenced or against a downloaded one and just compare my reference to it as well, ignoring any commonalities?

    3) Are there better programs to use than the two I listed above? Are there any that are specifically built for my purpose, and that aren't CLC Genomics that would cost me $5k (grad student budget here)?

    Much appreciated everyone!

  • #2
    For #2, it depends on how distant the reference sequence is from your strains. Mapping is usually a better choice than de novo assembly, and if it is appropriate I would do that first. BBMap, Bowtie2, or BWA followed by SAMtools or GATK is a good way to call SNVs. BEDTools can produce coverage maps, which will indicate longer deletions. Assembling the unmapped reads can indicate longer insertions.
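As a rough illustration of the map-then-call route, here is a minimal sketch that only builds the shell commands for one sample and does not run them. It assumes BWA for mapping, and it assumes `bcftools` as the caller on the SAMtools side (the post names only "Samtools"). The file names, the `mapping_commands` helper, and the haploid `--ploidy 1` setting are illustrative assumptions, so check each flag against the version you have installed.

```python
import shlex


def mapping_commands(ref, r1, r2, sample, threads=4):
    """Return shell command strings that map one paired-end sample and call variants.

    Nothing is executed here; the strings are meant to be inspected or
    pasted into a shell. All file names are hypothetical.
    """
    q = shlex.quote
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    return [
        # Build the BWA index for the reference (needed once per reference).
        f"bwa index {q(ref)}",
        # Map the read pairs and sort the alignments by coordinate.
        f"bwa mem -t {threads} {q(ref)} {q(r1)} {q(r2)} | samtools sort -o {q(bam)} -",
        # Index the sorted BAM so viewers and callers can access it randomly.
        f"samtools index {q(bam)}",
        # Pile up the reads and call variant sites only (assumed haploid genome).
        f"bcftools mpileup -f {q(ref)} {q(bam)} | bcftools call -mv --ploidy 1 -Oz -o {q(vcf)}",
    ]
```

Calling `mapping_commands("parent.fasta", "mut01_R1.fastq.gz", "mut01_R2.fastq.gz", "mut01")` returns four command strings, one per step, which can be repeated for each of the 11 mutants and for the parent.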

    There are lots of programs out there, and the people on the forum may suggest other, and better, ones. But in general you should be able to do your analysis on a "grad student's budget", i.e., not much cash but lots of time.
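The coverage-map idea for spotting longer deletions can be sketched as follows. This is a minimal example, assuming a per-base depth table with three tab-separated columns (sequence name, 1-based position, depth) that lists every position, which is the layout produced by `bedtools genomecov -d` or `samtools depth -a`. The function name and the `min_length` threshold are hypothetical choices, not recommendations.

```python
def low_coverage_regions(depth_lines, max_depth=0, min_length=50):
    """Return (chrom, start, end) runs where depth <= max_depth.

    depth_lines: iterable of 'chrom<TAB>pos<TAB>depth' lines, 1-based positions,
    with every position present. Runs shorter than min_length are dropped.
    """
    regions = []
    run_chrom = run_start = run_end = None

    def close_run():
        # Store the current run if it is long enough to be interesting.
        if run_chrom is not None and run_end - run_start + 1 >= min_length:
            regions.append((run_chrom, run_start, run_end))

    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, depth = line.split("\t")[:3]
        pos, depth = int(pos), int(depth)
        low = depth <= max_depth
        if low and run_chrom == chrom and pos == run_end + 1:
            run_end = pos  # extend the current run
            continue
        close_run()
        if low:
            run_chrom, run_start, run_end = chrom, pos, pos  # start a new run
        else:
            run_chrom = run_start = run_end = None
    close_run()
    return regions
```

A region reported for a mutant but not for the parent is a deletion candidate. A region that also has no coverage in the parent more likely reflects a difference from the reference or a hard-to-map sequence.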


    • #3
      Originally posted by camhabib
      I recently sequenced 11 mutant strains and 1 reference strain of Bacillus (~4 Mb) by NextSeq Mid 2 x 75. I'm ultimately looking to compare the mutants to the reference to detect SNV/MNV as well as InDels.
      I assume that the 11 mutants are offspring of the "reference strain" that were generated by some mutagenic treatment. Now they show different phenotypes, and you want to find the genetic basis for that.

      You should first assemble the genome of the parent. Unfortunately, 75-nt reads are suboptimal for this. Nevertheless, I would recommend assembling them with SPAdes. That will take about 5 to 10 minutes on a desktop PC, so just try it out. If you are lucky, the contigs will cover about 90 percent of the whole genome.
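A quick sanity check on the assembly can be sketched like this. It assumes the contigs are in a FASTA file (SPAdes writes `contigs.fasta` into its output directory) and that the expected genome size is roughly known. Total contig length divided by expected size is only a crude proxy for how much of the genome the contigs cover, and the helper names are hypothetical.

```python
def contig_lengths(fasta_lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # a new record starts
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_summary(lengths, expected_genome_size):
    """Summarize contig count, total length, N50, and fraction of expected size."""
    total = sum(lengths)
    running = 0
    n50 = 0
    # N50: length of the contig at which the sorted cumulative sum reaches half the total.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total / 2:
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "n50": n50,
        "fraction_of_expected": total / expected_genome_size,
    }
```

For example, `assembly_summary(contig_lengths(open("contigs.fasta")), 4_000_000)` would report how close the total assembled length comes to a 4 Mb genome; the path here is a placeholder.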

      Then you can map the reads of the mutants to the contigs of the parent. Inspect the mapping in a viewer like Tablet. It's always amazing to see how clearly SNPs differ from random sequencing errors.

      To identify SNPs programmatically, you have to compute VCF files from your read mappings. A VCF file is a kind of human-readable ASCII table that lists all the SNPs. My favorite tool for generating VCF files is freebayes.
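To show what that table looks like to a program, here is a minimal sketch that reads the first five tab-separated VCF columns (CHROM, POS, ID, REF, ALT) and labels each alternate allele by comparing allele lengths. The labels are a simplification: alleles of different length are all called "indel" here even though some callers also emit more complex substitutions, and the function name is hypothetical.

```python
def parse_vcf(lines):
    """Return (chrom, pos, ref, alt, kind) tuples from VCF text lines.

    Header lines starting with '#' are skipped. Sites with several
    alternate alleles yield one tuple per allele.
    """
    variants = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            if alt in (".", "*"):
                continue  # no alternate allele, or allele removed by an upstream deletion
            if alt.startswith("<"):
                kind = "symbolic"
            elif len(ref) == 1 and len(alt) == 1:
                kind = "SNV"
            elif len(ref) == len(alt):
                kind = "MNV"
            else:
                kind = "indel"
            variants.append((chrom, int(pos), ref, alt, kind))
    return variants
```

Passing an open VCF file handle (uncompressed text) to `parse_vcf` yields a list that is easy to filter, count, or compare between samples.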

      If there is a finished genome available for your parent strain or from a very closely related strain, then you should use that genome as recommended by Westerman.
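When a published genome is the reference, the sequenced parent will carry its own differences from it, and those would show up in every mutant too. One way to ignore such shared background, as asked in question 2 of the original post, is a plain set difference. This is a sketch that assumes variants are represented as `(chrom, pos, ref, alt)` tuples called against the same reference; the function and sample names are hypothetical.

```python
def mutant_specific_variants(mutant_calls, parent_calls):
    """Return, per mutant, the variants that are absent from the parent.

    mutant_calls: dict mapping sample name -> iterable of (chrom, pos, ref, alt).
    parent_calls: iterable of (chrom, pos, ref, alt) called in the parent.
    """
    background = set(parent_calls)
    return {
        name: sorted(set(calls) - background)
        for name, calls in mutant_calls.items()
    }


# Toy example: one shared background variant, one variant private to mut01.
parent = [("chr", 100, "A", "G")]
mutants = {
    "mut01": [("chr", 100, "A", "G"), ("chr", 2500, "C", "T")],
    "mut02": [("chr", 100, "A", "G")],
}
private = mutant_specific_variants(mutants, parent)
```

A variant can be missing from the parent's calls simply because the parent had low coverage at that position, so candidates from this step are worth checking in a viewer.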
      Last edited by piet; 10-22-2015, 11:50 AM.
