SEQanswers

10-21-2015, 10:43 AM   #1
camhabib
Junior Member
 
Location: Boston, MA

Join Date: Oct 2015
Posts: 1
Best way to detect SNV / InDels against reference genome?

So I've done a bit of homework but I'm still a little confused on the best way to go, hoping someone can point me in the right direction.

I recently sequenced 11 mutant strains and 1 reference strain of Bacillus (~4 Mb) by NextSeq Mid 2 x 75. I'm ultimately looking to compare the mutants to the reference to detect SNVs/MNVs as well as InDels.

So far, my understanding is that I should start with a de novo assembly using something like SPAdes. Once I have the contigs, I can use OSLay to map these back to a reference strain, either the one I sequenced or a previously available (master) one, of which there are several. I still have a few questions though:

1) What application should I be looking at for detecting the mutations? I imagine I'd essentially need a large alignment tool, though one that can take coverage and the probability of mutation into account would be great (assuming not all reads give the same SNV, etc).

2) When detecting mutations, is it better to do so against the reference strain that I sequenced or against a downloaded one and just compare my reference to it as well, ignoring any commonalities?

3) Are there better programs to use than the two I listed above? Are there any that are specifically built for my purpose and that aren't CLC Genomics, which would cost me $5k (grad student budget here)?

Much appreciated everyone!
10-21-2015, 12:43 PM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

For #2, it depends on how distant the reference sequence is from your strains. It is usually best to do mapping instead of de novo assembly, and if appropriate I would do that first. BBMap, Bowtie2, or BWA followed by Samtools or GATK is a good way to call SNVs. BEDtools can be used for coverage maps, which will indicate longer deletions. Assembling unmapped reads can indicate longer insertions.
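
As a rough illustration of the coverage-map idea above, here is a minimal Python sketch. It assumes a per-base depth table with three tab-separated columns (contig, 1-based position, depth) that lists every position, which is the layout written by `bedtools genomecov -d` or `samtools depth -a`. The function name `low_coverage_runs` and the default length and depth cutoffs are illustrative choices, not something taken from this thread.

```python
def low_coverage_runs(depth_lines, min_len=50, max_depth=0):
    """Return (contig, start, end) for each run of at least min_len
    consecutive positions whose depth is <= max_depth.

    depth_lines: iterable of tab-separated 'contig, position, depth' strings.
    """
    runs = []
    run = None  # current run as [contig, start, end], or None

    def close(current):
        # Keep the finished run only if it is long enough.
        if current is not None and current[2] - current[1] + 1 >= min_len:
            runs.append(tuple(current))

    for line in depth_lines:
        contig, pos, depth = line.rstrip("\n").split("\t")
        pos, depth = int(pos), int(depth)
        low = depth <= max_depth
        if low and run is not None and run[0] == contig and pos == run[2] + 1:
            run[2] = pos  # extend the current run by one position
            continue
        close(run)
        run = [contig, pos, pos] if low else None
    close(run)
    return runs


# Toy example: ten positions on one contig, with a four-base gap at 3..6.
depths = [30, 28, 0, 0, 0, 0, 25, 0, 27, 30]
table = [f"contig1\t{i}\t{d}" for i, d in enumerate(depths, start=1)]
print(low_coverage_runs(table, min_len=3))
```

A run with no coverage in a mutant but normal coverage in the parent is only a candidate deletion; repeats and hard-to-map regions can look the same, so it is worth checking such regions in a viewer.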

There are lots of programs out there, and the people on the forum may suggest other, and better, ones. But in general you should be able to do your analysis on a "grad student's budget"; i.e., not much cash but lots of time.
10-22-2015, 11:47 AM   #3
piet
Member
 
Location: planet earth

Join Date: Aug 2014
Posts: 21

Quote:
Originally Posted by camhabib
I recently sequenced 11 mutant strains and 1 reference strain of Bacillus (~4 Mb) by NextSeq Mid 2 x 75. I'm ultimately looking to compare the mutants to the reference to detect SNVs/MNVs as well as InDels.

I assume that the 11 mutants are offspring of the "reference strain" which have been generated by some mutagenic treatment. Now they show different phenotypes and you want to find the genetic basis for that.

You should first assemble the genome of the parent. Unfortunately, 75-nt reads are suboptimal for this. Nevertheless, I would recommend assembling them with SPAdes. That will take about 5 to 10 minutes on a desktop PC; just try it out. If you are lucky, the contigs will cover about 90 percent of the whole genome.

Then you can map the reads of the mutants to the contigs of the parent. Inspect the mapping in a viewer like Tablet. It's always amazing to see how clearly SNPs differ from random sequencing errors.
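
To put a rough number on why real SNPs stand out from sequencing errors, here is a small Python sketch. It assumes a haploid genome, independent errors, and a flat 1% per-base error rate; all three are simplifying assumptions for illustration, and real callers use more elaborate models. The function names are made up for this example.

```python
from math import comb


def alt_fraction(ref_count, alt_count):
    """Fraction of reads at a site that support the alternate base."""
    total = ref_count + alt_count
    return alt_count / total if total else 0.0


def prob_at_least_k_errors(n, k, error_rate=0.01):
    """P(X >= k) for X ~ Binomial(n, error_rate): the chance that k or more
    of n reads show the same wrong base purely through sequencing error."""
    return sum(
        comb(n, i) * error_rate**i * (1 - error_rate) ** (n - i)
        for i in range(k, n + 1)
    )


# A site covered by 40 reads, 38 of which carry the alternate base.
print(alt_fraction(2, 38))             # 0.95
print(prob_at_least_k_errors(40, 38))  # vanishingly small
# One stray mismatch among 40 reads is common under the same assumptions.
print(prob_at_least_k_errors(40, 1))
```

In a haploid bacterium a genuine SNP is expected to be carried by nearly all reads at the site, whereas random errors show up as isolated mismatches in one or two reads.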

To identify SNPs programmatically, you have to compute VCF files from your read mappings. A VCF file is a kind of human-readable ASCII table that lists all the SNPs. My favorite tool for generating VCF files is freebayes.
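
Since a VCF file is just a tab-separated table, it can be read with a few lines of Python. The sketch below relies only on the fixed leading VCF columns (CHROM, POS, ID, REF, ALT, QUAL); the QUAL cutoff of 20 and the function names are arbitrary choices for illustration, and for real work a dedicated VCF library or bcftools is probably the safer route.

```python
def parse_vcf(lines, min_qual=20.0):
    """Return (chrom, pos, ref, alt, qual) tuples for records with
    QUAL >= min_qual. Multi-allelic records yield one tuple per ALT allele."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        q = float(qual) if qual != "." else 0.0  # treat missing QUAL as 0
        if q < min_qual:
            continue
        for allele in alt.split(","):
            records.append((chrom, int(pos), ref, allele, q))
    return records


def variant_type(ref, alt):
    """Classify a variant by comparing REF and ALT allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"
    return "indel"


# Toy VCF content; the records are invented for this example.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig1\t100\t.\tA\tG\t250.0\t.\t.",
    "contig1\t200\t.\tAT\tA\t5.0\t.\t.",
    "contig1\t300\t.\tC\tCTT,T\t99\t.\t.",
]
for chrom, pos, ref, alt, qual in parse_vcf(example):
    print(chrom, pos, ref, alt, variant_type(ref, alt))
```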

If there is a finished genome available for your parent strain or for a very closely related strain, then you should use that genome as recommended by Westerman.
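
If the reads are mapped to a published genome rather than to the parent's own contigs, the parent will usually differ from that genome too. One way to handle this, sketched here in Python, is to call variants for the parent and for each mutant against the same genome and then drop every mutant variant the parent also carries. The tuple layout (contig, position, REF, ALT) and the name `private_variants` are assumptions made for this example.

```python
def private_variants(mutant, parent):
    """Return the mutant variants that are absent from the parent.

    Both arguments are iterables of (contig, pos, ref, alt) tuples called
    against the same reference genome. Input order is preserved.
    """
    parent_keys = set(parent)
    return [v for v in mutant if v not in parent_keys]


# Invented calls: the parent shares one difference from the published genome.
parent_calls = [("contig1", 100, "A", "G")]
mutant_calls = [("contig1", 100, "A", "G"), ("contig1", 2500, "C", "T")]
print(private_variants(mutant_calls, parent_calls))
# [('contig1', 2500, 'C', 'T')]
```

A variant can also be missing from the parent's list simply because the parent had poor coverage at that site, so it is worth checking the parent's read depth there before trusting a mutant-only call.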

Last edited by piet; 10-22-2015 at 11:50 AM.