02-15-2017, 01:25 AM   #1
Junior Member
Location: Amsterdam

Join Date: Mar 2015
Posts: 8
Calculate the number of SNP differences between two sequences


This seems like a simple enough question but I can't find a straight answer...

I want to know how many SNP differences there are between each of my samples (=genomes). My dataset is composed of 65 bacterial genomes. I used kSNP3 to call the SNPs from the genomes using the core option, and SNP-sites to generate the VCF file from the alignment. And now I am completely stuck, for something that looks really trivial.

The fasta alignment looks like:

The sequences are the concatenated core SNPs of my whole dataset. Thus I have 1 sequence per sample, and they are aligned and all of exactly the same length (roughly 40 000 bp long).

The output I am looking for is the exact number of SNP differences (or similarities) between each pair of sequences:

      seq1  seq2  seq3
seq1  0
seq2  1     0
seq3  2     1     0

Does anyone know a simple way to get from either the alignment or the resulting VCF file to the dissimilarity matrix? I have been looking into different software packages for 2 days now without success...
smatamoros
02-16-2017, 03:23 AM   #2
tristan dubos
Location: France

Join Date: Dec 2015
Posts: 39

Hi,
I don't know of any tool for that (maybe there is something available in R), and I think it is more a job for a personal script. Anyway, I think you can do it with Excel and tab links if you don't code, don't you think?

02-16-2017, 04:01 AM   #3
Junior Member
Location: Amsterdam

Join Date: Mar 2015
Posts: 8


Yes, I finally found 2 ways to do it: a short Python script or a distance calculation in R. Excel is not an option because the sequences are longer than the maximum number of characters accepted in a single Excel cell.
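
For anyone landing here later: the script itself was not posted, so the following is only a minimal sketch of what such a short Python script could look like, not the one actually used. It assumes a FASTA alignment in which all sequences have the same length, and it counts a position as a SNP difference only when the two characters differ and neither is a gap (-) or N. That gap/N handling is my assumption, so adjust it if you want those positions counted. The function names (read_fasta, snp_distance, distance_matrix, format_lower_triangle) and the toy sequences are made up for illustration.

```python
from itertools import combinations


def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict {name: sequence}."""
    seqs = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sample name.
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def snp_distance(a, b, ignore="-N"):
    """Count positions where a and b differ, skipping gaps and Ns (assumption)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in ignore and y not in ignore
    )


def distance_matrix(seqs):
    """Return (names, dist) where dist[n1][n2] is the pairwise SNP count."""
    names = list(seqs)
    dist = {n: {n: 0} for n in names}
    for n1, n2 in combinations(names, 2):
        d = snp_distance(seqs[n1], seqs[n2])
        dist[n1][n2] = d
        dist[n2][n1] = d
    return names, dist


def format_lower_triangle(names, dist):
    """Format the matrix as tab-separated text, lower triangle only."""
    rows = ["\t" + "\t".join(names)]
    for i, n1 in enumerate(names):
        cells = [str(dist[n1][n2]) for n2 in names[: i + 1]]
        rows.append(n1 + "\t" + "\t".join(cells))
    return "\n".join(rows)


# Toy alignment (hypothetical data, not from the original dataset).
toy = """>seq1
ACGTAC
>seq2
ACGTAT
>seq3
ACCTAT
""".splitlines()

names, dist = distance_matrix(read_fasta(toy))
print(format_lower_triangle(names, dist))
```

To run it on a real alignment, pass an open file handle to read_fasta instead of the toy lines. With 65 genomes of roughly 40 000 positions each, that is 2080 pairwise comparisons, which plain Python should handle without trouble.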
smatamoros
