SEQanswers

02-15-2017, 12:25 AM   #1
smatamoros
Junior Member
 
Location: Amsterdam

Join Date: Mar 2015
Posts: 8
Calculate the number of SNP differences between two sequences

Hello,

This seems like a simple enough question but I can't find a straight answer...

I want to know how many SNP differences there are between each of my samples (=genomes). My dataset is composed of 65 bacterial genomes. I used kSNP3 to call the SNPs from the genomes using the core option, and SNP-sites to generate the VCF file from the alignment. And now I am completely stuck, for something that looks really trivial.

The fasta alignment looks like:
>seq1
AAATTTCCCGGG
>seq2
CAATTTCCCGGG
>seq3
CAAGTTCCCGGG

The sequences are the concatenated core SNPs of my whole dataset. Thus I have 1 sequence per sample, and they are aligned and all of exactly the same length (roughly 40 000 bp long).

The output I am looking for is the exact number of SNPs (or similarities) between each pair of sequences:

      seq1  seq2  seq3
seq1  0
seq2  1     0
seq3  2     1     0
etc...

Does anyone know a simple way to get from either the alignment or the resulting VCF file to the dissimilarity matrix? I have been looking into different software for 2 days now without success...
02-16-2017, 02:23 AM   #2
tristan dubos
Member
 
Location: France

Join Date: Dec 2015
Posts: 39

Hi,
I don't know of any tools for that (maybe there's something available in R); I think it's more a matter of writing your own script. Anyway, if you don't code, I think you could do it in Excel with tab links, don't you think?

Tristan
02-16-2017, 03:01 AM   #3
smatamoros
Junior Member
 
Location: Amsterdam

Join Date: Mar 2015
Posts: 8

Hi,

Yes, I finally found 2 ways to do it: a short Python script or a distance calculation in R. Excel is not an option because the sequences are longer than the maximum number of characters allowed in a single Excel cell.
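
The script itself was not posted in the thread, so what follows is only a minimal sketch of what such a short Python script might look like, not the poster's actual code. It assumes the input is a FASTA alignment in which every sequence has the same length, and it counts every mismatching column as one SNP difference (so gaps or N characters, if present, would also count as differences). The function names `read_fasta`, `snp_distance`, `distance_matrix` and `format_lower_triangle` are made up for this illustration.

```python
from itertools import combinations


def read_fasta(lines):
    """Parse FASTA lines into an ordered {name: sequence} dict."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def snp_distance(a, b):
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Return (names, dist) where dist[a][b] is the SNP count between a and b."""
    names = list(seqs)
    dist = {n: {n: 0} for n in names}
    for a, b in combinations(names, 2):
        d = snp_distance(seqs[a], seqs[b])
        dist[a][b] = d
        dist[b][a] = d
    return names, dist


def format_lower_triangle(names, dist):
    """Render the matrix as tab-separated text, lower triangle only."""
    rows = ["\t" + "\t".join(names)]
    for i, row_name in enumerate(names):
        cells = [str(dist[row_name][names[j]]) for j in range(i + 1)]
        rows.append(row_name + "\t" + "\t".join(cells))
    return "\n".join(rows)


# The three example sequences from the first post.
example = """>seq1
AAATTTCCCGGG
>seq2
CAATTTCCCGGG
>seq3
CAAGTTCCCGGG
"""

names, dist = distance_matrix(read_fasta(example.splitlines()))
print(format_lower_triangle(names, dist))
```

On the three example sequences this gives the counts asked for in the first post (1 between seq1 and seq2, 1 between seq2 and seq3, 2 between seq1 and seq3). For a real file, the `example.splitlines()` call could be replaced by an open file handle. With 65 genomes of roughly 40 000 bp there are only 2080 pairs, so the plain pairwise loop should be fast enough.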
All times are GMT -8.