#1 |
Junior Member
Location: Amsterdam Join Date: Mar 2015
Posts: 8
Hello,
This seems like a simple enough question, but I can't find a straight answer. I want to know how many SNP differences there are between each pair of my samples (= genomes). My dataset is composed of 65 bacterial genomes. I used kSNP3 to call the SNPs from the genomes using the core option, and SNP-sites to generate the VCF file from the alignment. Now I am completely stuck on something that looks really trivial.

The FASTA alignment looks like:

    >seq1
    AAATTTCCCGGG
    >seq2
    CAATTTCCCGGG
    >seq3
    CAAGTTCCCGGG

The sequences are the concatenated core SNPs of my whole dataset. Thus I have one sequence per sample, and they are aligned and all exactly the same length (roughly 40,000 bp).

The output I am looking for is the exact number of SNPs (or similarities) between each pair of sequences:

          seq1  seq2  seq3
    seq1  0
    seq2  1     0
    seq3  2     1     0
    etc.

Does anyone know a simple way to get from either the alignment or the resulting VCF file to the dissimilarity matrix? I have been looking into different software packages for 2 days now without success...
#2 |
Member
Location: France Join Date: Dec 2015
Posts: 39
Hi,
I don't know of any tools for that (maybe there's something available in R), and I think it's more a matter of writing your own script. Anyway, I think you can do it with Excel and tab links if you don't code, don't you think?
Tristan
#3 |
Junior Member
Location: Amsterdam Join Date: Mar 2015
Posts: 8
Hi,
Yes, I finally found two ways to do it: a short Python script or a distance calculation in R. Excel is not possible because the sequences are longer than the maximum number of characters accepted in a single Excel cell.
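The script itself was not posted in the thread, so the following is only a minimal sketch of what the Python route could look like, not the poster's actual code. It assumes an aligned FASTA with one equal-length sequence per sample, counts the positions at which two sequences differ, and prints the lower-triangular matrix asked for in the first post. The function names (`read_fasta`, `snp_distance`, `distance_matrix`, `format_lower_triangle`) are made up for illustration, and gaps or `N` characters are counted as differences like any other character, which may or may not be what you want.

```python
from itertools import combinations


def read_fasta(lines):
    """Parse FASTA text lines into an ordered {name: sequence} dict."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def snp_distance(a, b):
    """Count aligned positions where the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Return (names, dist) where dist[a][b] is the pairwise difference count."""
    names = list(seqs)
    dist = {n: {n: 0} for n in names}
    for a, b in combinations(names, 2):
        d = snp_distance(seqs[a], seqs[b])
        dist[a][b] = d
        dist[b][a] = d
    return names, dist


def format_lower_triangle(names, dist):
    """Render the matrix as tab-separated text, lower triangle only."""
    rows = ["\t" + "\t".join(names)]
    for i, row_name in enumerate(names):
        cells = [str(dist[row_name][col]) for col in names[: i + 1]]
        rows.append(row_name + "\t" + "\t".join(cells))
    return "\n".join(rows)


if __name__ == "__main__":
    # Toy alignment taken from the first post.
    example = [
        ">seq1", "AAATTTCCCGGG",
        ">seq2", "CAATTTCCCGGG",
        ">seq3", "CAAGTTCCCGGG",
    ]
    names, dist = distance_matrix(read_fasta(example))
    print(format_lower_triangle(names, dist))
```

On the toy alignment this gives 1 difference between seq1 and seq2, 2 between seq1 and seq3, and 1 between seq2 and seq3, matching the matrix in the question. To run it on a real file, pass an open file handle to `read_fasta`, for example `read_fasta(open("core_SNPs.fasta"))` (the file name here is hypothetical). With 65 genomes of about 40,000 bp there are only 2,080 pairs, so the plain-Python loop should be fast enough.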