Hi everyone,
I have about 300 SNPs from about 200 individuals, spanning a 6 Mb genomic region (a reference genome is available). All individuals are wild-caught, so there is no information on parentage (father, mother, etc.). However, the 200 individuals can be split into four distinct populations (A, B, C, D).
The SNP matrix (data structure) is as follows:
ID chr Population 3232_SNP1 4567_SNP2 7534_SNP3 ....
ind1 4 A G C A
ind1 4 A G T A
ind2 4 B A G NA
ind2 4 B T C G
ind3 4 A NA T G
ind3 4 A NA G A
.
.
.
A few comments: each individual has two lines, one per allele. For some SNPs, low sequencing coverage meant I could only call a "haploid genotype" (e.g. ind2, SNP3). This means I can quite reliably say that one of the two alleles (the organism is diploid) is a G, but I don't know the other allele. For some SNP/individual combinations the data are missing entirely (e.g. ind3, SNP1).
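To see how much of the data falls into each of these three cases, here is a minimal Python sketch that reads the matrix above into per-individual allele pairs and classifies every call as diploid, haploid or missing. It assumes a whitespace-separated file with the header row shown, exactly two consecutive rows per individual, and the literal string NA for a missing allele; the function names are hypothetical.

```python
from collections import Counter


def parse_two_row_matrix(lines):
    """Group a SNP matrix with two rows per individual into allele pairs.

    Assumes a header "ID chr Population <SNP> <SNP> ...", then exactly two
    consecutive rows per individual, with "NA" for a missing allele.
    Missing alleles are stored as None.
    """
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    if len(body) % 2:
        raise ValueError("expected an even number of data rows")
    snps = header[3:]
    individuals = {}
    for first, second in zip(body[0::2], body[1::2]):
        if first[0] != second[0]:
            raise ValueError(f"rows not paired: {first[0]} vs {second[0]}")
        pairs = [(None if a1 == "NA" else a1, None if a2 == "NA" else a2)
                 for a1, a2 in zip(first[3:], second[3:])]
        individuals[first[0]] = {
            "chr": first[1],
            "population": first[2],
            "genotypes": dict(zip(snps, pairs)),
        }
    return snps, individuals


def classify_call(a1, a2):
    """Return "diploid", "haploid" (one allele known) or "missing"."""
    n_missing = (a1 is None) + (a2 is None)
    return ("diploid", "haploid", "missing")[n_missing]


# The example rows from the post (only the first three SNP columns)
example = """ID chr Population 3232_SNP1 4567_SNP2 7534_SNP3
ind1 4 A G C A
ind1 4 A G T A
ind2 4 B A G NA
ind2 4 B T C G
ind3 4 A NA T G
ind3 4 A NA G A"""

snps, inds = parse_two_row_matrix(example.splitlines())
for snp in snps:
    counts = Counter(classify_call(*inds[i]["genotypes"][snp]) for i in inds)
    print(snp, dict(counts))
```

A per-SNP summary like this makes it easier to decide whether haploid calls should simply be treated as missing in the LD analysis, or whether a SNP should be dropped altogether.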
For all of these SNPs the exact position in the genome is known (it is given by the number preceding the SNP name, e.g. 3232 in 3232_SNP1).
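Because the position is encoded in the name, it can be recovered with a small helper so the SNPs can be sorted into map order before any LD analysis. A minimal sketch, assuming every name follows the "<position>_<label>" pattern; the helper name is hypothetical.

```python
def snp_position(name):
    """Return the position encoded before the first underscore,
    e.g. "3232_SNP1" -> 3232 (assumes "<position>_<label>" naming)."""
    prefix, _, _ = name.partition("_")
    return int(prefix)


snps = ["7534_SNP3", "3232_SNP1", "4567_SNP2"]
ordered = sorted(snps, key=snp_position)
print(ordered)  # ['3232_SNP1', '4567_SNP2', '7534_SNP3']
```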
Aim: I would now like to infer linkage (LD) blocks among these SNPs. I expect there to be a few major haplotype blocks, which I would then also like to visualize.
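As a rough first look, pairwise LD can be estimated directly. Since it is not stated whether the two rows per individual are phased, this sketch uses the squared correlation of allele dosages (0, 1 or 2 copies of one allele) across individuals, a common phase-free approximation to haplotype r². Haploid and missing calls are dropped pair by pair, biallelic SNPs are assumed, and the function names and toy genotypes are hypothetical. It only produces the r² matrix; partitioning it into blocks is left to a dedicated tool such as HaploView (which offers, for example, a confidence-interval and a four-gamete block definition). Running it separately per population avoids spurious LD created by mixing differentiated populations.

```python
def dosage(a1, a2, counted):
    """Copies of allele `counted` in a diploid call, or None if the call
    is not fully observed (haploid and missing calls are dropped)."""
    if a1 is None or a2 is None:
        return None
    return (a1 == counted) + (a2 == counted)


def genotype_r2(calls_x, calls_y):
    """Squared Pearson correlation of allele dosages between two SNPs.

    calls_x, calls_y: lists of (a1, a2) tuples, one per individual, in the
    same order. The counted allele is the first non-missing allele seen.
    Returns NaN if fewer than two usable individuals remain or if either
    SNP is monomorphic among them.
    """
    cx = next((a for pair in calls_x for a in pair if a is not None), None)
    cy = next((a for pair in calls_y for a in pair if a is not None), None)
    if cx is None or cy is None:
        return float("nan")
    pairs = [(dosage(*gx, cx), dosage(*gy, cy))
             for gx, gy in zip(calls_x, calls_y)]
    pairs = [(x, y) for x, y in pairs if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


def ld_matrix(snp_calls):
    """All pairwise r² values for a list of per-SNP call lists."""
    m = len(snp_calls)
    return [[genotype_r2(snp_calls[i], snp_calls[j]) for j in range(m)]
            for i in range(m)]


# Toy data (not from the post): two perfectly correlated SNPs
x = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "G"), (None, "A")]
y = [("C", "C"), ("C", "T"), ("T", "T"), ("C", "T"), ("C", "C")]
print(genotype_r2(x, y))  # 1.0
```

The resulting matrix, with SNPs in map order, can be drawn as a triangular heatmap to eyeball candidate blocks.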
What is the easiest way to do this? From some initial reading, HaploView seems to be the program of choice. However, I'm not sure whether I can easily convert my data file into the input format this program requires.
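One possible route into HaploView is its Linkage (pedigree) format plus a marker info file. The sketch below follows my understanding of that format (six leading columns: family ID, individual ID, father, mother, sex, affection; then two allele columns per marker coded 1 to 4 for A, C, G, T, with 0 for missing) and an info file of marker name and position; please check both against the HaploView documentation. For wild-caught, unrelated individuals each one is written as its own family with unknown parents, sex and affection (all 0). Treating haploid calls as fully missing (0 0) is an assumption, as are the function names and the idea of writing one file set per population.

```python
ALLELE_CODE = {"A": "1", "C": "2", "G": "3", "T": "4"}


def haploview_ped_line(ind_id, genotypes):
    """One Linkage-format row for an unrelated individual.

    Each individual is its own family; father, mother, sex and affection
    are all 0 (unknown). Haploid calls are written as missing (0 0).
    """
    fields = [ind_id, ind_id, "0", "0", "0", "0"]
    for a1, a2 in genotypes:
        if a1 is None or a2 is None:
            fields += ["0", "0"]
        else:
            fields += [ALLELE_CODE[a1], ALLELE_CODE[a2]]
    return "\t".join(fields)


def haploview_info_lines(snp_names):
    """Marker name and position, taken from "<position>_<label>" names."""
    return [f"{name}\t{int(name.partition('_')[0])}" for name in snp_names]


def write_haploview_files(prefix, snp_names, individuals):
    """Write <prefix>.ped and <prefix>.info.

    individuals: list of (ind_id, [(a1, a2), ...]) with genotypes in the
    same order as snp_names, which should already be in map order.
    """
    with open(prefix + ".ped", "w") as ped:
        for ind_id, genotypes in individuals:
            ped.write(haploview_ped_line(ind_id, genotypes) + "\n")
    with open(prefix + ".info", "w") as info:
        info.write("\n".join(haploview_info_lines(snp_names)) + "\n")


# Population A from the example rows (ind1 and ind3)
snps = ["3232_SNP1", "4567_SNP2", "7534_SNP3"]
pop_a = [("ind1", [("G", "G"), ("C", "T"), ("A", "A")]),
         ("ind3", [(None, None), ("T", "G"), ("G", "A")])]
for ind_id, genotypes in pop_a:
    print(haploview_ped_line(ind_id, genotypes))
print("\n".join(haploview_info_lines(snps)))
# write_haploview_files("popA", snps, pop_a) would write popA.ped / popA.info
```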
Any suggestions/recommendations?
By the way: I generally work in R, although I'm far from an expert.