I am trying to find the "true" genotype of several loci from target capture data. The loci are single-nucleotide (homopolymer) repeats, 15 to 18 bp long. The reads have been aligned to the reference. For each locus, the data include the sequences that aligned to the reference, the repeat length of each sequence, and the number of times each sequence was counted. A locus could look something like this:
CTGATTAATC-GGGGGGGGGGGGGGG-CAGACGATAG
GGGGGGGGGGGG, 12, 581
GGGGGGGGGGG, 11, 303
GGGGGGGGGGGGG, 13, 239
GGGGGGGGGG, 10, 72
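To make the counts easier to compare, the rows above can be turned into the fraction of reads at each repeat length. This is only a minimal Python sketch of that tabulation, not a genotype caller; the function name `length_fractions` is hypothetical, and it assumes each row holds a sequence, a length, and a count separated by commas and/or spaces.

```python
import re
from collections import Counter

def length_fractions(rows):
    """Return {repeat_length: fraction_of_reads} for one locus.

    Assumes each row is 'SEQUENCE, LENGTH, COUNT'; commas and/or
    whitespace are accepted as separators.
    """
    counts = Counter()
    for row in rows:
        _seq, length, count = re.split(r"[,\s]+", row.strip())
        counts[int(length)] += int(count)
    total = sum(counts.values())
    # Sort by repeat length so neighbouring lengths sit next to each other.
    return {length: n / total for length, n in sorted(counts.items())}

rows = [
    "GGGGGGGGGGGG, 12, 581",
    "GGGGGGGGGGG, 11, 303",
    "GGGGGGGGGGGGG, 13, 239",
    "GGGGGGGGGG, 10, 72",
]

for length, fraction in length_fractions(rows).items():
    print(f"{length}\t{fraction:.3f}")
```

The output only summarizes the read-length distribution; deciding which lengths are real alleles and which are artifacts is the open question.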
I need to figure out the real allele length(s) for this individual, and whether the individual is heterozygous or homozygous at each locus.
I would appreciate any suggestions that anyone has.