I’ve been looking at software that uses PCA to analyse SNPs derived from RADseq. Most programs score each genotype by allele count (0, 1 or 2), but why don’t they use all the information in a SNP matrix by converting each call to a number according to its IUPAC code? i.e. N=0, A=1, M(A/C)=2, R(A/G)=3, ..., T=10.
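For concreteness, here is a minimal sketch of the two encodings in Python with NumPy. Only the codes N=0, A=1, M=2, R=3 and T=10 come from the list above; the values 4 to 9 are a hypothetical fill-in (genotypes ordered AA, AC, AG, AT, CC, CG, CT, GG, GT, TT), and the toy genotypes, the reference alleles and all function names are made up for illustration. The PCA is a plain centred SVD, not the method of any particular package.

```python
import numpy as np

# Ordinal IUPAC codes. N, A, M, R and T follow the values given in the post;
# the codes 4-9 are an ASSUMED ordering, not taken from the post.
IUPAC_ORDINAL = {"N": 0, "A": 1, "M": 2, "R": 3, "W": 4, "C": 5,
                 "S": 6, "Y": 7, "G": 8, "K": 9, "T": 10}

# The two bases represented by each diploid IUPAC call.
IUPAC_BASES = {"A": "AA", "C": "CC", "G": "GG", "T": "TT",
               "M": "AC", "R": "AG", "W": "AT",
               "S": "CG", "Y": "CT", "K": "GT"}


def ordinal_matrix(calls):
    """Encode each call by its ordinal IUPAC code (N is kept as 0)."""
    return np.array([[IUPAC_ORDINAL[c] for c in row] for row in calls],
                    dtype=float)


def allele_count_matrix(calls, ref_alleles):
    """Encode each call as 0/1/2 copies of the non-reference allele; N -> NaN."""
    out = np.full((len(calls), len(ref_alleles)), np.nan)
    for i, row in enumerate(calls):
        for j, call in enumerate(row):
            if call != "N":
                out[i, j] = sum(b != ref_alleles[j] for b in IUPAC_BASES[call])
    return out


def pca_scores(x, n_components=2):
    """Centred PCA via SVD; missing values are replaced by the column mean."""
    x = np.where(np.isnan(x), np.nanmean(x, axis=0), x)
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: 4 individuals x 3 SNPs (A/C, A/T and G/T sites), one string per individual.
calls = ["AWG", "MWK", "CTT", "ATG"]
ref = "AAG"  # hypothetical reference allele at each site

ordinal = ordinal_matrix(calls)
counts = allele_count_matrix(calls, ref)

print(ordinal)
print(counts)
print(pca_scores(ordinal))
print(pca_scores(counts))
```

Under this assumed ordering, the three genotypes at the A/T site get codes 1, 4 and 10, while those at the G/T site get 8, 9 and 10, so the spacing between homozygotes and the heterozygote differs from site to site. With allele counts it is always 0, 1, 2.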
My first attempts with this encoding give clusters of individuals comparable to those from a Bayesian tree and STRUCTURE across 2 discrete data sets, and broadly similar to the results from SNPRelate. Is this legitimate, or is there a reason I shouldn't do it?
Cheers