Dear all,
I am trying to calculate Hamming distance, dN and dS values for my data set. The NGS data I am working with contains many sequences with widely varying frequencies (read counts). Some appear thousands of times, while others occur only 2-3 times. Redundancy has been removed from the final data set, so each unique sequence appears once, and its read count is given as part of the sequence identifier.
Can someone direct me to a program or method that accounts for the number of times each read is present in a sample, by reading in the read count for each unique sequence and incorporating it into the calculation?
The overall sample mean dN and dS values are strongly influenced by how often each read occurs, and I have not yet seen a simple way to apply standard methods so that they account for this additional information.
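To make the question concrete, here is a minimal Python sketch of the kind of weighting I have in mind, for Hamming distance only. It assumes the unique sequences are already aligned to equal length and that the read counts have already been pulled out of the identifiers into a list (the parsing is left out because it depends on the header format). The function names `hamming` and `weighted_mean_pairwise` are my own placeholders, not from any existing package. The mean is taken over all pairs of reads, so two copies of the same read count as a pair with distance 0.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def weighted_mean_pairwise(seqs, counts, dist=hamming):
    """Mean pairwise distance over all reads, given unique sequences and counts.

    A pair of distinct unique sequences i, j stands for counts[i] * counts[j]
    read pairs. Pairs of identical reads contribute distance 0 but are still
    included in the denominator N * (N - 1) / 2, where N is the total read count.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two reads in total")
    total = 0
    for (s1, c1), (s2, c2) in combinations(zip(seqs, counts), 2):
        total += c1 * c2 * dist(s1, s2)
    return total / (n * (n - 1) / 2)


# Toy example: three unique sequences with read counts 3, 1 and 1.
seqs = ["ACGT", "ACGA", "TCGA"]
counts = [3, 1, 1]
print(weighted_mean_pairwise(seqs, counts))
```

In principle the same weighting could be applied to dN or dS by passing a different pairwise function as `dist`, but I do not know whether that is a statistically accepted approach, which is part of what I am asking.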