I am working with transcriptomes of two species, neither of which has a reference. The reads are 100 bp paired-end (PE). My strategy has been to assemble a de novo reference from each library using abyss-pe and to map the reads with bwa. I map each library to its own reference and to that of the other species for SNP identification (i.e. library 1 mapped to reference 1, library 2 mapped to reference 1, etc.).
This strategy has worked well, and some SNPs have been validated empirically. However, I have recently noticed a small but noticeable number of SNPs in each library that come from mapping a library to its own reference (e.g. library 1 mapped to reference 1), and yet the variant allele is at roughly 98%. Given that this is the same library that generated the reference, I would expect the reference to carry the allele seen in most of the reads, so shouldn't this be impossible?
Does anyone know why this happens?
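To make the observation concrete, here is a minimal Python sketch of how such sites could be flagged. It is not part of my actual pipeline: it assumes per-site reference and variant read counts have already been extracted from the self-mapped alignment (for example from a pileup), and the function name, the 0.9 cutoff and the minimum depth of 10 are all hypothetical choices for illustration.

```python
def flag_self_mapping_discrepancies(sites, min_alt_fraction=0.9, min_depth=10):
    """Return sites where most reads disagree with the reference built from them.

    `sites` is an iterable of (contig, position, ref_count, alt_count) tuples.
    The thresholds are illustrative assumptions, not recommended values.
    """
    flagged = []
    for contig, pos, ref_count, alt_count in sites:
        depth = ref_count + alt_count
        if depth < min_depth:
            continue  # too few reads to trust the allele fraction
        alt_fraction = alt_count / depth
        if alt_fraction >= min_alt_fraction:
            flagged.append((contig, pos, round(alt_fraction, 3)))
    return flagged


# Made-up counts for library 1 mapped to reference 1.
example_sites = [
    ("contig_1", 120, 49, 1),  # reads agree with the reference
    ("contig_1", 305, 1, 49),  # variant allele at 98%: the puzzling case
    ("contig_2", 77, 2, 3),    # below the minimum depth, skipped
]

print(flag_self_mapping_discrepancies(example_sites))
```

With these made-up counts, only the second site is reported, which is the kind of site I am asking about.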