I am looking for a tool that will mask (replace with X's) regions of a reference genome that occur more than once. Something along the lines of BLASTing the reference genome against itself, identifying regions of a given length or longer that are identical or nearly identical to another region of the same genome, and then replacing those positions with X's. Does anyone know of such a tool?
I've collected 36 and 40 bp short reads for a eukaryotic genome that is not yet published or annotated, but I do have access to contigs and supercontigs. Being able to assemble to only the unique regions would simplify our analyses.
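To make the request concrete, here is a rough sketch of the kind of masking I have in mind. It is only an illustration under simplifying assumptions: it treats a region as repeated when an exact k-mer (k = 36 by default, to match the shorter reads) occurs more than once on either strand, so it will not catch the "nearly identical" case that a BLAST-style search would. The function names (`revcomp`, `canonical`, `mask_repeats`) and the dict-of-contigs input are my own placeholders, not part of any existing tool.

```python
from collections import Counter

# Translation table for complementing DNA (other characters such as N pass through).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def canonical(kmer):
    """Strand-independent key: the smaller of a k-mer and its reverse complement."""
    rc = revcomp(kmer)
    return kmer if kmer <= rc else rc

def mask_repeats(contigs, k=36, mask_char="X"):
    """Mask every position covered by a k-mer that occurs more than once.

    contigs: dict mapping contig name -> sequence (hypothetical input format).
    Only exact matches are detected; both strands are considered.
    """
    # Pass 1: count every k-mer across all contigs.
    counts = Counter()
    for seq in contigs.values():
        s = seq.upper()
        for i in range(len(s) - k + 1):
            counts[canonical(s[i:i + k])] += 1

    # Pass 2: overwrite positions covered by any k-mer seen more than once.
    masked = {}
    for name, seq in contigs.items():
        s = seq.upper()
        out = list(seq)
        for i in range(len(s) - k + 1):
            if counts[canonical(s[i:i + k])] > 1:
                out[i:i + k] = mask_char * k
        masked[name] = "".join(out)
    return masked
```

Calling `mask_repeats({"c1": "ACGGTTTTTACGGT"}, k=5)` would mask both copies of `ACGGT` and leave the `TTTT` between them untouched. Holding every k-mer in memory like this will not scale to a large eukaryotic genome, which is why I am hoping an existing tool already does it.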
Thanks for your help.