I had this thought too. Looking at the bwa command line, they specify the seed as -l 20 (20 bp) with -k 4 mismatches. I bumped the number of mismatches up to -k 5 (just to see), and not many more sites were found.

I think the second part of your suggestion would be quite tricky to do given the number of NGGs in the genome, but if you have a way of doing that quickly I'd be happy to hear it.

Thanks for the brainstorming session! I think I might try to contact the Sanger guys and see if there is something I'm missing.

