I have a bunch of contigs assembled from 454 data. I also have a stack of Solexa reads from the same organism, and I would like to locate and correct homopolymeric errors using the Solexa reads. Running BWA with default options gives me a big pile of hits. Inspecting the CIGAR field, I find roughly the same number of insertions and deletions. That was a bit surprising, as I expected more deletions (homopolymers).
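For reference, a tally like that can be sketched in Python roughly as follows. This is only a minimal sketch, assuming the alignments are available as plain-text SAM lines; the function name `tally_cigar` is made up for illustration and is not part of BWA or any library.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def tally_cigar(sam_lines):
    """Count CIGAR operations over an iterable of SAM text lines.

    Returns two Counters keyed by operation letter:
    events = number of operations, bases = summed operation lengths.
    (Hypothetical helper, for illustration only.)
    """
    events = Counter()
    bases = Counter()
    for line in sam_lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # not a complete alignment record
            continue
        cigar = fields[5]             # CIGAR is the 6th SAM column
        if cigar == "*":              # unmapped read, no CIGAR available
            continue
        for length, op in CIGAR_OP.findall(cigar):
            events[op] += 1
            bases[op] += int(length)
    return events, bases
```

With a SAM file on disk (the file name here is a placeholder), this could be used as `events, bases = tally_cigar(open("hits.sam"))`, after which `events["I"]` and `events["D"]` give the insertion and deletion counts to compare.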
Also, I don't see any mismatches - the SAM format description of CIGAR is unclear on this point. How do I locate mismatches?
What do other people do for cleaning homopolymeric errors?
Martin