11-10-2015, 04:15 PM   #16
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

I suggest remapping them with BBMap. You don't need to remap all of them: if speed is a big concern and you have a lot of reads, you could randomly subsample 10% of the pairs (in BBMap, the flag would be "samplerate=0.1"), or even fewer, though obviously the more data you use, the more accurate the recalibration. If the reads have MD tags, it is theoretically possible to convert the cigar strings to X and = without remapping, but I have not yet written a tool to do that. One probably already exists, though.
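
To illustrate the MD-tag idea, here is a minimal Python sketch of how a cigar string with M operations could be rewritten using = and X. It is not part of BBMap; the function names (md_to_flags, extended_cigar) are made up for this example, and it assumes the MD tag follows the SAM specification and is consistent with the cigar string. All other operations (S, I, D, N, H, P) are passed through unchanged.

```python
import re
from itertools import groupby

# Hypothetical helpers for illustration only; not a BBMap or samtools API.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
MD_RE = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def md_to_flags(md):
    """Expand an MD tag into one '=' or 'X' per aligned (M) base."""
    flags = []
    for matches, deletion, mismatch in MD_RE.findall(md):
        if matches:
            flags.append("=" * int(matches))
        elif mismatch:
            flags.append("X")
        # Deletions (^ACGT) are already covered by D in the cigar string.
    return "".join(flags)


def extended_cigar(cigar, md):
    """Rewrite M operations as runs of = and X, using the MD tag."""
    flags = md_to_flags(md)
    pos = 0
    out = []
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            chunk = flags[pos:pos + n]
            if len(chunk) != n:
                raise ValueError("MD tag is shorter than the cigar string")
            pos += n
            # Run-length encode consecutive matches and mismatches.
            for symbol, run in groupby(chunk):
                out.append(f"{len(list(run))}{symbol}")
        else:
            out.append(f"{length}{op}")
    if pos != len(flags):
        raise ValueError("MD tag is longer than the cigar string")
    return "".join(out)


print(extended_cigar("5S10M2I4M1D6M", "8A5^T6"))
```

Insertions and soft clips never appear in the MD tag, which is why only the M (and already-extended = or X) operations consume positions from it.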


P.S. As long as the pairs are randomly sampled (rather than all taken from the beginning of the file), 5-10 million pairs is adequate for good recalibration. The recalibration is "soft": where there is not enough data, it simply keeps the original quality score, and as the amount of data grows, the output asymptotically approaches the measured quality score.
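
To make the "soft" behavior concrete, here is a toy Python sketch. This is NOT the formula BBMap actually uses; it is an assumed pseudocount-style blend, and the names soft_recalibrate and prior_weight are invented for the illustration. It only demonstrates the two properties described above: with no observations the original quality is kept, and with many observations the result approaches the measured quality.

```python
import math


def phred_to_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def soft_recalibrate(original_q, errors, observations, prior_weight=100.0):
    """Blend the claimed error rate with the measured one.

    Assumed scheme for illustration: the original quality acts like
    `prior_weight` pseudo-observations, so it dominates when data is
    scarce and fades out as real observations accumulate.
    """
    prior_p = phred_to_prob(original_q)
    blended = (prior_p * prior_weight + errors) / (prior_weight + observations)
    return prob_to_phred(blended)


# No data for this bin: the original score is kept.
print(soft_recalibrate(30, errors=0, observations=0))
# Lots of data with a measured error rate of 1%: the result nears Q20.
print(soft_recalibrate(30, errors=10_000, observations=1_000_000))
```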

Last edited by Brian Bushnell; 11-10-2015 at 04:19 PM.