Hi donquijotes,
Sorry for the late reply; I don't check this forum very often.
First, in our practice we integrate UMIs using template switching during RT-PCR. We don't see severe synthesis biases in UMI sequences (see
http://www.jimmunol.org/content/194/...ml?with-ds=yes, figure B). Note that there is another good study covering possible biases in UMI-based sequencing (see
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3562004/).
Indeed, the distribution of reads per UMI is log-normal. We observe this in all our datasets and attribute it to PCR amplification. Once you append the read mapping position to your UMI in the read header, you can assemble consensus sequences and forget about the raw reads, counting only UMIs.
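To illustrate the UMI + mapping position idea, here is a minimal Python sketch, assuming reads have already been parsed into hypothetical (umi, position, sequence) tuples and that a simple per-base majority vote is an acceptable consensus. It is not MIGEC's algorithm or input format, just an illustration of grouping by UMI + position and then counting UMIs instead of raw reads.

```python
from collections import Counter, defaultdict


def assemble_consensuses(reads, min_reads=1):
    """Group reads by (UMI, mapping position) and build a majority-vote consensus.

    `reads` is an iterable of (umi, position, sequence) tuples. This is a
    hypothetical in-memory format chosen for illustration, not MIGEC's format.
    Groups with fewer than `min_reads` reads are discarded.
    """
    groups = defaultdict(list)
    for umi, pos, seq in reads:
        groups[(umi, pos)].append(seq)

    consensuses = {}
    for key, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # drop under-covered UMIs (often sequencing/PCR errors)
        length = min(len(s) for s in seqs)  # compare only the shared prefix
        # Per-base majority vote across all reads carrying this UMI + position
        consensus = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
        consensuses[key] = consensus
    return consensuses


if __name__ == "__main__":
    reads = [
        ("ACGTACGTAC", 1000, "TTGCA"),
        ("ACGTACGTAC", 1000, "TTGCA"),
        ("ACGTACGTAC", 1000, "TAGCA"),  # one read with an error at base 2
        ("GGGGCCCCAA", 2000, "CCCAT"),
    ]
    result = assemble_consensuses(reads)
    print(result)
    print("molecules (UMIs):", len(result))
```

The number of molecules is then just the number of consensus entries, regardless of how many raw reads each UMI collected. Given the log-normal coverage, a `min_reads` threshold (the value is a judgment call per dataset) is a common way to drop UMIs that arise from errors.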
Unfortunately, MIGEC cannot assemble reads based on UMI + mapping position, as it was designed for amplicon libraries.
With a 10-12 bp UMI, the diversity is 4^10 ≈ 10^6 to 4^12 ≈ 1.7x10^7 possible sequences. If you estimate this to be much greater than the number of starting molecules, you can simply run the "Checkout" and "Assemble" routines of MIGEC (see the docs
here) to get a list of assembled consensuses.
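A quick way to check the "much greater than the number of starting molecules" condition is a birthday-problem estimate. The sketch below assumes UMIs are drawn uniformly at random, which real synthesized UMIs only approximate, so treat the result as an optimistic estimate; the function names are hypothetical.

```python
import math


def umi_diversity(length):
    """Number of distinct UMIs of a given length (4 possible bases per position)."""
    return 4 ** length


def expected_collision_fraction(n_molecules, diversity):
    """Approximate fraction of molecules whose UMI is shared with at least one
    other molecule, assuming UMIs are drawn uniformly at random."""
    # P(none of the other n-1 molecules drew this molecule's UMI),
    # computed via log1p for numerical stability with large diversities.
    p_unique = math.exp((n_molecules - 1) * math.log1p(-1.0 / diversity))
    return 1.0 - p_unique


if __name__ == "__main__":
    for length in (10, 12):
        d = umi_diversity(length)
        for n in (10_000, 100_000):
            frac = expected_collision_fraction(n, d)
            print(f"{length} bp UMI ({d} UMIs), {n} molecules: {frac:.4%} collide")
```

For example, under this uniform assumption, 10^5 molecules tagged with 10 bp UMIs would give roughly 9% of molecules sharing a UMI, while 10^4 molecules with 12 bp UMIs would give well under 0.1%.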
Hope this helps,
Mike