I've clustered expression profiles from 20 experiments for a large group of highly related genes. I have the raw read counts and normalized them as [reads mapping uniquely to a gene] / ([total reads in experiment] * [length of transcript]). However, because these genes share a large amount of non-unique sequence, I don't think this method is correct: reads can only map uniquely to the unique parts of a transcript, so dividing by the full transcript length underestimates expression for genes with little unique sequence. I'd like to try normalizing the expression data by [length of unique k-mers within transcript] rather than [length of transcript]. Is there an existing tool that can calculate this?
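To make the idea concrete, here is a minimal sketch of what I mean, not an existing tool. It assumes that "length of unique k-mers" means the number of k-mer start positions in a transcript whose k-mer occurs exactly once across the whole transcript set. The function names `unique_kmer_lengths` and `normalize` are hypothetical, reverse complements are ignored, and everything is held in memory, so it would only suit a modest number of transcripts.

```python
from collections import Counter


def unique_kmer_lengths(transcripts, k):
    """Count, per transcript, the k-mer start positions whose k-mer
    occurs exactly once across all transcripts (assumed definition)."""
    counts = Counter()
    for seq in transcripts.values():
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1

    return {
        name: sum(
            1
            for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] == 1
        )
        for name, seq in transcripts.items()
    }


def normalize(unique_reads, total_reads, effective_length):
    """Unique read count / (total reads * unique k-mer length).
    Returns NaN when the transcript has no unique k-mers."""
    if effective_length == 0:
        return float("nan")
    return unique_reads / (total_reads * effective_length)


if __name__ == "__main__":
    # Toy sequences for illustration only; k would normally be the read length.
    toy = {"geneA": "ACGTAC", "geneB": "ACGTTT"}
    lengths = unique_kmer_lengths(toy, k=3)
    for name, eff_len in lengths.items():
        print(name, eff_len, normalize(10, 1000, eff_len))
```

Setting k to the read length seems like the natural choice to me, since a read can only map uniquely if it covers a k-mer of that size that appears nowhere else.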
Thanks in advance for any help!