Hi!
I have an interesting problem: I want to use ChIP-seq data (potentially several fold enrichment tracks) to predict a relatively rare event that we can quantify and localize with base-pair resolution. This seems to me like a supervised learning problem, in which I could choose random windows of the genome to train the model and then test it against the rest of the genome.
I don't know much about statistical/machine learning techniques, but I can foresee problems with using things like logistic regression: the ChIP-seq data is of course autocorrelated and, more worryingly, because the response data has single-bp resolution, only a tiny portion of the genome (0.6%) is above zero (is this what they mean by zero inflation?).
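To make these two worries concrete, here is a minimal Python sketch on made-up data (not my real tracks). It assumes a hypothetical per-base response with about 0.6% nonzero positions, holds out whole contiguous blocks instead of random bases so that neighboring, autocorrelated positions do not end up on both sides of the split, and computes inverse-frequency class weights of the kind one might pass to a weighted logistic regression. All function names, the block size, and the simulated counts are assumptions for illustration only.

```python
import random


def simulate_response(n_bases, event_rate, seed=0):
    """Hypothetical per-base response: mostly zeros, rare positive counts."""
    rng = random.Random(seed)
    return [rng.randint(1, 5) if rng.random() < event_rate else 0
            for _ in range(n_bases)]


def nonzero_fraction(values):
    """Fraction of positions with a response above zero."""
    return sum(1 for v in values if v > 0) / len(values)


def block_split(n_bases, block_size, test_fraction, seed=0):
    """Assign whole contiguous blocks to train or test.

    Keeping neighboring bases in the same set avoids leaking
    autocorrelated signal from the training set into the test set.
    """
    rng = random.Random(seed)
    n_blocks = (n_bases + block_size - 1) // block_size
    block_ids = list(range(n_blocks))
    rng.shuffle(block_ids)
    n_test = max(1, round(n_blocks * test_fraction))
    test_blocks = set(block_ids[:n_test])
    train, test = [], []
    for pos in range(n_bases):
        (test if pos // block_size in test_blocks else train).append(pos)
    return train, test


def balanced_class_weights(labels):
    """Inverse-frequency weights, n / (2 * class_count), for 0/1 labels."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}


if __name__ == "__main__":
    response = simulate_response(n_bases=100_000, event_rate=0.006)
    labels = [1 if v > 0 else 0 for v in response]
    train_idx, test_idx = block_split(len(response), block_size=1_000,
                                      test_fraction=0.2)
    print("nonzero fraction:", nonzero_fraction(response))
    print("train/test sizes:", len(train_idx), len(test_idx))
    print("class weights:", balanced_class_weights(labels))
```

With so few positives, plain accuracy would look excellent for a model that always predicts zero, so I suppose something like precision and recall on the held-out blocks would be the more honest measure.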
WWPTUMAKWTATAD? (what would people that, unlike me, actually know what they are talking about do?)
Thanks!