I have been trying to come up with a test for differential coverage that, given a mapping and class labels for the samples, ranks genes so that those whose coverage is as extreme as, or more extreme than, a two-parameter beta-binomial distribution would predict end up at the top. For now I am using a weighted sum over the coverage at each position in a gene. This may be less than ideal, since it doesn't take into account the probability that multiple bases could be extreme simultaneously, but it is ranking the genes in a way that matches what I intuitively expect to see.
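To make the idea concrete, here is a minimal sketch of one way such a ranking could be computed. It is not my actual implementation. The function names, the use of a negative log tail probability as the per-position score, the uniform default weights, and the input layout (per-position counts for one sample class out of the total reads at that position) are all assumptions made for illustration.

```python
from math import lgamma, exp, log

def log_beta(x, y):
    # Log of the Beta function, computed via log-gamma for numerical stability.
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def betabinom_pmf(k, n, a, b):
    # Probability of k successes out of n under a beta-binomial(a, b).
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))

def tail_prob(k, n, a, b):
    # Probability of a count at least as extreme as k: the smaller of the
    # lower tail P(X <= k) and the upper tail P(X >= k), capped at 1.
    lower = sum(betabinom_pmf(i, n, a, b) for i in range(0, k + 1))
    upper = sum(betabinom_pmf(i, n, a, b) for i in range(k, n + 1))
    return min(1.0, lower, upper)

def gene_score(counts, totals, a, b, weights=None):
    # Weighted sum over positions of -log(tail probability).
    # Positions are treated independently, so simultaneous extremes
    # at several bases are not modeled jointly.
    if weights is None:
        weights = [1.0] * len(counts)  # assumption: uniform weights
    score = 0.0
    for k, n, w in zip(counts, totals, weights):
        p = max(tail_prob(k, n, a, b), 1e-300)  # guard against log(0)
        score += w * -log(p)
    return score

def rank_genes(genes, a, b):
    # genes maps a gene name to (counts, totals); most extreme gene first.
    scored = {name: gene_score(c, t, a, b) for name, (c, t) in genes.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical toy data: "jump" has one position where a class takes all reads.
toy = {
    "flat": ([5, 5, 5], [10, 10, 10]),
    "jump": ([5, 5, 10], [10, 10, 10]),
}
ranking = rank_genes(toy, a=1.0, b=1.0)
```

In this toy case the gene with the jump at one position gets the larger score, so it is ranked first. In practice the two parameters would be fitted to the data rather than fixed by hand.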
In the following image, you can see that my test set has a gene where the magenta, light blue, and brown samples have much more coverage on one end, while the red, blue, and green samples don't show a jump in coverage.
Will something like this be useful? I think it will help researchers doing RNA-seq get more out of their data than just gene-level expression. Right now, I think the drawback is that the p-value, if you can call it that, is essentially testing the hypothesis "Are there any genes that are more extreme?", which is essentially an OR relationship, so the probabilities aren't quite as significant as one might expect, though the ranking seems to be good.
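As a rough illustration of why an OR-style hypothesis weakens the significance, the sketch below computes the chance that at least one of several tests looks extreme. It assumes the tests are independent, which real genes are unlikely to satisfy exactly, and the function name is hypothetical.

```python
def prob_any_extreme(p, n_tests):
    # Probability that at least one of n_tests independent tests
    # reaches a per-test tail probability of p or smaller.
    return 1.0 - (1.0 - p) ** n_tests

# With a per-test probability of 0.01 and 100 tests, seeing at least
# one "extreme" result is more likely than not.
any_extreme = prob_any_extreme(0.01, 100)
```

Under that independence assumption, a per-gene probability that looks small on its own becomes much less impressive once many genes are considered together. The ordering of the genes is unaffected.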