Hi,
So I have a situation with some RNA-Seq data that I want to be very sure about with regard to my statistics. I hope someone can help me go in the right direction, as I'm currently going in about five directions at once.
The problem is as follows:
I have a set of numbers representing the number of reads that end at each position. So for one region I might have "...0,240,553,0,160,..." representing positions where 0 reads ended, 240 reads ended, 553 reads ended, etc., one value for every position.
I have another set of numbers of the same length that gives the predicted fold at each position. The values are 0 = unknown; 1 = single strand; 2 = double strand. The data might look like "...0,2,2,1,1,..." representing positions where the structure is unknown, double strand, double strand, single strand, etc., one value for every position.
Note the two data sets are the same size and represent the same positions.
What I want to ask is whether double-stranded positions are correlated with read ends, and by how much.
Is Kendall's tau the way to go? Is logistic regression an idea worth reading more into? I really want to find the best way to model the relationship and am having some trouble deciding what to do.
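To make the Kendall's tau option concrete, here is a minimal Python sketch of what I have in mind. It assumes positions with unknown structure (0) are simply dropped and that double strand is recoded as 1 and single strand as 0; both choices are my assumptions, not something I'm sure is right. The function names are made up for illustration, and the toy numbers are just the fragments quoted above, not a real region.

```python
from math import sqrt


def drop_unknown(read_ends, structure):
    """Keep positions with known structure; recode 2 (double) -> 1, 1 (single) -> 0."""
    if len(read_ends) != len(structure):
        raise ValueError("both tracks must cover the same positions")
    pairs = [(r, 1 if s == 2 else 0) for r, s in zip(read_ends, structure) if s != 0]
    counts = [r for r, _ in pairs]
    is_double = [d for _, d in pairs]
    return counts, is_double


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected), computed naively over all pairs, O(n^2)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: contributes nothing
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    # Pairs untied in x, times pairs untied in y.
    untied_x = concordant + discordant + tied_y_only
    untied_y = concordant + discordant + tied_x_only
    denom = sqrt(untied_x * untied_y)
    return (concordant - discordant) / denom if denom else float("nan")


# Toy input only: the fragments from the description above.
read_ends = [0, 240, 553, 0, 160]
structure = [0, 2, 2, 1, 1]

counts, is_double = drop_unknown(read_ends, structure)
tau = kendall_tau_b(counts, is_double)
print(tau)
```

This only gives a measure of association. It says nothing about significance, and the all-pairs loop would be too slow for a whole transcriptome.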
Thanks in advance for any advice you can offer.
Best,
-Adam P.