Hey guys,
I want to perform k-means clustering on expression data (already normalized). I have two conditions (CO and HS) and three developmental stages (S1, S2, S3), so six samples in total (S1-CO, S1-HS, S2-CO, S2-HS, S3-CO and S3-HS). Because I am interested in the response to HS and want to identify genes with a similar HS response pattern, I calculated the ratio between HS and CO (HS/CO) for each stage:
Gene     S1(HS/CO)   S2(HS/CO)   S3(HS/CO)
geneX    10          5           15
geneY    20          10          5
geneZ    30          10          10
...
I'm not sure how to do the preprocessing before clustering. From what I have read in different publications, people often log2-transform the data first and sometimes also mean-center it and divide by the standard deviation.
1) Is the mean-centering based on the row (gene) mean or the column (stage) mean?
2) Similarly, do I divide by the row or the column standard deviation?
3) Is log2-transformation sufficient, or do I also need mean-centering and/or division by the SD?
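To make the two options concrete, here is a minimal sketch of what I understand by row-wise versus column-wise centering and scaling, using the three example genes from the table above. The use of Python/NumPy, the variable names, and the choice of the sample standard deviation (`ddof=1`) are my own assumptions for illustration, not something taken from the publications.

```python
import numpy as np

# HS/CO ratios from the example table
# rows: geneX, geneY, geneZ; columns: S1, S2, S3
ratios = np.array([
    [10.0, 5.0, 15.0],
    [20.0, 10.0, 5.0],
    [30.0, 10.0, 10.0],
])

# Step 1: log2-transform the ratios
log_ratios = np.log2(ratios)

# Option A: row-wise (per gene) centering and scaling.
# Each gene's profile across S1..S3 ends up with mean 0 and SD 1.
row_centered = log_ratios - log_ratios.mean(axis=1, keepdims=True)
row_scaled = row_centered / log_ratios.std(axis=1, ddof=1, keepdims=True)

# Option B: column-wise (per stage) centering and scaling.
# Each stage's values across all genes end up with mean 0 and SD 1.
col_centered = log_ratios - log_ratios.mean(axis=0, keepdims=True)
col_scaled = col_centered / log_ratios.std(axis=0, ddof=1, keepdims=True)
```

Either `row_scaled` or `col_scaled` (genes as rows) would then be the matrix passed to the k-means step; which of the two is appropriate is exactly what I am asking.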
Thanks in advance