**dpryan** · 05-29-2013, 01:28 AM

You have quite an interesting project! It's pretty common for everyone you ask to have a different opinion on the best way to approach this; asking 10 people will typically get you 13 different answers (none of which are necessarily wrong).

Personally, I would prefer a PCA plot over the heatmap (see the "plotPCA" function), which I think would make it a bit simpler to eyeball clustered samples. You might also consider just using k-means clustering (or similar) to get an idea of how many groups you can optimally partition your samples into, but that'll take longer.
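
As a rough illustration of the PCA suggestion, here is a minimal sketch in base R. It uses `prcomp()` on a simulated matrix rather than "plotPCA" itself, and the matrix `mat`, its dimensions, and the sample names are all assumptions made up for the example, not the poster's data.

```r
# Simulated stand-in for a transformed expression matrix:
# rows are genes, columns are samples (all names are hypothetical).
set.seed(1)
n_genes <- 500
n_samples <- 12
mat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("sample", seq_len(n_samples))))
# Shift the first 100 genes in the last 6 samples to create two groups.
mat[1:100, 7:12] <- mat[1:100, 7:12] + 3

# prcomp() expects observations in rows, so transpose to samples x genes.
pca <- prcomp(t(mat), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each component.
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1], pca$x[, 2],
     xlab = sprintf("PC1 (%.1f%%)", pct_var[1]),
     ylab = sprintf("PC2 (%.1f%%)", pct_var[2]),
     pch = 19)
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3, cex = 0.7)
```

Samples that share an expression profile should land near each other on the first two components, which is what makes groups easy to spot by eye.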

**Etian** · 05-29-2013, 06:05 AM

Thanks for your answer, dpryan. You highlight the problem: since I get many answers without a really logical or statistical explanation, I'm a bit confused about choosing the "best" clustering method for my case. I won't be too harsh, of course; I'm in the same position in this field. Actually, I've tried the PCA plot, but it didn't work well because n is too large, with this warning message: In brewer.pal(nlevels(fac), "Paired").
But are the strategy and the R commands appropriate?
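
A note on the warning quoted above: the "Paired" palette in RColorBrewer provides at most 12 colours, so `brewer.pal()` will likely complain whenever the grouping factor has more than 12 levels. One possible workaround, sketched here in base R, is to generate the colours with `rainbow()` instead. The factor `fac` and its 40 levels are assumptions for illustration only.

```r
# Hypothetical grouping factor with 40 levels (one per sample).
fac <- factor(paste0("sample", 1:40))

# rainbow() returns as many colours as requested, with no 12-colour cap.
cols <- rainbow(nlevels(fac))

# Map each observation to the colour of its level.
point_cols <- cols[as.integer(fac)]

plot(seq_along(fac), rep(1, length(fac)), col = point_cols, pch = 19,
     xlab = "sample index", ylab = "", yaxt = "n")
```

With this many groups the colours become hard to tell apart, so labelling points with sample names may be more readable than relying on colour alone.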

**mbblack** · 05-29-2013, 09:32 AM

I have no idea what software you have access to, but if you have access to SAS you can use it to perform K-means clustering for all possible values of K for your dataset (2 to 40 in your case). Then use SAS's Cubic Clustering Criterion (their CCC statistic) to determine the optimal value of K for your dataset (i.e. the K with the largest CCC value). That then will define your groups.

I don't know if CCC is implemented in R as it is a SAS creation.
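
For readers without SAS, the "run k-means for a range of K and compare a criterion" idea can be sketched in base R. Note that this does not compute the CCC: it swaps in the total within-cluster sum of squares (the usual "elbow" heuristic), which is a different and cruder criterion. The data `x`, the group means, and the range of K are all assumptions for the example.

```r
set.seed(42)
# Simulated samples x features matrix with three well-separated groups.
group_means <- rbind(c(0, 0), c(6, 6), c(0, 6))
x <- do.call(rbind, lapply(1:3, function(i) {
  matrix(rnorm(20 * 2, sd = 0.5), ncol = 2) +
    matrix(group_means[i, ], nrow = 20, ncol = 2, byrow = TRUE)
}))

ks <- 2:10
# Total within-cluster sum of squares for each K;
# nstart = 25 repeats the random initialisation to avoid poor local optima.
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# Look for the K after which the curve flattens (the "elbow").
plot(ks, wss, type = "b", xlab = "K", ylab = "Total within-cluster SS")
```

Unlike the CCC, this curve always decreases as K grows, so the choice of K comes from where the improvement levels off rather than from a maximum.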

**sdriscoll** · 05-29-2013, 05:21 PM

Etian - one thing my lab has found useful and informative is to produce a cluster plot with R's 'hclust' function. We've used this to cluster samples and the results have been not only logical but experimentally verified.

After your line where you make the variable 'vsd'...

vsd <- varianceStabilizingTransformation( cds )

do this...

Code:

di <- dist(t(vsd), method="euclidean")
hc <- hclust(di, method="ward.D")  # "ward" was renamed "ward.D" in R >= 3.1.0
plot(hc)

When you plot, you'll get what's called a dendrogram (you can find information about them online). With a dendrogram you can observe the grouping of samples based on how close they are to one another within the tree. Literally, the further you have to trace a path from one sample to another, the more dissimilar they are. You can also draw a horizontal line "cutting" the tree to call clusters.

This link has a couple of example images that show what I'm talking about:

http://www.ideal.ece.utexas.edu/~gjun/ee379k/html/clustering/hac/
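
The "cutting" step described above can be sketched with `cutree()` from base R. This is a minimal example on simulated data, not the poster's pipeline: the matrix `mat`, the choice of `k = 2`, and the use of "ward.D2" (available in R >= 3.1.0) are assumptions for illustration.

```r
set.seed(7)
# Simulated genes x samples matrix (a stand-in for the transformed counts).
mat <- matrix(rnorm(200 * 8), nrow = 200,
              dimnames = list(NULL, paste0("sample", 1:8)))
mat[1:50, 5:8] <- mat[1:50, 5:8] + 4  # make samples 5-8 a distinct group

di <- dist(t(mat), method = "euclidean")
hc <- hclust(di, method = "ward.D2")
plot(hc)

# Cut the tree into k groups; pass h = <height> instead to cut at a height.
groups <- cutree(hc, k = 2)
rect.hclust(hc, k = 2, border = "red")  # outline the clusters on the plot
table(groups)
```

`cutree()` returns a named integer vector giving the cluster of each sample, which can then be used as a grouping factor in later steps.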

**Etian** · 05-30-2013, 12:58 AM

Thanks for your advice, mbblack, but I don't have access to SAS (I believe only big labs can afford something like that). I'll check for the CCC in R.
Thanks a lot, sdriscoll. I think I'm going to use the hclust method; the information I've found so far seems to support that approach. I just have to use
di <- dist(t(exprs(vsd)), method="euclidean") instead of
di <- dist(t(vsd), method="euclidean")

**mbblack** · 05-30-2013, 04:35 AM

Originally posted by Etian:

> Thanks for your advice, mbblack, but I don't have access to SAS (I believe only big labs can afford something like that). I'll check for the CCC in R.

Actually, at least here in the USA, most major universities will have floating site licenses for several stats packages like SAS, Matlab, SPSS and so forth.

RNAseq datas analysis strategy and clustering