SEQanswers

09-25-2013, 02:10 AM   #1
rozitaa
Member
 
Location: Sweden

Join Date: Jun 2013
Posts: 51
How to cluster samples with PCA in R?

Hi,

I have RNA-seq data for 16 mouse samples. I would like to cluster the Cufflinks results for these samples by PCA in R. The data look like this:


Code:
gene_id    gene_short_name    FPKM_101    FPKM_102    FPKM_103    FPKM_104    FPKM_105    FPKM_106    FPKM_107    FPKM_108    FPKM_109    FPKM_110    FPKM_111    FPKM_112    FPKM_113    FPKM_114    FPKM_115    FPKM_116
uc007aeu.1    -    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
uc007aev.1    -    0    0    0.0095358    0.0095358    0.011704    1.48E-10    2.05E-63    0.0083273    0.014457    0.0068505    0.0053635    0.022235    0.0047757    0.018794    0    0.01661
uc007aew.1    -    0    0    0    0    0    1.2568    0.27389    0    0    0    0    0    0    0    0    0
uc007aex.2    -    0    0    0    0    0    7.1538    0.0096687    0    0    0    0    0    0    0    0.0050925    0
uc007aey.1    -    8.27E-07    0.00049201    0.00043141    0.00043141    0.00079353    0    0.00074324    1.56E-09    3.39E-20    1.16E-61    1.80E-20    1.72E-09    1.56E-96    5.13E-07    5.34E-07    4.78E-07
uc007afb.1    -    0.40549    2.08E-248    1.11E-19    1.11E-19    2.40E-93    0    0.49777    0.10711    1.22E-12    0.014644    6.48E-13    0.11777    0.02695    0.25169    0.08951    0.080144
uc007afc.1    -    1.93E-06    0.38845    0.34061    0.34061    0.31315    0    0    5.01E-09    1.04E-19    0.00046243    5.53E-20    5.51E-09    0.17202    1.20E-06    1.16E-06    1.04E-06
Commands in R:
Code:
data=read.csv('raw_cuff_data.csv', header=TRUE)
data_pca <- prcomp(data[, 3:18]) 
How should I plot them in order to see the samples clustered? I know my PCA data are in data_pca$x, but how should I cluster them and plot one point for each sample?

Thanks
09-25-2013, 02:31 AM   #2
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Quote:
Originally Posted by rozitaa
How should I plot them in order to see the samples clustered? I know my PCA data are in data_pca$x, but how should I cluster them and plot one point for each sample?
Hi, try something along these lines:

Code:
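## Test data: 1000 genes (rows) x 10 samples (columns)
set.seed(1)
dat <- matrix(data = rnorm(n = 10000), ncol = 10)
colnames(dat) <- paste('sample_', 1:ncol(dat), sep = '')

## PCA on the transposed matrix, so that samples are the rows (observations)
pcaResult <- prcomp(t(dat))

## Percentage of the total variance explained by each principal component
pctVar <- round(100 * pcaResult$sdev^2 / sum(pcaResult$sdev^2))

## Set up an empty plot of PC1 vs PC2
plot(pcaResult$x[, 1], pcaResult$x[, 2],
    main = 'Principal components of samples',
    xlab = sprintf('PC1 (%s%% of variance)', pctVar[1]),
    ylab = sprintf('PC2 (%s%% of variance)', pctVar[2]),
    type = 'n'
)

## Plot one label per sample at its PC1/PC2 coordinates
text(x = pcaResult$x[, 1], y = pcaResult$x[, 2], labels = rownames(pcaResult$x), cex = 0.5)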
Samples that are similar to each other will group together in the plot, but keep in mind that PCA itself doesn't do any clustering.
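If you do want explicit group assignments, a separate clustering step is needed. The following is a minimal sketch, not something posted in the thread: it runs base R hierarchical clustering (hclust on Euclidean distances) on the first few PC scores of simulated data like the test matrix above. The number of PCs (nPC = 3), the average linkage and the number of groups (k = 2) are arbitrary assumptions for illustration; kmeans() on the same scores would be an alternative.

```r
## Simulated data, as in the test example above: 1000 genes x 10 samples
set.seed(1)
dat <- matrix(data = rnorm(n = 10000), ncol = 10)
colnames(dat) <- paste('sample_', 1:ncol(dat), sep = '')
pcaResult <- prcomp(t(dat))

## Number of PCs to cluster on and number of groups: both arbitrary choices
nPC <- 3
k <- 2

## Hierarchical clustering of the samples on their first nPC scores
scores <- pcaResult$x[, 1:nPC]
hc <- hclust(dist(scores), method = 'average')
clusters <- cutree(hc, k = k)

## PC1 vs PC2, points coloured by cluster and labelled by sample
plot(scores[, 1], scores[, 2], col = clusters, pch = 19,
     xlab = 'PC1', ylab = 'PC2', main = 'Samples coloured by cluster')
text(scores[, 1], scores[, 2], labels = rownames(scores), pos = 3, cex = 0.5)
print(clusters)
```

With real data, the dendrogram from plot(hc) can help decide whether k = 2 is sensible at all.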

Dario
09-25-2013, 03:15 AM   #3
rozitaa
Member
 
Location: Sweden

Join Date: Jun 2013
Posts: 51

Thanks a lot. I also transposed my data and got exactly the PCA plot that I wanted. I just wasn't sure if that was the correct way.
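Transposing is indeed what makes the difference: prcomp() treats rows as observations and columns as variables, so you only get one point per sample when samples are the rows. A small sketch comparing the two orientations; the 500 x 16 matrix and its random values are assumptions, and only the FPKM_101 to FPKM_116 column names mirror the table in the first post.

```r
## Simulated stand-in for the FPKM columns: 500 genes (rows) x 16 samples (columns)
set.seed(42)
fpkm <- matrix(rexp(500 * 16), ncol = 16,
               dimnames = list(NULL, paste0('FPKM_', 101:116)))

## Genes as rows, as in prcomp(data[, 3:18]): one score row per gene
pcaGenes <- prcomp(fpkm)

## Samples as rows (transposed): one score row per sample
pcaSamples <- prcomp(t(fpkm))

dim(pcaGenes$x)    # 500 16
dim(pcaSamples$x)  # 16 16
rownames(pcaSamples$x)  # the 16 sample names, ready to use as plot labels
```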
09-25-2013, 07:01 PM   #4
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190

There is a nice PCA plot in the DESeq2 vignette that you could look at and adapt to your Cufflinks data.
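As far as I know, DESeq2's plotPCA() works on transformed counts (rlog or vst), keeps the ntop most variable genes (500 by default), runs prcomp() on the transposed matrix and labels the axes with the percentage of variance explained. DESeq2 itself expects raw counts rather than Cufflinks FPKMs, so this base-R sketch only imitates that recipe. The log2(FPKM + 1) transform, the simulated 2000 x 16 matrix and the block of never-expressed genes are assumptions for illustration.

```r
## Simulated stand-in for the FPKM columns: 2000 genes (rows) x 16 samples (columns)
set.seed(7)
fpkm <- matrix(rexp(2000 * 16, rate = 0.1), ncol = 16,
               dimnames = list(NULL, paste0('FPKM_', 101:116)))
## The first 100 genes are never expressed, like uc007aeu.1 in the first post
fpkm[1:100, ] <- 0

## log2 transform with a pseudocount of 1 (an assumption, not part of the thread)
logExpr <- log2(fpkm + 1)

## Keep the ntop most variable genes; 500 mirrors the plotPCA() default
ntop <- 500L
rv <- apply(logExpr, 1, var)
keep <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]

## PCA with samples as rows
pca <- prcomp(t(logExpr[keep, ]))

## Percentage of the total variance explained by each PC
percentVar <- round(100 * pca$sdev^2 / sum(pca$sdev^2))

plot(pca$x[, 1], pca$x[, 2], pch = 19,
     main = sprintf('PCA of the %d most variable genes', ntop),
     xlab = sprintf('PC1: %s%% variance', percentVar[1]),
     ylab = sprintf('PC2: %s%% variance', percentVar[2]))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3, cex = 0.5)
```

Restricting to the most variable genes also drops all-zero rows such as uc007aeu.1, which carry no information about the samples.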

Tags
clustering, pca plot, r programming

All times are GMT -8.