Seqanswers Leaderboard Ad

**DPCook** · 11-25-2015, 05:35 AM

Hi Silvia,

What scripts are you using at the moment? That may help narrow in on the problem, if one exists. If you're not, scaling the expression values (eg. Z-score transformation) for each gene can clean up the clustering quite a bit if there are large differences between genes. And if you're scaling with the heatmap function, just make sure that it's not clustering prior to scaling (which heatmap2 does).

If you have a matrix of FPKM/TPM values with samples in the columns and genes in the rows, the following will do the transformation:

Code:

z.mat <- t(scale(t(mat), scale=TRUE, center=TRUE))

Then use the transformed matrix for the heatmap/clustering

David

**silviajserrano** · 11-25-2015, 03:34 PM

Thank you very much for your answer!!!
So, should I Z-transform the FPKM values or read counts? After I transform, should i also scale before clustering?

Silvia

**DPCook** · 11-25-2015, 04:35 PM

No problem!

Transform one of the values that takes into account total number of reads (ie. FPKM or TPM) so that differences in sequencing depth between samples don't affect things.

The Z-transformation is the scaling that you do prior to clustering. So after that, all you have to do is generate the heatmap/clustering. If you were using a scale option built into the heatmap function (something like heatmap(scale="row"....), you no longer need to.

I don't know how your data is currently set up, but ideally you can get to a matrix where each column name represents a single cancer sample, and each row represents a single gene. From there, I would use something like the following:

Code:

library(gplots) #I like the heatmap.2 function, but not necessary

#Our matrix is stored in a variable called 'mat'
z.mat <- t(scale(t(mat), center=TRUE, scale=TRUE) #Z-transform--the scale function transforms column values, but we want to transform our rows, hence the double transpositions
heatmap.2(z.mat, dendrogram="both", scale="none", trace="none")

And adjust the options for the heatmap as you see fit. I'm not sure that this will be the magic fix you're looking for, but hopefully it helps!

David

**silviajserrano** · 11-25-2015, 07:01 PM

ok!!! Nice!!
So.. (I'm sorry to repeat everything again)
When I do the Z-transformation there is no need of log-transformation?
After this transformation there is no need to scale again

These are the commands I have been using

install.packages("gplots")
install.packages("RColorBrewer")
library (gplots)
library (RColorBrewer)

x1 <- read.csv("x.csv", header = T, row.names = 1)
log_x1=log1p(x2)
todos <- as.matrix(log_x1)
myheatcol <- rev(redgreen(75))
heatmap.2(todos, key=T, symkey=F, trace="none", scale="column", dendrogram = c("column"), Rowv=T, keysize=2, cexRow=0.5, cexCol=0.5, col=myheatcol)

**DPCook** · 11-26-2015, 03:55 AM

Yeah, log transformation may be a good thing to do prior to Z-score calculation

Because you have dendrogram="column", I'm guessing you have samples in the columns, right? I notice two things about the heatmap.2 line that may be messing things up a bit. First, you have scale="column", so assuming samplesIDs are along the columns, you'll be doing calculating Z-scores within one sample, and across all genes. We want to scale the rows, calculating Z-score within genes, but across samples. This is because you want these two hypothetical genes to be weighed the same when clustering:
-Gene A: Mean FPKM=30, SD=5, FPKM in Sample A=40 (Z-score=2)
-Gene B: Mean FPKM=200, SD=30 FPKM in Sample A=260 (Z-score=2)
So relative to the underlying distribution of Gene A and B, you can see that they are equally enriched in Sample A. But if you don't do the transformation, Gene B will have more of an affect on the clustering because the difference in absolute FPKM is much more than the difference of Gene A.

The second thing is that heatmap.2 clusters prior to its internal scaling option. So if these large difference described above are present, it will throw clustering regardless of whether you use the scale option or not. That's why I recommend scaling prior to using heatmap.2.

Try the following changes and let me know how it goes:

Code:

x1 <- read.csv("x.csv", header = T, row.names = 1)
log_x1=log1p(x2)
todos <- as.matrix(log_x1)
[B]z.tudos <- t(scale(t(tudos), scale=TRUE, center=TRUE) #Z transformation[/B]
myheatcol <- rev(redgreen(75))
heatmap.2([B]z.todos[/B], key=T, symkey=F, trace="none", scale="[B]none[/B]", dendrogram = c("column"), Rowv=T, keysize=2, cexRow=0.5, cexCol=0.5, col=myheatcol)

**gringer** · 11-27-2015, 02:13 AM

Look at the density plot of your expression values:

Code:

plot(density(todos));

If it looks vaguely like a normal distribution, then a heatmap will probably look good, as long as you're not scaling any values -- I notice that you *are* scaling by column; don't do that. I think heatmap by default scales by row, which also doesn't look good.

I've got the best success with heatmaps (and PCAs) by using the VST transformation of DESeq2, but first filtering out genes with low counts, and also normalising by transcript length (I call it VSTPk).

**silviajserrano** · 12-02-2015, 01:39 PM

I did the plot and the distribution looks normal, I'm trying now with the heatmap. I think I will try to analyze my data using DEseq2 because i used RSEM but I would like to try using different methods.. DEseq2 allow yoy to filter genes with low counts?

Originally posted by gringer View Post

Look at the density plot of your expression values:

Code:

plot(density(todos));

If it looks vaguely like a normal distribution, then a heatmap will probably look good, as long as you're not scaling any values -- I notice that you *are* scaling by column; don't do that. I think heatmap by default scales by row, which also doesn't look good.

I've got the best success with heatmaps (and PCAs) by using the VST transformation of DESeq2, but first filtering out genes with low counts, and also normalising by transcript length (I call it VSTPk).

**gringer** · 12-02-2015, 01:42 PM

For generating my VSTPk matrix, I did a manual low-count filtering first before feeding into DESeq2 to generate the matrix.

For differential expression, DESeq2 does it's own filtering to deal with low counts, but bear in mind that the model requires you to be using real gene-level counts, rather than estimated (e.g. isoform) counts. This probably also applies to the VST matrix. The results you get from feeding RSEM/cufflinks results into DESeq2 may not be statistically sound.

**silviajserrano** · 12-02-2015, 01:52 PM

Heyy!!
I'm trying to do the changes you suggested but when i do the Z-transformation it appears "+" symbol after
z.todos <- t(scale(t(todos), scale=TRUE, center=TRUE) #Z transformation

> z.todos <- t(scale(t(todos), scale=TRUE, center=TRUE) #Z transformation
+

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 25 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 24 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

RNA-seq data and HEATMAP

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News