SEQanswers


cpleis 11-08-2012 08:32 AM

Heatmap of RNA-Seq Data in R
 
Hello,
I have a large RNA sequencing data set and I am trying to make a heatmap of it, but I am having issues formatting the figure. The data set contains the log2 fold change for over 600 genes across 4 treatments. My csv file is formatted as such:
Gene    Drought       Ozone         Temp1        Temp2
Glyma#  -0.130545875  -0.098349739  0.170508007  0.091996284
....
So far I have gotten an image, but I can't seem to get the gene names to display properly. Here is my code:

library(gplots)  # provides heatmap.2() and redgreen()

heatdata <- read.csv("logFC_bin17.csv", sep=",")
heatdata <- heatdata[,2:5]  # keep the four treatment columns
heatdata_matrix <- data.matrix(heatdata)
rownames(heatdata_matrix) <- paste("Gene", 2:655)
jpeg("Heatmap_bin17.jpeg", width=8, height=8, units="in", res=300, quality=100)
data_heatmap <- heatmap.2(heatdata_matrix, col=redgreen(75), scale="row",
    key=TRUE, symkey=FALSE, density.info="none", trace="none",
    margins=c(10,10), labRow=rownames(heatdata_matrix), cexRow=0.5)
dev.off()

Everything about my image is fine except the right axis with the gene ID labels is blurred together (I am having issues uploading the image). It looks like 4 thin rectangles covering up the data. I want it to look like this: http://www.r-bloggers.com/r-heatmaps-with-gplots/

Any suggestions for how to modify the code would be great. I've done a lot of R searches and can't seem to find any modifications that fix my data.
Thanks!

usad 11-08-2012 01:24 PM

It is difficult to say what you mean by these boxes.
However, if you have 600 genes, it will be nigh impossible to display them all on a screen. Let's assume each gene label needs to be 5 pixels high; displaying 600 genes would then require 3000 pixels in the y-direction, much more than even a 'retina' MacBook has. Even with the absolute minimum of 4 pixels (which looks really bad, just google 4px font; I am not sure if R has such a font) you still would not be able to display this correctly. (And in your jpeg you would only just manage it: 8 inches high at 300 dpi = 2400 pixels = 4 * 600.)

If you are fine with panning and zooming around in your jpeg, you might try the different cex parameters. (But increase the image size, and preferably specify it in pixels, as this is easier to calculate.)
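
The pixel arithmetic above can be turned into a small helper. This is only a sketch: heatmap_height_px() is a hypothetical function, and the 10 pixels per row and the 600 pixel allowance for key, dendrogram and margins are assumed values, not anything from the original post.

```r
# Hypothetical helper: device height in pixels so that each row label gets
# `px_per_row` pixels, plus a fixed allowance for key, dendrogram and margins.
heatmap_height_px <- function(n_rows, px_per_row = 10, extra_px = 600) {
  n_rows * px_per_row + extra_px
}

n_genes <- 654                           # rows labelled "Gene 2" .. "Gene 655"
height_px <- heatmap_height_px(n_genes)  # 654 * 10 + 600 = 7140
height_px
```

The result could then be passed to a pixel-sized device, for example jpeg("Heatmap_bin17.jpeg", width = 2400, height = height_px, units = "px"), before calling heatmap.2(). How readable the labels end up still depends on cexRow, so some trial and error is likely needed.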


OTOH the soy gene you give as an example is probably not even worth showing (hardly any change; is it significant?), so you might want to look at some more filtering.
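
One possible form of such filtering, as a rough sketch: keep only genes whose absolute log2 fold change reaches a cutoff in at least one treatment. The toy matrix and the cutoff of 1 (a two-fold change) are assumptions for illustration only; a real analysis would also take significance into account.

```r
# Toy stand-in for the log2 fold-change table (values are made up).
set.seed(1)
logfc <- matrix(rnorm(40, sd = 0.5), nrow = 10,
                dimnames = list(paste0("gene", 1:10),
                                c("Drought", "Ozone", "Temp1", "Temp2")))

# Keep genes whose largest absolute log2FC in any treatment reaches the
# cutoff. The value 1 is an illustrative threshold, not a recommendation.
min_abs_logfc <- 1
keep <- apply(abs(logfc), 1, max) >= min_abs_logfc
logfc_filtered <- logfc[keep, , drop = FALSE]

nrow(logfc_filtered)  # number of genes that survive the filter
```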


Cheers
b

Gig77 11-09-2012 02:12 PM

Try to increase the height of your jpeg and see what you get.

SHeaph 01-18-2013 06:19 AM

Hi cpleis,

I was just wondering if you were able to resolve this issue? I am having similar difficulties at the moment: I have just over 200 hits and the gene ID labels are also blurred. I am new to cummeRbund and R, and I created the map with the following command:

> h.rep<-csHeatmap(myGenes,cluster='both',replicates=T)
> h.rep

Any help would be great!
Many thanks!

cpleis 01-18-2013 07:15 AM

1 Attachment(s)
Hi SHeaph,
Unfortunately I was not able to resolve the issue. I think that the heat map function in R isn't able to resolve such a large number of data labels. I simply switched to showing the functional groups instead of individual genes and the heat map turned out fine. I've attached the final image I created and the code I used to create it (below).

**Using logFC data for only sig genes (FDR < 0.05)

install.packages("gplots")  # only needed once
library(gplots)

# First column holds the functional bin names; use it for the row labels
heatdata <- read.csv("Avg_log2FC_allbins_sig.csv", row.names=1)
heatdata_matrix <- data.matrix(heatdata[,1:4])  # the four data columns
pdf("Heatmap_all.pdf", width=10, height=5, paper="a4r")
data_heatmap <- heatmap.2(heatdata_matrix, col=redblue(75), scale="row",
    key=TRUE, symkey=FALSE, density.info="none", trace="none",
    margins=c(10,10), labRow=rownames(heatdata_matrix), cexRow=0.5)
dev.off()

If you do find another solution let me know!

Courtney

SHeaph 01-18-2013 07:58 AM

That's great! Thanks for your help.

Stephen

jp. 07-28-2013 05:23 PM

Hi cpleis,
First of all, thank you for your kind reply here about the heatmap.
I would like to know about the .csv file: how did you make it?
I have Cufflinks output, which I used to make a csHeatmap, and I also used gplots after normalization with cuffdiff. Your heatmap looks great; however, I cannot find the .csv.
Could you please explain how to make a .csv file that can be used for generating a heatmap?
Thank you

cpleis 07-30-2013 08:21 AM

1 Attachment(s)
JP,
Below is the format for the csv file that corresponds to the attached heat map. I actually used SAS to get the significant gene list, then calculated the log2 fold change. Then I took my functional bin file and averaged the log2FC for all genes in each functional bin. For you this may be different if you have a small enough gene list to put them all into R for the heat map. I averaged the log2FC per functional bin in Excel using the AVERAGEIF function. The csv file I used in the R code is shown below (first few columns):

BINS                 Temp          O3           Dri
Photosynthesis       -0.072744049  -0.08157151  -0.058550079
Major Carbohydrates   0.388367819   0.024638472  0.055159953
Minor Carbohydrates   0.122830981   0.148760853 -0.048417891
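
For anyone who would rather do the per-bin averaging in R than in Excel, a rough equivalent of AVERAGEIF is aggregate(). The sketch below uses a made-up per-gene table. The column names follow the csv above, but the values and the gene-to-bin assignments are invented for illustration.

```r
# Made-up per-gene table: functional bin plus log2FC per treatment.
genes <- data.frame(
  BINS = c("Photosynthesis", "Photosynthesis", "Major Carbohydrates",
           "Major Carbohydrates", "Minor Carbohydrates"),
  Temp = c(-0.10, -0.04, 0.50, 0.30, 0.12),
  O3   = c(-0.09, -0.07, 0.02, 0.04, 0.15),
  Dri  = c(-0.06, -0.05, 0.05, 0.07, -0.05)
)

# Mean log2FC per bin and treatment (the R counterpart of AVERAGEIF).
bin_means <- aggregate(cbind(Temp, O3, Dri) ~ BINS, data = genes, FUN = mean)
bin_means

# The result could then be written out for the heatmap code, e.g.
# write.csv(bin_means, "Avg_log2FC_allbins_sig.csv", row.names = FALSE)
```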

I hope this helps!
Courtney

jmw86069 08-01-2013 06:20 AM

A couple simple things to try that may help:
1. Creating a PDF instead of a JPEG will store a vector based image, which will not be blurry when you zoom in. You do have to adjust the font size accordingly, which for 600 genes makes it extremely small. But if you have to, you can zoom in and read the labels.
2. Run a quick k-means clustering on the data, then draw heatmaps of each cluster separately. I see better results using Kmeans() from the amap package, since it can use correlation as a distance metric. Default R kmeans() does not offer correlation, though you can work around that.
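
Both suggestions can be combined in a short sketch. It uses only base R so that it runs without extra packages: base kmeans() on row-standardized data as one possible workaround for the missing correlation metric, and base heatmap() in place of heatmap.2(). The toy matrix, the cluster count of 3 and the output file name are assumptions for illustration.

```r
set.seed(42)
# Toy matrix standing in for the log2FC table: 60 genes x 4 treatments.
m <- matrix(rnorm(240), nrow = 60,
            dimnames = list(paste0("gene", 1:60),
                            c("Drought", "Ozone", "Temp1", "Temp2")))

# Workaround for correlation with base kmeans(): after z-scoring each row,
# squared Euclidean distance between two rows is proportional to
# 1 - Pearson correlation.
z <- t(scale(t(m)))

k <- 3                                   # illustrative cluster count
cl <- kmeans(z, centers = k, nstart = 10)

# One heatmap page per cluster, written to a single vector PDF.
out <- file.path(tempdir(), "heatmap_clusters.pdf")
pdf(out, width = 7, height = 10)
for (i in seq_len(k)) {
  sub <- m[cl$cluster == i, , drop = FALSE]
  if (nrow(sub) >= 2) {                  # heatmap() needs at least 2 rows
    heatmap(sub, scale = "row", margins = c(8, 8), cexRow = 0.6,
            main = paste("Cluster", i))
  }
}
dev.off()
```

With fewer rows per page the labels have room to be drawn at a readable size, and the PDF stays sharp when zooming in.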

Note that when you make a heatmap in R, or use any function that ultimately calls the R image() function, if you have more rows of data than pixels to display them, the heatmap cells tend to overwrite each other. There is an option "useRaster=TRUE" which creates a rasterized image and then uses image resizing to shrink it down. It works better for data sent to a PDF than for data drawn onscreen, though. It also doesn't account for asymmetric matrices, but again you can work around that if you have to.
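
A minimal sketch of the useRaster option, calling image() directly on a toy matrix with many more rows than the page has room for. The matrix size, colours and file name are assumptions for illustration.

```r
set.seed(7)
# Toy matrix with far more rows than can be drawn as separate cells.
big <- matrix(rnorm(5000 * 4), nrow = 5000)

out <- file.path(tempdir(), "raster_heatmap.pdf")
pdf(out, width = 4, height = 8)
# image() puts matrix rows along the x axis, so transpose to get genes on
# the y axis. useRaster = TRUE draws the cells as one bitmap instead of
# thousands of individual rectangles.
image(t(big), useRaster = TRUE, axes = FALSE,
      col = colorRampPalette(c("blue", "white", "red"))(75))
dev.off()
```

Base heatmap() forwards extra arguments to image(), so the same option can usually be given there as well.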

