  • Heatmap of RNA-Seq Data in R

    Hello,
    I have a large data set from RNA sequencing and I am trying to make a heatmap of my data. I am having issues formatting my heatmap figure. My data set is large, with the log2 fold change for over 600 genes across 4 treatments. My csv file is formatted as such:
    Gene Drought Ozone Temp1 Temp2
    Glyma# -0.130545875 -0.098349739 0.170508007 0.091996284
    ....
    So far I have gotten an image, but I can't seem to get the gene names to
    display properly. Here is my code:

    heatdata <- read.csv("logFC_bin17.csv", sep=",")
    heatdata <- heatdata[,2:5]
    heatdata_matrix <- data.matrix(heatdata)
    rownames(heatdata_matrix) = paste("Gene", 2:655)
    jpeg("Heatmap_bin17.jpeg", width=8, height=8, units="in", res=300,
    quality=100)
    data_heatmap <- heatmap.2(heatdata_matrix, col=redgreen(75), scale="row",
    key=TRUE, symkey=FALSE, density.info="none", trace="none", margins=c(10,10), labRow =rownames(heatdata_matrix),cexRow=0.5)
    dev.off()

    Everything about my image is fine except the right axis with the gene ID labels is blurred together (I am having issues uploading the image). It looks like 4 thin rectangles covering up the data. I want it to look like this: http://www.r-bloggers.com/r-heatmaps-with-gplots/

    Any suggestions for how to modify the code would be great. I've done a lot of R searches and can't seem to find any modifications that fix my data.
    Thanks!

  • #2
    It is difficult to say what you mean by these boxes.
    However, if you have 600 genes, it will be nigh impossible to display them on a screen. Let's assume each gene label needs to be 5 pixels high; displaying 600 genes would then require 3000 pixels in the y-direction, much more than even a 'retina' MacBook has. Even using the absolute minimum of 4 pixels (which looks really bad, just google "4px font"; I am not sure if R has such a font), you still wouldn't be able to display this correctly. (Your JPEG, at 8 inches high and 300 dpi, gives you just 2400 pixels = 4 * 600, with nothing left over for margins.)

    If you are fine with panning and zooming around in your jpeg you might try the different cex parameters. (But increase the size, and best use pixels as image size, as this is easier to calculate)
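A rough sketch of that pixel arithmetic in R, in case it helps. The helper heatmap_height_px, its default of 16 px per label, the 800 px allowance for the dendrogram, key and margins, and the simulated matrix are all assumptions for illustration, not values from the original data. The plot is only drawn if a PNG device and gplots are available.

```r
# Hypothetical helper (not part of gplots): device height in pixels
# needed to give each of n_rows labels px_per_label pixels.
heatmap_height_px <- function(n_rows, px_per_label = 16, extra_px = 800) {
  n_rows * px_per_label + extra_px
}

# The JPEG from the question: 8 in at 300 dpi gives 2400 px,
# i.e. only 4 px per label for 600 genes.
available_px <- 8 * 300

# Simulated stand-in for the log2 fold-change table (654 genes x 4 treatments).
set.seed(1)
demo <- matrix(rnorm(654 * 4), nrow = 654,
               dimnames = list(paste0("Gene", seq_len(654)),
                               c("Drought", "Ozone", "Temp1", "Temp2")))
height_px <- heatmap_height_px(nrow(demo))

if (capabilities("png") && requireNamespace("gplots", quietly = TRUE)) {
  out_file <- file.path(tempdir(), "heatmap_tall.png")
  # Size the device in pixels so the per-label budget is easy to reason about.
  png(out_file, width = 2400, height = height_px, units = "px", res = 150)
  gplots::heatmap.2(demo, col = gplots::redgreen(75), scale = "row",
                    density.info = "none", trace = "none",
                    margins = c(10, 10), cexRow = 0.6,
                    lhei = c(1, 12))  # keep the colour key small on a tall page
  dev.off()
}
```

The resulting image is far taller than a screen, so it is meant for panning and zooming; px_per_label and cexRow would need tuning together for real data.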


    OTOH, the soy gene you give as an example is probably not even worth showing (hardly any change; is it significant?), so you might want to look at some more filtering.


    Cheers
    b

    • #3
      Try to increase the height of your jpeg and see what you get.

      • #4
        Hi cpleis,

        I was just wondering if you were able to resolve this issue? I am having similar difficulties at the moment: I have just over 200 hits and the gene ID labels are also blurred. I am new to cummeRbund and R, and I created the map with the following commands:

        > h.rep<-csHeatmap(myGenes,cluster='both',replicates=T)
        > h.rep

        Any help would be great!
        Many thanks!

        • #5
          Hi SHeaph,
          Unfortunately I was not able to resolve the issue. I think that the heat map function in R isn't able to resolve such a large number of data labels. I simply switched to showing the functional groups instead of individual genes and the heat map turned out fine. I've attached the final image I created and the code I used to create it (below).

          **Using logFC data for only sig genes (FDR < 0.05)

          install.packages("gplots")
          library(gplots)
          source("http://bioconductor.org/biocLite.R")
          biocLite("ALL")

          heatdata <- read.csv("Avg_log2FC_allbins_sig.csv", sep=",")
          heatdata <- heatdata[,2:5]
          heatdata_matrix <- data.matrix(heatdata)
          pdf("Heatmap_all.pdf", width=10, height=5, paper="a4r")
          data_heatmap <- heatmap.2(heatdata_matrix, col=redblue(75), scale="row",
          key=TRUE, symkey=FALSE, density.info="none", trace="none", margins=c(10,10), labRow =rownames(heatdata_matrix),cexRow=0.5)
          axis(4,
          at=1:NROW(heatdata_matrix),
          labels=rownames(heatdata_matrix)[data_heatmap$rowInd],
          cex.axis=0.5)
          dev.off()

          If you do find another solution let me know!

          Courtney

          • #6
            That's great! Thanks for your help.

            Stephen

            • #7
              Hi cpleis
              First of all, thank you for your kind reply here about the heatmap.
              I would like to know about the .csv file: how did you make it?
              I have Cufflinks output, which I used to make a csHeatmap, and I also used gplots after cuffdiff normalization. Your heatmap looks great, but I cannot find the .csv file.
              Could you please explain how to make a .csv file that can be used for generating the heatmap?
              Thank you

              • #8
                JP,
                Below is the format for the csv file that corresponds to the heat map attached. I actually used SAS to get the significant gene list, then calculated the log2 fold change. Then I took my functional bin file and averaged the log2FC for all genes in each functional bin. For you this may be different if you have a small enough gene list to put them all into R for the heat map. I averaged the log2FC per functional bin in Excel using the AVERAGEIF function. The csv file I used in the R code is shown below (first few columns):

                BINS                 Temp          O3            Dri
                Photosynthesis       -0.072744049  -0.08157151   -0.058550079
                Major Carbohydrates   0.388367819   0.024638472   0.055159953
                Minor Carbohydrates   0.122830981   0.148760853  -0.048417891
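For anyone who would rather do the per-bin averaging in R than with AVERAGEIF, here is a minimal sketch of one way to do it. The per-gene table, its column names (Gene, BIN) and all values are made up for illustration; only the bin names and treatment headers follow the table above. It averages log2FC per bin with aggregate(), writes a csv in the same layout, and reads it back with the bin names as row names so they can serve as heatmap labels.

```r
# Hypothetical per-gene table: one row per significant gene, with its
# functional bin and log2 fold change per treatment (values are made up).
genes <- data.frame(
  Gene = paste0("gene", 1:6),
  BIN  = c("Photosynthesis", "Photosynthesis",
           "Major Carbohydrates", "Major Carbohydrates",
           "Minor Carbohydrates", "Minor Carbohydrates"),
  Temp = c(-0.2,  0.0, 0.3, 0.5,  0.1, 0.1),
  O3   = c(-0.1, -0.1, 0.1, 0.3,  0.2, 0.0),
  Dri  = c( 0.0, -0.2, 0.1, 0.1, -0.1, 0.1)
)

# Equivalent of AVERAGEIF per bin: mean log2FC of every treatment column.
bin_means <- aggregate(cbind(Temp, O3, Dri) ~ BIN, data = genes, FUN = mean)

# Write the averaged table in the same layout as the csv shown above.
csv_file <- file.path(tempdir(), "avg_log2FC_bins_demo.csv")
write.csv(bin_means, csv_file, row.names = FALSE)

# Read it back with the first column as row names, so the bin names
# are carried along as the matrix row labels.
heat <- read.csv(csv_file, row.names = 1)
heat_matrix <- as.matrix(heat)
print(heat_matrix)
```

A matrix built this way can be passed to heatmap.2 with labRow = rownames(heat_matrix).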

                I hope this helps!
                Courtney

                • #9
                  A couple simple things to try that may help:
                  1. Creating a PDF instead of a JPEG will store a vector based image, which will not be blurry when you zoom in. You do have to adjust the font size accordingly, which for 600 genes makes it extremely small. But if you have to, you can zoom in and read the labels.
                  2. Run a quick k-means clustering on the data, then draw heatmaps of each cluster separately. I see better results using Kmeans() from the amap package, since it can use correlation as a distance metric. Default R kmeans() does not offer correlation, though you can work around that.
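The two suggestions above can be sketched together roughly as follows. This is only an illustration under some assumptions: the data are simulated, the choice of 6 clusters is arbitrary, and instead of Kmeans() from amap it uses the base kmeans() on row-wise z-scores as the workaround, since the squared Euclidean distance between z-scored rows is proportional to 1 minus their Pearson correlation. The PDF is only written if gplots is installed.

```r
# Simulated stand-in for a 600-gene x 4-treatment log2FC matrix.
set.seed(42)
demo <- matrix(rnorm(600 * 4), nrow = 600,
               dimnames = list(paste0("Gene", seq_len(600)),
                               c("Drought", "Ozone", "Temp1", "Temp2")))

# Row-wise z-scores: Euclidean k-means on these rows behaves like
# clustering on correlation distance.
z <- t(scale(t(demo)))

# Arbitrary number of clusters for the example.
km <- kmeans(z, centers = 6, nstart = 10, iter.max = 50)
clusters <- split(rownames(demo), km$cluster)

if (requireNamespace("gplots", quietly = TRUE)) {
  # One heatmap per cluster, one page each, in a vector PDF that stays
  # sharp when zoomed.
  pdf(file.path(tempdir(), "heatmap_by_cluster.pdf"), width = 7, height = 10)
  for (cluster_genes in clusters) {
    if (length(cluster_genes) < 2) next  # heatmap.2 needs at least 2 rows
    gplots::heatmap.2(demo[cluster_genes, , drop = FALSE],
                      col = gplots::redgreen(75), scale = "row",
                      density.info = "none", trace = "none",
                      margins = c(8, 8), cexRow = 0.4)
  }
  dev.off()
}
```

With roughly 100 genes per page the labels have a chance of being readable; cexRow and the page height would still need adjusting for real data.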

                  Note that when you make a heatmap in R, or use any function that ultimately calls the R image() function, if you have more rows of data than pixels to display them, the heatmap cells tend to overwrite each other. There is an option, useRaster=TRUE, which creates a rasterized image and then uses image resizing to shrink it down, but it works better for data sent to a PDF than onscreen. It also doesn't account for asymmetric matrices, but again you can work around that if you have to.
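A small sketch of the useRaster option with base image(), assuming simulated data with far more rows than a page could show as separate cells. The matrix size, colour ramp and file name are illustrative choices only.

```r
# Simulated matrix with many more rows than the page can resolve.
set.seed(1)
m <- matrix(rnorm(5000 * 4), nrow = 5000, ncol = 4)

out_file <- file.path(tempdir(), "raster_heatmap.pdf")
pdf(out_file, width = 4, height = 8)
# image() expects z with nrow(z) == length(x) and ncol(z) == length(y),
# so the matrix is transposed to put genes on the y-axis.
image(x = seq_len(ncol(m)), y = seq_len(nrow(m)), z = t(m),
      col = colorRampPalette(c("blue", "white", "red"))(75),
      useRaster = TRUE,  # draw one bitmap instead of 20000 rectangles
      axes = FALSE, xlab = "", ylab = "")
dev.off()
```

useRaster = TRUE requires a regular grid, which the integer x and y positions here provide.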
