Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • RNA-seq data and HEATMAP

    Hello
    I’m very new in this topic and I’m trying to learn how to analyze RNA-seq data. So this is my problem, I have RNA-seq data from 60 breast cancer patients. I assigned each case to a subtype according to the expression of immunohistochemistry markers but I would like (using my RNA-seq data) to cluster the samples according to the expression of 50 genes that have been previously published to distinguish breast cancer subtypes. I have the read counts, FPKM values, TPM values. I’m using R to generate the heatmap. The problem is that when I do the heatmap the clustering is terrible, all subtypes are mixed together.. is not what I’m expecting and I don’t know what I’m doing wrong…

    I would really appreciate your help!!

    Silvia

  • #2
    Hi Silvia,

    What scripts are you using at the moment? That may help narrow in on the problem, if one exists. If you're not, scaling the expression values (eg. Z-score transformation) for each gene can clean up the clustering quite a bit if there are large differences between genes. And if you're scaling with the heatmap function, just make sure that it's not clustering prior to scaling (which heatmap2 does).

    If you have a matrix of FPKM/TPM values with samples in the columns and genes in the rows, the following will do the transformation:
    Code:
    z.mat <- t(scale(t(mat), scale=TRUE, center=TRUE))
    Then use the transformed matrix for the heatmap/clustering

    David

    Comment


    • #3
      Thank you very much for your answer!!!
      So, should I Z-transform the FPKM values or read counts? After I transform, should i also scale before clustering?

      Silvia

      Comment


      • #4
        No problem!

        Transform one of the values that takes into account total number of reads (ie. FPKM or TPM) so that differences in sequencing depth between samples don't affect things.

        The Z-transformation is the scaling that you do prior to clustering. So after that, all you have to do is generate the heatmap/clustering. If you were using a scale option built into the heatmap function (something like heatmap(scale="row"....), you no longer need to.

        I don't know how your data is currently set up, but ideally you can get to a matrix where each column name represents a single cancer sample, and each row represents a single gene. From there, I would use something like the following:

        Code:
        library(gplots) #I like the heatmap.2 function, but not necessary
        
        #Our matrix is stored in a variable called 'mat'
        z.mat <- t(scale(t(mat), center=TRUE, scale=TRUE) #Z-transform--the scale function transforms column values, but we want to transform our rows, hence the double transpositions
        heatmap.2(z.mat, dendrogram="both", scale="none", trace="none")
        And adjust the options for the heatmap as you see fit. I'm not sure that this will be the magic fix you're looking for, but hopefully it helps!

        David

        Comment


        • #5
          ok!!! Nice!!
          So.. (I'm sorry to repeat everything again)
          When I do the Z-transformation there is no need of log-transformation?
          After this transformation there is no need to scale again

          These are the commands I have been using

          install.packages("gplots")
          install.packages("RColorBrewer")
          library (gplots)
          library (RColorBrewer)

          x1 <- read.csv("x.csv", header = T, row.names = 1)
          log_x1=log1p(x2)
          todos <- as.matrix(log_x1)
          myheatcol <- rev(redgreen(75))
          heatmap.2(todos, key=T, symkey=F, trace="none", scale="column", dendrogram = c("column"), Rowv=T, keysize=2, cexRow=0.5, cexCol=0.5, col=myheatcol)

          Comment


          • #6
            Yeah, log transformation may be a good thing to do prior to Z-score calculation

            Because you have dendrogram="column", I'm guessing you have samples in the columns, right? I notice two things about the heatmap.2 line that may be messing things up a bit. First, you have scale="column", so assuming samplesIDs are along the columns, you'll be doing calculating Z-scores within one sample, and across all genes. We want to scale the rows, calculating Z-score within genes, but across samples. This is because you want these two hypothetical genes to be weighed the same when clustering:
            -Gene A: Mean FPKM=30, SD=5, FPKM in Sample A=40 (Z-score=2)
            -Gene B: Mean FPKM=200, SD=30 FPKM in Sample A=260 (Z-score=2)
            So relative to the underlying distribution of Gene A and B, you can see that they are equally enriched in Sample A. But if you don't do the transformation, Gene B will have more of an affect on the clustering because the difference in absolute FPKM is much more than the difference of Gene A.

            The second thing is that heatmap.2 clusters prior to its internal scaling option. So if these large difference described above are present, it will throw clustering regardless of whether you use the scale option or not. That's why I recommend scaling prior to using heatmap.2.

            Try the following changes and let me know how it goes:
            Code:
            x1 <- read.csv("x.csv", header = T, row.names = 1)
            log_x1=log1p(x2)
            todos <- as.matrix(log_x1)
            [B]z.tudos <- t(scale(t(tudos), scale=TRUE, center=TRUE) #Z transformation[/B]
            myheatcol <- rev(redgreen(75))
            heatmap.2([B]z.todos[/B], key=T, symkey=F, trace="none", scale="[B]none[/B]", dendrogram = c("column"), Rowv=T, keysize=2, cexRow=0.5, cexCol=0.5, col=myheatcol)

            Comment


            • #7
              Look at the density plot of your expression values:
              Code:
              plot(density(todos));
              If it looks vaguely like a normal distribution, then a heatmap will probably look good, as long as you're not scaling any values -- I notice that you *are* scaling by column; don't do that. I think heatmap by default scales by row, which also doesn't look good.

              I've got the best success with heatmaps (and PCAs) by using the VST transformation of DESeq2, but first filtering out genes with low counts, and also normalising by transcript length (I call it VSTPk).

              Comment


              • #8
                I did the plot and the distribution looks normal, I'm trying now with the heatmap. I think I will try to analyze my data using DEseq2 because i used RSEM but I would like to try using different methods.. DEseq2 allow yoy to filter genes with low counts?

                Originally posted by gringer View Post
                Look at the density plot of your expression values:
                Code:
                plot(density(todos));
                If it looks vaguely like a normal distribution, then a heatmap will probably look good, as long as you're not scaling any values -- I notice that you *are* scaling by column; don't do that. I think heatmap by default scales by row, which also doesn't look good.

                I've got the best success with heatmaps (and PCAs) by using the VST transformation of DESeq2, but first filtering out genes with low counts, and also normalising by transcript length (I call it VSTPk).

                Comment


                • #9
                  For generating my VSTPk matrix, I did a manual low-count filtering first before feeding into DESeq2 to generate the matrix.

                  For differential expression, DESeq2 does it's own filtering to deal with low counts, but bear in mind that the model requires you to be using real gene-level counts, rather than estimated (e.g. isoform) counts. This probably also applies to the VST matrix. The results you get from feeding RSEM/cufflinks results into DESeq2 may not be statistically sound.
                  Last edited by gringer; 12-02-2015, 01:44 PM.

                  Comment


                  • #10
                    Heyy!!
                    I'm trying to do the changes you suggested but when i do the Z-transformation it appears "+" symbol after
                    z.todos <- t(scale(t(todos), scale=TRUE, center=TRUE) #Z transformation

                    > z.todos <- t(scale(t(todos), scale=TRUE, center=TRUE) #Z transformation
                    +

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Current Approaches to Protein Sequencing
                      by seqadmin


                      Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                      04-04-2024, 04:25 PM
                    • seqadmin
                      Strategies for Sequencing Challenging Samples
                      by seqadmin


                      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                      03-22-2024, 06:39 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 04-11-2024, 12:08 PM
                    0 responses
                    25 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 10:19 PM
                    0 responses
                    28 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 09:21 AM
                    0 responses
                    24 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-04-2024, 09:00 AM
                    0 responses
                    52 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X