  • Summarizing PCA in DESeq2

    I am interested in knowing the proportion of variance that each component describes in the Principal Component Analysis in DESeq2. I have successfully run the rlog transformation and the variance-stabilizing transformation (varianceStabilizingTransformation), and used plotPCA to see the clustering of my samples. Now I would like the standard deviation, proportion of variance, and cumulative proportion for this PCA, similar to the summary you get if you run:

    > pca <- princomp(data, scores=TRUE, cor=TRUE)
    > summary(pca)

    Any suggestions for getting this information, or for converting the rld SummarizedExperiment into a regular data frame or matrix so that I can run princomp and summary as usual?
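One thing worth knowing before trying princomp on RNA-seq data: princomp treats rows as observations and refuses to run when there are fewer observations (samples) than variables (genes), which is the usual situation here. prcomp, which works through a singular value decomposition, has no such restriction, and summary() on its result prints the standard deviation / proportion of variance / cumulative proportion table asked about above. A minimal sketch, assuming a simulated matrix `mat` as a hypothetical stand-in for the transformed values (in practice something like assay(rld)):

```r
# Simulated stand-in for assay(rld): 2000 genes (rows) x 6 samples (columns).
# Sizes and values are hypothetical; real data would come from assay(rld).
set.seed(1)
mat <- matrix(rnorm(2000 * 6, mean = 8), nrow = 2000,
              dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:6)))

# princomp() treats rows as observations, so samples must be rows: t(mat).
# With 6 samples and 2000 genes it stops with an error, caught here.
res <- tryCatch(princomp(t(mat)), error = function(e) conditionMessage(e))
print(res)

# prcomp() has no "more units than variables" requirement
pca <- prcomp(t(mat))

# Standard deviation, proportion of variance, cumulative proportion per PC
summary(pca)
```

For comparison with princomp(..., cor=TRUE), which standardizes each variable, the closest prcomp analogue is prcomp(t(mat), scale. = TRUE); the call above leaves the variables unscaled.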

  • #2
    hi,

    In the vignette, we have:

    The two functions return SummarizedExperiment objects, as the data are no longer counts. The assay function is used to extract the matrix of normalized values.



    • #3
      Hi,

      I'm not sure I understand your answer, Michael...

      I'm having the same issue: the PCA plot is fine (and quite nice in my case!), but I really want to get the percentage contribution of PC1 and PC2, as I do with every other PCA (unrelated to transcriptomics) that I perform. The DESeq2 package has to calculate it at some point to draw the graph, but I can't find a way to access it...

      Plus, I'd love to be able to draw a 3D PCA plot (PC1, PC2, PC3), but I can't find any information on that in the DESeq2 user's guide.

      Any thoughts? Thank you!



      • #4
        hi Pauline,

        The previous question was how to get a matrix of values from the SummarizedExperiment. The answer is:

        Code:
        mat <- assay(rld)   # matrix of transformed values: genes in rows, samples in columns
        Your question is more of a general R question: once you have a matrix, how do you get the contribution of each PC?

        Inside the plotPCA function we have code similar to the following (with the 'select' variable used to pick out the top genes by variance):

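        Code:
        ntop = 500   # number of top-variance genes to keep (plotPCA's default)
        rv = apply(mat, 1, var)   # variance of each gene (row) across samples
        select = order(rv, decreasing=TRUE)[seq_len(min(ntop, length(rv)))]   # indices of the ntop most variable genes
        pca = prcomp(t(mat[select,]))   # transpose so that samples are the rows (observations)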
        Check the help file (?prcomp). This base R function returns a list containing the results of the PCA. You are interested in the standard deviations of the components:

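        Code:
        variances = pca$sdev ^ 2          # variance captured by each PC
        total.variance = sum(variances)   # total variance across all PCs
        variances/total.variance          # proportion of variance explained by each PC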
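To get exactly the table the original question asked for, summary() can be called on the prcomp result: its $importance matrix holds the standard deviation, proportion of variance and cumulative proportion for each PC. A sketch on simulated data (the matrix `mat` and its size are hypothetical stand-ins for assay(rld)), checking that summary() agrees with the hand calculation above:

```r
set.seed(42)
# Hypothetical stand-in for assay(rld): 1000 genes (rows) x 8 samples (columns)
mat <- matrix(rnorm(1000 * 8, mean = 8), nrow = 1000)

# Same steps as above: keep the ntop most variable genes, then run prcomp
ntop <- 500
rv <- apply(mat, 1, var)
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
pca <- prcomp(t(mat[select, ]))

# Proportion of variance computed by hand
prop <- pca$sdev^2 / sum(pca$sdev^2)

# summary() reports standard deviation, proportion of variance and
# cumulative proportion per PC (the last two rounded to 5 decimals)
imp <- summary(pca)$importance
print(imp)

# The two agree up to summary()'s rounding
all.equal(unname(imp["Proportion of Variance", ]), prop, tolerance = 1e-4)
```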
        I don't have a recommendation on how to make 3D plots (I find it hard to see what's going on in these).
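If the aim is to look at PC3 as well, one 2D alternative (not something suggested in this thread) is a scatterplot matrix of the first three PCs with base R's pairs(), which shows PC1 vs PC2, PC1 vs PC3 and PC2 vs PC3 side by side. A sketch on simulated data; `mat`, `group` and the injected group effects are hypothetical:

```r
set.seed(7)
# Hypothetical stand-in for assay(rld): 500 genes x 9 samples in 3 groups
mat <- matrix(rnorm(500 * 9), nrow = 500)
group <- factor(rep(c("A", "B", "C"), each = 3))
# Inject simulated group differences so the PCs have some structure
mat[1:50, group == "B"] <- mat[1:50, group == "B"] + 2
mat[51:100, group == "C"] <- mat[51:100, group == "C"] + 2

pca <- prcomp(t(mat))
pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Sample coordinates (scores) on the first three PCs
scores <- pca$x[, 1:3]
labels <- paste0("PC", 1:3, " (", pct[1:3], "%)")

# Every pairwise view of PC1-PC3, points coloured by group
pairs(scores, labels = labels, col = as.integer(group), pch = 19)
```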



        • #5
          OK! Then I get it.
          Thank you for deciphering the "inside" of the plotPCA function. I got the contribution of each PC. And if I understand the R code you gave me and my R output well enough, the following output should be the proportion of variation explained by PC1 to PC9 (please do correct me if I'm wrong!)

          > variances/total.variance
          [1] 7.254373e-01 1.269088e-01 8.017268e-02 3.342993e-02 1.212378e-02 1.140305e-02 6.017477e-03 4.507064e-03 1.299178e-31
          >
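These are proportions, so multiplying by 100 gives percentages: PC1 explains about 72.5% and PC2 about 12.7% of the variance among the genes passed to prcomp. The tiny last value (about 1e-31) is expected rather than a problem: with 9 samples, the centred data span at most 8 dimensions, so the 9th PC carries no variance beyond floating-point noise. A small sketch that tabulates the values posted above as percentages and cumulative percentages:

```r
# Proportions of variance for PC1 to PC9, copied from the output above
prop <- c(7.254373e-01, 1.269088e-01, 8.017268e-02, 3.342993e-02,
          1.212378e-02, 1.140305e-02, 6.017477e-03, 4.507064e-03,
          1.299178e-31)

pct <- round(100 * prop, 2)          # percentage per PC
cum <- round(100 * cumsum(prop), 2)  # cumulative percentage

tab <- data.frame(PC = paste0("PC", seq_along(prop)),
                  percent = pct, cumulative = cum)
print(tab)
```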

          I think I'm finally going to enjoy my first "on-my-own" RNA-seq analysis now ;-)



          • #6
            The simplest solution to your problems with DESeq2's plotPCA function?
            Don't use it.
            Use the PCA and plot.PCA functions in FactoMineR.
            I love the graph I got. Much more informative.



            • #7
              The problem is that I already tried FactoMineR, because it's the one I use for morphometrics-related PCAs, but my count table is too big. I would need to run it on a better computer than my office computer, but the waiting list for the clusters we use for huge jobs was too long... :-)



              • #8
                Hi, Dr. Love,

                Thank you very much for your explanation.

                I just checked the plotPCA function in DESeq2.

                Your code there is rv=rowVars(assay(x)), which is different from what you posted here.

                I have already calculated:

                vsd=varianceStabilizingTransformation(dds)

                When I ran rv=rowVars(assay(vsd)), I got the error message: could not find function "rowVars"

                I could still get the PCA plot, though.

                Could you please explain why?

                Thanks



                • #9
                  rowVars is from genefilter, so just use "genefilter::rowVars".

                  Note that he wrote that the code he posted was "similar", not identical, to that in plotPCA. The rv variable is just used to subset things for computing the PCA. The PCA will work just fine without subsetting.
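If genefilter is not loaded, the same row variances can also be computed in base R. A minimal sketch; `row_vars` is a hypothetical helper (not a DESeq2 or genefilter function) and `mat` is a simulated stand-in for assay(vsd). It also runs the PCA with and without the top-variance subset, to illustrate that both work:

```r
# Hypothetical helper: row variances in base R, with the n - 1 denominator,
# so it matches var() and apply(m, 1, var)
row_vars <- function(m) {
  n <- ncol(m)
  rowSums((m - rowMeans(m))^2) / (n - 1)
}

set.seed(3)
mat <- matrix(rnorm(200 * 6), nrow = 200)  # stand-in for assay(vsd)

rv <- row_vars(mat)
stopifnot(isTRUE(all.equal(rv, apply(mat, 1, var))))

# PCA on the ntop most variable genes (as plotPCA does) ...
ntop <- 50
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
pca_top <- prcomp(t(mat[select, ]))

# ... and on all genes, without subsetting
pca_all <- prcomp(t(mat))
```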
