  • Interpreting a PCA plot produced with DESeq2

    I've used DESeq2 to analyse a few RNA-Seq samples. I followed the manual pretty closely and used the following code:

    Code:
    library(DESeq2)
    library(ggplot2)

    # Variance-stabilize the counts, then have plotPCA return the PC coordinates
    vsd <- varianceStabilizingTransformation(dds)
    data <- plotPCA(vsd, intgroup=c("condition"), returnData=TRUE)
    percentVar <- round(100 * attr(data, "percentVar"))
    plotPCA <- ggplot(data, aes(PC1, PC2, color=condition)) +
      geom_point(size=3) +
      xlab(paste0("PC1: ",percentVar[1],"% variance")) +
      ylab(paste0("PC2: ",percentVar[2],"% variance")) +
      # sample-name column of `data`; recent DESeq2 releases call it `name` (check names(data))
      geom_text(aes(label=names), hjust=0.25, vjust=-0.5, show.legend=FALSE)
    ggsave("PCA.pdf", plot = plotPCA)
    I created the following PCA plot:



    What I don't understand is this: what are the units recorded on the x/y axes, and what do they mean?
    Last edited by feralBiologist; 02-14-2015, 05:30 PM.

  • #2
    The axes are dimensionless, they have no units.

    • #3
      Originally posted by dpryan
      The axes are dimensionless, they have no units.
      Thanks for your reply! So you are saying the units printed on the x/y axes have no meaning? Do you know how I can edit the code to remove them?

      • #4
        plotPCA <- plotPCA + theme(axis.text.x = element_blank(), axis.text.y=element_blank())

        Or something like that. Check the ggplot2 documentation.

        • #5
          So you are saying the units printed on the x/y axes have no meaning?
          They are the first (x-axis) and second (y-axis) principal components, respectively. It's a projection of your data onto a subspace that captures the maximum variance/minimum error. We can likely say they have no units, but they certainly do have meaning.
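To make "a projection onto a subspace that captures the maximum variance" concrete, here is a minimal base R sketch on simulated data (the matrix `x` and its dimensions are made up for illustration and have nothing to do with the OP's samples). It checks that the PC1 loading vector from `prcomp()` is the leading eigenvector of the covariance matrix, that the variance of the samples' PC1 coordinates equals the largest eigenvalue, and that a random direction does not give more variance.

```r
# Minimal sketch on simulated data; `x` and its sizes are hypothetical.
set.seed(1)
n_samples <- 8
n_genes <- 20
# rows = samples, columns = genes (the orientation prcomp() expects)
x <- matrix(rnorm(n_samples * n_genes), nrow = n_samples)
x[1:4, 1:5] <- x[1:4, 1:5] + 3     # give half the samples a shared shift

pca <- prcomp(x)                   # centers each gene; no scaling by default
eig <- eigen(cov(x), symmetric = TRUE)

# The PC1 loading vector is the leading eigenvector of cov(x), up to sign
pc1 <- pca$rotation[, 1]
same_direction <- isTRUE(all.equal(abs(sum(pc1 * eig$vectors[, 1])), 1))

# The variance of the samples' PC1 coordinates is the largest eigenvalue
var_pc1 <- var(pca$x[, 1])

# Projecting the centered data onto a random unit vector gives no more variance
u <- rnorm(n_genes)
u <- u / sqrt(sum(u^2))
var_random <- var(as.vector(scale(x, scale = FALSE) %*% u))

print(c(same_direction = same_direction, pc1_beats_random = var_random <= var_pc1))
print(c(var_pc1 = var_pc1, top_eigenvalue = eig$values[1]))
```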

          • #6
            Originally posted by Fatt
            They are the first (x-axis) and second (y-axis) principal components, respectively. It's a projection of your data onto a subspace that captures the maximum variance/minimum error. We can likely say they have no units, but they certainly do have meaning.
            Yes, I know these are the principal components. It's just that DESeq2 prints units on these axes (you can check the link to the plot in my first post), and I could not make any sense of them. I have also seen a lot of other PCA plots (presumably produced by other programs) displaying units on the axes, so I wondered what these are; just do a Google image search for "PCA plot" and you will see plenty of graphs with units on the axes. It is the units that I find confusing, not the axes themselves.

            • #7
              I found the answer to this on Cross Validated: it seems the numbers on the axes are the raw component scores. As the components are themselves linear combinations of many genes, it is hard to interpret these raw scores biologically. They are just coordinates in the two-dimensional PC space and simply serve to place the individual samples in that space.
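A small base R sketch of what "raw component scores" means, on a simulated matrix (the sample and gene names are hypothetical): each sample's score is its centered expression profile multiplied by the loading vectors, so the PC1 and PC2 scores are exactly the x/y coordinates drawn on the plot, and each one is a weighted mix of every gene.

```r
# Minimal sketch on simulated data; the matrix and names are hypothetical.
set.seed(42)
# 6 samples (rows) x 10 genes (columns), a stand-in for expression values
x <- matrix(rnorm(60, mean = 5), nrow = 6,
            dimnames = list(paste0("sample", 1:6), paste0("gene", 1:10)))

pca <- prcomp(x)                              # center = TRUE, scale. = FALSE
centered <- scale(x, center = TRUE, scale = FALSE)

# Scores by hand: each sample's centered profile times the loading vectors
scores_by_hand <- centered %*% pca$rotation

# PC1/PC2 scores are the x/y coordinates a PCA plot draws for each sample
print(round(pca$x[, 1:2], 3))

# A PC1 score mixes all genes via these weights (a unit-length vector),
# which is why a single score has no simple biological unit
weights_pc1 <- pca$rotation[, "PC1"]
print(round(sort(weights_pc1), 3))
```

Because the data are centered first, the scores on each PC average to zero, which is why PCA plots are centered around the origin.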

              • #8
                I might be saying this a bit late, but the % of inertia (i.e. total variance) captured by each PC is a very useful "quality control" measure for PCA and should be included in the figure if possible. The axis units, as said above, are not very interpretable.

                BTW, that's a whopping big 1st PC! Nice!
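For reference, here is a sketch of how the "% variance" labels can be computed. It loosely follows my reading of DESeq2's `plotPCA()` source (keep the `ntop = 500` most variable genes, run `prcomp()` with samples as rows, divide each PC's variance by the total); treat that as an assumption and check the version you have installed. The matrix `mat` is simulated and only stands in for `assay(vsd)`.

```r
# Sketch on simulated data; `mat` is a hypothetical stand-in for assay(vsd).
set.seed(7)
n_genes <- 1000
n_samples <- 6
# genes x samples, like assay(vsd)
mat <- matrix(rnorm(n_genes * n_samples, mean = 8), nrow = n_genes)
mat[1:100, 1:3] <- mat[1:100, 1:3] + 2   # a group effect in 100 genes

ntop <- 500
gene_var <- apply(mat, 1, var)           # per-gene variance across samples
select <- order(gene_var, decreasing = TRUE)[seq_len(min(ntop, length(gene_var)))]

pca <- prcomp(t(mat[select, ]))          # transpose so samples are rows
percentVar <- pca$sdev^2 / sum(pca$sdev^2)

print(round(100 * percentVar, 1))        # % of total variance per PC
```

With real data, replacing `mat` by `assay(vsd)` should give first two values close to `attr(data, "percentVar")` from the code in the first post, assuming the same `ntop`.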

                • #9
                  So, what percentage of the total variance should one expect?

                  • #10
                    @student-t: I replied to this on biostars.
