  • DESeq2 PCA Plots

    Hello all,

    I am running DESeq2 like so in R:
    Code:
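    library(DESeq2)
    # 'files' (HTSeq count file names) and 'cond' (one condition label per file) are defined earlier
    sTable <- data.frame(sampleName = files, fileName = files, condition = factor(cond))
    # directory = "." reads the count files from the working directory
    dds <- DESeqDataSetFromHTSeqCount(sampleTable = sTable, directory = ".", design = ~condition)
    dds <- DESeq(dds)
    res <- results(dds)
    resOrdered <- res[order(res$padj), ]
    # blind = TRUE ignores the design, which is the usual choice for QC plots
    rld <- rlogTransformation(dds, blind = TRUE)
    # plotPCA uses the 500 most variable genes by default (ntop = 500)
    print(plotPCA(rld, intgroup = "condition"))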
    The resulting PCA plot does not show the replicates grouping by condition, even though 138 genes have padj < 0.05 between the blue and red conditions.

    I expected the blue replicates to cluster together, and the red replicates as well. Given the fair number of significant genes, I think I am plotting this PCA wrong.

    When I check colData to make sure I am using the right column, I get this:
    Code:
    > colData(dds)
    DataFrame with 6 rows and 2 columns
                                         condition
                                          <factor>
    ID_18_1.bam_sorted.bam_htseq_out.txt      ID18
    ID_18_2.bam_sorted.bam_htseq_out.txt      ID18
    ID_18_3.bam_sorted.bam_htseq_out.txt      ID18
    GP_18_1.bam_sorted.bam_htseq_out.txt      GP18
    GP_18_2.bam_sorted.bam_htseq_out.txt      GP18
    GP_18_3.bam_sorted.bam_htseq_out.txt      GP18
    Is this something to be concerned about or is this the wrong way to plot PCA?

    Thanks in advance
    -R
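A minimal base-R sketch of the PCA step, modelled on how DESeq2's plotPCA is documented to work: it keeps the ntop genes (500 by default) with the highest variance across samples and runs prcomp() on them, with samples as rows. The function name pca_top_genes and the toy matrix are illustrative assumptions, not part of DESeq2. Passing assay(rld) as mat and inspecting res$genes can help check whether the significant genes are even among the genes that enter the plot.

```r
# Sketch modelled on DESeq2::plotPCA's documented behaviour (not its source):
# keep the 'ntop' most variable genes, then run prcomp() with samples as rows.
# 'mat' is assumed to be a genes x samples matrix, e.g. mat <- assay(rld).
pca_top_genes <- function(mat, ntop = 500) {
  rv <- apply(mat, 1, var)                                   # per-gene variance
  keep <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  pca <- prcomp(t(mat[keep, , drop = FALSE]))                # centred, not scaled
  list(
    scores = pca$x,                                          # sample coordinates on PC1, PC2, ...
    var_fraction = pca$sdev^2 / sum(pca$sdev^2),             # fraction of variance per PC
    genes = rownames(mat)[keep]                              # genes that entered the PCA
  )
}

# Toy 3-gene x 4-sample matrix (made-up numbers, for illustration only)
mat <- rbind(g1 = c(1, 2, 10, 11),
             g2 = c(5, 5, 5, 6),
             g3 = c(0, 9, 0, 9))
colnames(mat) <- c("A1", "A2", "B1", "B2")
res <- pca_top_genes(mat, ntop = 2)
res$genes          # "g1" "g3": g2 barely varies, so it is dropped
res$var_fraction
```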

  • #2
    I've had the best results from PCAs based on DESeq2 results when I used the VST and applied an additional correction for transcript length (i.e. dividing each gene's values by the length of its longest transcript, in kb). This was before rlogTransformation was available, so rlog may work better for that.
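A small sketch of the length correction described here, assuming a transcript table with one row per transcript (the column names gene_id and tx_length, and both helper functions, are hypothetical). The post does not say exactly which matrix was divided or at which step, so the sketch simply divides whatever genes x samples matrix it is given.

```r
# Hypothetical inputs (names are illustrative, not from the original post):
#   tx  - data.frame with one row per transcript: gene_id, tx_length (bp)
#   mat - genes x samples matrix with gene IDs as row names
longest_tx_kb <- function(tx) {
  lens_bp <- tapply(tx$tx_length, tx$gene_id, max)  # longest transcript per gene
  lens_bp / 1000                                    # bp -> kb
}

divide_by_length <- function(mat, len_kb) {
  missing <- setdiff(rownames(mat), names(len_kb))
  if (length(missing) > 0) stop("no transcript length for: ", paste(missing, collapse = ", "))
  # A vector of length nrow(mat) is recycled down each column,
  # so row i is divided by the length of gene i.
  mat / as.numeric(len_kb[rownames(mat)])
}

# Toy example (made-up values)
tx <- data.frame(gene_id = c("g1", "g1", "g2"),
                 tx_length = c(1000, 2000, 500))
mat <- matrix(c(10, 5, 20, 1), nrow = 2,
              dimnames = list(c("g1", "g2"), c("s1", "s2")))
out <- divide_by_length(mat, longest_tx_kb(tx))
out   # g1: 5, 10; g2: 10, 2
```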

    What was your experimental design? Were all six samples separate biological replicates? It's concerning that your samples are clustering by ID first and by treatment second. In our case, samples clustered by cell population first and by treatment second. If your ID18_X and GP18_X samples come from the same (or similar) source material, or were sequenced or extracted in batches (we've noticed sequencing batch effects as well), that might explain why they're clustering together.
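One way to see which grouping dominates, independent of the PCA, is to cluster the samples on their distances and cross-tabulate the clusters against condition or batch labels. This is a sketch using base R dist(), hclust() and cutree(); it is not something suggested in the thread, and the helper name and toy data are made up.

```r
# Sketch (not from the thread): cluster samples on Euclidean distance and
# cross-tabulate the clusters against a known grouping.
# 'mat' is assumed to be a genes x samples matrix of transformed values.
sample_clusters <- function(mat, k = 2) {
  d <- dist(t(mat))     # distances between samples (columns of mat)
  hc <- hclust(d)       # complete linkage by default
  cutree(hc, k = k)     # one cluster label per sample
}

# Toy data: two made-up groups of two samples each
mat <- rbind(g1 = c(1, 1.2, 5, 5.1),
             g2 = c(2, 2.1, 8, 7.9))
colnames(mat) <- c("A1", "A2", "B1", "B2")
group <- c("A", "A", "B", "B")
cl <- sample_clusters(mat)
table(cluster = cl, group = group)
```

If the clusters line up with extraction or sequencing batch rather than with treatment, that is consistent with the batch explanation in this post.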

    As a sanity check for PCAs, it's a good idea to make sure that the data you're generating the PCA from fits a normal distribution. You can do this by running qqnorm(<data>); the points should generally fall on a straight line along the diagonal, usually with a bit of deviation at the extremities. If the qqnorm plot isn't approximately a straight line, the data will need additional normalisation before running a PCA.
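A sketch of the qqnorm check, plus one simple numeric summary: the correlation between the theoretical and observed quantiles, which is close to 1 when the Q-Q points lie on a straight line. The correlation summary and the helper name qq_straightness are assumptions added for illustration; the advice in the post is the visual check.

```r
# 'x' stands for the values the PCA is computed from, e.g. as.vector(assay(rld)).
qq_straightness <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)  # theoretical quantiles (qq$x) vs data (qq$y)
  cor(qq$x, qq$y)                   # near 1 when the Q-Q points lie on a straight line
}

# Visual check on real data: qqnorm(x); qqline(x)

x_normal <- qnorm(ppoints(200))           # exactly normal-shaped values
x_skewed <- exp(2 * qnorm(ppoints(200)))  # strongly right-skewed (log-normal shape)
qq_straightness(x_normal)  # 1
qq_straightness(x_skewed)  # well below 1
```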



    • #3
      Hi Gringer,

      The ID18_x samples are biological replicates from one batch, and the GP18_x samples are biological replicates from another batch. The two groups did not come from the same samples or the same batch.

      I was not able to plot the distribution of the regularized log-transformed values (I'm unsure how to extract them from the data.frame), but I did make a plot of the variance against the read counts, and it shows no apparent dependence of the variance on the mean.
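For reference, rld is a DESeqTransform object rather than a data.frame, and assay(rld) returns its genes x samples matrix of transformed values. Given such a matrix, a sketch of the mean versus spread check: tabulate each gene's mean, the rank of that mean, and its standard deviation, so the spread can be plotted against either one (to my understanding, vsn's meanSdPlot uses the rank by default). The helper name and toy values are illustrative.

```r
# 'mat' is assumed to be a genes x samples matrix, e.g. mat <- assay(rld).
mean_sd_table <- function(mat) {
  m <- rowMeans(mat)
  data.frame(mean = m,
             rank_mean = rank(m),          # rank of the mean (ties averaged)
             sd = apply(mat, 1, sd),       # per-gene standard deviation
             row.names = rownames(mat))
}

# Toy matrix (made-up values)
mat <- rbind(g1 = c(2, 4), g2 = c(10, 10), g3 = c(0, 6))
tab <- mean_sd_table(mat)
tab
# Plot the spread against the actual mean and against its rank:
# plot(tab$mean, tab$sd); plot(tab$rank_mean, tab$sd)
```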



      • #4
        By plotting the rank (assuming you have actually plotted the rank), you've removed any parametric information about the distribution from the plot. If you're doing the PCA on the ranks, this would be fine, but I suspect your PCA is being done on something else. You need to make sure that the values you plot are the same values the PCA calculation uses.
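To illustrate why a rank axis hides the shape of the data: ranks keep only the ordering of the values, so any increasing transformation gives exactly the same ranks, and the spacing between values is lost. The numbers below are made up.

```r
# Ranks keep only the ordering of values, so any increasing transformation
# (log, square root, ...) gives exactly the same ranks.
x <- c(1, 10, 100, 1000, 10000)  # made-up values spanning four orders of magnitude
rank(x)                          # 1 2 3 4 5
rank(log10(x))                   # 1 2 3 4 5
rank(sqrt(x))                    # 1 2 3 4 5
diff(x)                          # very uneven spacing on the raw scale
diff(rank(x))                    # always 1: the spacing information is gone
```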

