SEQanswers

05-24-2017, 11:21 AM   #1
ronaldrcutler
Member
 
Location: Virginia

Join Date: May 2016
Posts: 80
DESeq2 PCA Plots

Hello all,

I am running DESeq2 like so in R:
Code:
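library(DESeq2)
# 'files' (HTSeq-count output file names) and 'cond' (one condition label per file) are defined earlier
sTable <- data.frame(sampleName = files, fileName = files, condition = cond)
# directory: folder holding the count files ("." = working directory; "" would resolve to "/")
dds <- DESeqDataSetFromHTSeqCount(sampleTable = sTable, directory = ".", design = ~condition)
dds <- DESeq(dds)
res <- results(dds)
resOrdered <- res[order(res$padj), ]
# blind = TRUE: the transformation ignores the design formula
rld <- rlogTransformation(dds, blind = TRUE)
# plotPCA uses the 500 most variable genes by default (ntop = 500)
print(plotPCA(rld, intgroup = "condition"))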
This produces the following PCA plot, even though 138 genes have padj < 0.05 between the blue and red conditions:

[Image: PCA plot of the six samples, coloured by condition (blue and red)]

I would expect the blue replicates to cluster together, and the red replicates as well. Given the fair number of significant genes, I suspect I am plotting this PCA wrong.
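For reference, as far as I can tell from the DESeq2 source, plotPCA() runs prcomp() on only the 500 most variable genes of the transformed matrix, so 138 significant genes out of many thousands need not dominate PC1 and PC2. Below is a base-R sketch of that calculation under that assumption; the function name pca_top_variable is hypothetical, and mat is assumed to be a genes x samples matrix such as assay(rld).

```r
# Base-R sketch of the PCA assumed to sit behind DESeq2's plotPCA() (ntop = 500 default).
# `mat`: genes x samples matrix of transformed values, e.g. assay(rld). Hypothetical helper.
pca_top_variable <- function(mat, ntop = 500) {
  # Variance of each gene across the samples
  rv <- apply(mat, 1, var)
  # Row indices of the ntop most variable genes
  keep <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  # prcomp() wants samples as rows; data are centred but not scaled
  pca <- prcomp(t(mat[keep, , drop = FALSE]))
  # Fraction of total variance explained by each component
  percent_var <- pca$sdev^2 / sum(pca$sdev^2)
  list(scores = pca$x[, 1:2, drop = FALSE],
       percent_var = percent_var[1:2],
       rows_used = keep)
}
```

With the real object, pca_top_variable(assay(rld)) should, if plotPCA works as assumed, match plotPCA(rld, intgroup = "condition", returnData = TRUE) up to the sign of each component, and rows_used shows whether any of the significant genes are among those driving the plot. Passing a different ntop to plotPCA() is a quick way to see whether the picture changes.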

When I check the sample metadata to make sure I am using the right column, I get this:
Code:
> colData(dds)
DataFrame with 6 rows and 2 columns
                                     condition
                                      <factor>
ID_18_1.bam_sorted.bam_htseq_out.txt      ID18
ID_18_2.bam_sorted.bam_htseq_out.txt      ID18
ID_18_3.bam_sorted.bam_htseq_out.txt      ID18
GP_18_1.bam_sorted.bam_htseq_out.txt      GP18
GP_18_2.bam_sorted.bam_htseq_out.txt      GP18
GP_18_3.bam_sorted.bam_htseq_out.txt      GP18
Is this something to be concerned about, or am I plotting the PCA the wrong way?

Thanks in advance
-R
05-26-2017, 09:12 PM   #2
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823

I've had the best results from PCAs based on DESeq2 results when I used the VST and applied an additional correction for transcript length (i.e. dividing by the length of the longest transcript per gene, in kb). This was before rlogTransformation was available, so rlog may work better for this.
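A minimal sketch of the length correction described above, assuming a transcript table with columns gene_id and tx_length (in bp) is available. The helper names and column names are hypothetical, and the post does not say whether the correction was applied to counts or to VST values, so both options are offered.

```r
# Hypothetical helpers; the thread does not say at which step the correction was applied.
# `tx`: data.frame with columns gene_id and tx_length (bp), one row per transcript.
longest_tx_kb <- function(tx) {
  # Longest transcript per gene, converted from bp to kb
  tapply(tx$tx_length, tx$gene_id, max) / 1000
}

# `values`: genes x samples matrix with gene IDs as rownames.
# log_scale = FALSE divides by the length in kb; TRUE subtracts log2(kb) instead,
# which is the (approximate) equivalent on a log2-like scale such as VST output.
length_correct <- function(values, tx, log_scale = FALSE) {
  kb <- longest_tx_kb(tx)
  missing <- setdiff(rownames(values), names(kb))
  if (length(missing) > 0) {
    stop("no transcript length for: ", paste(head(missing), collapse = ", "))
  }
  kb <- as.vector(kb[rownames(values)])
  # One length per row; R recycles the vector down each column
  if (log_scale) values - log2(kb) else values / kb
}
```

Since DESeq2 expects integer counts, length-corrected counts cannot be passed back in as the count matrix; applying the log-scale version after the VST avoids that.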

What was your experimental design? Were all six samples separate biological replicates? It's concerning that your samples are clustering by ID first and by treatment second. In our case, samples clustered primarily by cell population and secondarily by treatment. If your ID18_X and GP18_X samples come from the same (or similar) samples, or were extracted or sequenced in batches (we've noticed sequencing batch effects as well), that might explain why they're clustering together.

As a sanity check for PCAs, it's a good idea to make sure that the data you're running the PCA on roughly fits a normal distribution. You can do this with qqnorm(<data>): the points should generally fall on a straight line along the diagonal, usually with some deviation at the extremes. If the QQ plot isn't approximately a straight line, the data will need additional normalisation before running a PCA.
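The check above can be wrapped in a small function. The correlation between theoretical and sample quantiles that it returns is an extra, rough numeric summary of how straight the QQ plot is (close to 1 means close to a line); it is an assumption of this sketch rather than part of the advice, and qq_check is a hypothetical name. For an rlog object, the matrix would come from assay(rld).

```r
# Normal QQ check on all values of a matrix (e.g. assay(rld)). Hypothetical helper.
# Returns cor(theoretical, sample quantiles) as a rough straightness score,
# not a formal normality test.
qq_check <- function(mat, plot = TRUE) {
  v <- as.vector(mat)
  v <- v[is.finite(v)]            # drop NA/Inf before computing quantiles
  q <- qqnorm(v, plot.it = plot)  # q$x = theoretical, q$y = sample quantiles
  if (plot) qqline(v)             # reference line through the quartiles
  cor(q$x, q$y)
}
```

Calling qq_check(assay(rld)) draws the plot and returns the score; plot = FALSE skips the drawing.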
05-29-2017, 10:51 AM   #3
ronaldrcutler
Member
 
Location: Virginia

Join Date: May 2016
Posts: 80

Hi Gringer,

The ID18_X samples were biological replicates from one batch, and the GP18_X samples were biological replicates from another batch. The two groups did not come from the same samples or the same batch.

I was not able to check the distribution of the regularized log-transformed values (I was unsure how to extract them from the rlog object), but I made a plot of the variance against the read counts, which suggests that the variance does not depend on the mean.
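On extracting the values: rlogTransformation() returns a DESeqTransform object (a SummarizedExperiment subclass) rather than a data.frame, and assay(rld) gives the transformed values as a plain genes x samples matrix. Given that matrix, the sketch below computes each gene's mean and standard deviation on the transformed scale. mean_sd_table is a hypothetical helper; its ranks switch mimics the default x-axis of vsn::meanSdPlot(), which plots against the rank of the mean (ranks = TRUE).

```r
# Per-gene mean and SD of a transformed matrix (e.g. mat <- assay(rld)). Hypothetical helper.
mean_sd_table <- function(mat, ranks = FALSE) {
  m <- rowMeans(mat)
  s <- apply(mat, 1, sd)
  # ranks = TRUE replaces each mean by its rank, like vsn::meanSdPlot's default axis
  x <- if (ranks) rank(m, ties.method = "first") else m
  data.frame(x = x, sd = s, row.names = NULL)
}
```

For example, tab <- mean_sd_table(assay(rld)) followed by plot(tab$x, tab$sd, pch = 20). With the default ranks = FALSE the x-axis stays on the rlog scale that the PCA works with; vsn::meanSdPlot(assay(rld), ranks = FALSE) gives the equivalent vsn plot.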
05-29-2017, 01:19 PM   #4
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823

By plotting the rank (assuming you have actually plotted the rank), you've removed any parametric factors from the plot. If you're doing a PCA on the rank, this would be fine, but I suspect your PCA is being done on something else. You need to make sure that the values you plot are the same values that the PCA calculation sees.

Tags
deseq2, pca plot

All times are GMT -8.