

08-08-2014, 09:31 AM   #1
Location: VA

Join Date: Jul 2011
Posts: 17
DESeq2 analysis: Seeming incongruity between PCA & distance heatmap

Hello all,

I'm running an RNAseq analysis with DESeq2 (R version 3.1.0, DESeq2_1.4.5). Looking at my QC plots, I noticed an odd discrepancy between the PCA plot and the distance heatmap.

One of the samples (labeled Sample_4 in the attached images) clusters right among the other samples on the PCA, but on the heatmap it appears to be an outlier compared to the other samples.

I've run a lot of analyses like this using DESeq2 with very similar code, and I've never seen a discrepancy this big between these two plots before. Has anyone encountered this situation before, or does anyone have a good idea as to what might explain it?

Could it have to do with the relatively small amount of total variation explained by PC1 and PC2 (18.4% & 17.6% respectively)?


Attached Images
File Type: png PCA.png (137.6 KB, 53 views)
File Type: png dist_heat.png (127.4 KB, 36 views)
afkoeppel is offline   Reply With Quote
08-09-2014, 07:49 PM   #2
Michael Love
Senior Member
Location: Boston

Join Date: Jul 2013
Posts: 333

The distance between samples on the PCA plot is an approximation of the distance using all the genes, and the quality of the approximation depends on how much variance is captured by PC1 and PC2 (here only ~35%).
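To make this concrete, here is a small base-R sketch on simulated data. The matrix `mat` is made up for illustration and only stands in for something like `t(assay(rld))`. It shows how to get the fraction of variance per PC and how the distances on a PC1/PC2 plot compare with the distances using all genes.

```r
set.seed(1)
# Simulated stand-in for t(assay(rld)): 8 samples (rows) x 200 genes (columns)
mat <- matrix(rnorm(8 * 200), nrow = 8,
              dimnames = list(paste0("Sample_", 1:8), NULL))

pca <- prcomp(mat)                       # centers the columns by default
pct_var <- pca$sdev^2 / sum(pca$sdev^2)  # fraction of variance per PC
round(100 * pct_var[1:2], 1)             # the percentages shown on PCA axis labels

d_full <- dist(mat)                      # Euclidean distance using all genes
d_pc12 <- dist(pca$x[, 1:2])             # distance as seen on the PC1/PC2 plot

# Projecting onto two PCs can only shrink distances, so every ratio is <= 1;
# ratios far below 1 mean the 2D plot hides most of the separation for that pair
summary(as.vector(d_pc12) / as.vector(d_full))
```

When PC1 and PC2 together capture only about a third of the variance, some pairs can look close on the plot while being far apart in the full distance matrix.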

Also, if you were using plotPCA (it looks like you are not though), the PCA is calculated on the top n genes ranked by variance, instead of all the genes.
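A rough sketch of that gene selection step in base R, again on simulated data (`expr` is a made-up stand-in for `assay(rld)`). As far as I understand it, this mirrors what plotPCA does, with the `ntop` argument setting how many genes are kept; treat the value 500 here as an assumption rather than a statement about your version.

```r
set.seed(2)
# Simulated stand-in for assay(rld): 1000 genes (rows) x 8 samples (columns)
expr <- matrix(rnorm(1000 * 8), nrow = 1000,
               dimnames = list(NULL, paste0("Sample_", 1:8)))

ntop <- 500                                # number of genes to keep (assumed value)
rv <- apply(expr, 1, var)                  # per-gene variance across samples
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]

pca_top <- prcomp(t(expr[select, ]))       # PCA on the most variable genes only
d_top   <- dist(t(expr[select, ]))         # sample distances on the same gene subset
d_all   <- dist(t(expr))                   # sample distances on all genes
```

Comparing `d_top` with `d_all` is one way to check whether the gene subset alone changes which samples look distant.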
08-11-2014, 05:44 AM   #3
Senior Member
Location: Charlottesville, VA

Join Date: May 2011
Posts: 112

Hey Mike. Thanks for responding. I'm working with Alex on this. We also used the plotPCA function and got the same results (the code used here to make the PCA plots was based on plotPCA - we had difficulty changing the default colors at one point in plotPCA's history).

I believe the sample distance heatmap was made using some code that may have been part of the DESeq vignette at one point - something like `heatmap.2(as.matrix(dist(t(assay(rld)))))` where rld is the regularized log transformed dataset.
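For reference, the distance part of that call can be sketched in base R without DESeq2 or gplots. The data below are simulated (`expr` stands in for `assay(rld)`, and Sample_4 is deliberately given extra noise), so this only illustrates the mechanics, not our actual dataset.

```r
set.seed(3)
# Simulated stand-in for assay(rld): 300 genes x 6 samples
expr <- matrix(rnorm(300 * 6), nrow = 300,
               dimnames = list(NULL, paste0("Sample_", 1:6)))
# Make one sample noisier on purpose so it behaves like an outlier
expr[, "Sample_4"] <- expr[, "Sample_4"] + rnorm(300, sd = 2)

sample_dist <- dist(t(expr))          # dist() compares rows, hence the transpose
dist_mat <- as.matrix(sample_dist)    # symmetric samples x samples matrix, zero diagonal

# Mean distance from each sample to all others; an outlier has a large value
mean_dist <- rowSums(dist_mat) / (ncol(dist_mat) - 1)
names(which.max(mean_dist))

# Cluster on the sample distances themselves. Note that heatmap.2 with its
# default distfun = dist recomputes distances between the rows of dist_mat
# when it builds the dendrogram.
hc <- hclust(sample_dist)
```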

So I think I'm hearing you say we're seeing this because the first two PCs aren't explaining that much variance. Still seems odd to me that the distance matrix based on all genes (or top N based on variance) still shows this particular sample as a pretty obvious outlier where PCA did not.

Looking forward to those time-series examples in the vignette you promised last month in Boston.
turnersd is offline   Reply With Quote
08-11-2014, 05:59 AM   #4
Michael Love
Senior Member
Location: Boston

Join Date: Jul 2013
Posts: 333

hi Stephen,

Something to explore: look into the other PCs to see if this sample sticks out in one of those.
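One way to do that, as a sketch on simulated data (`mat` is a made-up stand-in for `t(assay(rld))`, and the sample name is just an example), is to standardize the scores on each PC and see where a given sample is most extreme:

```r
set.seed(4)
# Simulated stand-in for t(assay(rld)): 8 samples (rows) x 200 genes (columns)
mat <- matrix(rnorm(8 * 200), nrow = 8,
              dimnames = list(paste0("Sample_", 1:8), NULL))

pca <- prcomp(mat)
# With n centered samples there are at most n - 1 informative PCs, so drop the last
scores <- pca$x[, seq_len(nrow(mat) - 1)]

# Standardize each PC's scores so PCs with different variance are comparable
z <- scale(scores)
round(z["Sample_4", ], 2)                     # position of Sample_4 on each PC
colnames(z)[which.max(abs(z["Sample_4", ]))]  # PC on which it is most extreme

# Optional visual check of the first few PCs:
# pairs(scores[, 1:4], labels = colnames(scores)[1:4])
```

If the sample is extreme on, say, PC3 or PC4, that would explain why it looks ordinary on a PC1 vs. PC2 plot but distant in the heatmap.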

Yes, the time series dataset I submitted to Bioc is currently in review, and once that's done I can write up a workflow. It's a fission yeast time series.
