#1
Junior Member
Location: UK Join Date: Aug 2013
Posts: 5
Hi Guys,
I am doing some differential expression analysis of RNA-seq data using DESeq2. I have 12 different samples, and I am feeding the matrix of raw counts into DESeq2.

My question: if I want to compute the correlation between Gene A and Gene B across my samples (a gene-gene correlation, not a sample-sample one, since the two genes are co-expressed), do I do this on the raw counts or on the normalized counts?

I have 12 values for Gene A and 12 values for Gene B, one per sample. Correlating the raw counts gives me a rho of about 0.8. Normalizing with the method in DESeq2 scales each sample by its own size factor, and the rho goes down to 0.5.

I am not sure how I should be doing the correlation (raw or normalized) and, if normalized, which method is preferable for comparing two different genes across the same samples.

Thank you for taking the time to read this. I hope someone can give me some advice.
#2
Junior Member
Location: UK Join Date: Aug 2013
Posts: 5
Just bumping the post up, as I posted it very late in the evening.
#3
Senior Member
Location: Boston Join Date: Jul 2013
Posts: 333
Hi,
Note that DESeq2 doesn't really help you out with this question: it focuses on gene-by-gene differential expression, and its transformations are most useful for visualizing and clustering samples.

You don't want sequencing depth to be a factor in the correlation. Consider a situation where genes A and B are not correlated, but you sequence the samples so that each sample has double the number of reads of the previous one. The raw counts of both genes then rise together from sample to sample, and you get a really high correlation which has no biological significance. So you could* do:

    nc <- counts(dds, normalized=TRUE)
    cor(t(nc[idx,]))

where idx gives the indices of the genes you want to find correlations for. The t() is needed because cor() computes correlations between the columns of a matrix, and the columns of nc are samples, not genes.

*However, I would also consider batch effects if you are calculating gene-gene correlations and the samples were processed in batches. This would be another way to get spurious correlations that are large in absolute value. You can check for batch effects using either of the transformations and the plotPCA workflow in the DESeq2 vignette. If the samples cluster by batch, then the cqn package vignette explains how to get "normalized expression values", where the normalization takes care of sequencing depth, GC-content bias and gene-length bias: http://www.bioconductor.org/packages.../html/cqn.html
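The depth-doubling scenario described above can be sketched with simulated data. This is only an illustration under stated assumptions, not DESeq2's own code: it uses base R, the genes are independent Poisson draws with made-up expression levels (`base_mean`, `depth` and the gene names are all hypothetical), and the size factors are computed with a hand-written median-of-ratios calculation of the kind DESeq2's normalization is based on.

```r
# Simulated illustration (base R only, no DESeq2 required).
set.seed(1)
n_genes   <- 200
n_samples <- 12
depth     <- 2^(0:(n_samples - 1))      # each sample has twice the depth of the previous one
base_mean <- runif(n_genes, 50, 500)    # hypothetical per-gene expression levels

# Independent Poisson counts: the only thing the genes share is sequencing depth.
raw <- sapply(depth, function(d) rpois(n_genes, base_mean * d))
rownames(raw) <- paste0("gene", seq_len(n_genes))

# Median-of-ratios size factors: for each sample, the median ratio of its
# counts to the per-gene geometric mean across samples.
log_geo_mean <- rowMeans(log(raw))
size_factor  <- apply(raw, 2, function(col) exp(median(log(col) - log_geo_mean)))

# Divide each sample (column) by its size factor.
norm <- sweep(raw, 2, size_factor, "/")

# cor() correlates columns, so transpose to put the genes in columns.
idx      <- c("gene1", "gene2")
raw_cor  <- cor(t(raw[idx, ]))["gene1", "gene2"]
norm_cor <- cor(t(norm[idx, ]))["gene1", "gene2"]
print(c(raw = raw_cor, normalized = norm_cor))
```

In this setup the raw correlation should come out close to 1 purely because of depth, while the normalized correlation is whatever two independent genes happen to give over 12 samples. cor() defaults to Pearson; pass method = "spearman" if a rank-based rho is what you are after.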
#4
Junior Member
Location: UK Join Date: Aug 2013
Posts: 5
Hi Michael,
Thank you for clearing this up and giving such a comprehensive response to this problem. I was thinking along the same lines.

Someone suggested using the vsd-transformed data from DESeq2 and then plotting these correlations. That, however, gives really high correlations, almost in line with the non-normalized data. I understand that the transformations from DESeq2 are useful if we want to perform clustering. The method that you suggest, i.e. using the normalized data and then looking for batch effects as well, makes sense to me.

Thanks again.
Tags: correlation, deseq2, rna-seq normalization