Recently I came across several papers that normalize the RNA-seq/microarray gene expression matrix before eQTL mapping as follows:
First, quantile normalize across samples; then center and standardize each gene (usually by subtracting the median and dividing by the standard deviation).
I'm not sure what assumption underlies this normalization approach. After centering and standardization, every gene is centered at 0 with an SD of 1, but surely some genes vary a lot across samples while others are stably expressed and vary little. So why should we force the variances to be equal?
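For concreteness, here is a minimal sketch of the procedure as I understand it, using NumPy. The function names, the genes x samples layout, and the simulated matrix are my own assumptions for illustration, not code from any of the papers. Ties are not averaged in the quantile step, and genes with zero variance are not handled.

```python
import numpy as np

def quantile_normalize(X):
    """Give every sample (column) of a genes x samples matrix the same distribution."""
    order = np.argsort(X, axis=0)                  # rank order of genes within each sample
    sorted_X = np.take_along_axis(X, order, axis=0)
    ref = sorted_X.mean(axis=1)                    # mean value at each rank across samples
    out = np.empty_like(X, dtype=float)
    # Write the reference value for each rank back to the gene that held that rank.
    np.put_along_axis(out, order, ref[:, None], axis=0)
    return out

def standardize_genes(X):
    """Center each gene (row) on its median and scale it by its sample SD."""
    center = np.median(X, axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - center) / sd

# Hypothetical data: 50 genes x 6 samples of simulated expression values.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(50, 6))

Q = quantile_normalize(X)   # all columns now share one set of values
Z = standardize_genes(Q)    # every row now has median 0 and SD 1
```

Note that because the centering uses the median, each gene ends up with a median of 0; its mean is only 0 if the gene's values are symmetric.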
Moreover, if we want to run downstream analyses such as MDS, PCA, or ICA to remove batch effects or confounding factors from the raw expression data, is it correct to normalize the data this way? Or does the choice of normalization method come down to personal preference? Is there a standard approach? Thank you very much for your advice!
There is also a paper that centered and standardized each sample rather than each gene, which confuses me even more.
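To make the difference between the two choices explicit, here is a small sketch, again assuming a genes x samples matrix. The `zscore` helper and the simulated data are hypothetical, and I use the mean rather than the median here for simplicity.

```python
import numpy as np

def zscore(X, axis):
    """Subtract the mean and divide by the sample SD along the given axis."""
    mu = X.mean(axis=axis, keepdims=True)
    sd = X.std(axis=axis, ddof=1, keepdims=True)
    return (X - mu) / sd

# Hypothetical data: 4 genes (rows) x 5 samples (columns).
rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=2.0, size=(4, 5))

per_gene = zscore(X, axis=1)    # each row (gene) gets mean 0 and SD 1 across samples
per_sample = zscore(X, axis=0)  # each column (sample) gets mean 0 and SD 1 across genes
```

Per-gene scaling puts all genes on a common scale across samples, while per-sample scaling puts all samples on a common scale across genes, so the two give different matrices.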