**dariober** · 07-04-2013, 02:21 PM

Originally posted by Fernas

Hi all,

I have a gene expression data matrix of about 10,000 rows representing genes and 20 columns representing 20 samples that belong to 2 different groups (10 samples per group).

I want to prove that the variance between the 10 samples of the first group is more than the variance between the 10 samples of the second group.

Hi there,

Here's my suggestion, quite simple really. First count how many genes have a greater variance in the 1st group than in the 2nd group. Then apply a binomial test of the null hypothesis that the proportion of such genes is 50%. Here is some sample R code:

Code:

## Some test data: 100 genes (rows), 20 samples (columns)
set.seed(1234)
ngenes<- 100
grp1<- 1:10
grp2<- 11:20
grp<- cbind(
    matrix(nrow= ngenes, ncol= length(grp1), data= rnorm(ngenes * length(grp1), sd= 1.1)),
    matrix(nrow= ngenes, ncol= length(grp2), data= rnorm(ngenes * length(grp2), sd= 1))
)

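## For each gene: is the variance in group 1 greater than in group 2?
grpvar<- apply(grp, 1, function(x) var(x[grp1]) > var(x[grp2]))
## table() lists FALSE before TRUE, so binom.test() reports the FALSE count as "successes"
btest<- binom.test(table(grpvar))
btest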

	Exact binomial test

data:  table(grpvar)
number of successes = 39, number of trials = 100, p-value = 0.0352
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2940104 0.4926855
sample estimates:
probability of success 
                  0.39
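
A note on reading the output above: `table()` on a logical vector lists `FALSE` before `TRUE`, and `binom.test()` takes the first element of a length-two vector as the number of successes. A minimal sketch that passes the counts explicitly, so that a "success" is unambiguously a gene with larger variance in group 1, might look like this. It regenerates the same simulated matrix; the names `n_greater`, `btest2` and `btest1s` are only illustrative, and the one-sided alternative is an assumption about what the original question is after.

```r
## Same simulated data as in the post
set.seed(1234)
ngenes <- 100
grp1 <- 1:10
grp2 <- 11:20
grp <- cbind(
    matrix(nrow = ngenes, ncol = length(grp1), data = rnorm(ngenes * length(grp1), sd = 1.1)),
    matrix(nrow = ngenes, ncol = length(grp2), data = rnorm(ngenes * length(grp2), sd = 1))
)

## TRUE where the group 1 variance exceeds the group 2 variance
grpvar <- apply(grp, 1, function(x) var(x[grp1]) > var(x[grp2]))
n_greater <- sum(grpvar)   # illustrative name: genes with larger variance in group 1

## Two-sided test: is the proportion different from 50%?
btest2 <- binom.test(n_greater, length(grpvar), p = 0.5)

## One-sided test (assumption: the question is specifically "group 1 > group 2")
btest1s <- binom.test(n_greater, length(grpvar), p = 0.5, alternative = "greater")

btest2$p.value
btest1s$p.value
```

With a null proportion of 0.5 the two-sided p-value is the same whichever outcome is labelled as success, so only the reported proportion and confidence interval depend on the labelling.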

I would check that no single sample contributes a disproportionate share of the variation in either group. Do this by repeating the above while leaving out one sample at a time, and check that the resulting p-values are in the same range as the initial p-value:

Code:

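## Leave-one-out over samples: one p-value per omitted sample
jkn<- numeric(ncol(grp))
for(i in 1:ncol(grp)){
    ## Drop sample i from the group it belongs to; the other group stays complete.
    ## Both index vectors are rebuilt on every iteration so no omission carries over.
    grp1b<- setdiff(grp1, i)
    grp2b<- setdiff(grp2, i)
    grpvar<- apply(grp, 1, function(x) var(x[grp1b]) > var(x[grp2b]))
    xtest<- binom.test(table(grpvar))
    jkn[i]<- xtest$p.value
}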
## Dots should align more or less on a straight line
qqnorm(-log10(c(btest$p.value, jkn)))

Similarly, I would leave out groups of genes or sample genes to assess whether the initial result is due to a particular set of genes. However, I guess you would expect most of the genes to have the same variance?
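
A rough sketch of the gene-subsampling check, under some assumptions: it reuses the simulated matrix from above, and the number of random subsets (`nrep`) and the fraction of genes kept (`frac`) are arbitrary values chosen for illustration. Because each gene's comparison does not depend on the other genes, it is enough to subsample the per-gene TRUE/FALSE vector.

```r
## Same simulated data as in the post
set.seed(1234)
ngenes <- 100
grp1 <- 1:10
grp2 <- 11:20
grp <- cbind(
    matrix(nrow = ngenes, ncol = length(grp1), data = rnorm(ngenes * length(grp1), sd = 1.1)),
    matrix(nrow = ngenes, ncol = length(grp2), data = rnorm(ngenes * length(grp2), sd = 1))
)
grpvar <- apply(grp, 1, function(x) var(x[grp1]) > var(x[grp2]))

nrep <- 200                      # number of random gene subsets (arbitrary)
frac <- 0.8                      # fraction of genes kept in each subset (arbitrary)
nkeep <- round(frac * ngenes)

## Proportion of genes with larger variance in group 1, in each random subset
prop_sub <- replicate(nrep, mean(grpvar[sample(ngenes, nkeep)]))

mean(grpvar)        # proportion using all genes
summary(prop_sub)   # spread of the proportion across subsets
```

If the proportions from the subsets stay on the same side of 0.5 as the full-data proportion, the result is probably not driven by a small set of genes. Leaving out biologically defined gene groups instead of random subsets would need annotation that is not part of this example.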

Just a thought...

Dario

**Fernas** · 07-05-2013, 01:27 AM

Thank you very much indeed dariober for this clear explanation and suggestion.

I like the idea, and I find it somewhat comparable to a rank-sum test. In that test we calculate the variance of each gene in each group, so we have two columns of variances (column 1 contains the variances of all genes in group 1, and column 2 the variances of all genes in group 2). We then apply the rank-sum test to check whether both vectors (columns) come from continuous distributions with the same median, against the alternative hypothesis that one differs significantly from the other.
What do you think? Which of these two methods is more closely related to the question I want to answer?
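
For comparison, the rank-sum idea could be sketched in R roughly as follows, using the simulated matrix from the earlier post. The names `v1`, `v2`, `rs` and `sr` are illustrative. Note that the two variance columns are paired by gene, so the paired (signed-rank) form of `wilcox.test()` is arguably closer in spirit to the count-based binomial test; that is a judgment call, not something settled in this thread.

```r
## Same simulated data as in the earlier post
set.seed(1234)
ngenes <- 100
grp1 <- 1:10
grp2 <- 11:20
grp <- cbind(
    matrix(nrow = ngenes, ncol = length(grp1), data = rnorm(ngenes * length(grp1), sd = 1.1)),
    matrix(nrow = ngenes, ncol = length(grp2), data = rnorm(ngenes * length(grp2), sd = 1))
)

## Per-gene variance within each group
v1 <- apply(grp[, grp1], 1, var)
v2 <- apply(grp[, grp2], 1, var)

## Rank-sum test: treats the two variance columns as independent samples
rs <- wilcox.test(v1, v2)

## Signed-rank test: uses the pairing of v1[i] and v2[i] by gene
sr <- wilcox.test(v1, v2, paired = TRUE)

rs$p.value
sr$p.value
```

Unlike the binomial approach, both of these use the size of the variances and not only which group is larger, so a few genes with very large variance have more influence on the ranks.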

Regarding your suggestion: do I need to normalize the expression matrix row-wise beforehand, or will that not change the results?
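
One way to explore the normalization question is to try it on the simulated data. The sketch below assumes that "row-wise normalization" means centring and scaling each gene across all 20 samples (a z-score per row); other normalizations, such as a log transform, are not covered by it. Shifting a row by a constant leaves both group variances unchanged, and dividing a row by a positive constant rescales both by the same factor, so the per-gene comparison should come out the same.

```r
## Same simulated data as in the earlier post
set.seed(1234)
ngenes <- 100
grp1 <- 1:10
grp2 <- 11:20
grp <- cbind(
    matrix(nrow = ngenes, ncol = length(grp1), data = rnorm(ngenes * length(grp1), sd = 1.1)),
    matrix(nrow = ngenes, ncol = length(grp2), data = rnorm(ngenes * length(grp2), sd = 1))
)

## Row-wise z-score: scale() works on columns, hence the two transposes
grp_z <- t(scale(t(grp)))

## Per-gene comparison before and after normalization
grpvar_raw <- apply(grp, 1, function(x) var(x[grp1]) > var(x[grp2]))
grpvar_z <- apply(grp_z, 1, function(x) var(x[grp1]) > var(x[grp2]))

## Do the two versions agree for every gene?
all(grpvar_raw == grpvar_z)
```

This invariance applies to the count-based binomial test only. A test that compares the variance values across genes, such as the rank-sum test, works on different numbers after per-gene scaling and may give a different result.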

Thanks again for the informative suggestion.

**Jean** · 07-05-2013, 05:24 AM

Hi.
We have an RNAseq tool we are working on called ALDEx (ANOVA-like Differential Expression) which will infer variance per sample and per group. If you'd like to give it a try it is available as an R package here:

https://code.google.com/p/aldex/

**Jean** · 07-05-2013, 05:29 AM

Hi.
We have an RNAseq tool we are working on called ALDEx (ANOVA-like Differential Expression) which will infer expression and variance per group. If you'd like to give it a try it is available as an R package here:

https://code.google.com/p/aldex/

**Fernas** · 07-05-2013, 05:57 AM

Originally posted by Jean

Hi.
We have an RNAseq tool we are working on called ALDEx (ANOVA-like Differential Expression) which will infer expression and variance per group. If you'd like to give it a try it is available as an R package here:

https://code.google.com/p/aldex/

Thanks Jean for your reply.
I went quickly through the ALDEx manual. As the purpose is to study the variance between samples within a group (not differentially expressed genes between two groups), I am not sure whether ALDEx's functions can provide such information.


Analysis of Variance between Biological Samples
