
  • Analysis of Variance between Biological Samples

    Hi all,

    I have a gene expression data matrix consisting of about 10000 rows representing genes and 20 columns representing 20 samples that belong to 2 different groups (10 samples per group).

    I want to show that the variance among the 10 samples of the first group is greater than the variance among the 10 samples of the second group.

    Analysis of variance (ANOVA) methods may not work here (I am not sure about that), because they give poor p-values when the number of variables (genes) is much larger than the number of observations (samples). Do I need to remove correlated genes, or cluster genes? Is there a better solution?

    Thanks in advance.

  • #2
    Hi there,

    Here's my suggestion, quite simple really. First, count how many genes have a larger variance in the 1st group than in the 2nd group. Then apply a binomial test of the null hypothesis that this proportion is 50%. Here is some sample R code:

    Code:
    ## Some test data: 100 genes (rows), 20 samples (columns)
    set.seed(1234)
    ngenes<- 100
    grp1<- 1:10
    grp2<- 11:20
    grp<- cbind(
        matrix(nrow= ngenes, ncol= length(grp1), data= rnorm(ngenes * length(grp1), sd= 1.1)),
        matrix(nrow= ngenes, ncol= length(grp2), data= rnorm(ngenes * length(grp2), sd= 1))
    )
    
    ## TRUE for genes whose variance is larger in group 1 than in group 2
    grpvar<- apply(grp, 1, function(x) var(x[grp1]) > var(x[grp2]))
    ## Successes = genes more variable in group 1; trials = all genes
    btest<- binom.test(sum(grpvar), length(grpvar))
    btest
    
    	Exact binomial test
    
    data:  sum(grpvar) and length(grpvar)
    number of successes = 61, number of trials = 100, p-value = 0.0352
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval:
     0.5073145 0.7059896
    sample estimates:
    probability of success 
                      0.61
    I would check that no particular sample contributes a lot of the variation in either group. Do this by repeating the above while leaving out one sample at a time, and check that the resulting p-values are in the same range as the initial p-value:

    Code:
    ## Leave one sample out at a time and redo the binomial test
    jkn<- numeric(ncol(grp))
    for(i in seq_len(ncol(grp))){
        ## Start from the full groups each time, then drop sample i
        grp1b<- setdiff(grp1, i)
        grp2b<- setdiff(grp2, i)
        grpvar<- apply(grp, 1, function(x) var(x[grp1b]) > var(x[grp2b]))
        xtest<- binom.test(sum(grpvar), length(grpvar))
        jkn[i]<- xtest$p.value
    }
    ## Dots should align more or less on a straight line
    qqnorm(-log10(c(btest$p.value, jkn)))
    Similarly, I would leave out groups of genes, or subsample genes, to assess whether the initial result is driven by a particular set of genes. However, I guess you would expect most of the genes to have the same variance?
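A minimal sketch of the gene-subsampling check suggested above, written in R like the rest of the thread. It re-creates the simulated matrix from the first example so that it runs on its own; the number of repeats (`nrep`), the subset size (half the genes) and the names `more_var_grp1`, `sub_prop` and `sub_p` are illustrative assumptions, not part of the original post.

```r
## Self-contained: re-create the simulated data from the first example
set.seed(1234)
ngenes <- 100
grp1 <- 1:10
grp2 <- 11:20
grp <- cbind(
    matrix(rnorm(ngenes * length(grp1), sd = 1.1), nrow = ngenes),
    matrix(rnorm(ngenes * length(grp2), sd = 1), nrow = ngenes)
)

## TRUE for genes whose variance is larger in group 1 than in group 2
more_var_grp1 <- apply(grp, 1, function(x) var(x[grp1]) > var(x[grp2]))

## Repeatedly draw a random half of the genes and redo the binomial test.
## nrep and the subset size are arbitrary illustrative choices.
nrep <- 200
sub_p <- numeric(nrep)
sub_prop <- numeric(nrep)
for (r in seq_len(nrep)) {
    keep <- sample(ngenes, size = ngenes %/% 2)
    sub_prop[r] <- mean(more_var_grp1[keep])
    sub_p[r] <- binom.test(sum(more_var_grp1[keep]), length(keep))$p.value
}

## If no small set of genes drives the result, the subset proportions
## should scatter around the full-data proportion
summary(sub_prop)
mean(more_var_grp1)
```

Because each subset contains only half the genes, its binomial p-values will tend to be larger than the full-data p-value even when nothing is wrong, so the spread of `sub_prop` around the full-data proportion is probably the more useful thing to inspect.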

    Just a thought...

    Dario



    • #3
      Thank you very much indeed, dariober, for this clear explanation and suggestion.

      I like the idea, and I find it (kind of) comparable to the rank-sum test. In this test we calculate the variance of each gene in each group, so we have two columns of variances (column 1 contains the variances of all genes in group 1, and column 2 the variances of all genes in group 2). Then we apply the rank-sum test to check whether both vectors (columns) come from continuous distributions with the same median, against the alternative hypothesis that one differs significantly from the other.
      What do you think? Which of these two methods better addresses the question I want to answer?
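A hedged sketch of the rank-sum idea in R, again on the simulated matrix from the first example (re-created so the block is self-contained). `alternative = "greater"` encodes the one-sided hypothesis from the original question, and the object names are illustrative. Because each gene contributes one variance to each column, the two columns are paired, so the sketch also runs the Wilcoxon signed-rank test (`paired = TRUE`) for comparison. The binomial count in post #2 is, in effect, a sign test on the same paired differences.

```r
## Self-contained: simulated data as in the first example
set.seed(1234)
ngenes <- 100
grp1 <- 1:10
grp2 <- 11:20
grp <- cbind(
    matrix(rnorm(ngenes * length(grp1), sd = 1.1), nrow = ngenes),
    matrix(rnorm(ngenes * length(grp2), sd = 1), nrow = ngenes)
)

## Two columns of per-gene variances, one per group
vars <- cbind(
    grp1 = apply(grp[, grp1], 1, var),
    grp2 = apply(grp[, grp2], 1, var)
)

## Rank-sum (Mann-Whitney) test: treats the two columns as independent samples
rs <- wilcox.test(vars[, "grp1"], vars[, "grp2"], alternative = "greater")

## Signed-rank test: pairs the two variances of the same gene
sr <- wilcox.test(vars[, "grp1"], vars[, "grp2"], paired = TRUE,
                  alternative = "greater")

c(rank_sum = rs$p.value, signed_rank = sr$p.value)
```

The rank-sum version ignores which variances belong to the same gene, so it discards the pairing that both the signed-rank and the binomial (sign) approaches use.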

      Regarding your suggestion: do I need to normalize the expression matrix row-wise at the beginning, or will that not change the results?
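A small check of the normalization question, assuming "row-wise normalization" means centring and scaling each gene across all 20 samples (z-scores). Such a transform maps every value in a row as x -> (x - m) / s, which divides both group variances of that gene by s^2, so whether group 1 is more variable than group 2 cannot change. A log transform, or a normalization applied across samples (for example per-column scaling), is not of this form and could change the result. `more_var_grp1` is a hypothetical helper name.

```r
## Self-contained: simulated data as in the first example
set.seed(1234)
ngenes <- 100
grp1 <- 1:10
grp2 <- 11:20
grp <- cbind(
    matrix(rnorm(ngenes * length(grp1), sd = 1.1), nrow = ngenes),
    matrix(rnorm(ngenes * length(grp2), sd = 1), nrow = ngenes)
)

## Hypothetical helper: TRUE for genes more variable in group 1
more_var_grp1 <- function(m) apply(m, 1, function(x) var(x[grp1]) > var(x[grp2]))

## Row-wise z-score: scale() standardises columns, so transpose,
## standardise each gene across all 20 samples, and transpose back
grp_z <- t(scale(t(grp)))

## Both group variances of a gene are divided by the same s^2,
## so the per-gene comparison is the same before and after
identical(more_var_grp1(grp), more_var_grp1(grp_z))
```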

      Thanks again for the informative suggestion.



      • #4
        Hi.
        We have an RNAseq tool we are working on called ALDEx (ANOVA-like Differential Expression) which will infer variance per sample and per group. If you'd like to give it a try, it is available as an R package here: https://code.google.com/p/aldex/



        • #5
          Hi.
          We have an RNAseq tool we are working on called ALDEx (ANOVA-like Differential Expression) which will infer expression and variance per group. If you'd like to give it a try, it is available as an R package here: https://code.google.com/p/aldex/



          • #6
            Originally posted by Jean
            Hi.
            We have an RNAseq tool we are working on called ALDEx (ANOVA-like Differential Expression) which will infer expression and variance per group. If you'd like to give it a try it is available as an R package here:

            https://code.google.com/p/aldex/
            Thanks, Jean, for your reply.
            I went quickly through the ALDEx manual. As my purpose is to study the variance among samples within a group (not differentially expressed genes between two groups), I am not sure whether ALDEx's functions can provide such information.
