Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DE analysis with multiple groups

    Hi,

    How can I do DE analysis on multiple groups. I did small RNA seq (miRNA) and I've fours group. I want to do a statisticak test like ANOVA or Kruskal-Wallis do detect a difference of expression between the groups . DESeq ? edgeR ?

    Thanks

    N.

  • #2
    nobody knows ?

    Comment


    • #3
      DESeq and edgeR both offer ANODEV tests.

      Comment


      • #4
        what's the function to do this ?

        I've a dataframe A with 20 samples ( 4 groups of 5 samples )

        Code:
        groups <- c(rep("A",5),rep("B",5),rep("C",5),rep("D",5))
        cds <- newCountDataSet(A,groups)
        cds <- estimateSizeFactors(cds)
        cds <- estimateVarianceFunctions(cds)
        
        res <- nbinomTest(cds,condA="A",condB="B") # two group test

        Comment


        • #5
          Sorry, I need to add this to the vignette. In your case, this should work:

          Code:
          groups <- c( rep("A",5), rep("B",5), rep("C",5), rep("D",5) )
          cds <- newCountDataSet( A, groups )
          cds <- estimateSizeFactors( cds )
          cds <- estimateVarianceFunctions( cds, "pooled" )
          
          fit0 <- nbinomFitGLM( cds, count ~ 1 )
          fit1 <- nbinomFitGLM( cds, count ~ condition )
          pvals <- nbinomGLMTest( fit1, fit0 )
          Note that this is a good deal slower than the standard test.

          If you have more than one factor, pass a data frame to 'newCountDataSet' instead of 'groups' and then, you can write down more interesting model formulae than the one in this example, using the data frame's column names instead of 'condition'.

          Simon

          Comment


          • #6
            ok I'll try this.

            Thanks Simon

            Comment


            • #7
              Dear Simon,
              Is there any place where I can find more information about performing ANODEV analysis using DESeq?
              Alternatively, when do you think the vignette will include this information?
              Thanks!
              RSK

              Comment


              • #8
                Dear Simon,

                I am currently working with RNA-seq datasets in 5 conditions without replicates. It is so lucky for me to see this vignette, Howerer, I found that I can't use "pooled" methods in estimateVarianceFunctions. Or alternatively, I can't go through nbinomFitGLM with "blind" methods in estimateVarianceFunctions. Both error msgs is attached below. But in the nbinomFitGLM function description, it says:

                Use this function to estimate coefficients and calculate deviance
                from a GLM for each gene. The GLM uses the ‘nbkd.sf’ family, with
                the dispersion estimate according to ‘getVarianceFunction(cds)’.
                Note that this requires that the variance functions were estimated
                with method "pooled" or "blind".

                When I was trying to use "method=pooled" in estimateVarianceFunctions

                Code:
                Error in rowSums(sapply(tapply((1:ncol(counts))[replicated_sample], factor(conditions[replicated_sample]),  : 
                  'x' must be an array of at least two dimensions
                When I was using "method=blind" instead.

                Code:
                Error in nbinomFitGLM(cds, count ~ 1) : 
                  No pooled variance function found. Have you called 'estimateVarianceFunctions' with 'method="pooled"'?



                my code is:

                Code:
                library(DESeq);
                list=read.table("../data/5_condition.txt",row.names=1,header=T);
                conds <- c("CFGU","CFGW","CFHG","CFHH","CFHI");
                cds <- newCountDataSet( list,conds);
                cds <- estimateSizeFactors( cds );
                cds <- estimateVarianceFunctions(cds, method='blind');
                fit0 <- nbinomFitGLM( cds, count ~ 1 )
                fit1 <- nbinomFitGLM( cds, count ~ condition )
                result <- nbinomGLMTest( fit1, fit0 )
                Could you tell me what is my mistake here?
                Thanks in advance,

                Hao Chen



                Originally posted by Simon Anders View Post
                Sorry, I need to add this to the vignette. In your case, this should work:

                Code:
                groups <- c( rep("A",5), rep("B",5), rep("C",5), rep("D",5) )
                cds <- newCountDataSet( A, groups )
                cds <- estimateSizeFactors( cds )
                cds <- estimateVarianceFunctions( cds, "pooled" )
                
                fit0 <- nbinomFitGLM( cds, count ~ 1 )
                fit1 <- nbinomFitGLM( cds, count ~ condition )
                pvals <- nbinomGLMTest( fit1, fit0 )
                Note that this is a good deal slower than the standard test.

                If you have more than one factor, pass a data frame to 'newCountDataSet' instead of 'groups' and then, you can write down more interesting model formulae than the one in this example, using the data frame's column names instead of 'condition'.

                Simon

                Comment


                • #9
                  hi,

                  just for reference the function is now called
                  fitNbinomGLMs

                  Comment


                  • #10
                    Differential expression DESeq using GLM for timecourse experiment

                    Dear Members
                    I am currently working on NGS data and performed some differential expression analysis using DESeq. At the moment I am trying to find differentially expressed miRNAs using GLM functionality.
                    My experimental design consists of 4 samples (4stages of developing flower) with 2 biological replicates each. I would like to identify the differentially expressed miRNA during different developmental stages(Stage 1,2,3 and 4). However, I have some doubts regarding my final data interpretation and here is the code I used and the output.

                    My Dataset:
                    >Str(SSSA)
                    Dataframe: 662 obs. of 8 variables:
                    $ Flower1 : int 1 1 6 0 0 0 6709 0 3 0 ...
                    $ Flower1.1: int 1 1 1 0 0 0 2261 2 1 2 ...
                    $ Flower2 : int 0 4 49 0 0 0 7311 0 4 0 ...
                    $ Flower2.1: int 1 14 69 0 0 0 11211 0 1 0 ...
                    $ Flower3 : int 0 8 89 1 0 0 1978 0 0 1 ...
                    $ Flower3.1: int 0 4 72 0 0 0 5392 0 1 1 ...
                    $ Flower4 : int 0 1 67 0 0 0 4675 0 0 0 ...
                    $ Flower4.1: int 0 0 21 0 0 0 4629 0 1 5 ...


                    My code
                    groups <- c( rep("F1",2), rep("F2",2), rep("F3",2), rep("F4",2) )
                    cds <- newCountDataSet(SSSA , groups )
                    cds <- estimateSizeFactors( cds )
                    cds <- estimateDispersions( cds, "pooled" )
                    fit0 <- fitNbinomGLMs ( cds, count ~ 1 )
                    fit1 <- fitNbinomGLMs ( cds, count ~ groups )
                    pvals <- nbinomGLMTest( fit1, fit0 )
                    padjGLM <- p.adjust( pvals, method="BH" )
                    fit1.pval <- cbind(fit1, padjGLM)

                    My Result
                    > str(fit1.pval)
                    'data.frame': 662 obs. of 7 variables:
                    $ (Intercept): num 1.23 1.16 2.77 -32.46 NA ...
                    $ groupsF2 : num -3.17 1.07 2.17 -2.1 NA ...
                    $ groupsF3 : num -32.569 0.859 2.973 30.973 NA ...
                    $ groupsF4 : num -31.675 -2.008 2.835 -0.875 NA ...
                    $ deviance : num 0.746 1.895 2.041 0.234 NA ...
                    $ converged : logi TRUE TRUE TRUE TRUE NA NA ...
                    $ pval : num 0.28395 0.44052 0.00622 0.86603 NA ...

                    My doubts.....
                    1.In my case, is it better to set the conditions as a data frame instead of a vector- “groups <- c( rep("F1",2), rep("F2",2), rep("F3",2), rep("F4",2) )”?
                    2.In the code, I calculated the object fit0 setting the value 1 after the tilde -“fit0 <- fitNbinomGLMs ( cds, count ~ 1 )” but I am not very clear.. what count ~ 1 stand for/refers to ?

                    3.My final results( above) shows 7 variables(including the adjusted pvalue) in the data.frame and I am not clear in interpreting this data. In particular , by using ‘groups’ in calling fit 1 i.e. fit1 <- fitNbinomGLMs ( cds, count ~ groups ),I can’t understand if I really compared F1 against all other stages F2,F3,F4, as I intend or If I made any other comparisions.

                    Could anyone help me on this?

                    Jay

                    Comment


                    • #11
                      No body knows ?

                      Comment


                      • #12
                        Please do not cross-post. The reply is in the other thread.

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Essential Discoveries and Tools in Epitranscriptomics
                          by seqadmin




                          The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                          04-22-2024, 07:01 AM
                        • seqadmin
                          Current Approaches to Protein Sequencing
                          by seqadmin


                          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                          04-04-2024, 04:25 PM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, Yesterday, 08:47 AM
                        0 responses
                        14 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-11-2024, 12:08 PM
                        0 responses
                        60 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 10:19 PM
                        0 responses
                        60 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 09:21 AM
                        0 responses
                        54 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X