Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Nested Model in DESeq

    Hello all;

    I have a RNA-SEQ data set that has three genotypes that were each exposed to three different treatments (2 pathogens and a control). I know the infection outcome for each of my genotypes when they are exposed to one or the other pathogen. I would like to know if the variation in gene expression is better explained by including the infection status nested within genotype. in other words, I would like to test between:

    fit0<- counts~genotype
    fit1<- counts~genotype%in%infection_status

    However, I am unable to obtain a fit1. I get the error message (in addition to some warnings):
    Error: all(na.omit(res[, "df.residual"] == df.residual)) is not TRUE

    I would appreciate any help.

    Seanna

  • #2
    Dear Seanna

    thank you for reporting this. Could you please send the complete script to reproduce the error, all of R's output (incl. warnings) and the output of sessionInfo(). Also, please make sure that you are using the latest release of DESeq. With that information, a more precise answer will be possible for us to help you.

    It will also allow us to update DESeq so that in the future it will be more helpful, or at least produce a more specific error message in such cases.

    Thank you and best wishes
    Wolfgang
    Wolfgang Huber
    EMBL

    Comment


    • #3
      Nested models in DESeq

      Dear Wolfgang;

      Thank you for your prompt reply. I have attached the script, the output and the sessionInfo() as you have requested.

      Many thanks for your help,

      Seanna
      Attached Files

      Comment


      • #4
        Dear Seanna

        thank you. Two suggestions:
        1. Update to R 2.15.0 and DESeq 1.8.1
        The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.

        The code has been continuously improved, and the problems you encounter may be dealt with more robustly now.

        2.

        Code:
        table(design$infection_status, design$host_genotype)
            
             16 3 4
          I   0 6 3
          NI  6 0 3
        indicates that some combinations of these factors have no data. What happens if you define

        Code:
        design$host_genotype_merged = factor(c(`3`='16or3', `16`='16or3', `4`='4')[as.character(design$host_genotype)])
        
        table(design$infection_status, design$host_genotype_merged)                                                           
             16or3 4
          I      6 3
          NI     6 3
        and then use host_genotype_merged instead of host_genotype in the analysis?

        Best wishes
        Wolfgang
        Wolfgang Huber
        EMBL

        Comment


        • #5
          Nested models in DESeq

          Dear Wolfgang;

          Thank you for your suggestion. I have updated R and DESeq as you suggested, and I have tried merging the genotypes, but I am still getting the error: Error: all(na.omit(res[, "df.residual"] == df.residual)) is not TRUE.

          I am attaching the code and the error + warnings, just in case you have any other ideas. Would it help for you to have the data as well?

          Many thanks for your time, I really appreciate it.

          Seanna
          Attached Files

          Comment


          • #6
            Dear Seanna

            thank you. Yes, can you please send me the data file needed to reproduce this problem, by email to [email protected]

            Thank you and best wishes
            Wolfgang
            Wolfgang Huber
            EMBL

            Comment


            • #7
              The problem was caused by a bug that is triggered when zero counts make parts (but not all) of the coefficients not estimable. I've fixed this in release (version 1.8.2) and devel (version 1.9.4).

              Comment


              • #8
                Dear Simon;

                Thank-you for fixing this. I look forward to being able to try out the nested model whenever the developer version is released.

                Cheers,

                Seanna

                Comment


                • #9
                  Dear Simon and Wolfgang;

                  Your fix has sorted the problem Simon- the model can now be fitted.

                  Wolfgang, I am now wondering if I misinterpreted your suggestion to merge the genotypes: I assumed that you thought that the nesting was failing since there were combinations of factors that contained no data. When I merge the genotypes and run the nested (full) compared to the un-nested (reduced) model, I get quite a few significant genes, whose expression, from eyeballing the data, look believably different between infected and not infected phenotypes. However, if I do not merge the genotypes, then I do not get any significant genes. Do you have any thoughts on why this would be so?

                  Cheers,

                  Seanna

                  Comment


                  • #10
                    Hey guys,

                    I have a similar problem, however my data is a double nested design (if that makes sense, I'm new to analysing RNAseq data).

                    Which means that I have 24 samples:
                    12 are parents (as controls) and 12 are plant lines
                    Within these, there are early and late flowering phenotypes
                    and again within these there are 2 sampling points.

                    genotype flowering sampling
                    Bur_1_1 parent late tp1
                    Bur_1_2 parent late tp1
                    Bur_1_3 parent late tp1
                    Bur_2_1 parent late tp2
                    Bur_2_2 parent late tp2
                    Bur_2_3 parent late tp2
                    Col_1_1 parent early tp1
                    Col_1_2 parent early tp1
                    Col_1_3 parent early tp1
                    Col_2_1 parent early tp2
                    Col_2_2 parent early tp2
                    Col_2_3 parent early tp2
                    pool01 pool_lines early tp1
                    pool02 pool_lines early tp1
                    pool03 pool_lines early tp1
                    pool04 pool_lines late tp1
                    pool05 pool_lines late tp1
                    pool06 pool_lines late tp1
                    pool07 pool_lines early tp2
                    pool08 pool_lines early tp2
                    pool09 pool_lines early tp2
                    pool10 pool_lines late tp2
                    pool11 pool_lines late tp2
                    pool12 pool_lines late tp2

                    I tried your suggestions from above but couldn't make it work.
                    It would be great if somebody could help me or give me a hint as to what I may have done wrong.

                    Thanks so much!

                    I attach the code, output with error message and session info in a .txt
                    Attached Files

                    Comment


                    • #11
                      - In your code, you first assign a formula with "%in%" operators to fit0 and fit1, then overwrite this with the results from the calls to nbinomGLMFit, in which you specify a different formula (with "+" instead of "%in%).

                      - Please consider using DEeq2 instead of DESeq.

                      - To tell you whether your formulas are right, we would need to know the biological question you are adressing.

                      Comment


                      • #12
                        Originally posted by Simon Anders View Post
                        - In your code, you first assign a formula with "%in%" operators to fit0 and fit1, then overwrite this with the results from the calls to nbinomGLMFit, in which you specify a different formula (with "+" instead of "%in%).

                        - Please consider using DEeq2 instead of DESeq.

                        - To tell you whether your formulas are right, we would need to know the biological question you are adressing.
                        Dear Mr Anders,

                        Thank you for your suggestion. I have since switched to using DESeq2, however, I am still having trouble with the nested design.

                        My data.frame looks like this:

                        > ExpDesign
                        genotype flowering sampling
                        Bur_1_1 parent late tp1
                        Bur_1_2 parent late tp1
                        Bur_1_3 parent late tp1
                        Bur_2_1 parent late tp2
                        Bur_2_2 parent late tp2
                        Bur_2_3 parent late tp2
                        Col_1_1 parent early tp1
                        Col_1_2 parent early tp1
                        Col_1_3 parent early tp1
                        Col_2_1 parent early tp2
                        Col_2_2 parent early tp2
                        Col_2_3 parent early tp2
                        pool01 pool_lines early tp1
                        pool02 pool_lines early tp1
                        pool03 pool_lines early tp1
                        pool04 pool_lines late tp1
                        pool05 pool_lines late tp1
                        pool06 pool_lines late tp1
                        pool07 pool_lines early tp2
                        pool08 pool_lines early tp2
                        pool09 pool_lines early tp2
                        pool10 pool_lines late tp2
                        pool11 pool_lines late tp2
                        pool12 pool_lines late tp2

                        Because I have sampled each of my 24 samples at two subsequent timepoints (tp 1 and tp2), these are always samples of the SAME individual.

                        How would I best implement this in my analysis?

                        Thanks again! :-)

                        ETA: I attached the sessionInfo and script...
                        Attached Files
                        Last edited by Sciurus; 04-15-2014, 02:56 AM.

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Current Approaches to Protein Sequencing
                          by seqadmin


                          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                          04-04-2024, 04:25 PM
                        • seqadmin
                          Strategies for Sequencing Challenging Samples
                          by seqadmin


                          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                          03-22-2024, 06:39 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, 04-11-2024, 12:08 PM
                        0 responses
                        17 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 10:19 PM
                        0 responses
                        22 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 09:21 AM
                        0 responses
                        16 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-04-2024, 09:00 AM
                        0 responses
                        46 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X