  • Differential Expression in different patients

    Dear all,

    I have a project to analyze where the researchers used 3 different patients (human primary cells) and applied different stimuli on each of them. Here is a short description of the experiment:
    Patient A: no stimulus, stimulus 1, stimuli 1+2
    Patient B: no stimulus, stimulus 1, stimuli 1+2
    Patient C: no stimulus, stimulus 1, stimuli 1+2

    I have biological triplicates for each condition (no stimulus, stimulus 1 and stimuli 1+2), but by qPCR they noticed that the basal levels of some of their genes of interest were not similar between patients. After the different stimuli, however, when they look at the fold-changes (normalized to the no-stimulus condition), they get similar fold-changes between the patients.

    Now, I have to analyze mRNAseq of these patients in each condition and I wanted to do a Differential Expression analysis (with DESeq) but I have a doubt. If I take the 3 patients in one condition, the raw counts will be different for a gene (actually even in RPKM)... Then, I am not sure I can really consider my 3 patients as triplicates.

    Or, should I do a DE analysis of each patient separately, considering I do not have replicates, and then compare the obtained fold-changes?

    Or even better, is there any DE package (I love R) which may take into consideration the patients?

    Sylvain

  • #2
    1) Use DESeq2 rather than DESeq

    2) Don't use RPKM for statistics unless you really really know what you're doing (in which case you probably wouldn't want to use it anyway).

    3) Use patient as a blocking factor in a GLM (you can do this with DESeq2, edgeR, limma, etc.).

    4) Human experiments are pretty noisy; advise the biologists to drastically increase the number of patients (if possible) next time.
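
As a rough illustration of point 3, the design matrix for a patient-blocked model can be inspected in base R. The sample table below is an assumption reconstructed from the experiment description (three patients, three conditions, nine samples), and the level names are made up for the sketch.

```r
# Hypothetical sample table: 3 patients x 3 conditions = 9 samples.
samples <- data.frame(
  patient   = factor(rep(c("A", "B", "C"), each = 3)),
  condition = factor(rep(c("none", "stim1", "stim1_2"), times = 3),
                     levels = c("none", "stim1", "stim1_2"))
)

# "patient" is the blocking factor; "condition" is the effect of interest.
design <- model.matrix(~ patient + condition, data = samples)
print(colnames(design))

# Residual degrees of freedom left once patient and condition are fitted.
resid_df <- nrow(design) - qr(design)$rank
print(resid_df)
```

The patientB and patientC columns absorb each patient's basal level, so the condition columns are estimated from within-patient differences only.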


    • #3
      Hi dpryan,

      Yes, I will use the raw counts as explained in both manuals of DESeq and DESeq2.

      I noticed in the DESeq2 vignette that you can use "patient" as a parameter and I wondered if it would do what I want. So you confirm I can NOT do it with DESeq... (a pity, because I'm using an old version of R, version 2.14, since some other packages I really need for other analyses are not compatible with R 3.0. But it's ok, I will see if I can install two versions of R on the same machine).

      Considering the noise, that's also why I first wanted to treat all the triplicates together without the patient factor... expecting that the noise would not be significant, so that I could focus on the real changes due to the different stimuli. But doing so, one of their genes of interest (confirmed by qPCR) is missing from the differentially expressed genes.

      These experiments are much more complex than I described (many stimuli and time-courses), so getting triplicates was already not that bad considering they work with primary cells...


      • #4
        You can do that with DESeq as well, it just lacks a lot of beneficial functionality.


        • #5
          How can I do it with DESeq? Can I add the "patient" parameter in addition to the "condition" one, like count ~ patient + condition? Sorry if my question seems naive...


          • #6
            Yup, exactly. You'll have to use the GLM commands rather than nbinomTest, but that's not a problem. I should note that the same design (~patient+condition or more complicated things like ~patient+treatment*condition*time) can be used in edgeR, limma, and pretty much any other similar R package.


            • #7
              Hi,

              I have an additional question: once I have fitted both the full and the reduced models (in my case patient + condition, and condition alone, since I want to see how taking the patient into consideration improves the DE analysis) and calculated the adjusted p-values, I am a bit stuck...

              Here is the part of my script...

              fit1 <- fitNbinomGLMs(cds, count ~ patient + condition)
              fit0 <- fitNbinomGLMs(cds, count ~ condition)
              pvalsGLM <- nbinomGLMTest(fit1, fit0)
              padjGLM <- p.adjust(pvalsGLM, method="BH")

              Now, how can I get the final fold-changes? Are they the same as those calculated by the function nbinomTest (in this case by fitNbinomGLMs, I guess), so that I only have to replace the p-values with padjGLM?

              I looked in the vignette and it is still unclear to me...

              Sorry if my question is really too stupid.

              Sylvain


              • #8
                Actually I have some doubts...

                When I look at the DE genes between two conditions (without considering the patient), I get 450 significant differentially expressed genes...
                When I do the script above, I have only 13!!

                So it seems obvious I am doing something wrong. I am a bit concerned that when I compare patient + condition to condition only, the output will be the genes which behave differently between the patients. That is exactly the opposite of what I want, but I am not really sure how to get the genes of interest.

                So what I want are the genes which have similar fold-changes after the stimulus, but these genes may have different basal expression between the patients...

                I wonder if I should just consider every patient separately, do DE without replicates, and then simply compare the 3 fold-changes obtained for each gene in each patient... Not the most elegant and probably not the fastest...

                Sylvain


                • #9
                  It's quite possible that what you're seeing is an effect due to the uncontrolled patient effect (i.e., false positives) in the result with 450 DE genes. This is particularly true given the N=3 (your residual degrees of freedom are pretty much shot).

                  The comparison you're interested in is ~patient+condition vs. ~patient (or just use a Wald test and look at the appropriate coefficient in the ~patient+condition fit).
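
A minimal sketch of both options for a single gene, using base R only. A Poisson GLM stands in here for the negative binomial fit that DESeq, DESeq2 and edgeR perform (no size factors or dispersion estimates), and the counts are invented for illustration.

```r
# Hypothetical counts for one gene across the 9 samples.
gene <- data.frame(
  patient   = factor(rep(c("A", "B", "C"), each = 3)),
  condition = factor(rep(c("none", "stim1", "stim1_2"), times = 3),
                     levels = c("none", "stim1", "stim1_2")),
  count     = c(100, 1000, 1200,   # patient A
                 20,  210,  230,   # patient B: lower basal level
                110, 1050, 1300)   # patient C
)

# Poisson GLM as a simple stand-in for the negative binomial model.
full    <- glm(count ~ patient + condition, family = poisson, data = gene)
reduced <- glm(count ~ patient,             family = poisson, data = gene)

# Likelihood ratio test: does adding "condition" improve the fit?
lrt   <- anova(reduced, full, test = "Chisq")
p_lrt <- lrt[["Pr(>Chi)"]][2]

# Wald test on one coefficient: stimulus 1 versus no stimulus.
wald   <- coef(summary(full))["conditionstim1", ]
log2fc <- unname(coef(full)["conditionstim1"]) / log(2)

print(p_lrt)
print(wald)
print(log2fc)
```

The likelihood ratio test asks whether any condition matters, while the single coefficient gives the fold-change and p-value for one specific stimulus against the reference level.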


                  • #10
                    Hi dpryan,

                    so to summarize, if I want to get the genes affected by a stimulus in all 3 patients, in terms of fold-changes, even if the basal levels of these genes differ between the 3 patients, I have to compare ~condition+patient to ~patient...

                    And then I take the padjGLM to know which genes are significant and the fold-changes will be in the fit1?

                    Is it ok doing like that?

                    The difference between ignoring the patient and taking it into consideration is really much larger than I thought... Really good to know!!

                    Thanks in advance

                    s.


                    • #11
                      Note that comparing ~patient+condition versus ~patient will be looking for changes due to any condition (it's like a one-way ANOVA). Maybe you want this, maybe not, depends on the question you're trying to answer. But yes, the "patient" factor absorbs the differences in basal expression.

                      Yes, use p.adjust() and use the adjusted p-values to determine significance (note that you can use a looser cut off than the normal 0.05).
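
For reference, a small sketch of the adjustment step with made-up p-values. The 0.1 cutoff is only an assumed example of a looser threshold, not a recommendation from this thread.

```r
# Hypothetical raw p-values, one per gene.
pvals <- c(0.001, 0.01, 0.02, 0.04, 0.5)

# Benjamini-Hochberg adjustment controls the false discovery rate.
padj <- p.adjust(pvals, method = "BH")

# Assumed cutoff for illustration: 10% FDR.
fdr_cutoff  <- 0.1
significant <- padj < fdr_cutoff

print(padj)
print(sum(significant))
```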


                      • #12
                        Thanks,

                        actually, after calculating the dispersion on the whole dataset, I wanted to subset it to only the two conditions I want to compare, to get only the changes due to one stimulus (compared to the "no stimulus" condition).

                        s.


                        • #13
                          Hi,

                          after one night thinking about this analysis, I have one more question (sorry to bother but I really dislike doing something without understanding at least a minimum):
                          When I compare ~patient+condition to ~patient, I understand that the output (after filtering by a p-value threshold) will contain the genes which are differentially expressed between my conditions, even if the basal level of expression differs between the patients... I want to be sure I will also get the genes affected by the treatment when the basal expression is similar in the patients...

                          To be sure: will I get all the genes having similar fold-changes after the treatment, with or without similar basal expression levels between patients?

                          Many thanks in advance,

                          s.


                          • #14
                            I assume that "treatment" and "condition" are the same here (otherwise, just show an example of how your samples are grouped).

                            Comparing ~patient+condition to ~patient will find everything affected by condition regardless of whether there's a patient-effect. So don't worry that you'll be screwing things up for genes for which there is no patient effect (you will get slightly different results for them, but I think most people would argue that that's OK).


                            • #15
                              Yes, treatment and condition are the same...

                              My data are grouped like this:
                              sample#1: patient A, no stimulus
                              sample#2: patient A, stimulus 1
                              sample#3: patient A, stimulus 2
                              sample#4: patient B, no stimulus
                              sample#5: patient B, stimulus 1
                              sample#6: patient B, stimulus 2
                              sample#7: patient C, no stimulus
                              sample#8: patient C, stimulus 1
                              sample#9: patient C, stimulus 2

                              Let's say for gene a, the basal level of expression (in the no stimulus condition) is the same in all the patients, but this gene is induced 10-fold by stimulus 1 in each patient. I expect to have this guy in the significant genes (if the basal expression level is high enough...).
                              Now, for gene b, let's say this guy is still induced 10-fold by stimulus 1 in all the patients, but its level in patient B was 5-fold lower than in patients A and C (so its expression level in patient B in the stimulus 1 condition is still more or less 5-fold lower than in both patients A and C in the stimulus 1 condition). I still expect to have this guy in the significantly differentially expressed genes...
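
The "gene b" scenario can be sketched with a toy calculation. This uses an ordinary linear model on invented log2 expression values as a stand-in for the negative binomial GLM, so it only illustrates the blocking idea; the noise values and the stimuli 1+2 effect are assumptions.

```r
# Hypothetical log2 expression for "gene b": patient B starts 5-fold lower,
# and stimulus 1 induces the gene 10-fold in every patient.
d <- data.frame(
  patient   = factor(rep(c("A", "B", "C"), each = 3)),
  condition = factor(rep(c("none", "stim1", "stim1_2"), times = 3),
                     levels = c("none", "stim1", "stim1_2"))
)
baseline <- c(A = 10, B = 10 - log2(5), C = 10)
effect   <- c(none = 0, stim1 = log2(10), stim1_2 = 3.5)  # 3.5 is made up
noise    <- c(0.05, -0.10, 0.05, -0.05, 0.10, -0.05, 0, 0, 0)
d$log2expr <- baseline[as.character(d$patient)] +
  effect[as.character(d$condition)] + noise

# With the patient block: ~patient+condition versus ~patient.
p_blocked <- anova(lm(log2expr ~ patient, data = d),
                   lm(log2expr ~ patient + condition, data = d))[["Pr(>F)"]][2]

# Ignoring the patient: ~condition versus intercept only.
p_unblocked <- anova(lm(log2expr ~ 1, data = d),
                     lm(log2expr ~ condition, data = d))[["Pr(>F)"]][2]

print(c(blocked = p_blocked, unblocked = p_unblocked))
```

In this toy data the blocked comparison gives a far smaller p-value, because the patient term removes the basal difference from the residual variance instead of leaving it as noise.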

                              s.
