  • austinpa
    Junior Member
    • Aug 2010
    • 4

    DESeq without replicates

    Hello,

    I am unable to get method=blind to work to calculate variance without replicates. Or actually any "method" argument from the vignette. Is this now obsolete? Are the only variance calculations supported now "pooled" or with replicates? Has anyone else had trouble with this recently?

    Thanks,
    Austin
  • RockChalkJayhawk
    Senior Member
    • Mar 2009
    • 192

    #2
    Originally posted by austinpa View Post
    Hello,

    I am unable to get method=blind to work to calculate variance without replicates. Or actually any "method" argument from the vignette. Is this now obsolete? Are the only variance calculations supported now "pooled" or with replicates? Has anyone else had trouble with this recently?

    Thanks,
    Austin
    try
    Code:
     method="blind"

    • Simon Anders
      Senior Member
      • Feb 2010
      • 995

      #3
      ... and make sure you use a current version of DESeq. I've added the 'method' argument only a few months ago.

      • RockChalkJayhawk
        Senior Member
        • Mar 2009
        • 192

        #4
        Originally posted by Simon Anders View Post
        ... and make sure you use a current version of DESeq. I've added the 'method' argument only a few months ago.
        Do you also have some more documentation on the 'method' argument? I can't seem to find it.

        • Simon Anders
          Senior Member
          • Feb 2010
          • 995

          #5
          Originally posted by RockChalkJayhawk View Post
          Do you also have some more documentation on the 'method' argument? I can't seem to find it.
          Start R, load DESeq, and type "?estimateVarianceFunctions". If you don't see anything there about 'method', you have an old DESeq version.

          Simon

          • austinpa
            Junior Member
            • Aug 2010
            • 4

            #6
            DESeq without replicates

            Great. Thanks Simon. I just needed the newer version of DESeq and R.

            Austin


            Originally posted by Simon Anders View Post
            ... and make sure you use a current version of DESeq. I've added the 'method' argument only a few months ago.

              • taoxiang180
                Junior Member
                • Jun 2010
                • 9

                #8
                Hi, everyone.
                I am a novice to R and DESeq. Working without any replicates, I went through each step of the differential gene expression analysis in the DESeq manual, but the results really puzzled me: some genes have pval<0.05 but a padj of 1. How was the padj calculated, and how should I set the padj threshold? Any pointers will be appreciated. Thanks!

                • RockChalkJayhawk
                  Senior Member
                  • Mar 2009
                  • 192

                  #9
                  Originally posted by taoxiang180 View Post
                  Hi, everyone.
                  I am a novice to R and DESeq. Working without any replicates, I went through each step of the differential gene expression analysis in the DESeq manual, but the results really puzzled me: some genes have pval<0.05 but a padj of 1. How was the padj calculated, and how should I set the padj threshold? Any pointers will be appreciated. Thanks!
                  The padj corrects for multiple testing to limit your false discovery rate. Using this value, you can essentially interpret P<0.05 to mean that you are 95% sure there is a difference in expression.

                  • Simon Anders
                    Senior Member
                    • Feb 2010
                    • 995

                    #10
                    Originally posted by RockChalkJayhawk View Post
                    The padj corrects for multiple testing to limit your false discovery rate. Using this value, you can essentially interpret P<0.05 to mean that you are 95% sure there is a difference in expression.
                    Not quite. In particular, your phrasing "95% sure" is too imprecise to capture the difference between an unadjusted p value and one adjusted for FDR control.

                    I've just googled a bit, looking for a good primer to explain this concept (which is of vital importance: don't even think about analysing genomics data if you don't know what the multiple hypothesis testing problem is), but most of the stuff is too technical for a reader without statistics training. This paper here might explain it reasonably well, but is a bit lengthy:

                    Pounds SB. Estimation and control of multiple testing error rates for microarray studies.
                    Briefings in Bioinformatics. 2006;7(1):25-36.


                    (DESeq uses Benjamini and Hochberg's method. See the article to learn what this means.)

                    Simon

                    • RockChalkJayhawk
                      Senior Member
                      • Mar 2009
                      • 192

                      #11
                      Originally posted by Simon Anders View Post
                      Not quite. In particular, your phrasing "95% sure" is too imprecise to capture the difference between an unadjusted p value and one adjusted for FDR control.

                      I've just googled a bit, looking for a good primer to explain this concept (which is of vital importance: don't even think about analysing genomics data if you don't know what the multiple hypothesis testing problem is), but most of the stuff is too technical for a reader without statistics training. This paper here might explain it reasonably well, but is a bit lengthy:

                      Pounds SB. Estimation and control of multiple testing error rates for microarray studies.
                      Briefings in Bioinformatics. 2006;7(1):25-36.


                      (DESeq uses Benjamini and Hochberg's method. See the article to learn what this means.)

                      Simon
                      Thanks for pointing that out. That's what I get for not finishing my coffee before I type.

                      Perhaps a better way to say it is that the padj changes the pvalues so that only 5% are likely false positives - it has nothing to do with the specific gene being differentially expressed.

                      • Simon Anders
                        Senior Member
                        • Feb 2010
                        • 995

                        #12
                        Exactly.

                        So for those too lazy to read the paper:

                        If you have 10,000 genes and you use a threshold of 0.05 on the raw p values, you should get around 500 false positives (5% of 10,000). So if there are 1000 genes with p<0.05, about half of them might be false positives.

                        If you use a threshold of 0.05 on the adjusted p values, you will find fewer genes, let's say, 100, but now, you know only 5% of these are false positives (here: 5 of 100 genes).

                        Simon
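The arithmetic in this post can be written out as a minimal sketch. Python is used here purely for illustration (the thread itself works in R), the variable names are made up, and the estimate of about 500 false positives assumes that most of the 10,000 genes are not differentially expressed.

```python
n_genes = 10_000   # genes tested
threshold = 0.05   # cutoff applied to the p values

# Raw p values: a gene that is not differentially expressed still
# falls below the cutoff with probability 0.05.
expected_false_raw = n_genes * threshold                  # about 500
genes_called_raw = 1000                                   # genes with p < 0.05 in the example
share_false_raw = expected_false_raw / genes_called_raw   # about one half

# Adjusted p values: the cutoff now bounds the expected share of
# false positives among the genes that are called.
genes_called_adj = 100
expected_false_adj = genes_called_adj * threshold         # about 5

print(f"raw: ~{expected_false_raw:.0f} of {genes_called_raw} calls may be false")
print(f"adjusted: ~{expected_false_adj:.0f} of {genes_called_adj} calls may be false")
```

The same cutoff of 0.05 therefore means two different things: a per-gene error rate on raw p values, and a bound on the expected fraction of false calls on adjusted p values.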

                        • Jeremy
                          Senior Member
                          • Nov 2009
                          • 190

                          #13
                          The main problem is the lack of replicates. This means the variance is calculated from both of your samples pooled, so a large difference will come with a large variance and thus less statistical significance. You may want to ignore the p-values completely and just look at the highest and lowest fold changes; because you have no replicates, you will need to confirm pretty much everything with (replicate) qPCR.
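Ranking by fold change, as suggested here, might look like the following rough sketch. It is written in Python only for illustration; the gene names and counts are invented, the counts are assumed to be already normalized for library size, and the pseudocount of 1 is an assumption made to avoid dividing by zero, not something DESeq prescribes.

```python
import math

# Hypothetical normalized counts: gene -> (sample A, sample B). Not real data.
counts = {
    "geneA": (10, 80),
    "geneB": (200, 210),
    "geneC": (500, 60),
    "geneD": (0, 25),
}

PSEUDOCOUNT = 1  # assumption: keeps the ratio finite when a count is zero


def log2_fold_change(a, b, pseudo=PSEUDOCOUNT):
    """log2 of (B + pseudo) / (A + pseudo); positive means higher in B."""
    return math.log2((b + pseudo) / (a + pseudo))


# Genes ordered from the largest to the smallest absolute log2 fold change,
# i.e. the order in which one might pick candidates for qPCR.
ranked = sorted(
    counts,
    key=lambda gene: abs(log2_fold_change(*counts[gene])),
    reverse=True,
)

for gene in ranked:
    print(gene, round(log2_fold_change(*counts[gene]), 2))
```

Note that low-count genes such as the hypothetical geneD tend to float to the top of such a list, which is one more reason to confirm candidates with replicate qPCR.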

                          • taoxiang180
                            Junior Member
                            • Jun 2010
                            • 9

                            #14
                            Thanks for all the recommendations!
                            When working without any replicates and getting padj=1, what can I do next using DESeq?
                            (Obviously, we cannot conclude that there is no differential expression between the two groups.)
                            According to Yoav Benjamini's method (2001), once the p values are available from any statistical software, the extra FDR calculation can be done easily in spreadsheet software such as Excel using the built-in functions. Can I calculate the FDR this way?

                            • Simon Anders
                              Senior Member
                              • Feb 2010
                              • 995

                              #15
                              As Jeremy said, adjusted p values are of little use in your situation. Doing experiments without replicates is simply a bad idea.

                              Just start from the largest fold change (or maybe from the smallest unadjusted p value) and keep doing qPCR (with at least, say, five biological replicates!) for all genes until you run out of patience.

                              And you don't have to use Excel; you can use R's 'p.adjust' function for the multiple testing calculation. For your convenience, DESeq runs 'p.adjust' with 'method="BH"' (Benjamini-Hochberg) on the 'pval' column of the result and reports this in the 'padj' column.

                              Alternatively, you may want to give this quite new idea a try:

                              Wu Z, Jenkins BD, Rynearson TA, et al. Empirical Bayes Analysis of Sequencing-based Transcriptional Profiling without Replicates.
                              BMC Bioinformatics. 2010;11(1):564. http://www.biomedcentral.com/1471-2105/11/564
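For readers wondering how a gene can have pval<0.05 and still get a padj of 1, here is a minimal sketch of the Benjamini-Hochberg adjustment. It is plain Python written for illustration and is meant to mirror what R's 'p.adjust' with 'method="BH"' computes; the function name bh_adjust and all p values are hypothetical, not DESeq output.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvals)
    # Visit the p values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for pos, idx in enumerate(order):
        rank = m - pos  # ascending rank: 1 = smallest p value
        # Scale by m / rank, then keep the running minimum so that the
        # adjusted values never increase as the raw p values decrease.
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical case 1: one gene at 0.04, the other 49 far from significant.
mostly_null = [0.04, 0.3, 0.5, 0.7] + [1.0] * 46
padj_null = bh_adjust(mostly_null)
print(padj_null[0])  # padj of the gene with pval = 0.04

# Hypothetical case 2: only four genes, two of them with very small p values.
few_genes = [0.001, 0.002, 0.04, 0.5]
padj_few = bh_adjust(few_genes)
print(padj_few)
```

In the first list the gene with pval = 0.04 has the smallest of 50 p values, so its p value is multiplied by 50/1 and then capped at 1. The adjusted value depends on how many genes were tested and on how the other p values are distributed, not only on the gene's own p value.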
