  • #16
    Originally posted by Simon Anders
    As Jeremy said, adjusted p values are of little use in your situation. Doing experiments without replicates is simply a bad idea.

    Wu Z, Jenkins BD, Rynearson TA, et al. Empirical Bayes Analysis of Sequencing-based Transcriptional Profiling without Replicates.
    BMC Bioinformatics. 2010;11(1):564. http://www.biomedcentral.com/1471-2105/11/564

    Hi Simon, thanks again for your help.
    For me, this is really miserable work.
    I will try it again following your advice.


    • #17
      From the DESeq paper:
      If neither condition has replicates, one can still perform an analysis based on the assumption that for most genes, there is no true differential abundance, and that a valid mean-variance relationship can be estimated from treating the two samples as if they were replicates.

      Code:
                 SampleA    SampleB
      Gene1        10000      20000
      Gene2        15000      25000
      Does this mean the variance for Gene1 in SampleA is calculated from 10000 and 20000?

      --or--

      Does the variance for Gene1 in SampleA depend on the whole SampleA variance involving 10000 and 15000?
      --
      Jeremy Leipzig
      Bioinformatics Programmer

      • #18
        There is no such thing as a "SampleA variance". Just calculating the sample variance of all the numbers in sample A would not give any meaningful number.

        What DESeq does is the following. For each gene, it calculates the variance of the counts from all samples of one treatment group (or, in your case, simply from all samples). (Special care is taken here to take into account that different samples may have been sequenced to different depths.) As this variance is obtained from a very small number of samples, it is very imprecise. We now assume that genes of similar expression strength have similar variance, and so take an average over all such genes to get a variance value to be used for a given expression strength. (Technically, this is done with a local regression of the gamma family.)

        Simon
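
        The per-gene calculation described in this post can be sketched in base R. This is only an illustration with made-up counts, not DESeq's own code: the size factors follow the median-of-ratios idea, all variable names are assumptions, and the final smoothing across genes of similar expression strength is left out.

        ```r
        # Toy counts (made-up numbers); rows are genes, columns are the two samples.
        counts <- matrix(
          c(10000, 20000,
            15000, 25000,
              500,  1100,
               80,   150),
          ncol = 2, byrow = TRUE,
          dimnames = list(paste0("Gene", 1:4), c("SampleA", "SampleB"))
        )

        # Size factors by the median-of-ratios idea: compare each sample to the
        # per-gene geometric mean and take the median ratio over all genes.
        log_geo_mean <- rowMeans(log(counts))
        size_factors <- apply(counts, 2, function(col) exp(median(log(col) - log_geo_mean)))

        # Bring both samples to a common scale, then compute mean and variance
        # per gene ACROSS the samples, treating A and B as if they were replicates.
        norm_counts <- sweep(counts, 2, size_factors, "/")
        gene_mean <- rowMeans(norm_counts)
        gene_var  <- apply(norm_counts, 1, var)

        print(round(size_factors, 3))
        print(cbind(mean = gene_mean, variance = gene_var))
        ```

        In this sketch the variance for Gene1 comes from the two scaled values in its own row (10000 and 20000 divided by their size factors), never from the values within one column.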


        • #19
          (Hi Simon, emailed you but that might have been blocked with attachments)

          I notice DESeq is calling low or zero fold change genes as significantly differentially expressed in situations with no replicates and a small number of genes (e.g. miRNAs). Is this something you have encountered?

          Last edited by Zigster; 02-23-2011, 01:33 PM.
          --
          Jeremy Leipzig
          Bioinformatics Programmer

          • #20
            This shouldn't happen. Maybe you could send me some more details.

            There is one issue with numerical instability in the p value calculation that I have not yet fully straightened out. Especially in samples with very large variance, one very rarely encounters the situation that hardly any genes are differentially expressed, and some genes with really small log fold changes get flagged as differentially expressed even though there are genes with much stronger fold change nearby which are not called. If this happens, please try calling 'nbinomTest' with the optional parameter 'eps=1e-8' (or an even lower value).

            Simon


            • #21
              Where to use method?

              Hi Simon,
              I just started to use your package today. I too don't have any replicates (shy...). Your vignettes did not talk about method="blind". Perhaps it's a new addition (you said that already).

              I am just wondering where I should use it. Being new to both expression analysis and R, I ask this dumb question: is it used when calculating the padj values?

              Thanks very much in advance,
              Gowthaman


              • #22
                Originally posted by ragowthaman
                Your vignettes did not talk about method="blind". Perhaps it's a new addition (you said that already).
                You are reading an outdated version of the vignette. The current version of DESeq is available from the Bioconductor project.


                • #23
                  DESeq without replicates

                  Hi Simon,
                  I am trying to use DESeq. I have 42058 genes and get only 64 DE genes, which seems very few.
                  Code:
                  > head(ab)
                                  Num_reads.a Num_reads.b
                  Glyma01g00270.1           8           0
                  Glyma01g00320.1         833        1019
                  Glyma01g00380.1        1430        2019
                  Glyma01g00400.1        1275        1135
                  Glyma01g00400.2         236         108
                  Glyma01g00400.3          12           7
                  > conds <- c("A", "B")
                  > cds <- newCountDataSet(ab, conds)
                  > cds <- estimateSizeFactors(cds)
                  > cds <- estimateVarianceFunctions(cds, method = "blind")
                  > res2 <- nbinomTest(cds, "A", "B")
                  > plot(
                  +   res2$baseMean,
                  +   res2$log2FoldChange,
                  +   log = "x", pch = 20, cex = .1,
                  +   col = ifelse(res2$padj < .1, "red", "black"))
                  > # Both arguments below are res2$padj, so this only counts genes with padj < 0.1.
                  > table(res_sig = res2$padj < .1, res2_sig = res2$padj < .1)
                         res2_sig
                  res_sig FALSE  TRUE
                    FALSE 41994     0
                    TRUE      0    64

                  I know it's very dangerous to jump to conclusions with no replicates, but I think I should be able to get more DE genes. Should I be looking at the p value or at padj?
                  I do not know how to do this. Can you give me any suggestions?
                  Thanks!

                  lei
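
                  On the relation between the two columns asked about above: in DESeq's result table, padj is the Benjamini-Hochberg adjustment of pval. A minimal base-R sketch with made-up p values (hypothetical numbers, not taken from the data above):

                  ```r
                  # Hypothetical raw p values for four genes (illustration only).
                  pval <- c(0.0001, 0.01, 0.03, 0.5)

                  # Benjamini-Hochberg adjustment, as provided by p.adjust in base R.
                  padj <- p.adjust(pval, method = "BH")

                  print(data.frame(pval = pval, padj = padj))

                  # Indices of genes that would pass a 10% false discovery rate cutoff.
                  print(which(padj < 0.1))
                  ```

                  The adjustment scales with the number of genes tested, so with 42058 genes a raw p value typically has to be very small to survive it.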


                  • #24
                    The purpose of the 'blind' method was never to offer a proper analysis method for experiments without replication, because it is simply not possible (not just "dangerous") to draw conclusions. The whole point of replicates is to allow you to draw the line for significance, i.e., to know how much fold change you need to see to consider an effect real. Without replicates, you can guess, of course, but it has to be a wild guess, unless you are happy with the extremely over-careful guess that e.g. the "blind" method gives you.

                    Just out of curiosity: Why don't you have replicates? Every other post here, somebody wants to do DE analysis without replication, and I am genuinely puzzled why. It cannot be budget reasons, because with multiplexing, sequencing two samples to half the depth is not that much more expensive than one sample to full depth.


                    • #25
                      Hi,
                      I'd like to use DESeq to analyze miRNome data from next-generation sequencing.
                      Unfortunately, I don't have any replicates.
                      After reading the paper "Differential expression analysis for sequence count data", I have two questions:
                      first - Is the miRNA dataset too small to justify the assumption that for most of them there is no true differential abundance?
                      second - Without replicates, resVarA and resVarB are both NA (probably due to the factor 1/(m-1), where m is the number of replicates). How does the program calculate the p-value if the parameters sigmaA and sigmaB of the negative binomial distribution are "incomplete"?

                      Thanks in advance


                      • #26
                        first - Is the miRNA dataset too small to consider the assumption that for most of them there is no true differential abundance?
                        The normalization is quite robust with respect to this. The test for differential expression will not get you very far without replicates, unless you expect only very few but very strong effects.
                        second - Without replicates resVarA and resVarB are both NA (probably due to the ratio 1/m-1 where m is the number of replicates). How the program calculates the p-value if the parameters sigmaA and sigmaB, related to negative binomial distribution, are "incomplete"?

                        It calculates one sigma, by pretending that the two samples are replicates. See the paper for details.

                        How come you do not have replicates?


                        • #27
                          Okay, since this topic is active, I guess I will throw my relatively unique dataset into the mix regarding how to properly calculate variances in DESeq. I have the following dataset:
                          Cell line untreated (2 biological reps)
                          Cell line treated X (2 biological reps)
                          Human sample untreated (1 rep)
                          Human sample treated X (1 rep)
                          Human sample treated Y (1 rep)

                          I certainly understand that it would be great to have multiple replicates for the human samples, but without going into details, let's just say it isn't gonna happen. What I have done so far is to essentially do 3 DESeq variance estimations before the nbinom analysis.
                          1st - Cell line treated + untreated - using the replicates
                          2nd - Human treated X + Human untreated - using the "blind" parameter
                          3rd - Human treated Y + Human untreated - again using the "blind" parameter

                          Since these samples are all similar, I wonder if it would be advisable to calculate the variance simultaneously on all samples, then do the individual comparisons at the nbinom step? Essentially wondering if adding this extra data might provide a somewhat better variance calculation for those samples without replicates...

                          Also, any recommendation for an R heatmap package that would allow me to include all these samples on a single heatmap?
                          Last edited by Gators; 11-08-2011, 11:59 AM.


                          • #28
                            Your cell line replicates are probably isogenic while your human samples are from different humans, and hence, the variation between humans will be much larger than what you expect between the cell lines. I hence wonder what you mean by "these samples are all similar".


                            • #29
                              Actually the human samples were all one donor. Cells were isolated from a human donor, then were either treated with X, treated with Y, or untreated. The cell line I would expect to be different, however they (the cell line and human-derived cells) are all the same cell type, so on some level they should be similar despite the fact that the cell line has been immortalized.
                              Last edited by Gators; 11-08-2011, 12:49 PM.


                              • #30
                                Originally posted by Simon Anders
                                Start R, load DESeq, and type "?estimateVarianceFunctions". If you don't see anything there about 'method', you have an old DESeq version.

                                Simon
                                Hi,

                                I've installed the new version of DESeq (1.6.0), but when I type "?estimateVarianceFunctions"
                                this is what I get:

                                estimateVarianceFunctions    package:DESeq    R Documentation

                                REMOVED

                                Description:

                                This function has been removed. Instead, use
                                ‘estimateDispersions’.

                                So has it been removed from the new version, or what does this mean?

                                Thanks
