  • Any software or R package can do this?

    Hello, I have several samples (8 samples from 8 sampling sites) from metagenomic sequencing. These are environmental samples taken at different temperatures. I would like to find over-abundant genes in the highest-temperature sample.

    However, R packages such as baySeq, DESeq, and DESeq2 need replicates at each site in order to detect over-abundant genes and compute a p-value. That means I would have had to sequence each sample twice, but I didn't. I only have 8 sequencing datasets so far.

    Does anyone know of a package that does not require replicates?

    Thanks,
    Ben

  • #2
    DESeq2 doesn't need replicates, but they are strongly encouraged, and most bioinformaticians would advise the use of replicates if consulted prior to an experimental study. Without knowledge of the biological variation, it is difficult to establish whether or not an observed difference is statistically significant.

    You may be able to treat different temperature brackets as biological replicates, but it would depend on the specifics of your investigation as well as what you are looking to get out of the experiment.
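
As a rough illustration of the bracket idea, the sketch below splits eight samples into a hot and a cold group and runs an exact permutation test on one gene's normalized abundance. The abundance values, the 4-versus-4 split, and the function name `permutation_p_value` are invented for illustration, and this is a generic permutation test, not what DESeq2 does internally.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(hot, cold):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(hot) + list(cold)
    n_hot = len(hot)
    observed = abs(mean(hot) - mean(cold))
    extreme = 0
    total = 0
    # Enumerate every way of labelling n_hot of the pooled samples as "hot".
    for idx in combinations(range(len(pooled)), n_hot):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count labelings whose difference is at least as large as the observed one.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical normalized abundances of one gene (made-up numbers).
hot_sites = [120.0, 135.0, 110.0, 128.0]   # e.g. the four warmest sites
cold_sites = [40.0, 55.0, 38.0, 47.0]      # e.g. the four coldest sites

p = permutation_p_value(hot_sites, cold_sites)
print(f"p = {p:.4f}")
```

With four samples per group there are only 70 possible labelings, so the smallest two-sided p-value this test can return is 2/70 (about 0.029), before any multiple-testing correction across genes. That floor is one concrete way to see how few samples limit statistical inference.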


    • #3
      Originally posted by gringer:
      DESeq2 doesn't need replicates, but they are strongly encouraged, and most bioinformaticians would advise the use of replicates if consulted prior to an experimental study. Without knowledge of the biological variation, it is difficult to establish whether or not an observed difference is statistically significant.

      You may be able to treat different temperature brackets as biological replicates, but it would depend on the specifics of your investigation as well as what you are looking to get out of the experiment.
      Hi Gringer,

      Are you sure that DESeq2 doesn't need replicates? I remember emailing the author a long time ago, and he said it needs replicates. That may have been about DESeq, since DESeq2 had not come out at that time.

      Personally, I don't remember reading any paper that did not use replicates in its experiments. If you have read any papers that ran the analysis without replicates, I would appreciate it if you could share them with me. I just want to make sure.

      My 8 sample sites are at 8 different temperatures, for example 80 °C, 70 °C ... 0 °C. I would like to treat them independently, and I really dislike grouping them arbitrarily. For example, I could split them into two groups, 4 sites above 40 °C and 4 sites below 40 °C, but I would rather treat them as 8 individual groups.
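
One way to avoid arbitrary grouping is to treat temperature as a gradient and ask, for each gene, whether abundance tends to rise or fall along it. The sketch below computes a Spearman rank correlation for a single gene in plain Python. The temperatures and abundances are invented for illustration, the helper names are hypothetical, and a trend across sites is a different question from testing the hottest sample alone.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (var_x * var_y) ** 0.5


# Hypothetical site temperatures (°C) and one gene's normalized abundance.
temperature = [80, 70, 60, 50, 40, 30, 20, 0]
abundance = [88.0, 95.0, 70.0, 64.0, 41.0, 30.0, 22.0, 10.0]

rho = spearman_rho(temperature, abundance)
print(f"Spearman rho = {rho:.3f}")
```

A value near +1 suggests the gene is more abundant at warmer sites. With only 8 points, any p-value attached to such a correlation would be coarse, and it would still need correction for the number of genes tested.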


      • #4
        Originally posted by SDPA_Pet:
        Are you sure about this (DESeq2) doesn't need replicates?
        From the DESeq2 documentation, frequently asked questions, section 5.8:

        Can I use DESeq2 to analyze a dataset without replicates?

        If a DESeqDataSet is provided with an experimental design without replicates, a warning is printed, that the samples are treated as replicates for estimation of dispersion. This kind of analysis is only useful for exploring the data, but will not provide the kind of proper statistical inference on differences between groups. Without biological replicates, it is not possible to estimate the biological variability of each gene. More details can be found in the manual page for ?DESeq.
        Last edited by gringer; 07-02-2016, 12:32 PM.


        • #5
          Well, it says that without replicates it is only useful for exploring the data. I need some statistical results.


          • #6
            Originally posted by SDPA_Pet:
            Well, it says that without replicates it is only useful for exploring the data. I need some statistical results.
            Even though this thread is about microarray data, the pointers apply equally well here: https://www.biostars.org/p/14130/


            • #7
              Good discussion, thanks GenoMax.

              From my point of view, you'll either be using programs written by people that say replicates are recommended, or you'll be using programs written by people who don't have a good understanding of the issues associated with replicate-free analysis.

              But as mentioned in that biostars discussion, there's no reason why you can't use your results to generate a good hypothesis about your data, then proceed with follow-up studies to explore that hypothesis. Statistics can supplement other ideas and hypotheses, but should not be the sole determinant of whether or not a particular result is useful. The ASA statement on the p-value is a worthwhile read in that regard:



              Researchers should bring many contextual factors into play to derive scientific inferences, including the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis. Pragmatic considerations often require binary, “yes-no” decisions, but this does not mean that p-values alone can ensure that a decision is correct or incorrect. The widespread use of “statistical significance” (generally interpreted as p < 0.05) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.


              • #8
                Hi SDPA_Pet,

                You could try STAMP (http://kiwi.cs.dal.ca/Software/STAMP). It supports statistical tests for comparing pairs of samples, or samples organized into two or more treatment groups.

                It is very useful for projects with a limited number of samples or no replicates (n=1); see NCBI for published studies.

                From the author:

                STAMP is a software package for analyzing taxonomic or metabolic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. Statistical hypothesis tests for pairs of samples or groups of samples is support along with a wide range of exploratory plots. STAMP encourages the use of effect sizes and confidence intervals in assessing biological importance. A user friendly, graphical interface permits easy exploration of statistical results and generation of publication quality plots for inferring the biological relevance of features in a metagenomic profile. STAMP is open source, extensible via a plugin framework, and available for all major platforms.
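
To make the "pairs of samples" case concrete: one standard test for comparing a single feature between two unreplicated samples is Fisher's exact test on a 2x2 table of read counts. The sketch below is a generic plain-Python version, not STAMP's own code, and the counts and the function name `fisher_exact_two_sided` are invented for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: reads assigned to the gene in sample A, b: all other reads in sample A
    c: reads assigned to the gene in sample B, d: all other reads in sample B
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x gene reads in sample A, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(p_value, 1.0)


# Hypothetical counts: 30 of 1000 reads hit the gene in the hottest sample,
# 10 of 1000 reads hit it in a cooler sample (made-up numbers).
p = fisher_exact_two_sided(30, 970, 10, 990)
print(f"p = {p:.4g}")
```

A test like this only accounts for sampling noise in the read counts, not biological variation between sites, which is the limitation raised earlier in the thread. Real read counts in the millions would also make this direct enumeration slow.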


                • #9
                  Originally posted by gringer:
                  From my point of view, you'll either be using programs written by people that say replicates are recommended, or you'll be using programs written by people who don't have a good understanding of the issues associated with replicate-free analysis.
                  Although I respect your opinion, some programs are written for a specific purpose or kind of data, for example data without replicates. There are various quantitative techniques (see 'Replication, lies and lesser-known truths regarding experimental design in environmental microbiology' below) that, when used properly, allow scientists (e.g., environmental microbiologists) to draw strong statistical conclusions from experimental and comparative data (Lennon 2011).



                  Originally posted by gringer:
                  But as mentioned in that biostars discussion, there's no reason why you can't use your results to generate a good hypothesis about your data, then proceed with follow-up studies to explore that hypothesis. Statistics can supplement other ideas and hypotheses, but should not be the sole determinant of whether or not a particular result is useful. The ASA statement on the p-value is a worthwhile read in that regard:

                  http://amstat.tandfonline.com/doi/ab...5.2016.1154108

                  Following this discussion, there are two excellent opinion pieces published in Environmental Microbiology. The first, by James I. Prosser ('Replicate or lie', http://onlinelibrary.wiley.com/doi/1...0.02201.x/full), discusses the lack of replication in current studies. The second, a rebuttal by Jay T. Lennon (http://onlinelibrary.wiley.com/doi/1...445.x/abstract), states that 'although replication is an important component of experimental design, it is possible to do good science without replication'.
                  Last edited by vingomez; 07-05-2016, 11:52 AM. Reason: typo


                  • #10
                    Although I respect your opinion, some programs are written for a specific purpose or kind of data, for example data without replicates.
                    Fair enough. I'd like to clarify that I don't think replication is essential, just that it's a good idea where possible. "Replication" also need not be an attempt at an identical experiment, but I think there should be some way to make a good guess at biological variation. Without that guess, I have difficulty seeing how the importance of a particular result (if the difference is marginal) can be established.

                    This replication, or estimation of biological variation, also doesn't need to involve further sampling. It could be something as simple as comparing with results from a public dataset for the same organism (but not necessarily the same experiment).

                    I notice that the second paper talks about a time-series dataset with a single observation for each time point, which might be similar to the temperature situation that SDPA_Pet has:

                    The researchers overcame this hurdle using a Bayesian technique called dynamic linear modelling (DLM), which explicitly deals with the non-independence of time-series data (Pole et al., 1994).
                    Last edited by gringer; 07-05-2016, 12:34 PM.
