  • DESeq questions

    Hello,

    I have 16 samples from 16 different human tissues (say "A","B",...,"P"), so no biological replicates (75bp single end reads). I want to study the expression levels of a specific group of genes for a specific tissue (let's say this tissue is "B"). A few questions on this scenario:

    1. Is it valid to treat tissue B as 1 condition and the other 15 tissues as replicates for the "non-B" condition?

    2. Although I'm only interested in a specific group of genes, is it recommended to supply DESeq with a count table of all genes instead of only the few genes I'm interested in (in order to give DESeq more information about the samples)?

    3. If 1. is ok: Is the following configuration a good choice for the dispersion estimation?
    Code:
    estimateDispersions(cds,method="blind",sharingMode="fit-only",fitType="local")
    4. What would be the explanatory power of the analysis? (My hope is that 16 different samples give enough information to obtain meaningful results, despite the absence of replicates?!)

    5. Would it be a great improvement to use technical replicates? (either 50bp paired end or 100bp stranded)

    Thank you.

    edit: added point 5
    Last edited by hanshart; 02-25-2012, 10:52 AM.

  • #2
    Can anyone help me with this, please?


    • #3
      Just to get it started I can try to answer some of your points:

      Hello,

      I have 16 samples from 16 different human tissues (say "A","B",...,"P"), so no biological replicates (75bp single end reads). I want to study the expression levels of a specific group of genes for a specific tissue (let's say this tissue is "B"). A few questions on this scenario:

      1. Is it valid to treat tissue B as 1 condition and the other 15 tissues as replicates for the "non-B" condition?
      If your question is "what genes are uniquely/differentially expressed in tissue B", then I'd say yes, this is valid. However, many DE RNA-seq tools have hiccups when you feed them highly diverse "biological replicates", and I think this should be a concern if you choose this type of analysis.

      2. Although I'm only interested in a specific group of genes, is it recommended to supply DESeq with a count table of all genes instead of only the few genes I'm interested in (in order to give DESeq more information about the samples)?
      You will bias your samples by picking and choosing this way. The number of reads per gene is PROPORTIONAL to everything else that was sequenced in the sample, so you should include all of this information.

      3. If 1. is ok: Is the following configuration a good choice for the dispersion estimation?
      Code:
      estimateDispersions(cds,method="blind",sharingMode="fit-only",fitType="local")
      This is specific to DESeq so someone else might be able to answer.

      4. What would be the explanatory power of the analysis? (My hope is that 16 different samples give enough information to obtain meaningful results, despite the absence of replicates?!)
      More samples do not make up for a lack of biological replicates. You have very little power for your question (what is unique to tissue B), except for the big, obvious differences that you can hardly miss. Biological replicates are important.

      5. Would it be a great improvement to use technical replicates? (either 50bp paired end or 100bp stranded)
      It's been shown several times in the literature that technical replicates in RNA-seq are very tight and almost unnecessary for an experiment. This also depends on how deeply you've sequenced (if you don't have enough coverage, your variation is much higher). Tech reps are nice when you can afford them, but biological reps would be far more useful if you had to choose one.
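To put rough numbers on the point about technical versus biological replicates, here is a minimal Python sketch (not DESeq itself). It assumes technical replicates behave like pure Poisson sampling, so variance equals the mean, and that biological replicates add a negative-binomial term, so variance is mean + dispersion * mean^2. The dispersion value of 0.1 is a made-up illustration, not an estimate from any data set.

```python
import math

def cv_technical(mean_count):
    """Coefficient of variation under pure Poisson (shot) noise."""
    return 1.0 / math.sqrt(mean_count)

def cv_biological(mean_count, dispersion):
    """Coefficient of variation under a negative binomial model:
    variance = mean + dispersion * mean**2."""
    return math.sqrt(1.0 / mean_count + dispersion)

# Hypothetical dispersion, chosen only for illustration.
ASSUMED_DISPERSION = 0.1

for mean_count in (10, 100, 1000, 10000):
    tech = cv_technical(mean_count)
    bio = cv_biological(mean_count, ASSUMED_DISPERSION)
    print(f"mean={mean_count:>6}  technical CV={tech:.3f}  biological CV={bio:.3f}")
```

Under these assumptions the technical CV keeps shrinking as depth grows, while the biological CV levels off near the square root of the dispersion. That is why deeper sequencing or technical replicates cannot substitute for biological ones.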


      • #4
        What do you mean by "expression levels of a specific group of genes for a specific tissue"? Do you want to know whether they are differentially expressed in comparison to other tissues? Or maybe their expression level within that specific tissue in comparison to other genes within the same tissue?

        For example, does it matter that gene A is expressed in tissue B and not tissue C, but at lower levels than tissue D? Or do you just want to know that the gene A is expressed more strongly than gene B in tissue B? Or do you just want to know that the gene is expressed at some level of significance?

        Also, for #2 use all the genes and look at your genes of interest at the end.
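To illustrate why normalizing on a hand-picked subset can mislead, here is a small Python sketch of median-of-ratios size factors, the normalization idea DESeq uses. It is not the DESeq code. The counts, the sample names "A" and "B", and the helper names are all made up. Sample B is sequenced twice as deep, and the first gene is truly 4-fold higher in B.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.
    counts maps sample name -> list of per-gene counts (all > 0 here)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Per-gene geometric mean across samples acts as a pseudo-reference.
    geo_means = [
        math.exp(sum(math.log(counts[s][g]) for s in samples) / len(samples))
        for g in range(n_genes)
    ]
    return {
        s: median(counts[s][g] / geo_means[g] for g in range(n_genes))
        for s in samples
    }

def fold_change(counts, gene_index):
    """Normalized B/A fold change for one gene."""
    sf = size_factors(counts)
    norm_a = counts["A"][gene_index] / sf["A"]
    norm_b = counts["B"][gene_index] / sf["B"]
    return norm_b / norm_a

# Made-up counts: six genes, gene 0 is the gene of interest.
all_genes = {
    "A": [100, 100, 200, 300, 400, 500],
    "B": [800, 200, 400, 600, 800, 1000],
}
# Keep only the first two genes, as if only a small family were supplied.
subset = {sample: values[:2] for sample, values in all_genes.items()}

fc_all = fold_change(all_genes, 0)
fc_subset = fold_change(subset, 0)
print(f"fold change with all genes: {fc_all:.2f}")
print(f"fold change with subset:    {fc_subset:.2f}")
```

With these toy counts the fold change for gene 0 comes out near 4 when all genes are used, but near 2 with the two-gene subset. In the subset, the changed gene itself distorts the size factors.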


        • #5
          You can sequence these samples. You can analyze them. And you might even find something interesting. But you'll never be able to publish it (or at least you shouldn't be able to). And you'll always be wondering if it is reproducible and what you would find if you had the replicates you need.

          So go ahead and sequence them, but your main priority should be figuring out how you are going to get the replicates you need. If that's not possible, you should consider putting your time into something potentially more productive.
          --------------
          Ethan


          • #6
            You definitely need to explain more about your design before anyone can answer your questions well. Are these samples from the same individual, or is each tissue from a different one? Do you expect that your genes of interest will typically have similar expression in all but one tissue? What do you hope to do with your result?


            • #7
              Thank you all for your answers, and sorry for the ambiguous explanations. I will try to explain it in more detail now.

              I have 16 different samples from 16 different human tissues (e.g. liver, brain, lung,...). Each sample comprises the total mRNA and is from a different individual. Say 's1',...,'s16' are the names of the samples.
              The initial question was, whether a specific gene (say 'g8') is differentially expressed in a particular sample (say 's3') (that was the guess).

              'g8' belongs to the family of AARS genes. Out of interest, I looked at the normalized read counts not only for 'g8' but also for all other AARS genes over all samples. This gave me 16 normalized counts for each AARS gene (including 'g8'). A boxplot of the normalized read counts for each gene shows that some of the genes have an outlier, corresponding to one of the samples. Gene 'g8' also has one outlier, and this outlier corresponds exactly to sample 's3'. So for gene 'g8', the normalized read count of sample 's3' does indeed seem to be _different_ from the normalized read counts of the other samples. I thought a boxplot is not the best way to be really sure whether this outlier is indeed _different_ from the other read counts or possibly just due to chance, so I want to apply differential expression analysis. I therefore provided DESeq with the read counts of all AARS genes (so as not to have only one gene for each sample, which would give no information about the distribution of the counts).
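To show how easily a boxplot flags a point among 16 values, here is a minimal Python sketch of the usual 1.5 * IQR outlier rule, plus a small simulation on pure noise. It assumes standard normal draws and inclusive quartiles; R's boxplot uses hinges, which can differ slightly. The function names, trial count and seed are arbitrary choices for illustration.

```python
import random
from statistics import quantiles

def boxplot_outliers(values):
    """Return values outside the 1.5 * IQR fences of a standard boxplot."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

def fraction_with_outlier(n_samples=16, n_trials=2000, seed=1):
    """Fraction of pure-noise data sets in which at least one value is flagged."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        draws = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if boxplot_outliers(draws):
            hits += 1
    return hits / n_trials

# Toy example: fifteen similar values and one that sticks out.
example = list(range(10, 25)) + [100]
print("flagged in toy example:", boxplot_outliers(example))
print("fraction of noise-only sets with a flagged value:", fraction_with_outlier())
```

The second number estimates how often 16 values with no real difference among them still show a boxplot "outlier". Since 37 genes are being inspected, some flagged points are expected by chance alone.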


              1.) I treated 's3' as one sample with no replicates and all other samples ('s1','s2','s4',...,'s16') as replicates for a second sample under consideration. -> Is this valid?
              (In fact, the variance over all the other 15 tissues should be at least as high as the variance for 15 biological replicates of the same tissue (?!). So observing that gene 'g8' is differentially expressed between 's3' and a second sample comprising the 15 other tissues should have at least the same statistical power as if I had compared 's3' against a particular 15-times-replicated tissue!?)

              IF 1.) is valid, then I have a few technical questions about the implementation ...

              2.) Is it enough to provide DESeq with the counts of only the 37 AARS genes? (If _not_ -> how many genes should I take into account? Consider that I cannot use _all_ genes, as many of them would fluctuate too much between the 15 samples, with the result that no gene would be called differentially expressed because of the much too high variation in the "replicated" sample.)

              3.) In DESeq: Is the following configuration a good choice for the dispersion estimation? (making as few assumptions as possible, I hope)
              Code:
              estimateDispersions(cds,method="blind",sharingMode="fit-only",fitType="local")
              4.) deleted

              5.) Would it be a great improvement to use technical replicates? (either 50bp paired end or 100bp stranded)

              I hope everything is clear now.
              Many thanks again!


              • #8
                @ Jean
                1.) I also observed the hiccups you mentioned in cufflinks. DESeq didn't show any problems with the configuration I stated above.

                4.) As you said, I can only observe the "big obvious differences" with this method. But as I'm only interested in making sure that the observation is significant, it should be OK to do so!?

                5.) Thanks for making this clear to me.

                @ ETHANol
                Thanks for this hint. But I simply do not have any biological replicates. I have just the data I have, and for my diploma thesis I am, among other tasks, supposed to analyse whether this gene is differentially expressed. I don't know whether my observation is even enough for a statement in the thesis.

                Thanks to all again


                • #9
                  In short, there is not much statistics you can do. If a gene sticks out in tissue X derived from subject Y, it may be because this gene is special for this tissue or because this gene is special for this person. You are willing to always attribute the difference to the tissue rather than to the person.

                  In other words, you are bargaining on the assumption that differences in gene expression are much stronger between different tissues of the same subject than between samples from the same tissue taken from different subjects. This is a risky assumption that is certainly untenable for quite a few genes, and due to your lack of replicates there is no way whatsoever to check it.
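The confounding described above can be made concrete with a tiny Python sketch. It assumes a purely additive model on the log scale (baseline + tissue effect + person effect). The effect size of 3.0, the baseline and the function name are invented for illustration.

```python
def observed_log_expression(tissue_effects, person_effects, baseline=5.0):
    """One measurement per (tissue, person) pair: effects simply add up."""
    return [baseline + t + p for t, p in zip(tissue_effects, person_effects)]

N_SAMPLES = 16
S3 = 2  # index of the sample that sticks out (hypothetical 's3')

# Scenario 1: the gene is special for the tissue, all donors are alike.
tissue_only = [0.0] * N_SAMPLES
tissue_only[S3] = 3.0
scenario_tissue = observed_log_expression(tissue_only, [0.0] * N_SAMPLES)

# Scenario 2: the gene is special for the donor, all tissues are alike.
person_only = [0.0] * N_SAMPLES
person_only[S3] = 3.0
scenario_person = observed_log_expression([0.0] * N_SAMPLES, person_only)

# With one sample per tissue and per donor, both explanations give the same data.
print(scenario_tissue == scenario_person)
```

Because each tissue comes from exactly one donor, the two effect columns are perfectly aligned. No analysis of these 16 values alone can tell the scenarios apart.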
                  Last edited by Simon Anders; 03-07-2012, 05:22 AM. Reason: corrected


                  • #10
                    Originally posted by Simon Anders
                    In other words, you are bargaining on the assumption that differences in gene expression are much stronger between different tissues of the same subject than between samples from the same tissue taken from different subjects. This is a risky assumption that is certainly untenable for quite a few genes, and due to your lack of replicates there is no way whatsoever to check it.
                    Thank you, Simon, for this precise answer. I had never looked at it this way (don't ask me why). I totally agree with you and should ask my advisor how to deal with this problem.


                    • #11
                      @ Jean
                      1.) I also observed the hiccups you mentioned in cufflinks. DESeq didn't show any problems with the configuration I stated above.
                      By hiccups I meant statistical hiccups. When the variation in your replicates is very high, you will violate some of the assumptions for the differential expression test. Whatever method/analysis you apply you'll want to really look at your data (does it fit the assumptions and models used, do the called and uncalled genes make biological sense).
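One crude way to "look at your data" in this sense is to compare how much a gene varies within a group against what count noise alone would give. The Python sketch below uses a simple method-of-moments dispersion estimate under a negative binomial model (variance = mean + dispersion * mean^2). It is not what DESeq's estimateDispersions does, and both count vectors are made up.

```python
from statistics import mean, variance

def moment_dispersion(counts):
    """Method-of-moments dispersion: (variance - mean) / mean**2, floored at 0."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    return max(0.0, (v - m) / (m * m))

# Made-up normalized counts for one gene.
true_replicates = [95, 105, 100, 98, 102]   # same tissue, tight spread
pooled_tissues = [5, 900, 40, 300, 1200]    # different tissues treated as "replicates"

print("dispersion, true replicates:", moment_dispersion(true_replicates))
print("dispersion, pooled tissues: ", round(moment_dispersion(pooled_tissues), 3))
```

A dispersion near 0 means the spread is about what counting noise explains. A dispersion above 1, as for the pooled tissues here, means the "replicates" differ from each other by more than their own mean, which is the kind of situation where the test's assumptions become doubtful.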

                      Everyone else had good advice. Unfortunately you don't have an ideal dataset for the questions you want to ask.
