  • #16
    1. Actually, libraries with higher size factors are given more weight in the test (i.e., when calculating the p value). This is because we compare the sums of the unnormalized counts with what one should expect according to the size factors. (For details, please see the fine print in our paper.) For the fold change, we simply calculate the ratio of the averages of the normalized counts, which is straightforward. However, you may have a point that it would be more consistent to weight the sum according to size factors or fitted variances.

    2. I don't think that this is the cause of the artifacts you see. However, as you are worried about the size factors, you may want to double-check that they are good estimates. Make an MA plot, i.e. a log-log plot of means versus ratios of the normalized counts, for each pair of samples, and check whether the bulk of the genes is centered around zero log fold change.

    3. Controlling the false discovery rate at 0.01 sounds extremely stringent to me. Remember that controlling FDR at x% means that your hit list can be expected to contain at most x% false positives. It is common to cut adjusted p values at 5% or 10% because this is a reasonable FDR that one can usually live with.
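The fold-change calculation in point 1 can be sketched roughly as follows. This is a minimal Python illustration, not DESeq's code: it assumes the size factors are estimated with the median-of-ratios approach (each sample's count divided by the gene's geometric mean across samples, then the median over genes), and the function names `size_factors` and `fold_change` as well as the toy counts are made up for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts[g][s] is the raw count of gene g in sample s."""
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue  # geometric mean would be zero, so this gene is skipped
        geo_mean = math.exp(sum(math.log(c) for c in row) / n_samples)
        for s, c in enumerate(row):
            ratios[s].append(c / geo_mean)
    return [median(r) for r in ratios]


def fold_change(row, sf, group_a, group_b):
    """Ratio (B over A) of the averages of the normalized counts of one gene."""
    norm = [c / f for c, f in zip(row, sf)]
    mean_a = sum(norm[i] for i in group_a) / len(group_a)
    mean_b = sum(norm[i] for i in group_b) / len(group_b)
    return mean_b / mean_a


# Made-up toy counts: 4 genes x 4 samples, samples 0-1 = condition A, 2-3 = condition B.
# The libraries differ in depth, but only the third gene changes between conditions.
counts = [
    [100, 200, 150, 300],
    [50, 100, 75, 150],
    [10, 20, 30, 60],
    [80, 160, 120, 240],
]
sf = size_factors(counts)
fc_unchanged = fold_change(counts[0], sf, [0, 1], [2, 3])
fc_changed = fold_change(counts[2], sf, [0, 1], [2, 3])
print(sf, fc_unchanged, fc_changed)
```

After normalization the depth differences cancel, so the first gene comes out at a fold change of about 1 and the third at about 2.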

    • #17
      Excellent. Thank you so much for your input. I will calculate my own expression values and fold changes from now on. One of my earlier posts in this thread was an MA plot; you can see it above. I think it looks pretty good and centered around zero.

      As far as the adjusted p-values go, I will start using higher values, and filtering with both adjusted and non-adjusted p-values. Sadly, even with a threshold of 10% for padj, I still get only 50 genes. This is, I'm sure, attributable to the variation in the samples.

      Thank you again for your help!

      Artur

      • #18
        Originally posted by Artur Jaroszewicz
        Excellent. Thank you so much for your input. I will calculate my own expression values and fold changes from now on. One of my earlier posts in this thread was an MA plot; you can see it above. I think it looks pretty good and centered around zero.
        Yes, but it is an MA plot of averages across replicates. I suggested you do the same but only comparing two individual samples (both of the same and of different conditions) at a time. You
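A rough sketch of the pairwise check suggested here, in Python rather than R: for every pair of individual samples, compute A (mean log2 count) and M (log2 ratio) per gene and look at the median of M, which should sit near zero if the size factors are good. The sample names and numbers are invented for illustration, and the input is assumed to be already normalized counts.

```python
import math
from itertools import combinations
from statistics import median


def ma_values(x, y):
    """Return (A, M) lists for two samples; genes with a zero count in either are skipped."""
    a_vals, m_vals = [], []
    for cx, cy in zip(x, y):
        if cx > 0 and cy > 0:
            a_vals.append(0.5 * (math.log2(cx) + math.log2(cy)))
            m_vals.append(math.log2(cy) - math.log2(cx))
    return a_vals, m_vals


def median_log_ratio_per_pair(samples):
    """samples maps a sample name to its normalized counts; returns the median M per pair."""
    result = {}
    for name1, name2 in combinations(samples, 2):
        _, m_vals = ma_values(samples[name1], samples[name2])
        result[(name1, name2)] = median(m_vals)
    return result


# Hypothetical normalized counts for three samples (same genes, same order).
samples = {
    "s1": [10, 100, 1000, 0],
    "s2": [12, 90, 1000, 5],
    "s3": [40, 400, 4000, 8],
}
medians = median_log_ratio_per_pair(samples)
for pair, med in medians.items():
    print(pair, round(med, 3))
```

In this toy input, s1 and s2 are centered on zero, while every pair involving s3 is shifted by about two log2 units, which is the kind of offset that would point to a poor size factor.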

        As far as the adjusted p-values go, I will start using higher values, and filtering with both adjusted and non-adjusted p-values.
        You should never look at non-adjusted p-values for filtering. Choose an FDR and cut the adjusted p-values there.
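To make "choose an FDR and cut the adjusted p-values there" concrete, here is a small sketch of the Benjamini-Hochberg adjustment, which as far as I know is the method behind the padj column. The function name `bh_adjust` and the p-values are made up; in R the equivalent would be `p.adjust(p, method = "BH")`.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for five genes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
padj = bh_adjust(pvals)

# Pick the FDR first, then filter on the adjusted values only.
hits_10 = [i for i, q in enumerate(padj) if q < 0.10]
hits_05 = [i for i, q in enumerate(padj) if q < 0.05]
print(padj, hits_10, hits_05)
```

Note that the third and fourth genes have raw p-values below 0.05 but do not pass a 5% FDR cut, which is why filtering on raw p-values gives a hit list without a controlled error rate.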

        Sadly, even with a threshold of 10% for padj, I still get only 50 genes. This is, I'm sure, attributable to the variation in the samples.
        Well, you cannot always have great data.

        Two more things to check: First, try to use only six of your nine samples. Maybe only one of the three FACS-sorted cell types has high variance, and you can still get good results for a comparison of the other two.

        Second: Your data is not grouped, is it? If, say, samples CD90.1, UtESC-.1, and UtESC+.1 are from one mouse, CD90.2, UtESC-.2 and UtESC+.2 from a second mouse etc., a GLM will give you a lot of extra power.

        • #19
          Dispersion plot

          Hi,

          I am a complete newbie with regard to statistics.
          I've been following the Seqanswers forum and published papers on how to go about analyzing the data.
          I have two normal vs. two drug-treated samples, which we prepped for RNA-seq. After running the data through TopHat and HTSeq to get the read counts, I ran DESeq, but I don't see a difference between the two conditions based on the DESeq results. When I look at the dispersion plots, they look very weird.
          I can tell something is wrong, but I don't understand the plot well enough to see where the problem is coming from. Can anyone help?
          Thank you.

          Him26
          Attached Files

          • #20
            Hi,
            Welcome. I am also something of a newbie (been in bioinformatics since last September), but maybe I can help a little. How are you estimating your dispersions? Do you have replicates? Your plots do look a little weird. Are these log plots?
            Artur

            • #21
              response

              Thank you, Artur,

              I have five bio reps. I don't think the plot uses log scale.
              I used the following to estimate the dispersions.

              estimateDispersions( cds, method="per-condition",sharingMode="maximum", fitType="local")

              The parametric fit fails, so I did a local fit and used the per-condition method with the maximum sharing mode. I tried other methods, but they don't seem to change the plot much.

              Following is the function I used to plot the graph.
              estimateDispersions( cds, method="per-condition",sharingMode="maximum", fitType="local")

              Thank you for your response.

              • #22
                Originally posted by Him26
                I have two normal vs. two drug-treated samples, which we prepped for RNA-seq. After running the data through TopHat and HTSeq to get the read counts, I ran DESeq, but I don't see a difference between the two conditions based on the DESeq results. When I look at the dispersion plots, they look very weird.
                The plots' shapes look fine, but the dispersion value is the problem. Most genes have a dispersion in the range of maybe 0.8 or so (hard to see, but check the y value for the red line), which means a typical variation of sqrt(0.8) ≈ 89%, i.e. your genes typically differ by a factor of nearly 2 already between replicates. With so much noise, you can only detect really extreme changes, and I guess you do not have any.
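The arithmetic behind that statement, as a short Python sketch. It assumes the usual negative-binomial parameterization in which variance = mean + dispersion * mean^2, so the dispersion is the squared coefficient of variation left over once shot noise is negligible. The function names are just for illustration.

```python
import math


def biological_cv(dispersion):
    """Coefficient of variation between replicates implied by a dispersion value."""
    return math.sqrt(dispersion)


def total_cv(mean, dispersion):
    """CV including shot noise, assuming variance = mean + dispersion * mean**2."""
    variance = mean + dispersion * mean ** 2
    return math.sqrt(variance) / mean


cv = biological_cv(0.8)
print(round(cv, 3))                     # roughly 0.894, i.e. about 89%
print(round(total_cv(1000, 0.8), 3))    # shot noise adds very little at high counts
print(round(biological_cv(0.01), 3))    # a dispersion of 0.01 would mean about 10%
```

A replicate that sits one such standard deviation above the mean is at about 1.89 times the mean, hence "a factor of nearly 2".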

                • #23
                  I'm trying to plot the MVA plot using DESeq but am getting an error that I cannot make sense of. Did anyone else come across this error? If the R version could be an issue, I'm using R 2.14.

                  Code:
                  > head(res2)
                                 id    baseMean   baseMeanA   baseMeanB foldChange log2FoldChange
                  1 ENSG00000000419 4706.072643 4981.666042 4430.479244  0.8893569     -0.1691655
                  2 ENSG00000000457 1668.622986 1840.590047 1496.655925  0.8131392     -0.2984257
                  3 ENSG00000000460 3029.854113 3053.176700 3006.531525  0.9847224     -0.0222110
                  4 ENSG00000000938    2.475421    2.118055    2.832787  1.3374473      0.4194820
                  5 ENSG00000000971   68.213557   82.604156   53.822958  0.6515769     -0.6179927
                  6 ENSG00000001036 1535.208776 1635.138684 1435.278869  0.8777719     -0.1880819
                         pval padj
                  1 0.6054591    1
                  2 0.3865157    1
                  3 0.9477727    1
                  4 1.0000000    1
                  5 0.4141867    1
                  6 0.5881573    1
                  > plotMA(res2)
                  Error in MA[, array] - x : non-numeric argument to binary operator
                  In addition: There were 50 or more warnings (use warnings() to see the first 50)
                  Last edited by vyellapa; 08-01-2012, 10:40 PM.

                  • #24
                    So I found that the plotMA() and plotDE() functions cannot be found for some reason. Using the plot() function solved it.

                    Code:
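                    # x: mean of the log10 raw counts of samples 1 and 2; y: their log10 ratio (sample 2 over sample 1)
                    plot((log10( counts(cds)[,1] ) + log10( counts(cds)[,2] ))/2, log10( counts(cds)[,2] ) - log10( counts(cds)[,1] ))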

                    • #25
                      They cannot be found because you are reading the manual for the development (i.e., pre-release) version but seem to have installed the released version.

                      • #26
                        Hi guys, I'm trying to plotPCA with DESeq, but it says it only has a maximum of 12 colors. Is there a way to get around this? I tried just using arrayQualityMetrics, but that just gave me a PDF with only one color.

                        • #27
                          Also, as a follow-up DESeq question, can DESeq do comparisons between 3 conditions? From the vignette, I can see it can do multi-factor testing with library type and condition. My library types are all the same, but I'd like to do a comparison between my 3 different conditions, instead of just doing res <- (WT, knockout), res <- (WT, control), and res <- (knockout, control).

                          Thoughts?

                          • #28
                            Hi guys,

                            So I'm wondering about DESeq and the old and new methods for accounting for variability. I see that the new method uses the regression line as the default variability for a gene, unless the gene's own variability is above that line, in which case it uses that. I'm thinking about using the per-gene-estimate sharing mode because I have my 5 replicates. Will this method just use the variability of that gene (e.g. IL8), or will it use the variability of all genes at that abundance?
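As a rough illustration of the choice being asked about: my reading of the DESeq documentation is that `sharingMode` accepts "maximum", "fit-only" and "gene-est-only", and that the three differ only in which dispersion value is carried into the test for each gene. The Python helper below is a hypothetical stand-in for that selection rule, not DESeq code, and the numbers are invented.

```python
def dispersion_for_testing(gene_estimate, fitted_value, sharing_mode="maximum"):
    """Pick the dispersion used for one gene, given its own estimate and the fitted trend value."""
    if sharing_mode == "maximum":
        # Conservative default: whichever of the two is larger.
        return max(gene_estimate, fitted_value)
    if sharing_mode == "fit-only":
        # Trend value only, i.e. information shared across genes of similar abundance.
        return fitted_value
    if sharing_mode == "gene-est-only":
        # This gene's own estimate only, with nothing borrowed from other genes.
        return gene_estimate
    raise ValueError("unknown sharing mode: " + sharing_mode)


# Invented values: a gene whose own estimate lies below the fitted line.
own, fitted = 0.02, 0.10
for mode in ("maximum", "fit-only", "gene-est-only"):
    print(mode, dispersion_for_testing(own, fitted, mode))
```

Under that reading, the per-gene mode would use only the variability of the gene itself (e.g. IL8), which is only as reliable as an estimate from five replicates can be.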

                            • #29
                              Dear Bill

                              if you type "plotPCA" in the R prompt you will see that it is a rather simple & short function, and rather than overloading it with lots of options, I'd encourage you to adapt it to your needs, e.g. for colour & symbol choices or other layout options.

                              With arrayQualityMetrics, you need to set the function argument 'intgroups'.

                              Hope this helps - best wishes
                              Wolfgang
                              Wolfgang Huber
                              EMBL

                              • #30
                                Originally posted by billstevens
                                Also, as a follow-up DESeq question, can DESeq do comparisons between 3 conditions? From the vignette, I can see it can do multi-factor testing with library type and condition. My library types are all the same, but I'd like to do a comparison between my 3 different conditions, instead of just doing res <- (WT, knockout), res <- (WT, control), and res <- (knockout, control).
                                Thoughts?
                                Dear Bill,

                                DESeq can fit any sort of linear model (more precisely: generalised linear model), and you could have a look at the documentation of linear modelling in R. To set this up, you will need to ask one or several specific questions, such as: "which genes are unchanged from WT to control but up in KO" etc. - the machinery does not by itself decide that for you.
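To show what "asking a specific question" looks like in a linear model with three conditions, here is a small sketch of the design matrix that treatment coding would produce, with WT as the reference level. It is written in plain Python for illustration; the helper `design_matrix`, the column names and the six-sample layout are assumptions, and in R the same matrix would come from `model.matrix(~ condition)`.

```python
def design_matrix(conditions, reference):
    """Treatment-coded design matrix: an intercept plus one indicator column per non-reference level."""
    levels = [reference] + sorted(set(conditions) - {reference})
    columns = ["intercept"] + [lvl + "_vs_" + reference for lvl in levels[1:]]
    rows = [[1] + [1 if c == lvl else 0 for lvl in levels[1:]] for c in conditions]
    return columns, rows


# Hypothetical layout: two samples per condition.
conditions = ["WT", "WT", "KO", "KO", "control", "control"]
columns, rows = design_matrix(conditions, reference="WT")

print(columns)
for condition, row in zip(conditions, rows):
    print(condition, row)
```

Each non-intercept coefficient is a log fold change relative to WT, so a question such as "unchanged from WT to control but up in KO" translates into the control_vs_WT coefficient being near zero while the KO_vs_WT coefficient is positive. An overall test for "any difference among the three conditions" would instead compare this model against the intercept-only model.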

                                Best wishes
                                Wolfgang
                                Wolfgang Huber
                                EMBL
