
**Simon Anders** · 05-07-2012, 01:07 PM

1. Actually, libraries with higher size factors are given more weight in the test (i.e., when calculating the p value). This is because we compare the sums of the unnormalized counts with what one should expect according to the size factors. (For details, please see the fine print in our paper.) For the fold change, we simply calculate the ratio of the averages of the normalized counts, which is straightforward. However, you may have a point that it would be more consistent to weight the sum according to size factors or fitted variances.

2. I don't think that this is the cause of the artifacts you see. However, as you are worried about the size factors, you may want to double check that they are good estimates. Make an MA plot, i.e. a log-log plot of means versus ratios of the normalized counts between all pairs of samples and check whether the bulk of the genes is centered around zero log fold change.

3. Controlling the false discovery rate at 0.01 sounds extremely stringent to me. Remember that controlling FDR at x% means that your hit list can be expected to have at most x% false positives. It is common to cut adjusted p values at 5% or 10%, because that is a reasonable FDR that one can usually live with.
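The fold-change part of point 1 can be illustrated with a minimal base-R sketch. It assumes median-of-ratios size factors (the estimator described in the DESeq paper) and computes the fold change as the ratio of the per-condition averages of the normalized counts; it does not reproduce DESeq's negative-binomial test. The toy count matrix and the function name `median_ratio_size_factors` are hypothetical.

```r
# Size factors by the median-of-ratios method: for each sample, the median
# over genes of (count / geometric mean of that gene across all samples).
median_ratio_size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  usable <- is.finite(log_geo_means)  # drop genes with a zero in any sample
  apply(counts, 2, function(x) exp(median((log(x) - log_geo_means)[usable])))
}

# Hypothetical toy data: samples A2 and B2 were sequenced twice as deeply,
# and gene4 is truly up 2-fold in condition B.
counts <- matrix(c(10,  20,  10,  20,
                   20,  40,  20,  40,
                   40,  80,  40,  80,
                   80, 160, 160, 320),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("A1", "A2", "B1", "B2")))
conds <- factor(c("A", "A", "B", "B"))

sf <- median_ratio_size_factors(counts)
norm_counts <- sweep(counts, 2, sf, "/")  # divide each column by its size factor

# Fold change: ratio of the per-condition averages of the normalized counts
fold_change <- rowMeans(norm_counts[, conds == "B"]) /
  rowMeans(norm_counts[, conds == "A"])
print(round(fold_change, 3))
```

Because the differentially expressed gene4 is only one of four genes, the median ignores it and the size factors still reflect sequencing depth alone (A2 and B2 come out at twice A1 and B1).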

**Artur Jaroszewicz** · 05-07-2012, 02:12 PM

Excellent. Thank you so much for your input. I will calculate my own expression values and fold changes from now on. One of my earlier posts in this thread was an MA plot; you can see it above. I think it looks pretty good and centered around zero.

As far as the adjusted p-values go, I will start using higher values, and filtering with both adjusted and non-adjusted p-values. Sadly, even with a threshold of 10% for padj, I still get only 50 genes. This is, I'm sure, attributable to the variation in the samples.

Thank you again for your help!

Artur

**Simon Anders** · 05-07-2012, 10:37 PM

Originally posted by Artur Jaroszewicz:

> Excellent. Thank you so much for your input. I will calculate my own expression values and fold changes from now on. One of my earlier posts in this thread was an MA plot; you can see it above. I think it looks pretty good and centered around zero.

Yes, but it is an MA plot of averages across replicates. I suggested you do the same but only comparing two individual samples (both of the same and of different conditions) at a time. You
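A pairwise MA check of the kind described here can be sketched in base R. `pairwise_ma` is a hypothetical helper, not a DESeq function; it assumes you already have a matrix of size-factor-normalized counts (in DESeq, `counts(cds, normalized=TRUE)`) and returns the median log2 ratio, which should sit close to zero when the size factors are good estimates.

```r
# M = log2 ratio, A = mean log2 expression, for two columns of a
# normalized count matrix. Genes with a zero in either sample are dropped,
# since their log ratio is undefined.
pairwise_ma <- function(norm_counts, i, j, make_plot = TRUE) {
  x <- norm_counts[, i]
  y <- norm_counts[, j]
  keep <- x > 0 & y > 0
  A <- (log2(x[keep]) + log2(y[keep])) / 2
  M <- log2(y[keep]) - log2(x[keep])
  if (make_plot) {
    plot(A, M, pch = 16, cex = 0.4,
         xlab = "A (mean log2 normalized count)", ylab = "M (log2 ratio)",
         main = paste(colnames(norm_counts)[j], "vs", colnames(norm_counts)[i]))
    abline(h = 0, col = "red")
  }
  invisible(median(M))  # should be near 0 if the normalization is good
}

# Hypothetical usage over all sample pairs, e.g. with
# nc <- counts(cds, normalized = TRUE):
# for (p in combn(ncol(nc), 2, simplify = FALSE)) print(pairwise_ma(nc, p[1], p[2]))
```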

> As far as the adjusted p-values go, I will start using higher values, and filtering with both adjusted and non-adjusted p-values.

You should never look at non-adjusted p-values for filtering. Choose an FDR and cut the adjusted p-values there.
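The "choose an FDR, then cut the adjusted p values" workflow can be sketched with base R's `p.adjust()` and the Benjamini-Hochberg method (DESeq's `padj` column is, as far as I know, BH-adjusted as well). The four p values are made up for illustration.

```r
# Hypothetical raw p values for four genes
pval <- c(geneA = 0.01, geneB = 0.04, geneC = 0.03, geneD = 0.5)

# Benjamini-Hochberg adjustment (names are preserved)
padj <- p.adjust(pval, method = "BH")
print(round(padj, 4))

fdr <- 0.05
hits <- names(padj)[padj < fdr]   # cut on the adjusted values only
print(hits)

# For comparison: filtering on raw p values keeps three genes,
# with no control over the false discovery rate.
print(names(pval)[pval < fdr])
```

At an FDR of 5% only geneA survives, whereas the raw-p filter would also keep geneB and geneC.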

> Sadly, even with a threshold of 10% for padj, I still get only 50 genes. This is, I'm sure, attributable to the variation in the samples.

Well, you cannot always have great data.

Two more things to check. First: try using only six of your nine samples. Maybe only one of the three FACS-sorted cell types has high variance, and you can still get good results for a comparison of the other two.
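Restricting the analysis to two of the three cell types only takes column subsetting. A minimal base-R sketch, assuming columns named after the CD90.1, UtESC-.1, UtESC+.1, ... scheme from this thread; the counts are random placeholders, the within-group CV check is a rough heuristic of my own (not a DESeq feature), and the DESeq call is shown only as a comment.

```r
set.seed(1)
samples <- c(paste0("CD90.", 1:3), paste0("UtESC-.", 1:3), paste0("UtESC+.", 1:3))
count_table <- matrix(rpois(5 * 9, lambda = 50), nrow = 5,   # placeholder counts
                      dimnames = list(paste0("gene", 1:5), samples))
conds <- factor(sub("\\.[0-9]+$", "", samples))   # cell type from the sample name

# Rough check of which group is noisiest: median within-group coefficient
# of variation of the raw counts (size factors ignored for brevity)
print(sapply(levels(conds), function(g) {
  x <- count_table[, conds == g]
  median(apply(x, 1, sd) / rowMeans(x), na.rm = TRUE)
}))

# Keep two of the three groups, e.g. leave out CD90
keep <- conds != "CD90"
counts_sub <- count_table[, keep]
conds_sub <- droplevels(conds[keep])   # drop the now-empty CD90 level
# With DESeq one would then start again from
# cds <- newCountDataSet(counts_sub, conds_sub)
```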

Second: Your data is not grouped, is it? If, say, samples CD90.1, UtESC-.1, and UtESC+.1 are from one mouse, CD90.2, UtESC-.2 and UtESC+.2 from a second mouse etc., a GLM will give you a lot of extra power.
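If the samples are grouped by mouse as in the example above, the grouping enters the GLM as an extra factor. The sketch below only builds and inspects the design with base R's `model.matrix()`, assuming the replicate number in each sample name identifies the mouse; the DESeq 1.x calls in the comments (`fitNbinomGLMs()`, `nbinomGLMTest()`) are for orientation and should be checked against the vignette of the installed version.

```r
samples <- c(paste0("CD90.", 1:3), paste0("UtESC-.", 1:3), paste0("UtESC+.", 1:3))
design <- data.frame(
  celltype  = factor(sub("\\.[0-9]+$", "", samples)),
  mouse     = factor(sub("^.*\\.", "", samples)),   # assumed: replicate number = mouse
  row.names = samples
)

# Intercept + 2 mouse effects + 2 cell-type effects = 5 columns
mm <- model.matrix(~ mouse + celltype, data = design)
print(mm)

# With DESeq 1.x one would compare a full and a reduced model, roughly:
# fit1  <- fitNbinomGLMs(cds, count ~ mouse + celltype)
# fit0  <- fitNbinomGLMs(cds, count ~ mouse)
# pvals <- nbinomGLMTest(fit1, fit0)
```

Including the mouse term lets the model absorb mouse-to-mouse differences instead of counting them as noise, which is where the extra power comes from.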

**Him26** · 06-29-2012, 04:35 PM

Dispersion plot

Hi,

I am a complete newbie with regard to statistics.
I've been following the Seqanswers forum and published papers on how to go about analyzing the data.
I have two normal vs. two drug-treated samples, which we prepped for RNA-seq. After running the data through TopHat and HTSeq to get the read counts, I ran DESeq, but I don't see a difference between the two conditions in the DESeq results. When I look at the dispersion plot, I get very weird plots.
I can tell something is wrong, but I don't understand what this plot means well enough to find where the problem is coming from. Can anyone help?
Thank you.

Him26


**Artur Jaroszewicz** · 06-30-2012, 01:37 PM

Hi,
Welcome. I am also somewhat of a newbie (I've been in bioinformatics since last September), but maybe I can help a bit. How are you estimating your dispersions? Do you have replicates? Your plots do look a little weird. Are these log plots?
Artur

**Him26** · 06-30-2012, 10:34 PM

response

Thank you, Artur,

I have five bio reps. I don't think the plot uses a log scale.
I used the following call to estimate the dispersion function:

Code:

cds <- estimateDispersions( cds, method="per-condition", sharingMode="maximum", fitType="local")

The parametric fit fails, so I used a local fit with the per-condition method and maximum sharing mode. I tried other methods, but they don't seem to change the plot much.

Following is the function I used to plot the graph.
estimateDispersions( cds, method="per-condition",sharingMode="maximum", fitType="local")

Thank you for your response.

**Simon Anders** · 07-09-2012, 01:21 PM

Originally posted by Him26:

> I have two normal vs. two drug-treated samples, which we prepped for RNA-seq. After running the data through TopHat and HTSeq to get the read counts, I ran DESeq, but I don't see a difference between the two conditions in the DESeq results. When I look at the dispersion plot, I get very weird plots.

The plots' shapes look fine but the dispersion value is the problem. Most genes have a dispersion in the range of maybe .8 or so (hard to see but check the y value for the red line), which means a typical variation of sqrt(.8)=89%, i.e. your genes typically differ by a factor of nearly 2 already between replicates. With so much noise, you can only detect really extreme changes, and I guess you do not have any.
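The sqrt(.8) = 89% arithmetic follows from the negative-binomial variance, Var = mu + alpha * mu^2 (mu the mean, alpha the dispersion). Dividing by mu^2 gives a squared coefficient of variation of 1/mu + alpha, so for well-expressed genes the CV approaches sqrt(alpha). A small base-R sketch; `nb_cv` is a hypothetical helper, and the simulation uses `rnbinom()` with `size = 1/alpha`, which has exactly that variance.

```r
# Coefficient of variation implied by a negative binomial with mean mu
# and dispersion alpha: Var = mu + alpha * mu^2, so CV = sqrt(1/mu + alpha).
nb_cv <- function(mu, alpha) sqrt(1 / mu + alpha)

alpha <- 0.8
print(sqrt(alpha))                      # limiting CV, about 0.89
print(nb_cv(c(10, 100, 1000), alpha))   # the Poisson part fades as mu grows

# Empirical check by simulation (size = 1/alpha gives Var = mu + alpha * mu^2)
set.seed(1)
x <- rnbinom(1e5, mu = 1000, size = 1 / alpha)
print(sd(x) / mean(x))
```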

**vyellapa** · 08-01-2012, 10:37 PM

I'm trying to make an MvA (MA) plot using DESeq but am getting an error that I cannot make sense of. Did anyone else come across this error? In case the R version is an issue, I'm using R 2.14.

Code:

> head(res2)
               id    baseMean   baseMeanA   baseMeanB foldChange log2FoldChange
1 ENSG00000000419 4706.072643 4981.666042 4430.479244  0.8893569     -0.1691655
2 ENSG00000000457 1668.622986 1840.590047 1496.655925  0.8131392     -0.2984257
3 ENSG00000000460 3029.854113 3053.176700 3006.531525  0.9847224     -0.0222110
4 ENSG00000000938    2.475421    2.118055    2.832787  1.3374473      0.4194820
5 ENSG00000000971   68.213557   82.604156   53.822958  0.6515769     -0.6179927
6 ENSG00000001036 1535.208776 1635.138684 1435.278869  0.8777719     -0.1880819
       pval padj
1 0.6054591    1
2 0.3865157    1
3 0.9477727    1
4 1.0000000    1
5 0.4141867    1
6 0.5881573    1
> plotMA(res2)
Error in MA[, array] - x : non-numeric argument to binary operator
In addition: There were 50 or more warnings (use warnings() to see the first 50)

**vyellapa** · 08-02-2012, 12:50 PM

So I found that the plotMA() and plotDE() functions could not be found for some reason. Using the plot() function solved it:

Code:

nc <- counts(cds, normalized=TRUE)  # size-factor-normalized counts
plot((log10(nc[,1]) + log10(nc[,2]))/2, log10(nc[,2]) - log10(nc[,1]))  # A (x axis) vs M (y axis)

**Simon Anders** · 08-06-2012, 05:42 AM

They cannot be found because you are reading the manual for the development (i.e., pre-release) version but seem to have installed the released version.
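With only the released version installed, an MA plot can still be drawn straight from the `nbinomTest()` result table, whose `baseMean`, `log2FoldChange` and `padj` columns appear in the `head(res2)` output above. `plot_ma_df` is a hypothetical base-R helper, not the DESeq function; the example data are the six rows from that output.

```r
# Log2 fold change against mean normalized count, with genes below the FDR
# threshold drawn in red. Returns the highlighted rows invisibly.
plot_ma_df <- function(res, fdr = 0.1) {
  ok <- is.finite(res$log2FoldChange) & res$baseMean > 0
  res <- res[ok, ]
  sig <- !is.na(res$padj) & res$padj < fdr
  plot(res$baseMean, res$log2FoldChange, log = "x", pch = 16, cex = 0.5,
       col = ifelse(sig, "red", "grey40"),
       xlab = "mean of normalized counts", ylab = "log2 fold change")
  abline(h = 0, lty = 2)
  invisible(res[sig, ])
}

# The six rows from head(res2) above
res2_head <- data.frame(
  id = c("ENSG00000000419", "ENSG00000000457", "ENSG00000000460",
         "ENSG00000000938", "ENSG00000000971", "ENSG00000001036"),
  baseMean = c(4706.072643, 1668.622986, 3029.854113, 2.475421, 68.213557, 1535.208776),
  log2FoldChange = c(-0.1691655, -0.2984257, -0.0222110, 0.4194820, -0.6179927, -0.1880819),
  padj = rep(1, 6)
)
sig_genes <- plot_ma_df(res2_head)
nrow(sig_genes)   # 0: all six padj values are 1
```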

**billstevens** · 11-01-2012, 02:10 PM

Hi guys, I'm trying to plotPCA with DESeq, but it says it only has a maximum of 12 colors. Is there a way to get around this? I tried just using arrayQualityMetrics, but that just gave me a PDF with only one color.

**billstevens** · 11-01-2012, 05:58 PM

Also, as a follow-up DESeq question: can DESeq do comparisons between 3 conditions? From the vignette, I can see it can do multi-factor testing with library type and condition. My library types are all the same, but I'd like to do a comparison between my 3 different conditions, instead of just doing res <- nbinomTest(cds, "WT", "knockout"), res <- nbinomTest(cds, "WT", "control"), and res <- nbinomTest(cds, "knockout", "control").

Thoughts?

**billstevens** · 11-05-2012, 02:33 PM

Hi guys,

So I'm wondering about DESeq and the old and new methods for accounting for variability. I see that the new method uses the regression line as the default variability for a gene, unless the gene's own variability is above that line, in which case it uses that. I'm thinking about using the per-gene estimate only (sharingMode="gene-est-only"), because I have 5 replicates. Will this method use just the variability of that gene (e.g. IL8), or the variability of all genes at that abundance?
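The rule behind each option can be written out per gene. As documented for DESeq's `estimateDispersions()`, `sharingMode="maximum"` takes the larger of the per-gene estimate and the fitted value, `"fit-only"` uses the fitted value alone, and `"gene-est-only"` uses only the gene's own estimate, with no information borrowed from genes of similar abundance. The sketch mimics that rule with made-up numbers; `choose_dispersion` is hypothetical and not part of DESeq.

```r
# Final dispersion per gene under each sharing mode, given the gene's own
# (empirical) estimate and the fitted dispersion-mean trend at its mean.
choose_dispersion <- function(gene_est, fitted,
                              mode = c("maximum", "fit-only", "gene-est-only")) {
  mode <- match.arg(mode)
  switch(mode,
         "maximum"       = pmax(gene_est, fitted),  # never below the trend
         "fit-only"      = fitted,                  # trend only
         "gene-est-only" = gene_est)                # the gene's own replicates only
}

gene_est <- c(0.02, 0.30, 0.05)   # hypothetical per-gene estimates
fitted   <- c(0.10, 0.10, 0.10)   # hypothetical trend values at these means
for (m in c("maximum", "fit-only", "gene-est-only"))
  cat(m, ":", choose_dispersion(gene_est, fitted, m), "\n")
```

With "gene-est-only" the answer to the question is the first option: only that gene's replicates count, which is why it needs a reasonable number of replicates to be stable.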

**Wolfgang Huber** · 11-13-2012, 02:35 AM

Dear Bill

if you type "plotPCA" in the R prompt you will see that it is a rather simple & short function, and rather than overloading it with lots of options, I'd encourage you to adapt it to your needs, e.g. for colour & symbol choices or other layout options.
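Along those lines, a stripped-down PCA plot is easy to write yourself. `plot_pca_many` is a hypothetical base-R version, not DESeq's `plotPCA`; it assumes a matrix of variance-stabilized or log-transformed values with samples in columns. Colours come from `rainbow()`, which has no fixed limit, though many similar hues get hard to tell apart, so the symbol (`pch`) varies as well. The data are random placeholders.

```r
plot_pca_many <- function(mat, groups, ntop = 500) {
  groups <- factor(groups)
  # use the most variable rows only
  rv <- apply(mat, 1, var)
  top <- order(rv, decreasing = TRUE)[seq_len(min(ntop, nrow(mat)))]
  pca <- prcomp(t(mat[top, ]))            # samples as observations
  k <- nlevels(groups)
  cols <- rainbow(k)                      # any number of colours
  pchs <- rep_len(c(16, 17, 15, 18), k)   # vary symbols too
  plot(pca$x[, 1], pca$x[, 2], col = cols[groups], pch = pchs[groups],
       xlab = "PC1", ylab = "PC2")
  legend("topright", legend = levels(groups), col = cols, pch = pchs, cex = 0.7)
  invisible(pca)
}

set.seed(42)
mat <- matrix(rnorm(1000 * 30), nrow = 1000)   # placeholder: 1000 genes x 30 samples
groups <- paste0("cond", rep(1:15, each = 2))   # 15 groups, more than 12 colours
pca <- plot_pca_many(mat, groups)
```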

With arrayQualityMetrics, you need to set the function argument 'intgroup'.

Hope this helps - best wishes
Wolfgang

**Wolfgang Huber** · 11-13-2012, 02:45 AM

Originally posted by billstevens:

> Also, as a follow-up DESeq question: can DESeq do comparisons between 3 conditions? From the vignette, I can see it can do multi-factor testing with library type and condition. My library types are all the same, but I'd like to do a comparison between my 3 different conditions, instead of just doing res <- nbinomTest(cds, "WT", "knockout"), res <- nbinomTest(cds, "WT", "control"), and res <- nbinomTest(cds, "knockout", "control").
> Thoughts?

Dear Bill,

DESeq can fit any sort of linear model (more precisely: generalised linear model), and you could have a look at the documentation of linear modelling in R. To set this up, you will need to ask one or several specific questions, such as: "which genes are unchanged from WT to control but up in KO" etc. - the machinery does not by itself decide that for you.
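To make "ask a specific question" concrete: with a three-level condition factor, base R's `model.matrix()` shows which comparison each coefficient represents, and `relevel()` picks the reference level. A minimal sketch with hypothetical sample labels; the DESeq 1.x calls in the comments (a full vs. reduced model, i.e. a test for any difference among the three conditions) should be checked against the vignette of the installed version.

```r
condition <- factor(rep(c("WT", "knockout", "control"), each = 2))
condition <- relevel(condition, ref = "WT")   # WT as the baseline

mm <- model.matrix(~ condition)
print(colnames(mm))
# Besides the intercept (WT level), each column is the log-scale
# difference of one condition from WT.

# The three pairwise questions, if that is what you want:
print(combn(levels(condition), 2, simplify = FALSE))

# With DESeq 1.x, a test for "any difference among the three conditions":
# fit1  <- fitNbinomGLMs(cds, count ~ condition)
# fit0  <- fitNbinomGLMs(cds, count ~ 1)
# pvals <- nbinomGLMTest(fit1, fit0)
```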

Best wishes
Wolfgang
