
**dpryan** · 04-25-2014, 03:12 AM

You'd have to show us the code you used for us to be able to follow along. The normalized counts aren't actually used in calculations (in fact, they're not even stored!) but the normalization factor is, which is presumably different between your usage and DESeq2's (I would suspect that this is leading to the sign reversal). Further, you're just looking at one gene with no information sharing, which can make a big difference in terms of significance and reliability (the dispersion shrinkage is a really good idea when you have a limited number of samples). Then there's the multiple testing difference, though perhaps you're comparing raw p-values.

**TheSeqGeek** · 04-25-2014, 07:15 AM

Originally posted by dpryan:

You'd have to show us the code you used for us to be able to follow along. The normalized counts aren't actually used in calculations (in fact, they're not even stored!) but the normalization factor is, which is presumably different between your usage and DESeq2's (I would suspect that this is leading to the sign reversal). Further, you're just looking at one gene with no information sharing, which can make a big difference in terms of significance and reliability (the dispersion shrinkage is a really good idea when you have a limited number of samples). Then there's the multiple testing difference, though perhaps you're comparing raw p-values.

If I compute a log fold change from the normalized values, I get roughly the same log fold change, with the same sign, as what DESeq2 reports in its final results. Those normalized counts must be used for the final results; otherwise I would get a sign reversal.

In the code below I am comparing just the exponential samples.

As far as I understand, DESeq2 computes a "sizeFactor" for each sample and then basically weights each sample by that factor. In my opinion it's almost no different from weighting by the total library size. That's where the reversal comes from.

Code:


library(DESeq2)

# Import the data: HTSeq count files whose names contain "E" (the exponential samples)
directory <- "/Users/Nme/Documents/R/Files//New-Counts"
sampleFiles <- grep("E", list.files(directory), value=TRUE)

# Give names to experimental conditions (order must match sampleFiles)
sampleCondition <- c("Mutant","Mutant","Mutant","WT","WT","WT")
sampleTable <- data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)
sampleTable

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
# Make WT the reference level, so fold changes are Mutant relative to WT
colData(ddsHTSeq)$condition <- factor(colData(ddsHTSeq)$condition, levels=c("WT","Mutant"))

# Run the analysis and sort the results by adjusted p-value
dds <- DESeq(ddsHTSeq)
res <- results(dds)
res <- res[order(res$padj),]
head(res)
resultsNames(dds)
sizeFactors(dds)
# Raw counts divided by each sample's size factor
normalizedCounts <- counts(dds, normalized=TRUE)

write.csv(normalizedCounts, "Exponential_normalized_counts.csv")
write.csv(as.data.frame(res), "Exponential_results.csv")
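
For anyone following along, here is a rough base-R sketch of the median-of-ratios idea behind DESeq-style size factors, next to a plain library-size factor. The count matrix is made up and the helper name `median_of_ratios` is hypothetical; this illustrates the idea and is not DESeq2's own code.

```r
# Toy count matrix: rows are genes, columns are samples (made-up numbers).
counts <- matrix(
  c(100, 200, 150,
     50, 110,  70,
    300, 590, 460,
     10,  22,  14),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3"))
)

# Median-of-ratios size factors (hypothetical helper, base R only).
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  # Per-gene log geometric mean across samples, used as a reference sample.
  log_geo_mean <- rowMeans(log_counts)
  # Genes with a zero count give -Inf here and are skipped.
  usable <- is.finite(log_geo_mean)
  apply(log_counts[usable, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_mean[usable]))
  })
}

size_factors <- median_of_ratios(counts)

# Library-size factors, scaled to a geometric mean of 1 for comparison.
lib_size <- colSums(counts)
lib_factors <- lib_size / exp(mean(log(lib_size)))

# Same column-wise division as in the post above.
normalized <- t(t(counts) / size_factors)
print(round(rbind(size_factors, lib_factors), 3))
```

On well-behaved data like this toy matrix the two sets of factors come out close to each other, which is consistent with the observation above; they can diverge when a few genes dominate the library.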

**Wallysb01** · 04-25-2014, 08:00 AM

I think if you switch to this you’ll get what you want. In the conditions table, the condition listed first is the baseline relative to which the fold changes are calculated.

Code:

sampleCondition<-c("WT","WT","WT","Mutant","Mutant","Mutant")
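
A side note on how the baseline is picked in R, as a minimal base-R sketch: DESeq2 treats the first level of the condition factor as the reference, and `factor()` orders levels alphabetically by default, regardless of the order in which the labels are typed. The variable names below are made up for illustration.

```r
# The order of the entries only says which sample carries which label.
cond_a <- factor(c("Mutant", "Mutant", "Mutant", "WT", "WT", "WT"))
cond_b <- factor(c("WT", "WT", "WT", "Mutant", "Mutant", "Mutant"))

# Default levels are sorted alphabetically, so both start with "Mutant".
levels(cond_a)
levels(cond_b)

# Setting the reference explicitly changes the baseline
# without touching the per-sample labels.
cond_ref <- relevel(cond_a, ref = "WT")
levels(cond_ref)
as.character(cond_ref)
```

So reordering the vector relabels the samples rather than changing the baseline; `relevel()` or an explicit `levels=` argument is the usual way to choose the reference.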

**TheSeqGeek** · 04-25-2014, 08:53 AM

Originally posted by Wallysb01:

I think if you switch to this you’ll get what you want. In the conditions table, the condition listed first is the baseline relative to which the fold changes are calculated.

Code:

sampleCondition<-c("WT","WT","WT","Mutant","Mutant","Mutant")

It doesn't change the fact that Gene1 had more counts for the mutant than the WT and Gene2 had more counts for the mutant than the WT, but after normalization Gene2 was flipped and Gene1 was not.

It has all to do with the size factor...

If I weight my mutants by 1.3, 1.2, 1.1 and my WTs by 0.9, 0.95, 0.87, then depending on the counts some genes will be flipped...

If you look at the sample table, it is correct... so changing WT relative to Mutant in sampleCondition won't change any of this...
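
The flip described here can be reproduced with a small worked example. The size factors are the ones quoted in the post; the counts for Gene1 and Gene2 are invented, and the fold change is a plain ratio of group means rather than DESeq2's model-based estimate.

```r
# Size factors from the post: three mutants, then three WTs.
size_factors <- c(m1 = 1.3, m2 = 1.2, m3 = 1.1, w1 = 0.9, w2 = 0.95, w3 = 0.87)
group <- rep(c("Mutant", "WT"), each = 3)

# Invented raw counts: both genes have higher raw means in the mutant.
counts <- rbind(
  Gene1 = c(300, 280, 260, 100, 110, 95),
  Gene2 = c(110, 105, 100,  95, 100, 90)
)
colnames(counts) <- names(size_factors)

# Simple log2 ratio of group means (Mutant over WT).
log2_fold_change <- function(mat) {
  mutant_mean <- rowMeans(mat[, group == "Mutant", drop = FALSE])
  wt_mean <- rowMeans(mat[, group == "WT", drop = FALSE])
  log2(mutant_mean / wt_mean)
}

raw_lfc <- log2_fold_change(counts)
norm_lfc <- log2_fold_change(t(t(counts) / size_factors))
print(round(cbind(raw_lfc, norm_lfc), 2))
```

Gene1's difference is large enough to survive the rescaling, while Gene2's small raw excess in the mutant is smaller than the difference in sequencing depth, so its sign changes once the counts are put on a common scale.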

**TheSeqGeek** · 04-25-2014, 01:15 PM

Originally posted by dpryan:

You'd have to show us the code you used for us to be able to follow along. The normalized counts aren't actually used in calculations (in fact, they're not even stored!) but the normalization factor is, which is presumably different between your usage and DESeq2's (I would suspect that this is leading to the sign reversal). Further, you're just looking at one gene with no information sharing, which can make a big difference in terms of significance and reliability (the dispersion shrinkage is a really good idea when you have a limited number of samples). Then there's the multiple testing difference, though perhaps you're comparing raw p-values.

I used a Bonferroni adjustment, which as I understand it is the most stringent one.

**dpryan** · 04-25-2014, 01:19 PM

Well, I guess I should have said that it uses them for some things but not everything (after all, the negative binomial distribution itself needs integers). The normalized values will give you a decent guesstimate of the fold-change you'll get, or at least its sign. However, they'll never give you the exact value, particularly since the fold-change is shrunken (that's what the "moderated estimation of fold-change" part of the title of the DESeq2 paper refers to). How similar the size factor used by DESeq2 is to the one produced by just normalizing by library size will be experiment dependent, though plain library-size normalization should be avoided for the reasons laid out in the original DESeq and edgeR papers.

Of course the size factors may cause some fold-changes to change sign. Not using a size factor (or using an incorrect one) will produce meaningless results.

**dpryan** · 04-25-2014, 01:21 PM

Bonferroni is better referred to as the most conservative and really shouldn't be used anymore since you'd just be throwing away meaningful results with it. The "BH" method has supplanted it for good reasons.
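
To make the difference concrete, here is a short sketch using base R's `p.adjust()` on ten made-up p-values. Bonferroni controls the family-wise error rate, while "BH" (Benjamini-Hochberg) controls the false discovery rate and is the default adjustment behind DESeq2's `padj` column.

```r
# Made-up raw p-values for ten tests.
p_raw <- c(0.0001, 0.0008, 0.004, 0.009, 0.012, 0.04, 0.2, 0.45, 0.7, 0.95)

p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_bh <- p.adjust(p_raw, method = "BH")

print(data.frame(p_raw, p_bonf, p_bh))

# Number of tests called significant at 0.05 under each method.
c(bonferroni = sum(p_bonf < 0.05), BH = sum(p_bh < 0.05))
```

The BH-adjusted value is never larger than the Bonferroni one for the same test, which is why fewer real effects get thrown away.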

**TheSeqGeek** · 04-25-2014, 01:26 PM

Originally posted by dpryan:

Bonferroni is better referred to as the most conservative and really shouldn't be used anymore since you'd just be throwing away meaningful results with it. The "BH" method has supplanted it for good reasons.

Sure, I can get the ranks of the p-values by running a loop over all the genes and adjusting each p-value by the number of tests (genes) divided by its rank.

I guess maybe now that I know how it works I'm not quite sure what the real benefits are of DESeq2 compared to doing this on your own by applying a more sophisticated normalization technique and extracting differentially expressed genes.
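
The rank-based adjustment can be written without an explicit loop. The sketch below uses made-up p-values and checks the hand-rolled result against `p.adjust()`. Note that the multiplier is the number of tests (genes) divided by the rank, and that a running minimum is needed so the adjusted values stay in the same order as the raw ones.

```r
# Made-up raw p-values, deliberately unsorted.
p_raw <- c(0.03, 0.0001, 0.45, 0.009, 0.0008, 0.95, 0.012, 0.2, 0.004, 0.7)
n_tests <- length(p_raw)

# Sort from largest to smallest, remembering the original positions.
ord <- order(p_raw, decreasing = TRUE)
ranks <- n_tests:1                      # the largest p-value has rank n_tests
scaled <- p_raw[ord] * n_tests / ranks  # p * (number of tests) / rank

# Running minimum keeps the adjusted values monotone; cap at 1.
adjusted_sorted <- pmin(1, cummin(scaled))

# Put the adjusted values back in the original order.
p_bh_manual <- numeric(n_tests)
p_bh_manual[ord] <- adjusted_sorted

all.equal(p_bh_manual, p.adjust(p_raw, method = "BH"))
```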

**dpryan** · 04-25-2014, 01:41 PM

It depends a bit on what you have in mind in terms of methodological difference. DESeq2 can accept any size factors you give it (including different factors for each gene), so if that's your main complaint then just don't use that particular function. Other than that the methods are generally quite good, so I'm curious what you find lacking (after all, why reinvent the wheel if the one you have is doing what you want).
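
As a sketch of what supplying your own factors might look like: the example below uses DESeq2's simulated data set and assigns hypothetical library-size-based factors through the `sizeFactors<-` setter instead of estimating them. The variable names and the choice of factors are purely illustrative.

```r
library(DESeq2)

# Simulated data set from DESeq2, used only for illustration.
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Hypothetical user-chosen per-sample factors:
# library size scaled to a geometric mean of 1.
lib_size <- colSums(counts(dds))
my_factors <- lib_size / exp(mean(log(lib_size)))

# Assign them directly instead of calling estimateSizeFactors().
sizeFactors(dds) <- my_factors

# Normalized counts now use the supplied factors.
norm_counts <- counts(dds, normalized = TRUE)

# For gene-specific factors, a genes x samples matrix can be assigned
# with normalizationFactors(dds) <- ... instead (not run here).
```

When `DESeq()` is run afterwards it should pick up the factors already stored in the object rather than estimating new ones.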

**TheSeqGeek** · 04-25-2014, 01:46 PM

Originally posted by dpryan:

It depends a bit on what you have in mind in terms of methodological difference. DESeq2 can accept any size factors you give it (including different factors for each gene), so if that's your main complaint then just don't use that particular function. Other than that the methods are generally quite good, so I'm curious what you find lacking (after all, why reinvent the wheel if the one you have is doing what you want).

Not reinvent... adjust... or mold to my expectations as they grow...

I guess I don't find anything special in normalizing data with fudge factors... I can do that upfront with my library size as one of the factors... I compared the data normalized my way to DESeq2's, and there's hardly a difference... I guess I see it as "I know more variables than a statistical package".

Anyway, it's a good program for some applications; I've just figured out its insides and am moving on.

**dpryan** · 04-25-2014, 01:53 PM

Library size normalization is known to not be robust to things like differences in rRNA depletion between samples, which is a common occurrence, so I'd advise you to not just use that as a factor. It is, of course, good to not just blindly accept the values produced by DESeq2, it's certainly not a magic wand (I've certainly had to tweak things on occasion).
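
A toy illustration of the rRNA point, with invented numbers: two samples have identical counts for every ordinary gene, but one was poorly depleted and carries far more rRNA reads. Library-size factors then differ by a large margin, while median-of-ratios factors (written out in base R here, not taken from DESeq2) stay at 1.

```r
# Invented counts: same mRNA signal, very different rRNA content.
counts <- cbind(
  depleted   = c(rRNA = 1000,  geneA = 200, geneB = 400, geneC = 100, geneD = 800),
  undepleted = c(rRNA = 50000, geneA = 200, geneB = 400, geneC = 100, geneD = 800)
)

# Library-size factors, scaled to a geometric mean of 1.
lib_size <- colSums(counts)
lib_factors <- lib_size / exp(mean(log(lib_size)))

# Median-of-ratios factors: median log ratio of each sample
# to the per-gene geometric mean.
log_counts <- log(counts)
log_geo_mean <- rowMeans(log_counts)
mor_factors <- exp(apply(log_counts - log_geo_mean, 2, median))

round(rbind(lib_factors, mor_factors), 3)
```

Dividing by the library-size factors would make every ordinary gene look strongly differential between the two samples, even though nothing but the rRNA content changed.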

**TheSeqGeek** · 04-25-2014, 01:55 PM

Originally posted by dpryan:

Library size normalization is known to not be robust to things like differences in rRNA depletion between samples, which is a common occurrence, so I'd advise you to not just use that as a factor. It is, of course, good to not just blindly accept the values produced by DESeq2, it's certainly not a magic wand (I've certainly had to tweak things on occasion).

Completely agree. I am trying to develop something on my own with information that I ALREADY KNOW about certain genes, but anyway, I'm not sure I have the patience for it.

What do you think about conditional quantile normalization by Irizarry?

**dpryan** · 04-25-2014, 02:01 PM

I've needed to use cqn once, when there seemed to be a weird difference introduced in the library prep (it never reoccurred, so who knows why). For most of my datasets it doesn't really change anything since I don't normally have much in the way of the biases that it's normally intended to control for. For the one case when I needed it it seemed to perform nicely (I think I was using it with limma, which is also a convenient package).

**TheSeqGeek** · 04-25-2014, 02:02 PM

Originally posted by dpryan:

I've needed to use cqn once, when there seemed to be a weird difference introduced in the library prep (it never reoccurred, so who knows why). For most of my datasets it doesn't really change anything since I don't normally have much in the way of the biases that it's normally intended to control for. For the one case when I needed it it seemed to perform nicely (I think I was using it with limma, which is also a convenient package).

Neat, thanks!

Have a nice weekend

Flipped Results
