SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
bug of wiggles blue mood Bioinformatics 0 08-06-2012 07:23 AM
tophat bug MeixiaZhao Bioinformatics 5 04-26-2012 02:14 PM
A samtools v0.1.11 bug??? nturay Bioinformatics 0 04-28-2011 03:52 AM
TopHat bug telos Bioinformatics 0 03-17-2010 08:50 AM
Tophat Bug? jling Bioinformatics 0 02-11-2010 04:14 PM

Reply
 
Thread Tools
Old 09-28-2012, 01:57 PM   #1
danielfortin86
Junior Member
 
Location: California

Join Date: Aug 2008
Posts: 5
Default EdgeR LogCPM Bug?

When I output the logCPM values for my genes, they are identical across contrasts. However, when I output the logFold values these seem to be correctly calculated for each contrast. Is this a bug or am I not understanding what the logCPM is that is returned by glmLRT?

Code:
library(edgeR)

# Read in the quantification data
data <- read.table('File.txt', header = TRUE, row.names = 1)

keep <- rowSums(cpm(data) > 2) >= min(summary(3)) 
data <- data[keep,]

y <- DGEList(counts = data)
y <- calcNormFactors(y)

rownames(design) <- colnames(y)

y <- estimateGLMCommonDisp(y, design, verbose = TRUE)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)

# Fit the linear model to the design matrix
fit <- glmFit(y, design)

# Make Contrast Matrix
my.contrasts <- makeContrasts(....... # Different contrasts
	levels = design)


# Loop to calculate the contrasts
for (i in 1:dim(my.contrasts)[2]) {
# Create the log ratios
	lrt <- glmLRT(y, fit, contrast = my.contrasts[,i])
	tempLR <- lrt$table[2]
	if(exists("LogCPM")){  LogCPM<  <- cbind(LogCPM, tempLR)   }
	else{ LogCPM  <- tempLR }
}

# rows of LogCPM are identical?!?
danielfortin86 is offline   Reply With Quote
Old 09-29-2012, 07:13 PM   #2
Gordon Smyth
Member
 
Location: Melbourne, Australia

Join Date: Apr 2011
Posts: 91
Default

This is not a bug, rather you are not understanding what logCPM represents.

The help page for glmLRT says that logCPM is "the average log2-counts-per-million". The average is taken over all libraries in your dataset y, and hence is always the same regardless of the contrast being tested.

logPCM is intended as a measure of the overall expression level of the transcript, and is displayed by functions such as plotBCV().

BTW, it is not a simple average. The logCPM is computed using the edgeR function mglmOneGroup(), taking into account the estimated dispersions and the library sizes.

Gordon
Gordon Smyth is offline   Reply With Quote
Old 10-05-2012, 05:56 PM   #3
danielfortin86
Junior Member
 
Location: California

Join Date: Aug 2008
Posts: 5
Default

Thanks! I was interpreting the documentation as meaning that the average was calculated for specific factor level combinations / contrasts. Wouldn't plotting the actual "A" for a specific contrast be more informative? Presumably that can be extracted by multiplying the matrix with the particular coefficients. What's the rationale for plotting the experiment-wide "A" rather than the condition-specific one? My question also applies to dispersion? What about condition-specific dispersion? Any help you could provide would be greatly appreciated!

D
danielfortin86 is offline   Reply With Quote
Old 10-05-2012, 06:40 PM   #4
Gordon Smyth
Member
 
Location: Melbourne, Australia

Join Date: Apr 2011
Posts: 91
Default

Quote:
Originally Posted by danielfortin86 View Post
Thanks! I was interpreting the documentation as meaning that the average was calculated for specific factor level combinations / contrasts.
But there's nothing in the documentation that says so.
Quote:
Wouldn't plotting the actual "A" for a specific contrast be more informative? Presumably that can be extracted by multiplying the matrix with the particular coefficients.
Since this is your suggestion, the onus is on you to suggest a reason why this would be more informative. For my part, I don't know even how you would define "the actual A" to be for a general contrast. For example, suppose you have contrast=c(-0.4,0.5,0,-0.1). How you would you define A and what would it tell you?
Quote:
What's the rationale for plotting the experiment-wide "A" rather than the condition-specific one?
I already told you in my previous reply.
Quote:
My question also applies to dispersion? What about condition-specific dispersion? Any help you could provide would be greatly appreciated!
I wonder whether you might be underestimating the statistical penalties and uncertainties introduced by fitting overly complicated models and estimating huge numbers of parameters, especially dispersion parameters. It is very unlikely that fitting condition-specific dispersions would be worthwhile, let alone a contrast-specific dispersions. But see

https://www.stat.math.ethz.ch/piperm...er/048438.html
Gordon Smyth is offline   Reply With Quote
Old 11-26-2012, 08:51 AM   #5
earonesty
Member
 
Location: United States of America

Join Date: Mar 2011
Posts: 52
Default

Quote:
Originally Posted by Gordon Smyth View Post
But there's nothing in the documentation that says so.

Since this is your suggestion, the onus is on you to suggest a reason why this would be more informative. For my part, I don't know even how you would define "the actual A" to be for a general contrast. For example, suppose you have contrast=c(-0.4,0.5,0,-0.1). How you would you define A and what would it tell you?
It *might* be informative to show a summary of original expression values and variability per group - just so you can see if edgeR's being silly and might need some tweaking ... (like when expression values are zero, or when there's outliers, or when you forgot to use tagwise dispersion, or when you used it... and would like to know what value edgeR used.... etc).
earonesty is offline   Reply With Quote
Old 11-26-2012, 08:03 PM   #6
Gordon Smyth
Member
 
Location: Melbourne, Australia

Join Date: Apr 2011
Posts: 91
Default

Quote:
Originally Posted by earonesty View Post
It *might* be informative to show a summary of original expression values and variability per group - just so you can see if edgeR's being silly and might need some tweaking ... (like when expression values are zero, or when there's outliers, or when you forgot to use tagwise dispersion, or when you used it... and would like to know what value edgeR used.... etc).
Certainly may be useful to look individual expression values for leading genes, and all the edgeR cases studies show how to do this. The original poster was asking for something different however, for the definition of logCPM to be changed.
Gordon Smyth is offline   Reply With Quote
Old 05-20-2013, 10:17 AM   #7
earonesty
Member
 
Location: United States of America

Join Date: Mar 2011
Posts: 52
Default

I meant as extra columns for edgeR, not as a "leading gene thing". Stuff like a normalized mean and variability for each group. I've taken to moderating edgeR's results by removing results where the within-group variability is high, for example. (I think people are interpreting the logCPM value in that context... because they are expecting outputs that summarize group-wise summary information and don't see it.)
earonesty is offline   Reply With Quote
Old 05-22-2013, 01:46 AM   #8
parham
Junior Member
 
Location: Rotterdam, Netherlands

Join Date: May 2013
Posts: 1
Default

when you perform edgeR analysis comparing two groups (mock vs treat), you get a list of genes with probabilities of differential expression with certain logFC. Assuming for this logFC edgeR is calculating a mean expression value for samples within each group,
(1) how can I see what value is used (by edgeR) for each gene of each group?
(2) how is this mean expression value calculated? (is it an average cpm values of samples of each group?)
parham is offline   Reply With Quote
Reply

Tags
edger, glmlrt, logcpm, ma plot

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 07:23 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO