SEQanswers

Old 02-08-2011, 10:24 PM   #1
ECHo
Member
 
Location: Taiwan

Join Date: Jan 2010
Posts: 17
Default edgeR

I've read about and used the DEGseq R package, and edgeR seems to complement it.

While working through the edgeR manual, I loaded some data into a DGEList.
However, the counts are shown without the automatically calculated library sizes (lib.size is NA).

Does anyone know why this happens?
My R version is 2.12.0.

Thank you.
Old 02-08-2011, 11:12 PM   #2
ECHo
Member
 
Location: Taiwan

Join Date: Jan 2010
Posts: 17
Default

Haha, I've found the reason:
some of the data include missing values.
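
For anyone hitting the same symptom, here is a minimal base-R sketch of why a single missing value turns a library size into NA. The toy `counts` matrix and its gene and sample names are made up for illustration, and it assumes the library size is a plain column sum, which is what the "Calculating library sizes from column totals" message from DGEList suggests.

```r
# Toy count matrix (hypothetical data): 3 genes x 2 samples, one missing value.
counts <- matrix(c(10, NA, 5,
                   7,  3, 2),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2")))

# A plain column sum propagates NA, so the library size of s1 is NA.
lib_size <- colSums(counts)

# Locate the offending rows (genes) and columns (samples).
na_genes   <- rownames(counts)[rowSums(is.na(counts)) > 0]
na_samples <- colnames(counts)[colSums(is.na(counts)) > 0]

# One option is to drop incomplete genes before building the DGEList.
# Whether dropping is appropriate depends on why the values are missing.
counts_complete   <- counts[complete.cases(counts), , drop = FALSE]
lib_size_complete <- colSums(counts_complete)
```

Here `na_genes` and `na_samples` point at the entries to inspect in the input file. It is probably worth finding out why they are missing before deciding to drop anything.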
Old 02-08-2011, 11:40 PM   #3
ECHo
Member
 
Location: Taiwan

Join Date: Jan 2010
Posts: 17
Default

Well, I still have one question.
When I want to draw the MDS plot, I use the following command, where d is a DGEList object:
plotMDS.dge(d, xlim=c(-2,1));

However, R always shows the following:
Error in if (mx < tol) { : missing value where TRUE/FALSE needed
Error during wrapup: cannot open the connection

Has anyone else run into this?
How can I solve the problem?

Thanks!
Old 02-09-2011, 01:43 AM   #4
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415
Default

Hmm, this has worked fine for me over the last few weeks, even though I'm not an edgeR expert.

Perhaps you still have a problem with missing values. Try taking a small, high-quality subset of your data and retrying with that.
Old 08-21-2012, 02:35 PM   #5
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

Hi all,

I am having a similar problem and was wondering if anyone has come across this before.

I get the message "Error in if (mx < tol) { : missing value where TRUE/FALSE needed" when I run the command estimateCommonDisp(y):


> mutant_control= x[,c(1,5,9,6,12,14)]
> group <- factor(c(1,1,1,2,2,2))
> y <- DGEList(counts=mutant_control, group=group)
Calculating library sizes from column totals.
> y <- estimateCommonDisp(y)
Error in if (mx < tol) { : missing value where TRUE/FALSE needed

> head(mutant_control)
                      27         31         35        32         38         40
128up          100.85404   94.66619   87.78034  101.9768   91.39150   85.91481
14-3-3epsilon 9061.95160 9391.45480 9106.62168 9604.3740 9952.53064 9667.63616
14-3-3zeta    7959.80739 8169.34580 8478.59387 8434.7244 7926.26723 8587.06141
140up           19.50291   22.34962   14.74578   15.2824   19.61044   14.21309
18w             88.16118  113.38107   97.86222  115.4046  120.79999  125.11319
26-29-p        288.60969  274.10267  262.37095  275.9005  283.34272  296.14799

> tail(mutant_control)
                27         31          35          32           38          40
zip    2317.423662 2690.28298 2746.989546 2960.364282 2897.5624980 2985.039414
zormin  324.178816  270.25428  350.734099  337.749747  370.9414048  304.788741
zpg       0.000000    0.00000    0.000000    0.000000    0.0000000    0.000000
zuc       3.015593    1.21086    1.031125    5.638336    2.6222360    4.258978
zwilch   30.068800   28.26996   25.578376   27.846503   25.5089142   27.275533
zye       1.292230    0.00000    0.000000    1.486839    0.8172129    0.000000

Could it be because of the 0.0000 values?

Thanks a lot,

Carmen
Old 08-22-2012, 03:15 PM   #6
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

So I think I figured it out: it has to do with the function expecting integers rather than real numbers. If you just round your counts matrix, everything will run smoothly.

Cheers!
Carmen
Old 08-22-2012, 11:18 PM   #7
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Yes, it runs smoothly but it won't give you correct results. There is a reason that edgeR and DESeq want integer values, namely that you are supposed to supply a table which, for each gene and each sample, tells the number of reads that map to the gene.

How can 2317.423662 reads map to gene 'zip'?
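
As a rough illustration of that point, a sanity check along the following lines can flag a table that does not hold raw read counts before it goes into edgeR or DESeq. This is only a sketch in base R: the helper name `looks_like_counts` and the `raw` matrix are hypothetical, and the check deliberately does no rounding.

```r
# Hypothetical helper: TRUE only if every entry is finite, non-negative,
# and a whole number (within a small tolerance).
looks_like_counts <- function(m, tol = 1e-8) {
  m <- as.matrix(m)
  all(is.finite(m)) && all(m >= 0) && all(abs(m - round(m)) < tol)
}

# First two rows and columns of the table posted in #5.
estimated <- matrix(c( 100.85404,   94.66619,
                      9061.95160, 9391.45480),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("128up", "14-3-3epsilon"),
                                    c("27", "31")))

# Made-up raw read counts for comparison.
raw <- matrix(c(101L, 95L, 9062L, 9391L), nrow = 2, byrow = TRUE)

looks_like_counts(estimated)  # FALSE: these are estimates, not read counts
looks_like_counts(raw)        # TRUE
```

A FALSE here is a cue to go back to the tool that produced the table and obtain actual read counts. It is not a cue to round.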
Old 08-29-2012, 05:36 PM   #8
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

Thank you Simon. I was missing something fundamental about edgeR.
Old 04-18-2013, 07:09 AM   #9
earonesty
Member
 
Location: United States of America

Join Date: Mar 2011
Posts: 52
Default integers

edgeR expects integers, but many programs use estimation functions to improve transcript counts, i.e., they output non-integers. So you need to round.
Old 04-18-2013, 07:29 AM   #10
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Sigh.

No, you should not round. If you do not have integer counts, your input is not suitable for these tools. This is why they insist that you give them integer counts.

Of course, you can trick them into using your unsuitable data by rounding, but then you will not get a reliable result. Please only use statistical methods off-label if you know what you are doing.
Old 04-18-2013, 12:49 PM   #11
lshen
Member
 
Location: Toronto

Join Date: Jan 2008
Posts: 30
Default

I compared the HTSeq-derived counts with the rounded counts from cuffdiff v2.1.1 (released last week).

I ran a two-group edgeR analysis with 3 replicates in the control group and 4 in the case group.

DEGs at FDR 0.05:

HTSeq-derived counts: 475

Rounded counts from cuffdiff v2.1.1: 441

Overlap: 398

In addition, 439 of the 475 HTSeq DEGs have FDR <= 0.1 in the results from the rounded cuffdiff v2.1.1 counts.

So maybe using rounded count data gives acceptable final results, even though it does not strictly follow edgeR's assumptions?
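
To put the overlap into proportions, here is a small worked example in R using only the three numbers reported above. The variable names are my own, and the Jaccard index is just one possible summary of agreement, not something taken from the edgeR documentation.

```r
# Numbers reported above.
n_htseq    <- 475  # DEGs from HTSeq-derived counts at FDR 0.05
n_cuffdiff <- 441  # DEGs from rounded cuffdiff counts at FDR 0.05
n_overlap  <- 398  # DEGs called in both analyses

# Share of each list that the other analysis also calls.
frac_of_htseq    <- n_overlap / n_htseq
frac_of_cuffdiff <- n_overlap / n_cuffdiff

# Jaccard index: overlap divided by the size of the union of both lists.
n_union <- n_htseq + n_cuffdiff - n_overlap
jaccard <- n_overlap / n_union

round(100 * c(htseq = frac_of_htseq,
              cuffdiff = frac_of_cuffdiff,
              jaccard = jaccard), 1)
# htseq 83.8, cuffdiff 90.2, jaccard 76.8
```

With the actual gene ID vectors in hand, the same quantities could be obtained from `intersect()` and `union()` instead of from the counts.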


Checking a few replicates (attached plot, r = 0.99 using the transformation log(x)+1), some genes show very different counts in HTSeq. Many of them are very short miRNAs and are thus missed by cufflinks (counts = 0).

count.cuffdiff.vs.htseq.ensgene71.PEN.png

Old 04-18-2013, 01:31 PM   #12
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Sure, if the values you obtain by rounding the output of cufflinks happen to be close to the correct values, there is a good chance that the result won't be that different, either.

But why would you do that when it is no more difficult to get the correct values in the first place?

This willingness to accumulate many minor inaccuracies against better knowledge is common in bioinformatics, but it is still sloppy science.


And, with all due respect: if the instructions for a statistical method state very clearly and explicitly that the method requires a certain kind of data as input, advise against using the method on other data, and even give a clear reason for that, founded on statistical theory, are you really so confident in your knowledge of advanced statistics that you think you know better?
Old 04-18-2013, 01:50 PM   #13
lshen
Member
 
Location: Toronto

Join Date: Jan 2008
Posts: 30
Default

I provide bioinformatics analysis services, and I have people talking about using cufflinks counts directly. So I wanted to check this myself, in addition to telling them about the assumptions that you have emphasized many times.

I have used pipelines of htseq-count and edgeR/DESeq, and we trust this combination more than FPKM-based results. But it relies on known gene annotations, whereas cufflinks can do de novo predictions. So I look to cufflinks for its non-expression tests (promoter, splicing) and use a count-based method for expression analysis.
Old 04-18-2013, 02:01 PM   #14
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Sorry for the harsh tone, which was more directed at post #9.

I am simply getting tired of being asked the same things over and over again, and way too often I meet the attitude that as soon as a program runs through without throwing an error, the result must be right, no matter what one has done before.

All times are GMT -8.