SEQanswers

Old 04-28-2010, 08:24 AM   #1
chrisbala
Member
 
Location: North Carolina

Join Date: Jan 2010
Posts: 82
Default DESeq question

Hi Simon (and others),

Quick question about DESeq. So far it looks like a nice approach! But it was giving me some somewhat strange (although possibly reasonable) numbers...so I wanted to check....

Sometimes the values estimated for baseMeanA and baseMeanB differ in the direction and the extent of change from the original raw counts. Is this expected? Are these baseMeans meant to be 'corrected' estimates of the read counts for the conditions in question? If I am getting such differences between the raw read counts and the baseMeans, is this simply a reflection of the strength of the dataset (I am running without biological reps)? If so, that's fine (I think). But just making sure all is well..

For example...
Code:
Counts_Novel	Counts_Familiar	DESeq-id	DESeq-baseMean	DESeq-baseMeanA	DESeq-baseMeanB	DESeq-foldChange
141	139	28	140.176915	133.0658319	147.2879981	1.106880677
143	136	30	139.5312055	134.9532905	144.1091204	1.067844436
Is this normal?
Old 04-28-2010, 11:57 AM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Hi Chris

If, by "strength", you mean sequencing depth, yes.

In order to make the samples comparable, differences in sequencing depth (a.k.a. library size) have to be addressed. If sample A is sequenced 20% deeper than sample B, then 120 counts in sample A should be considered as equal in expression strength to 100 counts in B.

The function 'estimateSizeFactors' estimates the sequencing depth, and with 'sizeFactors', you can print out the estimates. In the example just given, you might get size factors 1.2 and 1.0 for A and B (or another pair of numbers with the ratio 1.2:1).

To get the 'baseMean' value, all count values are transformed to the "base scale" (or "common scale" in the paper) by dividing them by the appropriate size factor.
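To make the arithmetic concrete, here is a minimal sketch in base R that reproduces the two rows from the first post. The size factors are not given anywhere in the thread; they are an assumption, back-calculated from the first row as raw count divided by reported baseMean, and then checked against the second row.

```r
# Raw counts from the two example rows in the first post
counts <- matrix(c(141, 139,
                   143, 136),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("id28", "id30"), c("Novel", "Familiar")))

# Assumed size factors, back-calculated from row id28 of the posted table
# (raw count / reported baseMean); they are not stated in the thread.
sf <- c(Novel = 141 / 133.0658319, Familiar = 139 / 147.2879981)

# Divide each column by its size factor to reach the common ("base") scale
base <- sweep(counts, 2, sf, "/")

# Per-gene summaries as they appear in the posted table
baseMean   <- rowMeans(base)
foldChange <- base[, "Familiar"] / base[, "Novel"]

print(round(cbind(base, baseMean, foldChange), 4))
```

With these assumed factors (roughly 1.06 and 0.94), the "Novel" sample looks as if it was sequenced somewhat deeper, which would explain why 141 versus 139 raw counts turn into a higher base-scale value for "Familiar".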

I've explained our scheme to estimate the size factors a while ago in this post.

Simon
Old 04-28-2010, 12:01 PM   #3
chrisbala
Member
 
Location: North Carolina

Join Date: Jan 2010
Posts: 82
Default counts deseq

hi simon,

nope, that's not what i meant by 'strength', but it makes much much more sense...

thanks!
Old 05-04-2010, 04:44 PM   #4
m!x
Member
 
Location: Seattle

Join Date: Aug 2009
Posts: 10
Default

Hi Simon,

I am trying to install DESeq package on Mac OS X 10.4 and Bioconductor 2.10.
But I keep getting this error message:

$ R CMD install /Users/lilyanamargaretha/Desktop/DESeq
* installing to library ‘/Library/Frameworks/R.framework/Resources/library’
* installing *source* package ‘DESeq’ ...
** libs
** arch - i386
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/usr/local/include -fPIC -g -O2 -c pval.c -o pval.o
cc1: error: unrecognized command line option "-arch"
make: *** [pval.o] Error 1
ERROR: compilation failed for package ‘DESeq’
* removing ‘/Library/Frameworks/R.framework/Resources/library/DESeq’
* restoring previous ‘/Library/Frameworks/R.framework/Resources/library/DESeq’

Any idea how to fix it?

Thanks!
Old 05-05-2010, 05:13 AM   #5
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Hi M!x

Quote:
Originally Posted by m!x View Post
Hi Simon,
I am trying to install DESeq package on Mac OS X 10.4 and Bioconductor 2.10.
But I keep getting this error message:

[...]
cc1: error: unrecognized command line option "-arch"
This is a most peculiar error. Your C compiler does not accept the '-arch' option. However, this is a standard option of GCC (and, if I am not completely confused, has been so since long before the release of Tiger). Hence, something in your Xcode installation seems to be utterly broken.

Could you please type in a terminal the commands "gcc -v" and "which gcc" and send me the output (by private mail to anders(at)embl(dot)de, in order to not clog the forum with details)?

Cheers
Simon
Old 05-12-2010, 02:12 PM   #6
m!x
Member
 
Location: Seattle

Join Date: Aug 2009
Posts: 10
Default

Hi Simon,

I finally managed to install the DESeq package, and it ran pretty well.
Thanks!
Old 06-01-2010, 08:46 AM   #7
ikumar2
Junior Member
 
Location: Champaign

Join Date: May 2010
Posts: 1
Default

Hi Simon,

I am trying to use DESeq. After step-

cds <- newCountDataSet( countsTable, conds)

I am getting this error-

Error in round(countData) : Non-numeric argument to mathematical function

Any suggestions? Thanks in advance.
Old 06-01-2010, 10:21 AM   #8
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Quote:
Originally Posted by ikumar2 View Post
cds <- newCountDataSet( countsTable, conds)

I am getting this error-

Error in round(countData) : Non-numeric argument to mathematical function
It probably means that your countsTable contains non-numeric data, e.g. strings with gene names. It should contain only numbers. Make sure to cut away any columns with annotation data and/or move them to the row / column names.
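As an illustration of that clean-up step, here is a small base R sketch. The data frame 'raw' and its column names are hypothetical, standing in for whatever a call such as read.delim() returned; only the idea of moving the gene names into the row names comes from the advice above.

```r
# Hypothetical input: gene names ended up in an ordinary column
raw <- data.frame(gene = c("geneA", "geneB", "geneC"),
                  T1   = c(10, 0, 35),
                  N1   = c(12, 3, 20),
                  stringsAsFactors = FALSE)

# Move the annotation column into the row names and keep only the counts
countsTable <- raw[, -1]
rownames(countsTable) <- raw$gene

# Every remaining column should now be numeric, so rounding works
stopifnot(all(sapply(countsTable, is.numeric)))
rounded <- round(as.matrix(countsTable))
print(rounded)
```

If the stopifnot() line fails on real data, printing str(countsTable) usually shows which column is still character or factor.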

Simon
Old 06-03-2010, 02:15 PM   #9
dgu
Junior Member
 
Location: Seattle

Join Date: Jun 2010
Posts: 5
Default DESeq installation doubts...

Quote:
Originally Posted by Simon Anders View Post
Hi M!x



This is a most peculiar error. Your C compiler does not accept the '-arch' option. However, this is a standard option of GCC (and, if I am not completely confused, has been so since long before the release of Tiger). Hence, something in your Xcode installation seems to be utterly broken.

Could you please type in a terminal the commands "gcc -v" and "which gcc" and send me the output (by private mail to anders(at)embl(dot)de, in order to not clog the forum with details)?

Cheers
Simon
Hello Simon!

Hope you can give me a hand about your program.

I started a session in R 2.10.1 and typed the following to load DESeq:

> source("http://www.bioconductor.org/biocLite.R")
> biocLite("DESeq")
Using R version 2.10.1, biocinstall version 2.5.10.
Installing Bioconductor version 2.5 packages:
[1] "DESeq"
Please wait...

Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
package ‘DESeq’ is not available

Do you recognize the warning message? I think it may not have installed DESeq after all, 'cause R doesn't seem to recognize the function "newCountDataSet".

Sorry if this is too basic but just started with R and DESeq. Any pointers will be appreciated.

Daniel
Old 06-05-2010, 05:59 AM   #10
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Quote:
Originally Posted by dgu View Post
I started a session in R 2.10.1 and typed the following to load DESeq:

> source("http://www.bioconductor.org/biocLite.R")
> biocLite("DESeq")
Using R version 2.10.1, biocinstall version 2.5.10.
Installing Bioconductor version 2.5 packages:
[1] "DESeq"
Please wait...

Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
package ‘DESeq’ is not available

Do you recognize the warning message? I think it may not have installed DESeq after all, 'cause R doesn't seem to recognize the function "newCountDataSet".

Sorry if this is too basic but just started with R and DESeq. Any pointers will be appreciated
It says: "package ‘DESeq’ is not available". So, of course, you haven't installed it. The Bioconductor web server had occasional hiccups recently, so maybe it really was not available for a while. Please try again, and, if it still does not work, follow the manual installation instructions here.

Simon
Old 06-06-2010, 09:14 AM   #11
Wolfgang Huber
Senior Member
 
Location: Heidelberg, Germany

Join Date: Aug 2009
Posts: 109
Default Bioconductor versions for DESeq

Hi dgu and Simon,

DESeq was introduced into Bioconductor at release 2.6. You are trying with release 2.5, that's why you get "package ‘DESeq’ is not available". Bioconductor releases are synchronised with R releases, so Bioconductor 2.6 goes with R 2.11. The recommended approach is to install R >= 2.11 and try with "biocLite" again, as you described. See also http://www.bioconductor.org -> Getting Started.
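A quick way to check this before calling biocLite is to compare the running R version against 2.11, the version named above. This is only a sketch using base R's getRversion(); the messages are made up for illustration.

```r
# Bioconductor 2.6 (the first release containing DESeq) pairs with R 2.11
r_ver <- getRversion()
too_old <- r_ver < "2.11.0"

if (too_old) {
  message("R ", as.character(r_ver),
          " predates the Bioconductor release that introduced DESeq")
} else {
  message("R ", as.character(r_ver),
          " is at least 2.11, so the version mismatch described above does not apply")
}
```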

Nothing to do with server availability, which as far as I can tell has been fine.

Wolfgang Huber
__________________
Wolfgang Huber
EMBL
Old 06-14-2010, 02:43 PM   #12
dgu
Junior Member
 
Location: Seattle

Join Date: Jun 2010
Posts: 5
Default

Quote:
Originally Posted by Wolfgang Huber View Post
Hi dgu and Simon,

DESeq was introduced into Bioconductor at release 2.6. You are trying with release 2.5, that's why you get "package ‘DESeq’ is not available". Bioconductor releases are synchronised with R releases, so Bioconductor 2.6 goes with R 2.11. The recommended approach is to install R >= 2.11 and try with "biocLite" again, as you described. See also http://www.bioconductor.org -> Getting Started.

Nothing to do with server availability, which as far as I can tell has been fine.

Wolfgang Huber
Thanks, Wolfgang!

Indeed, I downloaded R 2.11.1 and the problem was taken care of. Thanks!
Old 06-14-2010, 03:06 PM   #13
dgu
Junior Member
 
Location: Seattle

Join Date: Jun 2010
Posts: 5
Default

Quote:
Originally Posted by Simon Anders View Post
It says: "package ‘DESeq’ is not available". So, of course, you haven't installed it. The Bioconductor web server had occasional hiccups recently, so maybe it really was not available for a while. Please try again, and, if it still does not work, follow the manual installation instructions here.

Simon
Hi Simon!

Thanks for your reply. I sorted the problem out using the newest version of R.

I am going through each step of differential gene expression analysis from the DESeq manual. It is great and very well explained.

I had one question about Fig. 2 (pg 7). I managed to get the "smoothScatter" function to work, but the regression fit (in red) doesn't want to come out. Below is the script:

## Fit from the local regression
lines(log10(fittedBaseVar)~log10(baseMean),diagForYC[order(diagForYC$baseMean),],col="red")

The plotting window kinda freezes indefinitely and I get the following message:

Error in plot.xy(xy.coords(x, y), type = type, ...) :
plot.new has not been called yet

I've rewritten the script many times with no success. Do you see anything wrong?

Thanks in advance for any pointers!

Daniel
Old 06-15-2010, 01:30 AM   #14
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Hi Daniel,

maybe you have closed the plotting window after calling 'smoothScatter'? The 'lines' function expects that a plot is already displayed so that the axes are set.
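The ordering requirement can be seen in a small base R sketch. The data here are invented and plot() stands in for smoothScatter(); the point is only that a high-level plotting call has to come first and its device has to stay open.

```r
# Hypothetical data standing in for baseMean and the fitted variance
x <- seq(1, 100, length.out = 50)
y <- 2 * x + 5

# A high-level call (plot, smoothScatter, ...) opens the device and sets the axes
plot(log10(x), log10(y), pch = 20)

# Only now can a low-level function such as lines() draw on top of it;
# calling lines() with no open plot gives "plot.new has not been called yet"
ord <- order(x)
lines(log10(x[ord]), log10(y[ord]), col = "red")
```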

Simon
Old 06-23-2010, 09:57 AM   #15
dgu
Junior Member
 
Location: Seattle

Join Date: Jun 2010
Posts: 5
Default

Thanks for the prompt answer, Simon!

Indeed... that was the problem. I was closing the window.

R is great. It is just not THAT straightforward, if you know what I mean.

Thanks again for your help. Do you have any updates on your DESeq paper?

Cheers,
Daniel
Old 06-24-2010, 03:35 AM   #16
emma_n
Junior Member
 
Location: Liverpool

Join Date: Jun 2010
Posts: 2
Default

Hi all,

I'm very new to R and also RNA-seq in general so I hope my question is not too daft and that you might be able to help me out.

I have an experiment set up where I have a group of 8 samples. 2 x infected at 4hrs, 2 x mock infected at 4hrs, 2 x infected at 7hrs and 2 x mock infected at 7hrs.

I've been trying to use Deseq to analyse the read counts by following the instructions from the PDF (which btw is very well written and can even be followed by a novice like me!), and I get to the point where I can look for differentially expressed genes. However, the instructions only go into comparisons between two conditions. This is fine for looking at the infected and uninfected states but I would also like to examine whether the time course of infection has any effect on gene transcription. Is there any way of running a 3-way analysis in Deseq?
Also, when I come to display the results of the most significantly differentially expressed genes the table generated only produces the top 6 genes. Is there a way of exporting the whole table created so that i can introduce a cut-off myself based on p-values?

Thanks in advance for any help!

Emma
Old 06-29-2010, 01:07 AM   #17
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Dear Emma

Quote:
Originally Posted by emma_n View Post
I have an experiment set up where I have a group of 8 samples. 2 x infected at 4hrs, 2 x mock infected at 4hrs, 2 x infected at 7hrs and 2 x mock infected at 7hrs.

I've been trying to use Deseq to analyse the read counts by following the instructions from the PDF (which btw is very well written and can even be followed by a novice like me!), and I get to the point where I can look for differentially expressed genes. However, the instructions only go into comparisons between two conditions. This is fine for looking at the infected and uninfected states but I would also like to examine whether the time course of infection has any effect on gene transcription. Is there any way of running a 3-way analysis in Deseq?
I take it that, by "3-way analysis", you mean that you want to test an interaction contrast, i.e., to see whether the effect of the combination of later time and infection is significantly different from the sum of the effect of the later time only and the effect of the infection only.

This is not yet possible with DESeq. The stress is on "not yet": I've found a way to do this (you are not the first to ask for it), and it works on my computer; I only need to add it to DESeq, and I hope that I'll be able to do so within the next two weeks. You are welcome to be one of the first to try this new feature.

If you want to try something right now, you can also use the 'getVarianceStabilizedData' function to transform your data to a homoscedastic scale and then perform the analysis with limma, an R package for microarray analysis that is designed for such tasks and might be able to work with sequencing data as well, provided you first do a variance-stabilizing transform.
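To show what an interaction contrast looks like for a design like Emma's, here is a sketch for a single gene. Everything in it is an assumption for illustration: the eight values are invented and merely stand in for variance-stabilized data, and base R's lm() is used in place of limma so that the example runs without extra packages. It is not how DESeq or limma would be called on a full data set.

```r
# 2 x 2 design with two replicates per cell, as in the question
design <- data.frame(
  infection = factor(rep(c("mock", "infected"), each = 2, times = 2),
                     levels = c("mock", "infected")),
  time      = factor(rep(c("4h", "7h"), each = 4), levels = c("4h", "7h"))
)

# Invented values for one gene on a variance-stabilized scale,
# in the same order as the rows of 'design'
vsd <- c(5.0, 5.2,   # mock, 4h
         6.1, 5.9,   # infected, 4h
         5.1, 5.3,   # mock, 7h
         8.0, 8.2)   # infected, 7h

# Main effects plus their interaction
fit <- lm(vsd ~ infection * time, data = design)

# The interaction coefficient: how much the infection effect at 7h
# differs from the infection effect at 4h
interaction_effect <- coef(fit)[["infectioninfected:time7h"]]
print(interaction_effect)
```

Here the infection effect is 0.9 at 4h and 2.9 at 7h, so the interaction term comes out as their difference; summary(fit) would give the corresponding test for this one gene.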

Quote:
Also, when I come to display the results of the most significantly differentially expressed genes the table generated only produces the top 6 genes. Is there a way of exporting the whole table created so that i can introduce a cut-off myself based on p-values?
The 'head' function used in the vignette instructs R to display only the first 6 lines of a data frame by default. Write 'head( ..., 100 )' to get 100 lines, or omit the 'head' altogether to get all lines.

If you want to look at the data with your favorite spreadsheet program (Excel or the like), just save the result data frame with the 'write.csv' function: write.csv( res, file="myresults.txt" )
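Put together, the steps might look like the sketch below. The data frame 'res' is a made-up stand-in for the nbinomTest result (only 'id' and 'pval' columns, with invented values), the 0.05 cut-off is an arbitrary choice, and the output goes to a temporary file rather than a real path.

```r
# Hypothetical result table with 20 genes
res <- data.frame(id   = paste("gene", 1:20, sep = ""),
                  pval = seq(0.001, 0.2, length.out = 20))

n_default <- nrow(head(res))        # head() shows 6 rows unless told otherwise
n_more    <- nrow(head(res, 100))   # asks for 100, gets all 20 available

# Apply your own cut-off on the p values
sig <- res[res$pval < 0.05, ]

# Save the filtered table for a spreadsheet program
out <- tempfile(fileext = ".csv")
write.csv(sig, file = out, row.names = FALSE)
```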

Simon
Old 06-29-2010, 03:54 AM   #18
emma_n
Junior Member
 
Location: Liverpool

Join Date: Jun 2010
Posts: 2
Default

Thanks for your reply Simon, i'm now generating the tables I'm after!

As for the 3-way analysis, yes you're right that I want to be able to find out if the combination of infection plus time has an effect on the differentially transcribed genes. If you are developing this as an add-in for Deseq then I'm more than happy to hang on for that. I would much prefer to do the analysis altogether within one package and if Deseq will be able to offer this in the near future that would be great for me.

Thanks again for your help

Emma
Old 07-31-2010, 11:23 AM   #19
avm970
Junior Member
 
Location: Birmingham, UK

Join Date: Jul 2010
Posts: 5
Default

Hi Everyone

In my analysis, I am working without any replicates. I am trying to find differentially expressed genes using DESeq.
After running > res <- nbinomTest(cds, "T", "N") I got the following results:
> head(res)
id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval
1 1 62.5678206 81.639584 43.4960569 0.5327815 -0.9083842 0.3533266
2 2 123.5275329 217.192101 29.8629644 0.1374956 -2.8625423 0.1169151
3 3 0.3245974 0.000000 0.6491949 Inf Inf 1.0000000
4 4 125.0737757 207.949884 42.1976671 0.2029223 -2.3010007 0.1651501
5 5 5.2703034 9.242217 1.2983898 0.1404847 -2.8315155 0.1715208
6 6 359.4632766 485.216397 233.7101564 0.4816617 -1.0539079 0.3340572
padj resVarA resVarB
1 1 NA NA
2 1 NA NA
3 1 NA NA
4 1 NA NA
5 1 NA NA
6 1 NA NA
As a next step, to get a plot, I tried the following command and got this error:
> plot(
+ res$baseMean,
+ res$log2FoldChange
+ log="x", pch=20, cex=.1,
Error: unexpected symbol in:
"res$log2FoldChange
log"

Could you please help in this situation? How can I get rid of this error and get the MvA plot?

Regards
Aniket
Old 07-31-2010, 12:48 PM   #20
Wolfgang Huber
Senior Member
 
Location: Heidelberg, Germany

Join Date: Aug 2009
Posts: 109
Default

Hi Aniket,

your call to plot contains a syntax error, you need a comma between res$log2FoldChange and log="x".
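For reference, a corrected version of the call could look like this. The small 'res' data frame is a stand-in built from rounded values of the output posted above (the row with the infinite fold change is left out); the closing parenthesis is also added, since the posted command was cut off before it.

```r
# Stand-in for the nbinomTest result, using rounded values from the post
res <- data.frame(baseMean       = c(62.6, 123.5, 125.1, 5.3, 359.5),
                  log2FoldChange = c(-0.91, -2.86, -2.30, -2.83, -1.05))

plot(
  res$baseMean,
  res$log2FoldChange,          # the comma at the end of this line was missing
  log = "x", pch = 20, cex = .1
)
```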

Have a look at http://www.r-project.org -> Manuals -> An Introduction to R if you are unsure about using R.

Best wishes
Wolfgang
__________________
Wolfgang Huber
EMBL

Tags
deseq, illumina, rna-seq
