SEQanswers

Old 10-03-2011, 05:45 AM   #1
janec
Junior Member
 
Location: Scotland

Join Date: Jun 2011
Posts: 3
Default DESeq problems

I'm running DESeq for the first time and am having a few problems. I'm working through the documentation dated August 11, 2011.

I had no problems setting up a CountDataSet object with my counts table and can access the counts and estimate size factors for samples fine.

However when I tried to call the normalised counts, using the "normalized = TRUE" argument for the counts accessor, it says that there's no argument with that name. When I look at the man page for "counts", it also appears to have no arguments other than the name of the object it is counting.

I skipped that step and went on to the variance estimation stage, but when I try to run the estimateDispersions function I get a "function not found" message.

Is there something else I need to have installed?

Cheers!
Old 10-03-2011, 09:56 AM   #2
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319
Default

I think DESeq is a bit out of sync with the documentation/tutorial at the moment. I ran into the same problems as you, but after looking at some help pages and/or the vignette (can't remember which) I ended up using:

cds <- newCountDataSet(data, conds)
cds <- estimateSizeFactors(cds)
cds <- estimateVarianceFunctions(cds)

res <- nbinomTest(cds, "case", "control")

... which seems to work.
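
The function names differ between releases: the calls above use estimateVarianceFunctions, while the newer documentation mentioned in the first post refers to estimateDispersions. A minimal sketch, assuming only base R, of checking which of the two an installed copy exports before choosing a workflow. has_export is a hypothetical helper name, not part of DESeq, and the check is only a rough guide:

```r
# Hypothetical helper: TRUE if package `pkg` is installed and exports `fn`.
has_export <- function(pkg, fn) {
  requireNamespace(pkg, quietly = TRUE) && fn %in% getNamespaceExports(pkg)
}

# Report which variance/dispersion estimator the installed DESeq provides.
if (has_export("DESeq", "estimateDispersions")) {
  message("Newer API: use estimateDispersions()")
} else if (has_export("DESeq", "estimateVarianceFunctions")) {
  message("Older API: use estimateVarianceFunctions()")
} else {
  message("DESeq is not installed, or exports neither function")
}
```

Whichever branch is reported, the matching vignette is the one shipped with that installed version.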
Old 10-04-2011, 03:13 AM   #3
janec
Junior Member
 
Location: Scotland

Join Date: Jun 2011
Posts: 3
Default

Thanks, Kopi.

I'm using it on a remote server, so I am waiting for one of our sysadmins to install the latest versions of R and DESeq. Much googling revealed that using the developer version of R and the latest version of DESeq would cure my problems.

Last edited by janec; 10-04-2011 at 03:32 AM.
Old 10-07-2011, 04:36 AM   #4
Wolfgang Huber
Senior Member
 
Location: Heidelberg, Germany

Join Date: Aug 2009
Posts: 109
Default Version numbers matter

Quote:
Originally Posted by kopi-o View Post
I think DESeq is a bit out of sync with the documentation/tutorial at the moment. I ran into the same problems as you, but after looking at some help pages and/or the vignette (can't remember which) I ended up using:
Hi Kopi,

for DESeq, as for all Bioconductor packages, the software and the documentation are delivered together within the same package, and some attention is being paid that they are in sync. So, just use the documentation that comes with the version of DESeq that you are using.

Of course, if you download the software from one place, and look at documentation that Google finds in another place, these may be out of sync, as should be expected for a package that is actively being maintained.

Best wishes
Wolfgang
__________________
Wolfgang Huber
EMBL
Old 10-07-2011, 05:35 AM   #5
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319
Default

Hi Wolfgang,

It's probably my mistake then, I may have 'de-synced' the documentation with the actual version I am using myself :-) Anyway, my post above may still give a pointer to a set of commands that works for some versions.
Old 10-07-2011, 08:41 AM   #6
byou678
Member
 
Location: Maryland

Join Date: Aug 2011
Posts: 52
Smile

I had similar problems and got around them by using the old version of the manual (2010-01-19) with R 2.11.1. I have given up on "normalized" and "estimateDispersions" for now. Maybe somebody can use both to get correct results (I cannot); if so, it would be great if they could share the details of how they used "normalized" and "estimateDispersions".

In addition, I am using R 2.13.2 and cannot even install the latest DESeq successfully. Could anybody give me some ideas? Thanks.

> library("DESeq")
Loading required package: Biobase

Welcome to Bioconductor

Vignettes contain introductory material. To view, type
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation("pkgname")'.

Loading required package: locfit
Loading required package: akima
locfit 1.5-6 2010-01-20
Error in library.dynam(lib, package, package.lib) :
DLL 'genefilter' not found: maybe not installed for this architecture?
In addition: Warning messages:
1: '.readRDS' is deprecated.
Use 'readRDS' instead.
See help("Deprecated")
2: '.readRDS' is deprecated.
Use 'readRDS' instead.
See help("Deprecated")
Error: package/namespace load failed for 'DESeq'
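
One possible cause of a load failure like the one above is a dependency that is missing or was built under a different R version. A rough sketch, using only base R, of listing the R version each dependency was built under. built_under is a hypothetical helper, and the package names are simply those appearing in the load messages above:

```r
# Hypothetical helper: for each package name, return the R version it was
# built under, or NA if the package is not installed.
built_under <- function(pkgs) {
  ip <- installed.packages()
  out <- ip[match(pkgs, rownames(ip)), "Built"]
  names(out) <- pkgs
  out
}

deps <- c("Biobase", "genefilter", "locfit", "akima", "DESeq")
print(built_under(deps))
print(getRversion())  # compare with the "Built" values printed above
```

An NA, or a "Built" value that differs from the running R version, would be the first thing to look at; this is a diagnostic aid, not a confirmed explanation of the error.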

Old 10-07-2011, 09:08 AM   #7
byou678
Member
 
Location: Maryland

Join Date: Aug 2011
Posts: 52
Wink

I installed DESeq (1.4.1) after running the code below in R 2.13.2 (to update all installed packages that are out of date):

source("http://bioconductor.org/biocLite.R")
update.packages(repos=biocinstallRepos(), ask=FALSE, checkBuilt=TRUE)

And it showed:
> library("DESeq")
Warning message:
'.readRDS' is deprecated.
Use 'readRDS' instead.
See help("Deprecated")

But it still does not provide "normalized" or estimateDispersions; for example, using ?estimateDispersions:
> ?estimateDispersions
No documentation for 'estimateDispersions' in specified packages and libraries:
you could try '??estimateDispersions'

Last edited by byou678; 10-07-2011 at 11:02 AM.
Old 10-07-2011, 11:25 AM   #8
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438
Default

I believe you need to be running the dev build of R version 2.14 to take advantage of the latest DESeq package. If you install R 2.14 and then install DESeq via the biocLite function I think you'll get the updated version. I haven't done this yet because I don't want to run a dev version of R since I do many other things in R.
Old 10-08-2011, 04:37 AM   #9
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

In Bioconductor, you can use the command 'browseVignettes()' to see the vignettes (PDF manuals) that come with all the Bioconductor packages you have installed. If you read a vignette found via Google instead, you may be out of sync, as Wolfgang stressed. If you have to, note that all Bioconductor vignettes end with a 'sessionInfo' listing the version numbers of the packages used when building the vignette. Be sure to compare this with the output of 'sessionInfo()' in your R session to make sure that the vignette you are reading matches the package versions you have installed.
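
A minimal sketch of that comparison, assuming only base R: print the installed version of a package next to the running R version, so both can be checked against the sessionInfo block at the end of a vignette. installed_version is a hypothetical helper name.

```r
# Hypothetical helper: installed version of `pkg` as a string, or NA if the
# package is not installed.
installed_version <- function(pkg) {
  tryCatch(as.character(packageVersion(pkg)),
           error = function(e) NA_character_)
}

cat("R version:    ", as.character(getRversion()), "\n")
cat("DESeq version:", installed_version("DESeq"), "\n")

# Open the vignettes shipped with the installed copy, if there is one.
if (!is.na(installed_version("DESeq")) && interactive()) {
  browseVignettes("DESeq")
}
```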
Old 01-28-2013, 01:57 PM   #10
byou678
Member
 
Location: Maryland

Join Date: Aug 2011
Posts: 52
Default

Could anyone tell me which version of Galaxy has the function to run DESeq?

As far as I know, the Penn State version doesn't integrate the DESeq function, and the Ratsch Lab version has not been working recently.

Thanks a lot for any response!!
Old 08-05-2013, 02:57 AM   #11
vd4mindia
Member
 
Location: Milan

Join Date: May 2013
Posts: 40
Unhappy Help with DESeq analysis and filtering the DEGs from its output

I have just started using DESeq and am trying to compare my DEG results between cuffdiff, DESeq and RankProd. I am confused at one point after the analysis is done and would like to ask a few things.

I am comparing 2 tumor conditions with 5 samples in total: 3 samples for peripheries giving tumor (PGT) and 2 for peripheries not giving tumor (PDGT). Following DESeq's requirements, I created a matrix of raw fragment counts for the conditions, since DESeq works only with raw fragment counts, and rounded the matrix to the nearest integer values, since the package only works with integers. Then I used the standard DESeq commands to produce my own DEG results, but the output does not list only the DEGs; it lists all the genes. Can you tell me where I am going wrong, and whether I should do any pre-filtering or post-filtering to extract the list of DEGs from the output? I am sending the output file as well as the script code.

Another problem is that padj, the corrected p-value, is not giving proper values, so I cannot filter on it and then list my up- and down-regulated DEGs with their log2FC values. The padj values are either 1 or NA. I am also not getting proper values in the baseMean field, which is sometimes 0, and in those rows log2FC shows up as #NAME?, which means Excel cannot recognize the formula used to calculate it: where one of the baseMean values is 0, the FC is also zero and the log2FC cannot be calculated.

dat1<- read.table("/Users/vdas/Documents/RNA-Seq_Smaples_Udine_08032013/GBM_29052013/UD_RP_25072013/RP_matrix_RF_PGTvsPDGT.txt",sep="",header=TRUE,stringsAsFactors=FALSE)

dat1[,-1]<- lapply(lapply(dat1[,-1],round),as.integer)

write.table(dat1,"/Users/vdas/Documents/RNA-Seq_Smaples_Udine_08032013/GBM_29052013/UD_RP_25072013/rev_RF_PGTvsPDGT.txt",sep="\t",row.names=FALSE)

count_table<-read.table("/Users/vdas/Documents/RNA-Seq_Smaples_Udine_08032013/GBM_29052013/UD_RP_25072013/rev_RF_PGTvsPDGT.txt",header=T,sep="\t",row.names=1)

expt_design <- data.frame(row.names = colnames(count_table),
condition = c("PGT","PGT","PGT","PDGT","PDGT"))

expt_design

conditions = expt_design$condition

conditions

data <- newCountDataSet(count_table, conditions)

head(counts(data))

data <- estimateSizeFactors(data)

sizeFactors(data)

data <- estimateDispersions(data)

results <- nbinomTest(data, "PGT", "PDGT")

Is there anything wrong in the analysis script? Please let me know, including whether I have to introduce some post-filtering. Let me know if you need any more information.
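
On the post-filtering question: nbinomTest returns one row per gene, so a list of DEGs has to be extracted afterwards. A minimal sketch with a made-up results table (the values are invented for illustration; the column names follow the nbinomTest output) and an assumed cutoff of padj < 0.1. filter_degs is a hypothetical helper:

```r
# Toy stand-in for the data frame returned by nbinomTest(); values invented.
res <- data.frame(
  id             = c("geneA", "geneB", "geneC", "geneD"),
  baseMean       = c(120.5, 0, 310.2, 45.0),
  log2FoldChange = c(2.1, NaN, -1.8, 0.2),
  pval           = c(0.0004, NA, 0.002, 0.60),
  padj           = c(0.0016, NA, 0.004, 0.80),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep genes with a usable adjusted p-value below
# `alpha` and a finite log2 fold change, most significant first.
filter_degs <- function(res, alpha = 0.1) {
  keep <- !is.na(res$padj) & res$padj < alpha & is.finite(res$log2FoldChange)
  out <- res[keep, ]
  out[order(out$padj), ]
}

degs <- filter_degs(res)
up   <- degs[degs$log2FoldChange > 0, ]  # positive log2 fold change
down <- degs[degs$log2FoldChange < 0, ]  # negative log2 fold change
print(degs)
```

If padj is 1 or NA for every gene, as described above, a filter like this returns nothing, which would suggest a problem upstream of the filtering step.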
Old 08-05-2013, 04:16 AM   #12
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

On your second line, you're rounding and casting to an integer, which suggests that you don't actually have raw counts. Can you post a snippet of the RP_matrix_RF_PGTvsPDGT.txt file?

BTW, it might be best to start a new thread, since the one you're replying to is quite old.
Old 08-05-2013, 04:48 AM   #13
vd4mindia
Member
 
Location: Milan

Join Date: May 2013
Posts: 40
Default

Sorry, I am trying to start a new thread but am unable to do so; I do not know why. Yes, I am converting the values to integers, because I cannot run the DESeq commands on the raw counts directly as they are not integer values. So I have to round them to the nearest integer and then carry out the analysis.

gene PGT-1 PGT-0 PGT-2 PDGT-0 PDGT-1
XLOC_000001 2603 1534 1764 9030 4309
XLOC_000002 304 175 208 1095 835
XLOC_000003 195 80 109 687 454
XLOC_000004 66 49 54 236 90
XLOC_000092 365 211 242 1523 624
XLOC_000093 0.666667 0.5 1 1.66667 3.33333
XLOC_000094 0 0 0 0 0
XLOC_000095 6 11.4802 4.56786 8.49762 7.22143
XLOC_000096 0 0.25 0 0 0
XLOC_000097 195.561 90.88 114.348 262.98 246.68
XLOC_000098 0 7.79035 3.89757 0 1.30276
XLOC_000099 39 18 23 55 27
XLOC_000100 10.9163 5 3 12.8974 8
XLOC_000101 533 32 28 854 288
XLOC_000102 2756.33 3090.17 2311 4873 1677.25


You can see that the raw counts have decimal values, which do not work with DESeq, so I rounded them to the nearest integer.
Old 08-05-2013, 04:58 AM   #14
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

If the "raw" counts have decimals, then they're not raw counts. How did you generate these counts? The typical workflow would be to use htseq-count.
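
A quick way to see the issue without rounding anything away is to test whether every value is already a whole number. A minimal sketch using three rows copied from the matrix posted above; is_whole is a hypothetical helper:

```r
# Three rows copied from the matrix posted above.
counts <- rbind(
  XLOC_000001 = c(2603, 1534, 1764, 9030, 4309),
  XLOC_000093 = c(0.666667, 0.5, 1, 1.66667, 3.33333),
  XLOC_000099 = c(39, 18, 23, 55, 27)
)
colnames(counts) <- c("PGT-1", "PGT-0", "PGT-2", "PDGT-0", "PDGT-1")

# Hypothetical helper: TRUE where a value is a non-negative whole number.
is_whole <- function(x) x >= 0 & x == round(x)

# Genes with at least one fractional value, i.e. estimated rather than
# directly counted reads.
fractional <- rownames(counts)[!apply(is_whole(counts), 1, all)]
print(fractional)
```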
Old 08-05-2013, 05:05 AM   #15
vd4mindia
Member
 
Location: Milan

Join Date: May 2013
Posts: 40
Default

I am using the genes.read_group_tracking output file of cuffdiff, which can be used to create a matrix of the raw fragment counts. I create the matrix from this tracking file. Here is a snippet of it:

tracking_id condition replicate raw_frags internal_scaled_frags external_scaled_frags FPKM effective_length status
XLOC_000001 PGT 1 2603 1669.52 1669.52 71.2509 - OK
XLOC_000001 PGT 0 1534 1601.85 1601.85 68.3629 - OK
XLOC_000001 PGT 2 1764 2534.53 2534.53 108.168 - OK
XLOC_000001 PDGT 0 9030 9030 9030 195.012 - OK
XLOC_000001 PDGT 1 4309 4309 4309 93.0574 - OK
XLOC_000002 PGT 1 304 194.98 194.98 8.71171 - OK
XLOC_000002 PGT 0 175 182.74 182.74 8.16483 - OK
XLOC_000002 PGT 2 208 298.856 298.856 13.3529 - OK
XLOC_000002 PDGT 0 1095 1095 1095 24.7572 - OK
XLOC_000002 PDGT 1 835 835 835 18.8788 - OK
XLOC_000003 PGT 1 195 125.07 125.07 14.6047 - OK
XLOC_000003 PGT 0 80 83.5384 83.5384 9.75499 - OK
XLOC_000003 PGT 2 109 156.612 156.612 18.288 - OK
XLOC_000003 PDGT 0 687 687 687 40.595 - OK
XLOC_000003 PDGT 1 454 454 454 26.827 - OK
XLOC_000092 PDGT 1 624 624 624 39.1462 - OK
XLOC_000093 PGT 1 0.666667 0.427588 0.427588 0.0107606 - OK
XLOC_000093 PGT 0 0.5 0.522115 0.522115 0.0131394 - OK
XLOC_000093 PGT 2 1 1.43681 1.43681 0.0361585 - OK
XLOC_000093 PDGT 0 1.66667 1.66667 1.66667 0.0212244 - OK
XLOC_000093 PDGT 1 3.33333 3.33333 3.33333 0.0424487 - OK
XLOC_000094 PGT 1 0 0 0 0 - OK
XLOC_000094 PGT 0 0 0 0 0 - OK
XLOC_000094 PGT 2 0 0 0 0 - OK
XLOC_000094 PDGT 0 0 0 0 0 - OK
XLOC_000094 PDGT 1 0 0 0 0 - OK
XLOC_000095 PGT 1 6 3.84829 3.84829 0.0342992 - OK
XLOC_000095 PGT 0 11.4802 11.9879 11.9879 0.106847 - OK
XLOC_000095 PGT 2 4.56786 6.56314 6.56314 0.0584964 - OK
XLOC_000095 PDGT 0 8.49762 8.49762 8.49762 0.0383257 - OK
XLOC_000095 PDGT 1 7.22143 7.22143 7.22143 0.0325698 - OK


The 4th column is the raw fragment count and, as you can see, some of the values are decimals.
Old 08-05-2013, 05:13 AM   #16
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Those aren't raw counts, they're estimated raw counts.

Hint: don't try to coerce cufflinks data into DESeq.
Old 08-05-2013, 05:40 AM   #17
vd4mindia
Member
 
Location: Milan

Join Date: May 2013
Posts: 40
Default

So, you mean that these are estimated raw counts coming from cufflinks, which in fact also accounts for noise from duplicate read counts. However, for DESeq we do not consider the duplicate read counts, right? For DESeq each counting unit is evidence of one sequencing read. So we should not use these estimated raw counts, but rather generate the raw counts for my data using HTSeq and then use them for the DESeq analysis? Is that what you mean?
Old 08-05-2013, 05:44 AM   #18
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

The validity of the statistical model used by DESeq depends on its input being integer read counts directly from individual samples. Trying to use anything else will just produce unreliable results.

So yes, use htseq-count to generate your count matrix.
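
A minimal sketch of assembling such a matrix in R, assuming one htseq-count output file per sample, each with two tab-separated columns (feature ID, count) and summary rows at the end whose names start with "__" in recent HTSeq versions (older versions used names such as no_feature and ambiguous). read_htseq and build_count_matrix are hypothetical helpers, and the file names in the usage comment are placeholders:

```r
# Summary rows written by older htseq-count versions (assumption: adjust
# this list to match the files at hand).
old_summary <- c("no_feature", "ambiguous", "too_low_aQual",
                 "not_aligned", "alignment_not_unique")

# Hypothetical helper: read one htseq-count file into a named integer vector,
# dropping the summary rows.
read_htseq <- function(path) {
  tab <- read.table(path, sep = "\t", header = FALSE, quote = "",
                    comment.char = "",
                    col.names = c("id", "count"),
                    colClasses = c("character", "integer"))
  tab <- tab[!(grepl("^__", tab$id) | tab$id %in% old_summary), ]
  setNames(tab$count, tab$id)
}

# Hypothetical helper: combine per-sample files (a character vector named by
# sample) into a genes x samples integer matrix.
build_count_matrix <- function(files) {
  per_sample <- lapply(files, read_htseq)
  ids <- names(per_sample[[1]])
  same_ids <- vapply(per_sample, function(v) identical(names(v), ids), logical(1))
  stopifnot(all(same_ids))  # every file must list the same features in order
  do.call(cbind, per_sample)
}

# Usage with placeholder file names:
# files <- c(PGT_0 = "PGT_0.counts.txt", PDGT_0 = "PDGT_0.counts.txt")
# count_table <- build_count_matrix(files)
```

The resulting integer matrix can then take the place of count_table in the script above, with no rounding step.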