SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
DESeq2 Simon Anders Bioinformatics 123 07-06-2015 01:45 AM
Getting Started with SOAPsnp taylorm Bioinformatics 0 03-03-2011 10:51 AM
Help - MAQ getting started inesdesantiago Bioinformatics 12 04-24-2010 10:52 PM
Help getting started viralnerd Bioinformatics 4 12-18-2009 06:52 AM
Getting Started n2hasan Bioinformatics 0 09-05-2009 12:59 PM

Reply
 
Thread Tools
Old 12-16-2013, 05:33 PM   #1
ThePresident
Member
 
Location: Sherbrooke / Canada

Join Date: Jun 2012
Posts: 72
Question Getting started with DESeq2

Hello everyone,

I'm trying to run DESeq2 package on R Studio under Wondows8 and I just can't do it right even if I've successfully used old version of DESeq under Linux...

So, I installed DESeq2 from biocLite, then i run
Code:
require(DESeq2)
and got:

Code:
Loading required package: DESeq2
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport,
    clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply,
    parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

    xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    duplicated, eval, evalq, Filter, Find, get, intersect, is.unsorted,
    lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
    pmin.int, Position, rank, rbind, Reduce, rep.int, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unlist

Loading required package: IRanges
Loading required package: XVector
Loading required package: Rcpp
Loading required package: RcppArmadillo
There are a bunch of "masked" objects... but no errors, so I assumed everything is okay. Now, I already have a HTSeq input table with 4 columns representing my conditions (2 wild type and 2 treated). Now, how the heck can I load this file to DESeq in order to analyze it? In the vignette, there is a section (I know...), but honestly I'm almost losing my hair by trying to make it work.

I tried to run the first line :
Code:
directory <- system.file("extdata", package="HTSeqCountTable", mustWork=TRUE)
and got an immediate error :
Code:
Error in system.file("extdata", package = "HTSeqCountTable", mustWork = TRUE) : 
  no file found
Thank you guys in advance,

TP
ThePresident is offline   Reply With Quote
Old 12-17-2013, 12:17 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Is there actually a package called "HTSeqCountTable"?

Just use make a table and then
Code:
dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,directory = directory,design= ~ condition)
as is done in the vignette? The "system.file(...)" stuff is just to load the example data from the pasilla package, so don't do that part.
dpryan is offline   Reply With Quote
Old 12-17-2013, 01:02 AM   #3
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Quote:
Originally Posted by ThePresident View Post
I tried to run the first line :
Code:
directory <- system.file("extdata", package="HTSeqCountTable", mustWork=TRUE)
and got an immediate error :
Code:
Error in system.file("extdata", package = "HTSeqCountTable", mustWork = TRUE) : 
  no file found
The "first line" of what? Where have you copied that from, and what is it supposed to do?
Simon Anders is offline   Reply With Quote
Old 12-17-2013, 06:46 AM   #4
ThePresident
Member
 
Location: Sherbrooke / Canada

Join Date: Jun 2012
Posts: 72
Default

Hello,

Thank you for your time guys.

@dpryan: that's what was confusing me. I didn't see that the whole system.file was just to load the example. Thanks for clarifying that for me. Otherwise, I'm still swimming in partial confusion concerning directory and conditions options? What the heck is directory option and how do you exactly define it along with conditions? Here is the begining of my HTseq table:
Code:
> head (sampleTable)
  X     WT1    WT2  LYS1  LYS2
1 0    3697   2413  2163  2772
2 1    8603   5701  4875  6380
3 2 1062195  38969 30022 32932
4 3 5852834 112090 80004 88024
5 4    4930    706   164   594
6 5     737    677   486   721
@Simon: I know how frustrating all this can be for you. However, bear in mind that there's a bunch of biologists that pass 99% of their working time on the bench and have very little experience with R and other language program. For us, all this feels like Chinese Don't get me wrong, I respect so much all what you guys do, but I would love to see a vignette that is somewhat simpler... Like, step 1: load you data into R with this code/function where parameter A is for this and parameter B for this. Step 2: and so on. Thanks God, you, Ryan and others from this forum are always willing to give a hand when something is going wrong, so we manage to get what we need at the end.

Thank you again guys,
TP
ThePresident is offline   Reply With Quote
Old 12-17-2013, 07:03 AM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Your sampleTable actually looks like a merged table of counts, though without any geneIDs, which will make life difficult later on. "directory" is just the name of the directory that the files are in. In the vignette, everything is in a directory that's specified by where ever the pasilla package contents are installed. You can probably get away without inputting a directory (just start R in the directory that contains the count files). "conditions" is a dataframe that describes what group(s) each of the samples belongs. Note that the ordering must be the same as the files are read (i.e., lexicographic order). I've seen a couple people screw things up by assuming otherwise.
dpryan is offline   Reply With Quote
Old 12-17-2013, 07:03 AM   #6
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

I understand that it is confusing. My question was not supposed to sound annoyed, but meant literally: Which document were you following?

There are many tutorials on using Bioconductor tools out there on the web, in many different versions. So, if you want to ask a question like yours, you should write something like this: "I tried to follow the explanations of the DESeq2 vignette, as supplied with version x.x.. of the package [or: as found at this web address], and I copied the command "...", changing "..." for "...", and then got this error address: ..."

Without these details, we have to do a lot of guessing on why you have typed this command.

Making a vignette readable for an absolute beginner is hard. I actually think that we have done a very good job on this with the old DESeq vignette and with the new version of the DEXSeq vignette. The DESeq2 vignette still needs a bit of work, especially at the initial steps on data loading.

However, if you read about a specific function in a vignette and want to know, which parameters this functions take and what the precise meaning of these parameters is, the vignette is not the place to look for this information. This is what the help pages for the individual functions are for, so be sure to read them, too!
Simon Anders is offline   Reply With Quote
Old 12-17-2013, 07:12 AM   #7
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Quote:
Originally Posted by ThePresident View Post
Otherwise, I'm still swimming in partial confusion concerning directory and conditions options? What the heck is directory option and how do you exactly define it along with conditions?
You are looking at the wrong function. DESeqDataSetFromHTSeqCount is used to read the count data from several files that have been produced with htseq-count. That tool produces one file for each sample, with only two columns, namely gene ID and read count in this sample. DESeqDataSetFromHTSeqCount reads all these files and puts them together in one matrix of read counts. It wants a "sample table", with one row for each sample and columns for sample ID, file name of the htseq-count file, and experimental condition(s).

You already have your matrix of read counts, so you should use DESeqDataSetFromMatrix to get started. Type "?DESeqDataSetFromMatrix" to see how it works.

And as D P Ryan has already pointed out, the first column of your count table seems to contain some indices, and you don't have row names with the gene IDs. You need to fix this first. See "?read.table" or "?read.csv" to learn how to read in CSV files or text files with tabular data, or ask, but then tell us how you have the counts on your hard disk.
Simon Anders is offline   Reply With Quote
Old 12-18-2013, 08:43 AM   #8
ThePresident
Member
 
Location: Sherbrooke / Canada

Join Date: Jun 2012
Posts: 72
Default

Hey guys,

Just a quick update: with your help, I've succeeded to run DESeq2 properly with DESeqDataSetFromMatrix. Thank you again!
ThePresident is offline   Reply With Quote
Old 02-27-2014, 04:14 PM   #9
spujr
Junior Member
 
Location: USA

Join Date: Sep 2011
Posts: 4
Default

Hi I am having trouble getting started with DESeq2. I am able to load in the file okay and don't receive any errors until I start the "DESeq" step. Quick background, my data input is a merged file of raw counts. I have 4 conditions with 3 reps each.

Here are my steps:
"
> countsTable<-read.delim("merged_raw_counts.txt",header=T)
> rownames(countsTable)<-countsTable$Name
> countsTable<-countsTable[,-1]
> head(countsTable)
T0R1 T0R2 T0R3 T0S1 T0S2 T0S3 T1R1 T1R2 T1R3 T1S1 T1S2 T1S3
gene01g00010 0 5 2 3 4 6 4 1 0 7 1 1
gene01g00020 1 0 12 0 6 2 3 3 1 2 0 0
gene01g00030 0 0 0 0 0 0 0 0 0 1 0 0
gene01g00040 0 0 0 0 0 0 0 0 0 0 0 0
gene01g00050 0 0 0 1 0 0 0 0 0 0 0 0
gene01g00060 13 99 77 109 76 82 82 140 115 157 72 118
>countData<-matrix(countsTable)
>colData<-data.frame(condition=factor(c("T0R","T0R","T0R","T0S","T0S","T0S","T1R","T1R","T1R","T1S","T1S","T1S")))
>dds<-DESeqDataSetFromMatrix(countData=countsTable,colData,formula(~condition))
Usage note: the following factors have 3 or more levels:

condition

For DESeq2 versions < 1.3, if you plan on extracting results for
these factors, we recommend using betaPrior=FALSE as an argument
when calling DESeq().
As currently implemented in version 1.2, the log2 fold changes can
vary if the base level is changed, when extracting results for a
factor with 3 or more levels. A solution will be implemented in
version 1.3 which allows for the use of a beta prior and symmetric
log2 fold change estimates regardless of the selection of base level.
>colData(dds)$condition<-factor(colData(dds)$condition,levels=c("T0R","T0S"))
> dds
class: DESeqDataSet
dim: 34903 12
exptData(0):
assays(1): counts
rownames(34903): gene01g00010 gene01g00020 ... gene00g99390 gene00g99400
rowData metadata column names(0):
colnames(12): 1 2 ... 11 12
colData names(1): condition
> dds<-DESeq(dds,betaPrior=F)
estimating size factors
estimating dispersions
gene-wise dispersion estimates

error: element-wise multiplication: incompatible matrix dimensions: 12x1 and 6x1

Error: element-wise multiplication: incompatible matrix dimensions: 12x1 and 6x1
"

I am not sure where I am going wrong. I see the warning about have more than 3 levels. Although I have these 4 different conditions I am not interested in doing a pairwise comparision on all. I would ideally like to compare T0R to T0S, T1R to T1S, T0R to T1R, and T0S to T1S.

Thanks in advance for insight on this.
spujr is offline   Reply With Quote
Old 02-27-2014, 04:40 PM   #10
spujr
Junior Member
 
Location: USA

Join Date: Sep 2011
Posts: 4
Default

Hi, I figured out the error, quite silly actually.
The step of
>colData(dds)$condition<-factor(colData(dds)$condition,levels=c("T0R","T0S"))

should list all four conditions, not just 2.

>colData(dds)$condition<-factor(colData(dds)$condition,levels=c("T0R","T0S","T1R","T1S"))

Hopefully that's the only problem.
spujr is offline   Reply With Quote
Reply

Tags
deseq2, htseq, rna-seq

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 01:43 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO