Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Getting started with DESeq2

    Hello everyone,

    I'm trying to run DESeq2 package on R Studio under Wondows8 and I just can't do it right even if I've successfully used old version of DESeq under Linux...

    So, I installed DESeq2 from biocLite, then i run
    Code:
    require(DESeq2)
    and got:

    Code:
    Loading required package: DESeq2
    Loading required package: GenomicRanges
    Loading required package: BiocGenerics
    Loading required package: parallel
    
    Attaching package: ‘BiocGenerics’
    
    The following objects are masked from ‘package:parallel’:
    
        clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport,
        clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply,
        parSapply, parSapplyLB
    
    The following object is masked from ‘package:stats’:
    
        xtabs
    
    The following objects are masked from ‘package:base’:
    
        anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
        duplicated, eval, evalq, Filter, Find, get, intersect, is.unsorted,
        lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
        pmin.int, Position, rank, rbind, Reduce, rep.int, rownames, sapply,
        setdiff, sort, table, tapply, union, unique, unlist
    
    Loading required package: IRanges
    Loading required package: XVector
    Loading required package: Rcpp
    Loading required package: RcppArmadillo
    There are a bunch of "masked" objects... but no errors, so I assumed everything is okay. Now, I already have a HTSeq input table with 4 columns representing my conditions (2 wild type and 2 treated). Now, how the heck can I load this file to DESeq in order to analyze it? In the vignette, there is a section (I know...), but honestly I'm almost losing my hair by trying to make it work.

    I tried to run the first line :
    Code:
    directory <- system.file("extdata", package="HTSeqCountTable", mustWork=TRUE)
    and got an immediate error :
    Code:
    Error in system.file("extdata", package = "HTSeqCountTable", mustWork = TRUE) : 
      no file found
    Thank you guys in advance,

    TP

  • #2
    Is there actually a package called "HTSeqCountTable"?

    Just use make a table and then
    Code:
    dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,directory = directory,design= ~ condition)
    as is done in the vignette? The "system.file(...)" stuff is just to load the example data from the pasilla package, so don't do that part.

    Comment


    • #3
      Originally posted by ThePresident View Post
      I tried to run the first line :
      Code:
      directory <- system.file("extdata", package="HTSeqCountTable", mustWork=TRUE)
      and got an immediate error :
      Code:
      Error in system.file("extdata", package = "HTSeqCountTable", mustWork = TRUE) : 
        no file found
      The "first line" of what? Where have you copied that from, and what is it supposed to do?

      Comment


      • #4
        Hello,

        Thank you for your time guys.

        @dpryan: that's what was confusing me. I didn't see that the whole system.file was just to load the example. Thanks for clarifying that for me. Otherwise, I'm still swimming in partial confusion concerning directory and conditions options? What the heck is directory option and how do you exactly define it along with conditions? Here is the begining of my HTseq table:
        Code:
        > head (sampleTable)
          X     WT1    WT2  LYS1  LYS2
        1 0    3697   2413  2163  2772
        2 1    8603   5701  4875  6380
        3 2 1062195  38969 30022 32932
        4 3 5852834 112090 80004 88024
        5 4    4930    706   164   594
        6 5     737    677   486   721
        @Simon: I know how frustrating all this can be for you. However, bear in mind that there's a bunch of biologists that pass 99% of their working time on the bench and have very little experience with R and other language program. For us, all this feels like Chinese Don't get me wrong, I respect so much all what you guys do, but I would love to see a vignette that is somewhat simpler... Like, step 1: load you data into R with this code/function where parameter A is for this and parameter B for this. Step 2: and so on. Thanks God, you, Ryan and others from this forum are always willing to give a hand when something is going wrong, so we manage to get what we need at the end.

        Thank you again guys,
        TP

        Comment


        • #5
          Your sampleTable actually looks like a merged table of counts, though without any geneIDs, which will make life difficult later on. "directory" is just the name of the directory that the files are in. In the vignette, everything is in a directory that's specified by where ever the pasilla package contents are installed. You can probably get away without inputting a directory (just start R in the directory that contains the count files). "conditions" is a dataframe that describes what group(s) each of the samples belongs. Note that the ordering must be the same as the files are read (i.e., lexicographic order). I've seen a couple people screw things up by assuming otherwise.

          Comment


          • #6
            I understand that it is confusing. My question was not supposed to sound annoyed, but meant literally: Which document were you following?

            There are many tutorials on using Bioconductor tools out there on the web, in many different versions. So, if you want to ask a question like yours, you should write something like this: "I tried to follow the explanations of the DESeq2 vignette, as supplied with version x.x.. of the package [or: as found at this web address], and I copied the command "...", changing "..." for "...", and then got this error address: ..."

            Without these details, we have to do a lot of guessing on why you have typed this command.

            Making a vignette readable for an absolute beginner is hard. I actually think that we have done a very good job on this with the old DESeq vignette and with the new version of the DEXSeq vignette. The DESeq2 vignette still needs a bit of work, especially at the initial steps on data loading.

            However, if you read about a specific function in a vignette and want to know, which parameters this functions take and what the precise meaning of these parameters is, the vignette is not the place to look for this information. This is what the help pages for the individual functions are for, so be sure to read them, too!

            Comment


            • #7
              Originally posted by ThePresident View Post
              Otherwise, I'm still swimming in partial confusion concerning directory and conditions options? What the heck is directory option and how do you exactly define it along with conditions?
              You are looking at the wrong function. DESeqDataSetFromHTSeqCount is used to read the count data from several files that have been produced with htseq-count. That tool produces one file for each sample, with only two columns, namely gene ID and read count in this sample. DESeqDataSetFromHTSeqCount reads all these files and puts them together in one matrix of read counts. It wants a "sample table", with one row for each sample and columns for sample ID, file name of the htseq-count file, and experimental condition(s).

              You already have your matrix of read counts, so you should use DESeqDataSetFromMatrix to get started. Type "?DESeqDataSetFromMatrix" to see how it works.

              And as D P Ryan has already pointed out, the first column of your count table seems to contain some indices, and you don't have row names with the gene IDs. You need to fix this first. See "?read.table" or "?read.csv" to learn how to read in CSV files or text files with tabular data, or ask, but then tell us how you have the counts on your hard disk.

              Comment


              • #8
                Hey guys,

                Just a quick update: with your help, I've succeeded to run DESeq2 properly with DESeqDataSetFromMatrix. Thank you again!

                Comment


                • #9
                  Hi I am having trouble getting started with DESeq2. I am able to load in the file okay and don't receive any errors until I start the "DESeq" step. Quick background, my data input is a merged file of raw counts. I have 4 conditions with 3 reps each.

                  Here are my steps:
                  "
                  > countsTable<-read.delim("merged_raw_counts.txt",header=T)
                  > rownames(countsTable)<-countsTable$Name
                  > countsTable<-countsTable[,-1]
                  > head(countsTable)
                  T0R1 T0R2 T0R3 T0S1 T0S2 T0S3 T1R1 T1R2 T1R3 T1S1 T1S2 T1S3
                  gene01g00010 0 5 2 3 4 6 4 1 0 7 1 1
                  gene01g00020 1 0 12 0 6 2 3 3 1 2 0 0
                  gene01g00030 0 0 0 0 0 0 0 0 0 1 0 0
                  gene01g00040 0 0 0 0 0 0 0 0 0 0 0 0
                  gene01g00050 0 0 0 1 0 0 0 0 0 0 0 0
                  gene01g00060 13 99 77 109 76 82 82 140 115 157 72 118
                  >countData<-matrix(countsTable)
                  >colData<-data.frame(condition=factor(c("T0R","T0R","T0R","T0S","T0S","T0S","T1R","T1R","T1R","T1S","T1S","T1S")))
                  >dds<-DESeqDataSetFromMatrix(countData=countsTable,colData,formula(~condition))
                  Usage note: the following factors have 3 or more levels:

                  condition

                  For DESeq2 versions < 1.3, if you plan on extracting results for
                  these factors, we recommend using betaPrior=FALSE as an argument
                  when calling DESeq().
                  As currently implemented in version 1.2, the log2 fold changes can
                  vary if the base level is changed, when extracting results for a
                  factor with 3 or more levels. A solution will be implemented in
                  version 1.3 which allows for the use of a beta prior and symmetric
                  log2 fold change estimates regardless of the selection of base level.
                  >colData(dds)$condition<-factor(colData(dds)$condition,levels=c("T0R","T0S"))
                  > dds
                  class: DESeqDataSet
                  dim: 34903 12
                  exptData(0):
                  assays(1): counts
                  rownames(34903): gene01g00010 gene01g00020 ... gene00g99390 gene00g99400
                  rowData metadata column names(0):
                  colnames(12): 1 2 ... 11 12
                  colData names(1): condition
                  > dds<-DESeq(dds,betaPrior=F)
                  estimating size factors
                  estimating dispersions
                  gene-wise dispersion estimates

                  error: element-wise multiplication: incompatible matrix dimensions: 12x1 and 6x1

                  Error: element-wise multiplication: incompatible matrix dimensions: 12x1 and 6x1
                  "

                  I am not sure where I am going wrong. I see the warning about have more than 3 levels. Although I have these 4 different conditions I am not interested in doing a pairwise comparision on all. I would ideally like to compare T0R to T0S, T1R to T1S, T0R to T1R, and T0S to T1S.

                  Thanks in advance for insight on this.

                  Comment


                  • #10
                    Hi, I figured out the error, quite silly actually.
                    The step of
                    >colData(dds)$condition<-factor(colData(dds)$condition,levels=c("T0R","T0S"))

                    should list all four conditions, not just 2.

                    >colData(dds)$condition<-factor(colData(dds)$condition,levels=c("T0R","T0S","T1R","T1S"))

                    Hopefully that's the only problem.

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Strategies for Sequencing Challenging Samples
                      by seqadmin


                      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                      03-22-2024, 06:39 AM
                    • seqadmin
                      Techniques and Challenges in Conservation Genomics
                      by seqadmin



                      The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                      Avian Conservation
                      Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                      03-08-2024, 10:41 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 03-27-2024, 06:37 PM
                    0 responses
                    12 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 03-27-2024, 06:07 PM
                    0 responses
                    11 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 03-22-2024, 10:03 AM
                    0 responses
                    53 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 03-21-2024, 07:32 AM
                    0 responses
                    69 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X