Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Grigory
    Junior Member
    • Aug 2014
    • 7

    DESeq on 4 conditions, 3 replicates each - how?

    Hello,

    I am a bench developmental biologist with essentially 0 bioinformatics background, so please excuse my stupidity.

    I generated 12 HTSeq count files (4 experimental conditions - 1 control experiment and 3 gene knockdowns, 3 biological replicates for each condition, single end 50 bp reads, not strand specific) from my tophat alignments. I would like to use DESeq to analyze differential gene expression in my samples. My problems start with me not understanding the explanation of how I should create a CountDataSet from my 12 files, and the pasilla example on the DESeq web page is not helping. Can someone please help me with that?

    I would also be grateful for a link to a detailed step-by-step protocol for DESeq on HTSeq counts (not the one from the DESeq page - I have that already, and it is not "for dummies" enough for me).

    Thanks!

    Grigory
  • crazyhottommy
    Senior Member
    • Apr 2012
    • 187

    #2
    Have a look at my post http://crazyhottommy.blogspot.com/20...-sort-and.html



    Originally posted by Grigory View Post
    Hello,

    I am a bench developmental biologist with essentially 0 bioinformatics background, so please excuse my stupidity.

    I generated 12 HTSeq count files (4 experimental conditions - 1 control experiment and 3 gene knockdowns, 3 biological replicates for each condition, single end 50 bp reads, not strand specific) from my tophat alignments. I would like to use DESeq to analyze differential gene expression in my samples. My problems start with me not understanding the explanation of how I should create a CountDataSet from my 12 files, and the pasilla example on the DESeq web page is not helping. Can someone please help me with that?

    I would also be grateful for a link to a detailed step-by-step protocol for DESeq on HTSeq counts (not the one from the DESeq page - I have that already, and it is not "for dummies" enough for me).

    Thanks!

    Grigory

    Comment

    • Grigory
      Junior Member
      • Aug 2014
      • 7

      #3
      Originally posted by crazyhottommy View Post
      Hi,
      Thanks! There must have been some misunderstanding. I am already as far as the end of the page you referred to. I have counts for each gene in a table generated by htseq. What I did was the following:

      samtools view mapped_reads_sample1.bam | htseq-count -s no - transcriptome.gtf > mapped_reads_sample1_htseq_out
      So, I ended up with 12 such files. The question is what I should do now to be able to load this data into DESeq, normalize, merge biological replicates and compare expression in 3 experiments to control.

      Comment

      • kopi-o
        Senior Member
        • Feb 2008
        • 319

        #4
        Use the DESeqDataSetFromHTSeq function from DESeq2, or repeatedly use the "join" Unix/Linux command line tool, or use something from this thread

        Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

        Comment

        • Wallysb01
          Senior Member
          • Feb 2011
          • 286

          #5
          since htseq is always going to be in the same order, I just use paste (which can takes a space seperated list of your files) and awk. Paste will print the row names for each file, so that's what awk is taking out.

          So its:

          [CODE]
          paste htseq_outputs* | awk '{print $1,$2,$4,$6,$8}' > htseq_all.txt
          [\CODE]

          just mind the seperators, after that awk command it will be a single space.
          Last edited by Wallysb01; 08-21-2014, 06:26 AM.

          Comment

          • Grigory
            Junior Member
            • Aug 2014
            • 7

            #6
            Thanks Wallysb01! This was helpful. Got my table now. Gave me a bit of learning too since I did not understand in the beginning why I am printing only even columns after the first 2.
            Let's see if I can get any further...

            Comment

            • crazyhottommy
              Senior Member
              • Apr 2012
              • 187

              #7
              Originally posted by Grigory View Post
              Thanks Wallysb01! This was helpful. Got my table now. Gave me a bit of learning too since I did not understand in the beginning why I am printing only even columns after the first 2.
              Let's see if I can get any further...
              you should get rid of the last five lines:

              Just use this single command to merge the HTSeq count results:
              paste *txt | cut -f1,2,4,6,8,10,12,14,16,18,20,22,24,26 | head -n -5 > counts.txt
              Note: we assume the gene_id (column 1) are in the same sequence, otherwise use command like "join" . we paste the txt files side by side using paste. The gene_id column will be duplicated, just keep the first column, and cut out the counts column by "cut", then remove the last five lines by head -n -5 and redirect it to a new file. Now,you can feed DESeq this counts.txt directly.

              Comment

              • dpryan
                Devon Ryan
                • Jul 2011
                • 3478

                #8
                If you're using DESeq2, there's no reason to manually merge the files. You can just use a list of files and it'll merge everything (and remove the last 5 lines) for you. It'll also sanity check that things are in the same order and contain the same IDs.

                Comment

                • Grigory
                  Junior Member
                  • Aug 2014
                  • 7

                  #9
                  Originally posted by dpryan View Post
                  If you're using DESeq2, there's no reason to manually merge the files. You can just use a list of files and it'll merge everything (and remove the last 5 lines) for you. It'll also sanity check that things are in the same order and contain the same IDs.
                  Thank you! I have installed DESeq2, and after several blunders managed to make the table from HTseq data.
                  However, at this point I fell into the next pit hole:
                  I could define which parts of the data table I wanted to compare as:

                  > ddsHTSeq$condition <- factor(ddsHTSeq$condition, levels=c("control","morphant-1"))

                  I only included one of my 3 morphant conditions for comparison with control, as the comparison is pairwise, if I understand it correctly. This worked, and the output looked reasonable - R excluded counts from 2 remaining morphants:

                  > ddsHTSeq$condition
                  [1] control control control morphant-1 morphant-1 morphant-1 <NA> <NA> <NA> <NA> <NA> <NA>
                  Levels: control morphant-1

                  Here, the DESeq2 manual starts with differential expression analysis as follows:
                  dds <- DESeq(dds)
                  res <- results(dds)
                  resOrdered <- res[order(res$padj),]

                  For that I obviously need to define dds. The DESeq2 manual says:

                  Here we show the most basic steps for a differential expression analysis. These steps imply you have a SummarizedExperiment object se with a column condition in colData(se).

                  dds <- DESeqDataSet(se = se, design = ~ condition)
                  dds <- DESeq(dds)
                  res <- results(dds)
                  but here comes the error:
                  > dds <- DESeqDataSet(se = se, design = ~ condition)
                  Error in assays(se) : error in evaluating the argument 'x' during method selection for function 'assays': Error: object se not found.

                  Can't find my way around this one yet.

                  Grigory

                  Comment

                  • dpryan
                    Devon Ryan
                    • Jul 2011
                    • 3478

                    #10
                    Please show the code you used to get "ddsHTSeq". I'm guessing that that's already a DESeqDataSet object, so you'd just use it as "dds".

                    Comment

                    • Grigory
                      Junior Member
                      • Aug 2014
                      • 7

                      #11
                      Hi, thanks for you help! I basically followed the DESeq2 manual.

                      > sampleFiles <- grep("MO_", list.files(directory), value=TRUE)
                      > sampleCondition <-sub("(.*MO).*","\\1",sampleFiles)
                      > sampleTable<-data.frame(sampleName=sampleFiles,
                      + fileName=sampleFiles,
                      + condition=sampleCondition)
                      > ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                      + directory = directory,
                      + design = ~ condition)
                      > ddsHTSeq$condition <- factor(ddsHTSeq$condition, levels=c("control","morhant-1"))
                      > dds <- DESeq(dds)
                      Error in attr(object, "betaPrior") <- betaPrior : object 'dds' not found
                      I then tried to do the following based on this protocol (which is for DESeq, not DESeq2):

                      > dds <- DESeqDataSet(ddsHTSeq, design = ~ condition)
                      > dds <- estimateSizeFactors(dds)
                      > sizeFactors(dds)
                      > head(counts(dds))
                      > head(counts(dds,normalized=TRUE))
                      > dds <- estimateDispersions(dds)
                      gene-wise dispersion estimates
                      Error in t(hatmatrix %*% t(y)) : non-conformable arguments
                      Size factors and normalized counts I could see, but estimating dispersion did not work.

                      Thanks again!

                      G.
                      Last edited by Grigory; 08-22-2014, 11:12 AM.

                      Comment

                      • dpryan
                        Devon Ryan
                        • Jul 2011
                        • 3478

                        #12
                        It would seem much easier to just:
                        Code:
                        >sampleFiles <- grep("MO_", list.files(directory), value=TRUE)
                        > sampleCondition <-sub("(.*MO).*","\\1",sampleFiles)
                        > sampleTable<-data.frame(sampleName=sampleFiles,
                        + fileName=sampleFiles,
                        + condition=sampleCondition)
                        > IDX <- which(sampleCondition %in% c("control", "morphant-1"))
                        > sampleCondition <- sampleCondition[IDX,]
                        > ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                        + directory = directory,
                        + design = ~ condition)
                        > dds <- DESeq(ddsHTSeq)
                        You could also just subset ddsHTSeq.

                        Comment

                        • Grigory
                          Junior Member
                          • Aug 2014
                          • 7

                          #13
                          Originally posted by dpryan View Post
                          It would seem much easier to just:
                          Code:
                          >sampleFiles <- grep("MO_", list.files(directory), value=TRUE)
                          > sampleCondition <-sub("(.*MO).*","\\1",sampleFiles)
                          > sampleTable<-data.frame(sampleName=sampleFiles,
                          + fileName=sampleFiles,
                          + condition=sampleCondition)
                          > IDX <- which(sampleCondition %in% c("control", "morphant-1"))
                          > sampleCondition <- sampleCondition[IDX,]
                          > ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                          + directory = directory,
                          + design = ~ condition)
                          > dds <- DESeq(ddsHTSeq)
                          You could also just subset ddsHTSeq.
                          > sampleCondition <- sampleCondition[IDX,]
                          Error in sampleCondition[IDX, ] : incorrect number of dimensions

                          Comment

                          • dpryan
                            Devon Ryan
                            • Jul 2011
                            • 3478

                            #14
                            Oops, the line in question should be:
                            Code:
                            sampleTable <- sampleTable[IDX,]

                            Comment

                            • Grigory
                              Junior Member
                              • Aug 2014
                              • 7

                              #15
                              Originally posted by dpryan View Post
                              Oops, the line in question should be:
                              Code:
                              sampleTable <- sampleTable[IDX,]
                              Thank you! Worked.

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                Yesterday, 10:05 AM
                              • SEQadmin2
                                Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                by SEQadmin2


                                With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                Introduction

                                Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                05-22-2026, 06:42 AM
                              • SEQadmin2
                                Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                                by SEQadmin2

                                Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                                Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                                05-06-2026, 09:04 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Yesterday, 12:03 PM
                              0 responses
                              19 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, Yesterday, 11:40 AM
                              0 responses
                              14 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 05-28-2026, 11:40 AM
                              0 responses
                              29 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 05-26-2026, 10:12 AM
                              0 responses
                              31 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...