Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq on 4 conditions, 3 replicates each - how?

    Hello,

    I am a bench developmental biologist with essentially 0 bioinformatics background, so please excuse my stupidity.

    I generated 12 HTSeq count files (4 experimental conditions - 1 control experiment and 3 gene knockdowns, 3 biological replicates for each condition, single end 50 bp reads, not strand specific) from my tophat alignments. I would like to use DESeq to analyze differential gene expression in my samples. My problems start with me not understanding the explanation of how I should create a CountDataSet from my 12 files, and the pasilla example on the DESeq web page is not helping. Can someone please help me with that?

    I would also be grateful for a link to a detailed step-by-step protocol for DESeq on HTSeq counts (not the one from the DESeq page - I have that already, and it is not "for dummies" enough for me).

    Thanks!

    Grigory

  • #2
    Have a look at my post http://crazyhottommy.blogspot.com/20...-sort-and.html



    Originally posted by Grigory View Post
    Hello,

    I am a bench developmental biologist with essentially 0 bioinformatics background, so please excuse my stupidity.

    I generated 12 HTSeq count files (4 experimental conditions - 1 control experiment and 3 gene knockdowns, 3 biological replicates for each condition, single end 50 bp reads, not strand specific) from my tophat alignments. I would like to use DESeq to analyze differential gene expression in my samples. My problems start with me not understanding the explanation of how I should create a CountDataSet from my 12 files, and the pasilla example on the DESeq web page is not helping. Can someone please help me with that?

    I would also be grateful for a link to a detailed step-by-step protocol for DESeq on HTSeq counts (not the one from the DESeq page - I have that already, and it is not "for dummies" enough for me).

    Thanks!

    Grigory

    Comment


    • #3
      Originally posted by crazyhottommy View Post
      Hi,
      Thanks! There must have been some misunderstanding. I am already as far as the end of the page you referred to. I have counts for each gene in a table generated by htseq. What I did was the following:

      samtools view mapped_reads_sample1.bam | htseq-count -s no - transcriptome.gtf > mapped_reads_sample1_htseq_out
      So, I ended up with 12 such files. The question is what I should do now to be able to load this data into DESeq, normalize, merge biological replicates and compare expression in 3 experiments to control.

      Comment


      • #4
        Use the DESeqDataSetFromHTSeq function from DESeq2, or repeatedly use the "join" Unix/Linux command line tool, or use something from this thread

        Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

        Comment


        • #5
          since htseq is always going to be in the same order, I just use paste (which can takes a space seperated list of your files) and awk. Paste will print the row names for each file, so that's what awk is taking out.

          So its:

          [CODE]
          paste htseq_outputs* | awk '{print $1,$2,$4,$6,$8}' > htseq_all.txt
          [\CODE]

          just mind the seperators, after that awk command it will be a single space.
          Last edited by Wallysb01; 08-21-2014, 06:26 AM.

          Comment


          • #6
            Thanks Wallysb01! This was helpful. Got my table now. Gave me a bit of learning too since I did not understand in the beginning why I am printing only even columns after the first 2.
            Let's see if I can get any further...

            Comment


            • #7
              Originally posted by Grigory View Post
              Thanks Wallysb01! This was helpful. Got my table now. Gave me a bit of learning too since I did not understand in the beginning why I am printing only even columns after the first 2.
              Let's see if I can get any further...
              you should get rid of the last five lines:

              Just use this single command to merge the HTSeq count results:
              paste *txt | cut -f1,2,4,6,8,10,12,14,16,18,20,22,24,26 | head -n -5 > counts.txt
              Note: we assume the gene_id (column 1) are in the same sequence, otherwise use command like "join" . we paste the txt files side by side using paste. The gene_id column will be duplicated, just keep the first column, and cut out the counts column by "cut", then remove the last five lines by head -n -5 and redirect it to a new file. Now,you can feed DESeq this counts.txt directly.

              Comment


              • #8
                If you're using DESeq2, there's no reason to manually merge the files. You can just use a list of files and it'll merge everything (and remove the last 5 lines) for you. It'll also sanity check that things are in the same order and contain the same IDs.

                Comment


                • #9
                  Originally posted by dpryan View Post
                  If you're using DESeq2, there's no reason to manually merge the files. You can just use a list of files and it'll merge everything (and remove the last 5 lines) for you. It'll also sanity check that things are in the same order and contain the same IDs.
                  Thank you! I have installed DESeq2, and after several blunders managed to make the table from HTseq data.
                  However, at this point I fell into the next pit hole:
                  I could define which parts of the data table I wanted to compare as:

                  > ddsHTSeq$condition <- factor(ddsHTSeq$condition, levels=c("control","morphant-1"))

                  I only included one of my 3 morphant conditions for comparison with control, as the comparison is pairwise, if I understand it correctly. This worked, and the output looked reasonable - R excluded counts from 2 remaining morphants:

                  > ddsHTSeq$condition
                  [1] control control control morphant-1 morphant-1 morphant-1 <NA> <NA> <NA> <NA> <NA> <NA>
                  Levels: control morphant-1

                  Here, the DESeq2 manual starts with differential expression analysis as follows:
                  dds <- DESeq(dds)
                  res <- results(dds)
                  resOrdered <- res[order(res$padj),]

                  For that I obviously need to define dds. The DESeq2 manual says:

                  Here we show the most basic steps for a differential expression analysis. These steps imply you have a SummarizedExperiment object se with a column condition in colData(se).

                  dds <- DESeqDataSet(se = se, design = ~ condition)
                  dds <- DESeq(dds)
                  res <- results(dds)
                  but here comes the error:
                  > dds <- DESeqDataSet(se = se, design = ~ condition)
                  Error in assays(se) : error in evaluating the argument 'x' during method selection for function 'assays': Error: object se not found.

                  Can't find my way around this one yet.

                  Grigory

                  Comment


                  • #10
                    Please show the code you used to get "ddsHTSeq". I'm guessing that that's already a DESeqDataSet object, so you'd just use it as "dds".

                    Comment


                    • #11
                      Hi, thanks for you help! I basically followed the DESeq2 manual.

                      > sampleFiles <- grep("MO_", list.files(directory), value=TRUE)
                      > sampleCondition <-sub("(.*MO).*","\\1",sampleFiles)
                      > sampleTable<-data.frame(sampleName=sampleFiles,
                      + fileName=sampleFiles,
                      + condition=sampleCondition)
                      > ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                      + directory = directory,
                      + design = ~ condition)
                      > ddsHTSeq$condition <- factor(ddsHTSeq$condition, levels=c("control","morhant-1"))
                      > dds <- DESeq(dds)
                      Error in attr(object, "betaPrior") <- betaPrior : object 'dds' not found
                      I then tried to do the following based on this protocol (which is for DESeq, not DESeq2):

                      > dds <- DESeqDataSet(ddsHTSeq, design = ~ condition)
                      > dds <- estimateSizeFactors(dds)
                      > sizeFactors(dds)
                      > head(counts(dds))
                      > head(counts(dds,normalized=TRUE))
                      > dds <- estimateDispersions(dds)
                      gene-wise dispersion estimates
                      Error in t(hatmatrix %*% t(y)) : non-conformable arguments
                      Size factors and normalized counts I could see, but estimating dispersion did not work.

                      Thanks again!

                      G.
                      Last edited by Grigory; 08-22-2014, 11:12 AM.

                      Comment


                      • #12
                        It would seem much easier to just:
                        Code:
                        >sampleFiles <- grep("MO_", list.files(directory), value=TRUE)
                        > sampleCondition <-sub("(.*MO).*","\\1",sampleFiles)
                        > sampleTable<-data.frame(sampleName=sampleFiles,
                        + fileName=sampleFiles,
                        + condition=sampleCondition)
                        > IDX <- which(sampleCondition %in% c("control", "morphant-1"))
                        > sampleCondition <- sampleCondition[IDX,]
                        > ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                        + directory = directory,
                        + design = ~ condition)
                        > dds <- DESeq(ddsHTSeq)
                        You could also just subset ddsHTSeq.

                        Comment


                        • #13
                          Originally posted by dpryan View Post
                          It would seem much easier to just:
                          Code:
                          >sampleFiles <- grep("MO_", list.files(directory), value=TRUE)
                          > sampleCondition <-sub("(.*MO).*","\\1",sampleFiles)
                          > sampleTable<-data.frame(sampleName=sampleFiles,
                          + fileName=sampleFiles,
                          + condition=sampleCondition)
                          > IDX <- which(sampleCondition %in% c("control", "morphant-1"))
                          > sampleCondition <- sampleCondition[IDX,]
                          > ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                          + directory = directory,
                          + design = ~ condition)
                          > dds <- DESeq(ddsHTSeq)
                          You could also just subset ddsHTSeq.
                          > sampleCondition <- sampleCondition[IDX,]
                          Error in sampleCondition[IDX, ] : incorrect number of dimensions

                          Comment


                          • #14
                            Oops, the line in question should be:
                            Code:
                            sampleTable <- sampleTable[IDX,]

                            Comment


                            • #15
                              Originally posted by dpryan View Post
                              Oops, the line in question should be:
                              Code:
                              sampleTable <- sampleTable[IDX,]
                              Thank you! Worked.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              66 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X