  • DESeq with incomplete replicates

    Hi Simon and others,

    I have an unusual dataset to work with that does not derive from RNAseq, but produces comparable count data for two conditions for which I would like to identify the significantly changing genes.

    I have worked with similar datasets using DESeq with good results. However, for the current dataset I do not have complete biological replicates: because of the way the experiments were carried out and the associated cost, replicates could only be obtained for a subset of genes (~1000 out of 8000). I have split the dataset into two CountDataSets (rep.cds and nonrep.cds) and used DESeq on the replicated genes as usual:

    rep.cds<-estimateSizeFactors(rep.cds)

    rep.cds<-estimateDispersions(rep.cds)

    rep.res<-nbinomTest(rep.cds, "cond1", "cond2")


    I then wanted to use the model fitted to the replicate data to estimate dispersions in the non-replicated dataset. I used the following syntax, which includes a bit of trial-and-error fiddling to circumvent error messages:

    nonrep.cds<-estimateSizeFactors(nonrep.cds)

    nonrep.cds<-estimateDispersions(nonrep.cds, method="blind", sharingMode="fit-only", fitType="local")

    fData(nonrep.cds)<-as.data.frame(fitInfo(rep.cds)$dispFun(rowMeans(counts(rep.cds, normalized=T))))

    fvarLabels(nonrep.cds)<-"disp_blind"

    nonrep.res<-nbinomTest(nonrep.cds, "cond1", "cond2")

    all.res<-rbind(rep.res, nonrep.res)

    I get biologically plausible results, which I am happy with - certainly much happier than ignoring the replicates and just using the method="blind" approach for non-replicated data. What I would like to know is:
    1. Is my approach sane?
    2. If it is, am I going about it in the right way?

    I'd be grateful for any comments you could offer.
    Cheers,
    Roy.

  • #2
    Your approach should be fine (even though without knowing more about the technique you used -- how come you have replicates for some but not all genes? -- it is hard to say for sure.)


    This line is odd:

    Originally posted by Roy
    fData(nonrep.cds)<-as.data.frame(fitInfo(rep.cds)$dispFun(rowMeans(counts(rep.cds, normalized=T))))
    You should apply the dispFun from rep.cds on the rowMeans of the counts from nonrep.cds. Your version should give an error as the number of genes is different between rep.cds and nonrep.cds.

    Simon
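
The length mismatch described above can be illustrated without DESeq. This is only a sketch under stated assumptions: `dispFun` is a hypothetical stand-in for `fitInfo(rep.cds)$dispFun` with made-up coefficients, and the count matrices are simulated with the approximate gene numbers from the original post (about 1000 replicated genes, the remaining 7000 without replicates).

```r
# Hypothetical stand-in for fitInfo(rep.cds)$dispFun: maps a mean normalized
# count to a fitted dispersion. The coefficients are invented for illustration.
dispFun <- function(mu) 0.05 + 2 / mu

set.seed(1)
# Simulated normalized counts (assumption): 1000 replicated genes with
# 2 conditions x 2 replicates, and 7000 non-replicated genes with 2 samples.
rep.counts    <- matrix(rnbinom(1000 * 4, mu = 200, size = 5), nrow = 1000)
nonrep.counts <- matrix(rnbinom(7000 * 2, mu = 200, size = 5), nrow = 7000)

# Evaluating the fit at the means of the replicated genes gives one value
# per replicated gene, which cannot fill a per-gene column of nonrep.cds.
wrong <- dispFun(rowMeans(rep.counts))

# Evaluating the same fit at the means of the non-replicated genes gives
# one dispersion per gene in the non-replicated set.
right <- dispFun(rowMeans(nonrep.counts))

length(wrong) == nrow(nonrep.counts)  # FALSE
length(right) == nrow(nonrep.counts)  # TRUE
```

The point is that the fitted mean-dispersion function comes from the replicated data, while the means it is evaluated at must come from the genes that will receive the dispersion values.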

    • #3
      Hi Simon,

      Thanks for the quick response, much appreciated.

      Originally posted by Simon Anders
      Your approach should be fine (even though without knowing more about the technique you used -- how come you have replicates for some but not all genes? -- it is hard to say for sure.)
      I'm actually analysing data from transposon sequencing (called Tn-seq or TraDIS in the literature). The counts for each "gene" actually correspond to different bacterial mutants, and I'm using DESeq to assess their relative abundance before and after a selective screen as a measure of fitness - this is analogous to looking for differential expression of transcripts between 2 conditions. Some of the experiments are limited in the number of mutants it is possible to screen at once (since the total number of bacterial cells is restricted, and you need a reasonable number of each mutant to avoid stochastic effects), so we divided the mutants into subgroups and screened each separately, before combining the extracted DNA for sequencing. As the screens are expensive, it was not possible to perform replicates for the full set of mutants, only a subset.

      Originally posted by Simon Anders
      You should apply the dispFun from rep.cds on the rowMeans of the counts from nonrep.cds. Your version should give an error as the number of genes is different between rep.cds and nonrep.cds.
      Sorry, that was a typo, yes, it should be the rowMeans from nonrep.cds.

      Another concern - since I am analysing the replicated and non-replicated rows separately, the p-value adjustments do not take into account the total number of tests. Should I re-run the p.adjust method on the p-values in the combined all.res table?

      Cheers,
      Roy.

      • #4
        Originally posted by Roy
        Another concern - since I am analysing the replicated and non-replicated rows separately, the p-value adjustments do not take into account the total number of tests. Should I re-run the p.adjust method on the p-values in the combined all.res table?
        Might be better. Shouldn't make too much of a difference, though.
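
Re-adjusting over the combined table could look roughly like the sketch below. It uses only base R. The two data frames are simulated stand-ins for the `nbinomTest` results, and the sketch assumes that the result tables carry raw p-values in a `pval` column and Benjamini-Hochberg adjusted values in `padj`. The `padj.separate` column is a hypothetical name added here only to keep the per-table values for comparison.

```r
set.seed(42)
# Simulated stand-ins for rep.res and nonrep.res (ids and p-values are made up).
rep.res    <- data.frame(id = paste0("rep", 1:1000),    pval = runif(1000))
nonrep.res <- data.frame(id = paste0("nonrep", 1:7000), pval = runif(7000))

# Per-table adjustment: each table only accounts for its own number of tests.
rep.res$padj    <- p.adjust(rep.res$pval,    method = "BH")
nonrep.res$padj <- p.adjust(nonrep.res$pval, method = "BH")

# Combine, keep the per-table values for reference, then re-adjust the raw
# p-values over all tests at once.
all.res <- rbind(rep.res, nonrep.res)
all.res$padj.separate <- all.res$padj
all.res$padj <- p.adjust(all.res$pval, method = "BH")
```

The key detail is that `p.adjust` is run on the raw `pval` column of the combined table, not on the already adjusted `padj` values.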
