Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Going to cry, help me DESeq!

    Hey guys,

    I just wanted to check to make sure there's nothing else I can do before I give up.

    I'm using Tophat to HTSeq to DESeq workflow. I have three conditions, a WT, a knockout, and a control.

    After using HTSeq, I copied all my data into one file. I then followed the DESeq vignette. I've attached the plot of my dispersion, my MA plot, and my histogram plot. As you can see, I did not get a single gene that was differentially expressed. In fact, every one of them had a p-adjust value of 1.

    I then used the following steps for filtering.

    rs<- rowSums (counts(cds))
    use <- (rs > quantile(rs, 0.4))
    table (use)
    FALSE TRUE
    11784 17454

    cdsFilt <- cds[use, ]

    I then re-ran "estimateSizeFactors" and "estimateDispersions". I still did not get a single gene that was differentially expressed. Once again, all of them had a p-adjust value of 1.

    I've also attached those plots. Is there anything else I can do? Is there a way to determine if one of my replicates is bad, and that's throwing everything off (I did this in triplicate, with two of my replicates having 50M reads, with the other having 100 M)?

    Thanks for any and all advice! I'm pretty close just to giving up, which would be a wasted year, but at least I can just move on or something.
    Attached Files

  • #2
    When you say replicates do you mean Biological or Technical?

    Comment


    • #3
      Biological

      Comment


      • #4
        Better post your complete R code. Your plots are rather meaningless without knowing how you made them.

        Comment


        • #5
          Hi Simon,

          Thanks for taking a look.

          CountTable<- read.csv("DESEQ_CombinedGenes.csv", header=TRUE, row.names=1)

          Design <- data.frame(row.names = colnames(CountTable),
          condition= c("LuxS","LuxS","LuxS","RPMI","RPMI","RPMI","WT","WT","WT"),
          libType = c("paired-end","paired-end","paired-end","paired-end","paired-end","paired-end","paired-end","paired-end","paired-end"))

          conds <- factor(c("LuxS","LuxS","LuxS","RPMI","RPMI","RPMI","WT","WT","WT"))

          cds <- newCountDataSet(CountTable, conds)

          cds <- estimateSizeFactors(cds)

          cds <- estimateDispersions(cds)

          plotDispEsts <- function(cds)
          + {plot(rowMeans(counts(cds, normalized=TRUE)),
          + fitInfo(cds)$perGeneDispEsts,
          + pch = '.', log ="xy")
          + xg <- 10^seq(-.5,5,length.out=300)
          + lines(xg, fitInfo(cds)$dispFun(xg), col="red")}

          plotDispEsts(cds)

          res <- nbinomTest (cds, "LuxS", "WT")

          plotDE <- function (res)
          plot( res$baseMean, res$log2FoldChange, log="x", pch=20, cex=.3, col = ifelse(res$padj<.1, "red", "black"))

          plotDE(res)

          WITH INDEPENDENT FILTERING

          rs <- rowSums (counts(cds))
          use <- (rs > quantile (rs, 0.4))

          table(use)
          use
          FALSE TRUE
          11784 17454

          cdsFilt <- cdsFull[use, ]

          cdsFilt <- cdsFull[use, ]
          cdsFilt <- estimateSizeFactors(cdsFilt)
          cdsFilt <- estimateDispersions(cdsFilt)

          res <- nbinomTest (cdsFilt, "LuxS", "WT")

          plotDE(res)

          I can see that with filtering, I get less points on my graph, but still, all p-adj equal 1.

          Comment


          • #6
            Bill,

            did you check your data for batch effects, i.e. confounding variables besides the LuxS, RPMI, WT? You could e.g. use the arrayQualityMetrics package for that, as described in the DESeq vignette (the latest version, in the devel branch).

            If such problems were to exist, possible remediation strategies include dropping offending arrays (if they are few) or methods such as in the 'sva' package in Bioconductor.

            Best wishes
            Wolfgang
            Wolfgang Huber
            EMBL

            Comment


            • #7
              Thanks very much for the advice Wolfgang. I really appreciate how active you and Simon are on the forum.

              Essentially, my data kind of sucks.

              Its especially frustrating because I thought it was going to be good! My Phred Scores were all over 33, and the density of the genes (attached) were very similar. But nope, turns out, my data sucks.

              Now of course, I need to decide why, if its an experimental issue, biological variation, or something else. My PI is okay with doing more sequencing, as he said, "in for a penny, in for a pound". But neither he nor I want to waste time and money.

              So just broad bulletpoints of my experiment. I'm using cultured cancerous human epithelial cells, and I'm not using antibiotics. My first batch of sequencing I ran WT1, LuxS1, and RPMI1 (control) together. My second batch I ran WT2&3, LuxS2&3, and RPMI2&3 together.

              Based on Clustering, I should maybe throw away my WT1 and RPMI3, and go with that, but in that set-up, my WT is closer linked to my control RPMI, than my knockout, LuxS. But the PCA seems to provide more information, and shows a huge difference between WT1 and WT2&3 and very little differences between LuxS1,2, and 3. And it also shows I have no trustworth control, as all three samples of RPMI, my control, are different.

              What is most confusing is the control. These are untreated epithelial cells, but all three samples are completely different from each other. Is this nuts? Is this why people do 8 biological replicates? Any advice or suggestions?

              I re-did this by discarding WT1 and RPMI3, and I got tons of different genes. When can one justify throwing away sample sets?
              Attached Files
              Last edited by billstevens; 09-09-2012, 04:41 PM.

              Comment


              • #8
                What did you use to align your reads ? to pre-process your reads ? and to extract read count from your alignment ?

                Comment


                • #9
                  I feel your pain - I have had similar cases where the sequencing results look great but the data turned out to be junk, presumably due to bad sample prep in the first place. It's important to realize that a good sequencing run says nothing about how the results will be in the end - conversely, really bad sequencing runs (high PhiX error rates etc) can sometimes yield very nice data.

                  I think you will just have to sequence more, as the controls don't look reliable.

                  Comment


                  • #10
                    I used Tophat>HTSeq>DESeq.

                    Thanks for the commiseration in pain kopi-o.

                    Comment


                    • #11
                      Hi guys,

                      I was thinking, perhaps I had mycobacterium contamination. I was thinking I might check for this by taking my reads and aligning it against a mycobacteria genome. Does this seem feasible? The thing is I did get 90% alignment to hg19 in almost all of my samples, so I'm not sure I'm going to get anything that will make sense.

                      Comment


                      • #12
                        billstevens....do you think maybe somewhere along the line, like during sample prep, that maybe you mixed up your WT1 and RPMI3 sample? This would almost seems to be suggested based on your clustering. Although WT1 seems to cluster with the LUX samples...so that maybe unlikely as well.

                        1) Try redoing DESeq with WT1 treated as an RPMI replicate and RMPI3 as a WT replicate. What happens if you do that?

                        2) Genetically, is WT and RPMI different in some way? Are there known genetic markers in either one that would allow you to differentiate them? If that is the case, then look in your aligned reads in WT and RPMI to see if you can identify those. Even better, if you still have any RNA from those samples, if you could genotype by PCR, and check if maybe a mix up did occur.

                        That is the hunch I get from your clustering results and if its the case, maybe your data or at least some of the samples can be rescued.
                        Last edited by chadn737; 09-18-2012, 08:33 AM.

                        Comment


                        • #13
                          Wow, that's a very interesting idea I hadn't considered. The thing is, like you said WT1 clusters in the wrong area and I harvested these replicates by number together (all the 1s, 2s and 3s). Since LuxS3 and WT3 are in the right spots, switching RPMI3 doesn't work. WT1 and RPMI1 are located right next to each other so switching them is the same.

                          Genetically, they are the same.

                          Comment


                          • #14
                            DESeq questions

                            Hey guys,

                            So I just got my new runs. They all look great. I can clearly see the difference between biological replicates and condition. Unfortunately, they don't group ANYWHERE near where my other conditions grouped. I am almost certain this is because of where I sent my stuff for sequencing. The other runs make no sense whatsoever, and I sent it to one lab, and I sent this new stuff to a more established lab.

                            So clearly, I am getting absolutely no differential expression when I run all of them together. When I run just the new stuff, there isn't a lot of replicates so I get very little genes that are differentially expressed, plus these conditions are clearly very similar.

                            So my professor would like me to run the new dataset separately from the old one, and see the genes that match. My data feels unsalvageable, but he's forcing me to keep going. Does anyone have any ideas?
                            Attached Files

                            Comment

                            Latest Articles

                            Collapse

                            • seqadmin
                              Current Approaches to Protein Sequencing
                              by seqadmin


                              Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                              04-04-2024, 04:25 PM
                            • seqadmin
                              Strategies for Sequencing Challenging Samples
                              by seqadmin


                              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                              03-22-2024, 06:39 AM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by seqadmin, 04-11-2024, 12:08 PM
                            0 responses
                            27 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 04-10-2024, 10:19 PM
                            0 responses
                            30 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 04-10-2024, 09:21 AM
                            0 responses
                            26 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 04-04-2024, 09:00 AM
                            0 responses
                            52 views
                            0 likes
                            Last Post seqadmin  
                            Working...
                            X