  • DESeq-strange dispersion plot and using shorth

    Hi,
    I am a newbie to bioinformatics and am trying to analyse RNA-Seq data with DESeq. The dispersion plot for my data looks different from the typical plots. I have attached two sample plots. The issue is that there is a sharp upper boundary, and I don't know how to interpret it!

    I have tried the various options for sharingMode and fitType on my data. Sometimes the defaults work, but with a different sample list I have to try a local fit or a different sharing mode. I have four different conditions with 6-16 biological replicates each, so sample size is not a problem.


    Also, when I try to change the size factor estimation using the following command, it does not work:
    cds = estimateSizeFactors(cds,locfunc=shorth)

    Am I doing anything wrong here?
    Thanks for your help
    Last edited by tellsparck; 01-10-2013, 12:12 PM. Reason: Spelling mistake in title
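
Editorial note on the `locfunc=shorth` call above: the post does not say what the error was. One plausible cause, offered only as an assumption, is that `shorth` was not found, since it is defined in the genefilter package and needs `library("genefilter")` before the call. The base-R sketch below shows what a location function does inside median-of-ratios size factor estimation. `shorth_simple` and `size_factors` are hypothetical helper names written for illustration; they are simplified stand-ins, not the DESeq or genefilter code.

```r
# Simplified shorth: mean of the shortest interval covering about half the data.
# Stand-in for genefilter::shorth, which treats ties and edge cases differently.
shorth_simple <- function(x) {
  x <- sort(x)
  n <- length(x)
  h <- floor(n / 2) + 1                 # number of points in the "half" window
  widths <- x[h:n] - x[1:(n - h + 1)]   # width of every window of h points
  i <- which.min(widths)                # start of the narrowest window
  mean(x[i:(i + h - 1)])
}

# Median-of-ratios size factors with a pluggable location function.
size_factors <- function(counts, locfunc = median) {
  log_geo_means <- rowMeans(log(counts))  # -Inf for genes with any zero count
  keep <- is.finite(log_geo_means)
  apply(counts, 2, function(cnts) {
    exp(locfunc((log(cnts) - log_geo_means)[keep]))
  })
}

# Toy data: sample B is sequenced exactly twice as deeply as sample A.
set.seed(1)
base <- rpois(200, lambda = 100) + 1
counts <- cbind(A = base, B = 2 * base)

sf_median <- size_factors(counts)
sf_shorth <- size_factors(counts, locfunc = shorth_simple)
print(sf_median)
print(sf_shorth)
```

On clean data like this both location functions recover the same depth ratio. They can differ when many genes have low counts, because the log ratios then take only a few discrete values.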

  • #2
    I have run into the same problem as you and am looking forward to the answers.



    • #3
      Hi - you are getting huge dispersions, of the order of 10, indicating that the counts between your different "replicate" samples are very, very different. Have you tried looking at pairwise scatterplots of the data? I.e. something like (replace pasilla with your own data):


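      library("DESeq")    # provides counts() for CountDataSet objects
      library("pasilla")  # example data set
      data("pasillaGenes")
      # log2-transform with a pseudocount so that zero counts stay finite
      trsf = function(x, c=1) log2(x+c)
      # scatterplot matrix of all sample pairs; replicates should lie near the diagonal
      pairs(trsf(counts(pasillaGenes)), pch=".")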


      Best wishes
      Wolfgang
      Wolfgang Huber
      EMBL



      • #4
        Thanks! The high dispersions are somewhat expected because the data come from single-cell RNA that underwent amplification. So there is inherent cell-to-cell variability plus technical variability from the amplification. The question now is how badly this will affect the statistical analysis that follows. Do you think using per-gene estimates (or any other deviation from the defaults) may help? Have you tried to modify DESeq for this type of data?



        • #5
          First: If you ask for advice on this forum, please always mention all relevant facts. Asking a question like yours without mentioning that you are not talking about standard RNA-Seq but about something unusual and very experimental, namely single-cell RNA-Seq, just wastes everybody's time, as you will only get wrong advice.

          Now: The fit (red line) is indeed not very good, and we have some ways to improve the fit in situations such as yours. This won't help much, though, because the raw estimates (black dots) show that nearly all of your genes have dispersions above one and hence vary by a factor of two or more between cells of the same cell type. Unless the differences between different cell types are really drastic (at least, say, ten-fold), you cannot see them in this noise. This is not a problem of the statistical analysis, but one of the experimental protocol.
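
Editorial note: to make the link between dispersion and replicate noise concrete, here is a small base-R sketch. It assumes the usual negative binomial variance model, variance = mu + alpha * mu^2, where alpha is the dispersion. `nb_cv` is a hypothetical helper name chosen for this example.

```r
# Coefficient of variation (SD / mean) under a negative binomial model
# with variance = mu + alpha * mu^2, where alpha is the dispersion.
nb_cv <- function(mu, alpha) sqrt(mu + alpha * mu^2) / mu

mu <- 1000  # a well-expressed gene, so the Poisson term mu is negligible
for (alpha in c(0.01, 0.1, 1, 10)) {
  cat(sprintf("dispersion %5.2f -> CV %.2f\n", alpha, nb_cv(mu, alpha)))
}
```

For well-expressed genes the CV is close to sqrt(alpha). A dispersion of one therefore means the standard deviation between replicates is about as large as the mean itself, and a dispersion of ten means it is roughly three times the mean.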



          • #6
            Sorry for not mentioning the nature of the data. Yes, it is noisy, but the differences between groups are also drastic: many genes are close to zero in one group and in the thousands in another. But of course there are also some with less drastic differences. Given this, how can I extract the maximum information from it? I have tried using only samples that look similar in the PCA and have a similar Q3, and so on...
            Can you suggest any modifications in DESeq that can improve the analysis?
            Many thanks



            • #7
              "many genes are close to zero in one group and thousands in another" -- yes, this is a drastic difference, but have a look at your replicates: I guess you will see equally drastic changes between two cells of the same type. This is a quite common problem in single-cell RNA-Seq, and you are not the first one to find this out the hard way, sorry.

              As you have many samples, you could try to switch to 'sharingMode="gene-est-only"'. This might be a little bit anticonservative, and if even this does not give you anything, there might simply be nothing in your data.



              • #8
                Thanks Simon. Yes, using 'gene-est-only' mode works and outputs a good-sized list of differentially expressed genes, among them internal control genes which we know should be differentially expressed. (I can get this even with default settings in some comparisons.) Two of my groups are from very closely related cells, and it is here that I have to change the sharing mode.
                Do you think I should fix the fit as you mentioned before? If so, how can I do this?
                Thanks again!



                • #9
                  No, the whole point of "gene-est-only" is that it instructs DESeq to ignore the fit, so it doesn't matter any more if it's bad.
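
Editorial note: the sketch below illustrates the point above in base R. It shows how the three `sharingMode` options combine a per-gene dispersion estimate with the fitted value. `share_dispersion` is a hypothetical function written for this illustration and is not DESeq's internal code. The dispersion values are made up.

```r
# Combine per-gene dispersion estimates with fitted values,
# mirroring the three documented sharingMode options.
share_dispersion <- function(per_gene, fitted,
                             mode = c("maximum", "fit-only", "gene-est-only")) {
  mode <- match.arg(mode)
  switch(mode,
         "maximum"       = pmax(per_gene, fitted),  # conservative default
         "fit-only"      = fitted,                  # trust the fitted curve
         "gene-est-only" = per_gene)                # ignore the fit entirely
}

# Made-up example values for three genes.
per_gene <- c(geneA = 0.05, geneB = 2.0, geneC = 0.8)
fitted   <- c(geneA = 0.50, geneB = 0.5, geneC = 0.5)

for (m in c("maximum", "fit-only", "gene-est-only")) {
  cat(m, ":", share_dispersion(per_gene, fitted, m), "\n")
}
```

With "gene-est-only" the fitted values never enter the result, which is why the quality of the fit no longer matters in that mode. The price is that each gene's dispersion rests on its own replicates alone, so the mode is only sensible with many replicates.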



                  • #10
                    Thanks Simon. But I can still benefit from improving the fit in the comparisons where I do not use gene-est-only. Do you have code for this that you can share with me?



                    • #11
                      Hello community!

                      I have 3 conditions with 5 replicates each, 15 samples in total. So my DOF is 15 samples - 3 conditions = 12. Therefore I am using gene-est-only for estimating my dispersions.

                      cds <- estimateDispersions(cds, method= "per-condition", sharingMode="gene-est-only" )

                      Can anyone point me to where it is discussed to use gene-est when enough DOF have been reached? I saw in a post that Simon mentions once you get to 10 to 15, you can use gene-est, but I'm doing my proposal next week, and I want to be able to point to something more formal.

                      Thanks very much!!
