  • Strange Dispersion and Density Plots in cummeRbund

    Usually by the time I get to cummeRbund (having run TopHat, cufflinks, cuffmerge and cuffdiff) I see something in the dispersionPlot that looks at least vaguely diagonal, and in the csDensity plot that looks at least semi-normal.

    I've just done a study comparing cells exposed to media only and to media + a chemical compound. The idea is to look for genes that are differentially expressed when exposed to the chemical compound.

    There were 3 replicates for each (control and treatment).
    I've run TopHat with the gtf file for hg19 (these are human cells), then cufflinks, cuffmerge, cuffdiff (in mostly the standard ways) and got plots that look like nothing I've seen before.

    I re-ran cuffdiff with upper-quartile normalization, but honestly if these plots show FPKM I don't think that would have changed anything.

    My csScatter is pretty well centered on the diagonal, so I think normalization worked well. csBoxplots, with and without replicates, also look like what I usually see, though the IQRs are a little longer than usual.

    So what might be going on here, and should I be worried about my results?
    What's causing that "tail" in the dispersion plot?

    Any advice or insights would be greatly appreciated!

    Hopefully these attached images work:



  • #2
    An Update:

    I re-ran the analysis without using the gtf file for hg19 and got a dispersion plot that looks more like what I would usually see. My density plot is now more clearly bimodal as well, which isn't the approximately normal I like to see, but at least it's a more "regular" shape. (See below...)

    The interesting (reassuring?) thing is: Results are identical! Same exact genes found to be differentially expressed in each.

    I'm still wondering why this would happen - especially in light of the new information. Obviously, for publication reasons, I'd rather use the analysis with the gtf file included - but I still don't like the look of those plots, and would love to be able to explain why they look the way they do.







    Anyone?



    • #3
      Hi DRS,
      do you think it is normal for your density plot to have negative log10(fpkm) values?



      • #4
        Not at all - I noticed that later. The dispersion as well. I still have no idea what's going on with this data, either. If anyone has any clues why this might be happening, I'm still interested in some insights.



        • #5
          Originally posted by cookiemic:
          Hi DRS,
          do you think it is normal for your density plot to have negative log10(fpkm) values?
          Yes, it is certainly "normal", or at least understandable, to see negative log10(fpkm) values. It simply means that the FPKM was < 1.0. Imagine an mRNA that is 1,500 bp (1.5 kb) long with 20 fragments mapped to it, out of a total of 100 million mapped reads.

          FPKM = 20 ÷ 1.5 ÷ 100 = 0.1333
          log10(0.1333) = -0.875
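
If it helps, the same back-of-the-envelope arithmetic can be sketched in a few lines of Python. The function name `fpkm` and its arguments are made up for illustration. Cufflinks itself estimates abundances with a more involved statistical model, so this only shows why a low-coverage transcript ends up with a negative log10 value, not how the tool computes it.

```python
import math

def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Naive FPKM: fragments per kilobase of transcript per million mapped fragments.

    Hypothetical helper for illustration only; not the Cufflinks estimator.
    """
    length_kb = transcript_length_bp / 1_000
    mapped_millions = total_mapped_fragments / 1_000_000
    return fragments / length_kb / mapped_millions

# The example from the post: 20 fragments on a 1,500 bp mRNA, 100 million mapped reads.
value = fpkm(20, 1_500, 100_000_000)
log_value = math.log10(value)

print(round(value, 4))      # 0.1333
print(round(log_value, 3))  # -0.875
```

Any transcript with a naive FPKM below 1.0 lands left of zero on a log10 axis, so a density plot that extends into negative values is expected whenever there are lowly expressed genes.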



          • #6
            Thank you - at least I can feel better about that. Still don't understand the "tail" I'm seeing on the one plot, but again, when I re-run the analysis with different parameters, I don't see that and get the exact same genes listed as DE.
            *shrug*

