  • Finding overlap in 3 RNA-Seq analyses of the same dataset

    Hi all!
    I am new to RNA-seq analysis; I just started a few months ago. I have conducted an intergenerational mouse study to explore the effect of the father's diet on offspring gene expression. Due to the nature of this experiment, I wouldn't expect a ton of gene expression differences, but I am hoping for some, especially since I observed some phenotypic differences.
    I analyzed my data with three pipelines: Cufflinks (cuffdiff), RSEM-DESeq2, and HTSeq-count-DESeq2. I was advised to run these three analyses as a way to validate my findings. Our plan is to look at the overlap between the results of all three analyses, and then follow up on those genes with IPA to identify significantly enriched biological pathways. I have 3 diet groups being compared, and I am analyzing males and females separately as well as together, so I have a total of 18 different analyses.

    The first issue is that cuffdiff gave more than 100 DE genes (q-value < 0.05), but the RSEM and HTSeq-count analyses gave far fewer, between 0 and 80 DE genes (adjusted p-value < 0.05). Is it normal for the results to vary that much between methods? I was under the impression that cuffdiff is more conservative than the other methods, but that is not the case here.

    What we thought we could do is adjust the cutoffs of the RSEM and HTSeq-count results so that they give about the same number of significant genes as cuffdiff, and then look at the overlap. Is this a good approach? The problem is that I'm not sure which cutoff to choose: we have 6 analyses just for RSEM, for example, and a cutoff chosen based on one analysis results in a disproportionate increase in DE genes in another. Would it be better to just choose a cutoff that people commonly use, like adjusted p-value < 0.1?

    Sorry for all these questions! Basically I am trying to make sense of 3 different types of analyses, and I'm wondering whether it is even a good idea to use 3 analyses. Any advice would be much appreciated! Thank you so much.
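    A minimal sketch of the "look at the overlap" step, assuming each pipeline's output has already been reduced to a mapping of gene ID to adjusted p-value (q_value for cuffdiff, padj for DESeq2). The function names, gene IDs, and values are hypothetical placeholders, not part of any of the tools mentioned. The one design choice that matters is that a single cutoff is applied to every method before the lists are intersected.

```python
# Minimal sketch: intersect DE gene lists from several pipelines at ONE shared cutoff.
# Assumption: each pipeline's results were already loaded into a dict mapping
# gene ID -> adjusted p-value; None stands in for an NA (e.g. a DESeq2 padj of NA).


def significant_genes(adjusted, cutoff):
    """Return the set of genes whose adjusted p-value is strictly below the cutoff."""
    return {gene for gene, padj in adjusted.items() if padj is not None and padj < cutoff}


def overlap_across_methods(results_by_method, cutoff=0.05):
    """Apply the same cutoff to every method, then intersect the resulting gene sets."""
    sig = {name: significant_genes(res, cutoff) for name, res in results_by_method.items()}
    common = set.intersection(*sig.values()) if sig else set()
    return sig, common


if __name__ == "__main__":
    # Hypothetical values for illustration only, not real results.
    results = {
        "cuffdiff": {"geneA": 0.01, "geneB": 0.03, "geneC": 0.20, "geneD": 0.04},
        "rsem_deseq2": {"geneA": 0.02, "geneB": 0.08, "geneC": 0.01, "geneD": None},
        "htseq_deseq2": {"geneA": 0.03, "geneB": 0.04, "geneC": 0.30, "geneD": 0.06},
    }
    sig, common = overlap_across_methods(results, cutoff=0.05)
    for name, genes in sig.items():
        print(name, sorted(genes))
    print("in all three:", sorted(common))  # in all three: ['geneA']
```

    With these placeholder values each method calls two or three genes, but only geneA survives in all three lists; short lists shrink quickly when intersected.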
    Best,
    Julia

  • #2
    Reading this paper may help: http://genomebiology.com/2013/14/9/R95



    • #3
      First of all, it sounds like you do not have very many differentially expressed genes no matter how you slice it. Given that, no, I'm not surprised the different analyses were that different. In a study where any one analysis gives only a hundred or so DEGs at most, it would surprise me if the different analyses did not give quite different gene lists.
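      One way to put a number on how different the gene lists are (a suggestion, not something from the thread) is the Jaccard index between each pair of lists: the size of their intersection divided by the size of their union. A minimal sketch; the method names and gene sets are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union.

    Defined here as 1.0 when both sets are empty (a convention, chosen for convenience).
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def pairwise_jaccard(gene_sets):
    """Return {(method1, method2): Jaccard index} for every pair of methods."""
    return {
        (m1, m2): jaccard(gene_sets[m1], gene_sets[m2])
        for m1, m2 in combinations(sorted(gene_sets), 2)
    }


if __name__ == "__main__":
    # Hypothetical significant-gene sets, for illustration only.
    sets = {
        "cuffdiff": {"geneA", "geneB", "geneD"},
        "rsem_deseq2": {"geneA", "geneC"},
        "htseq_deseq2": {"geneA", "geneB"},
    }
    for pair, score in pairwise_jaccard(sets).items():
        print(pair, round(score, 2))
```

      With lists this short, a single gene moving across the cutoff shifts the index a lot: for two 3-gene lists, sharing 2 genes gives 2/4 = 0.5, while sharing 1 gives 1/5 = 0.2.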

      I would NOT suggest using different thresholds for some analyses over others just to increase the numbers and the degree of overlap; that smacks of cherry-picking your stringency just to get the lists to align the way you want. Whatever cutoff you choose needs to be applied uniformly for the results to be comparable. As always, you will get your most reliable DGE results by simultaneously applying a statistical threshold (e.g. FDR < 0.05) and a fold-change threshold (e.g. absolute fold change > 1.5 or > 2.0).
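      A minimal sketch of applying both thresholds at once, assuming results rows are dicts that use DESeq2-style column names ("padj", "log2FoldChange"); the "gene" key, the helper names, and the example rows are hypothetical. The fold-change cutoff is compared on the log2 scale, so > 1.5-fold means |log2FoldChange| > log2(1.5), about 0.585.

```python
import math

# Sketch: call a gene DE only if it passes BOTH an FDR cutoff and a fold-change cutoff.
# Assumption: rows use DESeq2-style column names ("padj", "log2FoldChange");
# "gene" is a hypothetical key holding the gene ID.


def passes_thresholds(padj, log2_fc, fdr=0.05, min_fold_change=1.5):
    """True if padj < fdr AND |fold change| > min_fold_change (compared on the log2 scale)."""
    if padj is None or log2_fc is None:
        return False  # NA values (e.g. DESeq2 padj = NA) are never called significant
    return padj < fdr and abs(log2_fc) > math.log2(min_fold_change)


def filter_de_genes(rows, fdr=0.05, min_fold_change=1.5):
    """Return gene IDs passing both thresholds, in input order."""
    return [
        row["gene"]
        for row in rows
        if passes_thresholds(row.get("padj"), row.get("log2FoldChange"), fdr, min_fold_change)
    ]


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        {"gene": "geneA", "padj": 0.01, "log2FoldChange": 1.2},
        {"gene": "geneB", "padj": 0.01, "log2FoldChange": 0.3},   # significant but small change
        {"gene": "geneC", "padj": 0.20, "log2FoldChange": 2.0},   # large change, not significant
        {"gene": "geneD", "padj": 0.03, "log2FoldChange": -0.8},  # down-regulated, passes 1.5-fold
    ]
    print(filter_de_genes(rows))                       # ['geneA', 'geneD']
    print(filter_de_genes(rows, min_fold_change=2.0))  # ['geneA']
```

      If you are using DESeq2, its results() function also accepts an lfcThreshold argument, which tests against the fold-change threshold directly instead of filtering afterwards; that is generally regarded as the more rigorous way to combine the two criteria.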

      In your case, that will end up just reducing your lists even further.

      It sounds to me like you simply have very little DGE actually going on in your treatments. Either that, or you lack the biological replicates and/or read depth to adequately detect the differences that do exist. If these are all fairly low-expressing genes, then read depth may be your single limiting factor, as it takes much higher read depth to detect subtle changes (at least with any real statistical rigor) than it does to detect large-scale gene expression changes.

      Just how many biological replicates did you run? Do you still have library material left that you could use to increase read depth? Low expressors are low count features, which also have the highest variance within your mapped reads count data. They will inherently be the most difficult to detect reliably and with statistical confidence, and both increasing biological replication and increasing read depth can help identify them.
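      A rough sketch of why low expressors are noisy and how depth and replication each help, assuming counts follow a negative binomial with mean mu and dispersion alpha (variance = mu + alpha * mu^2, so the coefficient of variation is sqrt(1/mu + alpha)). The expression levels, depths, and alpha value are hypothetical, chosen only for illustration.

```python
import math

# Sketch: noise in read counts as a function of depth.
# Model assumption: counts ~ negative binomial with mean mu and dispersion alpha,
# so variance = mu + alpha * mu**2 and CV = sqrt(1/mu + alpha).
# alpha = 0 reduces to pure Poisson (sampling) noise.


def expected_count(reads_per_million, depth_millions):
    """Expected read count for a gene at `reads_per_million` in a library of the given depth."""
    return reads_per_million * depth_millions


def count_cv(mu, alpha=0.0):
    """Coefficient of variation (SD / mean) of a negative binomial count with mean mu."""
    return math.sqrt(1.0 / mu + alpha)


if __name__ == "__main__":
    alpha = 0.1  # hypothetical biological dispersion, for illustration only
    for rpm in (0.5, 50.0):          # hypothetical low and high expressors
        for depth in (10, 20, 40):   # library sizes in millions of reads
            mu = expected_count(rpm, depth)
            print(f"{rpm:>5} RPM, {depth}M reads: mean={mu:.0f}, "
                  f"Poisson CV={count_cv(mu):.2f}, NB CV={count_cv(mu, alpha):.2f}")
```

      In this toy model, extra depth noticeably lowers the noise for the low expressor, while for the high expressor the CV is already dominated by alpha, so only more biological replicates (which shrink the standard error of a group mean roughly as 1/sqrt(n)) help much.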
      Last edited by mbblack; 07-15-2014, 09:40 AM.
      Michael Black, Ph.D.
      ScitoVation LLC. RTP, N.C.



      • #4
        Thank you both for your helpful responses. I find that article interesting because they found that cuffdiff produces a lot more false positives than other methods, so maybe the larger number of DEGs in my cuffdiff results is actually just false positives...
        As far as biological replicates, I have 5 females and 5 males per diet group. I analyzed males and females separately as well as together.
        Our average read depth is about 18 million reads per sample, and this is single-end sequencing, so that read depth should be sufficient, right?
        I understand your point about using the same cutoff for all analyses, but would it be OK to stick with my cuffdiff cutoff (q < 0.05) and then pick a different cutoff to use for all of the DESeq2 analyses? DESeq2 reports adjusted p-values, not q-values, anyway. Can you suggest a particular cutoff, or how I should determine one?
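        For what it's worth, as far as I know both cuffdiff's q_value column and DESeq2's padj column are Benjamini-Hochberg false discovery rate adjustments (DESeq2's results() uses pAdjustMethod = "BH" by default), so they are the same kind of quantity and one cutoff can reasonably be applied to both. A minimal sketch of the BH adjustment, written from the textbook definition rather than from either tool's source, with made-up p-values:

```python
# Sketch: Benjamini-Hochberg (BH) adjustment, the FDR procedure behind DESeq2's default padj.
# Written from the textbook definition; it is not DESeq2's or cuffdiff's source code.


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input.

    Adjusted p at ascending rank k = min over j >= k of p_(j) * m / j, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps every adjusted value at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
    print([round(q, 4) for q in benjamini_hochberg(raw)])  # [0.04, 0.0533, 0.0533, 0.2]
```

        One practical difference: DESeq2 applies independent filtering before adjusting, so filtered genes get padj = NA and are left out of m, which can make its adjusted values differ from a plain BH pass over every gene.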
        Thanks for your help!
