  • Help with cuffdiff gene_exp.diff output

    Hi,

    I need some suggestions regarding differentially expressed genes. My cuffdiff output (gene_exp.diff) contains no statistically significant differentially expressed genes: none of the q-values are significant. Only the original, uncorrected p-values are significant, and the fold change can be used to compare up- and down-regulation. In this scenario, how should I select genes of interest for Gene Ontology analysis? And what criteria should I use to filter the up-regulated and down-regulated genes in the comparison? Any suggestions would be appreciated.

  • #2
    How many replicates did you have in your samples? The more biological replicates you have, the more robust statistics you are able to compute.

    In the complete absence of biological replicates, your statistics are really quite meaningless, as you have no estimate of biological variance.

    General best practice historically has been that one gets the most robust and reliable lists of differentially expressed genes by simultaneously filtering results on corrected p-value (FDR or q-value) AND a minimum fold-change cutoff (say a fold change of 1.5, i.e. log2 of 0.58496, or a fold change of 2, i.e. log2 of 1). That kind of filtering typically gives the most robust gene lists, in terms of genes whose differential expression is confirmed by other means such as qPCR assays.
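
A minimal sketch of that combined filter, assuming a tab-delimited gene_exp.diff with the `status`, `log2(fold_change)` and `q_value` columns. The function name `filter_de`, the default cutoffs and the four toy rows are invented for illustration and are not from any real run.

```python
import csv
import io
import math

def filter_de(rows, max_q=0.05, min_fold=2.0):
    """Keep rows that pass BOTH the q-value and the fold-change cutoff."""
    min_log2 = math.log2(min_fold)  # e.g. 2-fold -> 1.0, 1.5-fold -> ~0.585
    kept = []
    for row in rows:
        if row["status"] != "OK":
            continue  # skip tests that were not reported as OK
        q = float(row["q_value"])
        lfc = float(row["log2(fold_change)"])
        if q <= max_q and abs(lfc) >= min_log2:
            kept.append(row)
    return kept

# Toy rows, invented for illustration only.
TOY = """test_id\tstatus\tlog2(fold_change)\tq_value
geneA\tOK\t2.5\t0.001
geneB\tOK\t0.3\t0.001
geneC\tOK\t-1.8\t0.20
geneD\tOK\t-1.2\t0.04
"""

rows = list(csv.DictReader(io.StringIO(TOY), delimiter="\t"))
hits = filter_de(rows)
print([r["test_id"] for r in hits])
```

With a real file, the `io.StringIO(TOY)` part would be replaced by an opened gene_exp.diff handle; taking `abs()` of the log2 fold change applies the same cutoff to up- and down-regulated genes.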

    In the absence of statistically significant results, there's no magic to apply. You cannot squeeze out something that simply is not there. So you'd be limited to ranking your genes by fold change and simply applying a minimum cutoff, say 2-fold, in place of a significance test.

    You would not want to use individual test p-values because, with such a large number of tests, many of the apparent hits will likely be false positives.
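
To put a rough number on the multiple-testing problem, here is a small sketch. It assumes the corrected values are Benjamini-Hochberg-style adjusted p-values (an assumption about what the q-value represents, not something stated in this thread), and the count of 20000 tested genes and the four p-values are made up.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of p < alpha calls if no gene is truly changed."""
    return n_tests * alpha

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# 20000 is a hypothetical number of tested genes.
print(expected_false_positives(20000))
# Hypothetical raw p-values.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The first number is how many genes you would expect to pass p < 0.05 by chance alone, which is why the uncorrected p-values are a poor basis for selection.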
    Michael Black, Ph.D.
    ScitoVation LLC. RTP, N.C.


    • #3
      Hi mbblack,
      Thanks for the reply; this really helped. Since I have no replicates, the p-values are not reliable here, and my FDR values are not significant at all; only the original uncorrected p-values are. So I am considering cutoffs on the expression values and fold change instead. One question: the fold-change cutoff you mentioned applies to up-regulated genes, right? I should use the same cutoff with a negative sign to get the down-regulated candidates as well. Is that right?


      • #4
        My view for situations where one has no replicates is to ignore the statistics altogether: they are unreliable and will only open your selection of genes to criticism.

        So you are left with selecting genes for further study/analyses based on the magnitude of observed differential expression. A long-standing generic cutoff has been to take genes up/down regulated by a log2 fold change of +/-1, i.e. +/-2-fold. It's a purely arbitrary cutoff, but the thinking is that when all you have is a measure of difference, 2-fold is likely high enough to avoid most spurious changes in gene expression.
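
A rough sketch of a magnitude-only selection, split into up- and down-regulated lists and ranked by size of change. The helper name `split_up_down` and the toy log2 fold-change values are hypothetical.

```python
import math

def split_up_down(log2fc_by_gene, min_fold=2.0):
    """Rank genes by |log2 fold change| and split by direction at +/- the cutoff."""
    cutoff = math.log2(min_fold)
    ranked = sorted(log2fc_by_gene.items(), key=lambda kv: abs(kv[1]), reverse=True)
    up = [gene for gene, lfc in ranked if lfc >= cutoff]
    down = [gene for gene, lfc in ranked if lfc <= -cutoff]
    return up, down

# Toy values, invented for illustration only.
toy = {"geneA": 2.5, "geneB": 0.3, "geneC": -1.8, "geneD": -1.2, "geneE": 0.7}

print(split_up_down(toy, min_fold=2.0))
print(split_up_down(toy, min_fold=1.5))
```

The same cutoff is applied symmetrically: +log2(min_fold) for up-regulated genes and -log2(min_fold) for down-regulated ones.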

        But it also depends on what you intend to do with the data. If your intent is to select genes for validation by some qPCR-based method, then 2-fold may be fine, or you may even want to increase that cutoff to 2.5- or 3-fold to ensure that most of the selected genes really are differentially regulated (since qPCR assays are expensive and time-consuming).

        If you are simply generating gene lists for exploratory enrichment or some such analyses, and 2-fold seems too restrictive (i.e. your gene lists are too short to get any real enrichment), then you may actually want to relax it, say to +/-1.5-fold.

        Any cutoff, statistical or magnitude of change, is arbitrary - pick one that you feel you can justify or defend, but which also works in terms of what you need out of the study to go forward.

        Even if you had 10 biological replicates and were basing your gene selection on simultaneous cutoffs of corrected p-value and fold change, your choice of those cutoffs would still be arbitrary. I routinely use FDR values of anywhere from <0.01 to <0.1 and fold-change values of 1.5 to 2.0, depending on the data in hand, the results obtained, and what I intend to do with the genes identified as "differentially expressed" by whatever criteria.
        Michael Black, Ph.D.
        ScitoVation LLC. RTP, N.C.


        • #5
          Hi guys,
          I have some questions about how Cuffdiff does the statistical analysis.
          I am looking for DE genes between two sample groups (3 replicates per group). In Cuffdiff's gene_exp.diff, I found many genes that have a very large FPKM fold change between the two groups (some with p-value below 0.05, some above) but that are still NOT flagged as significant. Something like this:

          test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
          ENSMUSG00000047139 ENSMUSG00000047139 Cd24a 10:43579168-43584262 q1 q2 OK 96.2585 2700.55 4.8102 1.6486 0.03995 0.078237 no
          ENSMUSG00000066975 ENSMUSG00000066975 Cryba4 5:112246492-112252518 q1 q2 OK 424.582 46190.2 6.7654 0.598327 0.3408 0.442128 no

          Then I checked the READ_GROUP_TRACKING file for those genes to see the FPKM value for each replicate:

          tracking_id condition replicate raw_frags internal_scaled_frags external_scaled_frags FPKM effective_length status
          ENSMUSG00000047139 q1 1 11256 5876.82 5876.82 125.915 - OK
          ENSMUSG00000047139 q1 0 3783 4343.44 4343.44 42.0316 - OK
          ENSMUSG00000047139 q1 2 10051 5639.48 5639.48 120.829 - OK
          ENSMUSG00000047139 q2 1 76771 156059 156059 3343.66 - OK
          ENSMUSG00000047139 q2 0 82394 162172 162172 1420.33 - OK
          ENSMUSG00000066975 q1 1 12825 6696 6696 407.899 - OK
          ENSMUSG00000066975 q1 0 3694 4241.26 4241.26 375.211 - OK
          ENSMUSG00000066975 q1 2 14397 8077.95 8077.95 490.636 - OK
          ENSMUSG00000066975 q2 1 348103 707619 707619 42455.1 - OK
          ENSMUSG00000066975 q2 0 420896 828430 828430 48920.6 - OK
          ENSMUSG00000066975 q2 2 331098 767405 767405 47195 - OK


          Shouldn't I expect these DE genes to be significant? Do you have any idea why Cufflinks shows this result?
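
One way to sanity-check the numbers above is to recompute the per-condition mean FPKM and the log2 fold change from the replicate rows, and to look at the spread between replicates. This is only a sketch using the FPKM values posted here; it does not reproduce Cuffdiff's actual test, and the helper names are made up.

```python
import math
import statistics

# Per-replicate FPKM values copied from the read-group tracking rows above.
fpkm = {
    ("ENSMUSG00000047139", "q1"): [125.915, 42.0316, 120.829],
    ("ENSMUSG00000047139", "q2"): [3343.66, 1420.33],  # only two rows were posted
    ("ENSMUSG00000066975", "q1"): [407.899, 375.211, 490.636],
    ("ENSMUSG00000066975", "q2"): [42455.1, 48920.6, 47195.0],
}

def summarize(values):
    """Return (mean, coefficient of variation) for one condition."""
    mean = statistics.mean(values)
    cv = statistics.stdev(values) / mean  # sample standard deviation / mean
    return mean, cv

def log2_fold_change(gene):
    """log2(mean FPKM in q2 / mean FPKM in q1) for one gene."""
    mean_q1, _ = summarize(fpkm[(gene, "q1")])
    mean_q2, _ = summarize(fpkm[(gene, "q2")])
    return math.log2(mean_q2 / mean_q1)

for (gene, cond), values in fpkm.items():
    mean, cv = summarize(values)
    print(f"{gene} {cond}: n={len(values)} mean FPKM={mean:.3f} CV={cv:.2f}")

print(f"Cryba4 log2 fold change: {log2_fold_change('ENSMUSG00000066975'):.4f}")
```

For the rows that are complete, the replicate means agree with value_1 and value_2 in gene_exp.diff, so those columns look like simple per-condition averages; the coefficient of variation gives a quick feel for how much the replicates disagree with each other.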


          Best regards


          • #6
            Hi mbblack,
            I am trying to find DE genes using Cuffdiff as well. You mentioned using FDR values as the cutoff for DE genes. How can I find that value in Cuffdiff's output file?
            Thanks so much
            Thanh


            • #7
              I actually try different q-value cutoffs and check the resulting gene lists in DAVID to see which cutoff gives more significant results; for my last two RNA-seq experiments I have used a q < 0.01 cutoff but no fold-change cutoff.
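
A quick sketch of comparing cutoffs: count how many genes would pass at each q-value threshold before sending the lists anywhere. It assumes the values come from the q_value column of gene_exp.diff; the two q-values used are the ones posted in #5, and the function name and the set of cutoffs are arbitrary.

```python
def counts_by_cutoff(q_values, cutoffs=(0.01, 0.05, 0.1)):
    """Number of genes with q below each candidate cutoff."""
    return {cutoff: sum(1 for q in q_values if q < cutoff) for cutoff in cutoffs}

# q-values for the two genes shown earlier in this thread.
q_values = [0.078237, 0.442128]

print(counts_by_cutoff(q_values))
```

Bear in mind that picking the cutoff by which one gives the nicest enrichment result is itself an arbitrary choice, so it is worth reporting which cutoffs were tried.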
