**mbblack** · 05-24-2013, 05:58 AM

How many replicates did you have in your samples? The more biological replicates you have, the more robust statistics you are able to compute.

In the complete absence of biological replicates, your statistics are essentially meaningless, because you have no estimate of biological variance.

Historically, general best practice has been that you get the most robust and reliable lists of differentially expressed genes by filtering simultaneously on a corrected p-value (FDR or q-value) AND a minimum fold change cutoff (say, a fold change of 1.5, which is a log2 of 0.58496, or a fold change of 2, which is a log2 of 1). That kind of filtering typically gives the most robust gene lists, in terms of genes whose differential expression is confirmed by other means such as qPCR assays.
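As a minimal sketch of the combined filter described above, assuming the results are already loaded as a list of dicts with hypothetical keys `gene`, `log2fc`, and `q_value` (the function name and the example values are made up for illustration and do not come from any tool):

```python
import math

def filter_de_genes(results, max_q=0.05, min_fold=1.5):
    """Keep genes that pass BOTH a corrected p-value and a fold-change cutoff."""
    # Convert the fold-change cutoff to log2 scale: 1.5-fold -> ~0.585, 2-fold -> 1.0
    min_log2fc = math.log2(min_fold)
    return [
        r for r in results
        if r["q_value"] < max_q and abs(r["log2fc"]) >= min_log2fc
    ]

# Made-up example values, for illustration only.
example = [
    {"gene": "geneA", "log2fc": 2.1, "q_value": 0.001},   # passes both cutoffs
    {"gene": "geneB", "log2fc": 0.3, "q_value": 0.001},   # significant, but small change
    {"gene": "geneC", "log2fc": -1.4, "q_value": 0.20},   # large change, not significant
    {"gene": "geneD", "log2fc": -0.9, "q_value": 0.01},   # passes both (down-regulated)
]

selected = [r["gene"] for r in filter_de_genes(example)]
print(selected)
```

Using the absolute value of the log2 fold change means the same cutoff applies to up- and down-regulated genes.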

In the absence of statistically significant results, there's no magic to apply. You cannot squeeze out something that simply is not there. So you'd be limited to ranking your genes by fold change and simply applying a minimum cutoff, say 2-fold, as your selection criterion.

You would not want to use the individual (uncorrected) test p-values: with such a large number of tests, many of the apparently significant ones will likely be false positives.
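To illustrate why corrected values matter, here is a small sketch of the Benjamini-Hochberg adjustment, one common way of computing FDR-adjusted p-values. The function name and the raw p-values are invented for the example; this is not presented as the exact procedure of any particular tool.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up raw p-values, for illustration only.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)

n_raw = sum(p < 0.05 for p in raw)   # tests passing on raw p-values
n_adj = sum(q < 0.05 for q in adj)   # tests passing after correction
print(n_raw, n_adj)
```

In this toy case fewer tests pass after correction than on raw p-values, and the gap generally widens as the number of tests grows.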

**vd4mindia** · 05-30-2013, 12:08 AM

Hi mbblack,
Thanks for the reply. This really helped. Since I do not have any replicates, the p-values are not reliable here, and my FDR values are not significant at all; only the original uncorrected p-values are. So I am considering cutoffs on the expression values and fold change instead. I would like to ask: the fold change cutoff you mentioned is only for up-regulated genes, right? Should I use the same cutoff with a negative sign to get the down-regulated candidates as well?

**mbblack** · 05-31-2013, 05:03 AM

My advice for situations where one has no replicates is to ignore the statistics altogether - they are unreliable and will only open your selection of genes to criticism.

So, you are left with selecting genes for further study/analyses based on the magnitude of observed differential expression. A long-standing generic cutoff has been to take genes with a log2 fold change of +/-1, i.e. up- or down-regulated by at least 2-fold. It's a purely arbitrary cutoff, but the thinking is that when all you have is a measure of difference, 2-fold is likely high enough to avoid most spurious changes in gene expression.
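A short sketch of that symmetric cutoff, assuming you already have a log2 fold change per gene (the function name, gene names, and values are hypothetical). On the log2 scale, 2-fold up is +1 and 2-fold down (a ratio of 0.5) is -1, so one threshold covers both directions:

```python
import math

def classify_by_fold_change(log2fc, min_abs_log2fc=1.0):
    """Label a gene 'up', 'down', or 'unchanged' using a symmetric log2 cutoff."""
    if log2fc >= min_abs_log2fc:
        return "up"
    if log2fc <= -min_abs_log2fc:
        return "down"
    return "unchanged"

# A ratio of 0.5 (2-fold down) is exactly -1 on the log2 scale.
two_fold_down = math.log2(0.5)

# Made-up log2 fold changes, for illustration only.
genes = {"geneA": 1.7, "geneB": -2.3, "geneC": 0.4, "geneD": -1.0}
labels = {name: classify_by_fold_change(value) for name, value in genes.items()}
print(labels)
```

Relaxing the cutoff to 1.5-fold would mean passing `min_abs_log2fc=math.log2(1.5)` instead.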

But it also depends on what you intend to do with the data. If your intent is to select genes for validation by some qPCR-based method, then 2-fold may be fine, or you may even want to increase the cutoff to 2.5 or 3-fold, so that as many of the selected genes as possible really are differentially regulated (since qPCR assays are expensive and time consuming).

If you are simply generating gene lists for exploratory enrichment or similar analyses, and 2-fold seems too restrictive (i.e. your gene lists are too short to get any real enrichment), then you may actually want to relax it, say to +/-1.5-fold.

Any cutoff, statistical or magnitude of change is arbitrary - pick one that you feel you can justify or defend, but which also works in terms of what you need out of the study to go forward.

Even if you had 10 biological replicates and were basing your gene selection on simultaneous cutoffs of corrected p-value and fold change, your choice of those cutoffs would still be arbitrary. I routinely use FDR values of anywhere from <0.01 to <0.1 and fold change values of 1.5 to 2.0, depending on the data in hand, the results obtained, and what I intend to do with the genes identified as "differentially expressed" by whatever criteria.

**thanhhoang** · 07-21-2013, 01:18 AM

Hi guys,
I have some questions about how Cuffdiff does the statistical analysis.
I am looking for DE genes between two sample groups (3 replicates per group). In Cuffdiff's gene_exp.diff, I found many genes that have a very large RPKM fold change between the two groups (with p-values both below and above 0.05) but are still NOT marked significant. Something like this:

| test_id | gene_id | gene | locus | sample_1 | sample_2 | status | value_1 | value_2 | log2(fold_change) | test_stat | p_value | q_value | significant |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ENSMUSG00000047139 | ENSMUSG00000047139 | Cd24a | 10:43579168-43584262 | q1 | q2 | OK | 96.2585 | 2700.55 | 4.8102 | 1.6486 | 0.03995 | 0.078237 | no |
| ENSMUSG00000066975 | ENSMUSG00000066975 | Cryba4 | 5:112246492-112252518 | q1 | q2 | OK | 424.582 | 46190.2 | 6.7654 | 0.598327 | 0.3408 | 0.442128 | no |

Then I checked the read group tracking file for those genes to see the RPKM value for each replicate:

| tracking_id | condition | replicate | raw_frags | internal_scaled_frags | external_scaled_frags | FPKM | effective_length | status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ENSMUSG00000047139 | q1 | 1 | 11256 | 5876.82 | 5876.82 | 125.915 | - | OK |
| ENSMUSG00000047139 | q1 | 0 | 3783 | 4343.44 | 4343.44 | 42.0316 | - | OK |
| ENSMUSG00000047139 | q1 | 2 | 10051 | 5639.48 | 5639.48 | 120.829 | - | OK |
| ENSMUSG00000047139 | q2 | 1 | 76771 | 156059 | 156059 | 3343.66 | - | OK |
| ENSMUSG00000047139 | q2 | 0 | 82394 | 162172 | 162172 | 1420.33 | - | OK |
| ENSMUSG00000066975 | q1 | 1 | 12825 | 6696 | 6696 | 407.899 | - | OK |
| ENSMUSG00000066975 | q1 | 0 | 3694 | 4241.26 | 4241.26 | 375.211 | - | OK |
| ENSMUSG00000066975 | q1 | 2 | 14397 | 8077.95 | 8077.95 | 490.636 | - | OK |
| ENSMUSG00000066975 | q2 | 1 | 348103 | 707619 | 707619 | 42455.1 | - | OK |
| ENSMUSG00000066975 | q2 | 0 | 420896 | 828430 | 828430 | 48920.6 | - | OK |
| ENSMUSG00000066975 | q2 | 2 | 331098 | 767405 | 767405 | 47195 | - | OK |

Wouldn't I expect these DE genes to be significant? Do you have any idea why Cufflinks shows this result?

Best regards
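As a sanity check on the numbers quoted in this post, the per-replicate FPKM values can be averaged per condition and compared with value_1, value_2, and log2(fold_change) from gene_exp.diff. The sketch below hard-codes the FPKM values from the post; the helper name and dict layout are hypothetical. It only checks that the reported fold changes are consistent with the replicate values and looks at replicate spread; it does not reproduce Cuffdiff's significance test.

```python
import math
from statistics import mean, stdev

# Per-replicate FPKM values copied from the post above.
# Note: only two q2 replicates are listed for ENSMUSG00000047139, so its
# recomputed q2 mean will not match value_2 from gene_exp.diff.
fpkm = {
    "ENSMUSG00000047139": {"q1": [125.915, 42.0316, 120.829],
                           "q2": [3343.66, 1420.33]},
    "ENSMUSG00000066975": {"q1": [407.899, 375.211, 490.636],
                           "q2": [42455.1, 48920.6, 47195.0]},
}

def summarize(values):
    """Return (mean, coefficient of variation) for one condition."""
    m = mean(values)
    return m, stdev(values) / m

summary = {}
for gene, conds in fpkm.items():
    m1, cv1 = summarize(conds["q1"])
    m2, cv2 = summarize(conds["q2"])
    summary[gene] = {
        "mean_q1": m1, "cv_q1": cv1,
        "mean_q2": m2, "cv_q2": cv2,
        "log2fc": math.log2(m2 / m1),  # log2 ratio of the condition means
    }
    print(gene, round(m1, 3), round(m2, 3), round(summary[gene]["log2fc"], 4))
```

For ENSMUSG00000066975, where all six replicates are listed, the recomputed means and log2 fold change agree with the gene_exp.diff row.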

**thanhhoang** · 07-21-2013, 01:23 AM

Hi mbblack,
I am trying to find DE genes using Cuffdiff as well. You mentioned using FDR values as the cutoff for DE genes. Just wondering, how can I find that value in Cuffdiff's output file?
Thanks so much
Thanh
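In gene_exp.diff, the q_value column holds the FDR-adjusted p-value, and the significant column reports whether it passed the FDR threshold Cuffdiff was run with. A minimal sketch of reading that column with the standard library is shown below; it builds a small tab-delimited text from the two rows quoted earlier in the thread so it runs on its own. The helper name `read_diff` is hypothetical, and with a real file you would pass an open file handle instead of the in-memory text.

```python
import csv
import io

header = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2", "status",
          "value_1", "value_2", "log2(fold_change)", "test_stat", "p_value",
          "q_value", "significant"]
rows = [
    ["ENSMUSG00000047139", "ENSMUSG00000047139", "Cd24a", "10:43579168-43584262",
     "q1", "q2", "OK", "96.2585", "2700.55", "4.8102", "1.6486", "0.03995",
     "0.078237", "no"],
    ["ENSMUSG00000066975", "ENSMUSG00000066975", "Cryba4", "5:112246492-112252518",
     "q1", "q2", "OK", "424.582", "46190.2", "6.7654", "0.598327", "0.3408",
     "0.442128", "no"],
]
# Rebuild tab-delimited text so the example does not depend on a file on disk.
text = "\n".join("\t".join(r) for r in [header] + rows)

def read_diff(handle):
    """Parse tab-delimited gene_exp.diff content into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))

records = read_diff(io.StringIO(text))
for rec in records:
    q = float(rec["q_value"])  # FDR-adjusted p-value
    print(rec["gene"], q, rec["significant"])

# Genes passing a user-chosen FDR cutoff (none do in this two-row example).
passing = [rec["gene"] for rec in records if float(rec["q_value"]) < 0.05]
```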

**sazz** · 07-21-2013, 03:16 AM

For me, I actually try different q cut-offs and check the gene list output in DAVID to see which cutoff gives more significant results; for my last 2 RNA-seq experiments I am using a q<0.01 cutoff but no cutoff for fold change.
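A tiny sketch of the first half of that workflow, counting how many genes survive each candidate q cutoff before taking the lists to an enrichment tool. The function name and most of the q-values are made up for illustration, and the DAVID step itself is not reproduced here.

```python
def count_by_cutoff(q_values, cutoffs=(0.01, 0.05, 0.1)):
    """Return {cutoff: number of genes with q below that cutoff}."""
    return {c: sum(q < c for q in q_values) for c in cutoffs}

# Mostly made-up q-values; the last two are from the rows quoted in this thread.
q_values = [0.0004, 0.003, 0.02, 0.04, 0.078237, 0.442128]
counts = count_by_cutoff(q_values)
print(counts)
```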

Help with cuffdiff gene_exp.diff output
