SEQanswers

Old 05-24-2013, 02:55 AM   #1
vd4mindia
Member
 
Location: Milan

Join Date: May 2013
Posts: 40
Help with cuffdiff gene_exp.diff output

Hi,

I need some suggestions regarding differentially expressed genes. I do not have any statistically significant differentially expressed genes in the gene_exp.diff file from my cuffdiff output: none of the q-values are significant. Only the original (uncorrected) p-values are significant, and the fold change can be used to compare up-regulation and down-regulation. In this scenario, how should I select the genes of interest for Gene Ontology analysis, and what criteria should I use to filter the up-regulated and down-regulated genes in the comparison? Any suggestions would be appreciated.
Old 05-24-2013, 05:58 AM   #2
mbblack
Senior Member
 
Location: Research Triangle Park, NC

Join Date: Aug 2009
Posts: 245

How many replicates did you have in your samples? The more biological replicates you have, the more robust statistics you are able to compute.

In the complete absence of biological replicates, your statistics are really quite meaningless, as you have no estimate of biological variance.

Historically, the general best practice has been that one gets the most robust or reliable lists of differentially expressed genes by simultaneously filtering the results on corrected p-value (FDR or q-value) AND a minimum fold-change cutoff (say a fold change of 1.5, i.e. a log2 fold change of 0.58496, or a fold change of 2, i.e. a log2 fold change of 1). That kind of filtering typically gives the most robust gene lists, in terms of genes whose differential expression is confirmed by other means such as qPCR assays.
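As a rough illustration of that combined filter, here is a minimal Python sketch. It assumes a tab-separated table with the `gene`, `log2(fold_change)` and `q_value` columns that gene_exp.diff uses; the toy rows, the gene names and the `select_de_genes` helper are hypothetical and only there to make the example runnable.

```python
import csv
import io
import math

# Hypothetical toy rows using a subset of the gene_exp.diff columns
# (tab-separated). Gene names and numbers are invented for illustration.
TOY_DIFF = (
    "gene\tlog2(fold_change)\tp_value\tq_value\n"
    "geneA\t2.30\t0.0004\t0.004\n"
    "geneB\t-1.80\t0.0020\t0.010\n"
    "geneC\t0.40\t0.0001\t0.001\n"
    "geneD\t3.10\t0.0900\t0.400\n"
)


def select_de_genes(handle, max_q=0.05, min_fold=2.0):
    """Keep genes passing BOTH the q-value and the fold-change cutoff."""
    min_log2 = math.log2(min_fold)  # fold change of 2 -> log2 of 1
    selected = []
    for row in csv.DictReader(handle, delimiter="\t"):
        log2fc = float(row["log2(fold_change)"])  # also parses "inf"/"-inf"
        q_value = float(row["q_value"])
        # abs() applies the same cutoff to up- and down-regulated genes.
        if q_value < max_q and abs(log2fc) >= min_log2:
            selected.append(row["gene"])
    return selected


print(round(math.log2(1.5), 5))  # 0.58496
print(select_de_genes(io.StringIO(TOY_DIFF)))
```

With a real file you would pass an open file handle instead of the `io.StringIO` wrapper; the cutoffs are keyword arguments so they can be tightened or relaxed per study.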

In the absence of statistically significant results, there's no magic to apply. You cannot squeeze out something that simply is not there. So you'd be limited to ranking your genes by fold change and simply applying a minimum cutoff, say 2-fold, as the selection threshold.

You would not want to use the individual (uncorrected) test p-values because, with such a large number of tests, many of the apparent hits will likely be false positives.
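To make the multiple-testing point concrete: testing 20,000 genes at p < 0.05 when nothing is truly different would be expected to give about 1,000 hits by chance alone. The sketch below is a plain-Python version of the Benjamini-Hochberg adjustment, which is the kind of correction that turns p-values into FDR-adjusted q-values. The function name and the example p-values are hypothetical, and this is not claimed to reproduce Cuffdiff's own numbers.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four tests.
raw = [0.01, 0.04, 0.03, 0.20]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:.2f} -> adjusted = {q:.4f}")
```

Note how a raw p-value of 0.04 or 0.03 no longer clears a 0.05 threshold once it is adjusted for only four tests; with thousands of genes the effect is much stronger.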
__________________
Michael Black, Ph.D.
ScitoVation LLC. RTP, N.C.
Old 05-30-2013, 12:08 AM   #3
vd4mindia
Member
 
Location: Milan

Join Date: May 2013
Posts: 40

Hi mbblack,
Thanks for the reply. This really helped. Since I do not have any replicates, the p-values are not reliable, and my FDR values are not significant at all; only the original uncorrected p-values are significant. So I am considering cutoffs on the expression values and the fold change instead. I would like to ask: the fold-change cutoff you mentioned is only for up-regulated genes, right? I should use the same cutoff with a negative sign to get the down-regulated candidates as well, right?
Old 05-31-2013, 05:03 AM   #4
mbblack
Senior Member
 
Location: Research Triangle Park, NC

Join Date: Aug 2009
Posts: 245

My advice for situations where one has no replicates is to ignore the statistics altogether - they are unreliable and will only open your selection of genes to criticism.

So you are left with selecting genes for further study or analysis based on the magnitude of the observed differential expression. A long-standing generic cutoff has been to take genes up- or down-regulated by a log2 fold change of +/-1, i.e. 2-fold in either direction. It's a purely arbitrary cutoff, but the thinking is that when all you have is a measure of difference, 2-fold is likely high enough to avoid most spurious changes in gene expression.
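A minimal sketch of that magnitude-only selection, assuming you already have a log2 fold change per gene (for example from the `log2(fold_change)` column). The gene names, the values and the `split_by_fold_change` helper are hypothetical. The same cutoff is applied with a positive sign for up-regulated genes and a negative sign for down-regulated ones.

```python
import math


def split_by_fold_change(log2fc_by_gene, min_fold=2.0):
    """Split genes into up/down lists, each ranked by magnitude of change."""
    cutoff = math.log2(min_fold)  # 2-fold -> log2 cutoff of 1
    up = [g for g, v in log2fc_by_gene.items() if v >= cutoff]
    down = [g for g, v in log2fc_by_gene.items() if v <= -cutoff]
    # Strongest changes first in both lists.
    up.sort(key=lambda g: log2fc_by_gene[g], reverse=True)
    down.sort(key=lambda g: log2fc_by_gene[g])
    return up, down


# Hypothetical log2 fold changes, invented for illustration.
toy = {"geneA": 1.7, "geneB": -2.4, "geneC": 0.3, "geneD": -1.0, "geneE": 3.2}

up, down = split_by_fold_change(toy)
print("up:", up)
print("down:", down)
```

Raising `min_fold` to 2.5 or 3 shortens the lists for validation work, while lowering it to 1.5 lengthens them for exploratory enrichment.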

But it also depends on what your intent with the data is. If your intent is to select genes for validation by some qPCR-based method, then 2-fold may be fine, or you may even want to increase that cutoff to 2.5- or 3-fold to ensure that most of the selected genes really are differentially regulated (since qPCR assays are expensive and time-consuming).

If you are simply generating gene lists for exploratory enrichment or similar analyses, and 2-fold seems too restrictive (i.e. your gene lists are too short to get any real enrichment), then you may actually want to relax it, say to +/-1.5-fold.

Any cutoff, whether statistical or based on magnitude of change, is arbitrary. Pick one that you feel you can justify or defend, but which also works in terms of what you need out of the study to go forward.

Even if you had 10 biological replicates and were basing your gene selection on simultaneous cutoffs of corrected p-value and fold change, your choice of those cutoffs would still be arbitrary. I routinely use FDR cutoffs anywhere from <0.01 to <0.1 and fold-change cutoffs of 1.5 to 2.0, depending on the data in hand, the results obtained, and what I intend to do with the genes identified as "differentially expressed" by whatever criteria.
__________________
Michael Black, Ph.D.
ScitoVation LLC. RTP, N.C.
Old 07-21-2013, 01:18 AM   #5
thanhhoang
Member
 
Location: Ohio, USA

Join Date: Jul 2013
Posts: 16

Hi guys,
I have some questions about how Cuffdiff does its statistical analysis.
I am looking for DE genes between two sample groups (3 replicates per group). In Cuffdiff's gene_exp.diff, I found many genes that have a very large FPKM fold change between the two groups (some with p-value < 0.05, some with p-value > 0.05) but are still not called significant. Something like this:

test_id             gene_id             gene    locus                  sample_1  sample_2  status  value_1  value_2  log2(fold_change)  test_stat  p_value  q_value   significant
ENSMUSG00000047139  ENSMUSG00000047139  Cd24a   10:43579168-43584262   q1        q2        OK      96.2585  2700.55  4.8102             1.6486     0.03995  0.078237  no
ENSMUSG00000066975  ENSMUSG00000066975  Cryba4  5:112246492-112252518  q1        q2        OK      424.582  46190.2  6.7654             0.598327   0.3408   0.442128  no

Then I looked at the read_group_tracking file for those genes to check the FPKM value for each replicate:

tracking_id         condition  replicate  raw_frags  internal_scaled_frags  external_scaled_frags  FPKM     effective_length  status
ENSMUSG00000047139  q1         1          11256      5876.82                5876.82                125.915  -                 OK
ENSMUSG00000047139  q1         0          3783       4343.44                4343.44                42.0316  -                 OK
ENSMUSG00000047139  q1         2          10051      5639.48                5639.48                120.829  -                 OK
ENSMUSG00000047139  q2         1          76771      156059                 156059                 3343.66  -                 OK
ENSMUSG00000047139  q2         0          82394      162172                 162172                 1420.33  -                 OK
ENSMUSG00000066975  q1         1          12825      6696                   6696                   407.899  -                 OK
ENSMUSG00000066975  q1         0          3694       4241.26                4241.26                375.211  -                 OK
ENSMUSG00000066975  q1         2          14397      8077.95                8077.95                490.636  -                 OK
ENSMUSG00000066975  q2         1          348103     707619                 707619                 42455.1  -                 OK
ENSMUSG00000066975  q2         0          420896     828430                 828430                 48920.6  -                 OK
ENSMUSG00000066975  q2         2          331098     767405                 767405                 47195    -                 OK


Wouldn't I expect these genes to be called significantly differentially expressed? Do you have any idea why Cufflinks shows this result?


Best regards
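
As a quick sanity check on the numbers above, the following Python sketch recomputes the per-condition mean FPKM and the log2 fold change for ENSMUSG00000066975 (Cryba4) from the replicate values in the read-group tracking rows. It only checks that value_1 and value_2 look like replicate means and that their log2 ratio matches the reported 6.7654; it does not attempt to reproduce Cuffdiff's test statistic or its variance model. The coefficient of variation printed per condition is just one simple way to look at replicate spread, chosen here as an assumption.

```python
import math
from statistics import mean, stdev

# Replicate FPKM values for ENSMUSG00000066975 (Cryba4), copied from the
# read-group tracking rows posted above.
fpkm = {
    "q1": [407.899, 375.211, 490.636],
    "q2": [42455.1, 48920.6, 47195.0],
}

# Mean FPKM per condition and the log2 ratio of the two means.
means = {cond: mean(vals) for cond, vals in fpkm.items()}
log2_fc = math.log2(means["q2"] / means["q1"])

for cond, vals in fpkm.items():
    # Coefficient of variation: replicate spread relative to the mean.
    cv = stdev(vals) / means[cond]
    print(f"{cond}: mean FPKM = {means[cond]:.3f}, CV = {cv:.2f}")
print(f"log2 fold change = {log2_fc:.4f}")
```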
Old 07-21-2013, 01:23 AM   #6
thanhhoang
Member
 
Location: Ohio, USA

Join Date: Jul 2013
Posts: 16

Hi mbblack,
I am trying to find DE genes using Cuffdiff as well. You mentioned using FDR values as the cutoff for DE genes. How can I find that value in Cuffdiff's output file?
Thanks so much,
Thanh
Old 07-21-2013, 03:16 AM   #7
sazz
Member
 
Location: Istanbul, Turkey

Join Date: Oct 2012
Posts: 28

I actually try different q-value cutoffs and check the resulting gene list in DAVID to see which cutoff gives more significant results. For my last 2 RNA-seq experiments I am using a q<0.01 cutoff but no cutoff for fold change.
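
A small sketch of that kind of cutoff sweep, counting how many genes survive each q-value threshold before the list goes to an enrichment tool. The q-values and the `count_by_q_cutoff` helper are hypothetical; in practice the values would come from the `q_value` column of gene_exp.diff.

```python
def count_by_q_cutoff(q_values, cutoffs=(0.01, 0.05, 0.1)):
    """Count how many q-values fall below each cutoff."""
    return {cutoff: sum(q < cutoff for q in q_values) for cutoff in cutoffs}


# Hypothetical q-values, invented for illustration.
toy_q = [0.001, 0.008, 0.02, 0.04, 0.07, 0.45]

for cutoff, n_genes in count_by_q_cutoff(toy_q).items():
    print(f"q < {cutoff}: {n_genes} genes")
```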