  • ThePresident
    replied
    Honestly, I have no firm expectations about the outcome of my experiment. I have my control and my test condition. Some genes could be upregulated and others downregulated, but I expect the majority to be unaltered.

    From there, and considering your examples, I should expect some of the genes to come up as false positives. So a prudent way to proceed would be to take a more stringent approach. Using FDR < 0.05 I got 15 genes, which is a fair number, and I'm confident that I'm seeing some true changes in expression.

    Thank you for your help
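
    As a rough reading of the numbers in this post, a list selected at a given FDR cutoff is expected to contain at most about (cutoff x list size) false positives. The sketch below only does that arithmetic; the function name is hypothetical and the 15 genes and 0.05 cutoff are the figures quoted above.

```python
def expected_false_discoveries(n_selected: int, fdr_cutoff: float) -> float:
    """Rough upper bound on the expected number of false positives
    in a gene list selected at padj < fdr_cutoff."""
    return n_selected * fdr_cutoff


# 15 genes at FDR < 0.05, as reported in the post above
print(expected_false_discoveries(15, 0.05))
```

    So fewer than one of the 15 genes would be expected to be a false discovery, which is consistent with taking the whole list forward to validation.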



  • mbblack
    replied
    You'll tend to interpret the FDR value much as you would interpret a single p-value: what is your comfort level in terms of false positives? Also, what are you generating gene lists for?

    So, for example:

    scenario one - you wish to pull out significantly differentially expressed genes for some sort of ontology or enrichment analysis. Your primary goal is to identify biological processes, pathways or other ontology categories. So you may be fairly relaxed with your choice of cutoff in order to be sure to have sufficient genes to get a reasonably robust enrichment result. So, you may pick an FDR of < 0.05, or even 0.1 if you need to pad out your gene lists.

    scenario two - you are trying to pick out genes as candidates for bio-assay development, so you'd like to find the fewest necessary to characterize your system, and you need to be stringent about your risk of false positives (wasted money down the road if those fail to validate for your assay). So you now pick a more stringent FDR, maybe even going to < 0.01 if that gives you enough to continue with. Perhaps you simultaneously throw in a fold change cutoff as well, so you only take genes with both an FDR < 0.05 and a log2 FC > 2 (picking only highly significant genes with large fold changes).

    So, as with any choice of statistical criteria, you pick a cutoff that makes sense in light of your question(s) and your system.

    Since I work in toxicology, we tend to worry more about false negatives than false positives, so I generally need not be overly stringent with my cutoff and usually use an FDR < 0.05 to 0.1 in order to be sure I capture enough genes for my downstream analyses (and I frequently add a fold change filter as well, with linear fold change of +/-1.5 to +/-2) - but it really depends on what you want out of the diff. gene expression analysis in the first place.
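
    The combined cutoff described above (an FDR threshold plus a fold change filter) could be sketched as follows. This is a minimal illustration, not the poster's actual workflow: the function name, the dictionary layout, and the example genes are hypothetical, and the `padj` and `log2FoldChange` keys are assumed to mirror the columns of a DESeq results table. The linear fold change cutoff is converted to log2 so that up- and downregulated genes are treated symmetrically.

```python
import math


def select_genes(results, fdr_cutoff=0.05, min_linear_fc=1.5):
    """Return ids of genes passing both the FDR and the fold change cutoff.

    `results` is a list of dicts with keys "id", "padj" and
    "log2FoldChange" (assumed layout, for illustration only).
    """
    min_abs_log2fc = math.log2(min_linear_fc)
    selected = []
    for gene in results:
        padj = gene["padj"]
        # Skip genes without a usable adjusted p-value (reported as NA/NaN)
        if padj is None or math.isnan(padj):
            continue
        if padj < fdr_cutoff and abs(gene["log2FoldChange"]) >= min_abs_log2fc:
            selected.append(gene["id"])
    return selected


# Hypothetical results, invented for the example
example = [
    {"id": "geneA", "padj": 0.01, "log2FoldChange": 1.2},
    {"id": "geneB", "padj": 0.20, "log2FoldChange": 3.0},
    {"id": "geneC", "padj": 0.03, "log2FoldChange": -0.2},
    {"id": "geneD", "padj": float("nan"), "log2FoldChange": 2.0},
    {"id": "geneE", "padj": 0.04, "log2FoldChange": -1.0},
]
print(select_genes(example))
print(select_genes(example, fdr_cutoff=0.02, min_linear_fc=2.0))
```

    Loosening or tightening `fdr_cutoff` and `min_linear_fc` is then a matter of which scenario above applies to your downstream analysis.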



  • ThePresident
    replied
    Originally posted by mbblack

    A fundamental difference is that a p-value is a statement about the probability of an observed test statistic (or one more extreme) given its distribution under the null hypothesis.

    An FDR, by contrast, is a statement about the expected proportion of false discoveries given a certain number of simultaneous tests and their p-value distribution. It is an attempt to control for false discoveries, because the type I error tends to balloon with multiple tests, and that multiplicity of errors is not reflected in the individual tests' p-values.

    So if you base your selection on p-values, you will inherently end up including a large number of false positives. Using the FDR, you are controlling the proportion of false positives across all your significant statistical tests.
    Well explained, thanks! In fact, that's what I did; I now have a somewhat better understanding of what those parameters are. Just one thing: how do you interpret the padj value in terms of significance? Do you consider the null hypothesis rejected below some threshold (as for a p-value), or...? I don't know if I was clear enough...



  • mbblack
    replied
    Originally posted by ThePresident
    Many thanks. I'll review the subject in some statistics handbook for details.
    Just do a web search for some simple terms like "p-value versus FDR" and you will find many good summaries, many of them on the web pages of various stats departments or professors.

    A fundamental difference is that a p-value is a statement about the probability of an observed test statistic (or one more extreme) given its distribution under the null hypothesis.

    An FDR, by contrast, is a statement about the expected proportion of false discoveries given a certain number of simultaneous tests and their p-value distribution. It is an attempt to control for false discoveries, because the type I error tends to balloon with multiple tests, and that multiplicity of errors is not reflected in the individual tests' p-values.

    So if you base your selection on p-values, you will inherently end up including a large number of false positives. Using the FDR, you are controlling the proportion of false positives across all your significant statistical tests.
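
    To make the adjustment concrete, here is a minimal sketch of the Benjamini-Hochberg procedure, the method the DESeq vignette names for padj. It is an illustration in plain Python, not DESeq's own code: the function name and the example p-values are hypothetical, and missing values are not handled. Each p-value is scaled by (number of tests / rank), and a running minimum taken from the largest p-value downwards keeps the adjusted values in the same order as the raw ones.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1)
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m / rank
        # Never let a smaller raw p-value get a larger adjusted value
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

    Selecting genes with an adjusted value below 0.05 then corresponds to accepting that, on average, up to about 5% of the selected genes are false discoveries.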



  • ishmael
    replied

    I think this is a good starting point.

    PS, I am Chinese, but I don't think statistics is as simple as Chinese :-)



  • mgogol
    replied
    I always thought this was a pretty good illustration:



  • ThePresident
    replied
    Many thanks. I'll review the subject in some statistics handbook for details.



  • dpryan
    replied
    You'll want to use the adjusted p-value. For the reason why, I would suggest that you review what a p-value is and why you should expect spurious findings with increasing numbers of tests (the various adjustments are aimed at addressing this).
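
    The growth in spurious findings with the number of tests can be illustrated with a little arithmetic. The sketch below assumes every gene is truly unchanged and that the tests are independent, both simplifications; the function names and the gene counts (including 4000 as a stand-in for a bacterial genome) are hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests with p < alpha when every null is true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more false positives among independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 10, 100, 4000):
    print(
        n,
        expected_false_positives(n),
        round(prob_at_least_one_false_positive(n), 4),
    )
```

    With a few thousand genes tested at an unadjusted p < 0.05, a couple of hundred would be expected to pass by chance alone even if nothing changed, which is why the adjusted p-value is the one to filter on.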



  • ThePresident
    started a topic DESeq: pval vs padj

    DESeq: pval vs padj

    Hello everybody,

    Statistics is not my strong suit, so I'm asking a basic question here. I've done some bacterial RNA-seq. I was able to count my reads with HTSeq and I've run some statistics with DESeq. Now I have a list of DE genes with their associated p-value and padj value. I know from the DESeq vignette that the padj value corresponds to the p-value adjusted for multiple testing using the Benjamini-Hochberg method. However, for me that's equivalent to Chinese, i.e. I'm not sure what it actually means.

    Which p-value should I consider when working with my data? The number of genes I would include in a qPCR validation study will change notably... I would also like to understand why I should consider one parameter or the other.

    Thanks in advance!
