Dear All:
I have recently been doing methodological comparisons among different tools for testing differential exon usage (DEU), and this is a general question about how we evaluate the various approaches using real and simulated datasets.
In the DEXSeq paper, the authors compared DEXSeq with cuffdiff in terms of type-I error control on a real RNA-Seq dataset: they made use of the fact that there are 4 replicates in the untreated condition, so a mock "2-2" comparison is possible. Ideally such a comparison should yield few genes with DEU, and indeed DEXSeq performs quite well here (based on my own data analysis; a small sketch of how I summarise such a mock comparison follows after the questions below). However, I still have the following questions:
1. DEXSeq and cuffdiff have different model assumptions, and according to Simon's post here, "...simulation only helps to compare two different tests developed for the same model assumption. To test which model is more appropriate you need real data." Does this mean that, for evaluating the type-I error control of methods with different model assumptions, no simulation should be used at all?
2. The other side of the problem, besides the type-I error rate, is the detection power of the methods. On real datasets I can get a list of genes with DEU from each method, see how many overlap, and check whether particular genes appear to be of biological relevance. However, I have not seen in the literature how this problem (picking up true positives) is addressed with simulated RNA-Seq datasets. Is it because it is hard to specify the simulation parameters so that the comparison is a "fair play" for all methods, or are we simply not supposed to use simulated datasets for this purpose? (A toy simulation sketch of what I have in mind is included further below.)
3. If simulated datasets turn out to be an option for questions 1 and 2, I wonder which RNA-Seq simulator is appropriate for comparing methods for differential exon usage.
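For reference, this is how I summarise the mock "2-2" comparison mentioned above. It is just a minimal Python sketch, not tied to any particular package; the results table and its "pvalue" column are placeholders for whatever each method exports:

```python
import numpy as np

def observed_type_i_error(pvalues, alpha=0.05):
    """Fraction of tests rejected at nominal level alpha.

    In a mock null comparison (e.g. 2 untreated vs 2 untreated
    replicates) this fraction should stay close to alpha if the
    method controls type-I error, and the p-value histogram
    should be roughly uniform.
    """
    p = np.asarray(pvalues, dtype=float)
    p = p[~np.isnan(p)]          # drop exons/genes that were not tested
    return float(np.mean(p < alpha))

# hypothetical usage, with p-values taken from a method's results table:
# print(observed_type_i_error(results["pvalue"], alpha=0.05))
```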
BTW, the methods I compare have different model assumptions (just like DEXSeq and cuffdiff).
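And this is the kind of toy simulation I have in mind for question 2: a minimal Python sketch assuming a simple negative-binomial count model, with one shifted exon per affected gene in condition B. All parameter values here are arbitrary, which is exactly my worry about making the comparison a "fair play":

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_exon_counts(n_genes=1000, exons_per_gene=8, n_reps=3,
                         frac_deu=0.1, fold_change=3.0,
                         dispersion=0.1, base_mean=50.0):
    """Toy exon-level count simulation with known DEU genes.

    A fraction `frac_deu` of genes get one exon whose mean is
    multiplied by `fold_change` in condition B only; all other
    exons share the same negative-binomial distribution in both
    conditions (var = mu + dispersion * mu^2). Returns the two
    count matrices and the indices of the true DEU genes, so TPR
    and FDR of each method can be computed against a known truth.
    """
    n_exons = n_genes * exons_per_gene
    mu_a = np.full(n_exons, base_mean)
    mu_b = mu_a.copy()

    true_deu = np.where(rng.random(n_genes) < frac_deu)[0]
    for g in true_deu:
        exon = g * exons_per_gene + rng.integers(exons_per_gene)
        mu_b[exon] *= fold_change          # shift one exon in condition B

    def nb_draw(mu):
        nb_size = 1.0 / dispersion         # NB 'size' parameter
        p = nb_size / (nb_size + mu)
        return rng.negative_binomial(nb_size, p[:, None],
                                     size=(len(mu), n_reps))

    return nb_draw(mu_a), nb_draw(mu_b), true_deu
```

Even in this very simple setting I already have to pick a dispersion, a fold change and a fraction of affected genes, and I suspect different choices will favour different methods.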
Thank you so much!