SEQanswers

07-11-2013, 09:15 AM   #1
alittleboy
Member
 
Location: USA

Join Date: Apr 2011
Posts: 60
benchmarking tools for evaluating methods for differential exon usage

Dear All:

I am currently doing a methodological comparison of different tools for testing differential exon usage (DEU), and I have a general question about how to evaluate the various approaches on real and simulated datasets.

In the DEXSeq paper, the authors compared DEXSeq with cuffdiff in terms of type-I error control using a real RNA-Seq dataset. They made use of the fact that there are 4 replicates in the untreated condition, so a "2 vs 2" mock comparison is possible: ideally few genes should be called with DEU, and in fact DEXSeq does perform quite well here (from my own data analysis). However, I still have the following questions:
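For concreteness, the mock-comparison bookkeeping I have in mind looks roughly like the Python sketch below; run_deu_test is just a placeholder for whichever DEU method is being evaluated, not a function from DEXSeq or cuffdiff.

Code:
# Mock "2 vs 2" comparison within the untreated condition.
# Assumptions (mine, not from the papers): `counts` is an exon-by-sample table
# for the four untreated replicates, and `run_deu_test` is a hypothetical
# wrapper around whichever DEU method is being evaluated; it should return one
# p-value per tested exon.
from itertools import combinations

import numpy as np

REPLICATES = ["untreated1", "untreated2", "untreated3", "untreated4"]
ALPHA = 0.05

def empirical_type_one_error(counts, run_deu_test):
    """Mean fraction of exons called significant over all 2-vs-2 null splits."""
    rates = []
    for group_a in combinations(REPLICATES, 2):
        group_b = tuple(r for r in REPLICATES if r not in group_a)
        if group_a > group_b:          # each partition appears twice; keep one
            continue
        pvalues = np.asarray(run_deu_test(counts, list(group_a), list(group_b)),
                             dtype=float)
        pvalues = pvalues[~np.isnan(pvalues)]   # drop exons that were not testable
        rates.append(np.mean(pvalues < ALPHA))
    return float(np.mean(rates))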

1. DEXSeq and cuffdiff have different model assumptions, and according to Simon's post here, "...simulation only helps to compare two different tests developed for the same model assumption. To test which model is more appropriate you need real data." Does this mean that, for evaluating the type-I error control of methods with different model assumptions, no simulation should be used?

2. The other side of the problem, besides the type-I error rate, is the detection power of the methods. On real datasets I can obtain a list of genes with DEU for each method, check how many overlap, and see whether any particular genes appear to be biologically relevant. However, I have not seen in the literature how this problem (recovering true positives) is addressed with simulated RNA-Seq datasets. Is it because it is hard to specify simulation parameters that are fair to all of the methods being compared, or are we simply not supposed to use simulated data for this purpose? (A small sketch of the bookkeeping I have in mind follows after question 3.)

3. If, for questions 1 and 2, simulated datasets turn out to be an option, which RNA-Seq simulator would be appropriate for comparing methods for differential exon usage?
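To make question 2 concrete: if a simulator did provide ground truth, I would summarise each method roughly as in the sketch below (all gene names and call sets are invented purely for illustration).

Code:
# Bookkeeping for question 2, assuming a simulator provides ground truth.
# All gene names and call sets below are made up purely for illustration.
def power_and_fdr(truth, calls):
    """truth/calls: sets of gene IDs (truly-DEU genes / genes a method flags)."""
    true_pos = len(calls & truth)
    false_pos = len(calls - truth)
    power = true_pos / len(truth) if truth else float("nan")
    fdr = false_pos / len(calls) if calls else 0.0
    return power, fdr

truth = {"geneA", "geneB", "geneC", "geneD"}          # simulated DEU genes
calls_by_method = {
    "method_1": {"geneA", "geneB", "geneX"},
    "method_2": {"geneA", "geneY"},
}
for method, calls in calls_by_method.items():
    power, fdr = power_and_fdr(truth, calls)
    print(f"{method}: power = {power:.2f}, FDR = {fdr:.2f}")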

BTW, the methods I am comparing have different model assumptions (just like DEXSeq and cuffdiff).

Thank you so much!
07-12-2013, 06:59 AM   #2
Wolfgang Huber
Senior Member
 
Location: Heidelberg, Germany

Join Date: Aug 2009
Posts: 109

Dear alittleboy

assessments on simulated data have their value (more on that below), but as a reader I would be unlikely to pay much attention to a benchmark study that relied only on simulated data and had no real-data assessment. A very good example of how to perform such a benchmark (at the time, for Affymetrix GeneChip gene-level differential expression) was Rafa Irizarry's Affycomp. The paper on their rationale and study design is worth reading for anyone embarking on benchmarking: http://bioinformatics.oxfordjournals.../22/7/789.full Perhaps its major limitation was that the data were still 'too clean' and did not have many of the defects that real data have (especially data from observational studies on non-lab animals, such as humans).

Re your point 1, I think that was quoting Simon a bit out of context; he was pointing out the all-importance of the simulation assumptions when using simulated data to assess a method. If you can convince your readers that the model assumptions you use for the data simulation are in fact the relevant ones, then the results from such a benchmark are useful. But arguably, if we understood the data-generating process that well, writing a method for detecting differential expression would be simple. The real problem is that we don't.
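Just as a toy illustration of how much the simulation assumptions matter (nothing specific to DEXSeq or cuffdiff): if counts are simulated with overdispersion but the test assumes Poisson variability, the test looks anti-conservative, whereas a benchmark built on Poisson simulations would never reveal this.

Code:
# Toy illustration: counts simulated with overdispersion (negative binomial)
# make a Poisson-based test anti-conservative, while the very same test looks
# fine on Poisson-simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_exons, mean, dispersion, alpha = 2000, 100.0, 0.1, 0.05

def null_rejection_rate(simulate):
    """Two samples per exon, no true difference; the test assumes Poisson counts."""
    x, y = simulate(), simulate()
    # Under equal Poisson means, x | (x + y) ~ Binomial(x + y, 0.5).
    pvals = np.array([stats.binomtest(int(xi), int(xi + yi), 0.5).pvalue
                      for xi, yi in zip(x, y)])
    return float(np.mean(pvals < alpha))

poisson = lambda: rng.poisson(mean, n_exons)
# Negative binomial parameterised so that variance = mean + dispersion * mean^2.
size = 1.0 / dispersion
nbinom = lambda: rng.negative_binomial(size, size / (size + mean), n_exons)

print("Poisson-simulated data: ", null_rejection_rate(poisson))  # close to 0.05
print("overdispersed (NB) data:", null_rejection_rate(nbinom))   # far above 0.05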

Just my 2 (euro)cents...

Best wishes
Wolfgang
__________________
Wolfgang Huber
EMBL
07-12-2013, 09:02 AM   #3
alittleboy
Member
 
Location: USA

Join Date: Apr 2011
Posts: 60

Quote:
Originally Posted by Wolfgang Huber View Post
(see reply #2 above)

Dear Wolfgang:

Thank you so much for your insights! Yes, I strongly agree that if we knew the data-generating truth, there would be a consensus on which DE testing tool is best.

Because I am currently comparing different methods for DEU on real datasets, I am very curious how I can tell whether one method is better than the others (or whether they are comparable). For example, with methods A and B:

1. In mock comparisons, A reports fewer genes with DEU than B, so A wins on false-positive control.
2. In proper comparisons, A reports more genes with DEU than B, so A seems to have more detection power (?)
3. If, in the proper comparison, A reports 100 genes with DEU, B reports 50, and the overlap is 25, what can I conclude from this result? (One way I thought of to quantify this is sketched below.)
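For point 3, one simple way to put numbers on the overlap is sketched below; the universe size is a made-up placeholder and would have to be the number of genes actually testable by both methods in the real analysis.

Code:
# Point 3: putting numbers on "A calls 100, B calls 50, overlap is 25".
# `universe` is a made-up placeholder; it should be the number of genes that
# were actually testable by both methods in the real analysis.
from scipy import stats

n_a, n_b, overlap = 100, 50, 25
universe = 8000

jaccard = overlap / (n_a + n_b - overlap)
# P(overlap >= 25) if B's 50 genes were drawn at random from the universe,
# given that 100 of those genes are on A's list (hypergeometric tail).
p_enrich = stats.hypergeom.sf(overlap - 1, universe, n_a, n_b)

print(f"Jaccard index of the two call sets: {jaccard:.2f}")
print(f"P(overlap >= {overlap} by chance): {p_enrich:.2e}")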


I also noticed that both methods detect a key gene; I guess this is because the treatment signal for that gene is so strong that every method can detect it?

Thanks again for your help!