Dear SEQanswers Community,
RNA-Seq is arguably the most complex next-gen data analysis we face. Unlike genome-based sequencing, RNA-Seq yields many different dimensions of data. Tools and algorithms are being released in the literature quickly, and at times it can be difficult to keep up, even though most of the packages are still aimed at genome-based sequencing.
I would like to put together a challenge for the bioinformatics community to find the most accurate method for mRNA-Seq analysis, modeled on the SEQanswers ChIP-Seq Challenge that many of us participated in.
There should be several categories including:
Transcript Assembly
Transcript Quantitation
Gene Quantitation
Differential Expression Testing
Since each pipeline will use the same dataset, it will be possible to compare sensitivity, accuracy, precision, FDR, etc.
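As a rough sketch of how submissions could be scored, assuming the simulation gives us the true set of differentially expressed genes, something like the following would do. The function name, gene IDs, and the set-based input format are all made up for illustration; the actual scoring rules are still open for discussion.

```python
def score_calls(truth, called, universe):
    """Score one pipeline's calls against the known (simulated) truth.

    truth    -- IDs that are truly positive (e.g. truly differentially expressed)
    called   -- IDs the pipeline reported as positive
    universe -- every ID that was tested
    All three are hypothetical inputs; any iterable of IDs works.
    """
    universe = set(universe)
    # Ignore IDs outside the tested universe so the counts stay consistent.
    truth = set(truth) & universe
    called = set(called) & universe

    tp = len(truth & called)             # true positives
    fp = len(called - truth)             # false positives
    fn = len(truth - called)             # false negatives
    tn = len(universe - truth - called)  # true negatives

    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "fdr": fp / (tp + fp) if (tp + fp) else nan,
        "accuracy": (tp + tn) / len(universe) if universe else nan,
    }


# Toy example: 10 genes, 4 truly changed, 5 called by the pipeline.
universe = [f"g{i}" for i in range(10)]
truth = {"g0", "g1", "g2", "g3"}
called = {"g0", "g1", "g2", "g4", "g5"}
scores = score_calls(truth, called, universe)
print(scores)
```

Because every pipeline is scored against the same truth set, the numbers are directly comparable between entries.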
It would be great if we could get industry to sponsor some awards in these categories.
There are, of course, several difficulties with this challenge, including the specifics of its design. For now, I am thinking of setting up simulated human datasets (50 million 2 x 36 bp or 2 x 75 bp paired-end reads), but I would like input from others on what they feel is most important in assessing analysis quality.
If you are interested in participating in this project, or have ideas/opinions on how to best design this challenge, please respond in this forum.
Best,
Steven Hart
University of Kansas Medical Center