**Richard Finney** · 02-03-2011, 02:10 PM

Best is the enemy of good enough.

Blat or homebrew_model or bowtietuxedocuffwhatever probably does the minimal that people need.

To make definitive statements about which is better, you need to compare the results to truth. Can you come up with a "truth set" to judge against? I think a synthetic set of input reads for the big test might have flaws.

**RockChalkJayhawk** · 02-03-2011, 03:32 PM

Richard,
Yes, it is possible to create a synthetic "truth dataset". I would love to apply these methods to real datasets, but as you pointed out, there is no way of knowing the truth there. One can, however, objectively assess performance metrics on known datasets.

Of course, no technique is perfect. But it would be advantageous for us to gauge how well our tools perform. How else will we learn about or address their weaknesses and make better programs? Obviously this is a huge problem that will take many of us to figure out, but we need to start somewhere if we ever want to move forward.
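To make "assessing performance metrics from a known dataset" concrete, here is a minimal Python sketch. It assumes the truth set and a tool's calls are both available as collections of gene (or transcript) identifiers drawn from one shared universe of tested features. The function names and the identifiers in the usage note are hypothetical, and "fdr" here means the empirical fraction of calls that are false, not a statistically estimated FDR.

```python
def confusion_counts(truth, called, universe):
    """Count TP, FP, FN, TN for a set of calls against a truth set.

    Assumes `truth` and `called` are both subsets of `universe`
    (all features that were tested).
    """
    truth, called, universe = set(truth), set(called), set(universe)
    tp = len(truth & called)             # true features that were called
    fp = len(called - truth)             # called, but not in the truth set
    fn = len(truth - called)             # in the truth set, but missed
    tn = len(universe - truth - called)  # correctly left uncalled
    return tp, fp, fn, tn


def performance(truth, called, universe):
    """Return sensitivity, precision, empirical FDR and accuracy."""
    tp, fp, fn, tn = confusion_counts(truth, called, universe)

    def ratio(num, den):
        # Avoid ZeroDivisionError when a tool makes no calls at all.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "fdr": ratio(fp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }
```

With ten tested genes `g1` to `g10`, a truth set of `g1` to `g4`, and calls `g1, g2, g3, g5, g6`, `performance` would report 3 true positives out of 4 true genes and 2 false calls out of 5. The same function can be run once per pipeline on the same simulated dataset to put the tools side by side.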

**steven** · 02-04-2011, 07:53 AM

Originally posted by RockChalkJayhawk:

Dear SEQanswers Community,

RNA-Seq is arguably the most complex next-gen data analysis we face. Unlike genome-based sequencing, RNA-Seq yields many different dimensions of data. Tools and algorithms are quickly being released in the literature, and at times it can be difficult to keep up, although most of the packages are related to genome-based sequencing.

I would like to put together a challenge to the bioinformatics community for the most accurate method for mRNA-Seq analysis, modeled on the SEQanswers ChIP-Seq Challenge that many of us participated in.

There should be several categories, including:

- Transcript Assembly
- Transcript Quantitation
- Gene Quantitation
- Differential Expression Testing

Since each pipeline will use the same dataset, it will be possible to compare sensitivity, accuracy, precision, FDR, etc.

It would be great if we could get Industry to support some awards in these categories.

There are of course several difficulties associated with this challenge, including the specifics of its design. As of now, I am thinking of setting up simulated human datasets (50 million 2 x 36 bp or 2 x 75 bp PE reads), but would like input from others on what they feel is most important in assessing analysis quality.

If you are interested in participating in this project, or have ideas/opinions on how to best design this challenge, please respond in this forum.

Best,

Steven Hart
University of Kansas Medical Center

I think that this already exists. Or at least a similar challenge: look for RGASP (RNAseq Genome Annotation Assessment Project).

**RockChalkJayhawk** · 02-04-2011, 07:59 AM

Thank you steven, I was not aware of this project!

**steven** · 02-04-2011, 08:02 AM

You are welcome, Steven!

**RockChalkJayhawk** · 02-04-2011, 08:04 AM

Any idea on what the initial results look like or when the data will be published?

**steven** · 02-04-2011, 08:42 AM

I heard that two as-yet-unpublished tools were exceptional:
- GEM: an incredibly fast and accurate read aligner, from Paolo Ribeca.
- The Flux Simulator/Flux Capacitor: an impressive RNA-seq analysis package for (alternative) transcript quantification, from Micha Sammeth.

Disclaimer: both are friends of mine.

**RockChalkJayhawk** · 02-04-2011, 08:58 AM

I have used FluxSimulator in the past. It is really great!

However, I am trying to find some performance metrics for each of these tools, much like the RGASP project you sent me is doing.

Unfortunately, most users are blindly using these tools because they do the "minimal that people need". Some like cufflinks/cuffdiff do so much extra stuff that they must be the best tools. I am more interested in finding out the strengths and weaknesses of each, rather than accepting the results through blind faith.

For example, running TopHat and/or Cufflinks with or without a reference GTF yields different transcript builds. Moreover, the differential statistics in cuffdiff leave me confused (because they are so complex). I can get a lot of "differential expression" between biological replicates (as high as 30% of the genes), which shouldn't happen, and indeed does not happen at the gene level when I count the reads and use other programs like DESeq (no genes DE). However, there are (to my knowledge) no transcript-level quantification tools that report estimated read counts. With so many tools out there, it is a good idea to start thinking about how we can gauge the performance of each one.

Again, this seems to be what the RGASP project is aiming for, and I look forward to their results.

**marcora** · 04-16-2011, 01:24 AM

Has anybody been successful in generating a synthetic "truth dataset" for RNA-seq? I am comparing cuffdiff to DESeq and I am getting very different results. Which one should I pick? I can't answer this question until the dataset mentioned above is available!

**urchgene** · 04-19-2011, 05:52 AM

Hi everyone, I am trying to do paired-end mapping using SHRiMP, but it requires that the + orientation and the - orientation of each read pair follow each other in the same file. Unfortunately, my reads are split so that one direction (--->) is in one file and the other direction (<---) is in another. Does anyone know of a script I can use to put these reads into a single file so that the two directions alternate? (Just a newbie, please.)
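A minimal Python sketch of such a script follows. It makes several assumptions: both inputs are plain (uncompressed) FASTQ files with exactly four lines per read, and the mates appear in the same order in both files. The function names and file names are hypothetical. Whether SHRiMP expects exactly this layout, or also needs one mate reverse-complemented, should be checked against its own documentation.

```python
import itertools


def read_records(handle, lines_per_record=4):
    """Yield successive records from an open file as lists of lines."""
    while True:
        record = [line.rstrip("\n") + "\n"
                  for line in itertools.islice(handle, lines_per_record)]
        if not record:
            return
        if len(record) != lines_per_record:
            raise ValueError("truncated record at end of file")
        yield record


def interleave(forward_path, reverse_path, out_path):
    """Write mate 1 then mate 2 of each pair to out_path; return pair count."""
    pairs = 0
    with open(forward_path) as fwd, open(reverse_path) as rev, \
            open(out_path, "w") as out:
        both = itertools.zip_longest(read_records(fwd), read_records(rev))
        for rec1, rec2 in both:
            # zip_longest pads the shorter input with None.
            if rec1 is None or rec2 is None:
                raise ValueError("input files contain different numbers of reads")
            out.writelines(rec1)
            out.writelines(rec2)
            pairs += 1
    return pairs
```

It could be called as `interleave("reads_1.fastq", "reads_2.fastq", "interleaved.fastq")`. The sketch pairs reads purely by position and does not compare read names, so it is worth spot-checking a few pairs in the output.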

**Jayu** · 01-22-2012, 08:41 AM

Can anyone tell me the pipeline for RNAseq analysis?

**Apexy** · 01-22-2012, 09:33 AM

Hi Jayu,
There is no single pipeline or tool for this, and the strategy to take will depend on the availability of a reference. To avoid suggesting a particular one without having this information, and without having tested everything that is available, I propose you read the following paper to get a feel for these approaches: Jeffrey A. Martin & Zhong Wang. Next-generation transcriptome assembly. Nature Reviews Genetics 12, 671-682 (October 2011) | doi:10.1038/nrg3068

**Jayu** · 01-22-2012, 10:40 PM

Thank you, but this paper is not freely available. Is there any other source or any other paper?

**Apexy** · 01-23-2012, 03:32 AM

Hello,

Sorry about that. I think this one is free: www.genome.org/cgi/doi/10.1101/gr.131383.111

I have no idea how to send the paper to you, and I wonder if it is acceptable to do that here given that it is not free.

HTH

RNA-Seq Analysis Challenge
