SEQanswers

Old 02-03-2011, 11:50 AM   #1
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
RNA-Seq Analysis Challenge

Dear SEQanswers Community,

RNA-Seq is arguably the most complex next-gen data analysis we face. Unlike genome-based sequencing, RNA-Seq yields many different dimensions of data. Tools and algorithms are being released quickly in the literature, and at times it can be difficult to keep up, even though most of the packages are aimed at genome-based sequencing.

I would like to put together a challenge to the bioinformatics community to find the most accurate method for mRNA-Seq analysis, modeled on the SEQanswers ChIP-Seq Challenge that many of us participated in.

There should be several categories including:
Transcript Assembly
Transcript Quantitation
Gene Quantitation
and Differential Expression Testing

Since each pipeline will use the same dataset, it will be possible to compare sensitivity, accuracy, precision, FDR, etc.
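
As a rough illustration of how such a comparison could be scored for the differential expression category, here is a minimal Python sketch. It assumes a simulated dataset where the truly DE genes are known. The function name and the gene IDs in the usage note are hypothetical and do not come from any existing challenge.

```python
def de_call_metrics(truth_de, called_de, all_genes):
    """Score one pipeline's DE calls against a known (simulated) truth set.

    Assumption: truth_de and called_de are both subsets of all_genes.
    """
    truth_de, called_de, all_genes = set(truth_de), set(called_de), set(all_genes)

    tp = len(truth_de & called_de)              # truly DE and called DE
    fp = len(called_de - truth_de)              # called DE but not truly DE
    fn = len(truth_de - called_de)              # truly DE but missed
    tn = len(all_genes - truth_de - called_de)  # correctly left uncalled

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a category is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "fdr": ratio(fp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }
```

Calling de_call_metrics(truth, calls, genes) once per pipeline on the same simulated dataset would give directly comparable numbers. Assembly and quantitation would need their own scoring, which this sketch does not attempt.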

It would be great if we could get Industry to support some awards in these categories.

There are of course several difficulties associated with this Challenge, including the specifics of its design. As of now, I am thinking of setting up simulated human datasets (50 million 2 x 36 bp or 2 x 75 bp paired-end reads), but would like input from others on what they feel is most important in assessing analysis quality.

If you are interested in participating in this project, or have ideas/opinions on how to best design this challenge, please respond in this forum.

Best,

Steven Hart
University of Kansas Medical Center
Old 02-03-2011, 01:10 PM   #2
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 687
Default

Best is the enemy of good enough.

Blat or homebrew_model or bowtietuxedocuffwhatever probably does the minimal that people need.

To make definitive statements about which is better, you need to compare the results to truth. Can you come up with a "truth set" to judge against? I think a synthetic set of input reads for the big test might have flaws.
Old 02-03-2011, 02:32 PM   #3
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Default

Richard,
Yes, it is possible to create a synthetic "truth dataset". I would love to apply these methods to real datasets as well, but as you pointed out, there is no way of knowing the truth there. With known datasets, however, one can objectively assess performance metrics.

Of course, no technique is perfect. But it would be advantageous for us to gauge how well our methods perform. How else will we learn about or address their weaknesses to make better programs? Obviously this is a huge problem that will take many of us to figure out, but we need to start somewhere if we ever want to move forward.
Old 02-04-2011, 06:53 AM   #4
steven
Senior Member
 
Location: Southern France

Join Date: Aug 2009
Posts: 269
Default

I think that this already exists. Or at least a similar challenge: look for RGASP (RNAseq Genome Annotation Assessment Project).
Old 02-04-2011, 06:59 AM   #5
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Default

Thank you steven, I was not aware of this project!
Old 02-04-2011, 07:02 AM   #6
steven
Senior Member
 
Location: Southern France

Join Date: Aug 2009
Posts: 269
Default

You are welcome, Steven!
Old 02-04-2011, 07:04 AM   #7
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Default

Any idea on what the initial results look like or when the data will be published?
Old 02-04-2011, 07:42 AM   #8
steven
Senior Member
 
Location: Southern France

Join Date: Aug 2009
Posts: 269
Default

I heard that two as-yet-unpublished tools were exceptional:
- GEM: an incredibly fast and accurate read aligner, from Paolo Ribeca.
- The Flux Simulator/Flux Capacitor: an impressive RNA-seq analysis package for (alternative) transcript quantification, from Micha Sammeth.
Disclaimer: both are friends of mine
Old 02-04-2011, 07:58 AM   #9
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Default

I have used FluxSimulator in the past. It is really great!

However, I am trying to find some performance metrics for each of these tools, much like the RGASP project you sent me is doing.

Unfortunately, most users are blindly using these tools because they do the "minimal that people need". Some, like Cufflinks/Cuffdiff, do so much extra that people assume they must be the best tools. I am more interested in finding out the strengths and weaknesses of each, rather than accepting the results on blind faith.

For example, running TopHat and/or Cufflinks with or without a reference GTF yields different transcript builds. Moreover, the differential statistics in Cuffdiff leave me confused (because they are so complex). I can get a lot of "differential expression" between biological replicates (as high as 30% of the genes), which shouldn't happen, and in fact does not happen at the gene level when I count the number of reads and use other programs like DESeq (no genes called DE). However, there are (to my knowledge) no transcript-level quantification tools that report estimated read counts. With so many tools out there, it is a good time to start thinking about how we can gauge the performance of each one.
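
The replicate-versus-replicate check described above can be sketched roughly in Python as follows. This is only a crude fold-change filter on library-size-scaled counts, not what DESeq or Cuffdiff actually compute. The function name, the thresholds (min_count, max_abs_log2fc) and the pseudocount of 1 are all assumptions chosen for illustration.

```python
import math


def replicate_discordance(counts_a, counts_b, min_count=10, max_abs_log2fc=1.0):
    """Fraction of expressed genes that differ strongly between two replicates.

    counts_a / counts_b: dicts mapping gene ID -> raw read count.
    Returns (fraction_flagged, sorted list of flagged gene IDs).
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each replicate needs at least one counted read")

    # Scale replicate B to the library size of replicate A.
    scale_b = total_a / total_b

    flagged, tested = [], 0
    for gene in sorted(counts_a.keys() & counts_b.keys()):
        a = counts_a[gene]
        b = counts_b[gene] * scale_b
        if max(a, b) < min_count:
            continue  # too few reads in both replicates to say anything
        tested += 1
        # A pseudocount of 1 avoids taking the log of zero.
        if abs(math.log2((a + 1) / (b + 1))) > max_abs_log2fc:
            flagged.append(gene)

    fraction = len(flagged) / tested if tested else float("nan")
    return fraction, flagged
```

Between true biological replicates one would expect the flagged fraction to be small. A large value hints at a problem with the data or the counting step before any DE test is run.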

Again, this seems to be what the RGASP project is aiming for and I look forward to their results.
Old 04-16-2011, 01:24 AM   #10
marcora
Member
 
Location: Pasadena, CA USA

Join Date: Jan 2010
Posts: 52
Default

Has anybody been successful in generating a synthetic "truth dataset" for RNA-Seq? I am comparing Cuffdiff to DESeq and I am getting very different results. Which one should I pick? I can't answer this question until the dataset mentioned above is available!
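
Until a truth set exists, one can at least quantify how much two tools disagree. Below is a minimal Python sketch that compares two lists of genes called DE. The function name and the gene IDs in the usage note are hypothetical, and the sketch says nothing about which tool is right.

```python
def compare_de_calls(calls_a, calls_b):
    """Summarize agreement between two sets of DE gene calls."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return {
        "shared": sorted(a & b),   # called DE by both tools
        "only_a": sorted(a - b),   # called DE only by the first tool
        "only_b": sorted(b - a),   # called DE only by the second tool
        # Jaccard index: shared calls divided by all calls made by either tool.
        "jaccard": len(a & b) / len(union) if union else float("nan"),
    }
```

For example, compare_de_calls(cuffdiff_genes, deseq_genes) returns the shared and tool-specific gene lists, which can then be inspected by hand.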
Old 04-19-2011, 05:52 AM   #11
urchgene
Member
 
Location: helsinki

Join Date: Oct 2010
Posts: 14
Default

Hi everyone. I am trying to do paired-end mapping using SHRiMP, but it requires that the + orientation and - orientation reads of each pair follow each other in the same file. Unfortunately, my reads are split so that the ---> direction is in one file and the <--- direction is in another. Does anyone know of a script I can use to write these reads into one file so that the two directions of each pair follow each other? (Just a newbie, please.)
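
One way to do this is to interleave the two files record by record. The Python sketch below assumes standard four-line FASTQ records and that both files list the pairs in the same order. Whether SHRiMP expects exactly this layout should be checked against its documentation. All function names here are hypothetical.

```python
import gzip
from itertools import islice


def open_maybe_gzip(path, mode="rt"):
    """Open a plain or gzip-compressed text file based on its extension."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def read_fastq(handle):
    """Yield FASTQ records as lists of four newline-terminated lines."""
    while True:
        record = [line.rstrip("\r\n") + "\n" for line in islice(handle, 4)]
        if not any(line.strip() for line in record):
            return  # end of file (or only trailing blank lines)
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record near: %r" % record[:1])
        yield record


def interleave_fastq(forward_path, reverse_path, out_path):
    """Write forward/reverse mates alternately; return the number of pairs."""
    pairs = 0
    with open_maybe_gzip(forward_path) as fwd, \
            open_maybe_gzip(reverse_path) as rev, \
            open_maybe_gzip(out_path, "wt") as out:
        reverse_records = read_fastq(rev)
        for forward_record in read_fastq(fwd):
            reverse_record = next(reverse_records, None)
            if reverse_record is None:
                raise ValueError("reverse file has fewer reads than forward file")
            out.writelines(forward_record)
            out.writelines(reverse_record)
            pairs += 1
        if next(reverse_records, None) is not None:
            raise ValueError("reverse file has more reads than forward file")
    return pairs
```

A call such as interleave_fastq("reads_1.fastq", "reads_2.fastq", "reads_interleaved.fastq") (file names made up) would write mate 1 and mate 2 of each pair next to each other. The sketch does not check that read names match, so it relies on the two input files being in the same order.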
Old 01-22-2012, 07:41 AM   #12
Jayu
Member
 
Location: Ahmedabad

Join Date: Mar 2011
Posts: 14
Default

Can anyone tell me the pipeline for RNAseq analysis?
Old 01-22-2012, 08:33 AM   #13
Apexy
Member
 
Location: Africa

Join Date: Apr 2011
Posts: 62
Default

Hi Jayu,
There is no single pipeline or tool to do this, and the strategy to take will depend on the availability of a reference. To avoid suggesting a particular one without having this information, and without having tested all that is available, I suggest you read the following paper to get a feel for these approaches: Jeffrey A. Martin & Zhong Wang. Next-generation transcriptome assembly. Nature Reviews Genetics 12, 671-682 (October 2011) | doi:10.1038/nrg3068
Old 01-22-2012, 09:40 PM   #14
Jayu
Member
 
Location: Ahmedabad

Join Date: Mar 2011
Posts: 14
Default

Thank you, but this paper is not freely available. Is there any other source or any other paper?
Old 01-23-2012, 02:32 AM   #15
Apexy
Member
 
Location: Africa

Join Date: Apr 2011
Posts: 62
Default

Hello,

Sorry about that. I think this one is free: www.genome.org/cgi/doi/10.1101/gr.131383.111. I have no idea how to send the first paper to you, and I wonder if it is acceptable to do that here given that it is not free.

HTH
Old 01-23-2012, 08:46 AM   #16
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

Jayu - whenever I cannot get access to a paper, I usually just write to the authors and they are more than happy to share a copy with me.

Don't be shy, you should try it!
Old 01-27-2012, 10:01 PM   #17
Jayu
Member
 
Location: Ahmedabad

Join Date: Mar 2011
Posts: 14
Default

Thank you for the information, it was very helpful!