SEQanswers

Old 11-28-2012, 04:01 AM   #1
bodhisattvax
Member
 
Location: Cambridge

Join Date: Nov 2011
Posts: 12
Survey: RNA-Seq analysis for Differential Gene/Transcript Expression

Hi all,

I am looking to build a 'standard' RNA-seq data analysis pipeline for analysing differential gene and possibly transcript expression.

I am aware that there are a variety of tools out there for the various steps (alignment, counting, differential expression), each with their respective pros and cons, cheerleaders and dissers.

So I have created a (short) survey which I think could be useful to all of us, to try and see if we are moving towards some consensus about the preferred methodology for each of the steps.

The survey is at
http://www.surveymonkey.com/s/72953N9
I would be very grateful if you could fill it out: it should only take a few minutes of your time.

You may prefer to respond within this thread itself but, being an optimistic soul, I'm hoping that I get so many responses that I will need to use the results analysis tools on SurveyMonkey!

Of course, I will make the results available either here or on request.

Thanks in advance.
Old 11-28-2012, 03:39 PM   #2
Nicolas
Member
 
Location: new york city

Join Date: Apr 2009
Posts: 40
Default

Looking forward to seeing the results.
It may be too late, but what would also be valuable would be to survey the source of annotations people are using (RefSeq, Gencode, and so on). There are good reasons to use one or the other, but I would be curious to see the results.
Old 11-28-2012, 11:24 PM   #3
bodhisattvax
Member
 
Location: Cambridge

Join Date: Nov 2011
Posts: 12
Default

Thanks to everyone who has responded so far!
Please keep responding ....
Nicolas - that is a great suggestion and I've included your question in the survey. Hopefully it is not too late and lots more people will answer the survey.
Old 11-28-2012, 11:54 PM   #4
syfo
Just a member
 
Location: Southern EU

Join Date: Nov 2012
Posts: 103
Default

Quote:
Originally Posted by Nicolas
Looking forward to seeing the results.
Me too!

Quote:
Originally Posted by Nicolas
It may be too late, but what would also be valuable would be to survey the source of annotations people are using (RefSeq, Gencode, and so on). There are good reasons to use one or the other, but I would be curious to see the results.
"Custom" annotations would have been nice too (import your own).
Old 11-29-2012, 10:06 PM   #5
bodhisattvax
Member
 
Location: Cambridge

Join Date: Nov 2011
Posts: 12
Default

Thanks to everyone who has answered the survey so far - already some trends are becoming evident!

However, for the results to be accurate and representative we need more respondents. So I urge anyone who hasn't yet answered the survey to please do so - it really is a very short survey!
Old 11-29-2012, 11:40 PM   #6
RickBioinf
Member
 
Location: Leiden, The Netherlands

Join Date: Sep 2012
Posts: 28
Default

For more respondents maybe also try: http://www.reddit.com/r/bioinformatics
Old 11-30-2012, 02:00 AM   #7
bodhisattvax
Member
 
Location: Cambridge

Join Date: Nov 2011
Posts: 12
Default

Thanks Rick!
Have also put it on BioStar.
Old 12-03-2012, 01:01 AM   #8
bodhisattvax
Member
 
Location: Cambridge

Join Date: Nov 2011
Posts: 12
Default

A quick update:
Again, thanks for all the responses so far.

I think I'm pretty satisfied with the number of responses and will start to collate the results and generate a report which I will share with everyone.

This is less trivial than I had initially thought, as there doesn't seem to be an easy way to get the responses off SurveyMonkey without paying them for it. But I hope to have all this done over the next 2-3 days.

Meanwhile, if anyone else would like to complete the survey please feel free to do so!
Cheers
Old 12-03-2012, 02:26 PM   #9
ramma
Member
 
Location: Washington

Join Date: Jun 2012
Posts: 16
Default

Great idea! Building a 'standard' pipeline is an idea I've toyed with for a while myself. The furthest I've made it is writing sets of scripts that work for all types of data my lab receives. Simply execute the few scripts in order, and most everything is taken care of. The differential analysis part still needs to be implemented, but it's fairly easy to do as is.
Old 12-04-2012, 10:00 AM   #10
bodhisattvax
Member
 
Location: Cambridge

Join Date: Nov 2011
Posts: 12
The results!!!!!

Hi all

I've finally put together the results of the survey!

First of all, thanks to everyone who participated - the response has been great, with 93 people completing the survey as of today.

The respondents have been a varied bunch, including all levels of academia (pre-docs, grad students, post-docs and PIs), core bioinformaticians and bioinformatics managers, as well as many from industry. The majority of respondents appear to be based in the US and Europe, but some are also in China, Korea and Australia.

I provide below my own summary of the survey's findings, and I attach a document which contains all the results, including all unedited comments. As with any survey, we should probably be aware of potential biases (e.g. skews caused by people who are really annoyed with a particular tool!).

My inferences below are probably influenced by my own experiences, so feel free to rap my knuckles if you feel I am over-reaching in my inferences or have misinterpreted the data, and to air your doubts about the veracity and accuracy of the results and conclusions. I'd also like to declare here that I have no vested interests, have nothing to gain by promoting one tool over another, and have personally only used a small number of all the tools listed.

Now for the summary. Enjoy!

One of the take-home messages from the survey appears to be that the shadow of the Tuxedo Suite still looms large over the RNA-Seq analysis field. However there is a wide diversity of opinions and experiences, and many other tools appear to be in the ascendancy, especially when it comes to read-counting and differential expression analysis.

Q1. What do you prefer to align your reads to?

Most respondents align to the genome only (47.3%), closely followed by those who align to both genome and transcriptome (39.8%). Their choices depend on the availability and reliability of data, as well as the question being asked in the experiment. Respondents who align to the genome only do so for various reasons, such as the ability to discover new transcripts and splice variants. However, many respondents commented that aligning to both the genome and transcriptome offers several advantages, such as increased speed and accuracy. Thus, for a species where both a reliable genome and transcriptome are available, this might be the optimal way forward.

Q2 and 3. What is your preferred aligner? And the reasons why.

TopHat rules the roost here, taking more than two-thirds of the vote (67.9%). Reasons for this include its ease of use, proven accuracy (which has improved over time), historical popularity, and the fact that the available alternatives have not yet warranted a change from TopHat. Another Tuxedo suite aligner, Bowtie, comes in a distant second (17.3%). STAR (6.2%) has been noted for its speed.

Q4 and 5. What is your preferred read-counting methodology? And the reasons why.

Again, a Tuxedo suite tool, Cufflinks, took the majority of votes (57.1%). Reasons for this included its ease of use, but many respondents appear to use it because it is the logical follow-on from TopHat in the Tuxedo workflow. The second-placed HTSeq-count appears to be in the ascendancy: many respondents appear to have been dissatisfied with Cufflinks and switched to HTSeq-count. It looks to be a good candidate to topple Cufflinks from the top in the near future. Other notable tools include easyRNASeq and RSEM. Also, many respondents use bedtools, samtools or in-house tools and custom scripts.

Q6 and 7. What is your preferred methodology to estimate differential expression? And the reasons why.

Finally, a non-Tuxedo suite tool wins the vote: DESeq/DEXSeq with 44.7%. CuffDiff is not too far behind (35.5%) and edgeR (19.7%) brings up the rear. Going by the comments, we might expect usage of DESeq and edgeR to increase at the expense of CuffDiff. Results from the latter have been variously described as weird, untrustworthy, having too many false positives and other problems.

Q8. Which annotation resource do you use?

Ensembl (46.6%) was the clear winner. Second and third places were closely contested between RefSeq (25.9%) and UCSC (22.4%) respectively.

Q9. What software do you use for downstream analyses?

GOSeq (68.9%) is clearly very widely used. Many respondents also use the commercial options of Ingenuity IPA and GeneGo MetaCore. DAVID also received an honourable mention.

P.S. Please note: the percentages quoted relate to the number of people who answered that particular question. This varies widely across questions, from all 93 respondents for the first question to 45 for Q9. Please see the attached file for all details.
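
Because the denominators differ between questions, it can help to turn the quoted percentages back into approximate head counts. The following is only a rough sketch: the helper name `approx_count` is made up for illustration, and it uses just the two denominators stated above (93 for Q1, 45 for Q9). The counts are rounded estimates, not figures taken from the attached PDF.

```python
def approx_count(percent, n_respondents):
    """Estimate how many respondents a quoted percentage corresponds to."""
    return round(percent / 100 * n_respondents)

# Denominators stated in the post: 93 answered Q1, 45 answered Q9.
Q1_TOTAL = 93
Q9_TOTAL = 45

counts = {
    "Q1 genome only": approx_count(47.3, Q1_TOTAL),
    "Q1 genome + transcriptome": approx_count(39.8, Q1_TOTAL),
    "Q9 GOSeq": approx_count(68.9, Q9_TOTAL),
}

for label, n in counts.items():
    print(f"{label}: ~{n} respondents")
```

The other questions cannot be converted this way without their own respondent totals, which are only given in the attached file.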
Attached Files
File Type: pdf RNA-Seq survey.pdf (325.0 KB, 191 views)

Last edited by bodhisattvax; 12-04-2012 at 10:32 AM.
Old 12-04-2012, 10:38 AM   #11
ramma
Member
 
Location: Washington

Join Date: Jun 2012
Posts: 16
Default

Thanks for posting the responses. I'm glad you left all the comments in too.
Old 12-04-2012, 10:49 AM   #12
Nicolas
Member
 
Location: new york city

Join Date: Apr 2009
Posts: 40
Default

I'd love to know how many people replied Cufflinks for quantif and DESeq or edgeR for DE analysis...
Old 12-04-2012, 11:28 AM   #13
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

Quote:
Originally Posted by Nicolas
I'd love to know how many people replied Cufflinks for quantif and DESeq or edgeR for DE analysis...
Might be a good way to QC the survey.
Old 12-05-2012, 12:47 AM   #14
bodhisattvax
Member
 
Location: Cambridge

Join Date: Nov 2011
Posts: 12
Default

You're welcome ramma!

Nicolas - that's a good question; good enough for me to answer despite having to go through the data manually :-)

So after doing a quick, rough count, it looks like there were ~38 people who use Cufflinks for read quantification and provided a specific answer for DE methods. Of these, ~14 used DESeq/edgeR; the majority of the rest used CuffDiff.

Interestingly I found at least two examples of people using HTSeq-count and then CuffDiff!
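
For anyone wanting to repeat this kind of cross-check without counting by hand, a cross-tabulation of the two answers per respondent is enough. This is a minimal sketch only: the rows below are hypothetical toy data, not the actual survey responses, and the names `responses` and `cross_tabulate` are invented for the example.

```python
from collections import Counter

# Hypothetical toy rows (NOT the real survey responses): one
# (read-counting tool, DE tool) pair per respondent.
responses = [
    ("Cufflinks", "CuffDiff"),
    ("Cufflinks", "DESeq"),
    ("HTSeq-count", "DESeq"),
    ("HTSeq-count", "edgeR"),
    ("Cufflinks", "CuffDiff"),
    ("HTSeq-count", "CuffDiff"),
]

def cross_tabulate(rows):
    """Count how often each (counting tool, DE tool) pair occurs."""
    return Counter(rows)

table = cross_tabulate(responses)

# Respondents who quantify with Cufflinks, and the subset of those
# who then use something other than CuffDiff for DE analysis.
cufflinks_total = sum(
    n for (quant, _de), n in table.items() if quant == "Cufflinks"
)
cufflinks_non_cuffdiff = sum(
    n for (quant, de), n in table.items()
    if quant == "Cufflinks" and de != "CuffDiff"
)

print(f"Cufflinks users: {cufflinks_total}")
print(f"...of which not using CuffDiff: {cufflinks_non_cuffdiff}")
```

With the real export in place of the toy rows, the same two sums would give the equivalent of the ~38 and ~14 figures above.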
Old 06-12-2013, 09:06 AM   #15
jian_gao
Junior Member
 
Location: ucdavis

Join Date: Oct 2012
Posts: 1
Default

It is a good idea.
All times are GMT -8.