  • Survey: RNA-Seq analysis for Differential Gene/Transcript Expression

    Hi all,

    I am looking to build a 'standard' RNA-seq data analysis pipeline for analysing differential gene and possibly transcript expression.

    I am aware that there are a variety of tools out there for the various steps (alignment, counting, differential expression), each with their respective pros and cons, cheerleaders and dissers.

    So I have created a (short) survey which I think could be useful to all of us, to try and see if we are moving towards some consensus about the preferred methodology for each of the steps.

    The survey is hosted on SurveyMonkey.

    I would be very grateful if you could fill it out; it should only take a few minutes of your time.

    You may prefer to respond within this thread itself, but being an optimistic soul, I'm hoping that I get so many responses that I will need to use the results analysis tools on SurveyMonkey!

    Of course, I will make the results available either here or on request.

    Thanks in advance.

  • #2
    Looking forward to seeing the results.
    It may be too late, but what would also be valuable would be to survey the source of annotations people are using (RefSeq, Gencode, and so on). There are good reasons to use one or the other, but I would be curious to see the results.



    • #3
      Thanks to everyone who has responded so far!
      Please keep responding ....
      Nicolas - that is a great suggestion and I've included your question in the survey. Hopefully it is not too late and lots more people will answer the survey.



      • #4
        Originally posted by Nicolas
        Looking forward to seeing the results.
        Me too!

        Originally posted by Nicolas
        It may be too late, but what would also be valuable would be to survey the source of annotations people are using (RefSeq, Gencode, and so on). There are good reasons to use one or the other, but I would be curious to see the results.
        "Custom" annotations would have been nice too (import your own).



        • #5
          Thanks to everyone who has answered the survey so far - already some trends are becoming evident!

          However, for the results to be accurate and representative we need more respondents. So I urge anyone who hasn't yet answered the survey to please do so - it really is a very short survey!



          • #6
            For more respondents maybe also try: http://www.reddit.com/r/bioinformatics



            • #7
              Thanks Rick!
              Have also put it on BioStar.



              • #8
                A quick update:
                Again, thanks for all the responses so far.

                I think I'm pretty satisfied with the number of responses and will start to collate the results and generate a report which I will share with everyone.

                This is less trivial than I had initially thought, as there doesn't seem to be an easy way to export the responses from SurveyMonkey without paying for it. But I hope to have all this done over the next 2-3 days.

                Meanwhile, if anyone else would like to complete the survey please feel free to do so!
                Cheers



                • #9
                  Great idea! Building a 'standard' pipeline is an idea I've toyed with for a while myself. The furthest I've got is writing sets of scripts that work for all the types of data my lab receives. Simply execute the few scripts in order, and almost everything is taken care of. The differential analysis part still needs to be implemented, but it's fairly easy to do as is.



                  • #10
                    The results!!!!!

                    Hi all

                    I've finally put together the results of the survey!

                    First of all, thanks to everyone who participated - the response has been great, with 93 people completing the survey as of today.

                    The respondents have been a varied bunch, including all levels of academia (pre-docs, grad students, post-docs and PIs), core bioinformaticians and bioinformatics managers, as well as many from industry. Most respondents appear to be based in the US and Europe, with others in China, Korea and Australia.

                    I provide below my own summary of the survey's findings, and I attach a document which contains all the results, including all unedited comments. As with any survey, we should probably be aware of potential biases (e.g. skews caused by people who are really annoyed with a particular tool!).

                    My inferences below are probably influenced by my own experiences, so feel free to rap my knuckles if you feel I am over-reaching in my inferences or have misinterpreted the data, and to air your doubts about the veracity and accuracy of the results and conclusions. I'd also like to declare here that I have no vested interests, have nothing to gain by promoting one tool over another, and have personally used only a small number of the tools listed.

                    Now for the summary. Enjoy!

                    One of the take-home messages from the survey appears to be that the shadow of the Tuxedo Suite still looms large over the RNA-Seq analysis field. However, there is a wide diversity of opinions and experiences, and many other tools appear to be in the ascendancy, especially for read counting and differential expression analysis.

                    Q1. What do you prefer to align your reads to?

                    Most respondents align to the genome only (47.3%), closely followed by those who align to both the genome and the transcriptome (39.8%). Key to their choices were the availability and reliability of reference data, as well as the question being asked in the experiment. Respondents who align to the genome only appear to do so for various reasons, such as the ability to discover new transcripts and splice variants. However, many respondents commented that aligning to both the genome and the transcriptome offers several advantages, such as increased speed and accuracy. Thus, for a species where both a reliable genome and a reliable transcriptome are available, this might be the optimal way forward.

                    Q2 and 3. What is your preferred aligner? And the reasons why.

                    TopHat rules the roost here, taking more than two-thirds of the vote (67.9%). Reasons for this include its ease of use, proven accuracy (which has improved over time), historical popularity, and the fact that the available alternatives have not yet warranted a switch away from TopHat. Another Tuxedo suite aligner, Bowtie, comes in a distant second (17.3%). STAR (6.2%) has been noted for its speed.

                    Q4 and 5. What is your preferred read-counting methodology? And the reasons why.

                    Again, a Tuxedo suite tool, Cufflinks, took the majority of votes (57.1%). Reasons included its ease of use, but many respondents appear to use it simply because it is the logical follow-on from TopHat in the Tuxedo workflow. The second-placed HTSeq-count appears to be in the ascendancy: many respondents appear to have been dissatisfied with Cufflinks and switched to HTSeq-count. It looks like a good candidate to topple Cufflinks from the top in the near future. Other notable tools include easyRNASeq and RSEM. Also, many respondents use bedtools, samtools or in-house tools and custom scripts.

                    Q6 and 7. What is your preferred methodology to estimate differential expression? And the reasons why.

                    Finally, a non-Tuxedo suite tool wins the vote: DESeq/DEXSeq with 44.7%. CuffDiff is not too far behind (35.5%) and edgeR (19.7%) brings up the rear. Going by the comments, we might expect usage of DESeq and edgeR to increase relative to CuffDiff, whose results have been variously described as weird, untrustworthy, and prone to too many false positives, among other problems.

                    Q8. Which annotation resource do you use?

                    Ensembl (46.6%) was the clear winner. Second and third places were closely contested between RefSeq (25.9%) and UCSC (22.4%), respectively.

                    Q9. What software do you use for downstream analyses?

                    GOSeq (68.9%) is clearly very widely used. Many respondents also use the commercial options Ingenuity IPA and GeneGo MetaCore. DAVID also earned an honourable mention.

                    P.S. Please note: the percentages quoted relate to the number of people who answered that particular question. This varies widely across questions, from all 93 respondents for Q1 to 45 for Q9. Please see the attached file for full details.
                    Attached Files
                    Last edited by bodhisattvax; 12-04-2012, 11:32 AM.
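Because each percentage is relative to the number of people who answered that particular question, the same percentage can stand for quite different head counts. A minimal Python sketch that back-calculates approximate counts from the quoted percentages, using only the two denominators stated in the P.S. (93 for Q1, 45 for Q9); the helper name `approx_count` is made up for illustration, and since the percentages are rounded, the results are approximations rather than the raw survey tallies.

```python
def approx_count(percent: float, respondents: int) -> int:
    """Back out an approximate head count from a rounded percentage."""
    return round(percent / 100 * respondents)


# Percentages as quoted in the summary; denominators from the P.S.
# (only Q1 and Q9 have their denominators stated explicitly).
reported = {
    "Q1: genome only": (47.3, 93),
    "Q1: genome + transcriptome": (39.8, 93),
    "Q9: GOSeq": (68.9, 45),
}

for label, (pct, n) in reported.items():
    print(f"{label}: {pct}% of {n} respondents ~ {approx_count(pct, n)} people")
```

A percentage near 69% on Q9 therefore corresponds to roughly 31 people, fewer than the roughly 44 behind the 47.3% on Q1.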



                    • #11
                      Thanks for posting the responses. I'm glad you left all the comments in too.



                      • #12
                        I'd love to know how many people replied Cufflinks for quantification and DESeq or edgeR for DE analysis...



                        • #13
                          Originally posted by Nicolas
                          I'd love to know how many people replied Cufflinks for quantification and DESeq or edgeR for DE analysis...
                          Might be a good way to QC the survey.



                          • #14
                            You're welcome ramma!

                            Nicolas - that's a good question; good enough for me to answer despite having to go through the data manually :-)

                            So after doing a quick, rough count, it looks like there were ~38 people who use Cufflinks for read quant and provided a specific answer for DE methods. Of these, ~14 used DESeq/EdgeR; the majority of the rest: CuffDiff.

                            Interestingly I found at least two examples of people using HTSeq-count and then CuffDiff!
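The cross-check Nicolas suggested (who quantified with Cufflinks but then ran DESeq or edgeR?) is a simple cross-tabulation of two survey columns. A minimal Python sketch, assuming the responses have been exported to a CSV file with one row per respondent; the column names `quant` and `de`, the helper `crosstab`, and the toy rows are hypothetical, made up for illustration, and are not the real survey data.

```python
import csv
import io
from collections import Counter


def crosstab(rows, row_key, col_key):
    """Tally (row_key answer, col_key answer) pairs, skipping rows where either is blank."""
    table = Counter()
    for row in rows:
        a = (row.get(row_key) or "").strip()
        b = (row.get(col_key) or "").strip()
        if a and b:
            table[(a, b)] += 1
    return table


# Toy export with hypothetical column names; NOT the real survey responses.
toy_csv = """respondent,quant,de
1,Cufflinks,CuffDiff
2,Cufflinks,DESeq/DEXSeq
3,Cufflinks,edgeR
4,HTSeq-count,CuffDiff
5,Cufflinks,
"""

rows = list(csv.DictReader(io.StringIO(toy_csv)))
table = crosstab(rows, "quant", "de")

# Cufflinks users who gave a specific DE answer, and those who chose DESeq/edgeR.
cufflinks_total = sum(n for (q, _), n in table.items() if q == "Cufflinks")
cufflinks_non_cuffdiff = sum(
    n for (q, d), n in table.items()
    if q == "Cufflinks" and d in {"DESeq/DEXSeq", "edgeR"}
)
print(cufflinks_total, cufflinks_non_cuffdiff)  # 3 2
```

Counting this way would replace the rough manual count, and the same table also surfaces unusual combinations such as HTSeq-count followed by CuffDiff.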



                            • #15
                              It is a good idea. Cool!
