  • Being "good" at analysis of nextgen sequencing data

    I am doing various types of analysis of Nextgen sequencing data. Basically I am using Excel to compare pieces of data between Nextgen data and other sources, such as data I've found online.

    It feels like I am going about the analysis really inefficiently and wasting a lot of time.

    Often the data from online is really disorganized, and a lot of time is wasted in standardizing this data.

    Also, the questions I am trying to answer are somewhat vague so time is wasted because I am not even always sure what question I am trying to answer.

    I don't really think I can get "good at bioinformatics" from this forum, and I think if I haven't developed the ways to do it well myself then I'm pretty much screwed.

    However I am just curious how others use Excel for Nextgen data analysis. Does knowing Excel VBA help? Are there any other free programs or other coding languages? Maybe it's just an art-form and I don't have it. I only have a Master's and it seems like thinking at a PhD level would be really helpful.

  • #2
    The first rule of bioinformatics is, "Don't use Excel." The second rule of bioinformatics is, "Don't use Excel."

    Excel is a terrible platform for data munging (i.e., taking disorganized files and changing them into a standard machine-processable format) and analysis; you're just making your life difficult by using it.

    If you really want to get good at data analysis, then the tools you'll need will be the command line, Python (or Perl) and R. Once you come to grips with those you'll be able to do much more in a more efficient way.
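
To give a feel for what "data munging" looks like outside Excel, here is a minimal sketch in Python using only the standard library. The messy table, the column names, and the list of missing-value markers are all made up for illustration, and upper-casing gene symbols is an assumption that suits human symbols but not every organism.

```python
import csv
import io

# Hypothetical messy export: inconsistent header case, stray spaces,
# mixed-case gene symbols, and "NA"/empty cells for missing values.
RAW = """ Gene Name ,Sample A, sample_b
 tp53 ,12.5,NA
BRCA1, 3.0 ,4.25
 myc,,7
"""

# Assumed markers for missing data; extend to match your own files.
MISSING = {"", "na", "n/a", "null"}


def clean_header(name):
    """Lower-case a column name and replace runs of spaces with underscores."""
    return "_".join(name.strip().lower().split())


def clean_value(text):
    """Return a float, or None when the cell is empty or marked missing."""
    text = text.strip()
    if text.lower() in MISSING:
        return None
    return float(text)


def standardize(raw_text):
    """Parse messy CSV text into a list of dicts with uniform keys and types."""
    rows = list(csv.reader(io.StringIO(raw_text)))
    header = [clean_header(h) for h in rows[0]]
    records = []
    for row in rows[1:]:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        gene = row[0].strip().upper()  # assumption: symbols should be upper case
        values = [clean_value(cell) for cell in row[1:]]
        records.append({header[0]: gene, **dict(zip(header[1:], values))})
    return records


if __name__ == "__main__":
    for record in standardize(RAW):
        print(record)
```

Once the cleanup lives in a function like `standardize`, the same rules get applied identically to every file, which is the part that is hard to guarantee when fixing cells by hand.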


    • #3

      (https://mobile.twitter.com/tim_yates...7709504513?p=v)
      Last edited by blancha; 08-07-2015, 03:46 AM. Reason: Added source of image


      • #4
        I can learn those things.

        It really worries me that my employer is expecting me to answer bioinformatics questions only with Excel. It seems like they have set me up to fail.


        • #5
          You don't need a PhD, but a Coursera course or two, or a few Safari books on the command line, R, and/or scripting languages, could help.


          • #6
            I kind of fell into this computation stuff about a year ago with no formal background in computer science, so I understand where you're coming from.

            First, as emphasized above, ditch Excel for data exploration. It's simply not powerful enough for most tasks. Take some time to get familiar with R. If you're mostly working with Excel-style tables, it's a very easy language to pick up. The RStudio IDE (rstudio.com) also makes the transition much easier: it shows the variables you have defined (you can click the view icon to inspect stored matrices/data frames), displays generated plots, and includes a text editor for writing code. Because your tasks can be scripted, repeating them becomes incredibly fast (much faster than flipping between 10 different Excel files), and scripting also reduces the chance of human error. Now, admittedly, if the data you download comes in some unorganized format (likely put together by someone who does not do computational biology), Excel can be faster for reorganizing it, but once it's in a standard rows/columns format, R is superior. Note that other languages can work equally well, but I think R has the easiest learning curve.
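
As a rough illustration of scripting a comparison instead of flipping between spreadsheets, here is a small sketch. It is written in Python rather than R (other languages work equally well for this), and the two tables, their column names, and the "up"/"down" labels are hypothetical stand-ins for your own results and a table found online.

```python
import csv
import io

# Hypothetical inputs; in practice these would be files on disk.
MY_RESULTS = """gene,log2_fold_change
TP53,1.8
MYC,-0.4
BRCA1,2.1
"""

PUBLISHED = """gene,reported_direction
TP53,up
BRCA1,down
EGFR,up
"""


def read_table(text, key):
    """Read CSV text into a dict of rows keyed by the given column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def compare(mine, published):
    """Join two tables on gene and flag whether the directions agree."""
    merged = []
    for gene in sorted(mine.keys() & published.keys()):
        lfc = float(mine[gene]["log2_fold_change"])
        observed = "up" if lfc > 0 else "down"
        reported = published[gene]["reported_direction"]
        merged.append((gene, lfc, reported, observed == reported))
    return merged


if __name__ == "__main__":
    mine = read_table(MY_RESULTS, "gene")
    published = read_table(PUBLISHED, "gene")
    for gene, lfc, reported, agrees in compare(mine, published):
        status = "agree" if agrees else "disagree"
        print(f"{gene}\t{lfc}\t{reported}\t{status}")
    # Genes missing from the published table are worth listing explicitly.
    print("only in my data:", sorted(mine.keys() - published.keys()))
```

Rerunning this on a new download is one command, and the join is done on the gene name itself, so there is no risk of rows drifting out of alignment after a sort.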

            Second, if you will actually be generating and processing your own NGS data, you'll need to learn some command line. It can look intimidating, but basic usage (i.e., simply running existing programs on your data) is very easy. I would recommend learning the standard workflow for analyzing your type of data (e.g., ChIP-seq: align with bowtie, call peaks with MACS) and simply googling how to do each step.
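
The ChIP-seq example can be sketched as a tiny driver script. This is only a sketch under assumptions: it assumes `bowtie` and `macs2` (the MACS2 executable) are installed and on the PATH, the index and file names are hypothetical, and the flags shown should be checked against `--help` for the versions you actually have. By default it only prints the commands and runs nothing.

```python
import shlex
import subprocess


def chipseq_commands(index, reads, sample):
    """Build the align and peak-calling commands as argument lists.

    Nothing is executed here. The flags are assumptions to verify locally:
    `bowtie -S` to write SAM output, `macs2 callpeak -t ... -f SAM -n ...`
    to call peaks on that SAM file.
    """
    sam = f"{sample}.sam"
    align = ["bowtie", "-S", index, reads, sam]
    call_peaks = ["macs2", "callpeak", "-t", sam, "-f", "SAM", "-n", sample]
    return [align, call_peaks]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical index prefix and FASTQ file name.
    commands = chipseq_commands("genome_index", "chip_sample.fastq", "chip_sample")
    run_pipeline(commands, dry_run=True)
```

Keeping each step as a list of arguments means the exact commands used for an analysis are written down in one place and can be rerun on the next sample by changing only the file names.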

            Third, I'm always worried when someone asks me to simply "analyze the data" without any question in mind. These datasets can be quite large, and there often won't be a pattern or answer staring you in the face when you open the file. Take some time to think of questions before you start digging through the data. What genes are differentially expressed across these two conditions? In this list of genes, is there an enrichment of some biological function? What genes correlate in expression with gene X? etc.
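
Once a question is stated that concretely, it usually maps onto a few lines of code. As one example, here is a minimal sketch for "what genes correlate in expression with gene X?" using a plain Pearson correlation. The gene names and expression values are invented for illustration, and a real analysis would also need normalization and some handling of multiple testing, which this sketch leaves out.

```python
import math

# Hypothetical expression values: gene -> measurements across the same five samples.
EXPRESSION = {
    "GENE_X": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with GENE_X
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as GENE_X rises
    "GENE_C": [3.0, 1.0, 4.0, 1.0, 5.0],   # weak relationship
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists.

    Assumes both lists vary; a constant list would cause a division by zero.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(expression, target):
    """Return (gene, r) pairs for all other genes, highest correlation first."""
    scores = {
        gene: pearson(expression[target], values)
        for gene, values in expression.items()
        if gene != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for gene, r in rank_by_correlation(EXPRESSION, "GENE_X"):
        print(f"{gene}\t{r:.3f}")
```

The point is less the statistic itself than the order of work: the question came first, and the code is just the most direct way of asking it of the table.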

            Hope this helps!
