
  • Data analysis A-Z

    Hi,
    This is a wonderful forum, but I wonder why there isn't (or at least I can't find) an "NGS data analysis from A to Z for dummies" thread.
    Like:
    1. If you're using 454, then align reads using XX software and XX reference sequence with settings x y z
    example command: samtools mpileup [-EBug] [-C capQcoef] [-r reg] [-f in.fa] [-l list] [-M capMapQ] [-Q minBaseQ] [-q minMapQ] in.bam

    2...
    3...


    Z. And here is your genome sequence with genetic variant annotation and all the common QC steps done, ready for case-control (familial, etc.) studies (or whatever you do with it)

    Any help with this dream of mine? I am a noob in this NGS thing but have to learn.

    Thank you!

  • #2
    If you want an honest but maybe unwelcome answer: Because good data analysis cannot be done by following a cookbook recipe. There is no single correct way to do it, because two different analysis tasks are hardly ever the same. There is a reason that most tools offer you so many options and tuning parameters.

    To be frank, it distresses me a bit how many new users stumble into this forum and hope that they can analyze their data just by googling, without having to read papers, reviews, and textbooks.

    I am not a wetlab person, but I imagine there is also no single explanation of "how to perform a chromatin immunoprecipitation for dummies". I imagine you may need very different procedures depending on what kind of sample and what antibody you are working with, what you hope to find, etc. And, not to forget, you need to know how to check that everything went well. Hence, you will not get around reading a lot before starting.

    Once a technique is decades old, there might be standard approaches (typically, I imagine: buy some kit, put it into tube with sample, shake) but high-throughput sequencing is still under active development and recommended practices change monthly.

    So, please don't take this personally, but as advice to you as a newcomer in the field: Please understand that this is a subject as complex and in need of good planning as any other part of an experiment.



    • #3
      Thanks for the honesty.

      Still, most commercial packages offer a "fast analysis" with default settings for cases where there are no quality problems, giving novices a point to start from.
      And I believe that in your projects there are settings that you use for more than 50% of your samples, with the parameters you do change being mostly those tied to the available computing power and the number of samples to be analysed. (Excuse me for making these assumptions, but in my experience this is the case with most wet-lab methods and with statistical data analysis.)

      I don't do ChIP, but here is a starting point:

      http://mcardle.oncology.wisc.edu/sug...%20Dummies.pdf


      Excuse me for my ignorance.



      • #4
        That ChIP protocol is essentially the old Farnham protocol. While it has worked well for a lot of experiments, there have been significant improvements made. I am also sure that different protocols work better for some epitopes than others. IgG used to be the 'control' for ChIP, and now people use relative enrichment compared to a negative locus, normalized to input. That being said, ChIP has been around for a long time, so there are some pretty good kits out there.
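To make "relative enrichment compared to a negative locus normalized to input" concrete, here is a minimal sketch of one common way to compute it from ChIP-qPCR Ct values. It is not necessarily what any particular lab does: the function names, the 1% input fraction, and the assumption of perfect PCR efficiency (one doubling per cycle) are all illustrative assumptions.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input chromatin recovered in the IP at one locus.

    ct_ip          -- qPCR Ct of the immunoprecipitated sample
    ct_input       -- qPCR Ct of the saved input sample
    input_fraction -- fraction of chromatin saved as input (0.01 means 1%)

    Assumes about 100% PCR efficiency, i.e. product doubles every cycle.
    """
    # Shift the input Ct to the value expected for 100% of the chromatin.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)


def relative_enrichment(target, negative, input_fraction):
    """Percent input at the target locus divided by that at a negative locus.

    target and negative are (ct_ip, ct_input) tuples for the two loci.
    """
    return (percent_input(*target, input_fraction)
            / percent_input(*negative, input_fraction))


# Hypothetical Ct values, for illustration only.
fold = relative_enrichment(target=(25.0, 25.0), negative=(28.0, 25.0),
                           input_fraction=0.01)
print(f"fold enrichment over negative locus: {fold:.1f}")
```

With these made-up numbers the target locus recovers 1% of input and the negative locus 0.125%, which gives an 8-fold relative enrichment. Real assays would also need replicate handling and a measured primer efficiency.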

        Peak calling for ChIP has matured pretty well and you could get away with a cookie cutter approach for most experiments. But expect that to change still. And if you want to get more creative with your analysis you're stuck.

        But that is just ChIP. The samples sequenced by next-gen sequencing are not just from a bunch of techniques but are also from vastly different fields of biology. And as already mentioned things are moving forward at the speed of light right now. Look at all the file formats. It's a big mess. Everybody wants something different.

        Companies like CLC are working on making nice user friendly programs but you are going to be a step behind the curve and be pretty inflexible with your analysis if you limit yourself to such a program.
        --------------
        Ethan



        • #5
          There are some guides around here if you look. You do have a point: it would be good if these more informative threads lived in a special place where they didn't get lost among threads about an error someone got running some program.

          Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

          --------------
          Ethan



          • #6
            Kudos to Simon Anders, and welcome, Kashliks!

            Simon Anders' point is a very important one and needs to be thoroughly understood. While a sequence is a sequence, the role of bioinformatics in Next Gen sequence analysis is inseparable from the rest of the experiment; it makes or breaks its credibility. Biologists will soon be choking on sequence data (if they are not already), so dry-lab scientists must be allowed to contribute their innovations within this research endeavor, all the more so at the leading edges.

            An introductory wet lab scientist may apply some generalized HTS methodology to obtain a result, but ought not interpret novelty beyond the limitations of the methods employed. This cannot happen unless the system limitations are appreciated and this is where the bioinfo side is critical.

            It's really a new species of collaboration.



            • #7
              Going back to the original post: The wiki, for all of its weaknesses, is the place to look:



              Contribute to it as you can. The wiki is sort of cookbook-like but, as Anders implied, doing analysis via cookbook is limited.



              • #8
                Partek has a "Step 1., Step 2., Step 3..." kind of interface. I haven't really used it, just been to info sessions for it. I've only used the various command line/open source programs, but our genomics core has a copy and a number of labs use it. Besides cost, what would be the main drawbacks of using Partek?



                • #9
                  I wish I could add Partek to my options. As a biologist, it sounds easy. But then again, I tried CLC when I was getting going and didn't really like it; I'm sure it is better now. But I wouldn't want to spend the money and be limited to Partek or CLC. The more tools you have in your toolbox that you know how to use, the easier it will be and the better you'll be able to answer the questions you want to ask. And having more tools will probably even help you ask better and more interesting questions.

                  In response to what Joann says, I think the bench scientist who doesn't learn how to analyze their next-gen sequencing data is destined for the back seat of scientific discovery. It's not 1999 anymore, when bioinformaticians were often more like support staff whose authorship on papers was somewhere in the middle of the list. There is real innovation and discovery in the analysis of data going on today. It used to be more like bioINFORMATICIAN; now it's more like BIOinformatician. There are real biologists who work with computers now.
                  --------------
                  Ethan



                  • #10
                    I sincerely thank everyone for the input.
                    And especially for the links.
                    And I totally agree that, with today's stream of data, bioinformaticians are key players in biodiscovery.



                    • #11
                      Hi Kashliks,

                      If you want to try the trial license for CLC bio's Genomics Workbench, it includes access to a variety of tutorials that are laced with references to foundation publications. It would be a nice orientation for you.

                      Best of luck with your analysis and research.

                      Naomi



                      • #12
                        Hi Kashliks,
                        We were all beginners in the world of NGS at some point.

                        My 2c: I'd encourage you to look at what the goal of NGS is and why it would fit your project goals. As Simon Anders and others noted earlier, technologies come and go, just like computer languages. If you learn what they all have in common, you can easily adapt to newer technologies.
                        As Simon Anders also noted above, the field is increasingly complex, and 'quick start' guides may not cover the groundwork needed to carry out strong, solid research. Keep up with new manuscripts, tools, and conferences; all of this combined will prove powerful.
                        Best,
