  • Digging deeper into bioinformatics

    Hey there, SEQanswers people! I have just started my PhD after a year of "project work", and it seems that I will be doing a lot of bioinformatics, especially RNA-seq. Having searched the SEQanswers forum and found some interesting things to read (this, among other things), I still have some questions that I'd like some opinions on, so here goes...

    My background is a Master's in biotechnology with a fair share of math, basic programming (Matlab) and some slightly more advanced programming from a bioinformatics course (Python). Over the last year I've continued working on my Master's thesis project, which was absolute protein quantification with mass spectrometry, and I wrote a fair number of small and medium-sized Python scripts for handling and analysing the resulting data. I've been working in OSX (as that was the computer I was given), and feel more or less comfortable in the shell. Having shown an aptitude for programming, I had the opportunity to start a PhD with RNA-seq as the main focus, while keeping the MS work as a small side project.

    Now, I've been at it for about two months, trying to read up on RNA-seq, alignment tools (Tophat, ...), differential expression (DESeq, ...), splice variant analysis, and quite a bit of everything, really. I've come to a point where I can admit that I know very little, and that I have an opportunity to really learn things the right way, from the beginning. My PI knows nothing about coding and related topics, so a bioinformatics post-doc at a nearby department was appointed as my "trainer" (or whatever you want to call it). He seems to work almost exclusively with RNA-seq, we get along just fine, and he's very good at answering my questions and helping however he can. So far, all my DE analyses have been done "manually" (i.e. without a scripted pipeline of any sort), running the individual tools (Tophat -> featureCounts -> DESeq) one by one in the OSX terminal, and I've just reached the point where I feel I should script my own pipeline.

    Having said all this (thanks if you read it!), I still feel that I'm lacking A LOT of knowledge. Not surprising, since I'm new to the field, but that's why I'm here! So, some questions:

    1) Other than my work in RNA-seq, is there anything I should be learning in terms of bioinformatics or the field in general? I feel that more statistics would be a good bet, but are there other things?

    2) My PI and I are currently looking for courses in statistics, bioinformatics or both, but it seems the fall courses haven't been posted yet in most places. Do you have any good courses to recommend?

    3) Good programming practices... What are they, and where can I read about them? I'm currently scripting in the PyCharm IDE with the Anaconda Python distribution, and I like it quite a bit. I've read some material about naming variables and the like, but what about scripting in general? For example, if I build a pipeline as described above, should I just write a script that I execute in PyCharm (as I've done so far), or should I make the scripts executable and run them from the shell?

    4) I keep reading that bioinformaticians work in Linux. I have Linux Mint on my home laptop (which barely sees any use these days), so I have some minor experience with it (and with how it works, coming from OSX), but is it something you really need? Could I continue working in OSX, or should I set up a dual-boot install of some Linux distribution (and if so, which one)?

    5) I see a lot of people blogging about their bioinformatics work and uploading their code. I read somewhere that this is a good way to get your code "out there", and good practice; is this something I should start doing?

    6) Given that my Python is more or less self-taught, should I learn R? To what extent? What about SQL?

    7) Other than SEQanswers, are there other bioinformatics communities out there, maybe even ones suited to a beginner like me?

    Thanks a lot for reading, any help, tips or opinions are greatly appreciated!
    Erik
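One common answer to question 3 is to turn the pipeline into a command-line script: a shebang line, `argparse` for the inputs, and a `main()` guard, so the same file runs from PyCharm or from the shell. Below is a minimal sketch under stated assumptions: single-end reads with one FASTQ file per sample, a prebuilt Bowtie index, a GTF annotation, and Tophat's default output file `accepted_hits.bam`. The script name, option names and directory layout are hypothetical, and only the Tophat -> featureCounts steps are covered (the DESeq step would stay in R). By default it only prints the commands.

```python
#!/usr/bin/env python3
"""Hypothetical RNA-seq pipeline driver: Tophat -> featureCounts.

Prints the commands it would run; pass --run to actually execute them.
"""
import argparse
import shlex
import subprocess
from pathlib import Path


def tophat_command(index, fastq, out_dir):
    """Tophat call aligning one single-end FASTQ file into its own output directory."""
    return ["tophat", "-o", str(out_dir), str(index), str(fastq)]


def featurecounts_command(gtf, bam_files, counts_file):
    """One featureCounts call that counts reads for all BAM files into a single table."""
    return ["featureCounts", "-a", str(gtf), "-o", str(counts_file)] + [str(b) for b in bam_files]


def build_pipeline(index, gtf, fastqs, work_dir):
    """Return the pipeline as an ordered list of commands (each a list of arguments)."""
    work_dir = Path(work_dir)
    commands, bams = [], []
    for fastq in fastqs:
        sample = Path(fastq).name.split(".")[0]  # "ctrl_1.fastq.gz" -> "ctrl_1"
        out_dir = work_dir / f"tophat_{sample}"
        commands.append(tophat_command(index, fastq, out_dir))
        bams.append(out_dir / "accepted_hits.bam")  # Tophat's default name for aligned reads
    commands.append(featurecounts_command(gtf, bams, work_dir / "counts.txt"))
    return commands


def main(argv=None):
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("fastqs", nargs="*", help="one single-end FASTQ file per sample")
    parser.add_argument("--index", help="Bowtie index basename used by Tophat")
    parser.add_argument("--gtf", help="GTF annotation used by featureCounts")
    parser.add_argument("--work-dir", default="results", help="directory for all outputs")
    parser.add_argument("--run", action="store_true", help="execute the commands instead of printing them")
    args = parser.parse_args(argv)

    if not args.fastqs:
        parser.print_usage()
        return
    if not (args.index and args.gtf):
        parser.error("--index and --gtf are required when FASTQ files are given")

    if args.run:
        Path(args.work_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_pipeline(args.index, args.gtf, args.fastqs, args.work_dir):
        print(shlex.join(cmd))
        if args.run:
            subprocess.run(cmd, check=True)  # raises, and so stops the pipeline, at the first failing step


if __name__ == "__main__":
    main()
```

Saved as, say, `run_pipeline.py` and made executable with `chmod +x run_pipeline.py`, it could be called as `./run_pipeline.py --index genome --gtf genes.gtf ctrl_1.fastq.gz treat_1.fastq.gz`; adding `--run` executes the commands instead of only printing them. Building the commands in plain functions keeps each step easy to check without running the aligner.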

  • #2
    I will just chime in on a few...

    1) Statistics is always a good bet.

    2) Maybe this?

    6) Learn R! Bioconductor is almost a necessity if you're doing RNA-seq... See if there's a local R user group where you are, they may be helpful.

    7) https://www.biostars.org/
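One concrete reason statistics matters for RNA-seq: a DE analysis tests thousands of genes at once, so the raw p-values have to be corrected for multiple testing. The adjusted p-values (`padj`) that DESeq2 reports use the Benjamini-Hochberg procedure by default. In that procedure, the p-value of rank i among n tests becomes the minimum over j ≥ i of p(j)·n/j, capped at 1. The plain-Python sketch below is for illustration only, not a replacement for R's `p.adjust(p, method = "BH")`; the input p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    n = len(pvalues)
    # Positions of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down: p * n / rank, kept monotone by a running minimum.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Four made-up p-values; prints approximately [0.02, 0.04, 0.04, 0.02]
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```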



    • #3
      Here are my answers to some of your questions:

      1. If you intend to have a future in bioinformatics, you'll probably want to familiarize yourself with some of the other types of experiments. There are lots of different kinds, e.g. variant discovery (especially in exome data), ChIP-seq, DNA methylation and ATAC-seq. Many of these rely on the same statistical techniques you'll use for differential expression analysis of RNA-seq data.

      2. There are lots of options when it comes to free online courses (MOOCs). My favorite MOOC sites are Coursera and edX. There's a biostatistics bootcamp course on Coursera that seems to be fairly in-depth. If I recall correctly, there are several other stats courses on those sites as well. You might also be fine just picking up what you need to know as you go.

      4. This might make me sound ignorant, but isn't OSX a Unix-based system? By that I mean, if you use the command line in OSX, won't it be almost identical to the command line on pretty much any Linux system? That's what I've always assumed, but to be honest, I'm not entirely sure. My work computer is a Windows machine, but I ssh to a Linux machine to do almost all of my non-R work. You could also look into VirtualBox; I use that too. Just about every tool that I end up wanting to use has Linux and OSX install options (Windows is almost never supported).

      6. I use R all the time. It's pretty easy to pick up, and it's awesome for graphics if you install ggplot2. You'll probably do fine with the command-line version of R, but I recommend RStudio. I assume that you already know some R, because isn't DESeq part of Bioconductor? R is widely used and supported, and there's a package for just about anything. I personally don't use SQL day to day, but I'm sure some of the users here do. I would at least recommend learning some basic queries.

      7. Google is my favorite place to search for answers. The top hit usually sends me to SEQanswers or biostars.org. I also end up using unix.com for help with sed, awk and similar commands. If you don't know sed and awk or any commands like them, I strongly advise that you learn their basic features. I use them just about every day, especially awk.
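A typical job for awk is quick column-wise filtering of tab-separated output. For example, assuming a featureCounts table for a single BAM file (one `#` comment line, a header line, then the count in column 7), `awk -F'\t' 'NR > 2 && $7 >= 10 {print $1, $7}' counts.txt` lists genes with at least 10 reads. Since the original poster already writes Python, here is roughly the same filter as a small Python function for comparison. The threshold of 10 and the table layout are assumptions, and the input is a made-up two-gene table.

```python
import csv
import io


def genes_above_threshold(handle, min_count=10):
    """Yield (gene_id, count) pairs whose count reaches min_count.

    Assumed layout (featureCounts-style, single BAM file): '#' comment lines,
    then a header line, then tab-separated rows with the count in column 7.
    """
    lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")
    next(rows)  # skip the header: Geneid, Chr, Start, End, Strand, Length, <bam file>
    for row in rows:
        count = int(row[6])  # column 7 in awk's 1-based numbering
        if count >= min_count:
            yield row[0], count


if __name__ == "__main__":
    # A made-up two-gene table standing in for a real counts.txt file.
    toy_table = io.StringIO(
        "# toy featureCounts-style table\n"
        "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
        "geneA\tchr1\t100\t500\t+\t401\t25\n"
        "geneB\tchr1\t900\t1200\t-\t301\t3\n"
    )
    for gene_id, count in genes_above_threshold(toy_table):
        print(gene_id, count)  # prints: geneA 25
```

For a real file, the same function works on `open("counts.txt")`. The awk one-liner is quicker to type at the prompt, while the Python version is easier to reuse inside a larger script.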



      • #4
        Oh, and for pure programming questions, Stack Overflow is good. Search with [r] for R-tagged questions, for example.



        • #5
          BTW, if you are using OS X (ver. 10.5 or up) you are already using a fully POSIX-compliant Linux. Anything you can do in any other flavor of Linux, you can do in OS X. If you do not already know command-line Linux, though, the OS X terminal will be just as foreign to you as any other terminal window.

          OS X's file system layout may seem a bit odd to old-school UNIX/Linux users, but it's not that radical, and every flavor of Linux I've ever used has something at least a bit different in its file system structure anyway.

          Along with some scripting (simple shell scripting is worthwhile to learn as well), I'd suggest a course in database design and administration. I came late to that party, but I'm finding that knowing the basics of building and using your own relational databases is very handy, especially if you're working on multiple projects and need to track data and intermediate results over weeks and months.
          Last edited by mbblack; 08-08-2014, 11:28 AM.
          Michael Black, Ph.D.
          ScitoVation LLC. RTP, N.C.
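A relational database for tracking samples and intermediate results does not need a server: Python's built-in `sqlite3` module stores everything in a single file. The sketch below uses a hypothetical two-table schema (samples, and pipeline runs per sample); the table and column names are made up for illustration. It also shows the kind of basic query (a `JOIN` with a `WHERE` filter) that is worth learning early.

```python
import sqlite3

# Hypothetical schema: one row per sample, one row per finished pipeline step.
SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (
    sample_id  TEXT PRIMARY KEY,
    condition  TEXT NOT NULL,
    fastq_path TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS runs (
    run_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id TEXT NOT NULL REFERENCES samples(sample_id),
    step      TEXT NOT NULL,   -- e.g. 'tophat' or 'featureCounts'
    output    TEXT NOT NULL,   -- path to the intermediate result
    finished  TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path=":memory:"):
    """Open (or create) the tracking database; pass a file path to keep it between sessions."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
    conn.executescript(SCHEMA)
    return conn


def record_run(conn, sample_id, step, output):
    """Log one finished step; the with-block commits on success and rolls back on error."""
    with conn:
        conn.execute("INSERT INTO runs (sample_id, step, output) VALUES (?, ?, ?)",
                     (sample_id, step, output))


def outputs_for_step(conn, step, condition):
    """Basic JOIN query: outputs of one pipeline step for all samples in one condition."""
    query = """
        SELECT s.sample_id, r.output
        FROM runs AS r JOIN samples AS s ON s.sample_id = r.sample_id
        WHERE r.step = ? AND s.condition = ?
        ORDER BY s.sample_id
    """
    return conn.execute(query, (step, condition)).fetchall()


if __name__ == "__main__":
    conn = open_db()
    with conn:
        conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", [
            ("ctrl_1", "control", "ctrl_1.fastq.gz"),
            ("treat_1", "treated", "treat_1.fastq.gz"),
        ])
    record_run(conn, "ctrl_1", "tophat", "results/tophat_ctrl_1/accepted_hits.bam")
    record_run(conn, "treat_1", "tophat", "results/tophat_treat_1/accepted_hits.bam")
    print(outputs_for_step(conn, "tophat", "treated"))
```

Calling `open_db("project.sqlite")` instead of the in-memory default keeps the records across sessions, so weeks later it is still possible to query which BAM file came from which sample.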



          • #6
            Err, OS X is Unix, not Linux (it's BSD based, in fact). Having said that, as long as your hardware suffices OS X is a great OS to use.



            • #7
              Thanks for all the answers! I will definitely look up edX and Coursera; there seems to be a fair number of online courses to take there. Has anybody attended an on-site course somewhere in the world that they can recommend? Online courses are great and all that, but you don't get the interactivity of having an actual teacher.

              Ok, so I think I'll stick to OSX for the moment then. I do know the Terminal fairly well now, after a couple of months of fiddling around with it.

              While DESeq is indeed an R package, I wouldn't really say that you need to know R to use it, at least the way I've been using it so far. It's just a matter of writing the commands with the parameters you want, very similar to working in the terminal, and it just works. Actually coding in R is, I would imagine, quite different.

              Did you have any database design and admin courses in mind, mbblack, or was it just general advice? It would be nice to know a bit more about the Unix system in general, actually!

              When it comes to the "career in bioinformatics" part, I have been thinking about that a bit. Obviously I'm still at the very beginning of my PhD, so I have time, but I'm wondering what kind of bioinformatician I could become. As far as I understand it, most bioinformaticians either come from a biology/biotech background and learn programming as they go along, or are computer science majors who learn a bit of biology.

              Are both paths valid? Is either one better or worse as a career, and what kind of jobs can one expect? I mean, I don't really see myself learning to code in C++ (for example) and building the various types of packages from scratch, but rather analysing the data from a biological perspective with a bit more programming knowledge than others. Most of the other PhD students in my department know the basics of R but not that much about programming in general.



              • #8
                Don't be too concerned about not knowing much -- almost every project I do requires me to learn something new. The most important thing is being able to learn new things quickly.

                If you haven't already, give Rosalind a go. You're presented with bioinformatics problems covering a range of different areas, and can solve problems using any programming language that you want (including your own brain + web searches, if that works). It's great for finding out about the breadth of bioinformatics problems, and discovering how many different ways there are to approach the same problem (by looking at other people's solutions once you've solved a problem). It can also be useful for giving you a starting pool of code that you can adapt to new problems -- I've had a few real bioinformatics projects where my solution was adapted from a toy problem presented in Rosalind.
                Last edited by gringer; 08-11-2014, 05:28 PM.
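To give a feel for the toy problems mentioned above, here are two small exercises in the spirit of Rosalind's introductory problems, counting nucleotides and reverse-complementing a DNA strand, solved in plain Python. The function names and the example sequence are made up, and Rosalind's own input and output formats may differ from what is printed here.

```python
from collections import Counter


def count_nucleotides(dna):
    """Return the counts of A, C, G and T, in that order."""
    counts = Counter(dna.upper())
    return tuple(counts[base] for base in "ACGT")


# Translation table mapping each base to its Watson-Crick partner (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Swap each base for its partner, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    seq = "AGCTTTTCATTCTGACTGCA"  # made-up example sequence
    print(*count_nucleotides(seq))
    print(reverse_complement(seq))
```

Comparing a solution like this with other people's afterwards (for instance, one built on `str.count` or a plain loop) is exactly the "many ways to approach the same problem" effect described above.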
