SEQanswers

05-07-2012, 04:46 AM   #1
greenhilly
Member
 
Location: RTP, NC

Join Date: Jan 2012
Posts: 11
Developing programming experience for bioinformatics

I have an extensive molecular biology background but am relatively new to bioinformatics. I would like to extend my computational/programming skills to be more useful in analyzing sequencing and other high-throughput data, as well as to improve my own marketability.

Many job postings refer to some combination of Perl/Python/C++/Java experience. Any suggestions regarding where to focus effort, particularly in a forward-looking manner?

Thanks for any suggestions.
05-07-2012, 04:50 AM   #2
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 534

I started teaching myself a year and a half ago (I'm a tech) and still consider myself a novice, so listen to others as well, but I have found that learning Linux really well has been very beneficial. Getting a good understanding of how to write bash scripts and of the basic Linux commands (sed/tr/cut/sort/cat/paste/grep), in addition to learning a bit of awk, has been tremendously useful for me.
05-07-2012, 04:58 AM   #3
rnaseek
Member
 
Location: USA

Join Date: Nov 2011
Posts: 22

I am biased, but I would strongly encourage you to start by learning Python first, and R as well. A lot of people find Python easy to learn. Getting the hang of the awesome Unix commands would also be very useful.

Here are a few links to get started with R and Python:
http://cmdlinetips.com/2011/09/free-...-books-online/
http://cmdlinetips.com/2011/11/free-...for-beginners/
05-07-2012, 06:27 AM   #4
Mark
Member
 
Location: Raleigh, NC

Join Date: Nov 2008
Posts: 42

I like Perl for scripting. It's powerful, very widely used, and has lots of online resources. I also think it's easy to learn to a level that will quickly make you productive. Many people seem to think Python is easy to learn, but the O'Reilly book "Learning Python" (O'Reilly is a publisher of typically great computer books) is ~3 times longer than "Learning Perl" and doesn't even cover regular expressions. That's like a driving class that doesn't cover steering. I would steer away from this book if you choose Python. And of course a strong command of Unix/Linux is highly recommended, though I would choose Perl or Python over extensive shell scripting.
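For readers wondering why regular expressions matter so much for sequence work, here is a minimal sketch in Python (Perl would look much the same). It is only an illustration: the pattern, the function name find_orfs, and the toy sequence are all made up for this example, and the pattern is a deliberately naive "start codon, whole codons, first stop codon" search on one strand.

```python
import re

# Naive ORF pattern: ATG, then as few whole codons as possible,
# then the first in-frame stop codon (TAA, TAG, or TGA).
ORF_PATTERN = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")


def find_orfs(seq):
    """Return (start_index, matched_sequence) for each non-overlapping hit."""
    seq = seq.upper()  # accept lower-case or mixed-case input
    return [(m.start(), m.group()) for m in ORF_PATTERN.finditer(seq)]


# Hypothetical toy sequence, not real data.
toy_sequence = "ccATGAAATAGgg"
for start, orf in find_orfs(toy_sequence):
    print(start, orf)  # prints: 2 ATGAAATAG
```

The same few lines would take noticeably more code without a regex engine, which is roughly the point about a driving class and steering.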
05-07-2012, 07:33 AM   #5
dpryan
Devon Ryan
 
Location: Bonn, Germany

Join Date: Jul 2011
Posts: 2,401

My 2 cents:
  • The Unix CLI: This includes common commands such as awk, sed, and cut. I would also include shell scripting in this. This bullet point is required to put together any sort of basic analysis pipeline.
  • R: Inevitably, you end up needing to crunch numbers in R, so go ahead and get at least a passing familiarity with it. This may include various Bioconductor packages, depending on what you're doing.
  • Python or Perl: It doesn't really matter which one. You can do pretty much anything in these languages, though they have their limitations.
  • C/C++/Java: If you get to the point of writing more "heavy duty" programs that require any significant performance then you'll need one of these. You would generally learn one of these last.

It's probably best to learn things in that order, possibly swapping the order of R and Python/Perl.
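To give a sense of the kind of small task the "Python or Perl" bullet covers, here is a minimal sketch in Python that reads FASTA-formatted text and reports the length and GC fraction of each record. The function names read_fasta and gc_fraction and the toy records are invented for illustration; for real work an established library such as Biopython would probably be a better choice.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record ID is the first word after '>' (assumes it is present).
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so append to the current record.
            records[current_id] += line.upper()
    return records


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical toy input; in practice this would come from open("reads.fa").
example = """>seq1 toy record
ATGCGC
GGAT
>seq2
ATATAT
""".splitlines()

records = read_fasta(example)
for name, seq in records.items():
    print(f"{name}\t{len(seq)}\t{gc_fraction(seq):.2f}")
```

The tab-separated output can then be fed straight into the Unix tools from the first bullet (sort, cut, awk) or read into R.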
05-10-2012, 11:10 AM   #6
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250

Looking forward, I wouldn't bother with Perl/Python/Java. They are mostly just fads, and any place you might want to work is just as likely to use the one you don't know, for no other reason than that the CEO liked the Monty Python jokes or coffee. These scripting languages are easy enough to pick up if you know how to program in C, and most cool molecular dynamics simulators are in C for obvious performance reasons. Unix command line utilities are very handy for getting things done, and Perl and Python both draw heavily on their conventions, so if you encounter a script written in either of these you should be able to figure out what it does (knowing Linux, that is).
05-14-2012, 01:55 AM   #7
steven
Senior Member
 
Location: Southern France

Join Date: Aug 2009
Posts: 269

Quote:
Originally Posted by dpryan
My 2 cents:
  • The Unix CLI: This includes common commands such as awk, sed, and cut. I would also include shell scripting in this. This bullet point is required to put together any sort of basic analysis pipeline.
  • R: Inevitably, you end up needing to crunch numbers in R, so go ahead and get at least a passing familiarity with it. This may include various Bioconductor packages, depending on what you're doing.
  • Python or Perl: It doesn't really matter which one. You can do pretty much anything in these languages, though they have their limitations.
  • C/C++/Java: If you get to the point of writing more "heavy duty" programs that require any significant performance then you'll need one of these. You would generally learn one of these last.

It's probably best to learn things in that order, possibly swapping the order of R and Python/Perl.
+1
Totally agree, and I would put Python/Perl before R.
05-14-2012, 04:45 AM   #8
Mark
Member
 
Location: Raleigh, NC

Join Date: Nov 2008
Posts: 42

To rskr's point: yes, C is a good choice if you plan on doing a lot of fundamental algorithm development, or are at a point in your life where you are interested in programming from a mostly academic perspective and wouldn't be hampered by its (or Java's) longer development times. If, on the other hand, you need to be quickly productive and are mostly interested in piecing together and interpreting NGS and other high-throughput data using the vast number of open-source analysis programs available, I believe (having worked in industrial bioinformatics for years) you would be much better off with Perl (my favorite) or Python, neither of which is a fad.
05-14-2012, 05:05 AM   #9
ETHANol
Senior Member
 
Location: Greece

Join Date: Feb 2010
Posts: 309

rskr does not understand that there are a lot of biologists who are more interested in biological questions than in hard-core computer science. He likes to try to diminish anyone and any work that uses an interpreted language. He has a lot of work ahead of him. We use the tools that allow us to answer our questions with the least work.

This is the path I would suggest:
1. You will not learn anything unless you are actively and currently using it for something. So come up with a project in which you will use this stuff.
2. Learn some Unix
3. Learn some Python and/or Perl (Python is structured more like R so it can help with the next step, but I know Perl better).
4. Learn some R.
5. Keep working on what interests you.

If after all this you decide you want to mostly give up molecular biology and become a hard-core computer scientist, then you can move on to C++.

Great place to start:
http://korflab.ucdavis.edu/Unix_and_Perl/
Ethan
05-14-2012, 07:17 AM   #10
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250

Quote:
Originally Posted by ETHANol
rskr does not understand
Sorry, I thought he was interested in "forward-looking" learning, not the quick-and-dirty piecing together of a bunch of algorithms someone else wrote, without much justification or understanding. I assure you, you don't need to learn much to do the latter; wait until the time comes.
05-14-2012, 07:32 AM   #11
ETHANol
Senior Member
 
Location: Greece

Join Date: Feb 2010
Posts: 309

Here is some advice: anyone who says language xxxx is garbage and a waste of your time is blind (unless you are talking about some language of yesteryear).

Rskr, you ask, why piece together a bunch of algorithms that someone else wrote that are totally sufficient to answer the biological question you are addressing when you can make your own? Because it saves a lot of time, you'll publish your project sooner, which usually means in a higher impact journal, which means better career options.

We could blow our own pipets in the lab from glass, but that wouldn't make us better scientists.
Ethan
05-14-2012, 07:43 AM   #12
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250

Quote:
Originally Posted by ETHANol
Here is some advice: anyone who says language xxxx is garbage and a waste of your time is blind (unless you are talking about some language of yesteryear).

Rskr, you ask, why piece together a bunch of algorithms that someone else wrote that are totally sufficient to answer the biological question you are addressing when you can make your own? Because it saves a lot of time, you'll publish your project sooner, which usually means in a higher impact journal, which means better career options.

We could blow our own pipets in the lab from glass, but that wouldn't make us better scientists.
Sometimes the pieced-together programs are sufficient to answer the question; other times people don't understand what is in the program well enough to say one way or another (but that doesn't stop them). To that end, even if one does not become very proficient in C, it will give them an advantage over those who don't know it at all. Sometimes there is a missing piece, or your Perl program isn't fast enough to analyze modern high-throughput experiments; what do you do then? It's been my experience that these experienced Perl programmers have already figured out everything you can do with canned programs, so good luck publishing something that hasn't been done by taking output from program A and putting it into program B.
05-14-2012, 08:30 AM   #13
ETHANol
Senior Member
 
Location: Greece

Join Date: Feb 2010
Posts: 309

Rskr, you are into pushing the state of the art on the computing side. Some people are much more interested in the biology and find the computing side really boring. There is limited time in life and limited brain space (maybe not yours). So a lot of us learn only what we have to in order to answer the biological question we are interested in.

I could go to a hot spring and find a bug with a more efficient polymerase for PCR, or I could just use one that is currently available to do interesting research. Some people have made the former their life's work; many more have focused on the latter. Some people learn C++ and come up with better algorithms; some use the existing ones and write Perl scripts to do interesting research. Why do you have a problem with that? How many Cell, Science and Nature papers have been published using reused Perl scripts and Bioconductor packages? Has that all been a waste of time?
Ethan
05-14-2012, 08:55 AM   #14
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250

Quote:
Originally Posted by ETHANol
How many Cell, Science and Nature papers have been published using reused Perl scripts and Bioconductor packages? Has that all been a waste of time?
I don't have a problem with it. I just wouldn't consider the Perl scripts "forward thinking" compared to the program or programs that were used by the Perl scripts. It takes quite a bit of forethought to write a program that many people can use for many different purposes, and not much forethought to use these programs.
05-14-2012, 09:09 AM   #15
ETHANol
Senior Member
 
Location: Greece

Join Date: Feb 2010
Posts: 309

That's it: you think it is all about the program and never about the biology. Wake up and realize that a lot of people have other interests than you, which can be better served by learning one of these languages you despise so much. I am a wet lab scientist; it would be a total waste of my time to learn C++, while some Unix, Perl and R are extremely useful.
Ethan
05-14-2012, 09:21 AM   #16
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250

Quote:
Originally Posted by ETHANol
That's it: you think it is all about the program and never about the biology. Wake up and realize that a lot of people have other interests than you, which can be better served by learning one of these languages you despise so much. I am a wet lab scientist; it would be a total waste of my time to learn C++, while some Unix, Perl and R are extremely useful.
Well, go study something else then, and quit whining about the realities of Perl's lack of merit.
05-14-2012, 09:28 AM   #17
ETHANol
Senior Member
 
Location: Greece

Join Date: Feb 2010
Posts: 309

Why must you be such a troll?

Normally I would ignore such blatant trolling, but having been a confused molecular biologist not so long ago, I thought it was important for this guy to understand what we are dealing with here, i.e. trolling and not useful information.

rskr, I'm done; you win. All interpreted languages are without merit and you cannot be a good scientist if you use them. People that program in C++ are superior.
Ethan
05-14-2012, 09:45 AM   #18
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250

Quote:
Originally Posted by ETHANol
Why must you be such a troll?

Normally I would ignore such blatant trolling, but having been a confused molecular biologist not so long ago, I thought it was important for this guy to understand what we are dealing with here, i.e. trolling and not useful information.

rskr, I'm done; you win. All interpreted languages are without merit and you cannot be a good scientist if you use them. People that program in C++ are superior.
I gave my "opinion", which has been attacked by people who can't tell an integer from a char and who obviously have some stake in having people believe certain things about certain languages, languages which they do not even know. For example, that "C has a very long development time" (it probably does if you don't know it in the first place). Or that "C++ programmers aren't interested in biology", etc. Or that "C++ is a poor career choice." Or that "I believe the interpreted languages are without merit"; I said lack of merit, not without merit. I use interpreted languages; however, there is many a program which I wish were written in C++ instead, a prime example being the AMOS tools, which didn't scale to high-throughput sequencing. If the programs had been written in C++ they would have been better.

So I would like an apology if you have the time, since many of the things you said were not true and offensive to me.
05-14-2012, 10:37 AM   #19
Joann
Senior Member
 
Location: Woodbridge CT

Join Date: Oct 2008
Posts: 218
Me too

I would like to weigh in on the side of rskr. I am a biologist employed by hard-core computer scientists; they do their thing and I do mine to complete our deliverables. True, we mostly find each other's science boring, and I admit I am not even motivated to learn Linux or shell scripting or much else that would contribute to making me a practicing bioinformaticist. I strongly believe biology totally needs core computational scientists as allies, to provide us with their perspectives and help in order to add to and advance our own field. Sometimes the realizations I arrive at here about biological scientists are shameful: we do not yet have a global, uniform and systematic way to name new genes, and suggesting the use of a gene ID number over some letter symbol, for example, is like talking to a wall.
05-14-2012, 11:29 AM   #20
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 934

Quote:
... I said lack of merit, not without merit...
A rather pedantic difference, if you ask me. But maybe we shouldn't expect anything else from a person who knows the difference between an 'int' and a 'char'. :-)

Given the number of people on this forum for whom American English is not their first language I think we should allow a bit of leeway for the subtle differences in stating their opinions.

BTW: I agree with "dpryan". Learn the shell. Learn Perl/Python (or maybe Ruby). Learn R. Learn C. As for the differences between the 4 -- R is the most different in syntax. The other 3 are similar enough to be easy to pick up once you know one of them (although all are hard to master).
All times are GMT -8.