SEQanswers

Old 05-14-2012, 11:59 AM   #21
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 503
Default

"...the difference between an 'int' and a 'char'."

An ent (sic int) is a tree-like giant of Middle Earth; a char is a tasty cold-water fish.

(Sorry, I'm a little punchy from lack of sleep)
HESmith is offline   Reply With Quote
Old 05-14-2012, 12:43 PM   #22
SeekAnswers
Member
 
Location: USA

Join Date: Mar 2012
Posts: 21
Default

I normally think you'd need a good command of the Unix shell, a scripting language of your choice (Perl/Python/Ruby), and one object-oriented programming language. A decent understanding of SQL queries can also be helpful, depending on the kind of setup you work in.

However, for a biologist, writing bioinformatics software in C means a very steep learning curve, mainly because of memory management; not many computer science majors have a good command of it. Java is a much friendlier language and is widely used in bioinformatics software.
SeekAnswers is offline   Reply With Quote
Old 05-14-2012, 02:27 PM   #23
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
Default

Quote:
Originally Posted by rskr View Post
In a forward looking manner I wouldn't bother with Perl/Python/Java they are mostly just fads and any location you might want to work is just as likely to use the one you don't know, for no other reason than the CEO liked the monty python jokes or coffee. These scripting languages are easy enough to pick up if you know how to program in C, and most cool molecular dynamics simulators are in C for obvious performance reasons. Unix command line utilities are very handy for getting things done, and PERL and Python both draw heavily on the conventions so if you encounter a script done in either of these you should be able to figure out what it does(knowing linux that is).
The distinction between "scripting languages" and "real languages" is a silly one, propagated by snobs. Suggesting that Perl is a fad when it has contributed solidly to science for over 20 years is more than a bit silly. Java is core to using a number of modern high performance frameworks such as Hadoop. The Broad's GATK is entirely in Java.

I am a biologist first & program mostly in Perl, because it fits my brain well. So did C#, which I suspect you would also denigrate -- and I wrote some very sophisticated dynamic programming algorithms (if I do say so myself) in C#.

For most biologists, the extra bookkeeping required by C/C++ isn't worth the execution speed advantage. Many other languages offer higher levels of abstraction that are a better fit to their line of thinking.

Ultimately, if you have the time it is worth exploring multiple languages, as many people find that there are a subset that fit their brain well. A rare few individuals are excellent at most. For me, Perl & C# have been the best fits, with Scala probably just missing out.

It's also worth contemplating the huge fraction of security holes in the world that are due to buffer overflow, an easy error to make in C/C++ and a challenging one to make in languages which supply memory management. It's also useful to think of all the poor user interfaces in the world, such as entry boxes for social security numbers or credit cards which do not accept human-friendly punctuation or spacing, that are there because it was hard to do in C or a similar language, and so trivial to do in Perl that almost nobody could be too lazy to do them.
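As a rough illustration of the "human-friendly punctuation" point, here is a minimal Python sketch (Python rather than Perl only for the sake of the example). The function name `normalize_digits` and the choice of separators to accept (spaces, hyphens, dots) are assumptions made for this sketch, not anything from a particular library or site.

```python
import re


def normalize_digits(raw, expected_length):
    """Accept digits typed with spaces, hyphens or dots and return digits only.

    Hypothetical helper: raises ValueError if anything other than the
    expected number of ASCII digits remains after stripping separators.
    """
    # Remove the separators people naturally type: whitespace, '.', '-'
    cleaned = re.sub(r"[\s.\-]", "", raw)
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"non-digit characters in {raw!r}")
    if len(cleaned) != expected_length:
        raise ValueError(f"expected {expected_length} digits in {raw!r}")
    return cleaned


if __name__ == "__main__":
    print(normalize_digits("123-45-6789", 9))
    print(normalize_digits("4111 1111 1111 1111", 16))
```

The whole job is one regular-expression substitution plus two checks, which is the sense in which a form that rejects "123-45-6789" is hard to excuse.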

Biologists & hard core computer scientists need to forge links, but I've always found it was the polylingual & inclusive computer whizzes who were a joy to work with; language snobs are likely to have other motes in their eyes which will interfere with collaborations.
krobison is offline   Reply With Quote
Old 05-30-2012, 10:09 AM   #24
Artem
Junior Member
 
Location: Vancouver, BC

Join Date: May 2012
Posts: 6
Default

I'm actually in the same position as Greenhilly: I started programming seriously about a month ago, with background knowledge in Python and bash. My work has been in bash and R, though. I use R for calculations and bash for data formatting and pipelining. Eventually I do hope to learn some C for writing functions, but I see that as a while away.

Where does Perl/Python fit into the mix?
Artem is offline   Reply With Quote
Old 05-30-2012, 10:21 AM   #25
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Quote:
Originally Posted by Artem View Post
Where does Perl/Python fit into the mix?
They are a more powerful glue than Bash while being an easier language than C.

A person can write multi-hundred-line Bash routines, but at some point the scripts become hard to maintain and expand. At that point you should switch to Perl/Python, unless you wish to go into the complexities of C/C++.

BTW: My longest bash script is 430 lines and is used to set up ABySS runs in various combinations of paired-end and single-end runs. My Perl scripts can run many times that length.
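To give a feel for why this kind of setup work gets easier outside Bash, here is a small Python sketch. It is only an illustration: `run_plans` and the library names are hypothetical, and it does not reflect real ABySS options or the script mentioned above. It simply enumerates combinations of paired-end libraries, each with and without the single-end reads.

```python
from itertools import combinations


def run_plans(paired_libs, single_libs):
    """Return one plan per non-empty subset of paired-end libraries.

    Each subset appears once on its own and, if any single-end libraries
    were given, once more with all of them added. Hypothetical helper.
    """
    plans = []
    for size in range(1, len(paired_libs) + 1):
        for subset in combinations(paired_libs, size):
            plans.append({"paired": list(subset), "single": []})
            if single_libs:
                plans.append({"paired": list(subset), "single": list(single_libs)})
    return plans


if __name__ == "__main__":
    # Library names are made up for the example.
    for plan in run_plans(["pe_300bp", "pe_500bp"], ["se_reads"]):
        print(plan)
```

Nested data like the list of dictionaries here is awkward to build and pass around in Bash, and that is usually the point at which a script starts to become hard to maintain.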

As I have said before, I consider 'R' to be a different path than bash/perl/python/C. Those languages are similar enough to have a common way of thinking. 'R' is all about statistical computing.

Last edited by westerman; 05-30-2012 at 10:23 AM. Reason: Added a comment about 'R'.
westerman is offline   Reply With Quote
Old 06-03-2012, 12:35 PM   #26
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250
Default

Quote:
Originally Posted by krobison View Post
It's also worth contemplating the huge fraction of security holes in the world that are due to buffer overflow, an easy error to make in C/C++ and a challenging one to make in languages which supply memory management.
In terms of security holes there is a reason no one uses PERL for web development, even though that is what it was originally designed for. Oops my input field has an @ or a $ in it.
Quote:
Originally Posted by krobison View Post
It's also useful to think of all the poor user interfaces in the world, such as entry boxes for social security numbers or credit cards which do not accept human-friendly punctuation or spacing, that are there because it was hard to do in C or a similar language, and so trivial to do in Perl that almost nobody could be too lazy to do them.
A) It is funny that people will spend tens of years programming in languages that take five minutes to learn, yet spend hours a day waiting for the programs to run.

B) Don't write thousands of lines of code in bash or Perl; they aren't designed for it. They are weakly typed and don't take advantage of compiler checking, not to mention that the languages don't facilitate porting to many platforms.
rskr is offline   Reply With Quote
Old 06-07-2012, 05:55 PM   #27
samanta
Senior Member
 
Location: Seattle

Join Date: Feb 2010
Posts: 109
Default

Quote:
Originally Posted by greenhilly View Post
I have an extensive molecular biology background but am relatively new to bioinformatics. Would like to extend my computational/programming skills to maximize utility in analyzing sequencing and other high-throughput data, as well as to improve my own marketability.

Many job postings refer to some combination of Perl/Python/C++/Java experience. Any suggestions regarding where to focus effort, particularly in a forward-looking manner?

Thanks for any suggestions.
Please note that bioinformatics can be done at various levels. Here is my modest attempt to answer your question.

http://www.homolog.us/blogs/2011/07/...matics-part-i/

http://www.homolog.us/blogs/2011/07/...atics-part-ii/

Searching at a website for folding of a set of miRNA sequences is bioinformatics. Writing server side code for the program that does that folding is also bioinformatics. Analyzing hundreds of expression numbers in excel or R is bioinformatics as well. Those three tasks take three different skills.
__________________
http://homolog.us

Last edited by samanta; 06-11-2012 at 12:54 PM.
samanta is offline   Reply With Quote
Old 06-11-2012, 08:35 AM   #28
Joann
Senior Member
 
Location: Woodbridge CT

Join Date: Oct 2008
Posts: 231
Default

Hi Samanta,
Two very good links, thanks for the posts.
Joann is offline   Reply With Quote
Old 06-11-2012, 11:09 AM   #29
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

Quote:
Originally Posted by rskr View Post
A) It is funny that people will spend tens of years programming in languages that take five minutes to learn, yet spend hours a day waiting for the programs to run.

B) Don't write thousands of lines of code in bash or Perl; they aren't designed for it. They are weakly typed and don't take advantage of compiler checking, not to mention that the languages don't facilitate porting to many platforms.
I don't think the majority of people really care. I know I don't. I'm first and foremost a biologist. Sequencing is just a tool. Bioinformatics is just a tool. The real scientific question is the biology, not which is the best programming language. 10 years from now C and most of the bioinformatics will be outdated and lie unused, sequencing will be completely different, but the biology will remain. I think most of the hard core computer scientists here get that and certainly the biologists do. For most of us it is a waste of time writing new programs or rewriting old ones in a different language. It is far far smarter spending an extra hour of my time reusing a slightly slower program written by someone else in perl or python or java and getting my answer that week than spending a year trying to develop something completely new and then getting scooped by the guy who focused on the biology.

I've collaborated with enough computer scientists to know that it typically goes one of two ways:

1) They reuse tools already out there, which would be no different than what I could do on my own.

or

2) They want to develop something completely new and then I don't get my answer for 6 months, when I could have had it within the week and begun doing the follow up experiments.

So I have come to the conclusion that if I am going to collaborate to have that nice new program written in C, I'd rather do my own work and get that published and let the Computer Scientist develop a program around already published data. Because if I get scooped waiting around that long, I'm the one who's screwed.

Last edited by chadn737; 06-11-2012 at 02:08 PM.
chadn737 is offline   Reply With Quote
Old 06-11-2012, 12:19 PM   #30
samanta
Senior Member
 
Location: Seattle

Join Date: Feb 2010
Posts: 109
Default

Quote:
Originally Posted by Joann View Post
Hi Samanta,
Two very good links, thanks for the posts.

You are welcome !!
__________________
http://homolog.us
samanta is offline   Reply With Quote
Old 06-11-2012, 12:53 PM   #31
samanta
Senior Member
 
Location: Seattle

Join Date: Feb 2010
Posts: 109
Default

Quote:
Originally Posted by chadn737 View Post
I don't think the majority of people really care. I know I don't. I'm first and foremost a biologist. Sequencing is just a tool. Bioinformatics is just a tool. The real scientific question is the biology, not which is the best programming language. 10 years from now C++ and most of the bioinformatics will be outdated and lie unused, sequencing will be completely different, but the biology will remain. I think most of the hard core computer scientists here get that and certainly the biologists do. For most of us it is a waste of time writing new programs or rewriting old ones in a different language. It is far far smarter spending an extra hour of my time reusing a slightly slower program written by someone else in perl or python or java and getting my answer that week than spending a year trying to develop something completely new and then getting scooped by the guy who focused on the biology.

I've collaborated with enough computer scientists to know that it typically goes one of two ways:

1) They reuse tools already out there, which would be no different than what I could do on my own.

or

2) They want to develop something completely new and then I don't get my answer for 6 months, when I could have had it within the week and begun doing the follow up experiments.

So I have come to the conclusion that if I am going to collaborate to have that nice new program written in C++, I'd rather do my own work and get that published and let the Computer Scientist develop a program around already published data. Because if I get scooped waiting around that long, I'm the one who's screwed.

Geez !! What a warped view of the world.

Computer science has two components - (i) algorithm development and (ii) coding the algorithm into some programming language. A new algorithm is a mathematical discovery that sometimes takes decades to develop, but once it is in place, it can revolutionize all aspects of science and non-science, including your beloved sequence analysis. Here is the development history of one chain of algorithms -

http://www.homolog.us/blogs/2011/10/...-and-fm-index/

You may notice that when Myers and Manber were working on the concept of suffix arrays, they had no clue about how the future of sequencing technology would develop, yet two important lines of programs for short read analysis (Bowtie/BWA and String graph assemblers) rely on mathematical constructs developed by them.
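For readers who have not met suffix arrays, here is a minimal Python sketch of the idea. It is deliberately naive: the array is built by sorting all suffixes directly, which is not the original construction algorithm and is far less efficient than what production aligners use. The function names are made up for this illustration.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Naive construction for illustration only: it compares whole suffixes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Binary-search the sorted suffixes for every position of pattern."""
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix at this rank in the array
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first rank whose prefix is >= pattern
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose prefix is > pattern
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes between the two bounds start with pattern
    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    genome = "GATTACATTA"
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "TTA"))
```

Because the suffixes are sorted, every occurrence of a query sits in one contiguous block of the array, so a lookup needs only two binary searches instead of a scan of the whole text.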

Just like you have a reward structure regarding quick publication of your sequence-related paper, computer scientists have a different reward structure related to development of new algorithms. Historically it has been found that their reward structure contributes more to biology than another incremental biology paper. So biologists themselves (those more knowledgeable than you) encourage discoveries of new algorithms.
__________________
http://homolog.us
samanta is offline   Reply With Quote
Old 06-11-2012, 02:03 PM   #32
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

Quote:
Originally Posted by samanta View Post
Geez !! What a warped view of the world.

Computer science has two components - (i) algorithm development and (ii) coding the algorithm into some programming language. A new algorithm is a mathematical discovery that sometimes takes decades to develop, but once it is in place, it can revolutionize all aspects of science and non-science, including your beloved sequence analysis. Here is the development history of one chain of algorithms -

http://www.homolog.us/blogs/2011/10/...-and-fm-index/

You may notice that when Myers and Manber were working on the concept of suffix arrays, they had no clue about how the future of sequencing technology would develop, yet two important lines of programs for short read analysis (Bowtie/BWA and String graph assemblers) rely on mathematical constructs developed by them.

Just like you have a reward structure regarding quick publication of your sequence-related paper, computer scientists have a different reward structure related to development of new algorithms. Historically it has been found that their reward structure contributes more to biology than another incremental biology paper. So biologists themselves (those more knowledgeable than you) encourage discoveries of new algorithms.
Please don't read into it something I did not say. I am not criticizing the work of computer scientists. I specifically criticize the attitude of rskr that is condescending and dismissive of any biologist or computer scientist that doesn't create a new program from scratch in his language of choice.


While the continued development of new algorithms and programs is of great use to biologists, that development is the primary concern of the specialists, not the biologist. Even for the computer scientists, it makes no sense to develop new programs from scratch for everything. It also makes no sense for the biologist to spend the vast amount of time necessary to learn C for simple and mundane applications if they already know Perl and can implement it in Perl in a shorter amount of time, even if it takes an hour longer to run. I can spend those extra few hours of runtime doing wet lab experiments. And since I don't know C, I can actually get more done using Perl than in all the time it would take me to learn a new language and go through the hassle of implementing it.

And frankly, why am I going to send my data to someone else to analyze if they are simply going to use the exact same tools that already exist and which I already know how to implement? Why should I wait 6 months or a year for my results while they create a completely new program when I can get the results in a week reusing tried and true programs?

If the computer scientist wants to create a new algorithm, then they are doing their job and that is sufficient for a paper in itself. Besides it is better for the computer scientist because then he gets the credit rather than having to be a co-author on a paper where the program takes second place to the data.

They have their own careers to look after, I have mine. I understand that bioinformatics takes time to develop and I applaud those who develop it. But I am not seeking a career in the development of bioinformatic tools, nor are most biologists. We just use them and then it's on to the next step. If there is no preexisting tool, then I'll take the time to work with the computer scientist and wait for them to develop one and then use it to get to the question at hand. But otherwise, I see no reason for the biologist not to take advantage of pre-existing tools, and it is absurd to be dismissive of them just because they use a pre-existing tool or a language they are more comfortable with.

I'm not dismissing anyone here or their relative contributions. I'm just being frank about the fact that there are more practical concerns.

Last edited by chadn737; 06-11-2012 at 02:58 PM.
chadn737 is offline   Reply With Quote
Old 06-19-2012, 12:58 PM   #33
samanta
Senior Member
 
Location: Seattle

Join Date: Feb 2010
Posts: 109
Default

Quote:
Originally Posted by chadn737 View Post
Please don't read into it something I did not say. I am not criticizing the work of computer scientists. I specifically criticize the attitude of rskr that is condescending and dismissive of any biologist or computer scientist that doesn't create a new program from scratch in his language of choice.

......
I'm not dismissing anyone here or their relative contributions. I'm just being frank about the fact that there are more practical concerns.
Sorry, I misunderstood your original comment. You make all valid points. You need to look after your self-interest (which is to get the best biological insights from your data) irrespective of which computer programs were used to get there. So, if you are competent enough to code/install and run computer programs, I see no reason for wasting time with a computer scientist trying to be in your shoes. The computer scientists, on the other hand, look after their self-interest of finding the best algorithm.
__________________
http://homolog.us
samanta is offline   Reply With Quote

All times are GMT -8. The time now is 06:57 AM.

