SEQanswers

Old 04-22-2010, 09:53 AM   #21
JohnK
Senior Member
 
Location: Los Angeles, China.

Join Date: Feb 2010
Posts: 106
Default

Quote:
Originally Posted by Calico View Post
Hello everybody,

My rather limited bioinformatics skills come from having done some microarray data analysis in R (following a template code) and some minor coursework. So, I consider myself quite a newbie to the subject. I will quite soon be shaking hands with some sequencing data (from a Helicos machine) and need to prepare myself for this.

Being of a younger generation, I would say I can handle computers pretty well. So far I have, as recommended in this nice thread, started to take a look at the Unix and Perl for Biologist tutorial and installed Ubuntu in Virtual PC on my Windows computer.

What I'd like to ask you, SEQanswers community, is whether you can suggest anything helpful. Am I starting out in the right way? I will get some bioinformatics help along the way, though I am unsure to what extent. Also, I see this as a part of my future career, so I am not just doing this for one particular project.

Edit: I have just realized that the Helicos software package uses Python.
I wrote a longer response, but for some reason it never posted, or maybe it posted to the wrong forum. Java and C++ are object-oriented languages, which means much of what you do in them will be within the confines of classes, objects, encapsulation and abstraction. For simple programs, you might want to leave flat/text file manipulation to a scripting language; i.e., leave the scripting tasks to languages like Perl and shell, and the object-oriented tasks to the object-oriented languages. I think for bioinformatics purposes a scripting language like Perl, Python, etc. will suffice. Perl seems to be by far the most popular, and there are large code collections like CPAN and BioPerl to support your coding. Perl is also very fast, though for simple filtering jobs the standard shell tools can be faster still. Learning shell programming in a Unix/Linux environment is a priceless skill and will save you much time. All the statisticians I've met have used and still use R. I myself haven't learned R, but it's definitely a future endeavor. Hope that helps.
Old 04-22-2010, 10:07 AM   #22
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by JohnK View Post
I wrote a longer response, but for some reason it never posted or maybe it posted to the wrong forum.
Good catch, the system thought you were spamming, but that's not the case.
Old 04-26-2010, 02:57 PM   #23
Calico
Member
 
Location: Houston

Join Date: Jan 2010
Posts: 13
Default

By the way, is the R environment not used in the HTS community anymore?
Old 04-26-2010, 05:52 PM   #24
sameet
Member
 
Location: Earth

Join Date: Apr 2010
Posts: 34
Default

Quote:
Originally Posted by JohnK View Post
Aw man! I love perl, but scripting languages are all the same to me. Just remember to hit '#' constantly.
One more vote for Python. I am a biologist by training, got into this business accidentally, and found that I liked it. I taught myself Perl, but once I came to know Python and started using it there has been no looking back. All the points that @quinlana makes are true. I believe it is the best language to start programming in.
Old 04-27-2010, 01:38 AM   #25
damiankao
Member
 
Location: UK

Join Date: Jan 2010
Posts: 49
Default

I got to go with Python too. Perl has great available libraries, but Python is just so much more agile than Perl. Just having the dot notation format and slice syntax makes things so much easier.
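
To illustrate the point about slice syntax and dot notation, here is a minimal sketch. The sequence and the `codons` helper are invented for the example; only the slicing and method-call syntax itself is standard Python.

```python
# Hypothetical example sequence; any string behaves the same way.
seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"


def codons(sequence):
    """Split a sequence into consecutive 3-letter codons using slices.

    A trailing partial codon (1 or 2 leftover bases) is dropped.
    """
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]


first_ten = seq[:10]        # slice: the first ten bases
last_three = seq[-3:]       # slice: negative indices count from the end
reversed_seq = seq[::-1]    # slice with a step of -1 reverses the string

# Dot notation: methods are called directly on the string object.
gc_count = seq.count("G") + seq.count("C")

print(first_ten, last_three, gc_count)
print(codons(seq)[:3])
```

The same slice syntax works on lists and tuples, which is part of why it feels lightweight for everyday sequence handling.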
Old 05-02-2010, 06:50 AM   #26
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415
Default

I've come to rely more and more on awk. Being a self-trained Perl programmer, I find it fascinating to see how much I can do with (nearly) one-liners in awk instead of writing multiple-line Perl scripts. Many of my input/output files are tab-separated, which is ideal for awk.

Next on my list is learning python...
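
For anyone making the same move from awk to Python, here is a rough sketch of how a one-liner such as `awk -F'\t' '$3 >= 30 {print $1}'` might translate. The column layout, the threshold and the function name `names_with_min_score` are assumptions made up for the illustration.

```python
import csv
import io


def names_with_min_score(handle, min_score=30.0):
    """Yield column 1 of every tab-separated row whose column 3 is >= min_score."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip short rows instead of failing on them.
        if len(row) >= 3 and float(row[2]) >= min_score:
            yield row[0]


# In-memory stand-in for a real file; the three columns are invented.
example = io.StringIO("read1\tchr1\t42\nread2\tchr2\t7\nread3\tchrX\t30\n")
print(list(names_with_min_score(example)))
```

It is longer than the awk version, but the function can be imported and reused in a larger script, which is where Python tends to pay off.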
Old 05-02-2010, 11:05 AM   #27
JohnK
Senior Member
 
Location: Los Angeles, China.

Join Date: Feb 2010
Posts: 106
Default

Quote:
Originally Posted by damiankao View Post
I got to go with Python too. Perl has great available libraries, but Python is just so much more agile than Perl. Just having the dot notation format and slice syntax makes things so much easier.
I haven't used Python, but I may give it a whirl someday. Perl also supports slicing, and Perl's '->' method-invocation syntax is supposed to resemble C++ '->' pointer notation; my memory might have left me on this stuff though, so no finger pointing. Python might be better at these things than Perl though. It's whatever floats your boat really. I believe some call Perl a language with "C-like syntax".

As for the command line, awk is awesome, but can lack some syntactic sugar, in my opinion. A lot of the stuff I do on the command line in Perl is just a manipulation along the lines of:

< <in_file> perl -e 'while(<>){ #splitting, pushing onto an array, printing out, and then piping to some text filters, etc...# }' | a filter > <out file> &

I wouldn't use perl to do things that filters and sed/awk can do for you, but knowing unix/linux filter and shell commands is a priceless skill that saves incredible amounts of time. I think I can say that without someone yelling at me.
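
The same split-and-print pattern can be sketched in Python. This is only an illustration: the function name `swap_first_two` and the tab-separated sample lines are assumptions, not something taken from the post.

```python
def swap_first_two(lines):
    """Yield each tab-separated line with its first two fields swapped.

    Lines with fewer than two fields are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")   # splitting
        if len(fields) < 2:
            continue
        fields[0], fields[1] = fields[1], fields[0]
        yield "\t".join(fields)                   # ready for printing


# Demo on an in-memory list of lines.
for out_line in swap_first_two(["chr1\t100\tgeneA\n", "chr2\t250\tgeneB\n"]):
    print(out_line)
```

Passing `sys.stdin` (after `import sys`) instead of the sample list would let the script sit in a shell pipeline between other filters, much like the Perl loop.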
Old 05-03-2010, 09:06 AM   #28
sameet
Member
 
Location: Earth

Join Date: Apr 2010
Posts: 34
Default

Quote:
Originally Posted by flxlex View Post
I've come to rely more and more on awk. Being a self-trained Perl programmer, I find it fascinating to see how much I can do with (nearly) one-liners in awk instead of writing multiple-line Perl scripts. Many of my input/output files are tab-separated, which is ideal for awk.

Next on my list is learning python...
Actually, if you are using Linux (or a Linux-like environment) for your work, then the well-known tools like sed, awk and bash, combined with Python, make life very easy.
__________________
Sameet Mehta (Ph.D.),
Visiting Fellow,
National Cancer Institute,
Bethesda,
US.
Old 05-03-2010, 09:46 AM   #29
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

Quote:
Originally Posted by ymc View Post
Does Python have a free bioinformatics library like BioPerl? I find that BioPython is quite lacking for now. I am wondering if there are better alternatives.
If you are specifically after high-throughput sequencing: I'm currently working on a framework for that, called HTSeq, and I've come reasonably far by now.

You can already do a lot with it: see this thread, in which I've advertised it, and, of course, the HTSeq web page.

Simon
Old 05-03-2010, 10:18 AM   #30
chaz81
Junior Member
 
Location: Seattle

Join Date: Mar 2010
Posts: 3
Default

Quote:
Originally Posted by quinlana View Post
I learned the basics from python.org and used O'Reilly's "Python Cookbook" to get a feel for the subtleties and advanced usage. In my case, however, I mainly just needed to know the syntax and basic data structures as I already knew how to program.

There's a new Python book for Bioinformatics from O'Reilly. No idea what the quality is like.
http://oreilly.com/catalog/9780596154516/
That O'Reilly book is currently sitting on my desk, about halfway finished. I would not recommend it as a book to start learning Python with. I bought O'Reilly's Learning Python a few years ago (the cover says "now includes Python 2.3!"), and it has been invaluable; I can only imagine an edition that covers more recent versions would be better. The Learning Python book is well written and has good examples to work through.
Old 05-15-2010, 11:06 PM   #31
zbjorn
Junior Member
 
Location: Cambridge, MA

Join Date: May 2010
Posts: 7
Default

I use a handful of languages. Here goes.

Mathematica is my absolute favorite because of its versatility, high performance, incorporation of high level functions, incredible documentation and useful interactive front end. I'm also the only biologist I know who uses it, though I know the Lawrence National Labs at least prototypes code in it.

I rarely use perl. It is easy to be sloppy in it and I don't like that about it, but it is nice for some basic scripts (e.g. rearranging data). Python is about the same in terms of functionality, and I like its syntax better.

Matlab is so clunky and poorly documented that I don't think it's worth the trouble.

R: I dislike the syntax. Who came up with the reverse symbol assignments? Plenty of other languages do what R can do, so there is no need for a dedicated statistics language.

Java is good for applications requiring GUIs. It's OK for backend work too... despite common belief it's not inherently slow.

C is the king if you need to do real software authoring.

Finally, I use any of the .NET languages (Windows!) for hardware interfaces (if I'm writing control software for lab robotics).

So, if I had to choose two, it would be Mathematica and C I suppose.
Old 05-18-2010, 11:40 AM   #32
martian_bob
Member
 
Location: New York

Join Date: Feb 2010
Posts: 11
Default

I got my Ph.D in computer science as opposed to anything biological, so here's my two cents from that point of view...

- If you're writing for computational speed, go with C or C++
- If you're writing a GUI, go with Java
- If you just need to get it done and never look at it again, go with Perl
- If you're setting up a pipeline, especially one dealing with converting data formats, go with Python

People tend to advertise jobs looking for C++ or Java when they want folks to write code that'll eventually be released for other people to use as a stand-alone tool.

I do all of my analysis using Python to massage and analyze data for tools like Bowtie and genome browsers.
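
As a small example of the "converting data formats" kind of job, here is a hedged sketch that turns FASTQ records into FASTA. It assumes the simple four-lines-per-record FASTQ layout with no wrapped sequences, and the function name `fastq_to_fasta` is invented for the illustration; for real data a tested library parser is the safer choice.

```python
import io


def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines from an iterable of FASTQ lines.

    Assumes exactly four lines per record: header, sequence, '+', quality.
    An incomplete trailing record is silently dropped.
    """
    stripped = (line.rstrip("\n") for line in fastq_lines)
    # zip over four references to the same iterator groups lines in fours.
    for header, sequence, _plus, _quality in zip(*[stripped] * 4):
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        yield ">" + header[1:]
        yield sequence


# In-memory stand-in for a FASTQ file; the reads are made up.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nIIII\n")
print("\n".join(fastq_to_fasta(example)))
```

Because the function takes any iterable of lines, it works equally on an open file handle or on the output of an upstream step in a pipeline.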
Old 05-18-2010, 12:28 PM   #33
jiaco
Member
 
Location: GMT +1

Join Date: May 2010
Posts: 33
Default

First off, I do not have anything against all the shiny new languages and IDEs and such, but if a young programmer stumbles across this thread and takes to heart what people are saying, I feel obligated to input my opinion.

Learn C and bash and the most basic stuff first. LEARN vi as your IDE and your word processor and your only way of knowing how to enter text. Understand how to log into a machine running the most basic Linux available and actually do something functional to bring it back to life. There will be times when there is no Python, no JVM, no Eclipse. If you cannot function in such an environment then you are shooting yourself in the foot.

The compiler is your friend. You can spend loads of time writing code. If you learn stellar white space habits, your code is readable and you can be fairly confident of what you have written. Then having the compiler pass over it before debugging is a great way to catch stupid and serious errors without wasting more time debugging.

While I now use exclusively Qt (C++), I still force myself to get dirty in C just to keep it fresh. Plus bash and the history command should be your best friends. They basically record your actions, give you an easy way to script up a pipeline and serve as a form of documentation. Once you have your script, be sure to comment it with database versions, and any other relevant info that may change in the future.

Best of luck

:wq
Old 05-18-2010, 03:43 PM   #34
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

Quote:
Originally Posted by jiaco View Post
First off, I do not have anything against all the shiny new languages and IDEs and such, but if a young programmer stumbles across this thread and takes to heart what people are saying, I feel obligated to input my opinion.
Sorry, but for the benefit of these "young programmers" I have to disagree strongly.

Basically, you should distinguish two cases. (a) A young student aspires to become a professional in (scientific or other) software development. (b) A scientist who already works in research (or is studying biology, not CS) wants to broaden his skills in order to perform some bioinformatics analyses himself.

Jiaco's advice is well suited for case (a). There are many professional developers with a CS degree but without an understanding of what is going on under the hood of a computer. These usually come from universities which kicked C out of the curriculum and I share Jiaco's frustration about this.

However, most readers here will fall into case (b). They don't want to be able to replace a fully qualified computer scientist. Rather, they already have a qualification, and that is biology or biotech engineering.

Hence, I fully agree with the emphasis that was put in this thread on scripting languages, especially Python.

They allow you to get a job done fast, and they are much easier to learn.

What has not yet been mentioned here is the fundamental trade-off between compiled languages and scripting languages, namely runtime speed versus development speed: A developer experienced in both languages might need half a day to code something in Python and two days to get the same job done in C. However, the Python program may take, say, five minutes to run, while the C program needs only half a minute. Only if you plan to run the program very often will you get back your investment in development time from having to wait less for the program to run.
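
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch using the figures above. The 8-hour working day is an assumption added for the calculation, and the helper name `break_even_runs` is made up.

```python
def break_even_runs(extra_dev_minutes, saved_minutes_per_run):
    """Runs needed before the faster program has repaid its extra development time."""
    return extra_dev_minutes / saved_minutes_per_run


HOURS_PER_DAY = 8  # assumption: a "day" of coding means 8 working hours

python_dev = 0.5 * HOURS_PER_DAY * 60   # half a day, in minutes
c_dev = 2 * HOURS_PER_DAY * 60          # two days, in minutes
python_run, c_run = 5.0, 0.5            # minutes per run, as in the example above

runs = break_even_runs(c_dev - python_dev, python_run - c_run)
print(f"C pays off after about {runs:.0f} runs")
```

Under these assumptions the C version only catches up after about 160 runs, and this counts waiting time as if it were as costly as coding time, which it usually is not.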

That is not to say that there are not a lot of problems in bioinformatics that require strong C/C++ and computer science theory skills but these skills are not something acquired within a few weeks or months. (I would be unemployed if all biologists were experts in computer science, too.)

Simon

Last edited by Simon Anders; 05-18-2010 at 03:44 PM. Reason: slight rewording
Old 05-18-2010, 06:36 PM   #35
quinlana
Senior Member
 
Location: Charlottesville

Join Date: Sep 2008
Posts: 119
Default

Quote:
Originally Posted by Simon Anders View Post
Sorry, but for the benefit of these "young programmers" I have to disagree strongly.
Simon is dead-on here. Ferraris aren't wise choices for trips to the grocery, just as my uncle's Prius won't win Le Mans. I use C++ (and sometimes C) for the races and Python for getting groceries, daily work and medium-size applications/prototypes. The two play very nicely together and the general programming concepts are transferable. In one day you can get over the whitespace issue in Python, and in two more you can get past the extra bit of work that one must do for regular expressions and multi-level hashes/dictionaries relative to Perl. Then learn iterators, comprehensions and what is and isn't mutable, and after that it's smooth sailing.
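
For Perl users wondering what that "extra bit of work" looks like, here is a minimal sketch touching each item in turn. The records and variable names are invented for the illustration; the calls themselves are from the standard `re` and `collections` modules.

```python
import re
from collections import defaultdict

# Hypothetical alignment-like records: "read_id chromosome strand".
records = ["r1 chr1 +", "r2 chr1 -", "r3 chr2 +", "r4 chr1 +"]

# Regular expressions live in a module: compile once, then match each line.
pattern = re.compile(r"^(\S+)\s+(chr\w+)\s+([+-])$")

# Multi-level dictionary: counts[chromosome][strand] -> number of reads.
# Nested defaultdicts stand in for Perl's autovivification.
counts = defaultdict(lambda: defaultdict(int))
for line in records:
    match = pattern.match(line)
    if match:
        read_id, chrom, strand = match.groups()
        counts[chrom][strand] += 1

# Comprehension: total reads per chromosome in one expression.
totals = {chrom: sum(by_strand.values()) for chrom, by_strand in counts.items()}

# Iterator: pull items one at a time with next().
first_record = next(iter(records))

# Mutability: lists can be changed in place, tuples cannot.
mutable = [1, 2, 3]
mutable[0] = 99
immutable = (1, 2, 3)
try:
    immutable[0] = 99
except TypeError:
    tuple_is_immutable = True

print(totals, first_record, mutable, tuple_is_immutable)
```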

In closing, http://xkcd.com/353/
All times are GMT -8.