SEQanswers

Old 04-20-2011, 10:44 AM   #1
DeNovoG
Junior Member
 
Location: South America

Join Date: May 2010
Posts: 7
A Message for the BioInfo Developer Community

Dear developers,

First of all, thanks for all the tools out there that make our biological research and data analysis possible.

Today, I just wanted to offer a constructive critique about source code commenting.

Please be kind and thoroughly comment the source code of your software and scripts, following the conventions and style rules suggested by your programming language. Especially tell us what your subroutines do and what the variables are for.

#
//
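
For instance, a minimal sketch in Python of the kind of commenting meant here. The function `gc_content` and its exact behavior are made up for illustration and are not taken from any existing tool:

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence.

    sequence: DNA string; case is ignored. Characters other than
              G and C (including ambiguity codes such as N) count
              toward the length but not toward the GC total.
    Returns a value between 0.0 and 1.0; an empty sequence gives 0.0.
    """
    if not sequence:
        # Avoid dividing by zero on empty input.
        return 0.0
    upper = sequence.upper()  # normalize so 'g' and 'G' are treated alike
    gc_bases = upper.count("G") + upper.count("C")
    return gc_bases / len(upper)
```

The docstring says what the subroutine does and what its argument means; the inline comments explain why a line exists rather than restating it.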

I believe that this practice will benefit our community, which shares many different programming languages and levels of understanding among users.

Thank you for your consideration,

Best Regards,


DeNovoG
Old 04-20-2011, 11:11 AM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

"My code is well-documented" does not get grants unfortunately.
Old 04-20-2011, 11:54 AM   #3
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258

Quote:
Originally Posted by nilshomer View Post
"My code is well-documented" does not get grants unfortunately.
LOL.
sad but true.
BTW, DeNovoG is right to ask for better comments. Unfortunately, few bioinfo developers have sufficient experience in large projects, where good commenting is more or less mandatory. Also, I suspect most bosses look at the results rather than the code (especially when the boss is not a developer...).
My experience: every time I start a project I try to comment everything, and within a few days I start skipping long comments because I'm lazy. After a couple of releases I try to do "offline commenting" and I inevitably wonder "why did I write this? What's that?". At least I try to give functions and variables descriptive names (not simply x, y and i).

d
Old 04-20-2011, 12:03 PM   #4
biznatch
Senior Member
 
Location: Canada

Join Date: Nov 2010
Posts: 124

I extensively comment all my scripts because I'm pretty new at this, and if I didn't I'd forget what most of it did and have to spend a long time figuring it out again every time I went back to make modifications.
Old 04-20-2011, 01:11 PM   #5
dp05yk
Member
 
Location: Brock University

Join Date: Dec 2010
Posts: 66

This rings true for me. It was difficult to start creating pBWA due to lack of commenting (in fact I am still having trouble understanding how some of the biology stuff works due to this!). However, I totally get where Nils is coming from.
Old 04-21-2011, 07:52 AM   #6
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

I would also add, "my code is well documented" does not fly with PhD committees either.
__________________
The more you know, the more you know you don't know. —Aristotle
Old 04-21-2011, 10:07 AM   #7
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

A great example of well-documented and well-written code can be found at the Broad (GATK, and especially Picard). How do we incentivize other groups or graduate students to produce quality, commented code beyond simple altruism? "My advisor wanted it yesterday" and "there is a one-in-a-million case where the competing tool is better" both work against this goal.

An extreme requirement would be that, if any software is being produced as part of a grant, the proposal must specify the code documentation system (javadoc/doxygen/etc) as well as the coding standards. We could also educate biologists (non-programmers) on the importance of good software engineering practices (beyond timeliness).
Old 04-21-2011, 10:24 AM   #8
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

Open source code with multiple developers does tend to have better documentation, by necessity. On the flip side, single-person projects don't have the time or the need to document everything. Thus, larger teams encourage better code both because they have more manpower to devote to documentation and because they have the need for better documentation.

It's no surprise that Picard is well documented: It's open source, it's well funded, and used/worked on by a lot of people. [Edit: I don't actually know that it's well funded - I've just always thought it was because I know it has several developers working on it concurrently.]

The recipe for good documentation is to fund software projects so that more people work on them, which means documentation goes from being optional to being required, and the resources exist to do it well. As long as the funds aren't there, you'll get single developer software, which doesn't require the same documentation and the developer probably doesn't have the time to devote to it anyhow.
Old 04-21-2011, 03:16 PM   #9
cwhelan
Member
 
Location: Cambridge, MA

Join Date: Nov 2010
Posts: 23

My thoughts on this are biased by my training in Agile development environments, but here's my two cents on one way to incentivize programmers to write documentation (if you'll stretch the definition a little):

My favorite type of documentation is an automated unit test suite, and the best way to get developers to write documentation and keep it up to date is to show them that writing automated tests for their code has benefits for them in the actual software development process. Commenting and documenting always used to end up a low priority for me because I was never sure if I'd end up keeping my code, changing it, or even throwing it away later, as my requirements or understanding of the problem changed. Plus as noted above there's not much external incentive to write them. So I added comments as an afterthought and often let them get out of date.

When I learned test-driven development, I found that my code got much cleaner and easier to maintain if I wrote unit tests at the same time as production code. Plus it gave me nice documentation; if I forget what a method does, the best way to understand it is to read through a well-written unit test.
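
As a rough illustration of tests doubling as documentation, here is a small sketch using Python's standard `unittest` module. The function `reverse_complement` and the test names are hypothetical examples, not code from any real project:

```python
import unittest

# Translation table mapping each base to its complement, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


class ReverseComplementTest(unittest.TestCase):
    """Each test name states one fact about the function's behavior."""

    def test_complements_each_base_and_reverses_order(self):
        self.assertEqual(reverse_complement("AACG"), "CGTT")

    def test_palindromic_site_maps_to_itself(self):
        # GAATTC (the EcoRI site) reads the same on both strands.
        self.assertEqual(reverse_complement("GAATTC"), "GAATTC")

    def test_unknown_bases_are_kept_unchanged(self):
        self.assertEqual(reverse_complement("AAN"), "NTT")

    def test_case_is_preserved(self):
        self.assertEqual(reverse_complement("acG"), "Cgt")
```

Saved in a file, these tests can be run with `python -m unittest`. Someone who has forgotten what happens to an `N` or to lowercase input can read the answer straight from the test names.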

Of course, some algorithms are more testable than others (it's really hard to write a unit test that documents a dynamic programming algorithm in my experience), and it's hard to write tests for some of the more script-y things we have to do in bioinformatics. Some things just aren't worth the extra effort to test too, although those are exactly the things that tend to come back to bite me later when I skip testing. Overall, though, I really find writing and working with tested code productive and satisfying.
Old 04-22-2011, 01:25 PM   #10
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

There's the point that strong documentation and commenting throughout code doesn't tend to yield grant money or result in stronger publications, but I think it does in a tangential way. For example, making it so other people can utilize your program and code more effectively means you're more likely to have it used in their publications downstream. In that sense, it's a good idea to document well.

The other major benefit is that it typically leads to receiving many fewer questions on usage that suck up time down the line (assuming you actually support your program, which I have to say most academic bioinformatics developers do a bang-up job of in my experience).

Without pointing out specific examples, I can think of a number of programs that will not end up in my publications in the future because they were frankly too hard to use and not well documented enough. One in particular seemed impenetrable despite being incredibly useful in theory--I just had to give up after two days of trying with no email reply to my questions and no strong documentation because it wasn't worth my time.

On the flip side, some positive examples of programs I've found benefiting from strong documentation include Annovar, BEDtools, BFAST, Dindel, GATK, VCFtools (and others). These are all from academic sources (with varying levels of funding and teamwork on them) and despite being fairly complicated programs generally (okay, BEDtools/VCFtools are very straightforward, but still great) I was able to get them up and running very quickly despite a relatively weak background in programming.

Decent documentation probably should be a requirement when publishing academic bioinformatics software. We've all seen those programs where there's a paper, a program, and any questions about the program get referred to the paper, which isn't typically helpful. Hopefully that's been changing because of the above-mentioned advantages to strong documentation.
__________________
Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
Projects: U87MG whole genome sequence [Website] [Paper]
Old 04-22-2011, 01:40 PM   #11
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

Hi Michael,

I agree about your points when applied to "user documentation", but this particular thread was about "code documentation" and I'm not sure they're interchangeable.
Old 04-22-2011, 01:57 PM   #12
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

Well, I am probably expanding on the original topic a bit.

I think the over-arching topic here is things developers probably don't need for themselves that are very useful to users. In that case, commenting code and strong documentation are both part of that.

The other thing we all benefit from is strong error reporting, which again I think goes in the same bin of things users would love that developers don't necessarily need (or directly benefit from) for themselves.
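
To make "strong error reporting" concrete, here is a small sketch in Python. The parser `parse_fasta` and the exception `FastaFormatError` are hypothetical names used only for illustration; the point is that each message says where the problem is and what was expected:

```python
class FastaFormatError(ValueError):
    """Raised when input text is not valid FASTA."""


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    current = None  # name of the record currently being read
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise FastaFormatError(
                    f"line {line_number}: header '>' has no sequence name"
                )
            current = fields[0]  # name is the first word after '>'
            if current in records:
                raise FastaFormatError(
                    f"line {line_number}: duplicate sequence name '{current}'"
                )
            records[current] = ""
        elif current is None:
            raise FastaFormatError(
                f"line {line_number}: sequence data found before any '>' "
                "header; is this really a FASTA file?"
            )
        else:
            records[current] += line
    return records
```

A user who feeds this the wrong file sees the line number and a likely cause instead of a bare stack trace or silent garbage output.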

The point I'm making is that including things that may not count to your CS Ph.D committee or that may not have meaning on a grant app can still have an impact on you personally--making your code and programs easier to use for others makes it more likely they'll use it and therefore publish with it/communicate with you/et cetera, which benefits you personally.
Old 04-22-2011, 02:23 PM   #13
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

To summarize, unit tests and code documentation are great for making code robust and for communicating better across developers. User documentation and usability allow the software to be used more easily by a wider group of people, making your software invaluable and more likely to receive software support funding. Most of this is software engineering 101.

Nonetheless, these are all indirect incentives, and are not built pro-actively into the funding or even training mechanisms; we have not talked about training individuals for software engineering (is this the PI's role and/or coursework?).

What I really want to see come from this discussion is how we can train and support computer scientists, bioinformaticians, and biologists so that they write well-documented (user and code), usable software.

I will throw out a suggestion off the top of my head to get the ball rolling. How about an open source software repository, like a bio-app store, which allows for a central repository of cutting edge (not ten releases ago) bio-apps that have software that meets (or is ranked on) certain criteria on documentation, code standards, and portability? I am sure a grant reviewer would then find it easy to look up past software and say, person Y always produces well-documented and well-used software. Similarly, a requirement of applying for a grant that includes software could be submission and release of software that meets these standards. Accountability and training are the key.
Old 04-22-2011, 03:19 PM   #14
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

Michael: I think you're preaching to the choir - but really, there's a huge difference between documentation for developers and documentation for users. Conflating the two - or trying to discuss them both at once - isn't going to result in a productive set of action items.

Personally, I've spent far more time maintaining, cleaning and documenting my code than my committee or my advisor really would like. It slows down development and only shows benefits in the long term. As long as code is being developed by people who expect to work on a project for less than a year or so, you're going to have a hard time convincing them of the benefits of good coding practice. And, I think it's relatively obvious, post-docs and grad students rarely have that kind of long term vision unless the project is actively managed by their PI or an institute staff member. (I'd like to think my own code is an exception to the rule, just because I really believe strongly in good coding practices.)

At any rate, I really like Nils' suggestion of an open source repository for bioinformatics software. I currently use SourceForge (as a few others do), but there's no sense of a developer community specific to bioinformatics in that environment, nor is there a "bioinformatics app store" for that community. A dedicated repository could also help with organizing projects and directing people to contribute to existing software, as well as providing forums much more tailored to bioinformatics needs. Even better, if it could be used to do automatic nightly builds of the software, it would force developers to use unit tests to keep from breaking the head of their trees. Nightly builds are a good indication of the stability of software.

Edit: Just to be clear, nightly builds + nightly unit tests would be a great indication of stability, as long as visitors to the site get some stats on the number of unit tests passed, etc. I realize that nightly builds on their own would only fail when compilation errors are present, which, on its own, would be a good start.

Last edited by apfejes; 04-22-2011 at 04:20 PM. Reason: clarity
Old 04-25-2011, 06:32 PM   #15
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

Quote:
Originally Posted by apfejes View Post
Michael: I think you're preaching to the choir - but really, there's a huge difference between documentation for developers and documentation for users. Conflating the two - or trying to discuss them both at once - isn't going to result in a productive set of action items.
I haven't "conflated" anything, though. It's your opinion that the two subjects--code comments and user documentation--can't be discussed simultaneously. I disagree--I think both are important components that contribute to the quality of open-source software and, frankly, to how strong of a software engineer/bioinformaticist/team you are. So kudos to you for "over"-commenting and documenting your own code. Were I on your committee, that would count for something in my book.

And I think our topic here is really that a greater incentive and emphasis should be made on code commenting, unit testing, user documentation, and interaction than is currently being made in the CS/computational biology/bioinformatics academic community.

As Nils said, software engineering 101--the software needs to be accessible to both users and developers, and I do think it should be judged as such.

Quote:
Originally Posted by nilshomer View Post
I will throw out a suggestion off the top of my head to get the ball rolling. How about an open source software repository, like a bio-app store, which allows for a central repository of cutting edge (not ten releases ago) bio-apps that have software that meets (or is ranked on) certain criteria on documentation, code standards, and portability? I am sure a grant reviewer would then find it easy to look up past software and say, person Y always produces well-documented and well-used software. Similarly, a requirement of applying for a grant that includes software could be submission and release of software that meets these standards. Accountability and training are the key.
Certainly a central repository for such software would be fantastic. I think it's a great idea, but who would determine the criteria and judge the software and how?

I'm thinking somewhere in between the like button and the help mailing list.

Maybe hand-in-hand with that is a more general and open feedback mechanism. I recognize sites like SourceForge/GoogleCode/GitHub/etc. give you the opportunity to give a rating and feedback, but it's under-utilized and rather primitive from what I've seen. It might be nice to have a breakdown of different features--usability, documentation, error reporting, et cetera--that could be rated (perhaps by identified users/developers rather than random faceless anons). A review system might be nice.
Old 04-25-2011, 09:56 PM   #16
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

Quote:
Originally Posted by Michael.James.Clark View Post
It's your opinion that the two subjects--code comments and user documentation--can't be discussed simultaneously. I disagree--I think both are important components that contribute to the quality of open-source software and, frankly, to how strong of a software engineer/bioinformaticist/team you are. So kudos to you for "over"-commenting and documenting your own code. Were I on your committee, that would count for something in my book.
Thanks Michael. I really appreciate that. (=

I would like to clarify one thing, however. It's not that the two types of documentation can't be discussed simultaneously, but that they shouldn't be discussed simultaneously. One targets developers, one targets users - and both need to be there. Since the thread started explicitly as a comment on the need to document better for future developers, I thought it might be more productive to keep them separate.

In terms of attracting users, the issues are (IMHO) more straightforward: a good manual, functionality and interface. However, all of that documentation needs to be complete for the development community to embrace a piece of software, in addition to the many other factors involved in getting new developers to buy in (e.g. choice of language, modularity of code, design, development model, ease of contribution, etc.). I'd rather focus on the latter set, as that's where the real meat of this conversation seems to be for me.


Quote:
Originally Posted by Michael.James.Clark View Post
And I think our topic here is really that a greater incentive and emphasis should be made on code commenting, unit testing, user documentation, and interaction than is currently being made in the CS/computational biology/bioinformatics academic community.

As Nils said, software engineering 101--the software needs to be accessible to both users and developers, and I do think it should be judged as such.
First, I agree with your point that a greater commitment is required to good code practices in development. No doubt academic code has a long way to go.

Second, I'd still separate the user/developer communities. While bioinformaticians do an ok job of appealing to users (far from good, but the basic elements are there), we do a terrible job of creating community projects, which is what I feel is really holding the field back.

The real gains to be made in bioinformatics are in making better use of developers' time. If we had all 40 people who had written their own ChIP-Seq code working together instead of generating 40 different peak finders, I think epigenetics would really be accelerated as a field. That would have required a serious, coordinated central project or two in which the learning curve for new developers was as small as possible, aka: good code documentation, etc. Think of a single peak-finder core with the ability for different developers/labs to strap on new modules for it, much the same way R has packages... but now I'm starting to digress.
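
A very rough sketch of what such a core with strap-on modules might look like in Python. Everything here (`register`, `call_peaks`, and the toy `threshold_caller`) is hypothetical and far simpler than a real ChIP-Seq peak finder; it only shows the registration pattern:

```python
from typing import Callable, Dict, List, Tuple

Peak = Tuple[int, int]  # (start, end) positions, end exclusive
PeakCaller = Callable[[List[int]], List[Peak]]

# The shared core keeps one table of all contributed peak callers.
_CALLERS: Dict[str, PeakCaller] = {}


def register(name: str) -> Callable[[PeakCaller], PeakCaller]:
    """Decorator a lab uses to plug its peak caller into the core."""
    def decorator(func: PeakCaller) -> PeakCaller:
        if name in _CALLERS:
            raise ValueError(f"peak caller '{name}' is already registered")
        _CALLERS[name] = func
        return func
    return decorator


@register("threshold")
def threshold_caller(coverage: List[int], min_depth: int = 3) -> List[Peak]:
    """Toy module: report each run of positions with depth >= min_depth."""
    peaks: List[Peak] = []
    start = None  # start of the run currently being extended, if any
    for position, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = position
        elif depth < min_depth and start is not None:
            peaks.append((start, position))
            start = None
    if start is not None:
        peaks.append((start, len(coverage)))  # run extends to the end
    return peaks


def call_peaks(coverage: List[int], method: str = "threshold") -> List[Peak]:
    """Core entry point: look up the requested module and run it."""
    if method not in _CALLERS:
        raise ValueError(
            f"unknown method '{method}'; available: {sorted(_CALLERS)}"
        )
    return _CALLERS[method](coverage)
```

For example, `call_peaks([0, 1, 3, 5, 2, 0, 4, 4])` uses the toy module to report the runs of positions with depth 3 or more. A new lab would only need to write one decorated function, with the shared input handling and output format staying in the core.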

Anyhow, I agree with your other points on both SourceForge and rating systems. Though, I still think Nils' point makes sense: If we had a single collective bioinformatics repository, it would be much easier to build resources like the ones you've described around such a facility. Rating, project evaluations, and more open feedback mechanisms could all be integrated. I suspect by simply enriching for a bioinformatics population you'd find many of the mechanisms better used.

As long as we lack a unified project location (which is encouraged by the academic environment where each lab hosts their own projects), it would be a much greater challenge to integrate all of this. However, I can imagine a bioinformatics portal fulfilling that function.

Perhaps ECO would be interested in adding a new developer-specific component to SEQanswers?

(And yes, I just said portal - the 90's are going to hunt me down for using such an outdated meme.)