  • A Message for the BioInfo Developer Community

    Dear developers,

    First of all, thanks for all the tools out there that make our biological research and data analysis possible.

    Today, I just wanted to offer a constructive critique about source code commenting.

    Please be kind and thoroughly comment the source code of your software and scripts, following the conventions and style rules suggested by your programming language. In particular, tell us what your subroutines do and what the variables are for.

    #
    //
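
As a minimal sketch of the kind of commenting requested above: the function `gc_content` and its variable names are hypothetical, chosen only for illustration, and Python is an arbitrary choice of language. The point is that the routine states what it does and what each variable holds.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence.

    sequence: DNA string; upper or lower case is accepted.
    Returns a float between 0.0 and 1.0 (0.0 for an empty sequence).
    """
    # bases: the input normalised to upper case, so 'g' and 'G' count alike.
    bases = sequence.upper()
    if not bases:
        # An empty sequence has no bases; return 0.0 rather than divide by zero.
        return 0.0
    # gc_count: number of positions holding a G or a C.
    gc_count = bases.count("G") + bases.count("C")
    return gc_count / len(bases)
```

Calling `gc_content("atgc")` counts one G and one C out of four bases.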

    I believe that this practice will benefit our community, which spans many different programming languages and levels of understanding among users.

    Thank you for your consideration,

    Best Regards,


    DeNovoG

  • #2
    "My code is well-documented" does not get grants unfortunately.


    • #3
      Originally posted by nilshomer
      "My code is well-documented" does not get grants unfortunately.
      LOL.
      sad but true.
      BTW, DeNovoG is right to ask for better comments; unfortunately, few bioinfo developers have sufficient experience with large projects, where good commenting is more or less mandatory. Also, I suspect most bosses look at the results rather than the code (especially when the boss is not a developer...).
      My experience: every time I start a project I try to comment everything, and within a few days I start skipping long comments because I'm lazy. After a couple of releases I try to do "offline commenting" and I inevitably wonder "why did I write this? What's that?". At least I try to give functions and variables descriptive names (not simply x, y and i).

      d


      • #4
        I extensively comment all my scripts because I'm pretty new at this, and if I didn't I'd forget what most of them did and have to spend a long time figuring it out again every time I went back to make modifications.


        • #5
          This rings true for me - It was difficult to start creating pBWA due to lack of commenting (in fact I am still having trouble understanding how some of the biology stuff works due to this!), however I totally get where Nils is coming from.


          • #6
            I would also add, "my code is well documented" does not fly with PhD committees either.
            The more you know, the more you know you don't know. —Aristotle


            • #7
              A great example of well documented and well written code comes from the Broad (GATK, and especially Picard). How do we incentivize other groups or graduate students to produce quality, commented code beyond simple altruism? Pressures such as "my advisor wanted it yesterday" and "there is a one-in-a-million case where the competing tool is better" work against this goal.

              An extreme requirement would be that if any software is being produced as part of a grant, the proposal must specify the code documentation system (javadoc/doxygen/etc) as well as the coding standards. We could also educate biologists (non-programmers) on the importance of good software engineering practices (beyond timeliness).


              • #8
                Open source code with multiple developers does tend to have better documentation, by necessity. On the flip side, single person projects don't have the time or the need to document everything. Thus, larger teams encourage better code both because they have more manpower to devote to documentation and because they have the need for better documentation.

                It's no surprise that Picard is well documented: It's open source, it's well funded, and used/worked on by a lot of people. [Edit: I don't actually know that it's well funded - I've just always thought it was because I know it has several developers working on it concurrently.]

                The recipe for good documentation is to fund software projects so that more people work on them, which means documentation goes from being optional to being required, and the resources exist to do it well. As long as the funds aren't there, you'll get single developer software, which doesn't require the same documentation and the developer probably doesn't have the time to devote to it anyhow.


                • #9
                  My thoughts on this are biased by my training in Agile development environments, but here's my two cents on one way to incentivize programmers to write documentation (if you'll stretch the definition a little):

                  My favorite type of documentation is an automated unit test suite, and the best way to get developers to write documentation and keep it up to date is to show them that writing automated tests for their code has benefits for them in the actual software development process. Commenting and documenting always used to end up a low priority for me because I was never sure if I'd end up keeping my code, changing it, or even throwing it away later, as my requirements or understanding of the problem changed. Plus as noted above there's not much external incentive to write them. So I added comments as an afterthought and often let them get out of date.

                  When I learned test-driven development, I found that my code got much cleaner and easier to maintain if I wrote unit tests at the same time as production code. Plus it gave me nice documentation; if I forget what a method does the best way to understand it is to read through a well-written unit test.
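
To illustrate how a unit test can double as documentation, here is a minimal sketch using Python's standard `unittest` module. The function `reverse_complement` and the test names are hypothetical examples, not taken from any tool mentioned in this thread; the sketch assumes upper-case input and treats anything other than A, C, G, T or N as an error.

```python
import unittest

# Complement of each accepted base; N (unknown base) maps to itself.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(sequence):
    """Return the reverse complement of an upper-case DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(sequence))


class ReverseComplementTest(unittest.TestCase):
    """Each test name states one documented behaviour of the function."""

    def test_sequence_is_reversed_and_complemented(self):
        self.assertEqual(reverse_complement("AACG"), "CGTT")

    def test_unknown_base_n_is_kept_as_n(self):
        self.assertEqual(reverse_complement("ANG"), "CNT")

    def test_empty_sequence_gives_empty_result(self):
        self.assertEqual(reverse_complement(""), "")

    def test_non_dna_character_raises_key_error(self):
        with self.assertRaises(KeyError):
            reverse_complement("AXG")
```

Saved as a module, the tests can be run with `python -m unittest`. A reader who has forgotten what the function does can read the four test names and expected values instead of a prose comment.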

                  Of course, some algorithms are more testable than others (it's really hard to write a unit test that documents a dynamic programming algorithm in my experience), and it's hard to write tests for some of the more script-y things we have to do in bioinformatics. Some things just aren't worth the extra effort to test too, although those are exactly the things that tend to come back to bite me later when I skip testing. Overall, though, I really find writing and working with tested code productive and satisfying.


                  • #10
                    There's the point that strong documentation and commenting throughout code doesn't tend to yield grant money or result in stronger publications, but I think it does in a tangential way. For example, making it so other people can utilize your program and code more effectively means you're more likely to have it used in their publications downstream. In that sense, it's a good idea to document well.

                    The other major benefit is it typically leads to receiving many fewer questions on usage that suck up time down the line (assuming you actually support your program, which I have to say most academic bioinformatics developers do a bang up job of in my experience).

                    Without pointing out specific examples, I can think of a number of programs that will not end up in my publications in the future because they were frankly too hard to use and not well documented enough. One in particular seemed impenetrable despite being incredibly useful in theory--I just had to give up after two days of trying with no email reply to my questions and no strong documentation because it wasn't worth my time.

                    On the flip side, some positive examples of programs I've found benefiting from strong documentation include Annovar, BEDtools, BFAST, Dindel, GATK, VCFtools (and others). These are all from academic sources (with varying levels of funding and teamwork on them) and despite being fairly complicated programs generally (okay, BEDtools/VCFtools are very straightforward, but still great) I was able to get them up and running very quickly despite a relatively weak background in programming.

                    Decent documentation probably should be a requirement when publishing bioinformatics software in an academic venue. We've all seen those programs where there's a paper, a program, and any questions about the program get referred to the paper, which isn't typically helpful. Hopefully that's been changing because of the above-mentioned advantages of strong documentation.
                    Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
                    Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
                    Projects: U87MG whole genome sequence [Website] [Paper]


                    • #11
                      Hi Michael,

                      I agree about your points when applied to "user documentation", but this particular thread was about "code documentation" and I'm not sure they're interchangeable.


                      • #12
                        Well, I am probably expanding on the original topic a bit.

                        I think the over-arching topic here is things developers probably don't need for themselves that are very useful to users. In that case, commenting code and strong documentation are both part of that.

                        The other thing we all benefit from is strong error reporting, which again I think goes in the same bin of things users would love that developers don't necessarily need (or directly benefit from) for themselves.

                        The point I'm making is that including things that may not count to your CS Ph.D committee or that may not have meaning on a grant app can still have an impact on you personally--making your code and programs easier to use for others makes it more likely they'll use it and therefore publish with it/communicate with you/et cetera, which benefits you personally.


                        • #13
                          To summarize, unit tests and code documentation are great for making code robust and for communicating better across developers. User documentation and usability allow the software to be used more easily by a wider group of people, making your software invaluable and more likely to receive software support funding. Most of this is software engineering 101.

                          Nonetheless, these are all indirect incentives, and are not built proactively into the funding or even training mechanisms; we have not talked about training individuals in software engineering (is this the PI's role and/or coursework?).

                          What I really want to see come from this discussion is how can we train and support computer scientists, bioinformaticians, and biologists so that they write well documented (user and code) usable software?

                          I will throw out a suggestion off the top of my head to get the ball rolling. How about an open source software repository, like a bio-app store, which serves as a central repository of cutting edge (not ten releases ago) bio-apps that meet (or are ranked on) certain criteria for documentation, code standards, and portability. I am sure a grant reviewer would then find it easy to look up past software and say, person Y always produces well-documented and well-used software. Similarly, a requirement of applying for a grant that includes software could be submission and release of software that meets these standards. Accountability and training are the key.


                          • #14
                            Michael: I think you're preaching to the choir - but really, there's a huge difference between documentation for developers and documentation for users. Conflating the two - or trying to discuss them both at once - isn't going to result in a productive set of action items.

                            Personally, I've spent far more time maintaining, cleaning and documenting my code than my committee or my advisor really would like. It slows down development and only shows benefits in the long term. As long as code is being developed by people who expect to work on a project for less than a year or so, you're going to have a hard time convincing them of the benefits of good coding practice. And, I think it's relatively obvious, post-docs and grad students rarely have that kind of long term vision unless the project is actively managed by their PI or an institute staff member. (I'd like to think my own code is an exception to the rule, just because I really believe strongly in good coding practices.)

                            At any rate, I really like Nils' suggestion of an open source repository for bioinformatics software. I currently use SourceForge (as a few others do), but there's no sense of developer community specific to bioinformatics in that environment. Nor is there a "bioinformatics app-store" specific for that community - thus, it would also be able to help with organizing projects and directing people to contribute to existing software, as well as providing forums much more tailored to bioinformatics needs. Even better, if it could be used to do automatic nightly builds of the software, it would force developers to use unit tests to keep from breaking the head of their trees - nightly builds are a good indication of the stability of software.

                            Edit: Just to be clear, nightly builds + nightly unit tests would be a great indication of stability, as long as visitors to the site get some stats on the number of unit tests passed, etc. I realize that nightly builds on their own would only fail when compilation errors are present, which, on its own, would be a good start.
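
As a rough sketch of the "stats on the number of unit tests passed" idea, a nightly job could run a test suite programmatically and reduce the result to a few counts for a status page. The helper `run_and_summarize` and the `DemoTests` class are hypothetical names used only for illustration; the sketch relies on Python's standard `unittest` module and, for simplicity, does not account for skipped tests.

```python
import unittest


def run_and_summarize(test_case_class):
    """Run one TestCase class and return simple pass/fail counts."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_class)
    result = unittest.TestResult()
    suite.run(result)
    # Failures are failed assertions; errors are unexpected exceptions.
    failed = len(result.failures) + len(result.errors)
    return {
        "run": result.testsRun,
        "passed": result.testsRun - failed,
        "failed": failed,
    }


class DemoTests(unittest.TestCase):
    """Stand-in for a project's real test suite."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_string_upper(self):
        self.assertEqual("acgt".upper(), "ACGT")


summary = run_and_summarize(DemoTests)
print(summary)
```

A repository site could store one such summary per nightly build and show the trend alongside the build status.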
                            Last edited by apfejes; 04-22-2011, 04:20 PM. Reason: clarity


                            • #15
                              Originally posted by apfejes
                              Michael: I think you're preaching to the choir - but really, there's a huge difference between documentation for developers and documentation for users. Conflating the two - or trying to discuss them both at once - isn't going to result in a productive set of action items.
                              I haven't "conflated" anything, though. It's your opinion that the two subjects--code comments and user documentation--can't be discussed simultaneously. I disagree--I think both are important components that contribute to the quality of open-source software and, frankly, to how strong of a software engineer/bioinformaticist/team you are. So kudos to you for "over"-commenting and documenting your own code. Were I on your committee, that would count for something in my book.

                              And I think our topic here is really that the CS/computational biology/bioinformatics academic community should place greater incentive and emphasis on code commenting, unit testing, user documentation, and interaction than it currently does.

                              As Nils said, software engineering 101--the software needs to be accessible to both users and developers, and I do think it should be judged as such.

                              Originally posted by nilshomer
                              I will throw out a suggestion off the top of my head to get the ball rolling. How about an open source software repository, like a bio-app store, which serves as a central repository of cutting edge (not ten releases ago) bio-apps that meet (or are ranked on) certain criteria for documentation, code standards, and portability. I am sure a grant reviewer would then find it easy to look up past software and say, person Y always produces well-documented and well-used software. Similarly, a requirement of applying for a grant that includes software could be submission and release of software that meets these standards. Accountability and training are the key.
                              Certainly a central repository for such software would be fantastic. I think it's a great idea, but who would determine the criteria and judge the software and how?

                              I'm thinking somewhere in between the like button and the help mailing list.

                              Maybe hand-in-hand with that is a more general and open feedback mechanism. I recognize sites like SourceForge/GoogleCode/GitHub/etc. give you the opportunity to give a rating and feedback, but it's under-utilized and rather primitive from what I've seen. It might be nice to have a breakdown of different features--usability, documentation, error reporting, et cetera--that could be rated (perhaps by identified users/developers rather than random faceless anons). A review system might be nice.