  • #16
    I guess it would be helpful to know a little bit more about the projects you are planning?

    Comment


    • #17
      Sure. We do a variety of RNA and DNA studies, so RNA-Seq, and whole exome DNA for now, contemplating some ChIP-Seq and Methylation in the future.

      Comment


      • #18
        Very interesting discussion; I just want to add my couple of cents. I ended up in the Linux world (in a good sense) because I hate commercial software. I came into computing spoiled by the Mac GUI, back when Windows was nowhere close to it, and I was surprised that people around the world did not appreciate Macs and preferred the handicapped Windows 3. The reason, I think, is that most smart and curious people do not like being enclosed in preset conditions, and that is what the Mac GUI was, convenience aside. As processors improved, desktops could handle more and more GUI overhead, so Windows 95 and its successors showed up, following the Mac path in general while becoming more and more bloated. I guess that was one of the reasons Linux gained popularity: the same smart and curious people did not want to deal with the preset conditions laid down by Microsoft's programmers. In Linux, you can use any of several GUIs if you prefer, still with much less overhead, or just use the CLI and spend your computing resources in the most efficient way.
        The same goes for commercial software. First, the drive to increase market share pushes commercial software to be broadly applicable and loaded with bells and whistles, which inevitably makes it cumbersome, expensive, and difficult to learn to use efficiently, even though you have paid for it. How many of us know all the features of Photoshop or MS Office? And those are just a few hundred bucks. Second, you pay for a badly overpriced product (most resources in biomedical science are overpriced, and government funding contributes a lot to this unspoken inflation) only to discover how irritating the solutions forced on you by the programmers are. In the vast majority of cases they do not work in the field and have no idea what the end user needs, yet they are very proud of all these "beautiful" solutions and do not appreciate advice or critique. The bigger the company, the more stupid and cumbersome the program. Take the story of the VectorNTI suite, which was a very nicely integrated set of tools for a variety of bioinformatics tasks. It was very popular with users, which brought about its demise: it was purchased by the monstrous Invitrogen, which managed to turn it into a sluggish piece of s###. The problem is that I am already stuck with it, because the database cannot be accessed by anything but VectorNTI, and it has grown quite large. Luckily, Geneious can import it, but only by converting it into its own format. So now I am stuck with Geneious, which I do not like either, as it is again far too cumbersome and takes a long time to learn.
        So this is how I ended up learning Linux and using free programs. I am not a GUI or CLI fanatic; I use whichever is convenient. Why should I type a very long full path for a file when I can just open its containing folder, copy the path with a mouse click, and paste it into a terminal window or a script? I am not a programmer, but I can learn how to find and adapt efficient pieces of code to my specific needs, thanks to the experts out there (here, in particular) who are always ready to offer a helping hand. I think the days when someone could get away with being just a biologist or chemist are far in the past, unless one wants to work on the philosopher's stone like an alchemist in a cave. Using the vast amount of knowledge and information out there requires learning how to search for, find, and manipulate it, and scripting or high-level languages like Perl are a great help, whether the effort is Windows-, Mac-, or Linux-based. After all, even my professor had to learn how to use email in his late 60s.

        Comment


        • #19
          With all the working formats and processes already established for bioinformatics on Unix, it's irrational for any commercial suite to introduce proprietary file types and modules. For example, we know BWA is highly cited and samtools is ubiquitous, and any new software, GUI or CLI, should adhere to those standards.
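
          To make the point concrete, here is a minimal sketch of the kind of standard pipeline any new tool, GUI or CLI, should be able to sit alongside: FASTQ in, coordinate-sorted and indexed BAM out, using only BWA and samtools so that every downstream program speaking SAM/BAM can pick up the result. File names and thread counts are placeholders, not anything from this thread.

          Code:
          import subprocess

          REF = "reference.fa"                      # assumes 'bwa index reference.fa' was already run
          R1, R2 = "sample_R1.fastq.gz", "sample_R2.fastq.gz"
          OUT_BAM = "sample.sorted.bam"

          # bwa mem writes SAM to stdout; samtools sort reads it from stdin ("-")
          # and writes a coordinate-sorted BAM.
          bwa = subprocess.Popen(["bwa", "mem", "-t", "4", REF, R1, R2],
                                 stdout=subprocess.PIPE)
          subprocess.run(["samtools", "sort", "-@", "4", "-o", OUT_BAM, "-"],
                         stdin=bwa.stdout, check=True)
          if bwa.wait() != 0:
              raise RuntimeError("bwa mem failed")

          # Index the BAM so any SAM/BAM-aware tool can use it directly.
          subprocess.run(["samtools", "index", OUT_BAM], check=True)

          Nothing here is exotic; the value is precisely that both the inputs and the outputs are community standards rather than a vendor's private format.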

          This is where I believe we need to create inclusive tools, preferably commercial, not only because commercial products provide an incentive for polished, stable software but also because they create a healthy ecosystem for computational biologists. For software to be truly inclusive, it must be welcoming to users of the previous generation (point-and-click) while being a gateway to advanced operations (the command line).

          Anything you learn from purchased bioinformatics software should help you on your way towards becoming an advanced user; otherwise you are paying someone to help you become dependent.
          Petri Dish Talk

          Comment


          • #20
            I would like to add my perspective on "freeware" - it's absolutely not free! Besides the cost of hiring bioinformaticians to stitch it all together, it is paid for by the taxpayers - usually in the United States. Take Galaxy, for example - they don't charge the worldwide users of Galaxy to run their organization; they charge the American taxpayers, as the Galaxy project is government funded. At a time when it should be clear that our government is near bankruptcy, why don't we buy software from a US company that pays taxes to help pay down the US debt, instead of increasing our debt and funding the world?

            Comment


            • #21
              Following this logic, we need to shut down all NCBI databases for good - all of that is also funded by taxpayers - and let everything be run by private businesses, so they can happily charge a 300% markup for a nice profit margin. If one can buy exactly the same Tupperware from Walmart for 10 bucks that Pharmacia offers for 150, it is simply sick.

              Comment


              • #22
                Not quite getting the tupperware analogy, but I take it from your logic that Ford charges 300% of a fair price for a car, and the solution is for the government to create a new tax to make cars and give these cars to US taxpayers and non-US taxpayers alike. To be clear, you are 100% anti-business and believe the government should produce all products?

                Comment


                • #23
                  Originally posted by rfilbert View Post
                  I would like to add my perspective on "freeware" - it's absolutely not free! Besides the cost of hiring bioinformaticians to stitch it all together, it is paid for by the taxpayers - usually in the United States. Take Galaxy, for example - they don't charge the worldwide users of Galaxy to run their organization; they charge the American taxpayers, as the Galaxy project is government funded. At a time when it should be clear that our government is near bankruptcy, why don't we buy software from a US company that pays taxes to help pay down the US debt, instead of increasing our debt and funding the world?
                  As if the U.S. were the only country in the world funding research, and none of the big U.S. companies benefited from scientific results produced in other countries... (please note the irony).

                  If you want to complain about how the government spends taxpayers' money, maybe you should switch to a different kind of forum, as this is completely off topic here.

                  Now back to topic:

                  dnart, I would suggest you download or ask for a trial licence of one or more of the available toolboxes. That way, you can see for yourself whether you are confident conducting a bioinformatics analysis on your own.
                  What I have experienced so far is that biologists want to simplify the informatics analysis as much as possible (which is understandable), but they often fail to realize that this part can be just as demanding as the lab work (as previous posts have explained).

                  Also please note that setting up a local computational solution will cost quite a lot of money either way, as NGS analysis requires substantial computational resources. For instance, aligning a batch of 32 RNA-seq samples took me 4 weeks on a decent workstation (8 cores, 24 GB RAM), so we upgraded our capacity. However, the next batch will be ~60 samples, with more to come in the future, so we went for grid computing instead to get around this bottleneck. This also required help from our local IT support team, but it reduced the waiting time to only 2 weeks.
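
                  For anyone weighing a workstation against a grid, a rough sketch of the per-sample parallelism involved is below. The aligner name, file names, and thread counts are placeholders invented for illustration; on a real cluster each command would be handed to the scheduler (e.g. qsub or sbatch) rather than run locally.

                  Code:
                  import subprocess
                  from concurrent.futures import ThreadPoolExecutor

                  SAMPLES = [f"sample_{i:02d}" for i in range(1, 33)]   # e.g. a 32-sample RNA-seq batch
                  MAX_CONCURRENT = 2                                    # 8 cores / 4 threads per job

                  def align(sample: str) -> str:
                      # Placeholder command; substitute your actual RNA-seq aligner and options.
                      cmd = ["my_aligner", "--threads", "4",
                             "--reads", f"{sample}.fastq.gz",
                             "--out", f"{sample}.bam"]
                      subprocess.run(cmd, check=True)
                      return sample

                  # Samples are independent, so they can run concurrently; the batch finishes
                  # roughly MAX_CONCURRENT times faster, until disk I/O or memory becomes the limit.
                  with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
                      for done in pool.map(align, SAMPLES):
                          print(f"finished {done}")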

                  Best regards
                  Last edited by rboettcher; 01-14-2013, 06:02 AM.

                  Comment


                  • #24
                    Originally posted by rboettcher View Post
                    Also please note that setting up a local computational solution will cost quite a lot of money either way, as NGS analysis requires substantial computational resources. For instance, aligning a batch of 32 RNA-seq samples took me 4 weeks on a decent workstation (8 cores, 24 GB RAM), so we upgraded our capacity. However, the next batch will be ~60 samples, with more to come in the future, so we went for grid computing instead to get around this bottleneck. This also required help from our local IT support team, but it reduced the waiting time to only 2 weeks.
                    I agree with you on going to the cloud; otherwise you'll end up with many separate islands that IT has to look after.

                    Just curious: how many nodes did you get from the grid, and did you manage to get a linear speedup?

                    Best,

                    dong

                    Comment


                    • #25
                      We've decided to evaluate a commercial product, Partek Flow, and a freeware solution, Galaxy. We narrowed it down to those two because we have already had a good experience with Partek's desktop microarray solution, and we don't want to buy and maintain the hardware, Linux OS, software, etc.; these two seemed like the most viable options. If people are interested, I'll post a review of our comparison of the two solutions.

                      Comment


                      • #26
                        If people are interested, I'll post a review of our comparison of the two solutions.
                        I'd be interested to hear what you think about those tools. You might also want to check out

                        BaseSpace (Illumina)

                        I've never used it and it's fairly new, but it might also be of interest.

                        Justin

                        Comment


                        • #27
                          May be a little late, but my two cents.

                          I come from a lab like yours, with people who had basically no experience with NGS analysis or the Linux environment. I was hired specifically for this, but my background is microbiology, with very limited Unix/NGS analysis experience prior to this position. I took the position with the understanding that I would do most of the 'learning' and educate the lab while performing the analysis, so to speak. In a way I guess I was bridging the biology and the analysis sides of things, as I understood the struggle that biologists often feel when trying to get into this area (no offense to anyone).

                          And I can say, after two years on the job, that it takes a lot of time not only to learn how to use the software (commercial and open source, GUI and command-line based; in fact that is the easy part), but to understand the parameters and algorithms behind it, how to properly interpret the output, and, very importantly, how to format datasets. Quite frankly, commercial packages and even Galaxy are not flexible enough to handle a lot of the text-based files and give you what you want, and even a half-decent knowledge of the command-line environment helps tremendously. Sequence data formats vary from platform to platform, and manufacturers in fact change their output formats time and time again, so it is still a very fluid space with few standards. Infrastructure to store and access the data is also important, and paid-for support is often limited and slow.
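
                          As a small, hypothetical example of the 'formatting datasets' work just mentioned (column names invented for illustration): suppose a facility delivers a tab-delimited summary whose column order changes between runs, while the downstream tool expects exactly gene, sample, and count. A few lines of scripting cover what a point-and-click package often cannot.

                          Code:
                          import csv
                          import sys

                          wanted = ["gene_id", "sample", "read_count"]   # hypothetical column names

                          reader = csv.DictReader(sys.stdin, delimiter="\t")
                          writer = csv.writer(sys.stdout, delimiter="\t")
                          writer.writerow(wanted)
                          for row in reader:
                              # DictReader keys fields by header name, so this keeps working even if
                              # the facility shuffles or adds columns in the next delivery.
                              writer.writerow([row[name] for name in wanted])

                          Run it as, say, "python reformat.py < raw.tsv > clean.tsv"; the file names are again just placeholders.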

                          So even if you do decide on some commercial/freeware hybrid approach, there is still a significant amount of time you need to invest to understand what goes on under the hood, both in the software and in the results. I often get asked things like 'why don't I just map this data to that reference, and presto, results! It's easy...', and commercial packages do offer this. But it often is not that simple. For example, this thread: http://seqanswers.com/forums/showthread.php?t=24270 discusses how bowtie2 (an aligner) may not report multi-aligning reads well by default, so expression results or small RNA mapping may be underestimated. However, if you are not familiar with the command-line parameters, or with what goes on under the hood, it can be hard to tell whether your results are optimal.
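
                          To illustrate, here is one rough diagnostic I might sketch (assuming bowtie2's AS and XS score tags and a made-up BAM file name): count how many primary alignments had an equally good alternative hit. It is not a fix, only a hint of how much might be lost; actually recovering multi-mappers means re-running with bowtie2's -k or -a reporting options, which is exactly the kind of parameter you need to know exists.

                          Code:
                          import pysam

                          total = ambiguous = 0
                          with pysam.AlignmentFile("aligned.bam", "rb") as bam:   # hypothetical file name
                              for read in bam:
                                  if read.is_unmapped or read.is_secondary or read.is_supplementary:
                                      continue
                                  total += 1
                                  # bowtie2 sets XS (best alternative score) only when another alignment
                                  # was found; if it ties AS (the reported score), the placement is ambiguous.
                                  if read.has_tag("XS") and read.get_tag("XS") >= read.get_tag("AS"):
                                      ambiguous += 1

                          print(f"{ambiguous}/{total} primary reads had an equally good second hit")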

                          So a dedicated person (not necessarily a bioinformatician per se), or at least someone who is willing to invest the time to understand the software and the data in the Linux environment, is best in my opinion. At least that way you know what questions to ask.

                          Comment


                          • #28
                            I'm curious as to what people think about outsourcing their NGS analysis needs. Philosophically, this would be similar to using commercial software, but one step further. The advantage would be that you don't need to invest the time to become proficient in using the software (or the infrastructure for maintaining it), but the disadvantage is that you'd be one more step removed from your own data.

                            Or is outsourcing essentially just hiring your own bioinformatician (but only on an 'as needed' basis)?

                            Comment


                            • #29
                              Originally posted by rfilbert View Post
                              Not quite getting the tupperware analogy, but I take it from your logic that Ford charges 300% of a fair price for a car, and the solution is for the government to create a new tax to make cars and give these cars to US taxpayers and non-US taxpayers alike. To be clear, you are 100% anti-business and believe the government should produce all products?
                              You are either a troll or unfamiliar with how scientific software is developed. I will assume the latter.

                              Scientific software is written largely by the underpaid masses in academia. Many of these developers are actually international graduate students at US institutions who pay their own way. So in addition to the $21 billion they bring into our economy, US companies can build directly on their research. This is a good thing.

                              Most commercial NGS software consists largely of GUI wrappers of existing algorithms. (I will exclude CLC and Novocraft from this, perhaps a few others, as they have put considerable effort into their own algorithms. Perhaps it is not a coincidence these are foreign companies.) Other companies specialize in compiling and cleaning data from public resources.

                              Commercial software companies are largely ill equipped to do the research necessary to develop programming APIs and platforms such as Bioconductor that are the cornerstone of how bioinformatics is actually conducted.

                              Developing front ends for biologists is a valuable service at which commercial providers excel, but this should not occur until the kinks and best practices have been worked out in the open source community over several years.

                              Most NGS analyses are simply not yet commoditized enough to be shoehorned into some point and click desktop app. Those commercial packages that are available often fall short in terms of being extensible, reproducible, or scalable.
                              Last edited by Zigster; 01-15-2013, 12:50 PM.
                              --
                              Jeremy Leipzig
                              Bioinformatics Programmer
                              --
                              My blog
                              Twitter

                              Comment


                              • #30
                                Originally posted by scbaker View Post
                                I'm curious as to what people think about outsourcing their NGS analysis needs. Philosophically, this would be similar to using commercial software, but one step further. The advantage would be that you don't need to invest the time to become proficient in using the software (or the infrastructure for maintaining it), but the disadvantage is that you'd be one more step removed from your own data.

                                Or is outsourcing essentially just hiring your own bioinformatician (but only on an 'as needed' basis)?
                                We did that early on, as no one in our group had any experience with bioinformatics. I have to say that outsourcing (at least to a company) has some serious issues. In my experience, most companies provide shoddy statistics using highly questionable procedures. Basically, unless you're using long-established protocols (e.g., differential expression using microarray), I wouldn't trust any company to do the work.

                                Collaborating with another group is probably the best way to go.

                                Frankly, familiarising yourself with the tools of bioinformatics (including R and Linux, I'm obviously not talking about any programming here) is not terribly difficult. If someone can't do that, they should probably pick a different field.

                                Comment
