SEQanswers

Old 01-13-2013, 06:11 PM   #21
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205
Default

Following this logic we need to shut down all NCBI databases for good - all this is also funded by taxpayers - and let everything be run by private business, so they can happily charge 300% to get a nice profit margin. If one can buy exactly the same tupperware from Walmart for 10 bucks when it is offered by Pharmacia for 150, it is simply sick.
Old 01-13-2013, 10:44 PM   #22
rfilbert
Member
 
Location: San Diego, CA

Join Date: Dec 2012
Posts: 43
Default

Not quite getting the tupperware analogy, but I take it from your logic that Ford charges 300% of a fair price for a car, and the solution is for the government to create a new tax to make cars and give these cars to US taxpayers and non-US taxpayers alike. To be clear, you are 100% anti-business and believe the government should produce all products?
Old 01-13-2013, 11:49 PM   #23
rboettcher
Member
 
Location: Berlin

Join Date: Oct 2010
Posts: 71
Default

Quote:
Originally Posted by rfilbert View Post
I would like to add my perspective on "freeware" - it's absolutely not free! Besides the cost of hiring bioinformaticians to stitch it all together, it is paid for by the taxpayers - usually in the United States. Take Galaxy for example - they don't charge the worldwide users of Galaxy to run their organization - they charge the American taxpayers as the Galaxy project is government funded. At a time when it should be clear that our government is near bankruptcy, why don't we buy software from a US company who pays taxes to help pay down the US debt instead of increasing our debt and funding the world?
Since the U.S. is the only country in the world funding research and none of the big U.S. companies benefit from scientific results produced in other countries... (please note the irony).

If you want to clamor about how the government spends taxpayers' money, maybe you want to switch to a different kind of forum, as this is completely off topic here.

Now back to topic:

dnart, I would suggest you download/ask for a trial licence of one or more of the toolboxes available. This way, you can see for yourself whether you are confident in conducting a bioinformatics analysis on your own.
What I have experienced so far is that biologists want to simplify the informatics analysis as much as possible (which is understandable), but they normally fail to realize that this part can be just as demanding as lab work (as previous posts have explained).

Also please note that setting up a local computational solution will either way cost quite a lot of money, as NGS analysis requires quite a lot of computational resources. For instance, aligning a batch of 32 RNAseq samples took me 4 weeks on a decent workstation (8 cores, 24GB RAM), so we upgraded our capacities. However, the next batch will be ~60 samples with more to come in the future, thus we went for grid computing instead to circumvent this bottleneck. This also required help from our local IT support team but it reduced the waiting time to 2 weeks only.
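
For a rough sense of scale, the figures above can be turned into a back-of-the-envelope estimate. This is only a sketch in Python: it assumes every sample costs about the same time and that throughput stays constant, neither of which is guaranteed in practice, and the function name is made up for illustration.

```python
def weeks_for_batch(n_samples, ref_samples=32, ref_weeks=4.0):
    """Estimate wall-clock weeks for a batch by scaling linearly
    from a reference run (assumes a constant per-sample cost)."""
    weeks_per_sample = ref_weeks / ref_samples
    return n_samples * weeks_per_sample


# 32 samples took 4 weeks on the workstation, so a ~60-sample batch
# on the same machine would be expected to take about:
print(weeks_for_batch(60))  # 7.5
```

Under those assumptions the next batch would tie up the same workstation for roughly seven and a half weeks, which is the bottleneck described above.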

Best regards

Last edited by rboettcher; 01-14-2013 at 05:02 AM.
Old 01-14-2013, 09:15 AM   #24
xied75
Senior Member
 
Location: Oxford

Join Date: Feb 2012
Posts: 129
Default

Quote:
Originally Posted by rboettcher View Post
Also please note that setting up a local computational solution will either way cost quite a lot of money, as NGS analysis requires quite a lot of computational resources. For instance, aligning a batch of 32 RNAseq samples took me 4 weeks on a decent workstation (8 cores, 24GB RAM), so we upgraded our capacities. However, the next batch will be ~60 samples with more to come in the future, thus we went for grid computing instead to circumvent this bottleneck. This also required help from our local IT support team but it reduced the waiting time to 2 weeks only.
Agree with you on going to the cloud; otherwise you'll end up with many islands that need looking after by IT.

Just curious: how many nodes did you get from the grid, and did you manage to get a linear speedup?
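
For anyone unsure what "linear speedup" means here, a minimal Python sketch follows. Speedup is the single-machine time divided by the parallel time, and efficiency is speedup divided by the number of nodes; an efficiency of 1.0 is perfectly linear scaling. The numbers in the example are invented purely for illustration, since the thread does not say how many nodes were used.

```python
def speedup(serial_time, parallel_time):
    """How many times faster the parallel run is than the serial run."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, n_nodes):
    """Speedup per node; 1.0 would be a perfectly linear speedup."""
    return speedup(serial_time, parallel_time) / n_nodes


# Hypothetical figures: 8 weeks on one machine, 2 weeks on 8 nodes.
print(speedup(8.0, 2.0))        # 4.0
print(efficiency(8.0, 2.0, 8))  # 0.5
```

In that made-up case the job runs four times faster on eight nodes, so only half of the added capacity shows up as reduced waiting time.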

Best,

dong
Old 01-14-2013, 04:07 PM   #25
dnart
Junior Member
 
Location: London

Join Date: Jan 2013
Posts: 9
Default

We've decided to evaluate a commercial product, Partek Flow, and a freeware solution, Galaxy. We narrowed it down to those two because we have already had a good experience with Partek's desktop solution for microarray analysis, and because we don't want to buy and maintain the hardware, Linux OS, software, etc.; these two seemed like the most viable options. If people are interested, I'll post a review of our comparison of the two solutions.
Old 01-14-2013, 04:59 PM   #26
BAMseek
Senior Member
 
Location: St. Louis, MO, USA

Join Date: Apr 2011
Posts: 124
Default

Quote:
If people are interested, I'll post a review of our comparison of the two solutions.
I'd be interested to hear what you think about those tools. You might also want to check out

BaseSpace (Illumina)

I've never used it and it's fairly new but might also be of interest.

Justin
Old 01-14-2013, 07:51 PM   #27
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149
Default

May be a little late, but my two cents.

I come from a lab like yours, with people who have basically no experience with NGS analysis or with using a Linux environment. I was hired specifically for this, but my background is microbiology, with very limited Unix/NGS analysis experience prior to this position. I got this position with the understanding that I would do most of the 'learning' and educate the lab while performing the analysis, so to speak. In a way I guess I was bridging the biology and the analysis side of things, as I understood the strife that biologists often feel when trying to get into this area (no offense to anyone).

And I can say that after two years on the job, it takes a lot of time not only to learn how to use the software (commercial and open source, GUI and command-line based; in fact this is the easy part), but also to understand the parameters and algorithms behind it, how to properly interpret the output, and, very importantly, how to format datasets. Quite frankly, commercial packages and even Galaxy are not flexible enough to handle a lot of the text-based files and give you what you want, and even a half-decent knowledge of the command-line environment helps tremendously. Sequence data formats vary from sequencing platform to platform, and manufacturers in fact change their output formats time and time again, so it is still a very fluid space with few standards. Infrastructure to store and access the data is also important, and paid-for support is often limited and slow.
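
To give a flavour of the kind of reformatting meant here, below is a small Python sketch that converts FASTQ records to FASTA. It is an illustration only and assumes the simplest case: exactly four lines per record, with no wrapped sequence lines. The function name is made up for this example.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Convert four-line FASTQ records to FASTA; return the record count."""
    n_records = 0
    while True:
        header = fastq_handle.readline()
        if not header:            # end of file
            break
        if not header.strip():    # tolerate blank lines between records
            continue
        if not header.startswith("@"):
            raise ValueError("expected a FASTQ header line, got: " + header.strip())
        sequence = fastq_handle.readline().strip()
        fastq_handle.readline()   # '+' separator line, ignored
        fastq_handle.readline()   # quality string, ignored
        fasta_handle.write(">" + header[1:].strip() + "\n" + sequence + "\n")
        n_records += 1
    return n_records


# Tiny in-memory example instead of real sequencer output.
demo_fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n")
demo_fasta = io.StringIO()
print(fastq_to_fasta(demo_fastq, demo_fasta))  # 2
print(demo_fasta.getvalue(), end="")
```

Real files from different platforms tend to break exactly these simplifying assumptions, which is why some understanding of the formats pays off.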

So even if you do decide on some commercial/freeware hybrid approach, there is still a significant amount of time you need to invest to understand what goes on behind the wheel, both of the software and of the results. I often get asked things like 'why don't I just map this data to that reference, and presto, results! It's easy...', and commercial packages do offer this. But it often is not that simple. For example, see this thread: http://seqanswers.com/forums/showthread.php?t=24270
It says bowtie2 (an aligner) may not handle reporting multi-aligning reads well, so expression results or small RNA mapping may be underestimated. However, if you are not familiar with the command-line parameters, or don't know what goes on 'behind the wheel', it can be hard to tell whether your results are optimal.
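
One way to check this for yourself is to count how many reads were reported at more than one location in the aligner's SAM output. The Python sketch below is a minimal illustration, not a validated tool. It assumes single-end reads and an aligner run in a mode that reports several alignments per read (for bowtie2, the `-k` option); with default settings only one alignment per read is written, so the count would always be zero.

```python
from collections import Counter


def count_multimappers(sam_lines):
    """Return (mapped_reads, reads_with_more_than_one_alignment)."""
    hits = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:                    # read is unmapped
            continue
        if flag & 0x800:                  # supplementary (chimeric) record
            continue
        hits[name] += 1
    multi = sum(1 for n in hits.values() if n > 1)
    return len(hits), multi


# Hand-written toy records: r1 aligns twice, r2 once, r3 is unmapped.
demo_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t200\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t300\t42\t4M\t*\t0\t0\tTTGA\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",
]
print(count_multimappers(demo_sam))  # (2, 1)
```

If a large share of mapped reads turn out to be multi-mappers, that is a hint to look at how the downstream expression or small RNA counts treat them.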

So a dedicated person (doesn't have to be a bioinformatician per se), or at least someone who is willing to invest the time to understand the software and data in the Linux environment, is best in my opinion. At least this way you know what questions to ask.
Old 01-15-2013, 10:45 AM   #28
scbaker
Shawn Baker
 
Location: San Diego

Join Date: Aug 2008
Posts: 84
Default

I'm curious as to what people think about outsourcing their NGS analysis needs. Philosophically, this would be similar to using commercial software, but one step further. The advantage would be that you don't need to invest the time to become proficient in using the software (or the infrastructure for maintaining it), but the disadvantage is that you'd be one more step removed from your own data.

Or is outsourcing essentially just hiring your own bioinformatician (but only on an 'as needed' basis)?
Old 01-15-2013, 11:45 AM   #29
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116
Default

Quote:
Originally Posted by rfilbert View Post
Not quite getting the tupperware analogy, but I take it from your logic that Ford charges 300% of a fair price for a car, and the solution is for the government to create a new tax to make cars and give these cars to US taxpayers and non-US taxpayers alike. To be clear, you are 100% anti-business and believe the government should produce all products?
You are either a troll or unfamiliar with how scientific software is developed. I will assume the latter.

Scientific software is written largely by the underpaid masses in academia. Many of these developers are actually international graduate students at US institutions who pay their own way. So in addition to the $21 billion they bring into our economy, US companies can build directly on their research. This is a good thing.

Most commercial NGS software consists largely of GUI wrappers of existing algorithms. (I will exclude CLC and Novocraft from this, perhaps a few others, as they have put considerable effort into their own algorithms. Perhaps it is not a coincidence these are foreign companies.) Other companies specialize in compiling and cleaning data from public resources.

Commercial software companies are largely ill equipped to do the research necessary to develop programming APIs and platforms such as Bioconductor that are the cornerstone of how bioinformatics is actually conducted.

Developing front ends for biologists is a valuable service at which commercial providers excel, but this should not occur until the kinks and best practices have been worked out in the open source community over several years.

Most NGS analyses are simply not yet commoditized enough to be shoehorned into some point and click desktop app. Those commercial packages that are available often fall short in terms of being extensible, reproducible, or scalable.
__________________
--
Jeremy Leipzig
Bioinformatics Programmer
--
My blog
Twitter

Last edited by Zigster; 01-15-2013 at 11:50 AM.
Old 01-16-2013, 12:56 AM   #30
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Quote:
Originally Posted by scbaker View Post
I'm curious as to what people think about outsourcing their NGS analysis needs. Philosophically, this would be similar to using commercial software, but one step further. The advantage would be that you don't need to invest the time to become proficient in using the software (or the infrastructure for maintaining it), but the disadvantage is that you'd be one more step removed from your own data.

Or is outsourcing essentially just hiring your own bioinformatician (but only on an 'as needed' basis)?
We did that early on, as no one in our group had any experience with bioinformatics. I have to say that outsourcing (at least to a company) has some serious issues. In my experience, most companies provide shoddy statistics using highly questionable procedures. Basically, unless you're using long-established protocols (e.g., differential expression using microarray), I wouldn't trust any company to do the work.

Collaborating with another group is probably the best way to go.

Frankly, familiarising yourself with the tools of bioinformatics (including R and Linux, I'm obviously not talking about any programming here) is not terribly difficult. If someone can't do that, they should probably pick a different field.
Old 01-16-2013, 01:19 AM   #31
priesgo
Member
 
Location: Spain

Join Date: Aug 2012
Posts: 22
Default

In my honest opinion, if you are doing biomedical research, you don't have bioinformaticians around, and you are not planning to create a whole department, it would be better to go for commercial software.
I am a bioinformatician, and processing data from, let's say, RNAseq, DNAseq and ChIPseq would need three different analysis pipelines. Even taking into account that you will be using existing open-source tools, it takes a loooooot of work and time to get everything ready and working efficiently.

About learning the command line: everybody can learn the command line, but after learning it you will also need the time to do the work. It is not a matter of a couple of minutes, even for a command-line ninja. I agree anyway that it is not bad practice to get a bit into the command line.

Here is an example of commercial software doing the work for RNAseq and ChIPseq: https://www.integromics.com/products/genomics/ngs/.
No command line, and publication-ready figures. But this suits one particular case; you will have to find the one that fits your precise case.

(Don't blame me for the ad; yes, I work for this enterprise, but I think it is really a constructive example for this discussion)
Old 01-16-2013, 06:07 AM   #32
rfilbert
Member
 
Location: San Diego, CA

Join Date: Dec 2012
Posts: 43
Default

It was inevitable that the commercial software vendors would recommend their own tools on this thread. If I want to go commercial, why do I want to pay Integromics and also pay Spotfire for this? Spotfire has never been able to deal with data of NGS size and the fact that Integromics is adding an additional layer of expense is extremely unattractive.
Old 01-16-2013, 07:05 AM   #33
priesgo
Member
 
Location: Spain

Join Date: Aug 2012
Posts: 22
Default

It was not my intention to turn this into a discussion about Integromics or Spotfire software; it was just an example to justify the use of commercial software for a biomedical researcher with no bioinformatics expertise. And obviously this is the case that I know best.

Regarding your criticism, I will just say that we use Spotfire for results visualization, which is OK. I don't know whether Spotfire deals with NGS data or not, but we do. We are not loading a 20GB BAM file into Spotfire...

Coming back to the main topic I am still of the opinion that for a researcher with no bioinformatics expertise at hand, it will be cheaper, considering that time is money, to go for the commercial software.
Old 01-19-2013, 03:27 AM   #34
Nomijill
Member
 
Location: Southwest Florida

Join Date: Sep 2009
Posts: 24
Default As a taxpayer

I work for a commercial company, and our software is compared to Galaxy all the time, and we are glad that Galaxy and other publicly available tools exist. It drives us and other commercial companies to be better. Therefore, even the users of commercial software benefit from the government spending. Commercial products have to be a lot better than the free products in order to compete. Otherwise, why would someone pay for something they can get for free?
Old 01-31-2013, 04:05 AM   #35
MenzZana
Junior Member
 
Location: Stockholm

Join Date: Jan 2013
Posts: 6
Default

Galaxy is a fine piece of software but it is lacking in certain aspects when it comes to analyzing NGS data.

How about an alternative approach?
Nowadays there are a lot of pipeline/workflow programs that cover NGS analysis.
The interesting concept is that...
1. A biologist produces the data
2. A bioinformatician (or some other knowledgeable scientist) creates a pipeline and shares it
3. The biologist just uses the shared pipeline for their analysis
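
The idea behind these three steps can be sketched in a few lines of Python. This is a toy illustration of the concept only; it does not use the API of any of the workflow programs mentioned in this post, and the step functions are hypothetical stand-ins for real tools.

```python
def run_pipeline(steps, data):
    """Apply each step function in order, feeding each result to the next."""
    for step in steps:
        data = step(data)
    return data


# Step 2: someone knowledgeable defines the steps once and shares them.
def drop_short_reads(reads, min_len=4):
    return [r for r in reads if len(r) >= min_len]


def uppercase(reads):
    return [r.upper() for r in reads]


SHARED_PIPELINE = [drop_short_reads, uppercase]

# Steps 1 and 3: the biologist supplies data and just runs the shared pipeline.
my_reads = ["acgtac", "ac", "ttgaca"]
print(run_pipeline(SHARED_PIPELINE, my_reads))  # ['ACGTAC', 'TTGACA']
```

The point is that the person running the analysis never has to touch the step definitions, only the input.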

At the moment there are also people who share their pipelines in a variety of software.
Software in this area comes from both the commercial side and the freeware side.
Suggestions...

Taverna http://www.taverna.org.uk Freeware
KNIME http://www.knime.org/ mostly freeware (HPC version costs)
Pipeline Pilot http://accelrys.com/products/pipeline-pilot/ commercial

I would suggest a closer look at KNIME, which is free for your desktop to play around with and has some NGS tools:
http://tech.knime.org/community/next...tionsequencing

Taverna seems a bit immature as a product compared to KNIME, and Pipeline Pilot costs a lot

Or does anyone have other suggestions?

Last edited by MenzZana; 01-31-2013 at 10:40 AM.
Old 02-06-2013, 12:58 AM   #36
MenzZana
Junior Member
 
Location: Stockholm

Join Date: Jan 2013
Posts: 6
Default

Along the same line of inquiry, there is a commercial product which seems promising,
namely DNAnexus.

https://platform.dnanexus.com/?utm_s...lassic_LeadsEA

That could perhaps answer your questions.
Perhaps someone around here has some prior experience with it.
Old 02-06-2013, 06:59 PM   #37
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Quote:
Originally Posted by dnart View Post
I don't think you are getting it. I am not going to learn command line programming, instead I am going to focus on learning more biology. The question was whether it is more cost effective to hire a bioinformaticist (~$75k/year) or buy commercial software (~5k/year).
I've mentioned this before, but you will have a lot of trouble getting reliable, repeatable results out of any software without a bioinformaticist. It requires a bit of computer / maths / biology knowledge in order to interpret the results and understand what results are real, and what results are due to other factors (e.g. input data garbage, covariates, normalisation methods, etc.).

If you're going to focus completely on biology, your choice is basically between a) hiring a bioinformaticist and giving them a minimal budget for spending on software and computing capability and b) hiring a bioinformaticist and giving them a large budget for spending on software and computing capability -- maybe more like $75k/year + $2k/year vs $65k/year + $10k/year.
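
Adding up those two options makes the point clearer. The short Python sketch below uses only the ballpark figures from the paragraph above, which are rough guesses rather than real salary or licence data.

```python
def annual_cost(salary, software_budget):
    """Total yearly spend: one bioinformaticist plus software/computing."""
    return salary + software_budget


options = {
    "a) minimal software budget": (75_000, 2_000),
    "b) large software budget": (65_000, 10_000),
}

for label, (salary, budget) in options.items():
    print(label, annual_cost(salary, budget))
# a) minimal software budget 77000
# b) large software budget 75000
```

Either way the total lands in the same range, so the software budget is not what decides the cost; the person is.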

Last edited by gringer; 02-06-2013 at 07:41 PM.
Old 02-14-2013, 09:39 PM   #38
dandrews
Junior Member
 
Location: Canberra, Australia

Join Date: Apr 2010
Posts: 3
Default

Very interesting thread. My perspective, as a bioinformatician, is that commercial bioinformatics software lags well behind the leading edge. If you are doing research and your questions contain bioinformatics challenges, then you can't use commercial software - you need bioinformaticians to talk with and/or to build you an appropriate solution. However, as certain analysis techniques get reduced to common practice, commercial tools do a good job of packaging them in an easy-to-use manner.

For example, ten years ago even research bioinformaticians were only just feeling their way through the complexities of microarray analysis and commercial analysis tools produced garbage. Now that the complexities of microarray analysis are better understood, the commercial tools do a very good job and it would take quite a skilled bioinformatician to replicate their results.

I think with most applications of NGS we are still in the learning phase. Some of the commercial tools do an okay job for certain analyses. However, for many NGS applications, even the cutting-edge command-line tools still produce dubious results, and in this case you are probably better served by an experienced bioinformatician who knows the pitfalls.