SEQanswers > Bioinformatics > Bioinformatics

Old 02-23-2011, 11:41 AM   #1
Location: Pennsylvania

Join Date: Jan 2011
Posts: 21
What do you do with your database?

Dear All,
In an earlier thread of mine (Core cluster setup...), westerman suggested I start a new thread asking what people do with their DB. I am indeed curious: what do you do with your database? I suspect that trying to store NGS data in something like a SQL database is a lost cause. So my questions are:

1) What do you do with your Database?

2) How do you store your NGS data?

3) Do you have any trouble accessing your data repeatedly?

4) What are the biggest bottlenecks you commonly encounter in data management?

5) Do you have a commercial solution or a home-grown one?

Thank you for your time and I shall look forward to your replies.
quantrix
Old 02-24-2011, 07:13 AM   #2
Senior Member
Location: Germany

Join Date: Oct 2008
Posts: 414

At the moment we don't use a database. As you say, the files are huge. Storing variants etc. for comparison would be important if you always work on one large project, but here we have many small and medium-sized projects that aren't relevant to compare with each other.
Also keep in mind that your users might not be trained in database-based analysis, so a good front end will be important.
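
As a rough illustration of the variant-comparison case mentioned above, here is a minimal sketch, assuming variant calls have already been reduced to (sample, chrom, pos, ref, alt) tuples. SQLite stands in for whatever database would actually be used; the table layout, function names, and example rows are hypothetical and not taken from any poster's setup.

```python
import sqlite3

def build_variant_db(rows):
    """Load (sample, chrom, pos, ref, alt) tuples into an in-memory SQLite DB."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE variants ("
        " sample TEXT, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,"
        " PRIMARY KEY (sample, chrom, pos, ref, alt))"
    )
    con.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)
    con.commit()
    return con

def shared_variants(con, sample_a, sample_b):
    """Return variants called in both samples, ordered by position."""
    cur = con.execute(
        "SELECT a.chrom, a.pos, a.ref, a.alt"
        " FROM variants a JOIN variants b"
        " ON a.chrom = b.chrom AND a.pos = b.pos"
        " AND a.ref = b.ref AND a.alt = b.alt"
        " WHERE a.sample = ? AND b.sample = ?"
        " ORDER BY a.chrom, a.pos",
        (sample_a, sample_b),
    )
    return cur.fetchall()

# Hypothetical example data: two samples sharing one variant.
EXAMPLE_ROWS = [
    ("s1", "chr1", 100, "A", "G"),
    ("s1", "chr1", 250, "C", "T"),
    ("s2", "chr1", 100, "A", "G"),
    ("s2", "chr2", 500, "G", "A"),
]
```

Calling `shared_variants(build_variant_db(EXAMPLE_ROWS), "s1", "s2")` returns the single variant common to both samples. Only the small, derived variant records go into the database here; the large read files stay on disk.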
colindaven
Old 02-25-2011, 12:14 AM   #3
Location: India

Join Date: Nov 2010
Posts: 17


Here are some links which will be helpful
mapper
Old 02-25-2011, 12:24 AM   #4
Senior Member
Location: Cambridge, UK

Join Date: Jul 2008
Posts: 146

We had an Oracle DB housing our SRF + FASTQ files when we were still generating those. It was huge, but it worked well and had a FUSE layer that made it transparently visible to the users. The DB was in two halves: a large set of partitions holding blobs (actually Oracle "SecureFiles", I think) and a far smaller metadata component that tracked where things were. It would have worked OK using a filesystem instead of the binary blobs though; there are pros and cons to each method.
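
The "two halves" split described above can be sketched roughly as follows, assuming the bulk data lives on a filesystem and only a small metadata catalog records where each file is. SQLite stands in for Oracle here, and the schema, function names, and the run/lane keys are hypothetical, not the actual design.

```python
import sqlite3

def open_catalog(db_path=":memory:"):
    """Open (or create) the small metadata catalog; bulk data is stored elsewhere."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " run_id TEXT NOT NULL, lane INTEGER NOT NULL, file_type TEXT NOT NULL,"
        " location TEXT NOT NULL, size_bytes INTEGER NOT NULL,"
        " PRIMARY KEY (run_id, lane, file_type))"
    )
    return con

def register_file(con, run_id, lane, file_type, location, size_bytes):
    """Record where a data file lives; re-registering replaces the old location."""
    with con:  # the connection context manager commits on success
        con.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)",
            (run_id, lane, file_type, location, size_bytes),
        )

def locate(con, run_id, lane, file_type):
    """Return the stored location for a file, or None if it is not registered."""
    row = con.execute(
        "SELECT location FROM files"
        " WHERE run_id = ? AND lane = ? AND file_type = ?",
        (run_id, lane, file_type),
    ).fetchone()
    return row[0] if row else None
```

Because the catalog stores only a location string, the same lookup works whether the bytes sit in a blob partition or on a plain filesystem, which is the trade-off mentioned above.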

We've since switched both format and DB mechanism for raw data: we store BAM files in an iRODS system.

The analysis BAMs & co (i.e. mapped or assembled data, VCF files, etc.) are less clearly divided. They are stored in various project/group directories over a variety of file system types: slow & fast NFS storage, Lustre, etc.

The only real bottleneck arises when someone tries to access a single DB layer (like the FUSE layer) from 1000+ cores on our CPU farm. We require that people first copy the data to something more scalable, for which we use Lustre.
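
The "copy first, then compute" rule can be sketched as a small staging helper, assuming the scalable storage is mounted as an ordinary directory. The function name and the size-based "already staged" check are hypothetical choices for illustration, not the site's actual tooling.

```python
import os
import shutil
from pathlib import Path

def stage_to_scratch(src, scratch_dir):
    """Copy src into scratch_dir once and return the staged path.

    Jobs then read the staged copy instead of hitting the single
    DB/FUSE layer from every core.
    """
    src = Path(src)
    scratch_dir = Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)
    dest = scratch_dir / src.name

    # Skip the copy if a same-sized file is already staged (a cheap check;
    # a checksum comparison would be stricter).
    if dest.exists() and dest.stat().st_size == src.stat().st_size:
        return dest

    # Copy to a temporary name, then rename, so readers never see a partial file.
    tmp = dest.with_name(dest.name + ".part")
    shutil.copy2(src, tmp)
    os.replace(tmp, dest)
    return dest
```

In practice one job would call this once per input and the remaining jobs would read from the returned path.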
jkbonfield
Old 02-25-2011, 05:39 PM   #5
Location: Pennsylvania

Join Date: Jan 2011
Posts: 21

Hi Mapper and jkbonfield,
That is very helpful indeed!
quantrix


All times are GMT -8.
