  • What do you do with your database?

    Dear All,
    In an earlier thread of mine (Core cluster setup...), westerman suggested I start a new thread asking what people do with their databases. I am indeed curious: what do you do with your database? I would think that trying to store NGS data in something like a SQL database is a lost cause. So my questions are:

    1) What do you do with your Database?

    2) How do you store your NGS data?

    3) Do you have any trouble accessing your data repeatedly?

    4) What are the biggest bottlenecks you commonly encounter with regard to data management?

    5) Do you have a commercial solution or a home-grown one?

    Thank you for your time. I look forward to your replies.
    Regards
    Quantrix

  • #2
    At the moment we don't use a database. As you say, the files are huge. Storing variants etc. for comparison would matter if you were always working on one large project, but here we have many small to medium projects that aren't relevant to compare with each other.
    Also keep in mind your users might not be trained in database-based analysis, so a good front end will be important.


    • #4
      We had an Oracle DB housing our SRF + fastq files when we were still generating those. It was huge, but it worked well and had a FUSE layer that made it transparently visible to the users. The DB was in two halves: a large set of partitions holding blobs (actually Oracle "secure files", I think) and a far smaller metadata component that tracked where things were. It would have worked OK using a filesystem instead of the binary blobs, though; there are pros and cons to each method.
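      To make the two-part layout concrete, here is a minimal sketch of the filesystem variant: the bulk data stays in ordinary files, and a small metadata catalog tracks where each file lives. It is not the Oracle setup described above. SQLite, the `files` table, its columns, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_catalog(db_path=":memory:"):
    """Open (or create) a small metadata catalog. The schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               run_id     TEXT NOT NULL,
               fmt        TEXT NOT NULL,
               path       TEXT NOT NULL UNIQUE,
               size_bytes INTEGER NOT NULL,
               sha256     TEXT NOT NULL
           )"""
    )
    return conn


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_file(conn, run_id, fmt, path):
    """Record where a data file lives; the file itself is left untouched."""
    path = Path(path).resolve()
    conn.execute(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
        (run_id, fmt, str(path), path.stat().st_size, sha256_of(path)),
    )
    conn.commit()


def locate(conn, run_id, fmt):
    """Return the stored paths for one run and format, sorted by path."""
    rows = conn.execute(
        "SELECT path FROM files WHERE run_id = ? AND fmt = ? ORDER BY path",
        (run_id, fmt),
    ).fetchall()
    return [row[0] for row in rows]
```

      Only the small catalog has to cope with queries; the large files are read directly from wherever `locate` says they are.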

      We've since switched both format and DB mechanism for raw data: we store BAM files in an iRODS system.

      The analysis BAMs & co (i.e. mapped or assembled data, VCF files, etc.) are less clearly divided: they are stored in various project/group directories across a variety of file system types (slow and fast NFS storage, Lustre, etc.).

      The only real bottlenecks arise when someone tries to access a single DB layer (like the FUSE layer) from 1000+ cores on our CPU farm. We require that people first copy data to something more scalable, for which we use Lustre.
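      The copy-first rule can be sketched as a small staging helper that a job runs once before fanning out to many cores. This is an illustration under assumptions, not the tooling used at the poster's site: the function name `stage`, the `.part` suffix, and the size-only freshness check are all hypothetical choices.

```python
import shutil
from pathlib import Path


def stage(source, scratch_dir):
    """Copy `source` into `scratch_dir` once and return the staged path.

    A same-sized copy that already exists is reused, so repeated calls
    do not hit the slow source again. (Comparing sizes only is a
    simplification; a checksum would be safer.)
    """
    source = Path(source)
    scratch_dir = Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)
    target = scratch_dir / source.name

    if target.exists() and target.stat().st_size == source.stat().st_size:
        return target

    # Copy under a temporary name, then rename, so readers never see
    # a half-written file at the final path.
    partial = target.with_name(target.name + ".part")
    shutil.copyfile(source, partial)
    partial.replace(target)
    return target
```

      Worker processes would then read only the staged path on the scalable filesystem, never the single-layer source.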


      • #5
        Hi Mapper and jkbonfield,
        That is very helpful indeed!
        Thanks
        Quantrix
