I am interested in people's best practices in using a database to store sequencing data.
I perform lots of immune sequencing, and each of my reads is heavily annotated. Early on, whenever I wanted to compute something, I would iterate through an entire flatfile, building some data structure in Python (typically dictionaries of dictionaries). Then I realized that I was doing exactly what databases were designed for: searching, grouping, etc.
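A minimal sketch of that contrast, assuming a hypothetical tab-delimited flatfile with one annotated read per line and made-up columns (`read_id`, `sample`, `v_gene`, `j_gene`). The first function builds the dictionary of dictionaries by scanning every line; the second loads the same rows into an in-memory SQLite table (used here only as a convenient stand-in for any database) and lets `GROUP BY` do the grouping.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical flatfile contents; column names and values are illustrative only.
FLATFILE = """read_id\tsample\tv_gene\tj_gene
r1\tS1\tIGHV1-2\tIGHJ4
r2\tS1\tIGHV1-2\tIGHJ6
r3\tS1\tIGHV3-23\tIGHJ4
r4\tS2\tIGHV1-2\tIGHJ4
"""


def count_with_dicts(text):
    """Flatfile approach: scan every line, building a dict of dicts
    mapping sample -> v_gene -> number of reads."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        counts[row["sample"]][row["v_gene"]] += 1
    return {sample: dict(genes) for sample, genes in counts.items()}


def count_with_sql(text):
    """Database approach: load the rows once, then let GROUP BY do the work."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reads ("
        "read_id TEXT PRIMARY KEY, sample TEXT, v_gene TEXT, j_gene TEXT)"
    )
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Named placeholders are filled from each row dict.
    conn.executemany(
        "INSERT INTO reads VALUES (:read_id, :sample, :v_gene, :j_gene)", rows
    )
    result = defaultdict(dict)
    for sample, v_gene, n in conn.execute(
        "SELECT sample, v_gene, COUNT(*) FROM reads GROUP BY sample, v_gene"
    ):
        result[sample][v_gene] = n
    conn.close()
    return dict(result)


print(count_with_dicts(FLATFILE) == count_with_sql(FLATFILE))  # True
```

Both return the same nested mapping; the difference is that the database version can also be indexed and re-queried without rescanning the file.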
So I decided to try MongoDB, a NoSQL database. It was easy to use, since you can just dump JSON objects into it, which was a very natural conversion from Python. So far, performance has been very fast, and I am quite pleased.
But I am now preparing to generate a large amount of additional data, and I am wondering whether it would make sense to try a traditional SQL database, such as MySQL or Postgres. Any thoughts on how they compare for sequencing data? How do people typically organize their data in a relational database?
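For concreteness, here is one possible relational layout, sketched with Python's built-in `sqlite3` under stated assumptions; it is not a standard schema, and all table and column names are hypothetical. Scalar per-read annotations go in a `reads` table, while variable-length annotations (here, hypothetical per-read mutations) go in a child table keyed by `read_id`, so a question like "mutations per read by V gene" becomes a join plus `GROUP BY`.

```python
import sqlite3

# Hypothetical schema: one row per read, plus a child table for
# annotations that can occur many times per read.
SCHEMA = """
CREATE TABLE reads (
    read_id  TEXT PRIMARY KEY,
    sample   TEXT NOT NULL,
    v_gene   TEXT,
    j_gene   TEXT
);
CREATE TABLE mutations (
    read_id  TEXT NOT NULL REFERENCES reads(read_id),
    position INTEGER NOT NULL,
    ref_base TEXT NOT NULL,
    alt_base TEXT NOT NULL
);
CREATE INDEX idx_reads_sample_v ON reads(sample, v_gene);
CREATE INDEX idx_mutations_read ON mutations(read_id);
"""


def build_db():
    """Create an in-memory database filled with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO reads VALUES (?, ?, ?, ?)",
        [
            ("r1", "S1", "IGHV1-2", "IGHJ4"),
            ("r2", "S1", "IGHV3-23", "IGHJ4"),
            ("r3", "S2", "IGHV1-2", "IGHJ6"),
        ],
    )
    conn.executemany(
        "INSERT INTO mutations VALUES (?, ?, ?, ?)",
        [("r1", 10, "A", "G"), ("r1", 42, "C", "T"), ("r3", 7, "G", "A")],
    )
    return conn


def mutations_per_v_gene(conn, sample):
    """Mean mutations per read, grouped by V gene, for one sample.
    The LEFT JOIN keeps reads with no mutations, so they count as zero."""
    query = """
        SELECT r.v_gene,
               COUNT(m.read_id) * 1.0 / COUNT(DISTINCT r.read_id)
        FROM reads AS r
        LEFT JOIN mutations AS m ON m.read_id = r.read_id
        WHERE r.sample = ?
        GROUP BY r.v_gene
        ORDER BY r.v_gene
    """
    return conn.execute(query, (sample,)).fetchall()


print(mutations_per_v_gene(build_db(), "S1"))  # [('IGHV1-2', 2.0), ('IGHV3-23', 0.0)]
```

The trade-off versus a document store is that nested annotations must be split into separate tables up front, in exchange for joins, constraints, and indexes over any column.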
Thanks!
Uri