Hi,
I'm sort of new to NGS, particularly HiSeq, and I'm currently at Vanderbilt Medical Center. We get billions of reads back from antibody sequences, and I was wondering if anyone had any literature or links I could look at on optimizing databases to handle such large amounts of data. I'm finding that MySQL has a hard time indexing, MongoDB has trouble with strings, etc. The sequences are beautiful; the processing is hard.