Manage your large HDF and HDF-based file collections on Azure intelligently and cost-efficiently.
The Highly Scalable Data Service (HSDS) is a REST-based solution for reading and writing complex binary data formats in object-based storage environments such as the cloud. Developed to make large datasets accessible in a way that is both fast and cost-effective, HSDS stores HDF5 files using a sharded data schema while making the functionality traditionally offered by the HDF5 library available to any HTTP client.
As organizations face the challenges of moving their data to the cloud, major code changes and cost increases are often their top concerns. The Highly Scalable Data Service, running on Microsoft Azure, offers benefits to anyone with large amounts of data in HDF5-based files. Current users include organizations in the oil and gas, financial, and government sectors, among others.
HSDS provides benefits across nearly every need organizations have for their data. Scalability improves: HSDS can store petabytes of data, scale across multiple servers, and dynamically change the number of server nodes to meet client demand. Performance improves as well: smart data caching accelerates object storage queries, single requests are processed in parallel on the server, and existing HDF5 applications run faster thanks to the automatic parallelization features of HSDS. Concurrency brings further performance gains, since HSDS supports multiple writers and multiple readers (even on the same file), simultaneous use by thousands of clients, and multithreaded applications. HSDS also brings simplicity and compatibility: users can rapidly shift large HDF5 files, applications, and infrastructure to the cloud, HSDS is compatible with any HDF5-based data (e.g., NetCDF4, Energistics), and existing applications can use HSDS with minimal changes thanks to its HDF5 API and Python h5py API compatibility.
Alongside these benefits, security and reliability remain central to HSDS. Both HTTP and HTTPS are supported, clients do not need direct access to cloud storage, and Role-Based Access Control (RBAC) can be used to manage group access easily, with Access Control Lists (ACLs) controlling which users have access to individual data files. For reliability, multiple copies of each object are stored, guarding against data loss, and object updates are atomic, so a file cannot be corrupted by a partially completed update.
The HDF Group is the developer of HDF5®, a high-performance software library and data format that has been adopted across multiple industries and is the de facto standard in the scientific and research community. HSDS offers users of HDF5-based files the next generation of performance and reliability.