August 19, 2015
What is HDF5, and why should you use it?

http://www.hdfgroup.org/why_hdf/

(My summary):
  - very large data sets, very fast access requirements, and complex datasets
  - share data across variety of platforms
  - many open-source and commercial tools that understand HDF
  - self-describing and can specify complex data relationships and dependencies
  - can contain binary data in many representations
  - allow direct access to parts of file without first parsing whole contents
  - hierarchical data objects can be expressed in a natural manner (contrast
    experience with relational database tables)
  - n-dimensional datasets and each element in set may be complex object
  - relational databases good for field matching queries but not for
    sequentially processing all records in database or for subsetting data
    based on co-ordinate style lookup
  - custom proprietary binary formats often not portable, not extensible and
    not high-performance.  technical debt to maintain data management part of
    code
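
As a rough illustration of the "direct access" and "self-describing" points above, here is a minimal sketch using Python's h5py library (not the D wrappers announced in this post).  The file name, the group/dataset path and the "units" attribute are made up for the example.  It writes a 2-D dataset and then reads back only a small window of it, without loading the rest.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location and layout, chosen only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Write a 1000 x 50 dataset inside a group and attach a descriptive attribute.
with h5py.File(path, "w") as f:
    grid = np.arange(1000 * 50, dtype=np.float64).reshape(1000, 50)
    dset = f.create_dataset("simulation/temperature", data=grid, chunks=(100, 50))
    dset.attrs["units"] = "kelvin"

# Reopen the file and read only rows 200..209 of columns 0..4.
# Slicing the dataset object reads just the requested region from disk.
with h5py.File(path, "r") as f:
    dset = f["simulation/temperature"]
    units = dset.attrs["units"]   # metadata travels with the data
    shape = dset.shape            # available without reading any elements
    window = dset[200:210, 0:5]   # NumPy array holding only the selected block
```

The hierarchy ("simulation/temperature") behaves like a path in a file system, and the shape, type and attributes can be inspected before any of the data is read.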

I personally find it useful for storing price data for financial instruments, and also economic data.  There are bespoke time series databases, but they come at a price, which is not purely a pecuniary one.

Updated wrappers are here:
https://github.com/Laeeth/d_hdf5

Changes since last time: some fixes to the bindings, and updates to a later version of the HDF5 API.  More work is needed before it can be used idiomatically from D, but it's usable today.  A simple example of mapping D structs to HDF5 types and back again is in the examples/traits directory.
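
For readers who have not seen what a struct looks like on the HDF5 side, here is a hedged sketch of the same idea in Python with h5py and NumPy.  It is not the D wrappers' API (see examples/traits in the repository for that), and the record layout and dataset name are invented for illustration.  A record type with named fields is stored as an HDF5 compound type, and comes back with the same field names.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical record layout, analogous to a small struct with three fields.
price_bar = np.dtype([("date", "i4"), ("close", "f8"), ("volume", "i8")])

bars = np.array(
    [(20150817, 101.5, 1200), (20150818, 102.25, 900), (20150819, 100.75, 1500)],
    dtype=price_bar,
)

path = os.path.join(tempfile.mkdtemp(), "prices.h5")

# The structured dtype is written as an HDF5 compound datatype.
with h5py.File(path, "w") as f:
    f.create_dataset("prices/EXAMPLE", data=bars)

# Reading back recovers the field names and types stored in the file.
with h5py.File(path, "r") as f:
    dset = f["prices/EXAMPLE"]
    field_names = dset.dtype.names
    last_two = dset[1:]           # only the last two records are read

closes = last_two["close"]        # select one field from the records read
```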

Pull requests and offers to help maintain it are welcome.  It's still at an alpha stage, but already useful.



Laeeth.