Thread overview
HDF5 bindings for D
Dec 22, 2014
Laeeth Isharc
Dec 22, 2014
Rikki Cattermole
Dec 22, 2014
Laeeth Isharc
Dec 22, 2014
John Colvin
December 22, 2014
https://github.com/Laeeth/d_hdf5

HDF5 is a very valuable tool for those working with large data sets.

From HDF5group.org:

HDF5 is a unique technology suite that makes possible the management of extremely large and complex data collections. The HDF5 technology suite includes:

* A versatile data model that can represent very complex data objects and a wide variety of metadata.
* A completely portable file format with no limit on the number or size of data objects in the collection.
* A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.
* A rich set of integrated performance features that allow for access time and storage space optimizations.
* Tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.
* The HDF5 data model, file format, API, library, and tools are open and distributed without charge.

From h5py.org:
[HDF5] lets you store huge amounts of numerical data, and easily manipulate that data from NumPy. For example, you can slice into multi-terabyte datasets stored on disk, as if they were real NumPy arrays. Thousands of datasets can be stored in a single file, categorized and tagged however you want.

H5py uses straightforward NumPy and Python metaphors, like dictionary and NumPy array syntax. For example, you can iterate over datasets in a file, or check out the .shape or .dtype attributes of datasets. You don't need to know anything special about HDF5 to get started.

In addition to the easy-to-use high level interface, h5py rests on an object-oriented Cython wrapping of the HDF5 C API. Almost anything you can do from C in HDF5, you can do from h5py.

Best of all, the files you create are in a widely-used standard binary format, which you can exchange with other people, including those who use programs like IDL and MATLAB.

===========
As far as I know, there is not yet a complete set of HDF5 bindings for D.

Bindings should have three levels:
1. pure C API declarations
2. a 'nice' D wrapper around the C API (e.g. one that knows about strings, not just char*)
3. an idiomatic D interface that uses CTFE/templates
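
As a rough illustration of how the three levels might stack up (a sketch only, not code taken from d_hdf5): level 1 declares two real HDF5 C functions, `H5Fopen` and `H5Fclose`. The level 2 and level 3 names (`openFile`, `nativeTypeName`) and the file name are made up for the example. `hid_t` is assumed to be a C `int`, as in HDF5 1.8; newer releases use a 64-bit integer, so the alias would need adjusting there. Build with libhdf5 on the link line.

```d
import std.string : toStringz;

// Level 1: plain C API declarations.
// Assumption: hid_t is a C int (HDF5 1.8); newer releases use a 64-bit type.
alias hid_t = int;
alias herr_t = int;

enum uint H5F_ACC_RDONLY = 0x0000u; // open read-only
enum hid_t H5P_DEFAULT = 0;         // default property list

extern (C) nothrow @nogc
{
    hid_t H5Fopen(const(char)* filename, uint flags, hid_t fapl_id);
    herr_t H5Fclose(hid_t file_id);
}

// Level 2: a 'nice' wrapper that takes a D string (hypothetical name).
hid_t openFile(string filename, uint flags = H5F_ACC_RDONLY)
{
    return H5Fopen(filename.toStringz, flags, H5P_DEFAULT);
}

// Level 3: a template that maps a D type to the name of an HDF5 native
// type at compile time (hypothetical helper, only two types covered).
template nativeTypeName(T)
{
    static if (is(T == int))
        enum nativeTypeName = "H5T_NATIVE_INT";
    else static if (is(T == double))
        enum nativeTypeName = "H5T_NATIVE_DOUBLE";
    else
        static assert(0, "no HDF5 mapping for " ~ T.stringof);
}

void main()
{
    // Evaluated by the compiler, not at run time.
    static assert(nativeTypeName!double == "H5T_NATIVE_DOUBLE");

    auto id = openFile("example.h5"); // hypothetical file name
    if (id >= 0)
        H5Fclose(id);
}
```

The point of the layering is that each level only depends on the one below it, so users who want the raw C calls can stop at level 1.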

I borrowed Stefan Frijter's work on (1) to get started. I cannot keep track of things when they are split over too many source files, so I put everything in one file: hdf5.d.

I have implemented a basic version of (2). It includes throwOnError rather than forcing C-style status checking, but the exception code is not very good or complete (lack of time and of experience with D exceptions).
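
To make the contrast with C-style status checking concrete, here is a minimal sketch of the general pattern. The signature of `throwOnError` below is a guess and may well differ from the one in hdf5.d; `HDF5Exception` and `fakeClose` are hypothetical, and `fakeClose` merely stands in for a real C call so that the example runs without libhdf5.

```d
import std.exception : assertThrown;
import std.format : format;

alias herr_t = int; // HDF5 status type: a negative value means failure

// Hypothetical exception type carrying the failing status code.
class HDF5Exception : Exception
{
    herr_t status;

    this(string msg, herr_t status,
         string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
        this.status = status;
    }
}

// Turns a negative status into an exception; other values pass through.
// Assumed signature, not necessarily the one used in hdf5.d.
herr_t throwOnError(herr_t status, string what = "HDF5 call",
                    string file = __FILE__, size_t line = __LINE__)
{
    if (status < 0)
        throw new HDF5Exception(
            format("%s failed with status %d", what, status),
            status, file, line);
    return status;
}

// Stand-in for a C function such as H5Fclose (hypothetical).
herr_t fakeClose(int id)
{
    return id >= 0 ? 0 : -1;
}

void main()
{
    // C style: the caller has to remember to test every status.
    if (fakeClose(-1) < 0)
    {
        // handle the error here
    }

    // Wrapper style: a failure surfaces as an exception instead.
    throwOnError(fakeClose(3), "fakeClose");
    assertThrown!HDF5Exception(throwOnError(fakeClose(-1), "fakeClose"));
}
```

Passing `__FILE__` and `__LINE__` as default arguments makes the exception report the call site rather than the helper itself.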

(3) will have to come later.

It's more or less complete, and the examples I have translated so far mostly work. But it is still a work in progress. Any help or suggestions appreciated. [I am doing this for myself, so the project is not as pretty as I would like in an ideal world.]


https://github.com/Laeeth/d_hdf5
December 22, 2014
On 22/12/2014 5:51 p.m., Laeeth Isharc wrote:
> https://github.com/Laeeth/d_hdf5
> [...]

You seem to be missing your dub file. It would be rather hard to get it onto the dub repository without it ;)
Oh, and keep the bindings separate from the wrappers by putting them in subpackages.
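
For anyone following along, a dub.json along these lines is one way such a split could look. It is only a sketch: the package name, the directory names and the subpackage name are assumptions and are not taken from the repository.

```json
{
    "name": "d_hdf5",
    "description": "Hypothetical layout: HDF5 wrapper with the raw bindings in a subpackage",
    "targetType": "library",
    "sourcePaths": ["wrapper"],
    "importPaths": ["wrapper"],
    "dependencies": {
        "d_hdf5:bindings": "*"
    },
    "subPackages": [
        {
            "name": "bindings",
            "description": "Plain C API declarations only",
            "targetType": "library",
            "sourcePaths": ["bindings"],
            "importPaths": ["bindings"],
            "libs": ["hdf5"]
        }
    ]
}
```

With a layout like this, a project that only wants the C declarations could depend on `d_hdf5:bindings` alone.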
December 22, 2014
On Monday, 22 December 2014 at 05:04:10 UTC, Rikki Cattermole wrote:
> You seem to be missing your dub file. It would be rather hard to get it onto the dub repository without it ;)
> Oh, and keep the bindings separate from the wrappers by putting them in subpackages.

Thanks - added now.

I will work on separating out the bindings when I have a bit more time; it should be easy enough.
December 22, 2014
On Monday, 22 December 2014 at 04:51:44 UTC, Laeeth Isharc wrote:
> https://github.com/Laeeth/d_hdf5
> [...]

Also relevant to some: http://code.dlang.org/packages/netcdf