January 14, 2015
>>> struct File { Location _location; alias _location this; ... }
>>>
>>> // group.d
>>> public import commonfg;
>>> struct Group { Location _location; alias _location this; ... }
>>>
>>> // commonfg.d { ... }
>>> enum isContainer(T) = is(T: File) || is(T : Group);
>>> auto method1(T)(T obj, args) if (isContainer!T) { ... }
>>> auto method2(T)(T obj, args) if (isContainer!T) { ... }
>>>
>>> I guess two of my gripes with UFCS are (a) you really have to
>>
>>>
>>> // another hdf-specific thing here but a good example in general is that some functions return you an id for an object which is one of the location subtypes (e.g. it could be a File or could be a Group depending on run-time conditions), so it kind of feels natural to use polymorphism and classes for that, but what would you do with the struct approach? The only thing that comes to mind is Variant, but it's quite meh to use in practice.
>>
>> void unlink(File f) {}
>> void unlink(Group g) {}
>>
>> For simple cases maybe one can keep it simple, and despite the Byzantine interface what one is trying to do when using HDF5 is not intrinsically so complex.
> So your solution is copying and pasting the code?
>
> But now repeat that for 200 other functions and a dozen more types that can be polymorphic in weirdest ways possible...

If you simply have a few lines calling the API and the validation is different enough for file and group (I haven't written unlink yet), then why not (and move genuinely shared code out into helper functions)?  The alternative is a long method with lots of conditions, which may be the best in some cases but may be harder to follow.
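To make that concrete, a rough sketch of what I have in mind (hypothetical - `checkValidId` and the validation comments are invented; only `H5Ldelete` is the real C call):

```d
import std.string : toStringz;

// Hypothetical sketch: one small overload per type, with genuinely
// shared code pulled out into helpers, instead of one long method
// full of conditions.
private void checkValidId(hid_t id) { /* shared validation */ }

void unlink(File f, string path)
{
    checkValidId(f.id);
    // file-specific validation would go here
    H5Ldelete(f.id, path.toStringz, H5P_DEFAULT);
}

void unlink(Group g, string path)
{
    checkValidId(g.id);
    // group-specific validation would go here
    H5Ldelete(g.id, path.toStringz, H5P_DEFAULT);
}
```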

I do like the h5py and pytables approaches.  One doesn't need to bother too much with the implementation when using their library.  However, what I am doing is quite simple from a data perspective - a decent amount of it, but it is not an interesting problem from a theoretical perspective - just execution.  Now if you are higher octane as a user you may be able to see what I cannot.  But on the other hand, the Pareto principle applies, and in my view a library should make it simple to do simple things.  One can't get there if the primary interface is a direct mapping of the HDF5 hierarchy, and I also think that is unnecessary with D.

But I very much appreciate your work as the final result is better for everyone that way, and you are evidently a much longer running user of D than me.  I never used C++ as it just seemed too ugly! and I suspect the difference in backgrounds is shaping perspectives.

What do you think the trickiest parts are with HDF5?  (You mention weird polymorphism).



Laeeth
January 14, 2015
On Wednesday, 14 January 2015 at 16:27:17 UTC, Laeeth Isharc wrote:
>
>>>> struct File { Location _location; alias _location this; ... }
>>>>
>>>> // group.d
>>>> public import commonfg;
>>>> struct Group { Location _location; alias _location this; ... }
>>>>
>>>> // commonfg.d { ... }
>>>> enum isContainer(T) = is(T: File) || is(T : Group);
>>>> auto method1(T)(T obj, args) if (isContainer!T) { ... }
>>>> auto method2(T)(T obj, args) if (isContainer!T) { ... }
>>>>
>>>> I guess two of my gripes with UFCS are (a) you really have to
>>>
>>>>
>>>> // another hdf-specific thing here but a good example in general is that some functions return you an id for an object which is one of the location subtypes (e.g. it could be a File or could be a Group depending on run-time conditions), so it kind of feels natural to use polymorphism and classes for that, but what would you do with the struct approach? The only thing that comes to mind is Variant, but it's quite meh to use in practice.
>>>
>>> void unlink(File f) {}
>>> void unlink(Group g) {}
>>>
>>> For simple cases maybe one can keep it simple, and despite the Byzantine interface what one is trying to do when using HDF5 is not intrinsically so complex.
>> So your solution is copying and pasting the code?
>>
>> But now repeat that for 200 other functions and a dozen more types that can be polymorphic in weirdest ways possible...
>
> If you simply have a few lines calling the API and the validation is different enough for file and group (I haven't written unlink yet), then why not (and move genuinely shared code out into helper functions)?  The alternative is a long method with lots of conditions, which may be the best in some cases but may be harder to follow.
>
> I do like the h5py and pytables approaches.  One doesn't need to bother too much with the implementation when using their library.
>  However, what I am doing is quite simple from a data perspective - a decent amount of it, but it is not an interesting problem from a theoretical perspective - just execution.  Now if you are higher octane as a user you may be able to see what I cannot.  But on the other hand, the Pareto principle applies, and in my view a library should make it simple to do simple things.  One can't get there if the primary interface is a direct mapping of the HDF5 hierarchy, and I also think that is unnecessary with D.
>
> But I very much appreciate your work as the final result is better for everyone that way, and you are evidently a much longer running user of D than me.  I never used C++ as it just seemed too ugly! and I suspect the difference in backgrounds is shaping perspectives.
>
> What do you think the trickiest parts are with HDF5?  (You mention weird polymorphism).
>
>
>
> Laeeth
I don't think you've read the h5py source in enough detail :) It's based HEAVILY on duck typing. In addition, it has way MORE classes than the C++ hierarchy does. E.g., the high-level File object actually has these parents: File : Group; Group : HLObject, MutableMappingWithLock; HLObject : CommonStateObject. Internally, the File also keeps a reference to a file id, which is an instance of FileID, which inherits from GroupID, which inherits from ObjectID - do I need to continue? :) PyTables, on the contrary, is quite badly written (although it works quite well and there are brilliant folks on the dev team, like Francesc Alted) and looks like a dump of C code interwoven with hackish Python code.

In h5py you can do things like file["/dataset"].write(...) --> this just wouldn't work as is in a strictly typed language since the indexing operator generally returns you something of a Location type (or an interface, rather) which can be a group/datatype/dataset which is only known at runtime. Out of all of them, only the dataset supports the write method but you don't know it's going to be a dataset. See the problem? I don't want the user code to deal with any of the HDF5 C API and/or have a bunch of if conditions or explicit casts which is outright ugly. Ideally, it would work kind of like H5PY, abstracting the user away from refcounting, error code checking after each operation, object type checking and all that stuff.
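In D, the closest thing I can think of is a tagged union. A rough sketch of what the dispatch might look like (hypothetical stand-in types, not the actual bindings):

```d
import std.variant : Algebraic, visit;

// Hypothetical stand-ins for the real binding types.
struct Group   { hid_t id; }
struct Dataset { hid_t id; }

// What the indexing operator would have to return: a tagged union of
// the possible kinds, since the actual kind is only known at runtime.
alias Located = Algebraic!(Group, Dataset);

void write(Dataset d, double[] data) { /* only datasets can write */ }

void writeTo(Located obj, double[] data)
{
    // visit dispatches on the runtime-held type, so the if/cast
    // chain stays inside the library rather than in user code
    obj.visit!(
        (Dataset d) { write(d, data); },
        (Group g)   { throw new Exception("path is a group, not a dataset"); }
    );
}
```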
January 14, 2015
> I don't think you've read the h5py source in enough detail :)

You're right - I haven't done more than browse it.

> It's based HEAVILY on duck typing.

There is a question here about what to do in D.  On the one hand, the flexibility of being able to open a foreign HDF5 file where you don't know beforehand the dataset type is very nice.  On the other, the adaptations needed to handle this flexibly get in the way when you are dealing with your own data that has a set format and where recompilation is acceptable if it changes.  Looking at the 'ease' of processing JSON, even using vibed, I think that one will need to implement both eventually, but perhaps starting with static typing.


> In addition, it has way MORE classes than the C++ hierarchy does. E.g., the high-level File object actually has these parents: File : Group; Group : HLObject, MutableMappingWithLock; HLObject : CommonStateObject. Internally, the File also keeps a reference to a file id, which is an instance of FileID, which inherits from GroupID, which inherits from ObjectID - do I need to continue?

Okay - I guess there is a distinction between the interface to the outside world (where I think the h5py etc. way is superior for most uses) and the implementation.  Isn't the reason h5py has lots of classes primarily that this is how you write good code in Python, whereas in many cases it is not true in D (not that you should ban classes, but often structs + free-floating functions are more suitable)?

> PyTables, on the contrary, is quite badly written (although it works quite well and there are brilliant folks on the dev team, like Francesc Alted) and looks like a dump of C code interwoven with hackish Python code.

Interesting.  What do you think is low quality about the design?

> In h5py you can do things like file["/dataset"].write(...) --> this just wouldn't work as is in a strictly typed language since the indexing operator generally returns you something of a Location type (or an interface, rather) which can be a group/datatype/dataset which is only known at runtime.

Well, if you don't mind recompiling your code when the data set type changes (or you encounter a new data set) then you can do that (which is what I posted a link to earlier).
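Something along these lines (a hypothetical signature - `memTypeOf` and the `Dataset` helpers are invented; only `H5Dread` is the real C call):

```d
// Hypothetical statically typed read: the element type T is fixed at
// compile time, so a changed dataset layout means changing T and
// recompiling - fine for your own data with a set format.
T[] readDataset(T)(Dataset ds)
{
    auto buf = new T[](ds.length);   // ds.length: assumed helper
    // H5Dread(ds.id, memTypeOf!T, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf.ptr);
    return buf;
}

auto prices = readDataset!double(file.openDataset("/prices"));
```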

It depends on your use case.  It's hard to think of an application more dynamic than web sites, and yet people seem happy enough with vibed's use of compiled diet templates as the primary implementation.  They would like the option of dynamic ones too, and I think this would be useful in this domain too, since one does look at foreign data on occasion.  One could of course use the quick compilation of D to regenerate parts of the code when this happens.  Whether or not this is acceptable depends on your use case - for some it might be okay, but obviously it is no good if you are writing a generic H5 browser/charting tool.

So I think that if you don't allow static dataset typing, the flexibility of dynamic typing gets in the way for some uses (which might be most of them) - but you do need to add dynamic typing too.

Shall we move this to a different thread and/or email?  I am afraid I have hijacked the poor original poster's request.

On the refcounting question, I confess that I do not fully understand your concern, which may well reflect a lack of deep experience with D on my part.  Adam Ruppe suggests that it's generally okay to rely on a struct destructor to call C cleanup code.  I can appreciate this may not be true with h5 and, if you can spare the time, I would love to understand more precisely why not.
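For what it's worth, the pattern I had in mind is something like this (a sketch only - `H5Iinc_ref`/`H5Idec_ref` are the real C API identifier-refcounting calls, but the wrapper itself is hypothetical):

```d
// Hypothetical RAII wrapper: the struct owns an HDF5 id, the postblit
// bumps the library's own reference count when the struct is copied,
// and the destructor releases it - so cleanup piggybacks on the
// C library's refcounting rather than fighting it.
struct Handle
{
    hid_t id = -1;

    this(hid_t id) { this.id = id; }

    this(this)                       // postblit: copies share the id
    {
        if (id >= 0) H5Iinc_ref(id);
    }

    ~this()
    {
        if (id >= 0) H5Idec_ref(id); // object is closed at refcount 0
    }
}
```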

> Out of all of them, only the dataset supports the write method but you don't know it's going to be a dataset. See the problem?

In this case I didn't quite follow.  Where does this fall down ?

void h5write(T)(Dataset x, T data)


I have your email somewhere and will drop you a line.  Or you can email me laeeth at laeeth.com.  And let's create a new thread.



Laeeth.