August 08, 2010
I agree, but templates will always be a problem.

Sent from my iPhone

On Aug 8, 2010, at 6:48 AM, Jacob <doob at me.com> wrote:

> 
> On 8 aug 2010, at 14:26, Michel Fortin wrote:
> 
>> On 2010-08-08 at 1:47, Andrei Alexandrescu wrote:
>> 
>>> I think that would be great. Knowing nothing about Orange, I visited the website and read the feature lists and the tutorial (the reference seems to be missing for now). The latter contains:
>>> 
>>> auto a2 = serializer.deserialize!(A)(data);
>>> 
>>> which seems to require compile-time knowledge of the deserialized type. I'd expect the library to support something like
>>> 
>>> Object a2 = serializer.deserialize!Object(data);
>>> 
>>> and fill the object with an A. I'm pretty certain you've done that, it would be great to feature that within the tutorials and documentation. I'd also expect Variant to play a role there, e.g. you deserialize something and you get a Variant.
>> 
>> My own unreleased, unfinished and in-need-of-a-refactoring serialization module does that... but unfortunately dynamically recreating the right type cannot be so straightforward in the current state of runtime reflection.
>> 
>> This post turned out longer than I expected, please stay with me.
>> 
>> Runtime reflection currently gives you access *only* to the default constructor, so this is what my module does internally when unserializing a class:
>> 
>>    ClassInfo c = findClass(classNameFromSerializationStream);
>>    Object o = c.create();
>>    (cast(Unserializable)o).unserialize(serializationStream);
>> 
>> Since we can't access a constructor with a different signature, we can't unserialize directly from the constructor. This is rather a weak point as it forces all objects to have a default constructor. Another option is for the user to manually register his own constructor with the serialization system prior to unserializing, but that's much less convenient.
> 
> Currently I don't call the constructor; I just create an instance of the class and set its fields. I don't know how good or bad that actually is. Another option would be to use __ctor and call one of the constructors (if the class has several) with the default values for the signature.
> 
>> The unserialize member function called above must be explicitly added by the user (either manually or with a mixin) because the fields don't reflect at runtime and the actual class is unknown at compile-time. And the class needs to conform to an interface that contains that unserialize function so we can find it at runtime.
> 
> I think that is too much extra work. One of my goals was to be able to serialize third party types.
> 
>> So before adding a serialization library, I would suggest we solve the runtime-reflection problem and find a standard way to attach various attributes to types and members. That could be done as a library, but ideally it'd have some help from the compiler which could put this stuff where it really belongs: ClassInfo. Currently, QtD has its own mixins for that, my D/Objective-C bridge has its own mixins and class registration system, my serialization module has its own, surely Orange has its own, I believe PyD has its own... this is going to be a mess pretty soon if it isn't already.
>> 
>> Once we have a proper standardized runtime-reflection and attribute system, then the serialization module can focus on serialization instead of implementing various hacks to add and get to the information it needs.
> 
> That is absolutely the best solution. I tried to do the best I could with the current compiler/runtime.
> 
>> -- 
>> Michel Fortin
>> michel.fortin at michelf.com
>> http://michelf.com/
>> 
>> 
>> 
>> _______________________________________________
>> phobos mailing list
>> phobos at puremagic.com
>> http://lists.puremagic.com/mailman/listinfo/phobos
August 08, 2010
On Sat, 2010-08-07 at 17:19 +0200, Jacob wrote:
> Is there any interest in having a serializer in Phobos? I have a serializer compatible with D2 which I licensed under the Boost license. This is the description from the project page:
> 
> Orange is a serialization library for D1 and D2, supporting both Tango and Phobos. It can serialize most of the available types in D, including third party types and can serialize through base class references. It supports fully automatic serialization of all supported types and also supports several ways to customize the serialization. Orange has a separate front end (the serializer) and back end (the archive) making it possible for the user to create new archive types that can be used with the existing serializer.
> 
> It's not very well tested, but if there's some interest I'm hoping to get more people to test the library. The project page is: http://dsource.org/projects/orange/

I agree (with everyone else) that Phobos should have a serialization lib.  And now it seems we're spoilt for choice -- both Masahiro's MsgPack serializer and Jacob's Orange are more or less complete, working solutions being offered to us.

I have very little experience with using serialization libs, so I have no idea how to determine which one is the better choice.  How do we decide which one to use?  Perhaps a vote on the NG?

-Lars

August 08, 2010
Let me explain with an example:

interface Container {}
class SList(T) : Container {}

-- App A --

SList!(int) x;
x.serialize();

-- App B --

Container x = deserialize();

This will only work if App B has instantiated an SList!(int) somewhere, otherwise the TypeInfo won't exist.  This particular issue is only a problem in one part of std.concurrency: priority messages.  If the user does a receive(int), let's say, and there's a priority message waiting of type string it will be thrown as PriorityMessageException!(string).  If the priority message is a template...

Other than that, std.concurrency will work just fine with Orange as-is because receive() is a template so the type of the expected data is available at compile-time.

Regarding the issue above, what I'll probably end up doing is throwing a PriorityMessageException!(SerializedType) (where SerializedType is a lot like a Variant) and the user can call .deserialize!(string) or whatever if he wants to extract the data.  If someone has a better suggestion for how to handle this, I'd love to hear it.
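
A minimal sketch of the SerializedType idea, under stated assumptions: the struct layout, the `from` helper and the text payload are all hypothetical and only illustrate the shape of the API, not how std.concurrency or Orange would actually implement it. The sender records the type name next to the payload, and the receiver has to name the type it wants back:

```d
import std.conv : to;

/// Hypothetical stand-in for the SerializedType mentioned above:
/// it carries the name of the serialized type plus an opaque payload.
struct SerializedType
{
    string typeName; // name recorded by the sender
    string payload;  // text encoding, chosen only to keep the sketch short

    /// Wraps a value whose type is known at compile time on the sending side.
    static SerializedType from(T)(T value)
    {
        return SerializedType(T.stringof, to!string(value));
    }

    /// Extracts the payload as T; throws if T is not the recorded type.
    T deserialize(T)()
    {
        if (typeName != T.stringof)
            throw new Exception("payload holds " ~ typeName ~ ", not " ~ T.stringof);
        return to!T(payload);
    }
}
```

With something like this, a handler that catches a PriorityMessageException!(SerializedType) could call deserialize!(string)() on the payload, and would get an exception rather than garbage if the sender had actually stored an int.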


August 08, 2010
On 8 aug 2010, at 15:30, Michel Fortin wrote:

> On 2010-08-08 at 6:46, Jacob wrote:
> 
>> On 8 aug 2010, at 07:47, Andrei Alexandrescu wrote:
>> 
>>> I think that would be great. Knowing nothing about Orange, I visited the website and read the feature lists and the tutorial (the reference seems to be missing for now). The latter contains:
>>> 
>>> auto a2 = serializer.deserialize!(A)(data);
>>> 
>>> which seems to require compile-time knowledge of the deserialized type. I'd expect the library to support something like
>>> 
>>> Object a2 = serializer.deserialize!Object(data);
>> 
>> This is currently not possible in the library. I'm not sure if that would be possible, how would you deserialize a struct for example? There is no factory function for structs like there is for classes.
> 
> But there's no concept of derived struct. For a struct you always know the type at compile-time. The only way to hide a struct would be behind a void* or void[], but trying to serialize/unserialize that type automatically (without the user writing the serialization code itself) is pointless.

Yes, exactly, that is how the library currently works. But I can see how starting out by deserializing with the type Object could work. This is a description of how the serializer "thinks" when it deserializes a value:

auto a2 = serializer.deserialize!(Foo)(data);

"Ok, I'm deserializing a Foo"

1. Start by creating a new instance of Foo
2. Loop through all the instance variables

   "Oh, I found a struct of the type Bar"

   1. Create a new Bar
   2. Loop through all the instance variables
   3. Deserialize the values for each variable
   4. Set the values for all the instance variables

3. Set the value for the instance variable of type Bar

continue deserializing...

Using the approach above I have all (or as much as possible) of the compile-time information available, like the types of all the instance variables, and the serializer is in control. Using Andrei's approach it seems more like this:

"Start looking in the archive for types"
"Ok, the archive wants me to deserialize an instance of Foo"

1. Start by creating a new instance of Foo using reflection:

Object foo = Object.factory("Foo");

2. In the archive, loop through all the instance variables
3. See if there is a corresponding field in the deserialized object by looping through all the instance variables using foo.getMembers
4. "Ok the archive wants me to deserialize a struct of the type Bar, hm how do I do that? I only have Bar as a string"

Using this approach all compile time information is lost, the archive is in control. Probably not the best explanation.
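
The first, compile-time-driven traversal can be sketched roughly as follows. This is not Orange's actual code: `Archive`, `Foo`, `Bar` and the free function `deserialize` are hypothetical names, the archive is just a flat list of text values, and unlike the description above the sketch creates class instances with `new`, which requires a default constructor.

```d
import std.conv : to;

struct Bar { int x; int y; }
class Foo { int id; Bar bar; string name; }

/// Hypothetical archive: a flat list of text values consumed in order.
struct Archive
{
    string[] values;

    string next()
    {
        auto v = values[0];
        values = values[1 .. $];
        return v;
    }
}

/// Deserializes a T. The static type T drives the traversal, so the type
/// of every field is known at compile time.
T deserialize(T)(ref Archive archive)
{
    static if (is(T == class) || is(T == struct))
    {
        static if (is(T == class))
            auto result = new T; // assumption: T has a default constructor
        else
            T result;

        // tupleof exposes each field together with its static type
        foreach (i, _; result.tupleof)
            result.tupleof[i] = deserialize!(typeof(result.tupleof[i]))(archive);
        return result;
    }
    else
    {
        // primitive value: convert the next archived text value
        return to!T(archive.next());
    }
}
```

When the traversal reaches the Bar field, the compiler already knows it is a Bar and instantiates deserialize!(Bar); in the archive-driven approach only the string "Bar" would be available at that point.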

> Or you could hide it behind a variant, in which case the variant's serialization should remember the type name so it can find a proper deserializer on the other side. How exactly does it do that? Either with better runtime-reflection, or with pre-registered handlers on the unserializer's side (not very convenient).
> 
> 
>> Since all the static types of the objects would be Object how would I set the values when deserializing? Or would Variant be useful here? I have not used Variant.
> 
> At this point I've been unable to serialize/unserialize a variant.
> 
> Another interesting point: it's probably necessary to be tolerant of type differences. For instance, if I serialize a size_t on a machine and unserialize it elsewhere, it might not be the same underlying integral type.

That deserves some thought.

> -- 
> Michel Fortin
> michel.fortin at michelf.com
> http://michelf.com/
August 08, 2010
On 8 aug 2010, at 16:16, Lars Tandle Kyllingstad wrote:

> On Sat, 2010-08-07 at 17:19 +0200, Jacob wrote:
>> Is there any interest in having a serializer in Phobos? I have a serializer compatible with D2 which I licensed under the Boost license. This is the description from the project page:
>> 
>> Orange is a serialization library for D1 and D2, supporting both Tango and Phobos. It can serialize most of the available types in D, including third party types and can serialize through base class references. It supports fully automatic serialization of all supported types and also supports several ways to customize the serialization. Orange has a separate front end (the serializer) and back end (the archive) making it possible for the user to create new archive types that can be used with the existing serializer.
>> 
>> It's not very well tested, but if there's some interest I'm hoping to get more people to test the library. The project page is: http://dsource.org/projects/orange/
> 
> I agree (with everyone else) that Phobos should have a serialization lib.  And now it seems we're spoilt for choice -- both Masahiro's MsgPack serializer and Jacob's Orange are more or less complete, working solutions being offered to us.

Can MessagePack serialize an object? I'm looking at the website and can't see that it has direct support for that.

> I have very little experience with using serialization libs, so I have no idea how to determine which one is the better choice.  How do we decide which one to use?  Perhaps a vote on the NG?

I think one could create a MessagePack archive for my serializer.

> -Lars

August 08, 2010
On 2010-08-08 at 10:37, Jacob wrote:

> On 8 aug 2010, at 15:30, Michel Fortin wrote:
>> Another interesting point: it's probably necessary to be tolerant of type differences. For instance, if I serialize a size_t on a machine and unserialize it elsewhere, it might not be the same underlying integral type.
> 
> That deserves some thought.

Well, how it works in my serialization module is that all integral types are semantically identical once inside the serialized stream. So any integral type can be serialized/unserialized to another integral type. There's a runtime check that throws if the value to unserialize can't fit the type you're unserializing to.

Same for floating point.
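
A minimal sketch of that runtime check, assuming the stream hands integers back as a `long` (the function name and the choice of `long` are assumptions of this sketch, not the module's real design). `std.conv.to` already throws a `ConvOverflowException` when the value does not fit the target type:

```d
import std.conv : to, ConvOverflowException;
import std.traits : isIntegral;

/// Reads an integer taken from the stream into whatever integral type the
/// caller asks for, throwing ConvOverflowException if it does not fit.
T unserializeIntegral(T)(long stored)
    if (isIntegral!T)
{
    // to!T performs the range check, including negative -> unsigned
    return to!T(stored);
}
```

So a size_t written on a 64-bit machine can be read back as a uint elsewhere as long as the actual value is small enough. A real implementation would need a wider or separate representation for ulong values above long.max, which this sketch ignores.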

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



August 08, 2010
On 2010-08-08 at 10:37, Jacob wrote:

> Yes, exactly, that is how the library currently works. But I can see how starting out by deserializing with the type Object could work. This is a description of how the serializer "thinks" when it deserializes a value:
> 
> auto a2 = serializer.deserialize!(Foo)(data);
> 
> "Ok, I'm deserializing a Foo"
> 
> 1. Start by creating a new instance of Foo
> 2. Loop through all the instance variables
> 
> "Oh, I found a struct of the type Bar"
> 
> 1. Create a new Bar
> 2. Loop through all the instance variables
> 3. Deserialize the values for each variable
> 4. Set the values for all the instance variables
> 
> 3. Set the value for the instance variable of type Bar
> 
> continue deserializing...
> 
> Using the approach above I have all (or as much as possible) compile time information available, like the types of all the instance variables, the serializer is in control. Using Andrei's approach it seems more like this:
> 
> "Start looking in the archive for types"
> "Ok, the archive wants me to deserialize an instance of Foo"
> 
> 1. Start by creating a new instance of Foo using reflection:
> 
> Object foo = Object.factory("Foo");
> 
> 2. In the archive, loop through all the instance variables
> 3. See if there is a corresponding field in the deserialized object by looping through all the instance variables using foo.getMembers
> 4. "Ok the archive wants me to deserialize a struct of the type Bar, hm how do I do that? I only have Bar as a string"
> 
> Using this approach all compile time information is lost, the archive is in control. Probably not the best explanation.

A better way to say it is that the archive tells you which class to instantiate, and this class should tell you (at runtime) how it can deserialize itself.

With my serialization module a class needs to be defined in a special way to be serializable: it needs to implement the encode/decode methods of the KeyArchivable interface (a mixin can implement them for you). And the class needs to have a default constructor. Or you could define an external handler for a certain class and register it prior to unserialization. There's no other way with the current state of runtime-reflection in D.
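
A rough sketch of that arrangement, not the module's real code: the interface name comes from the description above, but its signature, the `Point` class, the string-to-string archive and the reserved "$class" key are all assumptions made for illustration. The archive records the dynamic class name, `Object.factory` recreates the instance through its default constructor, and the instance then decodes itself:

```d
import std.conv : to;

/// Hypothetical shape of the KeyArchivable interface described above.
interface KeyArchivable
{
    void encode(ref string[string] archive);
    void decode(string[string] archive);
}

class Point : KeyArchivable
{
    int x, y;

    this() {} // Object.factory can only use a default constructor

    void encode(ref string[string] archive)
    {
        archive["x"] = to!string(x);
        archive["y"] = to!string(y);
    }

    void decode(string[string] archive)
    {
        x = to!int(archive["x"]);
        y = to!int(archive["y"]);
    }
}

/// Stores the fields plus the fully qualified dynamic class name.
string[string] archiveObject(Object o)
{
    auto archivable = cast(KeyArchivable) o;
    if (archivable is null)
        throw new Exception(typeid(o).name ~ " is not KeyArchivable");
    string[string] archive;
    archivable.encode(archive);
    archive["$class"] = typeid(o).name; // "$class" is a key reserved by this sketch
    return archive;
}

/// Lets the archive pick the class, then lets the class decode itself.
Object unarchiveObject(string[string] archive)
{
    auto o = Object.factory(archive["$class"]);
    if (o is null)
        throw new Exception("cannot instantiate " ~ archive["$class"]);
    auto archivable = cast(KeyArchivable) o;
    if (archivable is null)
        throw new Exception(archive["$class"] ~ " is not KeyArchivable");
    archivable.decode(archive);
    return o;
}
```

The caller gets an Object back and casts it to the type it expects. `Object.factory` returns null for classes it cannot find or that lack a default constructor, which is exactly the limitation discussed above.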

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



August 08, 2010
On 08/08/2010 05:46 AM, Jacob wrote:
> On 8 aug 2010, at 07:47, Andrei Alexandrescu wrote:
>
>> I think that would be great. Knowing nothing about Orange, I visited the website and read the feature lists and the tutorial (the reference seems to be missing for now). The latter contains:
>>
>> auto a2 = serializer.deserialize!(A)(data);
>>
>> which seems to require compile-time knowledge of the deserialized type. I'd expect the library to support something like
>>
>> Object a2 = serializer.deserialize!Object(data);
>
> This is currently not possible in the library.

I see. This is probably the single most important requirement of a serialization library, by a large margin. A classic example of object orientation is the Shape hierarchy and the array of Shape objects that you draw on the screen etc. Where books are usually coy (and where classic object technology took a while to get up to snuff) is the save/restore part, e.g. once you have an array of Shapes, how do you save it to disk and how do you load it back?

Saving is easy because you already know the types of objects involved so you could define a virtual function save() that is customizable per type. Loading is not that easy because you need to bootstrap object types from the input stream - and here's where the factory pattern etc. come into play.

It is absolutely necessary that a serialization library makes scenarios like the above simple and fool-proof.
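
To make the Shape scenario concrete, here is a minimal sketch of saving and loading an array of shapes. Everything in it is an assumption made for illustration (the class names, the one-line text format, the `loaders` registry); it uses pre-registered loader functions instead of runtime reflection, so the shapes do not need default constructors:

```d
import std.array : split;
import std.conv : to;

abstract class Shape
{
    /// Each type knows how to save itself as "<tag> <fields...>".
    abstract string save() const;
}

class Circle : Shape
{
    double radius;
    this(double radius) { this.radius = radius; }
    override string save() const { return "circle " ~ to!string(radius); }
}

class Rect : Shape
{
    double w, h;
    this(double w, double h) { this.w = w; this.h = h; }
    override string save() const
    {
        return "rect " ~ to!string(w) ~ " " ~ to!string(h);
    }
}

alias Loader = Shape function(string[] fields);

/// Factory registry: maps the tag found in the stream to a loader.
Loader[string] loaders;

static this()
{
    loaders["circle"] = function Shape(string[] f) {
        return new Circle(to!double(f[0]));
    };
    loaders["rect"] = function Shape(string[] f) {
        return new Rect(to!double(f[0]), to!double(f[1]));
    };
}

/// Bootstraps the concrete type from the tag at the start of the line.
Shape loadShape(string line)
{
    auto parts = line.split();
    if (parts.length == 0)
        throw new Exception("empty shape record");
    auto loader = parts[0] in loaders;
    if (loader is null)
        throw new Exception("unknown shape tag: " ~ parts[0]);
    return (*loader)(parts[1 .. $]);
}

string[] saveAll(const Shape[] shapes)
{
    string[] lines;
    foreach (s; shapes)
        lines ~= s.save();
    return lines;
}

Shape[] loadAll(string[] lines)
{
    Shape[] shapes;
    foreach (line; lines)
        shapes ~= loadShape(line);
    return shapes;
}
```

Saving needs nothing beyond a virtual call, while loading has to go through the registry to find the right type. The text format keeps only the default precision of to!string for floating point, so a real archive would need a more careful encoding.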

> I'm not sure if that
> would be possible, how would you deserialize a struct for example?
> There is no factory function for structs like there is for classes.

To deserialize a struct, I think it's reasonable to require that the receiver knows the struct statically. In the Thrift protocol things are more lax - you can e.g. write a struct Point containing two ints, and you could deserialize it as two ints. I think that's reasonable. The point is the stream contains primitive type information and class field information about all data trafficked.

> Since all the static types of the objects would be Object how would I set the values when deserializing?

Deserialize into Object and then cast the Object to Shape.

> Or would Variant be useful here? I
> have not used Variant.

Probably Variant would play a role when e.g. one wants to deserialize "the next primitive type" without needing to know exactly what type that is (e.g. different integer widths).
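
As an illustration only, reading "the next primitive" into a Variant might look like the sketch below. The tagged text tokens ("i:42", "d:1.5" and so on) and the function name are invented for this example and do not correspond to any existing archive format:

```d
import std.conv : to;
import std.variant : Variant;

/// Decodes one tagged token of a hypothetical text stream:
/// "i:..." is an int, "l:..." a long, "d:..." a double, "s:..." a string.
Variant readPrimitive(string token)
{
    if (token.length < 2 || token[1] != ':')
        throw new Exception("malformed token: " ~ token);
    auto text = token[2 .. $];
    switch (token[0])
    {
        case 'i': return Variant(to!int(text));
        case 'l': return Variant(to!long(text));
        case 'd': return Variant(to!double(text));
        case 's': return Variant(text);
        default:  throw new Exception("unknown tag in token: " ~ token);
    }
}
```

The caller can then inspect the stored type through the Variant's `type` property, or ask for a particular width with `coerce!long` without caring whether the sender wrote an int or a long.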


Andrei
August 08, 2010
Good point. This is not a template-specific problem; if you try to deserialize a derived class and the client doesn't know of that class... well there's not a lot one can do, unless you package the code of the methods with the object.

I think it's reasonable to limit (at least for now) things to requiring that the client knows about the exact type serialized. And they need to have a layout-compatible version. Which brings us to a related problem - versioning...


Andrei

On 08/08/2010 09:20 AM, Sean Kelly wrote:
> Let me explain with an example:
>
> interface Container {}
> class SList(T) : Container {}
>
> -- App A --
>
> SList!(int) x;
> x.serialize();
>
> -- App B --
>
> Container x = deserialize();
>
> This will only work if App B has instantiated an SList!(int) somewhere, otherwise the TypeInfo won't exist.  This particular issue is only a problem in one part of std.concurrency: priority messages.  If the user does a receive(int), let's say, and there's a priority message waiting of type string it will be thrown as PriorityMessageException!(string).  If the priority message is a template...
>
> Other than that, std.concurrency will work just fine with Orange as-is because receive() is a template so the type of the expected data is available at compile-time.
>
> Regarding the issue above, what I'll probably end up doing is throwing a PriorityMessageException!(SerializedType) (where SerializedType is a lot like a Variant) and the user can call .deserialize!(string) or whatever if he wants to extract the data.  If someone has a better suggestion for how to handle this, I'd love to hear it.
August 08, 2010
On 2010-08-08 at 11:31, Andrei Alexandrescu wrote:

> Good point. This is not a template-specific problem; if you try to deserialize a derived class and the client doesn't know of that class... well there's not a lot one can do, unless you package the code of the methods with the object.
> 
> I think it's reasonable to limit (at least for now) things to requiring that the client knows about the exact type serialized. And they need to have a layout-compatible version. Which brings us to a related problem - versioning...

My preferred way to handle versioning is to not have to handle it. I generally use key-value pairs to store the content of aggregates (structs, classes). This means I can grow the number of members over time while keeping backward compatibility. If a member is missing from the serialization I use the default value or the class/struct can handle the case with a more specialized behaviour. I can also continue serializing a no-longer-necessary value to keep the aggregate backward compatible with older code.

This also makes things less fragile in regard to layout changes: you don't need to keep variables in the same order on all platforms, in all versions, etc.
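
A minimal sketch of the key-value approach, assuming a plain string-to-string archive; the `Settings` struct and the `encode`/`decode` functions are hypothetical and only illustrate the idea. Members missing from the archive keep their default values, and unknown keys are ignored:

```d
import std.conv : to;

struct Settings
{
    string title = "untitled";
    int width = 800;
    int height = 600; // imagine this member was added in a later version
}

/// Stores every member under its own key; order does not matter.
string[string] encode(Settings s)
{
    return [
        "title": s.title,
        "width": to!string(s.width),
        "height": to!string(s.height),
    ];
}

/// Reads whatever keys are present and leaves the rest at their defaults.
Settings decode(string[string] archive)
{
    Settings s; // starts out with the default values
    if (auto p = "title" in archive)  s.title = *p;
    if (auto p = "width" in archive)  s.width = to!int(*p);
    if (auto p = "height" in archive) s.height = to!int(*p);
    return s;
}
```

An archive written before `height` existed still decodes, and the new member simply gets its default of 600.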

So that's how I've built my serialization module. I probably should stop talking about it and finish it instead... :-)

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/