October 01, 2011
On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob@me.com> wrote:

> I would like to have some form of pre-review of my serialization library
> Orange for later inclusion in Phobos as std.serialization (or similar).

[snip]

(1)
The first example in class Serializer is:

auto serializer = new Serializer;

Shouldn't it be

auto serializer = new Serializer(archive);

(2)
orange.serialization.archives.XmlArchive need to be documented.

(3)
if Archive.Array (which is poorly named btw) "is a type independent representation of an array" then why does it contain an elementSize field?
(3a)
Also by the time archiving is called, isSliceOf should always return false. That this function exists speaks to design problems both large and small. On the small scale, isSliceOf indicates that you are testing every array against every other array for slicing, which I hope is not the case. On the large scale, all the alias/object/pointer/arrays/etc resolutions should be done by the serializer not the archive. The archive should only be responsible for converting the internal representation of the serializer to JSON/XML/YAML/etc.
(3b)
Given that Slice has an ID field, why doesn't array.

(4)
Why have an Archive Interface and a Base class with common functionality? Why not an abstract class? Also, 'Base' isn't an acceptable class name for a Phobos module. Use 'ArchiveBase' or something more unique instead.
October 01, 2011
On 2011-10-01 05:00, Robert Jacques wrote:
> I agree, which is why I suggested lookup should have some granuality.
> i.e. that there is both a global store of serialization methods and a
> per instance store of serialization methods. Lookup would first look in
> the local store before defaulting to the global store. But this should
> be a separate pair of functions.

Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static.

> I'm sorry, I was thinking about archive types (i.e. JSON vs XML) and
> somehow thinking that the Serializers would be different for each. (I
> was thinking that the serializer was templated on the archive for some
> reason.)

Ok, no problem.

> Both
>
> T deserialize (T)();
> T deserialize (T)(string key);
>
> have the following example:
>
> class Foo
> {
> int a;
>
> void fromData (Serializer serializer, Serializer.Data key)
> {
> a = serializer!(int)("a");
> }
> }


No, "T deserialize (T)(string key)" has:

a = serializer!(int)("a");

And "T deserialize (T)()" has:

a = serializer!(int)();

Both are correct. This is a complete example of using one of these methods: https://github.com/jacob-carlborg/orange/blob/master/tests/Custom.d

Let me know if anything is confusing.

-- 
/Jacob Carlborg
October 01, 2011
On 2011-10-01 06:29, Robert Jacques wrote:
> On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob@me.com> wrote:
>
>> I would like to have some form of pre-review of my serialization library
>> Orange for later inclusion in Phobos as std.serialization (or similar).
>
> [snip]
>
> (1)
> The first example in class Serializer is:
>
> auto serializer = new Serializer;
>
> Shouldn't it be
>
> auto serializer = new Serializer(archive);

You're right, thanks.

> (2)
> orange.serialization.archives.XmlArchive need to be documented.

I was hoping the Archive interface and the Base abstract class would be enough.

> (3)
> if Archive.Array (which is poorly named btw) "is a type independent
> representation of an array" then why does it contain an elementSize field?

Suggestions for other names are welcome. Perhaps it was poorly worded, but what I mean is that this type can represent all array types.

> (3a)
> Also by the time archiving is called, isSliceOf should always return
> false.

Why is that?

> That this function exists speaks to design problems both large
> and small. On the small scale, isSliceOf indicates that you are testing
> every array against every other array for slicing, which I hope is not
> the case.

I do. How would I otherwise discover if an array is a slice of another array or not?

> On the large scale, all the alias/object/pointer/arrays/etc
> resolutions should be done by the serializer not the archive. The
> archive should only be responsible for converting the internal
> representation of the serializer to JSON/XML/YAML/etc.

isSliceOf is never called by any existing archive. Perhaps Slice and Array should be moved to the serializer module. I can also remove isSliceOf from Base. I think I was trying to keep the archives independent of the serializer, in the sense that the archives  never should have to import the serializer. That's not currently that case so these structs could be moved.

> (3b)
> Given that Slice has an ID field, why doesn't array.

That's a good question. I don't think it's used at all. archiveSlice takes a Slice and a sliceId. This field should be removed.

> (4)
> Why have an Archive Interface and a Base class with common
> functionality? Why not an abstract class? Also, 'Base' isn't an
> acceptable class name for a Phobos module. Use 'ArchiveBase' or
> something more unique instead.

As you can see Base is a template class and therefore is dependent on the archive type. My goal was that Serializer should not be a template class.

I thought it was unnecessary to repeat the module name in the class name. But in this case it might be a good idea. Or perhaps call it AbstractArchive.

-- 
/Jacob Carlborg
October 01, 2011
On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob@me.com> wrote:
> On 2011-10-01 06:29, Robert Jacques wrote:
>> On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob@me.com> wrote:

[snip]

>> (2)
>> orange.serialization.archives.XmlArchive need to be documented.
>
> I was hoping the Archive interface and the Base abstract class would be
> enough.

For the pre-review its okay. But you'll need it for the actual review.

>> (3)
>> if Archive.Array (which is poorly named btw) "is a type independent
>> representation of an array" then why does it contain an elementSize field?
>
> Suggestions for other names are welcome. Perhaps it was poorly worded,
> but what I mean is that this type can represent all array types.
>
>> (3a)
>> Also by the time archiving is called, isSliceOf should always return
>> false.
>
> Why is that?

If isSliceOf can return true, then that means that the archive is responsible for alias detection, management, etc. That means that every single archive format must implement an alias resolution algorithm. This results in a lot of copy/paste boiler plate which has to be maintained. It also makes it more difficult to get extra formats supported. Worse if someone forgets to either include this functionality or to apply a patch, silent bugs and/or slowdowns are introduced. All and archiver should be responsible for is taking some decorated data structure and converting it to XML/JSON/YAML/etc and back again. Anything more complex than that should be shared functionality located in the serializer / de-serializer objects.

>> That this function exists speaks to design problems both large
>> and small. On the small scale, isSliceOf indicates that you are testing
>> every array against every other array for slicing, which I hope is not
>> the case.
>
> I do. How would I otherwise discover if an array is a slice of another
> array or not?

Okay, first some rational. Consider:

assert(!a.isSliceOf(b));
assert(!b.isSliceOf(a));
assert( c.isSliceOf(a));
assert( c.isSliceOf(b));

and

class Foo {
	float x;
	float[3] point;
}

void main() {
	auto foo = new Foo;
	auto ptr = &foo.x;
	auto slice = point[0..2];
}

In the first case, a, b and c are all slices of a common root array, but the root array may not be serialized. In the second case, first you have a pointer to the inside of an object and second you have a slice of a static array inside an object, all three of which may be serialized together. My impression from your API (so this might not be correct) is that currently, you can't handle the above use cases. Even if you can, an O(N^2) algorithm is rather inefficient.

The solution, in my mind, is to think in terms of memory blocks/chucks. Every reference can be thought as pointing to a memory chunk defined by two pointers and a flag:

{
	void* head;			// Start of the memory chunk
	void* tail;			// End of the memory chunk
	bool  hasAliases;	// True if there are more than one reference to this chunk
}

For alias detection / resolution, you build a balanced tree of memory chunks, widening the chunk and flagging hasAliases as appropriate. Which should give you O(N log(N)) performance.

As an optimization, the user should be able to 'finalize' the serialization by pruning away all memory chunks without aliases. (i.e. a serializeAndClear method)
October 03, 2011
On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob@me.com> wrote:

> On 2011-10-01 05:00, Robert Jacques wrote:
>> I agree, which is why I suggested lookup should have some granuality.
>> i.e. that there is both a global store of serialization methods and a
>> per instance store of serialization methods. Lookup would first look in
>> the local store before defaulting to the global store. But this should
>> be a separate pair of functions.
>
> Aah, now I get it. That's a good idea. The question is what to name the
> two functions. Yet another use case for overloading methods on static.

How about overrideSerializer or overloadSerializer?

[snip]

>> Both
>>
>> T deserialize (T)();
>> T deserialize (T)(string key);
>>
>> have the following example:
>>
>> class Foo
>> {
>> int a;
>>
>> void fromData (Serializer serializer, Serializer.Data key)
>> {
>> a = serializer!(int)("a");
>> }
>> }
>
>
> No, "T deserialize (T)(string key)" has:
>
> a = serializer!(int)("a");
>
> And "T deserialize (T)()" has:
>
> a = serializer!(int)();
>
> Both are correct. This is a complete example of using one of these
> methods: https://github.com/jacob-carlborg/orange/blob/master/tests/Custom.d
>
> Let me know if anything is confusing.
>

Umm... example code for the deserialize method should contain 'deserialize' somewhere inside it.
October 03, 2011
On 2011-10-02 00:52, Robert Jacques wrote:
> On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob@me.com> wrote:
> For the pre-review its okay. But you'll need it for the actual review.

Ok, it will be the same documentation as for Archive and Base. Ddoc really needs to get better at this, I mean, why can't it just inherit the documentation.

>>> Also by the time archiving is called, isSliceOf should always return
>>> false.
>>
>> Why is that?
>
> If isSliceOf can return true, then that means that the archive is
> responsible for alias detection, management, etc.

No, who says that. You can take this struct and use it outside of this library, it knows nothing about archiving or serialization. If the isSliceOf method should return false when archiving has been called I would need to add logic to detect when serialization and archiving has begun and ended.

> That means that every
> single archive format must implement an alias resolution algorithm.

No, the serializer performs this task.

> This  results in a lot of copy/paste boiler plate which has to be maintained.
> It also makes it more difficult to get extra formats supported. Worse if
> someone forgets to either include this functionality or to apply a
> patch, silent bugs and/or slowdowns are introduced. All and archiver
> should be responsible for is taking some decorated data structure and
> converting it to XML/JSON/YAML/etc and back again. Anything more complex
> than that should be shared functionality located in the serializer /
> de-serializer objects.

It's the only thing the archive does.

>> I do. How would I otherwise discover if an array is a slice of another
>> array or not?
>
> Okay, first some rational. Consider:
>
> assert(!a.isSliceOf(b));
> assert(!b.isSliceOf(a));
> assert( c.isSliceOf(a));
> assert( c.isSliceOf(b));
>
> and
>
> class Foo {
> float x;
> float[3] point;
> }
>
> void main() {
> auto foo = new Foo;
> auto ptr = &foo.x;
> auto slice = point[0..2];
> }
>
> In the first case, a, b and c are all slices of a common root array, but
> the root array may not be serialized. In the second case, first you have
> a pointer to the inside of an object and second you have a slice of a
> static array inside an object, all three of which may be serialized
> together. My impression from your API (so this might not be correct) is
> that currently, you can't handle the above use cases. Even if you can,
> an O(N^2) algorithm is rather inefficient.

This is how it works: As the first step all arrays are serialized as regular arrays and not slices. After all serialization is done I loop over all arrays and check if they are a slice of some other array. If I found a match I replace the serialized array with a slice instead.

These arrays are stored as an associative array with the type Array[Id]. I don't know if there's a better data structure for this.

> The solution, in my mind, is to think in terms of memory blocks/chucks.
> Every reference can be thought as pointing to a memory chunk defined by
> two pointers and a flag:
>
> {
> void* head; // Start of the memory chunk
> void* tail; // End of the memory chunk
> bool hasAliases; // True if there are more than one reference to this chunk
> }
>
> For alias detection / resolution, you build a balanced tree of memory
> chunks, widening the chunk and flagging hasAliases as appropriate. Which
> should give you O(N log(N)) performance.

I'm not sure I understand. That would require that the arrays are stored in a continues block of memory? Won't "head" and "tail" always point to start of the array and the end of the array?

> As an optimization, the user should be able to 'finalize' the
> serialization by pruning away all memory chunks without aliases. (i.e. a
> serializeAndClear method)


-- 
/Jacob Carlborg
October 03, 2011
On 2011-10-03 05:50, Robert Jacques wrote:
> On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob@me.com> wrote:
>
>> On 2011-10-01 05:00, Robert Jacques wrote:
>>> I agree, which is why I suggested lookup should have some granuality.
>>> i.e. that there is both a global store of serialization methods and a
>>> per instance store of serialization methods. Lookup would first look in
>>> the local store before defaulting to the global store. But this should
>>> be a separate pair of functions.
>>
>> Aah, now I get it. That's a good idea. The question is what to name the
>> two functions. Yet another use case for overloading methods on static.
>
> How about overrideSerializer or overloadSerializer?

registerSerializer for the static method and overloadSerializer/overrideSerializer for the instance method?

> Umm... example code for the deserialize method should contain
> 'deserialize' somewhere inside it.


You are completley right

a = serializer!(int)("a");

Should be

a = deserialize!(int)("a");

My bad.

-- 
/Jacob Carlborg
October 03, 2011
On Mon, 03 Oct 2011 03:06:36 -0400, Jacob Carlborg <doob@me.com> wrote:

> On 2011-10-03 05:50, Robert Jacques wrote:
>> On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob@me.com> wrote:
>>
>>> On 2011-10-01 05:00, Robert Jacques wrote:
>>>> I agree, which is why I suggested lookup should have some granuality.
>>>> i.e. that there is both a global store of serialization methods and a
>>>> per instance store of serialization methods. Lookup would first look in
>>>> the local store before defaulting to the global store. But this should
>>>> be a separate pair of functions.
>>>
>>> Aah, now I get it. That's a good idea. The question is what to name the
>>> two functions. Yet another use case for overloading methods on static.
>>
>> How about overrideSerializer or overloadSerializer?
>
> registerSerializer for the static method and
> overloadSerializer/overrideSerializer for the instance method?

Yes. Sorry for being unclear. The concept being that at the instance level, you are overriding default behavior.
October 03, 2011
On Mon, 03 Oct 2011 02:38:22 -0400, Jacob Carlborg <doob@me.com> wrote:
> On 2011-10-02 00:52, Robert Jacques wrote:
>> On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob@me.com> wrote:

[snip]

>>>> Also by the time archiving is called, isSliceOf should always return
>>>> false.
>>>
>>> Why is that?
>>
>> If isSliceOf can return true, then that means that the archive is
>> responsible for alias detection, management, etc.
>
> No, who says that. You can take this struct and use it outside of this
> library, it knows nothing about archiving or serialization. If the
> isSliceOf method should return false when archiving has been called I
> would need to add logic to detect when serialization and archiving has
> begun and ended.

So, in essence, you are saying that by the time archiving occurs, isSliceOf will always return false? Then why is it part of the public API?

>> That means that every
>> single archive format must implement an alias resolution algorithm.
>
> No, the serializer performs this task.

Okay.

[snip]

>>> I do. How would I otherwise discover if an array is a slice of another
>>> array or not?
>>
>> Okay, first some rational. Consider:
>>
>> assert(!a.isSliceOf(b));
>> assert(!b.isSliceOf(a));
>> assert( c.isSliceOf(a));
>> assert( c.isSliceOf(b));
>>
>> and
>>
>> class Foo {
>> float x;
>> float[3] point;
>> }
>>
>> void main() {
>> auto foo = new Foo;
>> auto ptr = &foo.x;
>> auto slice = point[0..2];
>> }
>>
>> In the first case, a, b and c are all slices of a common root array, but
>> the root array may not be serialized. In the second case, first you have
>> a pointer to the inside of an object and second you have a slice of a
>> static array inside an object, all three of which may be serialized
>> together. My impression from your API (so this might not be correct) is
>> that currently, you can't handle the above use cases. Even if you can,
>> an O(N^2) algorithm is rather inefficient.
>
> This is how it works: As the first step all arrays are serialized as
> regular arrays and not slices. After all serialization is done I loop
> over all arrays and check if they are a slice of some other array. If I
> found a match I replace the serialized array with a slice instead.
>
> These arrays are stored as an associative array with the type Array[Id].
> I don't know if there's a better data structure for this.

I presented one below.

>> The solution, in my mind, is to think in terms of memory blocks/chucks.
>> Every reference can be thought as pointing to a memory chunk defined by
>> two pointers and a flag:
>>
>> {
>> void* head; // Start of the memory chunk
>> void* tail; // End of the memory chunk
>> bool hasAliases; // True if there are more than one reference to this chunk
>> }
>>
>> For alias detection / resolution, you build a balanced tree of memory
>> chunks, widening the chunk and flagging hasAliases as appropriate. Which
>> should give you O(N log(N)) performance.
>
> I'm not sure I understand. That would require that the arrays are stored
> in a continues block of memory? Won't "head" and "tail" always point to
> start of the array and the end of the array?

Most of the time yes. But not all of the time. It would be fair to say that 'head' and 'tail' would be inside the GC memory region of the array and are <= or >= of the start/end of an array, respectively. More importantly, both objects and pointers would resolve to memory chunks as well and be included in the alias resolution algorithm. I'm assuming you currently separate object and array alias resolution and don't handle pointers at all.
October 03, 2011
On 2011-10-03 15:39, Robert Jacques wrote:
> On Mon, 03 Oct 2011 03:06:36 -0400, Jacob Carlborg <doob@me.com> wrote:
>
>> On 2011-10-03 05:50, Robert Jacques wrote:
>>> On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg <doob@me.com> wrote:
>>>
>>>> On 2011-10-01 05:00, Robert Jacques wrote:
>>>>> I agree, which is why I suggested lookup should have some granuality.
>>>>> i.e. that there is both a global store of serialization methods and a
>>>>> per instance store of serialization methods. Lookup would first
>>>>> look in
>>>>> the local store before defaulting to the global store. But this should
>>>>> be a separate pair of functions.
>>>>
>>>> Aah, now I get it. That's a good idea. The question is what to name the
>>>> two functions. Yet another use case for overloading methods on static.
>>>
>>> How about overrideSerializer or overloadSerializer?
>>
>> registerSerializer for the static method and
>> overloadSerializer/overrideSerializer for the instance method?
>
> Yes. Sorry for being unclear. The concept being that at the instance
> level, you are overriding default behavior.

Yes, thanks, that makes sense.

-- 
/Jacob Carlborg