August 26, 2013
On Monday, 26 August 2013 at 11:23:05 UTC, Jacob Carlborg wrote:
> Here we have yet another suggestion for an API. The whole reason for this thread is that people weren't happy with the current interface, i.e. it's not range based. Now we have probably as many suggestions as people who have replied to this thread. I still don't know how the API should look.

In the end it is still your decision to make - people here can provide some input and help with technical details, but it is still your library and your call ;) Though I would suggest valuing the Phobos developers' opinions more than, ugh, mine (or any other random trespasser's).

It is quite natural that such a package has a lot of potential use cases and thus different API expectations. Choosing the proper aesthetics is what makes programming an art :P
August 26, 2013
26-Aug-2013 15:23, Jacob Carlborg wrote:
> On 2013-08-26 11:23, Dmitry Olshansky wrote:
>
>> Array or any container should do for a range.
>
> But then it won't be lazy, or perhaps that's not a problem, since the
> whole deserializing should be lazy.

It's lazy, but you have to put the stuff somewhere.
Also, the second API artifact, unpackRange, allows you to just look through the data (an output range, including a lambda, can do anything). For example, it can easily sift through swaths of data, picking only the "lucky numbers":

deserializer.unpackRange!(int)((x){
	if(isLucky(x))
		writeln(x);
});

>> I'm not 100% sure what kind of interface to use, but Serializer and
>> Deserializer should not be "shipped in one package" as in one class.
>> The two mirror each other but in essence are always used separately.
>> Ditto for archiver/unarchiver: they simply provide different
>> functionality and it makes no sense to reuse the same object in 2 ways.
>>
>> Hence alternative 1 (unwrapping that snippet backwards):
>>
>> //BTW why go new & classes here(?)
>
> The reason to have classes is that I need reference types. I need to
> pass the serializer to "toData" and "fromData" methods that can be
> implemented on the objects being (de)serialized. I guess they could take
> the argument by ref. Is it possible to force that?

That would be interesting to do. One way is to pass an rvalue to said function: if it accepts that, then it's not taking it by ref.

Along the lines of:
__traits(compiles, (){
	T val;
	// or better, use a dummy function
	// that returns a Serializer by value
	val.toData(Serializer.init);
});

At least that works with templated stuff too.
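
To wrap that up as an actual check - a rough sketch with assumed names (a toData method and a struct Serializer), not tested against the real API:

// true if T.toData accepts an rvalue Serializer, i.e. does NOT take it by ref
template acceptsRValueSerializer(T, Serializer)
{
	enum acceptsRValueSerializer = __traits(compiles, (T val) {
		val.toData(Serializer.init); // Serializer.init is an rvalue here
	});
}

// then, inside the serializer:
// static assert(!acceptsRValueSerializer!(T, Serializer),
//     T.stringof ~ ".toData must take the serializer by ref");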

>
>> auto unarchiver = new XmlUnarchiver(someCharRange);
>> auto deserializer = new Deserializer(unarchiver);
>> auto obj = deserializer.unpack!Object;
>>
>> //for sequence/array in underlying format it would use any container
>> List!int list = deserializer.unpack!(List!int);
>> int[] arr = deserializer.unpack!(int[]);
>>
>> IMO it looks quite nice. The problem of how exactly a container should
>> be filled is still open though.
>>
>> So another, more generic alternative (the above could be considered a
>> convenience over this one):
>>
>> Vector!int ints;
>> deserializer.unpackRange!(int)(x => ints.pushBack(x));
>>
>> Basically, unpack the next sequence of data (as serialized) by feeding it
>> to an output range, using the element type as a parameter. And a simple
>> lambda qualifies as an output range.
>
> Here we have yet another suggestion for an API.

It's not just yet another one. It isn't about a particular shade of color. I can explain the ifs and whys of any design decision here if there is any doubt. I don't care about the names, but I do see the precise semantics, and there is little left to define.

For instance, I see good reasons why the serializer _has_ to be an OutputRange and not an InputRange, why the archiver _has_ to take an output range or be one, and so on. Ditto on why there has to be a separation of (un)archiver and (de)serializer.

> The whole reason for
> this thread is that people weren't happy with the current interface,
> i.e. it's not range based. Now we have probably as many suggestions as
> people who have replied to this thread. I still don't know how the API
> should look.

It's not a question of sprinkling some ranges in here, or of replacing every array one comes across with a range (as a lot of folks would unfortunately assume).

Rather, it's a question of how it can operate with them at all without sacrificing functionality, performance and ease of use. And there is not much in this tight design space that actually works.

Pardon me if my tone is a bit sharp. I, like anyone else, want the best design we can get. Now that a great deal of work has been done, it would be a shame to present it in a bad package.

>> Also take a look at the new digest API. My understanding is that
>> serialization would do well to take the same general strategy - concrete
>> archivers as structs + a polymorphic interface and wrappers on top.
>
> I could have a look at that.



August 26, 2013
On 2013-08-26 15:42, Dicebot wrote:

> For me the distinction was very natural: the `(de)serializer` is something
> that takes care of D type introspection and provides it in a simplified
> form to the `(un)archiver`, which embeds the actual format knowledge. The
> former can get pretty tricky in D, so it makes some sense to keep it separate.

I think he was referring to the idea that the deserializer should be separate from the serializer, and the unarchiver separate from the archiver.

-- 
/Jacob Carlborg
August 26, 2013
On 2013-08-26 15:57, Dmitry Olshansky wrote:

> It's not just yet another one. It isn't about a particular shade of color.
> I can explain the ifs and whys of any design decision here if there is any
> doubt. I don't care about the names, but I do see the precise semantics,
> and there is little left to define.

Yes, please do. As I see it there are four parts of the interface that need to be solved:

1. How to get data in to the serializer
2. How to get data out of the serializer
3. How to get data in to the archiver
4. How to get data out of the archiver

> Pardon me if my tone is a bit sharp. I, like anyone else, want the best
> design we can get. Now that a great deal of work has been done, it would
> be a shame to present it in a bad package.

Yes, that's why we're having this discussion.

-- 
/Jacob Carlborg
August 26, 2013
On 2013-08-26 15:53, Dicebot wrote:

> In the end it is still your decision to make - people here can provide
> some input and help with technical details, but it is still your library
> and your call ;) Though I would suggest valuing the Phobos developers'
> opinions more than, ugh, mine (or any other random trespasser's).

Well, I'm happy with the interface as it is; that's why I created it like that. But the other developers here are not, so it won't be accepted in its current state. That's why I'm asking: "What should it look like?"

-- 
/Jacob Carlborg
August 26, 2013
26-Aug-2013 18:37, Jacob Carlborg wrote:
> On 2013-08-26 15:57, Dmitry Olshansky wrote:
>
>> It's not just yet another one. It isn't about a particular shade of color.
>> I can explain the ifs and whys of any design decision here if there is any
>> doubt. I don't care about the names, but I do see the precise semantics,
>> and there is little left to define.
>
> Yes, please do.

> As I see it there are four parts of the interface that
> need to be solved:
>
> 1. How to get data in to the serializer
> 2. How to get data out of the serializer
> 3. How to get data in to the archiver
> 4. How to get data out of the archiver

More a question of implementation, then.

The answer for both of them is: the archiver wraps an output range, and the serializer is one. As for the connection with your current API, which gives back an array - just think std.array.Appender (and there are a multitude of other ways to chew through the data).

Looking at your current code in depth... finally. Without doing that I would have had a problem giving productive answers to the questions.

First things first - there should not be a key parameter, aside from anything the archiver itself adds for its internal needs. Nor is there a simple way to locate data by key afterwards (certainly not every format defines one). It would require some tagged object model, and there is no such _requirement_ in serialization.

Citing a line from archive.d
"""
There are a couple of limitations when implementing a new archive, this is due* to how the serializer and the archive interface is built. Except for what this interface says explicitly an archive needs to be able to handle the following:
Unarchive a value based on a key or id, regardless of where in the archive  the value is located
"""
- this is impossible in the setting of serialization.

Serialization is NOT about modeling the whole dataset and providing queries into it. The model of serialization is that of Unix _tar_: just dump any graph of data to the "tape" and/or restore it back. If you can do the processing on the fly - bonus points (and what I'm pushing for can).

This confusion and its consequences are no doubt due to building on std.xml and the interface it presents.

Second - no, not every operation has to return the piece of Data that is produced. That would be tremendously inefficient and would require keeping memory references to it alive (or be unsafe in addition to slow). Instead, each operation just outputs something to the underlying sink.

class Serializer
{
	...
	// now we are an output range for anything;
	// add constraints as you see fit
	void put(T)(T value)
	{
		serializeInternal(value); // calls the methods of the archiver
	}

	Archiver archiver;
}

class MyArchiver(Output)
	if (isOutputRange!(Output, dchar)) // or ubyte if binary
{
	...
	this(Output sink)
	{
		this.sink = sink;
	}

	// and a method, for example
	private void archivePrimitive (T) (T value, string key, Id id)
	{
		// along the lines of this, don't take it literally -
		// I have no idea of the actual tag format you use
		formattedWrite(sink, "<%s>%s</%s>", id, value, id);
	}
	...
	Output sink;
}


The user just writes e.g.

auto app = appender!(char[])();
auto archiver = new XmlArchiver(app);
auto serializer = new Serializer(archiver);

and works with the serializer as with an output range of anything (I showed an example of this before).

Then, once the data is required, just peek at app.data and there it is.
So the in-memory case is easily covered. Other sinks bring more benefits, see e.g.:

auto sink = stdout.lockingTextWriter();

And the same code now writes directly to stdout, with no worries if there is a lot of stuff to write.

....

No matter how I look at the code, it needs a lot of (re-)work.
For instance, the Archive type is obsessed with strings. I can't see a need for that many strings attached :)
Then there's the awful duality of Serializer, which literally results in:
if (mode == serializing) doSerializing(); else doDeserializing();

And the spectacular pair:

T deserialize (T) (Data data, string key = "")
{
	mode = deserializing;
	...
}

Data serialize (T) (T value, string key = null)
{
	mode = serializing;
	...
}

The amount of extra code executed per bit of output is remarkably high, and a hallmark of a standard library is the pay-as-you-go principle. We (collectively, as Phobos devs) have to set the baseline for performance; if it's too low, we're out of the game.

For example - events are cute, but do we all need them? Do we always want the overhead of checking for that stuff per field written?

Instead, decompose these layers and make them stackable, for instance:

auto serializer = new Serializer(...);
auto tracingSerializer = new TracingSerializer(serializer);

Or just make two kinds of serializers with a static if on a single template parameter, bool withEvents - it's trivial. Then a couple of aliases would finish the job.
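
Something along these lines - a rough sketch with made-up names, where writefln simply stands in for whatever event dispatch you actually have:

import std.stdio : writefln;

class Serializer(bool withEvents = false)
{
	void put(T)(T value)
	{
		static if (withEvents)
			writefln("event: before serializing %s", T.stringof);
		// ... hand the value over to the archiver here ...
		static if (withEvents)
			writefln("event: after serializing %s", T.stringof);
	}
}

alias PlainSerializer   = Serializer!false; // lean common path, no event overhead
alias TracingSerializer = Serializer!true;  // pays for events only when asked

The static if branches are compiled out entirely for Serializer!false, so the common path carries no trace of the event machinery.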

With that said, I observe that events are attached to types/fields... hmm, in such a case it needs work to make them zero-cost when absent.

>> Pardon me if my tone is a bit sharp. I like any other want the best
>> design we can get. Now that the great deal of work is done it would be a
>> shame to present it in a bad package.
>
> Yes, that's why we're having this discussion.

And I'm afraid it's too late, or the changes are too far-reaching, but let's try it.

I'm especially destroyed by this (and by the fact that it's a part of the interface to implement):

    void archiveEnum (bool value, string baseType, string key, Id id);

    /// Ditto
    void archiveEnum (bool value, string baseType, string key, Id id);

    /// Ditto
    void archiveEnum (byte value, string baseType, string key, Id id);

    /// Ditto
    void archiveEnum (char value, string baseType, string key, Id id);

    /// Ditto
    void archiveEnum (dchar value, string baseType, string key, Id id);

    /// Ditto
    void archiveEnum (int value, string baseType, string key, Id id);

    /// Ditto
    void archiveEnum (long value, string baseType, string key, Id id);

    /// Ditto
    void archiveEnum (short value, string baseType, string key, Id id);

    /// Ditto
    void archiveEnum (ubyte value, string baseType, string key, Id id);

    /// Ditto
    void archiveEnum (uint value, string baseType, string key, Id id);

    /// Ditto
    void archiveEnum (ulong value, string baseType, string key, Id id);

    /// Ditto
    void archiveEnum (ushort value, string baseType, string key, Id id);

    /// Ditto
    void archiveEnum (wchar value, string baseType, string key, Id id);


-- 
Dmitry Olshansky
August 26, 2013
26-Aug-2013 17:42, Dicebot wrote:
> On Monday, 26 August 2013 at 09:23:32 UTC, Dmitry Olshansky wrote:
>> I'm still missing something about separation of archiver and
>> serializer but in my mind these are tightly coupled and may as well be
>> one entity.
>
> For me the distinction was very natural: the `(de)serializer` is something
> that takes care of D type introspection and provides it in a simplified
> form to the `(un)archiver`, which embeds the actual format knowledge. The
> former can get pretty tricky in D, so it makes some sense to keep it separate.

If that is the case, then fine. Though upon seeing the Archive interface in all its mortifying glory, I'm not sure about that simplifying bit...

> I can't really add anything on ranges part of your comments - sounds
> like you have a better "big picture" anyway :)


-- 
Dmitry Olshansky
August 27, 2013
On 2013-08-26 18:41, Dmitry Olshansky wrote:

> More a question of implementation, then.
>
> The answer for both of them is: the archiver wraps an output range, and
> the serializer is one. As for the connection with your current API, which
> gives back an array - just think std.array.Appender (and there are a
> multitude of other ways to chew through the data).

Ok, thank you.

> Looking at your current code in depth... finally. Without doing that I
> would have had a problem giving productive answers to the questions.
>
> First things first - there should not be a key parameter, aside from
> anything the archiver itself adds for its internal needs. Nor is there a
> simple way to locate data by key afterwards (certainly not every format
> defines one). It would require some tagged object model, and there is no
> such _requirement_ in serialization.

I would really like the serializer not to be dependent on the order of the fields of the types it's (de)serializing.

> Citing a line from archive.d
> """
> There are a couple of limitations when implementing a new archive, this
> is due* to how the serializer and the archive interface is built. Except
> for what this interface says explicitly an archive needs to be able to
> handle the following:
> Unarchive a value based on a key or id, regardless of where in the
> archive  the value is located
> """
> - this is impossible in the setting of serialization.
>
> Serialization is NOT about modeling the whole dataset and providing
> queries into it. The model of serialization is that of Unix _tar_: just
> dump any graph of data to the "tape" and/or restore it back. If you can
> do the processing on the fly - bonus points (and what I'm pushing for can).
>
> This confusion and its consequences are no doubt due to building on
> std.xml and the interface it presents.

It might be due to the XML format in general, but certainly not due to std.xml. XmlArchive originally used the XML package in Tango, long before it supported std.xml.

> Second - no, not every operation has to return the piece of Data that is
> produced. That would be tremendously inefficient and would require keeping
> memory references to it alive (or be unsafe in addition to slow).
> Instead, each operation just outputs something to the underlying sink.
>
> class Serializer
> {
>      ...
>      // now we are an output range for anything;
>      // add constraints as you see fit
>      void put(T)(T value)
>      {
>          serializeInternal(value); // calls the methods of the archiver
>      }
>
>      Archiver archiver;
> }
>
> class MyArchiver(Output)
>      if(isOutputRange!(Output, dchar)) //or ubyte if binary
> {
>      ...
>      this(Output sink)
>      {
>          this.sink = sink;
>      }
>
>      // and a method, for example
>      private void archivePrimitive (T) (T value, string key, Id id)
>      {
>          // along the lines of this, don't take it literally -
>          // I have no idea of the actual tag format you use
>          formattedWrite(sink, "<%s>%s</%s>", id, value, id);
>      }
>      ...
>      Output sink;
> }

Good, thank you.

> The user just writes e.g.
>
> auto app = appender!(char[])();
> auto archiver = new XmlArchiver(app);
> auto serializer = new Serializer(archiver);
>
> and works with the serializer as with an output range of anything (I
> showed an example of this before).
>
> Then, once the data is required, just peek at app.data and there it is.
> So the in-memory case is easily covered. Other sinks bring more benefits,
> see e.g.:
>
> auto sink = stdout.lockingTextWriter();
>
> And the same code now writes directly to stdout, with no worries if there
> is a lot of stuff to write.

Thank you for giving some concrete ideas for the API.

> ....
>
> No matter how I look at the code, it needs a lot of (re-)work.
> For instance, the Archive type is obsessed with strings. I can't see a
> need for that many strings attached :)

Yeah, I know. It's mainly because the archive doesn't use templates, since it needs to implement an interface.

> Then there's the awful duality of Serializer, which literally results in:
> if (mode == serializing) doSerializing(); else doDeserializing();
>
> And the spectacular pair:
>
> T deserialize (T) (Data data, string key = "")
> {
>     mode = deserializing;
>     ...
> }
>
> Data serialize (T) (T value, string key = null)
> {
>     mode = serializing;
>     ...
> }

I guess that's easier to avoid if I divide Serializer into two separate parts, one for serializing and one for deserializing.

> The amount of extra code executed per bit of output is remarkably high,

Now I think you're exaggerating a bit.

> and a hallmark of a standard library is the pay-as-you-go principle. We
> (collectively, as Phobos devs) have to set the baseline for performance;
> if it's too low, we're out of the game.
>
> For example - events are cute, but do we all need them? Do we always
> want the overhead of checking for that stuff per field written?

Sure, there is some overhead from calling some functions, but the events are checked for at compile time, so the overhead should be minimal.

> Instead, decompose these layers and make them stackable, for instance:
>
> auto serializer = new Serializer(...);
> auto tracingSerializer = new TracingSerializer(serializer);
>
> Or just make two kinds of serializers with a static if on a single template
> parameter, bool withEvents - it's trivial. Then a couple of aliases would
> finish the job.

I don't think that will be needed. I'll see if I can refactor a bit to minimize the overhead even more.

> With that said, I observe that events are attached to types/fields... hmm,
> in such a case it needs work to make them zero-cost when absent.

The only cost is calling "triggerEvents" and "triggerEvent"; the rest is performed at compile time.

> And I'm afraid it's too late, or the changes are too far-reaching, but
> let's try it.
>
> I'm especially destroyed by this (and by the fact that it's a part of the
> interface to implement):
>
>      void archiveEnum (bool value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (bool value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (byte value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (char value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (dchar value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (int value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (long value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (short value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (ubyte value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (uint value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (ulong value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (ushort value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (wchar value, string baseType, string key, Id id);

So you want templates instead?

I have read your posts, thank you for your comments. I'm planning now to:

* Split Serializer into two parts
* Make the parts structs
* Possibly provide class wrappers
* Split Archive into two parts
* Add a range interface to Serializer and Archive

-- 
/Jacob Carlborg
August 27, 2013
27-Aug-2013 23:23, Jacob Carlborg wrote:
> On 2013-08-26 18:41, Dmitry Olshansky wrote:
>> Looking at your current code in depth... finally. Without doing that I
>> would have had a problem giving productive answers to the questions.
>>
>> First things first - there should not be a key parameter, aside from
>> anything the archiver itself adds for its internal needs. Nor is there a
>> simple way to locate data by key afterwards (certainly not every format
>> defines one). It would require some tagged object model, and there is no
>> such _requirement_ in serialization.
>
> I would really like the serializer not to be dependent on the order of
> the fields of the types it's (de)serializing.
>
I see...
That depends on the format, and for those that have no keys or markers of any kind, versioning might help here. For instance, JSON/BSON could handle a permutation of fields, but then it falls short of handling links, e.g. pointers (maybe there is a trick to get that, but I can't think of one right away).

I suspect it would be best to somehow categorize archives by capabilities:

1. Rigid (most binary formats) - in-order, depends on the order of the fields, may need to fit a schema (in this case D types implicitly define one). Rigid archivers may also enjoy (per format, in the future) a code generator that, given a schema, defines D types with a bit of CTFE+mixin.

2. Flexible - can survive reordering, is schema-less, the data defines the structure, etc.; handles versioning more easily. XML is one example.

This also neatly answers the question about schema vs schema-less serialization. Protocol Buffers/Thrift may be absorbed into the Rigid category if we can get the versioning right. Also, solving versioning is the last roadblock (after ranges) on the path to making this an epic addition to Phobos.

+ Some kind of (compile-time) capability flag for whether an archiver can serialize full graphs or whether the format is too limited for that. Taking that together with Rigid would cover most ad-hoc binary formats in the wild; with Flexible it would handle some simple hierarchical formats as well.
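
As a rough sketch of what such compile-time capability flags could look like (all names are made up, nothing here is an existing API):

enum ArchiverCaps
{
	none       = 0,
	reordering = 1 << 0, // "Flexible": survives field permutation (XML, JSON, ...)
	fullGraphs = 1 << 1, // can serialize reference cycles / shared sub-objects
}

struct SomeBinaryArchiver
{
	enum caps = ArchiverCaps.fullGraphs; // "Rigid", but handles graphs
}

struct SomeXmlArchiver
{
	enum caps = ArchiverCaps.reordering | ArchiverCaps.fullGraphs;
}

// The serializer can then branch at compile time, e.g.:
//   static if (Archiver.caps & ArchiverCaps.reordering) { ... }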

>> This confusion and its consequences are no doubt due to building on
>> std.xml and the interface it presents.
>
> It might be due to the XML format in general, but certainly not due to
> std.xml. XmlArchive originally used the XML package in Tango, long
> before it supported std.xml.

Was it DOM-ish too?

>> Then there's the awful duality of Serializer, which literally results in:
>> if (mode == serializing) doSerializing(); else doDeserializing();
>>
>> And the spectacular pair:
>>
>> T deserialize (T) (Data data, string key = "")
>> {
>>     mode = deserializing;
>>     ...
>> }
>>
>> Data serialize (T) (T value, string key = null)
>> {
>>     mode = serializing;
>>     ...
>> }
>
> I guess that's easier to avoid if I divide Serializer into two separate
> parts, one for serializing and one for deserializing.
>
Right, I was shamelessly picking at this again.

>> The amount of extra code executed per bit of output is remarkably high
>
> Now I think you're exaggerating a bit.

I meant at least the check of 'mode' on each call to (de)serialize, plus some other branchy stuff that tests for overridden serializers, etc.

It may be a relatively new idiom to follow, but there is great value in having a lean common path: the 90% of use cases that need no extras should go the fastest route, potentially at the _expense_ of the *less frequent cases*.
Simplified - the earlier you can elide extra work, the better performance you get. To do that, you may need to double the overhead (checks) in the less frequent cases in order to remove some of it from the common case.

>> and a hallmark of a standard library is the pay-as-you-go principle. We
>> (collectively, as Phobos devs) have to set the baseline for performance;
>> if it's too low, we're out of the game.
>>
>> For example - events are cute, but do we all need them? Do we always
>> want the overhead of checking for that stuff per field written?
>
> Sure, there is some overhead from calling some functions, but the events
> are checked for at compile time, so the overhead should be minimal.

See below. I was talking specifically about calling functions only to find that no events are fired anyway.

>> Instead, decompose these layers and make them stackable, for instance:
>>
>> auto serializer = new Serializer(...);
>> auto tracingSerializer = new TracingSerializer(serializer);
>>
>> Or just make two kinds of serializers with a static if on a single template
>> parameter, bool withEvents - it's trivial. Then a couple of aliases would
>> finish the job.
>
> I don't think that will be needed. I'll see if I can refactor a bit to
> minimize the overhead even more.

You are probably right, as I note later on; plus, there seems to be a way to elide the cost entirely if there are no events.
>
>> With that said, I observe that events are attached to types/fields... hmm,
>> in such a case it needs work to make them zero-cost when absent.
>
> The only cost is calling "triggerEvents" and "triggerEvent"; the rest is
> performed at compile time.

Yeah, I see, but it's still a call through a delegate that's hard to inline (well, LDC/GDC might manage it). Would it be hard to do a compile-time check for whether there are any events on the type in question at all, and only then call triggerEvent(s)?
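
Something like this, as a hedged sketch - it assumes (hypothetically) that a type opts into events by declaring a member named onSerialized, which may well differ from your actual event mechanism:

enum hasSerializationEvents(T) = __traits(hasMember, T, "onSerialized");

void serializeMembers(T)(ref T value)
{
	// ... serialize the fields here ...
	static if (hasSerializationEvents!T)
		value.onSerialized(); // only compiled in when the hook exists
}

struct Plain  { int x; }                                      // no event code generated
struct Hooked { int x; void onSerialized() { /* fix-up */ } } // hook gets called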

While we are on the subject of delegates - you absolutely should use 'scope delegate', as most (all?) of the delegates are never stored anywhere but rather pass blocks of code to be called deeper down the line.
(I guess it's somewhat Ruby-style, but that's not a problem.)

>
>> And I'm afraid it's too late, or the changes are too far-reaching, but
>> let's try it.
>>
>> I'm especially destroyed by this (and by the fact that it's a part of the
>> interface to implement):
>>
>>      void archiveEnum (bool value, string baseType, string key, Id id);
>>
>>      /// Ditto
>>      void archiveEnum (bool value, string baseType, string key, Id id);
>>
[snip]

> So you want templates instead?

Aye, like any faithful Phobos dev, absolutely :)
Seriously though, at the moment I just _suspect_ there is no need for Archive to be an interface. I would need to think this bit through more deeply, but a virtual call per field alone makes me nervous here.

> I have read your posts, thank you for your comments. I'm planning now to:
>
> * Split Serializer into two parts
> * Make the parts structs
> * Possibly provide class wrappers
> * Split Archive into two parts
> * Add a range interface to Serializer and Archive

Great checklist; this would help greatly. I'm glad you see the value in these changes.
Feel free to nag me on the NG or personally about any deficiency you come across on the way there ;)


-- 
Dmitry Olshansky
August 28, 2013
On 2013-08-27 22:12, Dmitry Olshansky wrote:

> I see...
> That depends on the format, and for those that have no keys or markers of
> any kind, versioning might help here. For instance, JSON/BSON could handle
> a permutation of fields, but then it falls short of handling links, e.g.
> pointers (maybe there is a trick to get that, but I can't think of one
> right away).

For pointers and reference types, I currently serialize all fields with an id; then, when there's a pointer or reference, I can just do this:

<int name="foo" id="1">3</int>
<pointer name="bar">1</pointer>

> I suspect it would be best to somehow categorize archives by capabilities:
>
> 1. Rigid (most binary formats) - in-order, depends on the order of the
> fields, may need to fit a schema (in this case D types implicitly define
> one). Rigid archivers may also enjoy (per format, in the future) a code
> generator that, given a schema, defines D types with a bit of CTFE+mixin.
>
> 2. Flexible - can survive reordering, is schema-less, the data defines
> the structure, etc.; handles versioning more easily. XML is one example.

Yes, that's a good idea. In the binary archiver I'm working on, I'm cheating quite a bit and relaxing the requirements made by the serializer.

> This also neatly answers the question about schema vs schema-less
> serialization. Protocol Buffers/Thrift may be absorbed into the Rigid
> category if we can get the versioning right. Also, solving versioning is
> the last roadblock (after ranges) on the path to making this an epic
> addition to Phobos.

Versioning shouldn't be that hard, I think.

> + Some kind of (compile-time) capability flag for whether an archiver can
> serialize full graphs or whether the format is too limited for that.
> Taking that together with Rigid would cover most ad-hoc binary formats in
> the wild; with Flexible it would handle some simple hierarchical formats
> as well.

Sounds like a good idea.

> Was it DOM-ish too?

Yes.

> I meant at least the check of 'mode' on each call to (de)serialize, plus
> some other branchy stuff that tests for overridden serializers, etc.
>
> It may be a relatively new idiom to follow, but there is great value in
> having a lean common path: the 90% of use cases that need no extras should
> go the fastest route, potentially at the _expense_ of the *less frequent
> cases*.
> Simplified - the earlier you can elide extra work, the better performance
> you get. To do that, you may need to double the overhead (checks) in the
> less frequent cases in order to remove some of it from the common case.

Yes, I understand that checking for "mode" wasn't the best approach. The internals are mostly coded to be straightforward and just work.

> See below. I was talking specifically about calling functions only to
> find that no events are fired anyway.

I can probably add a static if before calling the functions.

> Yeah, I see, but it's still a call through a delegate that's hard to
> inline (well, LDC/GDC might manage it). Would it be hard to do a
> compile-time check for whether there are any events on the type in
> question at all, and only then call triggerEvent(s)?

No, I don't think so. I can also make triggerEvents take the delegate as an alias parameter, if that helps. Or inline it manually.

> While we are on the subject of delegates - you absolutely should use
> 'scope delegate', as most (all?) of the delegates are never stored anywhere
> but rather pass blocks of code to be called deeper down the line.
> (I guess it's somewhat Ruby-style, but that's not a problem.)

Good idea. The reason for the delegates is to avoid begin/end functions. This also forces correct use of the API. Hmm, actually it may not, since the Serializer is technically the user of the archiver API and that side is already correctly implemented. The developer does need to implement the archiver API correctly, but there's nothing that stops him/her from _not_ calling the delegate. Am I overthinking this?
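
For reference, the pattern under discussion looks roughly like this (a sketch with made-up names, not the actual archiver API):

// One call taking a scope delegate instead of a begin/end pair;
// the serializer supplies the body that writes the fields.
void archiveStruct(string key, scope void delegate() archiveFields)
{
	// write the opening tag/marker for `key` ...
	archiveFields(); // the serializer fills in the fields here
	// ... write the closing tag/marker
}

The scope qualifier tells the compiler the delegate is never stored, so no closure has to be allocated for it.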

> Aye, like any faithful Phobos dev, absolutely :)
> Seriously though, at the moment I just _suspect_ there is no need for
> Archive to be an interface. I would need to think this bit through more
> deeply, but a virtual call per field alone makes me nervous here.

Originally it used templates. One of my design goals back then was to not have to use templates. Templates force a slightly more complicated API on the user:

auto serializer = new Serializer!(XmlArchive);

Which is fine, but I'm not very happy about the API for custom serialization:

class Foo
{
    void toData (Archive) (Serializer!(Archive) serializer);
}

The user is either forced to use templates here as well, or:

class Foo
{
    void toData (Serializer!(XmlArchive) serializer);
}

... use a single type of archive. It's also possible to pass in anything as the Archive. Now we have template constraints, which didn't exist back then; they make this a bit better.
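
For example, a constraint can keep toData generic while still restricting what may be passed in. This is a sketch only; the isOutputRange requirement below is an illustrative stand-in, not the real serializer contract:

import std.range : isOutputRange, put;

class Foo
{
    int someField;

    // Accept any serializer-like type S; the constraint documents (and
    // enforces) the minimum this method actually needs from it.
    void toData(S)(S serializer)
        if (isOutputRange!(S, int))
    {
        put(serializer, someField); // std.range.put handles any output range
    }
}

With a named trait such as a hypothetical isSerializer!S, error messages would also point at the constraint instead of somewhere deep inside the method body.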

About the large API to implement for an Archive: these are the criteria I had when creating the API, in order of importance:

1. Should be easy for a consumer to use
2. Should be easy for an archive implementor
3. Should be easy to implement the serializer

In this case, point 1 made things less easy for point 2, and point 2 made me push as much as possible into the serializer instead of having it in the archiver.

In the end, it's quite easy to copy-paste the API, do some search-and-replace, and forward methods like these:

void archiveEnum (bool value, string baseType, string key, Id id)
void archiveEnum (char value, string baseType, string key, Id id)
void archiveEnum (int value, string baseType, string key, Id id)

... to a private template method. That's what XmlArchive does:

https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/archives/XmlArchive.d#L439
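
In sketch form, that forwarding looks roughly like this (names are illustrative, not copied from XmlArchive; Id stands in for the archive's id type):

void archiveEnum (bool value, string baseType, string key, Id id) { archiveEnumImpl(value, baseType, key, id); }
void archiveEnum (char value, string baseType, string key, Id id) { archiveEnumImpl(value, baseType, key, id); }
void archiveEnum (int value, string baseType, string key, Id id)  { archiveEnumImpl(value, baseType, key, id); }
// ... and so on for the remaining overloads ...

private void archiveEnumImpl (T) (T value, string baseType, string key, Id id)
{
    // the one templated implementation shared by all of the interface overloads
}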

-- 
/Jacob Carlborg