Range interface for std.serialization (page 5) - D Programming Language Discussion Forum


August 28, 2013

Re: Range interface for std.serialization

Posted by Jacob Carlborg
in reply to Dmitry Olshansky

On 2013-08-27 22:12, Dmitry Olshansky wrote:

> Feel free to nag me on the NG and personally for any deficiency you come
> across on the way there ;)

About making Serializer a struct. Actually I think the semantics of Serializer should be a reference type. I see no use case in passing a Serializer by value. Although I do see the overhead of allocating a class and calling methods on it.

I do plan to add a free function for deserializing, for convenience. In that function Serializer, if it's a class, would be allocated using emplace to make it stack allocated.
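
For illustration, here is a minimal sketch of what such a convenience function could look like, assuming a class-based Serializer. It uses `std.typecons.scoped` instead of calling emplace by hand; `scoped` places the class instance in stack storage and destroys it at scope exit. The `Serializer` class and the `serializeAll` function are made up for this example and are not the std.serialization API.

```d
import std.typecons : scoped;

// Hypothetical stand-in for the real Serializer class.
class Serializer
{
    private int written;

    void put(int value) { ++written; }
    int count() const { return written; }
}

// Hypothetical convenience function: the Serializer instance lives in
// stack storage for the duration of the call, so the class itself is
// not allocated on the GC heap.
int serializeAll(const int[] values)
{
    auto serializer = scoped!Serializer();
    foreach (v; values)
        serializer.put(v);
    return serializer.count();
}

unittest
{
    assert(serializeAll([1, 2, 3]) == 3);
}
```

The caller never sees the Serializer type, so it could later become a struct without changing the function's signature.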

-- 
/Jacob Carlborg

August 28, 2013

Re: Range interface for std.serialization

Posted by Dmitry Olshansky
in reply to Jacob Carlborg

28-Aug-2013 11:13, Jacob Carlborg wrote:
> On 2013-08-27 22:12, Dmitry Olshansky wrote:
>
>> I see...
>> That depends on the format and for these that have no keys or markers of
>> any kind versioning might help here. For instance JSON/BSON could handle
>> permutation of fields, but then it falls short of handling links, e.g.
>> pointers (maybe there is a trick to get it, but I can't think of any
>> right away).
>
> For pointers and reference types I'm currently serializing all fields with
> an id then when there's a pointer or reference I can just do this:
>
> <int name="foo" id="1">3</int>
> <pointer name="bar">1</pointer>

That would be tricky in JSON and quite overheadish (e.g. wrapping everything into object just in case there is a pointer there).

>> I suspect it would be best to somehow see archives by capabilities:
>> 1. Rigid (most binary) - in-order, depends on the order of fields, may
>> need to fit a scheme (in this cases D types implicitly define one)
>> Rigid archivers may also enjoy (per format in the future) a code
>> generator that given a scheme defines D types with a bit of CTFE+mixin.
>>
>> 2. Flexible - can survive reordering, is scheme-less, data defines
>> structure etc., handles versioning more easily, e.g. XML is one.
>
> Yes, that's a good idea. In the binary archiver I'm working on I'm
> cheating quite a bit and relax the requirements made by the serializer.

Yes, instead of cheating you can just define them as different kinds. It would ease the friction and prevent some "impedance mismatch" problems.

>> This also neatly answers the question about scheme vs scheme-less
>> serialization. Protocol buffers/Thrift may be absorbed into Rigid
>> category if we can get the versioning right. Also solving versioning is
>> the last roadblock (after ranges) mentioned on the path to making this
>> an epic addition to Phobos.
>
> Versioning shouldn't be that hard, I think.

Then collect some info on how to approach this problem.
See e.g. Boost.Serialization, Protocol Buffers and Thrift.
The key point is that it's many things to many different people.

>> Was it DOM-ish too?
>
> Yes.

That nails it. DOM isn't quite serialization but rather a hierarchical DB. BTW SQLite and other DBs may be an interesting backend for serialization (though they wouldn't have lookup until deserialization).

>> Yeah, I see, but it's still a call to delegate that's hard to inline
>> (well LDC/GDC might). Would it be hard to do a compile-time check if
>> there are any events with the type in question at all and then call
>> triggerEvent(s)?
>
> No, I don't think so. I can also make the triggerEvents take the
> delegate by alias parameter, if that helps. Or inline it manually.

Great, anything to lessen the extra load.

>> While we are on the subject of delegates - you absolutely should use
>> 'scope delegate' as most (all?) delegates are never stored anywhere but
>> rather pass blocks of code to call deeper down the line.
>> (I guess it's somewhat Ruby-style, but it's not a problem).
>
> Good idea. The reasons for the delegates is to avoid begin/end
> functions. This also forces the use of the API correctly. Hmm, actually
> it may not. Since the Serializer technically is the user of the archiver
> API and that is already correctly implemented. The developer does need to
> implement the archiver API correctly, but there's nothing that stops
> him/her from _not_ calling the delegate. Am I over thinking this?

Seems so; after all, library implementors should be trusted not to do truly awful things.
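
A minimal sketch of the delegate-based style discussed above, assuming an archiver method that wraps the begin/end pair around a `scope` delegate. `ToyArchive`, `archiveObject` and `archiveInt` are hypothetical names chosen for the example, not the actual archiver API.

```d
import std.conv : to;

// Hypothetical archiver, loosely modelled on the XML output shown in this thread.
struct ToyArchive
{
    string output;

    // `scope` promises that the delegate is not stored anywhere, so the
    // caller's delegate literal does not need a heap-allocated closure.
    void archiveObject(string name, scope void delegate() writeFields)
    {
        output ~= "<" ~ name ~ ">";
        writeFields(); // the archiver decides when (and whether) to call it
        output ~= "</" ~ name ~ ">";
    }

    void archiveInt(string key, int value)
    {
        output ~= `<int key="` ~ key ~ `">` ~ value.to!string ~ `</int>`;
    }
}

unittest
{
    ToyArchive archive;
    int a = 3;
    archive.archiveObject("object", { archive.archiveInt("a", a); });
    assert(archive.output == `<object><int key="a">3</int></object>`);
}
```

The opening and closing tags cannot get out of sync on the caller's side, but nothing forces an archiver to actually invoke `writeFields`, which is the concern raised in the quoted text.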

>
>> Aye, as any faithful Phobos dev absolutely :)
>> Seriously though ATM I just _suspect_ there is no need for Archive to be
>> an interface. I would need to think this bit through more deeply but
>> virtual call per field alone make me nervous here.
>
> Originally it was using templates. One of my design goals back then was
> to not have to use templates. Templates forces slightly more complicated
> API for the user:
>
> auto serializer = new Serializer!(XmlArchive);
>
> Which is fine, but I'm not very about the API for custom serialization:
>
> class Foo
> {
>      void toData (Archive) (Serializer!(Archive) serializer);
> }
>

Rather this:

void toData(Serializer)(Serializer serializer)
	if(isSerializer!Serializer)
{
	...
}

There is no need for the user code to even know what the archiver looks like (wasn't that one of the goals of archivers?).
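
As a rough sketch of how such a constraint could work: the `isSerializer` trait below is an assumption (the thread does not define it) and simply checks for a `put(int)` method. `RecordingSerializer` is a toy type used only to exercise the template.

```d
// Hypothetical trait: anything that accepts put(int) counts as a serializer here.
enum isSerializer(S) = __traits(compiles, S.init.put(0));

// Toy serializer used only for this example.
struct RecordingSerializer
{
    int[] values;
    void put(int value) { values ~= value; }
}

class Foo
{
    int a = 3;

    // Templated on the whole serializer, so Foo never names an archive type.
    // Taken by ref here only because the toy serializer is a struct.
    void toData(Serializer)(ref Serializer serializer)
        if (isSerializer!Serializer)
    {
        serializer.put(a);
    }
}

unittest
{
    static assert(isSerializer!RecordingSerializer);
    static assert(!isSerializer!int);

    RecordingSerializer serializer;
    auto foo = new Foo;
    foo.toData(serializer);
    assert(serializer.values == [3]);
}
```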

> The user is either forced to use templates here as well, or:
>
> class Foo
> {
>      void toData (Serializer!(XmlArchive) serializer);
> }

The main problem would be that it can't be overridden, as templates are final.

After all of this I think Archivers are just fine as templates: the user only ever interacts with them during creation. Then it's the serializer's templates that pick up the right types.

Serializers themselves on the other hand are present in user code and may need one common polymorphic abstract class that provides 'put' and forwards it to a set of abstract methods. All polymorphic wrappers would inherit from it.

This won't prevent folks from using a templated version of toData/fromData if need be.

> ... use a single type of archive. It's also possible to pass in anything
> as Archive. Now we have template constraints, which didn't exist back
> then, make it a bit better.
>
> About the large API to implement for an Archive, this is the criteria I
> had when creating the API, in order of importance.
>
> 1. Should be easy for a consumer to use
> 2. Should be easy for an archive implementor
> 3. Should be easy to implement the serializer
>
> In this case, point 1 made it less easy for point 2. Point 2 made me
> push as much as possible to the serializer instead of having it in the
> archiver.
>

I'd suggest hiding the (Un)Archiver API from end users as much as possible; as such, it would be more convenient for it to just stay templated, as it won't be seen.

> In the end, it's quite easy to copy-paste the API, do some search and
> replace and forward methods like these:
>
> void archiveEnum (bool value, string baseType, string key, Id id)
> void archiveEnum (char value, string baseType, string key, Id id)
> void archiveEnum (int value, string baseType, string key, Id id)
>
> ... to a private template method. That's what XmlArchive does:
>
> https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/archives/XmlArchive.d#L439
>



-- 
Dmitry Olshansky

August 28, 2013

Re: Range interface for std.serialization

Posted by Dmitry Olshansky
in reply to Jacob Carlborg

28-Aug-2013 12:08, Jacob Carlborg wrote:
> On 2013-08-27 22:12, Dmitry Olshansky wrote:
>
>> Feel free to nag me on the NG and personally for any deficiency you come
>> across on the way there ;)
>
> About making Serializer a struct. Actually I think the semantics of
> Serializer should be a reference type. I see no use case in passing a
> Serializer by value. Although I do see the overhead of allocating a
> class and calling methods on it.

Here you are quite right... just add a factory that hides away its true origin (and the ctor as well, so it can be changed later if need be), e.g.:

auto serializer = serializerFor!(XmlArchiver)(archiver);

> I do plan to add a free function for deserializing, for convenience. In
> that function Serializer, if it's a class, would be allocated using
> emplace to make it stack allocated.

Good idea. The API should have many layers so that power users may keep digging to the bottom and those who just need to get the job done can do it in one stroke.

-- 
Dmitry Olshansky

August 28, 2013

Re: Range interface for std.serialization

Posted by Dmitry Olshansky
in reply to Dmitry Olshansky

28-Aug-2013 13:58, Dmitry Olshansky wrote:
> 28-Aug-2013 11:13, Jacob Carlborg wrote:
>> On 2013-08-27 22:12, Dmitry Olshansky wrote:
>
> Rather this:
>
> void toData(Serializer)(Serializer serializer)
>      if(isSerializer!Serializer)
> {
>      ...
> }
>
> There is no need to even know what the archiver looks like for the user code
> (wasn't it one of the goals of archivers?).
>
>> The user is either forced to use templates here as well, or:
>>
>> class Foo
>> {
>>      void toData (Serializer!(XmlArchive) serializer);
>> }
>
> The main problem would be that it can't be overridden, as templates are final.
>
> After all of this I think Archivers are just fine as templates: the user only
> ever interacts with them during creation. Then it's serializers
> templates that pick up the right types.
>
> Serializers themselves on the other hand are present in user code and
> may need one common polymorphic abstract class that provides 'put' and
> forwards it to a set of abstract methods. All polymorphic wrappers would
> inherit from it.

Taking into account that you've settled on keeping Serializers as classes, just finalize all methods of a concrete serializer that is templated on the archiver (and make it a final class).

Should be as simple as:

class Serializer {
	void put(T)(T item){ ...}
	//other methods per specific type
}

final class ConcreteSerializer(Archiver) : Serializer {
final:
	...
	//use Archiver here to implement these hooks
}

Then users that use templates in their code would have concrete types, for others it quickly "decays" to the base class they use.

The boilerplate of defining a lot of methods now moves to Serializer but there should be only one such (template) class anyway.
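
To make the outline above concrete, here is a small sketch under stated assumptions: the hook names `putInt`/`putString`, the toy `TextArchiver` and its templated `write` are all hypothetical, and only two types are handled. The base class exposes a templated `put` that dispatches to abstract hooks, and one final class per archiver type implements those hooks.

```d
import std.conv : to;

abstract class Serializer
{
    // Templated entry point. Template methods are never virtual,
    // so all it does is dispatch to the abstract hooks below.
    void put(T)(T item)
    {
        static if (is(T == int))
            putInt(item);
        else static if (is(T == string))
            putString(item);
        else
            static assert(false, "unsupported type " ~ T.stringof);
    }

    // Hypothetical per-type hooks; the real list would be longer.
    protected abstract void putInt(int item);
    protected abstract void putString(string item);
}

final class ConcreteSerializer(Archiver) : Serializer
{
    private Archiver archiver;

    this(Archiver archiver) { this.archiver = archiver; }

    // Every hook forwards to the archiver's single templated method.
    protected override void putInt(int item) { archiver.write(item); }
    protected override void putString(string item) { archiver.write(item); }
}

// Toy archiver: one templated write covers all primitive types.
final class TextArchiver
{
    string data;

    void write(T)(T value) { data ~= value.to!string ~ ";"; }
}

unittest
{
    auto archiver = new TextArchiver;
    // Non-template user code only ever sees the base class.
    Serializer serializer = new ConcreteSerializer!TextArchiver(archiver);
    serializer.put(3);
    serializer.put("foo");
    assert(archiver.data == "3;foo;");
}
```

Code that is itself templated can keep the concrete `ConcreteSerializer!TextArchiver` type, and since that class is final, calls made through it can be resolved without virtual dispatch.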

-- 
Dmitry Olshansky

August 28, 2013

Re: Range interface for std.serialization

Posted by Jacob Carlborg
in reply to Dmitry Olshansky

On 2013-08-28 11:58, Dmitry Olshansky wrote:

> That would be tricky in JSON and quite overheadish (e.g. wrapping
> everything into object just in case there is a pointer there).

Yes.

> Yes, instead of cheating you can just define them as different kinds. It
> would ease the friction and prevent some "impedance mismatch" problems.

Yes, that's better.

> Then collect some info on how to approach this problem.
> See e.g. Boost.Serialization, Protocol Buffers and Thrift.
> The key point is that it's many things to many different people.

I'll do that.

> Rather this:
>
> void toData(Serializer)(Serializer serializer)
>      if(isSerializer!Serializer)
> {
>      ...
> }
>
> There is no need to even know what the archiver looks like for the user code
> (wasn't it one of the goals of archivers?).

Right, didn't think of using a template argument for the whole serializer.


> Serializers themselves on the other hand are present in user code and
> may need one common polymorphic abstract class that provides 'put' and
> forwards it to a set of abstract methods. All polymorphic wrappers would
> inherit from it.
>
> This won't prevent folks from using templated version of toData/fromData
> if need be.

That's a good idea.

> I'd suggest to maximally hide away (Un)Archivers API from end users and
> as such it would be more convenient to just stay templated as it won't
> be seen.

Yes.

-- 
/Jacob Carlborg

August 28, 2013

Re: Range interface for std.serialization

Posted by Jacob Carlborg
in reply to Dmitry Olshansky

On 2013-08-28 13:20, Dmitry Olshansky wrote:

> Taking into account that you've settled on keeping Serializers as
> classes

Not necessarily.

> just finalize all methods of a concrete serializer that is
> templated on archiver (and make it a final class).
>
> Should be as simple as:
>
> class Serializer {
>      void put(T)(T item){ ...}
>      //other methods per specific type
> }
>
> final class ConcreteSerializer(Archiver) : Serializer {
> final:
>      ...
>      //use Archiver here to implement these hooks
> }
>
> Then users that use templates in their code would have concrete types,
> for others it quickly "decays" to the base class they use.
>
> The boilerplate of defining a lot of methods now moves to Serializer but
> there should be only one such (template) class anyway.

This is a good idea.

-- 
/Jacob Carlborg

September 24, 2013

Re: Range interface for std.serialization

Posted by Jacob Carlborg
in reply to Dmitry Olshansky

On 2013-08-28 13:20, Dmitry Olshansky wrote:

Bumping this thread.

> Taking into account that you've settled on keeping Serializers as
> classes just finalize all methods of a concrete serializer that is
> templated on archiver (and make it a final class).
>
> Should be as simple as:
>
> class Serializer {
>      void put(T)(T item){ ...}
>      //other methods per specific type
> }
>
> final class ConcreteSerializer(Archiver) : Serializer {
> final:
>      ...
>      //use Archiver here to implement these hooks
> }

I'm having quite a hard time figuring out how this should work. Or I'm misunderstanding what you're saying.

If I understand you correctly I should do something like:

class Serializer
{
    void put (T) (T item)
    {
        static if (is(T == int))
            serializeInt(item);

        ...
    }

    abstract void serializeInt (int item);
}

But if I'm doing it that way I will still have the problem with a lot of methods that need to be implemented in the archiver.

Hmm, I guess it would be possible to minimize the number of methods used for built-in types. There's still a problem with user-defined types though.

-- 
/Jacob Carlborg

September 24, 2013

Re: Range interface for std.serialization

Posted by Dmitry Olshansky
in reply to Jacob Carlborg

24-Sep-2013 21:02, Jacob Carlborg wrote:
> On 2013-08-28 13:20, Dmitry Olshansky wrote:
>> Taking into account that you've settled on keeping Serializers as
>> classes just finalize all methods of a concrete serializer that is
>> templated on archiver (and make it a final class).
>>
>> Should be as simple as:
>>
>> class Serializer {
>>      void put(T)(T item){ ...}
>>      //other methods per specific type
>> }
>>
>> final class ConcreteSerializer(Archiver) : Serializer {
>> final:
>>      ...
>>      //use Archiver here to implement these hooks
>> }
>
> I'm having quite a hard time figuring out how this should work. Or I'm
> misunderstanding what you're saying.
>
> If I understand you correctly I should do something like:
>
> class Serializer
> {
>      void put (T) (T item)
>      {
>          static if (is(T == int))
>              serializeInt(item);
>
>      ...
>      }
>
>      abstract void serializeInt (int item);
> }
>
> But if I'm doing it that way I will still have the problem with a lot of
> methods that need to be implemented in the archiver.
>

If I'm correct, the archiver would have the benefit of templates and common code would be merged (so all of these methods in a concrete serializer forward to archiver.write!int, archiver.write!uint, etc.). On the plus side, with a bunch of methods in Serializer you need exactly one ConcreteSerializer!(Archive) that implements them. And a user-defined archiver need not even think about this; it just defines a single templated write (or put or whatever).

> Hmm, I guess it would be possible to minimize the number of methods used
> for built in types. There's still a problem with user defined types though.

Indeed. But it must be provided as a template in the generic serializer. The benefit is that said logic to serialize arbitrary UDTs is implemented there once and for all. The archiver is then partially relieved of it. To achieve that, an archiver may need to provide some fundamental "hooks" like startStruct/endStruct (I didn't think through the exact ones).
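
A minimal sketch of that division of labour, assuming hooks named `startStruct`/`endStruct` plus a single templated `write`. The signatures and the `ToyArchiver` output format are invented for the example. The generic `serialize` template walks the fields of any struct once and for all; the archiver only supplies the primitives.

```d
import std.conv : to;

// Hypothetical archiver: two structural hooks plus one templated write.
struct ToyArchiver
{
    string data;

    void startStruct(string key, string typeName) { data ~= key ~ ":" ~ typeName ~ "{"; }
    void endStruct() { data ~= "}"; }
    void write(T)(string key, T value) { data ~= key ~ "=" ~ value.to!string ~ ";"; }
}

// Generic part: the logic for user-defined structs lives here, not in the archiver.
void serialize(Archiver, T)(ref Archiver archiver, string key, T value)
{
    static if (is(T == struct))
    {
        archiver.startStruct(key, T.stringof);
        // Compile-time loop over the fields; recurses for nested structs.
        foreach (i, field; value.tupleof)
            serialize(archiver, __traits(identifier, T.tupleof[i]), field);
        archiver.endStruct();
    }
    else
    {
        archiver.write(key, value);
    }
}

struct Inner { int x; }
struct Outer { Inner inner; string name; }

unittest
{
    ToyArchiver archiver;
    serialize(archiver, "root", Outer(Inner(1), "n"));
    assert(archiver.data == "root:Outer{inner:Inner{x=1;}name=n;}");
}
```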

-- 
Dmitry Olshansky

September 25, 2013

Re: Range interface for std.serialization

Posted by Jacob Carlborg
in reply to Dmitry Olshansky

On 2013-09-24 22:27, Dmitry Olshansky wrote:

> Indeed. But it must be provided as a template in generic serializer. The
> benefit is that said logic to serialize arbitrary UDTs is implemented
> there once and for all. Archiver is then partially relieved of it. To
> achieve that an archiver may need to provide some fundamental "hooks"
> like  startStruct/endStruct (I didn't think through exact ones).

Ok, that's basically how it already works. Thanks.

-- 
/Jacob Carlborg

October 10, 2013

Re: Range interface for std.serialization

Posted by Jacob Carlborg
in reply to Dmitry Olshansky

On 2013-08-27 22:12, Dmitry Olshansky wrote:

> Feel free to nag me on the NG and personally for any deficiency you come
> across on the way there ;)

I'm bumping this again with a new question. I'm thinking about how to output the data to an output range. If the output range is of type ubyte[], how should I output serialized data looking like this:

<object runtimeType="main.Foo" type="main.Foo" key="0" id="0">
    <int key="a" id="1">3</int>
</object>

Should I output this in one chunk or in parts like this:

<object runtimeType="main.Foo" type="main.Foo" key="0" id="0">

Then

<int key="a" id="1">3</int>

Then

</object>

If the first case is chosen I guess this data:

<object runtimeType="main.Foo" type="main.Foo" key="0" id="0">
    <int key="a" id="1">3</int>
</object>

<object runtimeType="main.Foo" type="main.Foo" key="1" id="2">
    <int key="a" id="3">3</int>
</object>

would be output in two chunks.
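
To make the two options concrete, here is a small sketch of the "in parts" variant. It does not settle which option is better. `ChunkRecorder` and `writeFoo` are hypothetical; the recorder is an output range that remembers each chunk it receives, so the number of `put` calls is visible.

```d
import std.conv : to;
import std.range.primitives : put;
import std.string : representation;

// Hypothetical output range that records every chunk handed to it.
struct ChunkRecorder
{
    const(ubyte)[][] chunks;

    void put(const(ubyte)[] chunk) { chunks ~= chunk; }
}

// Writes one object in three parts: opening tag, field, closing tag.
void writeFoo(Sink)(ref Sink sink, int a)
{
    put(sink, `<object runtimeType="main.Foo" type="main.Foo" key="0" id="0">`.representation);
    put(sink, (`<int key="a" id="1">` ~ a.to!string ~ `</int>`).representation);
    put(sink, `</object>`.representation);
}

unittest
{
    ChunkRecorder sink;
    writeFoo(sink, 3);
    assert(sink.chunks.length == 3);
    assert(sink.chunks[1] == `<int key="a" id="1">3</int>`.representation);
}
```

With the one-chunk variant the archiver would have to buffer the whole object before the single `put`; writing in parts avoids that buffer at the cost of more calls on the range.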

-- 
/Jacob Carlborg
