November 17, 2013 Reflections on Serialization APIs in D

On the road to developing a new kind of search engine that caches types, statistics, etc. about files and directories, I'm currently trying to implement persistent caching of my internal directory tree using `msgpack-d`.

Why don't `msgpack-d` and, from what I can see, also `std.serialization` (Orange) support implementing *both* packing and unpacking through one common template (member) function overload, like **Boost.Serialization** does? For example, containers can be handled using this concise and elegant syntax in C++11:

    friend class boost::serialization::access;
    template<class Ar> void serialize(Ar& ar, const unsigned int version) {
        for (const auto& e : *this) { ar & e; }
    }

This halves the code size as well as removing the risk of `pack` and `unpack` going out of sync.
November 17, 2013 Re: Reflections on Serialization APIs in D

Posted in reply to Nordlöw

I have not used it in D, but we use Thrift in Java a lot and I've been very happy with it on many levels. It works really well in production. Since Thrift has recently gained D bindings, it may be worth your time to investigate.
November 17, 2013 Re: Reflections on Serialization APIs in D

Posted in reply to Nordlöw

On 11/17/13, "Nordlöw" <per.nordlow@gmail.com> wrote:
> Why doesn't `msgpack-d` and, from what I can see also, `std.serialization` (Orange) support implementing *both* packing and unpacking through one common template (member) function overload like **Boost.Serialization** does? [...]

I would suspect that the biggest reason is the limitations that imposes on the underlying serialization implementation, as it would require that the underlying format support a minimum set of types.

I have something similar(ish) in my serialization framework (https://github.com/Orvid/JSONSerialization) that allows you to implement a custom format for each type, but I implement it as a pair of methods, toString and parse, allowing the underlying format to support only serializing strings if it really wanted to. Also, my framework currently only supports JSON, but it's designed such that it would be easy to add support for another format. It's also fast, very fast, mostly because I have managed to implement the JSON serialization methods with no allocation required at all. I'm able to serialize 100k objects in about 90 ms on an i5 running at 1.6 GHz; deserialization is currently a bit slower, 420 ms for those same objects, but that's almost exclusively allocation time.
November 18, 2013 Re: Reflections on Serialization APIs in D

Posted in reply to Orvid King

Is JSONSerialization somehow related to the upcoming std.serialization?

I feel that there is a big need for standardizing serialization in D. There are too many alternatives: dproto, msgpack, JSON, XML, etc. Shouldn't these be made backends to the same frontend, named std.serialization?
/Per
On Sunday, 17 November 2013 at 21:37:35 UTC, Orvid King wrote:
> I would suspect that the biggest reason is the limitations that that imposes on the underlying serialization implementation, as it would require that the underlying format support a minimum set of types. [...]
November 18, 2013 Re: Reflections on Serialization APIs in D

Posted in reply to Per Nordlöw

On 2013-11-18 12:25, "Per Nordlöw" <per.nordlow@gmail.com> wrote:
> Is JSONSerialization somehow related to the upcoming std.serialization?

No.

> I feel that there is a big need for standardizing serialization in D. There are too many alternatives: dproto, msgpack, JSON, xml, etc should be made backends to the same frontend named std.serialization right?

The idea is that std.serialization can support many different archive types (backends).

--
/Jacob Carlborg
November 18, 2013 Re: Reflections on Serialization APIs in D

Posted in reply to Jacob Carlborg

Ok. That is great.
Thx.
On Monday, 18 November 2013 at 12:26:19 UTC, Jacob Carlborg wrote:
> On 2013-11-18 12:25, "Per Nordlöw" <per.nordlow@gmail.com> wrote:
>
>> Is JSONSerialization somehow related to the upcoming std.serialization?
>
> No.
>
>> I feel that there is a big need for standardizing serialization in D.
>> There are too many alternatives: dproto, msgpack, JSON, xml, etc should
>> be made backends to the same frontend named std.serialization right?
>
> The idea is that std.serialization can support many different archive types (backends).
November 18, 2013 Re: Reflections on Serialization APIs in D

Posted in reply to Per Nordlöw

On 11/18/13, "Per Nordlöw" <per.nordlow@gmail.com> wrote:
> Is JSONSerialization somehow related to the upcoming std.serialization?
>
> I feel that there is a big need for standardizing serialization in D. There are too many alternatives: dproto, msgpack, JSON, xml, etc should be made backends to the same frontend named std.serialization right?
> [...]

Yep, my goal with it is to be a possible contender for the place of std.serialization. I've designed it from the start to be fast, but also easily usable, which is why toJSON and fromJSON exist: they provide a very large usability improvement while still allowing it to be fast. It is also based on an abstracted API that allows you to interact with the serialization format in a way that is independent of what the actual format is.
November 18, 2013 Re: Reflections on Serialization APIs in D

Posted in reply to Orvid King

> I would suspect that the biggest reason is the limitations that that
> imposes on the underlying serialization implementation, as it would
> require that the underlying format support a minimum set of types.
I'm not sure that's actually true. I've been working on my own serialisation library in D that I plan to unleash on the announce forum soon, and it works in the manner described by the original poster. Even with custom serialisations, client code need only define one function for both directions.
The only reason I haven't announced it yet is that I wanted to be sure it's polished enough, but maybe I shouldn't wait.
Atila
November 18, 2013 Re: Reflections on Serialization APIs in D

Posted in reply to Atila Neves

The reason I like Thrift is that it is backwards and forwards compatible. Assuming you keep defining new fields in your schema as "optional", old clients can read data from new producers, and new clients can read data from old producers. Not many binary serialization formats offer this kind of flexibility for evolving your schema over time.
November 18, 2013 Re: Reflections on Serialization APIs in D

Posted in reply to Atila Neves

On 11/18/13, Atila Neves <atila.neves@gmail.com> wrote:
>> I would suspect that the biggest reason is the limitations that that imposes on the underlying serialization implementation, as it would require that the underlying format support a minimum set of types.
>
> I'm not sure that's actually true. I've been working on my own serialisation library in D that I plan to unleash on the announce forum soon and it does it in a manner described by the original poster. Even with custom serialisations, client code need only define one function for both directions.
>
> The only reason I haven't announced it yet is because I wanted to be pure it's polished enough, but maybe I shouldn't wait.
>
> Atila
>
I am curious as to how exactly that would work: does it determine the output format at compile time or at runtime? Does it specify the way a type is serialized, or its serialized representation? I'd also be curious about the performance impact it brings, if any. Depending on its exact function, it's perfectly possible that it could actually be faster than my toString/parse combo, because mine requires the string to be allocated by toString, due to the lack of knowledge of the underlying format.
Copyright © 1999-2021 by the D Language Foundation