Range interface for std.serialization (page 3) - D Programming Language Discussion Forum

August 23, 2013

Re: Range interface for std.serialization

Posted by Tyler Jameson Little
in reply to Dicebot

On Thursday, 22 August 2013 at 14:48:57 UTC, Dicebot wrote:
> On Thursday, 22 August 2013 at 03:13:46 UTC, Tyler Jameson Little wrote:
>> On Wednesday, 21 August 2013 at 20:21:49 UTC, Dicebot wrote:
>>> It should be range of strings - one call to popFront should serialize one object from input object range and provide matching string buffer.
>>
>> I don't like this because it still caches the whole object into memory. In a memory-restricted application, this is unacceptable.
>
> Well, in memory-restricted applications having large object at all is unacceptable. Rationale is that you hardly ever want half-deserialized object. If environment is very restrictive, smaller objects will be used anyway (list of smaller objects).

It seems you and I are trying to solve two very different problems. Perhaps if I explain my use-case, it'll make things clearer.

I have a server that serializes data from a socket, processes that data, then updates internal state and sends notifications to clients (involves serialization as well).

When new clients connect, they need all of this internal state, so the easiest way to do this is to create one large object out of all of the smaller objects:

    class Widget {
    }

    class InternalState {
        Widget[string] widgets;
        // ... other data here
    }

InternalState isn't very big by itself; it just has an associative array of Widget pointers with some other rather small data. When serialized, however, this can get quite large. Since archive formats are orders of magnitude less-efficient than in-memory stores, caching the archived version of the internal state can be prohibitively expensive.

Let's say the serialized form of the internal state is 5MB, and I have 128MB available, while 50MB or so is used by the application. This leaves about 70MB, so I can only support 14 connected clients.

With a streaming serializer (per object), I'll get that 5MB down to a few hundred KB and I can support many more clients.
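A minimal sketch of the per-object streaming idea, assuming a toy text format. The `Widget` fields, the `serializeWidgets` helper, and the use of `appender` as a stand-in for a socket-backed sink are all hypothetical and not part of any proposed API.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.range.primitives : isOutputRange;

// Hypothetical stand-in for the Widget class in the post.
class Widget
{
    string name;
    int value;

    this(string name, int value)
    {
        this.name = name;
        this.value = value;
    }
}

// Writes one widget at a time to `sink`. The function itself never
// builds the full serialized state; whether the data is buffered
// depends entirely on the sink that is passed in.
void serializeWidgets(Sink)(Widget[string] widgets, ref Sink sink)
if (isOutputRange!(Sink, char))
{
    foreach (key, w; widgets)
        formattedWrite(sink, "%s=%s:%d\n", key, w.name, w.value);
}

unittest
{
    // appender keeps everything in memory; a real server would pass
    // a sink that forwards each chunk to the client socket instead.
    auto sink = appender!string();
    serializeWidgets(["a": new Widget("first", 1)], sink);
    assert(sink.data == "a=first:1\n");
}
```

With a sink that writes straight to the socket, peak memory per client is bounded by the largest single widget rather than by the whole 5MB archive.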

>> ...
>> There's no reason why the serializer can't output this in chunks
>
> Outputting on its own is not useful to discuss - in pipe model output matches input. What is the point in outputting partial chunks of serialized object if you still need to provide it as a whole to the input?

This only makes sense if you are deserializing right after serializing, which is *not* a common thing to do.

Also, it's much more likely to need to serialize a single object (as in a REST API, 3d model parser [think COLLADA] or config parser). Providing a range seems to fit only a small niche, people that need to dump the state of the system. With single-object serialization and chunked output, you can define your own range to get the same effect, but with an API as you detailed, you can't avoid memory problems without going outside std.

August 25, 2013

Re: Range interface for std.serialization

Posted by Daniel Murphy
in reply to Dicebot

"Dicebot" <public@dicebot.lv> wrote in message news:niufnloijwvjifusgisn@forum.dlang.org...
> On Thursday, 22 August 2013 at 17:39:19 UTC, Johannes Pfau wrote:
>> Yes, but the important point is that Serializer is _not_ an InputRange of serialized data. Instead it _uses_ a OutputRange / Stream internally.
>
> Shame on me. I have completely misunderstood you and thought you wanted to make serializer OutputRange itself.
>
> Your examples make a lot of sense and I do agree it is a use case worth supporting. Need some more time to imagine how that may impact API in general.

It seems to me that if you give serializer a 'put' method, it _will_ be a valid output range.
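A minimal sketch of that point, assuming a toy serializer: `ToySerializer` and its `;`-separated text format are hypothetical, while `isOutputRange` and `copy` are the existing Phobos primitives.

```d
import std.algorithm.mutation : copy;
import std.conv : to;
import std.range : iota;
import std.range.primitives : isOutputRange;
import std.traits : isScalarType, isSomeString;

// Hypothetical toy serializer; not the proposed std.serialization API.
struct ToySerializer
{
    string data; // serialized text accumulated so far

    // One templated put covers every supported element type.
    void put(T)(T value)
    if (isScalarType!T || isSomeString!T)
    {
        data ~= value.to!string ~ ";";
    }
}

// The put member alone is enough to satisfy the output range check.
static assert(isOutputRange!(ToySerializer, int));
static assert(isOutputRange!(ToySerializer, string));

unittest
{
    ToySerializer s;
    s.put(42);
    s.put("hi");
    assert(s.data == "42;hi;");

    // copy takes the target by value and returns it, so keep the result.
    s = copy(iota(1, 4), s);
    assert(s.data == "42;hi;1;2;3;");
}
```

Note that `copy` works on a copy of a struct target, which is one practical argument for a reference-type serializer or a pointer-holding wrapper.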

August 25, 2013

Re: Range interface for std.serialization

Posted by Dmitry Olshansky
in reply to Daniel Murphy

On 25-Aug-2013 12:20, Daniel Murphy wrote:
> "Dicebot" <public@dicebot.lv> wrote in message
> news:niufnloijwvjifusgisn@forum.dlang.org...
>> On Thursday, 22 August 2013 at 17:39:19 UTC, Johannes Pfau wrote:
>>> Yes, but the important point is that Serializer is _not_ an InputRange
>>> of serialized data. Instead it _uses_ a OutputRange / Stream
>>> internally.
>>
>> Shame on me. I have completely misunderstood you and thought you wanted to
>> make serializer OutputRange itself.
>>
>> Your examples make a lot of sense and I do agree it is a use case worth
>> supporting. Need some more time to imagine how that may impact API in
>> general.
>
> It seems to me that if you give serializer a 'put' method, it _will_ be a
> valid output range.
>
Same thoughts here.
Serializer is an output range for pretty much anything (that is serializable). Literally isOutputRange!T would be true for a whole lot of things, making it possible to dump any range of Ts via copy.
Just make its put method work on a variety of types and you have it.


-- 
Dmitry Olshansky

August 25, 2013

Re: Range interface for std.serialization

Posted by Dicebot
in reply to Dmitry Olshansky

On Sunday, 25 August 2013 at 08:36:40 UTC, Dmitry Olshansky wrote:
> Same thoughts here.
> Serializer is an output range for pretty much anything (that is serializable). Literally isOutputRange!T would be true for a whole lot of things, making it possible to dump any range of Ts via copy.
> Just make its put method work on a variety of types and you have it.

Can't it be both OutputRange itself and provide InputRange via `serialize` call (for filters & similar pipe processing)?
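A minimal sketch of what the InputRange half of that question could look like, assuming a toy text format. Only the name `serialize` comes from the post; the body is a hypothetical stand-in built on `map`.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : map;
import std.conv : to;

// Hypothetical: lazily serializes one element each time the
// resulting range is advanced, so it can feed filters and pipes.
auto serialize(R)(R objects)
{
    return objects.map!(o => o.to!string ~ ";");
}

unittest
{
    auto chunks = serialize([1, 2, 3]);
    assert(chunks.equal(["1;", "2;", "3;"]));
}
```

Each element of the result is the complete serialized form of one input object, which is the per-object granularity discussed earlier in the thread.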

August 25, 2013

Re: Range interface for std.serialization

Posted by Dmitry Olshansky
in reply to Dicebot

On 25-Aug-2013 23:15, Dicebot wrote:
> On Sunday, 25 August 2013 at 08:36:40 UTC, Dmitry Olshansky wrote:
>> Same thoughts here.
>> Serializer is an output range for pretty much anything (that is
>> serializable). Literally isOutputRange!T would be true for a whole lot
>> of things, making it possible to dump any range of Ts via copy.
>> Just make its put method work on a variety of types and you have it.
>
> Can't it be both OutputRange itself and provide InputRange via
> `serialize` call (for filters & similar pipe processing)?

I see that you potentially want to, say, compress serialized data on the fly via some range-based compressor. Or send it over the network... with some byChunk(favoriteBufferSize), or rather some kind of adapter that outputs no less than X bytes unless at the end. Then indeed it becomes awkward to model a 'sink' kind of range, as it is a transformer (no matter how convenient it makes putting stuff into it).

It looks like the serializer has 2 "ends" - one accepts any element type, the other produces ubyte[] chunks.

The problem is how to connect that output end; more precisely, this puts our "ranges are the pipeline" idea into an awkward situation. Basically, data may arrive in various chunks on the front end, and ditto on the output. More than that, it isn't just an input range translation to ubyte[] (at least that'd be very inefficient and restrictive). But I have an idea.

With all that said, I hopefully get to the main point. Here is an example far simpler than serialization.

No matter how we look at this there has to be a way to connect 2 sinks, say I want to do:

//can only use output range with it
formattedWrite(compressor, "Hey, %s !\n", name);

And have said compressor use LZMA on the data that is put into it, but it has to go somewhere. Thus the problem of, say, compressing formatted text is not solved by an input range, nor is the filtering of said text before 'put'-ing it somewhere.

What's lacking is a way to connect a sink to another sink.

My view of it is:
auto app = appender!(ubyte[])();
//thus compression is an output range wrapper
auto compressor = compress!LZMA(app);

In other words an output range could be a filter, or rather forwarder of the transformed result to some other output range. And we need this not only for serialization (though formattedWrite can arguably be seen as a serialization) but anytime we have to turn heterogeneous input into homogeneous output and post-process THAT output.

TL;DR: Simply put - make serialization an output range, and set an example by making archiver the first output range adapter.
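A minimal sketch of an output range adapter, i.e. a sink that forwards to another sink. Since `compress!LZMA` is not an existing Phobos function, a trivial upper-casing transform stands in for the compressor; `UpperCaser` and `upperCaser` are hypothetical names.

```d
import std.array : appender;
import std.ascii : toUpper;
import std.format : formattedWrite;
import std.range.primitives : put;

// Hypothetical adapter: transforms each character and forwards it
// to the wrapped output range.
struct UpperCaser(Sink)
{
    Sink* sink; // pointer, so copies of the adapter share one target

    // Accepting dchar lets char, wchar and dchar all convert implicitly.
    void put(dchar c)
    {
        // Leading dot: call the free function from std.range.primitives,
        // not this member.
        .put(*sink, toUpper(c));
    }
}

// Convenience constructor in the style of appender().
auto upperCaser(Sink)(ref Sink sink)
{
    return UpperCaser!Sink(&sink);
}

unittest
{
    auto app = appender!string();
    auto upper = upperCaser(app);
    // formattedWrite only ever sees an output range.
    formattedWrite(upper, "Hey, %s !\n", "name");
    assert(app.data == "HEY, NAME !\n");
}
```

A real compressor adapter would additionally need a way to flush buffered state at the end, which this sketch leaves out.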

Adapting the code by Jacob (Alternative AO2)

auto archiver = new XmlArchive!(char)(outputRange);
auto serializer = new Serializer(archiver);
serializer.put(new Object);
serializer.put([1, 2, 3, 4]); //mix and match stuff as you see fit

And even
copy(iota(1, 10), serializer);

Would all work just fine.

-- 
Dmitry Olshansky

August 26, 2013

Re: Range interface for std.serialization

Posted by Jacob Carlborg
in reply to Dmitry Olshansky

On 2013-08-25 22:50, Dmitry Olshansky wrote:

> Adapting the code by Jacob (Alternative AO2)
>
> auto archiver = new XmlArchive!(char)(outputRange);
> auto serializer = new Serializer(archiver);
> serializer.put(new Object);
> serializer.put([1, 2, 3, 4]); //mix and match stuff as you see fit
>
> And even
> copy(iota(1, 10), serializer);
>
> Would all work just fine.

I'm still worried about how to get out deserialized objects, especially if they are serialized as a range.

-- 
/Jacob Carlborg

August 26, 2013

Re: Range interface for std.serialization

Posted by Dmitry Olshansky
in reply to Jacob Carlborg

On 26-Aug-2013 11:07, Jacob Carlborg wrote:
> On 2013-08-25 22:50, Dmitry Olshansky wrote:
>
>> Adapting the code by Jacob (Alternative AO2)
>>
>> auto archiver = new XmlArchive!(char)(outputRange);
>> auto serializer = new Serializer(archiver);
>> serializer.put(new Object);
>> serializer.put([1, 2, 3, 4]); //mix and match stuff as you see fit
>>
>> And even
>> copy(iota(1, 10), serializer);
>>
>> Would all work just fine.
>
> I'm still worried about how to get out deserialized objects, especially
> if they are serialized as a range.

Array or any container should do for a range.

I'm not 100% sure what kind of interface to use, but Serializer and Deserializer should not be "shipped in one package" as in one class.
The two mirror each other but in essence are always used separately.
Ditto for archiver/unarchiver: they simply provide different functionality, and it makes no sense to reuse the same object in 2 ways.

Hence alternative 1 (unwrapping that snippet backwards):

//BTW why go new & classes here(?)
auto unarchiver = new XmlUnarchiver(someCharRange);
auto deserializer = new Deserializer(unarchiver);
auto obj = deserializer.unpack!Object;

//for sequence/array in underlying format it would use any container
List!int list = deserializer.unpack!(List!int);
int[] arr = deserializer.unpack!(int[]);

IMO looks quite nice. The problem of how exactly should a container be filled is open though.

So another alternative, being more generic (the above could be considered a convenience over this one):

Vector!int ints;
deserializer.unpackRange!(int)(x => ints.pushBack(x));

Basically unpack next sequence of data (as serialized) by feeding it to an output range using element type as param. And a simple lambda qualifies as an output range.
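A minimal sketch of the `unpackRange` idea, assuming a toy deserializer over already-decoded values. `ToyDeserializer` is hypothetical; the part that relies on existing Phobos behaviour is that a delegate with a typed parameter satisfies `isOutputRange`.

```d
import std.conv : to;
import std.range.primitives : isOutputRange, put;

// Hypothetical deserializer; the int[] stands in for an unarchiver
// that has already decoded a sequence.
struct ToyDeserializer
{
    int[] pending;

    // Feeds every remaining element to `sink`, converted to T.
    void unpackRange(T, Sink)(Sink sink)
    if (isOutputRange!(Sink, T))
    {
        foreach (item; pending)
            put(sink, item.to!T);
        pending = null;
    }
}

unittest
{
    auto d = ToyDeserializer([1, 2, 3]);
    int[] ints;
    // The parameter type is spelled out so the literal is a real
    // delegate rather than a template lambda.
    d.unpackRange!int((int x) { ints ~= x; });
    assert(ints == [1, 2, 3]);
}
```

The caller decides how the container is filled, so the deserializer does not need to know about `Vector`, `List`, or any other container type.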

Also take a look at the new digest API. I have an understanding that serialization would do well to take the same general strategy - concrete archivers as structs + polymorphic interface and wrappers on top.

I'm still missing something about separation of archiver and serializer but in my mind these are tightly coupled and may as well be one entity.
One tough little thing to take care of in std.serialization is how to reduce the amount of constant overhead (indirections, function calls, branches etc.) per item. Polymorphism is easily achieved on top of a fast and tight core; the other way around is impossible.

-- 
Dmitry Olshansky

August 26, 2013

Re: Range interface for std.serialization

Posted by Dmitry Olshansky
in reply to Dmitry Olshansky

On 26-Aug-2013 00:50, Dmitry Olshansky wrote:
> On 25-Aug-2013 23:15, Dicebot wrote:
>> On Sunday, 25 August 2013 at 08:36:40 UTC, Dmitry Olshansky wrote:
>>> Same thoughts here.
>>> Serializer is an output range for pretty much anything (that is
>>> serializable). Literally isOutputRange!T would be true for a whole lot
>>> of things, making it possible to dump any range of Ts via copy.
>>> Just make its put method work on a variety of types and you have it.
>>
>> Can't it be both OutputRange itself and provide InputRange via
>> `serialize` call (for filters & similar pipe processing)?
>
[...]

> What's lacking is a way to connect a sink to another sink.
>
> My view of it is:
> auto app = appender!(ubyte[])();
> //thus compression is an output range wrapper
> auto compressor = compress!LZMA(app);
>
> In other words an output range could be a filter, or rather forwarder of
> the transformed result to some other output range. And we need this not
> only for serialization (though formattedWrite can arguably be seen as a
> serialization) but anytime we have to turn heterogeneous input into
> homogeneous output and post-process THAT output.

On that subject, we can do some cool things by providing such adapters. For example, calculating the SHA1 hash of a message on the fly:

https://gist.github.com/blackwhale/6339932

As a proof of concept to show the power that output range adapters possess :)

Sadly it hits a bug in LockingTextWriter: namely, the destructor fails on T.init (a common oversight). Patch:

--- a/std/stdio.d
+++ b/std/stdio.d
@@ -1517,9 +1517,12 @@ $(D Range) that locks the file and allows fast writing to it.

         ~this()
         {
-            FUNLOCK(fps);
-            fps = null;
-            handle = null;
+            if(fps)
+            {
+                FUNLOCK(fps);
+                fps = null;
+                handle = null;
+            }
         }

         this(this)

-- 
Dmitry Olshansky

August 26, 2013

Re: Range interface for std.serialization

Posted by Jacob Carlborg
in reply to Dmitry Olshansky

On 2013-08-26 11:23, Dmitry Olshansky wrote:

> Array or any container should do for a range.

But then it won't be lazy, or perhaps that's not a problem, since the whole deserializing should be lazy.

> I'm not 100% sure what kind of interface to use, but Serializer and
> Deserializer should not be "shipped in one package" as in one class.
> The two mirror each other but in essence are always used separately.
> Ditto about archiver/unarchiver they simply provide different
> functionality and it makes no sense to reuse the same object in 2 ways.
>
> Hence alternative 1 (unwrapping that snippet backwards):
>
> //BTW why go new & classes here(?)

The reason to have classes is that I need reference types. I need to pass the serializer to "toData" and "fromData" methods that can be implemented on the objects being (de)serialized. I guess they could take the argument by ref. Is it possible to force that?
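A minimal sketch of one way a library could check this at compile time, assuming a struct-based serializer. The `Serializer`, `Good`, and `Bad` types and the `takesSerializerByRef` helper are hypothetical; `ParameterStorageClassTuple` is the existing `std.traits` template.

```d
import std.traits : ParameterStorageClass, ParameterStorageClassTuple;

// Hypothetical struct serializer with some mutable state.
struct Serializer
{
    int calls;
}

struct Good
{
    void toData(ref Serializer s) { ++s.calls; }
}

struct Bad
{
    // Takes a copy, so changes are lost to the caller.
    void toData(Serializer s) { ++s.calls; }
}

// True when T.toData declares its first parameter as ref.
enum takesSerializerByRef(T) =
    (ParameterStorageClassTuple!(T.toData)[0] & ParameterStorageClass.ref_) != 0;

static assert(takesSerializerByRef!Good);
static assert(!takesSerializerByRef!Bad);

unittest
{
    Serializer s;
    Good g;
    g.toData(s);
    g.toData(s);
    assert(s.calls == 2); // mutations are visible through ref
}
```

The serializer could place such a `static assert` in front of every call to a user-supplied `toData` or `fromData`, rejecting by-value signatures with a clear error message.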

> auto unarchiver = new XmlUnarchiver(someCharRange);
> auto deserializer = new Deserializer(unarchiver);
> auto obj = deserializer.unpack!Object;
>
> //for sequence/array in underlying format it would use any container
> List!int list = deserializer.unpack!(List!int);
> int[] arr = deserializer.unpack!(int[]);
>
> IMO looks quite nice. The problem of how exactly should a container be
> filled is open though.
>
> So another alternative, being more generic (the above could be considered
> a convenience over this one):
>
> Vector!int ints;
> deserializer.unpackRange!(int)(x => ints.pushBack(x));
>
> Basically unpack next sequence of data (as serialized) by feeding it to
> an output range using element type as param. And a simple lambda
> qualifies as an output range.

Here we have yet another suggestion for an API. The whole reason for this thread is that people weren't happy with the current interface, i.e. it is not range-based. Now we probably have just as many suggestions as people who have replied to this thread. I still don't know what the API should look like.

> Also take a look at the new digest API. I have an understanding that
> serialization would do well to take the same general strategy - concrete
> archivers as structs + polymorphic interface and wrappers on top.

I could have a look at that.

> I'm still missing something about separation of archiver and serializer
> but in my mind these are tightly coupled and may as well be one entity.
> One tough little thing to take care of in std.serialization is how to
> reduce amount of constant overhead (indirections, function calls,
> branches etc.) per item. Polymorphism is easily achieved on top of fast
> and tight core the other way around is impossible.

-- 
/Jacob Carlborg

August 26, 2013

Re: Range interface for std.serialization

Posted by Dicebot
in reply to Dmitry Olshansky

On Monday, 26 August 2013 at 09:23:32 UTC, Dmitry Olshansky wrote:
> I'm still missing something about separation of archiver and serializer but in my mind these are tightly coupled and may as well be one entity.

For me the distinction was very natural. The `(de)serializer` is something that takes care of D type introspection and provides it in simplified form to the `(de)archiver`, which embeds the actual format knowledge. The former can get pretty tricky in D, so it makes some sense to keep it separate.
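A minimal sketch of that split, assuming a toy key=value format. `KeyValueArchiver`, `Serializer`, and `Point` are hypothetical and only illustrate which side knows about D types and which side knows about the format.

```d
import std.conv : to;

// Hypothetical archiver: knows only the output format.
struct KeyValueArchiver
{
    string data;

    void archive(string key, string value)
    {
        data ~= key ~ "=" ~ value ~ "\n";
    }
}

// Hypothetical serializer: knows only how to walk D types.
struct Serializer(Archiver)
{
    Archiver archiver;

    void put(T)(T value)
    if (is(T == struct))
    {
        // Compile-time introspection over the struct's fields;
        // the loop is unrolled once per field.
        foreach (i, field; value.tupleof)
            archiver.archive(__traits(identifier, T.tupleof[i]),
                             field.to!string);
    }
}

struct Point
{
    int x;
    int y;
}

unittest
{
    Serializer!KeyValueArchiver s;
    s.put(Point(1, 2));
    assert(s.archiver.data == "x=1\ny=2\n");
}
```

Because the archiver is a template parameter, swapping the format does not touch the introspection code, and there is no virtual call per field.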

I can't really add anything on ranges part of your comments - sounds like you have a better "big picture" anyway :)

Copyright © 1999-2021 by the D Language Foundation