std.serialization: pre-voting review / discussion (page 11) - D Programming Language Discussion Forum

August 20, 2013

Re: std.serialization: pre-voting review / discussion

Posted by Johannes Pfau
in reply to ilya-stromberg

On Tue, 20 Aug 2013 10:40:57 +0200,
"ilya-stromberg" <ilya-stromberg-2009@yandex.ru> wrote:

> On Tuesday, 20 August 2013 at 03:42:48 UTC, Tyler Jameson Little wrote:
> > On Monday, 19 August 2013 at 18:06:00 UTC, Johannes Pfau wrote:
> >> An important question regarding ranges for std.serialization
> >> is whether
> >> we want it to work as an InputRange or if it should _take_ an
> >> OutputRange. So the question is
> >>
> >> -----------------
> >> auto archive = new Archive();
> >> Serializer(archive).serialize(object);
> >> //Archive takes OutputRange, writes to it
> >> archive.writeTo(OutputRange);
> >>
> >> vs
> >>
> >> auto archive = new Archive();
> >> Serializer(archive).serialize(object);
> >> //Archive implements InputRange for ubyte[]
> >> foreach(ubyte[] data; archive) {}
> >> -----------------
> >>
> >> I'd use the first approach as it should be simpler to
> >> implement. The
> >> second approach would be useful if the ubyte[] elements were
> >> processed
> >> via other ranges (map, take, ...). But as binary data is
> >> usually
> >> not processed in this way but just stored to disk or sent over
> >> network
> >> (basically streaming operations) the first approach should be
> >> fine.
> >
> > +1 for the first way.
> 
> No, you are WRONG. InputRange is MORE flexible: it can be lazy or eager. OutputRange is only eager. As we know, lazy ranges are required wherever possible:
> 
> On Sunday, 18 August 2013 at 18:26:55 UTC, Dicebot wrote:
> > So as a review manager, I think voting should be delayed until API is ready to address lazy range-based work model. No actual implementation is required but
> >
> > 1) it should be possible to do it later without breaking user
> > code
> > 2) library should not make an assumption about implementation
> > being lazy or eager
> 
> We can use InputRange like this:
> 
> import std.file;
> auto archive = new Archive();
> Serializer(archive).serialize(object);
> //Archive implements InputRange for ubyte[]
> write("file", archive);

Yes, InputRange is more flexible, but it's also more difficult to
implement and less efficient:
What happens between the 'serialize' and the 'write' call? The Archive
has to cache the data, either the original object or the final
produced data in a ubyte[] buffer.
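
To make the trade-off concrete, here is a minimal sketch of both styles. `archiveTo` and `BufferedArchive` are made-up names for illustration, not part of std.serialization, and the "format" is just the four bytes of an int in little-endian order. The point is only that the InputRange style needs somewhere to keep the produced data until the consumer asks for it.

```d
import std.array : appender;
import std.range.primitives : isInputRange, isOutputRange, put;

// Approach 1: bytes are pushed into a caller-supplied sink as they are produced.
// (Hypothetical function, illustration only.)
void archiveTo(Sink)(ref Sink sink, int value)
    if (isOutputRange!(Sink, ubyte))
{
    foreach (shift; [0, 8, 16, 24])
        put(sink, cast(ubyte)(value >> shift));
}

// Approach 2: the archive caches every produced chunk and is itself an
// InputRange of ubyte[] that can be consumed later.
struct BufferedArchive
{
    private ubyte[][] chunks; // the cache the InputRange style cannot avoid

    void archive(int value)
    {
        auto app = appender!(ubyte[])();
        archiveTo(app, value);
        chunks ~= app.data;
    }

    @property bool empty() const { return chunks.length == 0; }
    @property ubyte[] front() { return chunks[0]; }
    void popFront() { chunks = chunks[1 .. $]; }
}

static assert(isInputRange!BufferedArchive);
```

With the first style nothing is held back between producing and writing; with the second, `chunks` grows until somebody iterates over the archive.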

> 
> Another benefit: we can process InputRange. For example, if we
> have
> ZipRange zip(InputRange)
> function, it's easy to compress data:
> write("file", zip(archive));
> 
> Another example: we would like to change output xml file and filter some data (because we already have it). Or we would like to transform output xml to the html web page. No problems:

Filtering is easier with an InputRange. "Zip-Streams" on the other hand should be OutputRanges and therefore work fine with both approaches.

> XmlRange transformXml(InputRange);
> write("file", transformXml(archive));
> 
> Ideas?

The question is whether there are real-world examples where this is useful. You have to weigh the utility of this approach against its more complicated and less efficient implementation.

August 20, 2013

Re: std.serialization: pre-voting review / discussion

Posted by Dicebot
in reply to Jacob Carlborg

OK, I was trying to avoid expressing a personal opinion until now and mostly kept track of others' comments, but now that I have started reading the docs/sources in detail, I will step down from the review manager role for a moment and do some very subjective reviewing :)

-----------------------

Hot topic first: ranges. As far as I can see, it is not about "let's stick a range API in wherever possible because that is the way Phobos does things". The key point here is to recognize the use cases that are likely to require a range-based interface and focus on them.

As far as I can see, there are two important places where a range-based API can be helpful: providing values to serialize and providing raw data to deserialize, along with the matching Archiver changes.

The former is relatively trivial: "serialize" should have an overload that accepts an InputRange of monotyped values and returns a ForwardRange that serializes the values one by one, lazily. The same goes for the archiver.

The latter is a bit more interesting. It would be cool if, instead of accepting a raw data chunk that matches the size of the deserialized object, serializer.deserialize could accept an InputRange that provides a sequence of arbitrary chunks of raw data and use it to construct values on a per-request basis, lazily. This will require maintaining a buffer that keeps the unconsumed remainder of the last chunk, and making some decisions about the behavior when "empty()" is hit before there is enough data to deserialize an object.

But this is not something you should care about right now, because only the actual function/method signatures are needed, with static asserts inside; the actual implementation can be added later by anyone willing to spend the time.
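
For what it is worth, the buffering described above (keeping the unconsumed remainder of the last chunk) could look roughly like the sketch below. `ChunkReader` and `chunkReader` are hypothetical names, nothing like them exists in the proposed module, and returning `false` when the source runs dry is just one possible policy for the "empty() before enough data" case.

```d
import std.algorithm.comparison : min;
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;

// Hypothetical helper: pulls fixed-size pieces out of an InputRange of
// arbitrarily sized chunks.
struct ChunkReader(R)
    if (isInputRange!R && is(ElementType!R : const(ubyte)[]))
{
    private R source;
    private const(ubyte)[] rest; // unconsumed remainder of the current chunk
    private bool loaded;         // true while `rest` still refers to source.front

    this(R source) { this.source = source; }

    // Fills `dest` completely, or returns false if the source is exhausted first.
    bool read(ubyte[] dest)
    {
        size_t filled = 0;
        while (filled < dest.length)
        {
            if (rest.length == 0)
            {
                // Pop only after the chunk is used up, since some ranges
                // reuse their buffer on popFront.
                if (loaded) { source.popFront(); loaded = false; }
                if (source.empty)
                    return false;
                rest = source.front;
                loaded = true;
                continue;
            }
            immutable n = min(rest.length, dest.length - filled);
            dest[filled .. filled + n] = rest[0 .. n];
            filled += n;
            rest = rest[n .. $];
        }
        return true;
    }
}

auto chunkReader(R)(R source) { return ChunkReader!R(source); }
```

A deserializer built on top of this would call `read` with a buffer of the size it needs for the next value, regardless of how the incoming data happens to be chunked.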

-----------------------

Now about my personal feelings about std.serialization as a potential user. The core functionality I'd like to see in such a module is the ability to dump the state of a D data type into arbitrary formats in a robust way that requires minimal interference from user code. Something like what is done with toJSON/fromJSON in the vibe.d API, but more generic when it comes to output formats and more robust when it comes to the data hierarchies to load/store.

Judging by the examples and documentation, this is exactly what std.serialization does, and I like it. It lacks some better output (Archiver) choices, but that is more Phobos's fault.

What I really don't like is the excessive number of objects in the API. For example, I have found no reason why I need to create a serializer object simply to dump a struct's state. It is both boilerplate and runtime overhead I can't justify. The only state the serializer has is the archiver, and that is simply a collection of methods on its own. I would prefer to be able to do something like `auto data = serialize!XmlArchiver(value);`

That is not something that would make me vote against inclusion (I think the module is much needed anyway), but it might discourage me from using this part of Phobos and make me fall into some NIH syndrome.

I personally found the documentation complete enough to get a basic understanding, but one thing that caused some frustration is that the docs don't make a clear distinction between the minimal stuff and extra features. For example, there is https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializable.html - my guess is that it is only used if the user wants to override the default serialization method for an aggregate type. But its documentation is written in such a manner that it gives the impression that it is absolutely required.

-----------------------

The last thing is not really relevant here but is more of a general documentation problem. This may be the first package that makes use of the new "package.d" system, and it shows that we need some way to provide package-wide documentation to keep things clear. I guess for DDOC itself generating output from package.d is nothing special - but what about dlang.org? How hard will it be to update a documentation page to support its own block for package roots?

August 20, 2013

Re: std.serialization: pre-voting review / discussion

Posted by Jacob Carlborg
in reply to Walter Bright

On 2013-08-20 10:01, Walter Bright wrote:

> Thank you, Jacob. It looks like you've put a lot of nice work into this.
>
> I've perused the documentation, and all I can think of is "What's a cubit?"
>
> http://www.youtube.com/watch?v=so9o3_daDZw
>
> I.e. there are 9 documentation pages of details. There's no obvious
> place to start, no overview, no explanation of what serialization is for
> and why I might want to use it and what's great about this
> implementation. At least none that I could find. Also needs some
> non-trivial canonical example code.
>
> Something that answers who what where when why and how would be
> immensely useful.

Yes, I need to add some overview documentation. There's still the problem of finding the overview.

> Some nits:
>
> https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializationexception.html
>
>
> Something went horribly wrong here:
> ----------------
> Parameters:
> Exception exception the exception exception to wrap
> ----------------

Hehe, yeah :)

> https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_registerwrapper.html
>
>
> Lacks an illuminating example.

That doesn't need to be ddoc comments at all. The whole module is declared "package". It would be really nice if ddoc could automatically hide anything that wasn't public or protected but still generate the documentation for package and private.

> https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializer.html
>
>
> When would I use a struct Array or a struct Slice?

Same as above. I'll see if they really have to be public.

> https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_attribute.html
>
>
> struct attribute should be capitalized. When would I use an attribute?
> Does this have anything to do with User Defined Attributes? Need a
> canonical example.

Same as above.

I have used lower case because I don't consider this a struct, although technically it is one. This is an attribute (UDA), and I think attributes should be lower case. Or rather, it's supposed to be used on types to indicate that they are UDAs:

@attribute struct foo {}

The reason for this is that I'm a bit disappointed in the implementation of UDAs in D. I would have liked to have some kind of entity that I can point to and say "this is an attribute". Currently any random value or type can be used as a UDA, and I don't like that.

It's the same idea as having the "interface" and "abstract" keywords. It's possible to do without them, as C++ does, but I think it's a lot better to have them.
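
As a rough illustration of the `@attribute` marker convention, the sketch below shows how such a marker could be declared and checked at compile time. `nonSerialized`, `Config` and `isAttribute` are hypothetical names chosen for the example; `hasUDA` is from std.traits in current Phobos.

```d
import std.traits : hasUDA;

// Marker type: tagging a type with @attribute declares "this is meant to be a UDA".
struct attribute {}

// A hypothetical attribute, declared with the marker.
@attribute struct nonSerialized {}

// A hypothetical user type that applies the attribute to one field.
struct Config
{
    int port;
    @nonSerialized int cachedHandle;
}

// True when T carries the @attribute marker.
enum isAttribute(T) = hasUDA!(T, attribute);

static assert(isAttribute!nonSerialized);
static assert(!isAttribute!Config);
static assert(hasUDA!(Config.cachedHandle, nonSerialized));
static assert(!hasUDA!(Config.port, nonSerialized));
```

The language itself does not enforce anything here; the marker only gives library code and readers something to check against.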

> https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_archives_archive.html
>
>
> Aren't interfaces already abstract? I.e. abstract is redundant.

I have no idea why "abstract" is added there. The definition looks like this:

https://github.com/jacob-carlborg/phobos/blob/serialization/std/serialization/archives/archive.d#L88

> The documentation defines an archive more or less as an archive. I still
> don't know what an archive is.

"The archive is the backend in the serialization process."

And

"The archive is responsible for archiving primitive types in the format chosen by the archive implementation. The archive ensures that all types are properly archived in a format that can be later unarchived."

> (E.g. a zip file is an archive - can this create zip files?)

Theoretically one can create an archive that serializes to a zip file, yes. Or rather the format used by zip. An archive shouldn't write to disk.

-- 
/Jacob Carlborg

August 20, 2013

Re: std.serialization: pre-voting review / discussion

Posted by Daniel Murphy
in reply to Dicebot

"Dicebot" <public@dicebot.lv> wrote in message news:luhuyerzmkebcltxhgjj@forum.dlang.org...
>
> What I really don't like is the excessive number of objects in the API. For example, I have found no reason why I need to create a serializer object simply to dump a struct's state. It is both boilerplate and runtime overhead I can't justify. The only state the serializer has is the archiver, and that is simply a collection of methods on its own. I would prefer to be able to do something like `auto data = serialize!XmlArchiver(value);`
>

I think this is very important.  Simple uses should be as simple as possible.

August 20, 2013

Re: std.serialization: pre-voting review / discussion

Posted by Tyler Jameson Little
in reply to Daniel Murphy

On Tuesday, 20 August 2013 at 13:44:01 UTC, Daniel Murphy wrote:
> "Dicebot" <public@dicebot.lv> wrote in message
> news:luhuyerzmkebcltxhgjj@forum.dlang.org...
>>
>> What I really don't like is the excessive number of objects in the API. For example, I have found no reason why I need to create a serializer object simply to dump a struct's state. It is both boilerplate and runtime overhead I can't justify. The only state the serializer has is the archiver, and that is simply a collection of methods on its own. I would prefer to be able to do something like `auto data = serialize!XmlArchiver(value);`
>>
>
> I think this is very important.  Simple uses should be as simple as
> possible.

+1

This would enhance the 1-liner: write("file", serialize!XmlArchiver(InputRange));

We could even make nearly everything private except an isArchiver() template and serialize!().
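
Something along these lines, perhaps. This is only a sketch of the idea: `TextArchiver` is a toy stand-in (not the real XmlArchiver), the `archive`/`data` members that `isArchiver` checks for are my assumption about what an archiver would expose, and the struct handling is deliberately naive.

```d
import std.conv : to;

// Hypothetical trait: an archiver can archive a value and hand back its data.
enum isArchiver(A) = is(typeof((A a) {
    a.archive(0);
    const(char)[] d = a.data;
}));

// Toy archiver for illustration: writes every value as text followed by ';'.
struct TextArchiver
{
    private string buf;
    void archive(T)(T value) { buf ~= value.to!string ~ ";"; }
    @property string data() const { return buf; }
}

// Convenience function in the spirit of `serialize!XmlArchiver(value)`.
auto serialize(Archiver, T)(T value)
    if (isArchiver!Archiver)
{
    Archiver archiver;
    static if (is(T == struct))
    {
        // Archive each field of the struct in declaration order.
        foreach (field; value.tupleof)
            archiver.archive(field);
    }
    else
    {
        archiver.archive(value);
    }
    return archiver.data;
}

// Example type used to try the function out.
struct Point { int x; int y; }
```

With this in place, `serialize!TextArchiver(Point(1, 2))` needs no explicit serializer object at the call site.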

August 20, 2013

Re: std.serialization: pre-voting review / discussion

Posted by Walter Bright
in reply to Jacob Carlborg

On 8/20/2013 6:28 AM, Jacob Carlborg wrote:
> That doesn't need to be ddoc comments at all. The whole module is declared
> "package". It would be really nice if ddoc could automatically hide anything that
> wasn't public or protected but still generate the documentation for package and
> private.

You can hide comments from ddoc by not starting them with /** but with /*
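
A tiny example of the difference, with made-up function names:

```d
/** Archives a value. This comment shows up in the generated documentation. */
int archiveValue(int value) { return value; }

/* Internal helper. A plain block comment like this one is ignored by ddoc. */
int internalHelper(int value) { return value + 1; }
```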

> I have no idea why "abstract" is added there. The definition looks like this:
>
> https://github.com/jacob-carlborg/phobos/blob/serialization/std/serialization/archives/archive.d#L88

Hmm. That looks then like a ddoc bug.

>> The documentation defines an archive more or less as an archive. I still
>> don't know what an archive is.
>
> "The archive is the backend in the serialization process."

Doesn't make sense to me. I would think the archive would be what is created, not the creator.

> And
>
> "The archive is responsible for archiving primitive types in the format chosen
> by the archive implementation. The archive ensures that all types are properly
> archived in a format that can be later unarchived."

What confuses me here is the conflation between the archiveR and the resulting archive, i.e. "an archiver creates an archive". Saying "archive creates the archive" is a bit of a disastrous conflation of the terms, as it makes the documentation a constant source of confusion.

>> (E.g. a zip file is an archive - can this create zip files?)
>
> Theoretically one can create an archive that serializes to a zip file, yes. Or
> rather the format used by zip. An archive shouldn't write to disk.

Some exposition of this is necessary, along with some comments along the line that the package provides a generic archiving interface, and a couple implementations X and Y of that interface, and that other implementations such as Z, the zip archiver, are possible.

August 20, 2013

Re: std.serialization: pre-voting review / discussion

Posted by Jesse Phillips
in reply to Jacob Carlborg

On Monday, 19 August 2013 at 16:29:54 UTC, Jacob Carlborg wrote:
> On 2013-08-19 17:41, Jesse Phillips wrote:
>
>> Code has moved to https://github.com/opticron/ProtocolBuffer
>
> Does it have any utility functions that are fairly standalone to handle the basic types, i.e. int, string, float and so on?

The data conversions are handled by
https://github.com/opticron/ProtocolBuffer/blob/master/conversion/pbbinary.d

August 20, 2013

Re: std.serialization: pre-voting review / discussion

Posted by Jacob Carlborg
in reply to Dicebot

On 2013-08-20 15:12, Dicebot wrote:

> What I really don't like is the excessive number of objects in the API. For
> example, I have found no reason why I need to create a serializer object
> simply to dump a struct's state. It is both boilerplate and runtime
> overhead I can't justify. The only state the serializer has is the archiver,
> and that is simply a collection of methods on its own. I would prefer to be
> able to do something like `auto data = serialize!XmlArchiver(value);`

I have been planning to add a function like that but just haven't got around to doing it. It is just a convenience function that is easy to add.

Some reasons for having an object oriented API are:

* The serializer does have state. It stores information about what has been serialized and keeps track of things such as making sure that an object is not stored more than once in the archive.

* When doing custom serialization the serializer is passed to the methods: https://github.com/jacob-carlborg/orange/wiki/Custom-Serialization
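
To illustrate the first point, here is a deliberately simplified sketch of the kind of bookkeeping involved. `Node` and `SketchSerializer` are invented for the example and the output format is made up; the real serializer works differently in the details, but the need for per-serialization state is the same.

```d
import std.conv : to;

// Hypothetical object type that can form cycles.
final class Node
{
    int value;
    Node next;
    this(int value) { this.value = value; }
}

struct SketchSerializer
{
    private size_t[void*] ids; // object address -> id it was stored under
    string output;

    void serialize(Node node)
    {
        if (node is null)
        {
            output ~= "null;";
            return;
        }
        auto key = cast(void*) node;
        if (auto known = key in ids)
        {
            // Already stored once: emit a reference instead of a second copy.
            output ~= "ref(" ~ (*known).to!string ~ ");";
            return;
        }
        immutable newId = ids.length;
        ids[key] = newId;
        output ~= "obj(" ~ newId.to!string ~ "," ~ node.value.to!string ~ ");";
        serialize(node.next);
    }
}
```

Without the `ids` table, two nodes pointing at each other would recurse forever, and shared objects would be duplicated in the archive.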

> I have found documentation complete enough to get a basic understanding
> personally but one thing that has caused some frustration is that docs
> don't make clear distinction between minimal stuff and extra features.
> For example, there is
> https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializable.html
> - my guess that it is only used if user wants to override default
> serialization method for an aggregate type. But documentation for it is
> written in such manner that it gives an impression that it is absolutely
> required.

Ok, I can try and clarify that.

-- 
/Jacob Carlborg

August 20, 2013

Re: std.serialization: pre-voting review / discussion

Posted by Jacob Carlborg
in reply to Tyler Jameson Little

On 2013-08-20 17:07, Tyler Jameson Little wrote:

> +1
>
> This would enhance the 1-liner: write("file",
> serialize!XmlArchiver(InputRange));
>
> We could even make nearly everything private except an isArchiver()
> template and serialize!().

The rest of the API is needed for more advanced use cases.

-- 
/Jacob Carlborg

August 20, 2013

Re: std.serialization: pre-voting review / discussion

Posted by Jacob Carlborg
in reply to Walter Bright

On 2013-08-20 20:04, Walter Bright wrote:

> You can hide comments from ddoc by not starting them with /** but with /*

Yeah, I know that.

> Doesn't make sense to me. I would think the archive would be what is
> created, not the creator.

I guess it could be called "archiver", or do you have a better suggestion?

> What confuses me here is the conflation between the archiveR and the
> resulting archive, i.e. "an archiver creates an archive". Saying
> "archive creates the archive" is a bit of a disastrous conflation of the
> terms, as it makes the documentation a constant source of confusion.

Would calling it "archiver" or some other name be better?

> Some exposition of this is necessary, along with some comments along the
> line that the package provides a generic archiving interface, and a
> couple implementations X and Y of that interface, and that other
> implementations such as Z, the zip archiver, are possible.

I don't understand what's so confusing.

"This is the interface all archive implementations need to implement to be able to be used as an archive with the serializer".

-- 
/Jacob Carlborg
