August 19, 2013 Re: std.serialization: pre-voting review / discussion
Posted in reply to Jesse Phillips | On 2013-08-19 17:41, Jesse Phillips wrote:
> Code has moved to https://github.com/opticron/ProtocolBuffer
Does it have any utility functions that are fairly standalone to handle the basic types, i.e. int, string, float and so on?
--
/Jacob Carlborg
August 19, 2013 Re: std.serialization: pre-voting review / discussion
Posted in reply to Tyler Jameson Little | On Mon, 19 Aug 2013 16:21:44 +0200, "Tyler Jameson Little" <beatgammit@gmail.com> wrote:
> On Monday, 19 August 2013 at 13:31:27 UTC, Jacob Carlborg wrote:
> > On 2013-08-19 15:03, Dicebot wrote:
> >
> >> Great! Are there any difficulties with the input?
> >
> > It's just that I don't clearly know what the code will need to look like, and I'm not particularly familiar with implementing range-based code.
>
> Maybe we need some kind of doc explaining the idiomatic usage of ranges?
>
> Personally, I'd like to do something like this:
>
>     auto archive = new XmlArchive!(char); // create an XML archive
>     auto serializer = new Serializer(archive); // create the serializer
>     serializer.serialize(foo);
>
>     pipe(archive.out, someFile);
Your "pipe" function is the same as std.algorithm.copy(InputRange, OutputRange) or std.range.put(OutputRange, InputRange);
An important question regarding ranges for std.serialization is whether we want it to work as an InputRange or if it should _take_ an OutputRange. So the question is
-----------------
auto archive = new Archive();
Serializer(archive).serialize(object);
//Archive takes OutputRange, writes to it
archive.writeTo(OutputRange);

vs

auto archive = new Archive();
Serializer(archive).serialize(object);
//Archive implements InputRange for ubyte[]
foreach(ubyte[] data; archive) {}
-----------------
I'd use the first approach as it should be simpler to implement. The second approach would be useful if the ubyte[] elements were processed via other ranges (map, take, ...). But as binary data is usually not processed in this way but just stored to disk or sent over network (basically streaming operations) the first approach should be fine.
The first approach has the additional benefit that we can easily do streaming like this:
----------------
auto archive = new Archive(OutputRange);
//Immediately write the data to the output range
Serializer(archive).serialize([1,2,3]);
----------------
This is difficult to implement with the second approach as you somehow have to interleave calls to serialize and reads from the InputRange interface:
------------
Serializer(archive).serialize(1);
foreach(data; archive) {stdout.write(data);}
Serializer(archive).serialize(2);
foreach(data; archive) {stdout.write(data);}
------------
And it's still less efficient than approach 1 as it has to keep an internal buffer.
Another point is that "serialize" in the above example could be renamed to "put". This way Serializer would itself be an OutputRange which allows stuff like
    [1,2,3,4,5].stride(2).take(2).copy(archive);
Then serialize could also accept InputRanges to allow this:
    archive.serialize([1,2,3,4,5].stride(2).take(2));
However, this use case is already covered by using copy so it would just be for convenience.
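For illustration, here is a minimal sketch of the first approach, where the archive is handed an output range up front and forwards each value to it immediately. `TextArchive`, this cut-down `Serializer`, the `serializeToText` helper and the newline-separated text format are all hypothetical stand-ins, not the API under review; only `put`, `appender`, `to`, `isArray` and `isSomeString` are real Phobos symbols.

```d
import std.array : appender;
import std.conv : to;
import std.range : put;
import std.traits : isArray, isSomeString;

// Hypothetical archive (not the reviewed API): it owns no buffer and
// forwards every value to the sink the moment it is written.
struct TextArchive(Sink)
{
    Sink* sink; // pointer, so the caller's sink sees the writes

    void write(T)(T value)
    {
        put(*sink, value.to!string);
        put(*sink, '\n');
    }
}

// Hypothetical serializer: recurses into arrays, writes everything else.
struct Serializer(Archive)
{
    Archive* archive;

    void serialize(T)(T value)
    {
        static if (isArray!T && !isSomeString!T)
        {
            foreach (element; value)
                serialize(element);
        }
        else
        {
            archive.write(value);
        }
    }
}

// Convenience helper for the example: serialize `value` into a string.
string serializeToText(T)(T value)
{
    auto sink = appender!string();
    auto archive = TextArchive!(typeof(sink))(&sink);
    auto serializer = Serializer!(typeof(archive))(&archive);
    serializer.serialize(value);
    return sink.data;
}

unittest
{
    // Each element reaches the sink as soon as it is serialized.
    assert(serializeToText([1, 2, 3]) == "1\n2\n3\n");
}
```

The archive stores a pointer to the sink rather than a copy because a freshly created `Appender` that is copied before its first write does not share storage with the original. Any other output range of characters could presumably be plugged in the same way.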
August 19, 2013 Re: std.serialization: pre-voting review / discussion
Posted in reply to Johannes Pfau | On 19-Aug-2013 22:05, Johannes Pfau wrote:
> On Mon, 19 Aug 2013 16:21:44 +0200,
> "Tyler Jameson Little" <beatgammit@gmail.com> wrote:
>
>
> Another point is that "serialize" in the above example could be
> renamed to "put". This way Serializer would itself be an OutputRange
> which allows stuff like [1,2,3,4,5].stride(2).take(2).copy(archive);
>
+1
I totally expect serializer to be a sink.
> Then serialize could also accept InputRanges to allow this:
> archive.serialize([1,2,3,4,5].stride(2).take(2));
> However, this use case is already covered by using copy so it would just
> be for convenience.
>
--
Dmitry Olshansky
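A rough sketch of what "serializer as a sink" could look like: once a type exposes `put`, Phobos treats it as an output range and `copy` works on it unchanged. `SinkSerializer` and its semicolon-separated output are made up for this example and are not part of the proposed module; `copy`, `stride`, `take` and `isOutputRange` are the real Phobos functions.

```d
import std.algorithm : copy;
import std.conv : to;
import std.range : isOutputRange, stride, take;

// Hypothetical sink-style serializer: `put` is its only primitive,
// which is all Phobos needs to treat it as an output range of int.
struct SinkSerializer
{
    string* output; // where the serialized text accumulates

    void put(int value)
    {
        *output ~= value.to!string ~ ";";
    }
}

static assert(isOutputRange!(SinkSerializer, int));

unittest
{
    string output;
    auto serializer = SinkSerializer(&output);

    // Every second element, at most two of them: 1 and 3.
    [1, 2, 3, 4, 5].stride(2).take(2).copy(serializer);
    assert(output == "1;3;");
}
```

Since `copy` receives the sink by value, the sketch keeps the result behind a pointer so the caller can still see what was written.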
August 19, 2013 Re: std.serialization: pre-voting review / discussion
Posted in reply to bsd | On Monday, 19 August 2013 at 14:47:15 UTC, bsd wrote:
> I think this versioning idea is more important for protocol buffers, msgpack, thrift like libraries that use a separate IDL schema and IDL-compiled code. std.serialization uses the D code itself to serialize so the version is practically dictated by the user. It may as well be manually handled....as long as it throws/returns error and doesn't crash if one tries to deserialize an archive type into a different/modified D type.
>
> From memory the Protocol Buffers versioning is to ensure schema generated code and library are compatible. You get compile errors if you try to compile IDL generated code with a newer version of the library. Similarly you get runtime errors if you deserialize data that was serialized with an older version of the library. This is all from memory so I could be wrong...
Seems like your memory has indeed faded a bit. ;)
Versioning is an integral idea of formats like Protobuf and Thrift. For example, see the "A bit of history" section right on the doc overview page. [1] You might also want to read through the (rather dated) Thrift whitepaper to get an idea about what the design constraints for it were. [2]
The main point is that when you have deployed services at the scale Google or Facebook work with, you can't just upgrade all involved parties simultaneously on a schema change. So, having to support multiple versions running alongside each other is pretty much a given, and the best way to deal with that is to build it right into your protocols.
David
[1] https://developers.google.com/protocol-buffers/docs/overview
[2] http://thrift.apache.org/static/files/thrift-20070401.pdf
August 19, 2013 Re: std.serialization: pre-voting review / discussion
Posted in reply to David Nadlinger | On Monday, 19 August 2013 at 19:47:32 UTC, David Nadlinger wrote:
> Versioning is an integral idea of formats like Protobuf and Thrift. For example, see the "A bit of history" section right on the doc overview page. [1] You might also want to read through the (rather dated) Thrift whitepaper to get an idea about what the design constraints for it were. [2]
By the way, to be honest, this is also the main point that makes me feel uneasy about including Orbit in Phobos at this point: Sure, it has been around for some time, but as far as I can tell, not a lot of people are using it right now, and what seems to be entirely missing from the docs is a clear design rationale, outlining its goals and explaining how Orbit compares to well-known existing solutions.
It seems to me that a large part of the discussion in this thread can be attributed to that fact, i.e. a lack of understanding of, or agreement on, what the module is supposed to be in the first place.
David
August 20, 2013 Re: std.serialization: pre-voting review / discussion
Posted in reply to David Nadlinger
>
> Seems like your memory has indeed faded a bit. ;)
>
> Versioning is an integral idea of formats like Protobuf and Thrift. For example, see the "A bit of history" section right on the doc overview page. [1] You might also want to read through the (rather dated) Thrift whitepaper to get an idea about what the design constraints for it were. [2]
>
> The main point is that when you have deployed services at the scale Google or Facebook work with, you can't just upgrade all involved parties simultaneously on a schema change. So, having to support multiple versions running alongside each other is pretty much a given, and the best way to deal with that is to build it right into your protocols.
>
> David
>
>
> [1] https://developers.google.com/protocol-buffers/docs/overview
> [2] http://thrift.apache.org/static/files/thrift-20070401.pdf
Getting old! :-)
Thanks for the heads up.
August 20, 2013 Re: std.serialization: pre-voting review / discussion
Posted in reply to Johannes Pfau | On Monday, 19 August 2013 at 18:06:00 UTC, Johannes Pfau wrote:
> On Mon, 19 Aug 2013 16:21:44 +0200,
> "Tyler Jameson Little" <beatgammit@gmail.com> wrote:
>
>> On Monday, 19 August 2013 at 13:31:27 UTC, Jacob Carlborg wrote:
>> > On 2013-08-19 15:03, Dicebot wrote:
>> >
>> >> Great! Are there any difficulties with the input?
>> >
>> > It's just that I don't clearly know what the code will need to look like, and I'm not particularly familiar with implementing range-based code.
>>
>> Maybe we need some kind of doc explaining the idiomatic usage of ranges?
>>
>> Personally, I'd like to do something like this:
>>
>>     auto archive = new XmlArchive!(char); // create an XML archive
>>     auto serializer = new Serializer(archive); // create the serializer
>>     serializer.serialize(foo);
>>
>>     pipe(archive.out, someFile);
>
> Your "pipe" function is the same as std.algorithm.copy(InputRange,
> OutputRange) or std.range.put(OutputRange, InputRange);
Right, for some reason I couldn't find it... Moot point though.
> An important question regarding ranges for std.serialization is whether
> we want it to work as an InputRange or if it should _take_ an
> OutputRange. So the question is
>
> -----------------
> auto archive = new Archive();
> Serializer(archive).serialize(object);
> //Archive takes OutputRange, writes to it
> archive.writeTo(OutputRange);
>
> vs
>
> auto archive = new Archive();
> Serializer(archive).serialize(object);
> //Archive implements InputRange for ubyte[]
> foreach(ubyte[] data; archive) {}
> -----------------
>
> I'd use the first approach as it should be simpler to implement. The
> second approach would be useful if the ubyte[] elements were processed
> via other ranges (map, take, ...). But as binary data is usually
> not processed in this way but just stored to disk or sent over network
> (basically streaming operations) the first approach should be fine.
+1 for the first way.
> The first approach has the additional benefit that we can easily do
> streaming like this:
> ----------------
> auto archive = new Archive(OutputRange);
> //Immediately write the data to the output range
> Serializer(archive).serialize([1,2,3]);
> ----------------
This can make a nice one-liner for the general case:
    Serializer(new Archive(OutputRange)).serialize(...);
> Another point is that "serialize" in the above example could be
> renamed to "put". This way Serializer would itself be an OutputRange
> which allows stuff like [1,2,3,4,5].stride(2).take(2).copy(archive);
>
> Then serialize could also accept InputRanges to allow this:
> archive.serialize([1,2,3,4,5].stride(2).take(2));
> However, this use case is already covered by using copy so it would just
> be for convenience.
This is nice, but I think I like serialize() better. I also don't think serializing a range is its primary purpose, so it doesn't make a lot of sense to optimize for the uncommon case.
August 20, 2013 Re: std.serialization: pre-voting review / discussion
Posted in reply to Dicebot | On 8/12/2013 6:27 AM, Dicebot wrote:
> Documentation:
> https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/index.html
Thank you, Jacob. It looks like you've put a lot of nice work into this.
I've perused the documentation, and all I can think of is "What's a cubit?"
http://www.youtube.com/watch?v=so9o3_daDZw
I.e. there are 9 documentation pages of details. There's no obvious place to start, no overview, no explanation of what serialization is for and why I might want to use it and what's great about this implementation. At least none that I could find.
Also needs some non-trivial canonical example code.
Something that answers who what where when why and how would be immensely useful.
Some nits:
https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializationexception.html
Something went horribly wrong here:
----------------
Parameters:
Exception exception the exception exception to wrap
----------------
https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_registerwrapper.html
Lacks an illuminating example.
https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializer.html
When would I use a struct Array or a struct Slice?
https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_attribute.html
struct attribute should be capitalized. When would I use an attribute? Does this have anything to do with User Defined Attributes? Need a canonical example.
https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_archives_archive.html
Aren't interfaces already abstract? I.e. abstract is redundant.
The documentation defines an archive more or less as an archive. I still don't know what an archive is. (E.g. a zip file is an archive - can this create zip files?)
August 20, 2013 Re: std.serialization: pre-voting review / discussion
Posted in reply to David Nadlinger | On 8/18/2013 9:33 AM, David Nadlinger wrote:
> Having a system that regularly, automatically runs the test suites of several
> larger, well-known D projects with the results being readily available to the
> DMD/druntime/Phobos teams would certainly help. But it's also not ideal, since
> if a project starts to fail, the exact nature of the issue (regression in DMD or
> bug in the project, and if the former, a minimal test case) can often be hard to
> track down for somebody not already familiar with the code base.
That's exactly the problem. If these large projects are incorporated into the autotester, who is going to isolate/fix problems arising with them?
The test suite is designed to be a collection of already-isolated issues, so understanding what went wrong shouldn't be too difficult. Note that already it is noticeably much harder to debug a phobos unit test gone awry than the other tests. A full blown project that nobody understands would fare far worse.
(And the other problem, of course, is the test suite is designed to be runnable fairly quickly. Compiling some other large project and running its test suite can make the autotester much less useful when the turnaround time increases.)
Putting large projects into the autotester has the implication that development and support of those projects has been ceded to the core dev team, i.e. who is responsible for it has been badly blurred.
August 20, 2013 Re: std.serialization: pre-voting review / discussion
Posted in reply to Tyler Jameson Little | On Tuesday, 20 August 2013 at 03:42:48 UTC, Tyler Jameson Little wrote:
> On Monday, 19 August 2013 at 18:06:00 UTC, Johannes Pfau wrote:
>> An important question regarding ranges for std.serialization is whether
>> we want it to work as an InputRange or if it should _take_ an
>> OutputRange. So the question is
>>
>> -----------------
>> auto archive = new Archive();
>> Serializer(archive).serialize(object);
>> //Archive takes OutputRange, writes to it
>> archive.writeTo(OutputRange);
>>
>> vs
>>
>> auto archive = new Archive();
>> Serializer(archive).serialize(object);
>> //Archive implements InputRange for ubyte[]
>> foreach(ubyte[] data; archive) {}
>> -----------------
>>
>> I'd use the first approach as it should be simpler to implement. The
>> second approach would be useful if the ubyte[] elements were processed
>> via other ranges (map, take, ...). But as binary data is usually
>> not processed in this way but just stored to disk or sent over network
>> (basically streaming operations) the first approach should be fine.
>
> +1 for the first way.
No, you are WRONG. InputRange is MORE flexible: it can be lazy or eager. OutputRange is only eager. As we know, lazy ranges are required wherever possible:
On Sunday, 18 August 2013 at 18:26:55 UTC, Dicebot wrote:
> So as a review manager, I think voting should be delayed until API is ready to address lazy range-based work model. No actual implementation is required but
>
> 1) it should be possible to do it later without breaking user code
> 2) library should not make an assumption about implementation being lazy or eager
We can use InputRange like this:
    import std.file;
    auto archive = new Archive();
    Serializer(archive).serialize(object);
    //Archive implements InputRange for ubyte[]
    write("file", archive);
Another benefit: we can process InputRange. For example, if we have a ZipRange zip(InputRange) function, it's easy to compress data:
    write("file", zip(archive));
Another example: we would like to change the output xml file and filter some data (because we already have it). Or we would like to transform the output xml into an html web page. No problems:
    XmlRange transformXml(InputRange);
    write("file", transformXml(archive));
Ideas?
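
To make the lazy variant concrete, here is a small sketch of an archive exposed as an input range whose chunks are only produced on demand and can be piped through other range algorithms. `LazyArchive` and `wrapInTags` (a stand-in for the post's `zip` / `transformXml`) are invented for the example; `map`, `joiner`, `equal` and `isInputRange` are real Phobos symbols. One caveat: `std.file.write` takes a buffer (`const void[]`), not a range, so the `write("file", archive)` calls above would need an extra step such as copying into a file writer. The sketch avoids file I/O altogether.

```d
import std.algorithm : equal, joiner, map;
import std.conv : to;
import std.range : isInputRange;

// Hypothetical lazy archive: a chunk is formatted only when `front`
// is asked for, so nothing is serialized before the consumer pulls.
struct LazyArchive
{
    const(int)[] pending; // values still to be serialized

    @property bool empty() const { return pending.length == 0; }
    @property string front() const { return pending[0].to!string; }
    void popFront() { pending = pending[1 .. $]; }
}

static assert(isInputRange!LazyArchive);

// Stand-in for a transforming range: wraps every chunk in a tag, lazily.
auto wrapInTags(R)(R chunks)
{
    return chunks.map!(chunk => "<int>" ~ chunk ~ "</int>");
}

unittest
{
    auto archive = LazyArchive([1, 2, 3]);

    // Nothing has been serialized yet; joiner pulls chunk by chunk.
    auto xml = wrapInTags(archive).joiner;
    assert(xml.equal("<int>1</int><int>2</int><int>3</int>"));
}
```

The trade-off raised earlier in the thread still applies: a range like this has to hold on to whatever state is needed to produce the next chunk, whereas the output-range variant can push data out as it goes.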