August 19, 2013
On 2013-08-19 17:41, Jesse Phillips wrote:

> Code has moved to https://github.com/opticron/ProtocolBuffer

Does it have any utility functions that are fairly standalone to handle the basic types, e.g. int, string, float and so on?

-- 
/Jacob Carlborg
August 19, 2013
Am Mon, 19 Aug 2013 16:21:44 +0200
schrieb "Tyler Jameson Little" <beatgammit@gmail.com>:

> On Monday, 19 August 2013 at 13:31:27 UTC, Jacob Carlborg wrote:
> > On 2013-08-19 15:03, Dicebot wrote:
> >
> >> Great! Are there any difficulties with the input?
> >
> > It's just that I don't clearly know how the code will need to look, and I'm not particularly familiar with implementing range-based code.
> 
> Maybe we need some kind of doc explaining the idiomatic usage of ranges?
> 
> Personally, I'd like to do something like this:
> 
>      auto archive = new XmlArchive!(char); // create an XML archive
>      auto serializer = new Serializer(archive); // create the
> serializer
>      serializer.serialize(foo);
> 
>      pipe(archive.out, someFile);

Your "pipe" function is the same as std.algorithm.copy(InputRange,
OutputRange) or std.range.put(OutputRange, InputRange);
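
A minimal sketch of that equivalence, using a plain array of chunks in place of a real archive (the std.serialization API here is still hypothetical):

```d
import std.algorithm : copy, equal;
import std.array : appender;

void main()
{
    // Stand-in for the archive's output: any input range of data chunks.
    ubyte[][] chunks = [[1, 2], [3, 4]];

    // Stand-in for someFile: any output range accepting those chunks.
    auto sink = appender!(ubyte[])();

    // Plays the role of the proposed pipe(archive.out, someFile).
    chunks.copy(sink);

    assert(sink.data.equal([1, 2, 3, 4]));
}
```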



An important question regarding ranges for std.serialization is whether we want it to work as an InputRange or if it should _take_ an OutputRange. So the question is

-----------------
auto archive = new Archive();
Serializer(archive).serialize(object);
//Archive takes OutputRange, writes to it
archive.writeTo(OutputRange);

vs

auto archive = new Archive();
Serializer(archive).serialize(object);
//Archive implements InputRange for ubyte[]
foreach(ubyte[] data; archive) {}
-----------------

I'd use the first approach as it should be simpler to implement. The
second approach would be useful if the ubyte[] elements were processed
via other ranges (map, take, ...). But as binary data is usually
not processed in this way, and is instead just stored to disk or sent
over the network (basically streaming operations), the first approach
should be fine.
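A tiny sketch of what the first approach could look like. All names here (Archive, store, writeTo) are invented for illustration, not the reviewed API:

```d
import std.algorithm : equal;
import std.array : appender;
import std.range : put;

// Hypothetical sketch of the first approach: the archive buffers the
// serialized bytes and hands them to any OutputRange on request.
struct Archive
{
    private ubyte[] buffer;

    // Called by the serializer; stands in for serialize(object).
    void store(const(ubyte)[] data) { buffer ~= data; }

    // The archive takes the OutputRange and writes to it.
    void writeTo(Sink)(ref Sink sink)
    {
        put(sink, buffer);
    }
}

void main()
{
    Archive archive;
    archive.store([1, 2, 3]);

    auto sink = appender!(ubyte[])();
    archive.writeTo(sink);
    assert(sink.data.equal([1, 2, 3]));
}
```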

The first approach has the additional benefit that we can easily do streaming like this:
----------------
auto archive = new Archive(OutputRange);
//Immediately write the data to the output range
Serializer(archive).serialize([1,2,3]);
----------------

This is difficult to implement with the second approach, as you somehow have to interleave calls to serialize with reads from the InputRange interface:
------------
Serializer(archive).serialize(1);
foreach(data; archive) {stdout.write(data);}
Serializer(archive).serialize(2);
foreach(data; archive) {stdout.write(data);}
------------
And it's still less efficient than approach 1 as it has to keep an internal buffer.

Another point is that "serialize" in the above example could be
renamed to "put". This way Serializer would itself be an OutputRange
which allows stuff like [1,2,3,4,5].stride(2).take(2).copy(archive);

Then serialize could also accept InputRanges to allow this:
archive.serialize([1,2,3,4,5].stride(2).take(2));
However, this use case is already covered by using copy so it would just
be for convenience.
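
A tiny sketch of that idea. The Serializer here is a fake that just records values, purely to show the OutputRange plumbing:

```d
import std.algorithm : copy;
import std.range : stride, take;

// Hypothetical sketch: by defining put(), the serializer itself
// becomes an OutputRange, so range algorithms can feed it directly.
// The "serialization" below just records the values it is given.
struct Serializer
{
    int[] written;

    void put(int value) { written ~= value; }
}

void main()
{
    Serializer archive;
    // copy() takes its target by value, so take the result back.
    archive = [1, 2, 3, 4, 5].stride(2).take(2).copy(archive);
    assert(archive.written == [1, 3]);
}
```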
August 19, 2013
19-Aug-2013 22:05, Johannes Pfau wrote:
> Am Mon, 19 Aug 2013 16:21:44 +0200
> schrieb "Tyler Jameson Little" <beatgammit@gmail.com>:
>
>
> Another point is that "serialize" in the above example could be
> renamed to "put". This way Serializer would itself be an OutputRange
> which allows stuff like [1,2,3,4,5].stride(2).take(2).copy(archive);
>

+1
I totally expect serializer to be a sink.

> Then serialize could also accept InputRanges to allow this:
> archive.serialize([1,2,3,4,5].stride(2).take(2));
> However, this use case is already covered by using copy so it would just
> be for convenience.
>


-- 
Dmitry Olshansky
August 19, 2013
On Monday, 19 August 2013 at 14:47:15 UTC, bsd wrote:
> I think this versioning idea is more important for protocol buffers, msgpack, thrift-like libraries that use a separate IDL schema and IDL-compiled code. std.serialization uses the D code itself to serialize, so the version is practically dictated by the user. It may as well be manually handled... as long as it throws/returns an error and doesn't crash if one tries to deserialize an archive type into a different/modified D type.
>
> From memory, the Protocol Buffers versioning is to ensure schema-generated code and the library are compatible. You get compile errors if you try to compile IDL-generated code with a newer version of the library. Similarly, you get runtime errors if you deserialize data that was serialized with an older version of the library. This is all from memory so I could be wrong...

Seems like your memory has indeed faded a bit. ;)

Versioning is an integral idea of formats like Protobuf and Thrift. For example, see the "A bit of history" section right on the doc overview page. [1] You might also want to read through the (rather dated) Thrift whitepaper to get an idea about what the design constraints for it were. [2]

The main point is that when you have deployed services at the scale Google or Facebook work with, you can't just upgrade all involved parties simultaneously on a schema change. So, having to support multiple versions running along each other is pretty much a given, and the best way to deal with that is to build it right into your protocols.

David


[1] https://developers.google.com/protocol-buffers/docs/overview
[2] http://thrift.apache.org/static/files/thrift-20070401.pdf
August 19, 2013
On Monday, 19 August 2013 at 19:47:32 UTC, David Nadlinger wrote:
> Versioning is an integral idea of formats like Protobuf and Thrift. For example, see the "A bit of history" section right on the doc overview page. [1] You might also want to read through the (rather dated) Thrift whitepaper to get an idea about what the design constraints for it were. [2]

By the way, to be honest, this is also the main point that makes me feel uneasy about including Orbit in Phobos at this point: Sure, it has been around for some time, but as far as I can tell, not a lot of people are using it right now, and what seems to be entirely missing from the docs is a clear design rationale, outlining its goals and explaining how Orbit compares to well-known existing solutions.

It seems to me that a large part of the discussion in this thread can be attributed to that fact, i.e. a lack of understanding/agreement what the module is supposed to be in the first place.

David
August 20, 2013
>
> Seems like your memory has indeed faded a bit. ;)
>
> Versioning is an integral idea of formats like Protobuf and Thrift. For example, see the "A bit of history" section right on the doc overview page. [1] You might also want to read through the (rather dated) Thrift whitepaper to get an idea about what the design constraints for it were. [2]
>
> The main point is that when you have deployed services at the scale Google or Facebook work with, you can't just upgrade all involved parties simultaneously on a schema change. So, having to support multiple versions running along each other is pretty much a given, and the best way to deal with that is to build it right into your protocols.
>
> David
>
>
> [1] https://developers.google.com/protocol-buffers/docs/overview
> [2] http://thrift.apache.org/static/files/thrift-20070401.pdf

Getting old! :-)

Thanks for the heads up.
August 20, 2013
On Monday, 19 August 2013 at 18:06:00 UTC, Johannes Pfau wrote:
> Am Mon, 19 Aug 2013 16:21:44 +0200
> schrieb "Tyler Jameson Little" <beatgammit@gmail.com>:
>
>> On Monday, 19 August 2013 at 13:31:27 UTC, Jacob Carlborg wrote:
>> > On 2013-08-19 15:03, Dicebot wrote:
>> >
>> >> Great! Are there any difficulties with the input?
>> >
>> > It's just that I don't clearly know how the code will need to look, and I'm not particularly familiar with implementing range-based code.
>> 
>> Maybe we need some kind of doc explaining the idiomatic usage of ranges?
>> 
>> Personally, I'd like to do something like this:
>> 
>>      auto archive = new XmlArchive!(char); // create an XML archive
>>      auto serializer = new Serializer(archive); // create the serializer
>>      serializer.serialize(foo);
>> 
>>      pipe(archive.out, someFile);
>
> Your "pipe" function is the same as std.algorithm.copy(InputRange,
> OutputRange) or std.range.put(OutputRange, InputRange);

Right, for some reason I couldn't find it... Moot point though.

> An important question regarding ranges for std.serialization is whether
> we want it to work as an InputRange or if it should _take_ an
> OutputRange. So the question is
>
> -----------------
> auto archive = new Archive();
> Serializer(archive).serialize(object);
> //Archive takes OutputRange, writes to it
> archive.writeTo(OutputRange);
>
> vs
>
> auto archive = new Archive();
> Serializer(archive).serialize(object);
> //Archive implements InputRange for ubyte[]
> foreach(ubyte[] data; archive) {}
> -----------------
>
> I'd use the first approach as it should be simpler to implement. The
> second approach would be useful if the ubyte[] elements were processed
> via other ranges (map, take, ...). But as binary data is usually
> not processed in this way but just stored to disk or sent over network
> (basically streaming operations) the first approach should be fine.

+1 for the first way.

> The first approach has the additional benefit that we can easily do
> streaming like this:
> ----------------
> auto archive = new Archive(OutputRange);
> //Immediately write the data to the output range
> Serializer(archive).serialize([1,2,3]);
> ----------------

This can make a nice one-liner for the general case:

Serializer(new Archive(OutputRange)).serialize(...);

> Another point is that "serialize" in the above example could be
> renamed to "put". This way Serializer would itself be an OutputRange
> which allows stuff like [1,2,3,4,5].stride(2).take(2).copy(archive);
>
> Then serialize could also accept InputRanges to allow this:
> archive.serialize([1,2,3,4,5].stride(2).take(2));
> However, this use case is already covered by using copy so it would just
> be for convenience.

This is nice, but I think I like serialize() better. I also don't think serializing a range is its primary purpose, so it doesn't make a lot of sense to optimize for the uncommon case.
August 20, 2013
On 8/12/2013 6:27 AM, Dicebot wrote:
> Documentation:
> https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/index.html

Thank you, Jacob. It looks like you've put a lot of nice work into this.

I've perused the documentation, and all I can think of is "What's a cubit?"

http://www.youtube.com/watch?v=so9o3_daDZw

I.e. there are 9 documentation pages of details. There's no obvious place to start, no overview, no explanation of what serialization is for and why I might want to use it and what's great about this implementation. At least none that I could find. Also needs some non-trivial canonical example code.

Something that answers who what where when why and how would be immensely useful.



Some nits:

https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializationexception.html

Something went horribly wrong here:
----------------
Parameters:
Exception exception the exception exception to wrap
----------------

https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_registerwrapper.html

Lacks an illuminating example.

https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializer.html

When would I use a struct Array or a struct Slice?

https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_attribute.html

struct attribute should be capitalized. When would I use an attribute? Does this have anything to do with User Defined Attributes? Need a canonical example.

https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_archives_archive.html

Aren't interfaces already abstract? I.e. abstract is redundant. The documentation defines an archive more or less as an archive. I still don't know what an archive is. (E.g. a zip file is an archive - can this create zip files?)
August 20, 2013
On 8/18/2013 9:33 AM, David Nadlinger wrote:
> Having a system that regularly, automatically runs the test suites of several
> larger, well-known D projects with the results being readily available to the
> DMD/druntime/Phobos teams would certainly help. But it's also not ideal, since
> if a project starts to fail, the exact nature of the issue (regression in DMD or
> bug in the project, and if the former, a minimal test case) can often be hard to
> track down for somebody not already familiar with the code base.

That's exactly the problem. If these large projects are incorporated into the autotester, who is going to isolate/fix problems arising with them?

The test suite is designed to be a collection of already-isolated issues, so understanding what went wrong shouldn't be too difficult. Note that it is already noticeably harder to debug a Phobos unit test gone awry than the other tests. A full-blown project that nobody understands would fare far worse.

(And the other problem, of course, is the test suite is designed to be runnable fairly quickly. Compiling some other large project and running its test suite can make the autotester much less useful when the turnaround time increases.)

Putting large projects into the autotester implies that development and support of those projects has been ceded to the core dev team, i.e. the question of who is responsible for them becomes badly blurred.

August 20, 2013
On Tuesday, 20 August 2013 at 03:42:48 UTC, Tyler Jameson Little wrote:
> On Monday, 19 August 2013 at 18:06:00 UTC, Johannes Pfau wrote:
>> An important question regarding ranges for std.serialization is whether
>> we want it to work as an InputRange or if it should _take_ an
>> OutputRange. So the question is
>>
>> -----------------
>> auto archive = new Archive();
>> Serializer(archive).serialize(object);
>> //Archive takes OutputRange, writes to it
>> archive.writeTo(OutputRange);
>>
>> vs
>>
>> auto archive = new Archive();
>> Serializer(archive).serialize(object);
>> //Archive implements InputRange for ubyte[]
>> foreach(ubyte[] data; archive) {}
>> -----------------
>>
>> I'd use the first approach as it should be simpler to implement. The
>> second approach would be useful if the ubyte[] elements were processed
>> via other ranges (map, take, ...). But as binary data is usually
>> not processed in this way but just stored to disk or sent over network
>> (basically streaming operations) the first approach should be fine.
>
> +1 for the first way.

No, you are WRONG. InputRange is MORE flexible: it can be lazy or eager. OutputRange is only eager. As we know, lazy ranges are required where possible:

On Sunday, 18 August 2013 at 18:26:55 UTC, Dicebot wrote:
> So as a review manager, I think voting should be delayed until API is ready to address lazy range-based work model. No actual implementation is required but
>
> 1) it should be possible to do it later without breaking user code
> 2) library should not make an assumption about implementation being lazy or eager

We can use InputRange like this:

import std.file;
auto archive = new Archive();
Serializer(archive).serialize(object);
//Archive implements InputRange for ubyte[]
write("file", archive);

Another benefit: we can process the InputRange. For example, if we have a
ZipRange zip(InputRange)
function, it's easy to compress data:
write("file", zip(archive));

Another example: we would like to change the output XML file and filter some data (because we already have it). Or we would like to transform the output XML to an HTML web page. No problem:

XmlRange transformXml(InputRange);
write("file", transformXml(archive));

Ideas?