April 02, 2013
On Monday, 1 April 2013 at 21:11:57 UTC, Matt Soucy wrote:
> And therefore, it supports arrays just fine (as repeated fields). Yes. That last sentence was poorly-worded, and should have said "you'd just end up with the un'packed' data with an extra header."

It says repeated messages should be merged, which results in one message, not an array of messages. So from several repeated messages you get one, as if they formed a contiguous soup of fields that got parsed as a single message: e.g. scalar fields of the resulting message take their last-seen values.
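That merge behavior can be demonstrated with a minimal sketch of the wire format (pure Python, no protobuf library; the field number and values are made up for illustration):

```python
# Encoding the same scalar field twice and decoding the concatenation
# yields a single value -- the last one seen -- which is the "merge"
# behavior described above.

def encode_varint(n):
    # Base-128 varint: 7 payload bits per byte, high bit = continuation.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_int_field(field_no, value):
    # Wire type 0 (varint): tag = (field_no << 3) | 0
    return encode_varint(field_no << 3) + encode_varint(value)

def decode_varint(data, pos):
    result = shift = 0
    while True:
        b = data[pos]; pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7

def decode_message(data):
    # Scalar fields simply overwrite earlier values: last seen wins.
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        value, pos = decode_varint(data, pos)  # assumes wire type 0 only
        fields[tag >> 3] = value
    return fields

# Two serialized messages, each setting field 1, concatenated on the wire:
wire = encode_int_field(1, 42) + encode_int_field(1, 99)
print(decode_message(wire))  # {1: 99} -- the last value wins
```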

> Unfortunately, I'm not particularly knowledgeable about networking, but that's not quite what I meant. I meant that the use case itself would result in sending individual Result messages one at a time, since packing (even if it were valid) wouldn't be useful and would require getting all of the Results at once. You would just leave off the "packed" attribute.

As you said, there's no way to tell where one message ends and the next begins. If you send them one or two at a time, they end up as a contiguous stream of bytes. Anyone wanting to delimit messages has to define a container format as an extension on top of PB, with additional semantics for representing arrays, which results in another protocol. And even if you define such a protocol, there's still no way to have array fields in PB messages (arrays of non-trivial types).
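One common workaround is exactly such a container convention layered on top (not part of the format itself): prefix each serialized message with its byte length so a reader can split the stream. A sketch, using an arbitrary fixed-width prefix (Java's writeDelimitedTo uses a varint prefix instead):

```python
# Length-prefixed framing of opaque serialized messages.
import struct

def frame(payloads):
    # 4-byte big-endian length prefix per message (an arbitrary choice).
    return b"".join(struct.pack(">I", len(p)) + p for p in payloads)

def unframe(stream):
    msgs, pos = [], 0
    while pos < len(stream):
        (length,) = struct.unpack_from(">I", stream, pos)
        pos += 4
        msgs.append(stream[pos:pos + length])
        pos += length
    return msgs

wire = frame([b"\x08\x01", b"\x08\x02"])  # two tiny serialized messages
assert unframe(wire) == [b"\x08\x01", b"\x08\x02"]
```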

For example, if you want to update students and departments with one method, the obvious choice is to pass it a dictionary of key-value pairs of new values for the object's attributes. How would you do that?
April 02, 2013
On 04/02/2013 12:38 AM, Kagamin wrote:
> On Monday, 1 April 2013 at 21:11:57 UTC, Matt Soucy wrote:
>> And therefore, it supports arrays just fine (as repeated fields). Yes.
>> That last sentence was poorly-worded, and should have said "you'd just
>> end up with the un'packed' data with an extra header."
>
> It says repeated messages should be merged which results in one message,
> not an array of messages. So from several repeated messages you get one
> as if they formed contiguous soup of fields which got parsed as one
> message: e.g. scalar fields of the resulting message take their last
> seen values.
>

They're merged if the field is optional and receives multiple inputs with that id. If a field is marked as repeated, then each block of data tagged with that field is treated as a new item in the array.

>> Unfortunately, I'm not particularly knowledgeable about networking,
>> but that's not quite what I meant. I meant that the use case itself
>> would result in sending individual Result messages one at a time,
>> since packing (even if it were valid) wouldn't be useful and would
>> require getting all of the Results at once. You would just leave off
>> the "packed" attribute.
>
> As you said, there's no way to tell where one message ends and next
> begins. If you send them one or two at a time, they end up as a
> contiguous stream of bytes. If one is to delimit messages, he should
> define a container format as an extension on top of PB with additional
> semantics for representation of arrays, which results in another
> protocol. And even if you define such protocol, there's still no way to
> have array fields in PB messages (arrays of non-trivial types).
>
> For example if you want to update students and departments with one
> method, the obvious choice is to pass it a dictionary of key-value pairs
> of new values for the object's attributes. How to do that?

I said that that only applies to the incorrect idea of packing complex messages. With regular (non-packed) repeated messages, each new repeated message is added to the array when deserialized.
While yes, protocol buffers don't define a way to delimit top-level messages, that isn't really relevant to the situation you're describing. If messages are supposed to be treated separately, there are numerous ways to handle that which CAN be done inside of protocol buffers.
In this example, one way could be to define messages like so:

message Updates {
	message StudentUpdate {
		required string studentName = 1;
		required uint32 departmentNumber = 2;
	}
	repeated StudentUpdate updates = 1;
}

Then you would iterate over Updates.updates, which you'd be adding to upon deserialization of more of the messages.
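The accumulation can be sketched in a few lines (pure Python, no protobuf library; the tag byte 0x0A is field 1 with wire type 2, matching the repeated StudentUpdate field above, and the payload bytes are made up):

```python
# Each occurrence of a repeated embedded-message field becomes a new
# array element instead of being merged into the previous one.

def decode_repeated(data):
    items, pos = [], 0
    while pos < len(data):
        tag = data[pos]; pos += 1      # single-byte tag: (1 << 3) | 2 == 0x0A
        length = data[pos]; pos += 1   # single-byte length (small payloads)
        items.append(data[pos:pos + length]); pos += length
    return items

# Two occurrences of field 1 on the wire -> two array elements:
wire = bytes([0x0A, 0x02, 0x08, 0x01]) + bytes([0x0A, 0x02, 0x08, 0x02])
assert decode_repeated(wire) == [b"\x08\x01", b"\x08\x02"]
```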
April 02, 2013
On 2013-04-01 19:13, Jesse Phillips wrote:

> Let me see if I can describe this.
>
> PB does encoding to binary by type. However it also has a schema in a
> .proto file. My first concern is that this file provides the ID to use
> for each field, while arbitrary the ID must be what is specified.
>
> The second one I'm concerned with is option to pack repeated fields. I'm
> not sure the specifics for this encoding, but I imagine some compression.
>
> This is why I think I'd have to implement my own Serializer to be able
> to support PB, but also believe we could have a binary format based on
> PB (which maybe it would be possible to create a schema of Orange
> generated data, but it would be hard to generate data for a specific
> schema).

As I understand it there's a "schema definition", that is the .proto file. You compile this schema to produce D/C++/Java/whatever code that contains structs/classes with methods/fields that match this schema.

If you need to change the schema, besides adding optional fields, you need to recompile the schema to produce new code, right?

If you have a D class/struct that matches this schema (regardless if it's auto generated from the schema or manually created) with actual instance variables for the fields I think it would be possible to (de)serialize into the binary PB format using Orange.

Then there's the issue of the options supported by PB like optional fields and pack repeated fields (which I don't know what it means).

It seems PB is dependent on the order of the fields so that won't be a problem. Just disregard the "key" that is passed to the archive and deserialize the next type that is expected. Maybe you could use the schema to do some extra validations.

Although, I don't know how PB handles multiple references to the same value.

Looking at this:

https://developers.google.com/protocol-buffers/docs/overview

Below "Why not just use XML?", they mention both a text format (not to be confused with the schema, .proto) and a binary format. The text format seems to be mostly for debugging, though.

-- 
/Jacob Carlborg
April 02, 2013
On 2013-03-24 22:03, Jacob Carlborg wrote:
> std.serialization (orange) is now ready to be reviewed.

I've been working on a binary archive with the following format:

FileFormat := CompoundArrayOffset Data CompoundArray
CompoundArrayOffset := AbsOffset # Offset of the compound array
AbsOffset := 4B # Absolute offset from the beginning of FileFormat
CompoundArray := Compound* # An array of Compound
CompoundOffset := 4B # Offset into CompoundArray
Data := Type*
Type := String | Array | Compound | AssociativeArray | Pointer | Enum | Primitive
Compound := ClassData | StructData
String := Length 4B* | 2B* | 1B*
Array := Length Type*
Class := CompoundOffset
Struct := CompoundOffset
ClassData := String Field*
StructData := Field*
Field := Type
Length := 4B
Primitive := Bool | Byte | Cdouble | Cfloat | Char | Creal | Dchar | Double | Float | Idouble | Ifloat | Int | Ireal | Long | Real | Short | Ubyte | Uint | Ulong | Ushort | Wchar
Bool := 1B
Byte := 1B
Cdouble := 8B 8B
Cfloat := 8B
Char := 1B
Creal := 8B 8B 8B 8B
Dchar := 4B
Double := 8B
Float := 4B
Idouble := 8B
Ifloat := 4B
Int := 4B
Ireal := 8B 8B
Long := 8B
Real := 8B 8B 8B 8B
Short := 2B
Ubyte := 1B
Uint := 4B
Ulong := 8B
Ushort := 2B
Wchar := 2B
1B := 1Byte
2B := 2Bytes
4B := 4Bytes
8B := 8Bytes
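A sketch of the top-level FileFormat framing above (pure Python; the grammar doesn't specify endianness, so little-endian is assumed here, and the payload bytes are made up):

```python
# FileFormat := CompoundArrayOffset Data CompoundArray, where the
# 4-byte offset is absolute from the beginning of the file.
import struct

def write_file(data, compound_array):
    offset = 4 + len(data)               # AbsOffset: header + Data
    return struct.pack("<I", offset) + data + compound_array

def read_file(blob):
    (offset,) = struct.unpack_from("<I", blob, 0)
    return blob[4:offset], blob[offset:]  # (Data, CompoundArray)

blob = write_file(b"DATA", b"COMPOUNDS")
assert read_file(blob) == (b"DATA", b"COMPOUNDS")
```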

How does this look?

-- 
/Jacob Carlborg
April 02, 2013
On 04/02/2013 03:21 AM, Jacob Carlborg wrote:
> On 2013-04-01 19:13, Jesse Phillips wrote:
>
>> Let me see if I can describe this.
>>
>> PB does encoding to binary by type. However it also has a schema in a
>> .proto file. My first concern is that this file provides the ID to use
>> for each field, while arbitrary the ID must be what is specified.
>>
>> The second one I'm concerned with is option to pack repeated fields. I'm
>> not sure the specifics for this encoding, but I imagine some compression.
>>
>> This is why I think I'd have to implement my own Serializer to be able
>> to support PB, but also believe we could have a binary format based on
>> PB (which maybe it would be possible to create a schema of Orange
>> generated data, but it would be hard to generate data for a specific
>> schema).
>
> As I understand it there's a "schema definition", that is the .proto
> file. You compile this schema to produce D/C++/Java/whatever code that
> contains structs/classes with methods/fields that matches this schema.
>
> If you need to change the schema, besides adding optional fields, you
> need to recompile the schema to produce new code, right?
>
> If you have a D class/struct that matches this schema (regardless if
> it's auto generated from the schema or manually created) with actual
> instance variables for the fields I think it would be possible to
> (de)serialize into the binary PB format using Orange.
>
> Then there's the issue of the options supported by PB like optional
> fields and pack repeated fields (which I don't know what it means).
>
> It seems PB is dependent on the order of the fields so that won't be a
> problem. Just disregard the "key" that is passed to the archive and
> deserialize the next type that is expected. Maybe you could use the
> schema to do some extra validations.
>
> Although, I don't know how PB handles multiple references to the same
> value.
>
> Looking at this:
>
> https://developers.google.com/protocol-buffers/docs/overview
>
> Below "Why not just use XML?", they both mention a text format (not to
> be confused with the schema, .proto) and a binary format. Although the
> text format seems to be mostly for debugging.
>

Unfortunately, only partially correct. Optional isn't an "option", it's a way of saying that a field may be specified 0 or 1 times. If two messages with the same ID are read and the ID is considered optional in the schema, then they are merged.

Packed IS an "option", which can only be applied to primitive types. It changes serialization from:
> return raw.map!(a=>(MsgType!BufferType | (id << 3)).toVarint() ~ a.writeProto!BufferType())().reduce!((a,b)=>a~b)();
to
> auto msg = raw.map!(writeProto!BufferType)().reduce!((a,b)=>a~b)();
> return (2 | (id << 3)).toVarint() ~ msg.length.toVarint() ~ msg;

(Actual snippets from my partially-complete protocol buffer library)
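The difference those snippets implement can be sketched in pure Python (field number and values made up): unpacked repeated fields repeat the tag per element, while a packed field writes one tag with wire type 2, a total length, then the raw values back to back.

```python
def varint(n):
    # Base-128 varint encoding.
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def unpacked(field_no, values):
    # One tag (wire type 0) per element.
    tag = varint(field_no << 3)
    return b"".join(tag + varint(v) for v in values)

def packed(field_no, values):
    # One tag (wire type 2), then length, then all values contiguously.
    payload = b"".join(varint(v) for v in values)
    return varint((field_no << 3) | 2) + varint(len(payload)) + payload

vals = [1, 2, 3]
assert unpacked(4, vals) == b"\x20\x01\x20\x02\x20\x03"  # 6 bytes
assert packed(4, vals) == b"\x22\x03\x01\x02\x03"        # 5 bytes
```

Not compression as such, but the per-element tag overhead disappears, which adds up for large arrays of small primitives.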

If you had a struct that matches that schema (PB messages have value semantics) then yes, in theory you could do something to serialize the struct based on the schema, but you'd have to maintain both separately.

PB is NOT dependent on the order of the fields during serialization, they can be sent/received in any order. You could use the schema like you mentioned above to tie member names to ids, though.

PB uses value semantics, so multiple references to the same thing isn't really an issue that is covered.

I hadn't actually noticed that TextFormat stuff before...interesting. I might take a look at that later when I have time.

-Matt Soucy
April 02, 2013
On 2013-04-02 15:38, Matt Soucy wrote:

> Unfortunately, only partially correct. Optional isn't an "option", it's
> a way of saying that a field may be specified 0 or 1 times. If two
> messages with the same ID are read and the ID is considered optional in
> the schema, then they are merged.

With "option", I mean you don't have to use it in the schema. But the (de)serializer of course needs to support this to be fully compliant with the spec.

> Packed IS an "option", which can only be done to primitives. It changes
> serialization from:
>  > return raw.map!(a=>(MsgType!BufferType | (id << 3)).toVarint() ~
> a.writeProto!BufferType())().reduce!((a,b)=>a~b)();
> to
>  > auto msg = raw.map!(writeProto!BufferType)().reduce!((a,b)=>a~b)();
>  > return (2 | (id << 3)).toVarint() ~ msg.length.toVarint() ~ msg;
>
> (Actual snippets from my partially-complete protocol buffer library)
>
> If you had a struct that matches that schema (PB messages have value
> semantics) then yes, in theory you could do something to serialize the
> struct based on the schema, but you'd have to maintain both separately.

Just compile the schema to a struct with the necessary fields. Perhaps that's not how it's usually done.

> PB is NOT dependent on the order of the fields during serialization,
> they can be sent/received in any order. You could use the schema like
> you mentioned above to tie member names to ids, though.

So if you have a schema like this:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}

1, 2 and 3 will be ids of the fields, and also the order in which they are (de)serialized?

Then you could have the archive read the schema, map names to ids and archive the ids instead of the names.

> PB uses value semantics, so multiple references to the same thing isn't
> really an issue that is covered.

I see, that kind of sucks, in my opinion.

-- 
/Jacob Carlborg
April 02, 2013
On 04/02/2013 10:52 AM, Jacob Carlborg wrote:
> On 2013-04-02 15:38, Matt Soucy wrote:
>
>> Unfortunately, only partially correct. Optional isn't an "option", it's
>> a way of saying that a field may be specified 0 or 1 times. If two
>> messages with the same ID are read and the ID is considered optional in
>> the schema, then they are merged.
>
> With "option", I mean you don't have to use it in the schema. But the
> (de)serializer of course need to support this to be fully compliant with
> the spec.
>

OK, I see what you mean. PB uses the term "option" for language constructs, hence my confusion.

>> Packed IS an "option", which can only be done to primitives. It changes
>> serialization from:
>>  > return raw.map!(a=>(MsgType!BufferType | (id << 3)).toVarint() ~
>> a.writeProto!BufferType())().reduce!((a,b)=>a~b)();
>> to
>>  > auto msg = raw.map!(writeProto!BufferType)().reduce!((a,b)=>a~b)();
>>  > return (2 | (id << 3)).toVarint() ~ msg.length.toVarint() ~ msg;
>>
>> (Actual snippets from my partially-complete protocol buffer library)
>>
>> If you had a struct that matches that schema (PB messages have value
>> semantics) then yes, in theory you could do something to serialize the
>> struct based on the schema, but you'd have to maintain both separately.
>
> Just compile the schema to a struct with the necessary fields. Perhaps
> not how it's usually done.
>

Again, my misunderstanding. I assumed you were talking about taking a pre-existing struct, not one generated from the .proto

>> PB is NOT dependent on the order of the fields during serialization,
>> they can be sent/received in any order. You could use the schema like
>> you mentioned above to tie member names to ids, though.
>
> So if you have a schema like this:
>
> message Person {
>    required string name = 1;
>    required int32 id = 2;
>    optional string email = 3;
> }
>
> 1, 2 and 3 will be ids of the fields, and also the order in which they
> are (de)serialized?
>
> Then you could have the archive read the schema, map names to ids and
> archive the ids instead of the names.
>

You could easily receive 3,1,2 or 2,1,3 or any other such combination, and it would still be valid. That doesn't stop you from doing what you suggest, however, as long as you can lookup id[name] and name[id].
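Since every field carries its id in its tag, the lookup works regardless of arrival order. A sketch using the names/ids from the Person schema quoted above (decoding details omitted; fields are represented as already-parsed (id, value) pairs):

```python
# Mapping field ids back to names is order-independent.
id_to_name = {1: "name", 2: "id", 3: "email"}

def decode(fields_on_wire):
    # fields_on_wire: list of (field_id, value) pairs in arrival order
    return {id_to_name[fid]: val for fid, val in fields_on_wire}

# Arrival order 3,1,2 decodes to the same result as 1,2,3:
a = decode([(3, "a@b.c"), (1, "Ann"), (2, 7)])
b = decode([(1, "Ann"), (2, 7), (3, "a@b.c")])
assert a == b == {"name": "Ann", "id": 7, "email": "a@b.c"}
```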

>> PB uses value semantics, so multiple references to the same thing isn't
>> really an issue that is covered.
>
> I see, that kind of sucks, in my opinion.
>

Eh. I personally think that it makes sense, and don't have much of a problem with it.
April 02, 2013
On 2013-04-02 18:24, Matt Soucy wrote:

> Again, my misunderstanding. I assumed you were talking about taking a
> pre-existing struct, not one generated from the .proto

It doesn't really matter where the struct comes from.

> You could easily receive 3,1,2 or 2,1,3 or any other such combination,
> and it would still be valid. That doesn't stop you from doing what you
> suggest, however, as long as you can lookup id[name] and name[id].

Right. The archive gets the names; it's then up to the archive how to map names to PB ids. If the archive gets "foo", "bar" and the serialized data contains "bar", "foo", can it handle that? What I mean is that the serializer decides which field should be (de)serialized, not the archive.

> Eh. I personally think that it makes sense, and don't have much of a
> problem with it.

It probably makes sense if one sends the data over the network and the data is mostly value based. I usually have an object hierarchy with many reference types and objects passed around.

-- 
/Jacob Carlborg
June 15, 2013
On Sunday, 24 March 2013 at 21:03:59 UTC, Jacob Carlborg wrote:
> std.serialization (orange) is now ready to be reviewed.

I'm fundamentally against the integration of Orange into the std lib.
The basic problem is that there is no flag in the D object model for serialization (e.g. the "published" attribute in Pascal).
Along the same lines, the documentation about RTTI is nonexistent. In fact, there's not even RTTI useful for an "academic" serialization system.
No no no.
June 15, 2013
On 2013-06-15 20:54, Baz wrote:

> I'm fundamentally against the integration of Orange into the std lib.
> The basic problem is that there is no flag in the D object model for
> serialization (e.g. the "published" attribute in Pascal).

Why does that matter?

-- 
/Jacob Carlborg