Request for review - std.serialization (orange) (page 3) - D Programming Language Discussion Forum

A "MyArchive" example can be useful too. The basic idea is to write a minimal archive class with basic test code. All methods assert(false,"method archive(dchar) not implemented"); the example compiles and runs, but asserts. So people take the example and fill methods with their own implementations, thus incrementally building their archive class.
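A minimal sketch of what such a skeleton could look like. The `Archive` interface here is a hypothetical, trimmed-down stand-in (Orange's real archive interface has more methods and different signatures), and the little-endian byte layout in the one filled-in method is only an assumption for illustration:

```d
import std.bitmanip : nativeToLittleEndian;

// Hypothetical, trimmed-down archive interface, NOT Orange's actual API.
interface Archive
{
    void archive(int value, string key);
    void archive(dchar value, string key);
    void archive(string value, string key);
    const(ubyte)[] data();
}

// Skeleton archive: compiles and runs, but unimplemented methods assert.
class MyArchive : Archive
{
    private ubyte[] buffer;

    // First method filled in: writes the value as 4 little-endian bytes.
    // The key is ignored in this toy format.
    void archive(int value, string key)
    {
        ubyte[4] bytes = nativeToLittleEndian(value);
        buffer ~= bytes[];
    }

    // Still a stub; fails loudly as soon as a dchar is serialized.
    void archive(dchar value, string key)
    {
        assert(false, "method archive(dchar) not implemented");
    }

    // Still a stub.
    void archive(string value, string key)
    {
        assert(false, "method archive(string) not implemented");
    }

    const(ubyte)[] data()
    {
        return buffer;
    }
}
```

Each stub can then be replaced one at a time, with the accompanying test code showing which types already round-trip.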

On Sunday, 31 March 2013 at 21:25:48 UTC, Jacob Carlborg wrote:
> Ok. I'm not familiar with Protocol Buffers.

Well, the basic idea of EXI and similar standards is that you can have 2 types of serialization: built-in when you keep schema in the serialized message - which value belongs to which field (this way you can read and write any data structure) or schema-informed when the serializer knows what data it works with, so it omits schema from the message and e.g. writes two int fields as just consecutive 8 bytes - it knows that these 8 bytes are 2 ints and which field each belongs to; the drawback is that you can't read the message without schema, the advantage is smaller message size and faster serialization.
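To make the size difference concrete, here is a rough sketch of both styles for a struct with two int fields. The `Point` type, the function names, and the self-describing framing (one length byte, the field name, then the value) are all made up for illustration; neither EXI nor PB uses this exact layout:

```d
import std.bitmanip : nativeToLittleEndian;
import std.string : representation;

struct Point
{
    int x;
    int y;
}

// Schema-informed: both sides know the layout, so only the raw values
// are written, as 8 consecutive bytes.
ubyte[] encodeSchemaInformed(Point p)
{
    ubyte[4] x = nativeToLittleEndian(p.x);
    ubyte[4] y = nativeToLittleEndian(p.y);
    return x[] ~ y[];
}

// Built-in schema: every value carries its field name, so the message
// can be read without any external description.
ubyte[] encodeSelfDescribing(Point p)
{
    ubyte[] field(string name, int value)
    {
        ubyte[4] bytes = nativeToLittleEndian(value);
        ubyte[] result;
        result ~= cast(ubyte) name.length; // length of the field name
        result ~= name.representation;     // the field name itself
        result ~= bytes[];                 // the value
        return result;
    }

    return field("x", p.x) ~ field("y", p.y);
}
```

For `Point(1, 2)` the schema-informed form is 8 bytes, while this self-describing form is 12 bytes, and the gap grows with longer field names.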

On Sunday, 31 March 2013 at 11:23:27 UTC, Kagamin wrote:
> On Saturday, 30 March 2013 at 20:02:48 UTC, Jesse Phillips wrote:
>> 3) Serialization is done by message (struct) and not by primitives
>
> PB does serialize by primitives and Archive has archiveStruct method which is called to serialize struct, I believe. At first sight orange serializes using built-in grammar (in EXI terms), and since PB uses schema-informed grammar, you have to provide schema to the archiver: either keep it in the archiver or store globally.

Thank you, you've described it much better. When I said "by message" I was referring to what you have more accurately stated as requiring a schema.

I'm not well versed in PB or Orange so I'd need to play around more with both, but I'm pretty sure Orange would need changes made to be able to make the claim PB is supported. It should be possible to create a binary format based on PB.

It's a pull parser? Hmm... how reordered fields are supposed to be handled? When the archiver is requested for a field, it will probably need to look ahead for the field in the entire message. Also arrays can be discontinuous both in xml and in pb. Also if the archiver is requested for a missing field, it may be a bad idea to return typeof(return).init as it will overwrite the default value for the field in the structure. Though, this may be a minor issue: field usually is missing because it's obsolete, but the serializer will spend time requesting missing fields. As a schema-informed serialization, PB works better with specialized code, so it's better to provide a means for specialized serialization, where components will be tightly coupled, and the archiver will have full access to the serialized type and will be able to infer schema. Isn't serialization simpler when you have access to the type?
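The concern about missing fields can be illustrated with a toy pull-style archive. Everything here (`Config`, `ToyArchive`, the method names) is hypothetical and not part of Orange; a key/value map stands in for a parsed message:

```d
import std.typecons : Nullable;

struct Config
{
    int port = 8080; // defaults that a missing field should not clobber
    int retries = 3;
}

// Toy pull-style archive backed by an associative array.
struct ToyArchive
{
    int[string] fields;

    // Naive lookup: a missing field comes back as typeof(return).init (0).
    int unarchiveOrInit(string key)
    {
        if (auto p = key in fields)
            return *p;
        return typeof(return).init;
    }

    // Alternative: report absence so the caller can keep its default.
    Nullable!int unarchive(string key)
    {
        if (auto p = key in fields)
            return Nullable!int(*p);
        return typeof(return).init; // a null Nullable
    }
}

// Overwrites the struct's defaults with 0 when a field is missing.
Config readNaive(ToyArchive a)
{
    Config c;
    c.port = a.unarchiveOrInit("port");
    c.retries = a.unarchiveOrInit("retries");
    return c;
}

// Assigns only the fields that are actually present in the message.
Config readKeepingDefaults(ToyArchive a)
{
    Config c;
    auto port = a.unarchive("port");
    if (!port.isNull)
        c.port = port.get;
    auto retries = a.unarchive("retries");
    if (!retries.isNull)
        c.retries = retries.get;
    return c;
}
```

With a message that contains only `port`, `readNaive` leaves `retries` at 0, while `readKeepingDefaults` leaves it at the declared default of 3.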

On 2013-03-31 23:40, Kagamin wrote:
> A "MyArchive" example can be useful too. The basic idea is to write a
> minimal archive class with basic test code. All methods
> assert(false,"method archive(dchar) not implemented"); the example
> compiles and runs, but asserts. So people take the example and fill
> methods with their own implementations, thus incrementally building
> their archive class.

Yes, if the API is changed to what you're suggesting.

-- 
/Jacob Carlborg

April 01, 2013

Re: Request for review - std.serialization (orange)

Posted by Jacob Carlborg
in reply to Kagamin

On 2013-04-01 07:15, Kagamin wrote:
> It's a pull parser? Hmm... how reordered fields are supposed to be
> handled? When the archiver is requested for a field, it will probably
> need to look ahead for the field in the entire message. Also arrays can
> be discontinuous both in xml and in pb. Also if the archiver is
> requested for a missing field, it may be a bad idea to return
> typeof(return).init as it will overwrite the default value for the field
> in the structure. Though, this may be a minor issue: field usually is
> missing because it's obsolete, but the serializer will spend time
> requesting missing fields.

Optional fields are possible to implement by writing a custom serializer for a given type.

The look ahead is not needed for the entire message, only for the length of a class/struct. But since the fields of a class can themselves be other classes, it might not make a difference.

> As a schema-informed serialization, PB works better with specialized
> code, so it's better to provide a means for specialized serialization,
> where components will be tightly coupled, and the archiver will have
> full access to the serialized type and will be able to infer schema.
> Isn't serialization simpler when you have access to the type?

Yes, it would probably be simpler if the archive had access to the type. The idea behind Orange is that Serializer tries to do as much as possible of the implementation and leaves the data dependent parts to the archive. Also, the archive only needs to know how to serialize primitive types.

-- 
/Jacob Carlborg

On 2013-03-31 23:57, Kagamin wrote:
> Well, the basic idea of EXI and similar standards is that you can have 2
> types of serialization: built-in when you keep schema in the serialized
> message - which value belongs to which field (this way you can read and
> write any data structure) or schema-informed when the serializer knows
> what data it works with, so it omits schema from the message and e.g.
> writes two int fields as just consecutive 8 bytes - it knows that these
> 8 bytes are 2 ints and which field each belongs to; the drawback is that
> you can't read the message without schema, the advantage is smaller
> message size and faster serialization.

I see.

-- 
/Jacob Carlborg

On 2013-04-01 01:39, Jesse Phillips wrote:
> I'm not well versed in PB or Orange so I'd need to play around more with
> both, but I'm pretty sure Orange would need changes made to be able to
> make the claim PB is supported. It should be possible to create a binary
> format based on PB.

Isn't PB binary? Or it actually seems it can be both.

-- 
/Jacob Carlborg

On Monday, 1 April 2013 at 08:53:51 UTC, Jacob Carlborg wrote:
> On 2013-04-01 01:39, Jesse Phillips wrote:
>
>> I'm not well versed in PB or Orange so I'd need to play around more with
>> both, but I'm pretty sure Orange would need changes made to be able to
>> make the claim PB is supported. It should be possible to create a binary
>> format based on PB.
>
> Isn't PB binary? Or it actually seems it can be both.

Let me see if I can describe this.

PB does encoding to binary by type. However, it also has a schema in a .proto file. My first concern is that this file provides the ID to use for each field; while arbitrary, the ID must be what is specified.

The second one I'm concerned with is the option to pack repeated fields. I'm not sure of the specifics of this encoding, but I imagine some compression.

This is why I think I'd have to implement my own Serializer to be able to support PB, but I also believe we could have a binary format based on PB (maybe it would be possible to create a schema from Orange-generated data, but it would be hard to generate data for a specific schema).

On 04/01/2013 01:13 PM, Jesse Phillips wrote:
> On Monday, 1 April 2013 at 08:53:51 UTC, Jacob Carlborg wrote:
>> On 2013-04-01 01:39, Jesse Phillips wrote:
>>
>>> I'm not well versed in PB or Orange so I'd need to play around more with
>>> both, but I'm pretty sure Orange would need changes made to be able to
>>> make the claim PB is supported. It should be possible to create a binary
>>> format based on PB.
>>
>> Isn't PB binary? Or it actually seems it can be both.
>
> Let me see if I can describe this.
>
> PB does encoding to binary by type. However it also has a schema in a
> .proto file. My first concern is that this file provides the ID to use
> for each field, while arbitrary the ID must be what is specified.
>
> The second one I'm concerned with is option to pack repeated fields. I'm
> not sure the specifics for this encoding, but I imagine some compression.
>
> This is why I think I'd have to implement my own Serializer to be able
> to support PB, but also believe we could have a binary format based on
> PB (which maybe it would be possible to create a schema of Orange
> generated data, but it would be hard to generate data for a specific
> schema).

From what I got from the examples, repeated fields are done roughly as follows:

    auto msg = fields.map!(a => a.serialize())().reduce!((a, b) => a ~ b)();
    return ((id << 3) | 2) ~ msg.length.toVarint() ~ msg;

where msg is a ubyte[].

-Matt Soucy
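A self-contained sketch of the same idea for a packed repeated field of unsigned integers. `toVarint` and `packRepeated` are hypothetical helper names, not part of any PB or Orange library, and only varint-encoded elements are handled. Note that packing is not compression in the usual sense: the field key and total length are written once, followed by the concatenated element encodings.

```d
// Base-128 varint: 7 bits per byte, least significant group first,
// with the high bit set on every byte except the last.
ubyte[] toVarint(ulong value)
{
    ubyte[] result;
    do
    {
        ubyte b = cast(ubyte)(value & 0x7F);
        value >>= 7;
        if (value != 0)
            b |= 0x80; // more bytes follow
        result ~= b;
    } while (value != 0);
    return result;
}

// Packed repeated field: key with wire type 2 (length-delimited),
// then the payload length, then the concatenated varint elements.
ubyte[] packRepeated(uint id, const(uint)[] values)
{
    ubyte[] msg;
    foreach (v; values)
        msg ~= toVarint(v);
    return toVarint((id << 3) | 2) ~ toVarint(msg.length) ~ msg;
}
```

For field number 4 holding the values 3, 270 and 86942, this yields the bytes 22 06 03 8E 02 9E A7 05, which matches the packed example in the Protocol Buffers encoding documentation.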
