[GSoC] Improved FlatBuffers and/or Protobuf Support ~ Binary Serialization
March 29, 2019
Hi,
I've been thinking about working on binary serialization as my potential GSoC
project. It's one of the older entries on the GSoC ideas page [0]. I think D
is pretty much cut out for this kind of task, and serialization is a topic I'm
rather interested in, so hopefully it will be a great candidate.

# About me:
My name is Ahmet Sait Koçak, and I'm currently studying Computer Science in
Turkey. I first met D way back in high school, which is an interesting story
in itself.

I was introduced to programming in my first year of high school, using C, for
a competition. I then taught myself C# and kept coding as a hobby, having lots
of fun. One of my biggest projects was LF2 IDE [1], a modding tool for the
game LF2. It didn't take long for me to fall in love with open source: that
project used several OSS libraries itself, which led me to version control, to
embracing git, and to open-sourcing nearly everything I have written on GitHub
since.

Fast forward 4 years: I was hitting walls trying to do low-level stuff in C#,
and the fact that bytecode-compiled languages are too easy to reverse engineer
was sapping my motivation to do anything commercial with them. I never liked
C++ but gave it another try, telling myself "come on, it's not that bad", and
failed miserably. There had to be a better way. Besides, I was already
crafting my dream language in my head.

One day, I sat in front of my computer and thought "I bet there is a language
called D". It was a once-in-a-lifetime magical moment, reading through the
home page and seeing that it was the language I would have created myself
(static reflection, native compilation, GC...). My first project in D was IDL
[2], which made it possible for LF2 IDE to hot-reload modded data files into
the game's memory. It was amazing working with slices for the first time. I've
been a D user and evangelist ever since.

# Overview:
- Improving & updating the D implementations of flatbuffers and/or protobuf
- Contributing the D support to the upstream repositories
- Better documentation & samples
- Benchmarking and making sure D rocks

# Key Points:
- Meta-programming (DbI, CTFE, mixin...)
I plan to make D meta features shine in this library.

  - It should be possible to parse schema and output mixable D code at
    compile time
  const schema = `message Person
  {
      required string name = 1;
      required int32 id = 2;
  }`;
  mixin(fromProtoSchema(schema));

  - There should be no need for a schema definition, a custom type annotated
    with UDAs should be enough
  struct Person
  {
      @protoID(1) string name;
      @protoID(2) int age;
  }
  serialize(Person("Walter", 42), stdout);

- Simple things should be simple
It should be dead simple to do basic stuff:
  auto obj = deserialize!SomeType(stdin);
  serialize(obj, stdout);

- Complex things should be possible
The library should be flexible and extensible without modification

- Support for library and tool based usage
It should be usable as a library without any additional setup but also usable
as a schema compiler.

- Support for common Phobos types
Nullable, tuples, std.datetime, std.complex, std.bigint, containers...
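
To make the UDA idea above a bit more concrete, here is a minimal sketch of how annotated fields could be discovered with std.traits. The `protoID` struct and `describeFields` are hypothetical names used only for illustration; they are not the API of any existing library, and a real serializer would write wire-format bytes instead of text.

```d
import std.conv : text;
import std.traits : FieldNameTuple, getUDAs, hasUDA;

// Hypothetical UDA carrying a field number (assumption, not a library type).
struct protoID
{
    uint id;
}

struct Person
{
    @protoID(1) string name;
    @protoID(2) int age;
}

/// Lists every @protoID-annotated field of `value` as "id:name=value".
/// Fields without the annotation are skipped.
string[] describeFields(T)(T value)
if (is(T == struct))
{
    string[] result;
    static foreach (name; FieldNameTuple!T)
    {
        static if (hasUDA!(__traits(getMember, T, name), protoID))
        {
            // getUDAs yields the attribute values, so `.id` is known at compile time.
            result ~= text(getUDAs!(__traits(getMember, T, name), protoID)[0].id,
                ":", name, "=", __traits(getMember, value, name));
        }
    }
    return result;
}

unittest
{
    assert(describeFields(Person("Walter", 42)) == ["1:name=Walter", "2:age=42"]);
}
```

The point of the sketch is only that the struct definition itself carries enough information to drive the serializer, so no separate schema file is needed for this use case.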

Existing work:
https://github.com/huntlabs/flatbuffers
https://github.com/dcarp/protobuf-d
https://github.com/msoucy/dproto

I'm personally not happy with any of the existing libraries but they will
likely be a valuable resource regardless.

Questions:
- How much work would be ideal for GSoC? Should I be working on flatbuffers
  only or protobuf too? (Seems like flatbuffers need more love)
- Should I tackle the std.serialization [3] idea?
- Any other serialization related suggestions?
- Anything I'm missing?

I'm still not entirely sure about my project (I'm probably going to write a
few proposals), so if you have other suggestions, do not hesitate. All kinds
of constructive feedback are welcome!


[0] https://wiki.dlang.org/GSOC_2018_Ideas#FlatBuffers_Support_and.2For_Improved_Protocol_Buffer_Support
[1] https://github.com/ahmetsait/LF2.IDE
[2] https://github.com/ahmetsait/IDL
[3] https://wiki.dlang.org/GSOC_2019_Ideas#std.serialization
March 29, 2019
Hi Ahmet,

welcome to the D forum.

As the author of protobuf-d I'll try to give you some feedback on the points you made. I couldn't find the time to also do a flatbuffers implementation, so my comments relate only to protobuf. If you are interested in doing the Flatbuffers work, I'll be more than happy to play the mentor role for you; I have some ideas there. But let's get to the existing, real stuff.

On Friday, 29 March 2019 at 00:18:40 UTC, Ahmet Sait wrote:
>
>   - It should be possible to parse schema and output mixable D code at
>     compile time
>   const schema = `message Person
>   {
>       required string name = 1;
>       required int32 id = 2;
>   }`;
>   mixin(fromProtoSchema(schema));

I don't think that it is worth the effort.
1. A complete implementation of .proto file parsing is complicated (https://developers.google.com/protocol-buffers/docs/reference/proto3-spec).
2. In theory, protobuf definitions do not change often, and since compile-time parsing is rather slow, re-parsing them on every compilation is a drawback rather than a benefit.
3. A protoc plugin is the way Protobuf recommends for parsing .proto definitions: https://developers.google.com/protocol-buffers/docs/proto3#generating
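
For readers unfamiliar with the technique being debated, this is roughly what compile-time code generation from a schema string looks like. It is a deliberately tiny sketch: `fromToySchema` is a hypothetical helper that only understands `message Name`, braces on their own lines, and `required <type> <name> = <n>;` fields. It drops the field numbers and comes nowhere near the full .proto grammar, which is exactly the part point 1 calls complicated.

```d
import std.array : split;
import std.string : splitLines, strip;

/// Toy generator (assumption-laden): turns a very restricted schema into the
/// source text of a D struct. Runs at compile time when used inside `mixin`.
string fromToySchema(string schema)
{
    string code;
    foreach (line; schema.splitLines)
    {
        auto words = line.strip.split; // split on whitespace
        if (words.length == 0)
            continue;
        if (words[0] == "message")
            code ~= "struct " ~ words[1] ~ "\n{\n";
        else if (words[0] == "required")
        {
            // Only int32 needs renaming here; `string` is spelled the same in D.
            string dType = words[1] == "int32" ? "int" : words[1];
            code ~= "    " ~ dType ~ " " ~ words[2] ~ ";\n";
        }
        else if (words[0] == "}")
            code ~= "}\n";
    }
    return code;
}

enum schema = `message Person
{
    required string name = 1;
    required int32 id = 2;
}`;

// The string is parsed during compilation and the result becomes a real type.
mixin(fromToySchema(schema));

unittest
{
    auto p = Person("Walter", 42);
    assert(p.name == "Walter" && p.id == 42);
}
```

Note that the parsing work is repeated on every build of the module, which is the cost point 2 refers to.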

>
>   - There should be no need for a schema definition, a custom type annotated
>     with UDAs should be enough
>   struct Person
>   {
>       @protoID(1) string name;
>       @protoID(2) int age;
>   }
>   serialize(Person("Walter", 42), stdout);

protobuf-d does that already, see the unittest for toProtobuf: https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/encoding.d#L193

>
> - Simple things should be simple
> It should be dead simple to do basic stuff:
>   auto obj = deserialize!SomeType(stdin);
>   serialize(obj, stdout);

Again, protobuf-d has that: https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/decoding.d#L214

>
> - Complex things should be possible
> The library should be flexible and extensible without modification

toProtobuf, fromProtobuf, toJSONValue, fromJSONValue methods are protobuf customization points in protobuf-d. For an example see https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/wrappers.d#L27-L54

>
> - Support for library and tool based usage
> It should be usable as a library without any additional setup but also usable
> as a schema compiler.

protobuf-d is usable as a library, see https://github.com/huntlabs/grpc-dlang/blob/57c8fe9808f8e860c4b0668a83cdabd78b296ce5/dub.json#L9
Regarding usage as a schema compiler, see my first comment above.

>
> - Support for common Phobos types
> Nullable, tuples, std.datetime, std.complex, std.bigint, containers...

Protobuf is a language-agnostic serialization format. Having .proto definitions for common Phobos types would just shift the problem somewhere else (i.e. to other programming languages).

Nevertheless, Protobuf probably addresses the same problem by defining the "well-known" types (https://developers.google.com/protocol-buffers/docs/reference/google.protobuf).
protobuf-d also supports those, so std.datetime.SysTime is mapped to google.protobuf.Timestamp and std.datetime.Duration to google.protobuf.Duration.
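
As an illustration of what such a mapping involves: google.protobuf.Timestamp stores whole seconds since the Unix epoch plus a nanosecond remainder, so a SysTime has to be split into those two parts. The sketch below uses a hypothetical `TimestampParts` struct as a stand-in; it is not protobuf-d's actual code, and it assumes times at or after the epoch.

```d
import core.time : hnsecs;
import std.datetime.date : DateTime;
import std.datetime.systime : SysTime;
import std.datetime.timezone : UTC;

// Hypothetical stand-in with the same two fields as google.protobuf.Timestamp.
struct TimestampParts
{
    long seconds; // whole seconds since 1970-01-01T00:00:00Z
    int nanos;    // remaining fraction of a second, in nanoseconds
}

/// Splits a SysTime into seconds and nanoseconds.
/// Assumption: `time` is not before the Unix epoch.
TimestampParts toTimestampParts(SysTime time)
{
    return TimestampParts(time.toUnixTime!long(),
        cast(int) time.fracSecs.total!"nsecs");
}

unittest
{
    // One second and 5 hnsecs (500 ns) after the epoch, in UTC.
    auto t = SysTime(DateTime(1970, 1, 1, 0, 0, 1), hnsecs(5), UTC());
    assert(toTimestampParts(t) == TimestampParts(1, 500));
}
```

SysTime has a resolution of 100 ns (hnsecs), so the nanos field is always a multiple of 100 in this direction.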

>
> I'm personally not happy with any of the existing libraries but they will
> likely be a valuable resource regardless.

The existing protobuf libraries are quite mature, and improving those would probably be time better spent than starting from scratch once again.

>
> Questions:
> - How much work would be ideal for GSoC? Should I be working on flatbuffers
>   only or protobuf too? (Seems like flatbuffers need more love)

I'm quite satisfied with the protobuf-d implementation: it is small (approx. 4k LOC), clean and quite feature complete, with 26 failing conformance tests vs. 27 and 41 for the official C++ and Java counterparts, respectively. Of course there is still room for improvement, but in the case of protobuf-d not enough for a GSoC application.

On the other hand, Flatbuffers is a very good candidate: it has its own specifics, but it is also somewhat similar to protobuf. This would reduce the planning risks considerably.

> - Should I tackle the std.serialization [3] idea?

I see std.serialization as a high-level API. It would probably live for a long time as std.experimental.serialization, since it will take quite some time until multiple serialization formats implement it. Only after that, if it ever happens, could we remove the "experimental" part. I don't see this as a suitable GSoC project.

> - Any other serialization related suggestions?
https://arrow.apache.org/


Cheers, Dragos
April 01, 2019
On 2019-03-29 01:18, Ahmet Sait wrote:

> - Should I tackle the std.serialization [3] idea?

I would love that. FlatBuffers or Protobuf could be one of the backends. You might need to implement more than one backend, though, to make sure the frontend API is actually general enough to support multiple backends. Ideally these would be two completely different kinds of backend, like FlatBuffers and JSON, for example.
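
A rough sketch of what that frontend/backend split could look like is below. All names (`serialize`, `TextBackend`, `BinaryBackend`, `put`) are made up for the example, and both formats are toys rather than JSON or FlatBuffers. The only point is that one frontend function drives two very different backends through the same `put` interface.

```d
import std.conv : to;
import std.string : representation;
import std.traits : FieldNameTuple;

/// Frontend: walks the fields of a struct and hands each one to the backend.
void serialize(Backend, T)(ref Backend backend, T value)
if (is(T == struct))
{
    static foreach (name; FieldNameTuple!T)
        backend.put(name, __traits(getMember, value, name));
}

/// Backend 1 (toy text format): human-readable key=value pairs.
struct TextBackend
{
    string output;

    void put(V)(string name, V v)
    {
        output ~= name ~ "=" ~ v.to!string ~ ";";
    }
}

/// Backend 2 (toy binary format): ignores names; ints become 4 little-endian
/// bytes, strings a one-byte length followed by the raw bytes.
struct BinaryBackend
{
    ubyte[] output;

    void put(string name, int v)
    {
        foreach (i; 0 .. 4)
            output ~= cast(ubyte)(v >> (8 * i));
    }

    void put(string name, string v)
    {
        output ~= cast(ubyte) v.length; // toy limit: strings shorter than 256 bytes
        output ~= v.representation;
    }
}

struct Person
{
    string name;
    int age;
}

unittest
{
    TextBackend text;
    serialize(text, Person("Walter", 42));
    assert(text.output == "name=Walter;age=42;");

    BinaryBackend bin;
    serialize(bin, Person("Walter", 42));
    assert(bin.output == [6, 87, 97, 108, 116, 101, 114, 42, 0, 0, 0]);
}
```

A text backend wants field names while a binary one may ignore them, which is the kind of difference a general frontend API has to accommodate.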

-- 
/Jacob Carlborg
April 01, 2019
On Friday, 29 March 2019 at 23:19:10 UTC, Dragos Carp wrote:
> Hi Ahmet,
>
> welcome to the D forum.
>
> As the author of protobuf-d I'll try to give you some feedback to the points you made. I couldn't find the time to also do the flatbuffers implementation, so my comments are related just to protobuf. If you are interested to do the Flatbuffers work, I'll be more than happy to play the mentor role for you - I have some ideas there. But let's get to the existing, real stuff.

Glad to hear, thanks!

> On Friday, 29 March 2019 at 00:18:40 UTC, Ahmet Sait wrote:
>>
>>   - It should be possible to parse schema and output mixable D code at
>>     compile time
>>   const schema = `message Person
>>   {
>>       required string name = 1;
>>       required int32 id = 2;
>>   }`;
>>   mixin(fromProtoSchema(schema));
>
> I don't think that it is worth the effort.
> 1. A complete implementation for .proto file parsing is complicated (https://developers.google.com/protocol-buffers/docs/reference/proto3-spec).
> 2. Theoretically, protobuf definitions does not change often, and considering that compile time parsing is somehow slow, the benefit of parsing them at every compilation is actually a drawback.
> 3. protoc plugin is the Protobuf recommended way of parsing .proto definitions: https://developers.google.com/protocol-buffers/docs/proto3#generating

It doesn't immediately strike me as complicated, and https://github.com/msoucy/dproto apparently has this feature, so I'm guessing it can be used as a reference. Compile times are of course not expected to be good with this approach, but it looks promising if Stefan's New CTFE gets completed in the future. Then again, you likely have more experience with this, so I should probably defer it until New CTFE is ready.

>>   - There should be no need for a schema definition, a custom type annotated
>>     with UDAs should be enough
>>   struct Person
>>   {
>>       @protoID(1) string name;
>>       @protoID(2) int age;
>>   }
>>   serialize(Person("Walter", 42), stdout);
>
> protobuf-d does that already, see the unittest for toProtobuf: https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/encoding.d#L193
>
>>
>> - Simple things should be simple
>> It should be dead simple to do basic stuff:
>>   auto obj = deserialize!SomeType(stdin);
>>   serialize(obj, stdout);
>
> Again, protobuf-d has that: https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/prot

I assumed that wasn't the case since the examples folder didn't have such code. Thanks for pointing that out.

>> - Complex things should be possible
>> The library should be flexible and extensible without modification
>
> toProtobuf, fromProtobuf, toJSONValue, fromJSONValue methods are protobuf customization points in protobuf-d. For an example see https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/wrappers.d#L27-L54
>
>>
>> - Support for library and tool based usage
>> It should be usable as a library without any additional setup but also usable
>> as a schema compiler.
>
> protobuf-d is usable as library, see https://github.com/huntlabs/grpc-dlang/blob/57c8fe9808f8e860c4b0668a83cdabd78b296ce5/dub.json#L9
> Regarding the usage as schema compiler, review the first comment.

These are basically a checklist that I want to fill in, whether or not the features already exist. Say, if I were to write flatbuffers-d, I would want to implement them.

>> - Support for common Phobos types
>> Nullable, tuples, std.datetime, std.complex, std.bigint, containers...
>
> Protobuf is a language agnostic serialization format. Having .protobuf definitions for common Phobos types will just shift the problem somewhere else (i.e. other programming languages).
>
> Nevertheless Protobuf addresses probably the same problem by defining the "well-known" types (https://developers.google.com/protocol-buffers/docs/reference/google.protobuf).
> protobuf-d also supports those, so that std.datetime.Systime is mapped to google.protobuf.Timestamp and std.datetime.Duration to google.protobuf.Duration

Makes sense. I'm of the opinion that the API should support common types if there is a direct correspondence or a well-established convention for the type in question.

>> I'm personally not happy with any of the existing libraries but they will
>> likely be a valuable resource regardless.
>
> The existing protobuf libraries are quite mature and probably improving those will be time better spent than starting once again from scratch.

I feel like there is some lack of documentation since none of those things you mentioned are obvious looking at the repo. Nevertheless, I'm happy to hear that protobuf-d is mature & feature complete.

>> Questions:
>> - How much work would be ideal for GSoC? Should I be working on flatbuffers
>>   only or protobuf too? (Seems like flatbuffers need more love)
>
> I'm quite satisfied with protobuf-d implementation: it is small (aprox. 4k LOC), clean and quite feature complete - 26 failing conformance test vs. 27 resp. 41 for the official C++ and Java counterparts. Of course there is still enough space for improvement, but at least in case of protobuf-d not enough for a GSoC application.
>
> On the other hand Flatbuffers is a very good candidate: it has its own specialties, but is also somehow similar to protobuf. This would reduce the planning risks considerably.

Agreed, I'm going to focus on flatbuffers in my proposal then.

>> - Should I tackle the std.serialization [3] idea?
>
> I see std.serialization as a high level API. Probably this will be a long term std.experimental.serialization, that will require quite some time till multiple serialization formats implements it. Just after that, if it will ever happen, we can remove the "experimental" part. I don't see this as a suited GSoC project.

I see, thanks for the feedback.

>> - Any other serialization related suggestions?
> https://arrow.apache.org/

Thanks, I'll take a look.
April 02, 2019
On Monday, 1 April 2019 at 09:57:08 UTC, Jacob Carlborg wrote:
> On 2019-03-29 01:18, Ahmet Sait wrote:
>
>> - Should I tackle the std.serialization [3] idea?
>
> I would love that. FlatBuffers or Protobuf could be one of the backends. Although you might need to implement more than one backend to make sure the frontend API actually is general enough to implement multiple backend. Ideally two completely different kind of backend, like FlatBuffers and JSON, for example.

Thanks for the feedback! I decided I should gather some experience building a serialization library first before thinking about designing std.serialization.

Also, can I ask you questions while working on my project (since you're the author of the orange lib and have experience)?
April 02, 2019
On 2019-04-02 02:05, Ahmet Sait wrote:

> Also, I want to know if I can ask you questions when working on my project (since you're the author of orange lib and have experience) ?

Sure, please do.

-- 
/Jacob Carlborg
April 04, 2019
https://docs.google.com/document/d/1kFXDbs-LLsIW5nTIt8EZkNBq3N6vayaEw25IcuoGxgI/edit?usp=sharing

Seeking some feedback, thanks in advance..!
April 04, 2019
On 2019-04-04 18:43, Ahmet Sait wrote:
> https://docs.google.com/document/d/1kFXDbs-LLsIW5nTIt8EZkNBq3N6vayaEw25IcuoGxgI/edit?usp=sharing 
> 
> 
> Seeking some feedback, thanks in advance..!

I think "Contributing D support to the upstream repositories" might be a hurdle. You never know how much time someone else will have to review pull requests.

"Using D traits, UDAs and static introspection, it is possible to generate flatbuffer accessors without a schema file"

I don't know how flatbuffer works, but are accessors necessary?

It might be interesting to specify if you have any requirements that it should work with any of the attributes: "nothrow", "@safe", "pure", "@nogc" and the betterC subset.

-- 
/Jacob Carlborg
April 04, 2019
On Thursday, 4 April 2019 at 16:43:44 UTC, Ahmet Sait wrote:
> https://docs.google.com/document/d/1kFXDbs-LLsIW5nTIt8EZkNBq3N6vayaEw25IcuoGxgI/edit?usp=sharing
>
> Seeking some feedback, thanks in advance..!

I added some comments directly in the document.
April 04, 2019
On Thursday, 4 April 2019 at 18:27:05 UTC, Jacob Carlborg wrote:
> On 2019-04-04 18:43, Ahmet Sait wrote:
>> https://docs.google.com/document/d/1kFXDbs-LLsIW5nTIt8EZkNBq3N6vayaEw25IcuoGxgI/edit?usp=sharing
>> 
>> Seeking some feedback, thanks in advance..!
>
> I think "Contributing D support to the upstream repositories" might be a hurdle. You never know how much time someone else will have to review pull requests.

That's what I thought too, but I at least want the project to be in a state where I can make a PR to upstream, which admittedly is not a clear or measurable criterion.

> "Using D traits, UDAs and static introspection, it is possible to generate flatbuffer accessors without a schema file"
>
> I don't know how flatbuffer works, but are accessors necessary?

AFAIU, accessors make vector (array) fields and backward/forward compatibility possible. I'm still learning so don't count on me.
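
To show the compatibility part of that idea, here is a heavily simplified model, not the real FlatBuffers wire layout. It assumes a per-table list of field offsets where 0 means "this field was not written". `ToyTable` and `getInt` are hypothetical names. The accessor is what lets a newer reader ask an older buffer for a field it never stored and still get a sensible default.

```d
/// Simplified model of a table (assumption, not the actual format):
/// `offsets[i]` is the position of field i inside `data`, or 0 if absent.
/// Position 0 of `data` is reserved so that 0 can serve as the "absent" marker.
struct ToyTable
{
    const(ushort)[] offsets;
    const(ubyte)[] data;

    /// Reads field `index` as a little-endian int. Returns `fallback` when the
    /// writer did not know about the field or chose not to store it.
    int getInt(size_t index, int fallback) const
    {
        if (index >= offsets.length || offsets[index] == 0)
            return fallback;
        const o = offsets[index];
        return data[o] | (data[o + 1] << 8) | (data[o + 2] << 16) | (data[o + 3] << 24);
    }
}

unittest
{
    const(ubyte)[] payload = [0, 42, 0, 0, 0];

    // Buffer produced by an "old" writer that only knows field 0.
    const(ushort)[] oldSchema = [1];
    auto table = ToyTable(oldSchema, payload);
    assert(table.getInt(0, -1) == 42); // present: the stored value wins
    assert(table.getInt(1, 7) == 7);   // unknown to the writer: default

    // Writer knows field 0 but omitted it, and stored field 1 instead.
    const(ushort)[] sparse = [0, 1];
    assert(ToyTable(sparse, payload).getInt(0, 9) == 9);
}
```

Direct struct-style field access could not do this, because the reader would have no way to tell a missing field from a stored one.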

> It might be interesting to specify if you have any requirements that it should work with any of the attributes: "nothrow", "@safe", "pure", "@nogc" and the betterC subset.

This is something that came to my mind after the fact (since I don't bother with attributes much), and I haven't decided yet. It makes a lot of sense to provide @nogc functionality for potential RPC protocol usage (not high priority right now); I'm not sure about the others.
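
For reference, this is the kind of low-level primitive that can carry all four attributes without much effort. `writeUint` is a hypothetical helper, not part of any proposal text. It writes into a caller-provided buffer, so it never allocates, and it reports failure through its return value instead of throwing.

```d
/// Writes `value` as 4 little-endian bytes at the start of `buffer`.
/// Returns the number of bytes written, or 0 if the buffer is too small.
size_t writeUint(ubyte[] buffer, uint value) @safe pure nothrow @nogc
{
    if (buffer.length < 4)
        return 0; // signal failure without an exception
    foreach (i; 0 .. 4)
        buffer[i] = cast(ubyte)(value >> (8 * i));
    return 4;
}

@safe unittest
{
    ubyte[4] buf;
    assert(writeUint(buf[], 0x01020304) == 4);
    assert(buf == [4, 3, 2, 1]);

    ubyte[2] tooSmall;
    assert(writeUint(tooSmall[], 1) == 0);
}
```

Whether the higher-level API can keep these guarantees depends on design choices such as who owns the output buffer and how errors are reported.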