IAP Tools for D

Hi D Community,

I am currently working on a cloud project where we intend to reinvent a lot of the old, less-than-optimal technologies. Among the technologies we are working on is a new general purpose network protocol called IAP.

IAP comes with a general purpose binary data format called ION (IAP Object Notation). ION is similar to MessagePack and CBOR, but with a few additions. ION has a table mode which can be used to model tables (like CSV files) efficiently, and which can also be used in larger object graphs. Our early serialized length and performance benchmarks look promising (tables can be down to 1/5 the size of JSON, and parsing can be up to 2x the speed of parsing CBOR).

ION can be used inside IAP, but also separately with HTTP and in data and log files.

We already have a working toolkit in Java (we have Java backgrounds), but since we really find D interesting, we would like to make a D toolkit too.

Since we are rather new to D, would anyone be interested in helping us out a bit in making such a library? We can probably do the coding ourselves, but might need some tips about how to package it nicely as a D library which can be used with Dub etc.

On 16/12/15 10:47 PM, Jakob Jenkov wrote:
> [...]
> Since we are rather new to D, would anyone be interested in helping us a
> bit out making such a library? We can probably do the coding ourselves,
> but might need some tips about how to pack it nicely into a D library
> which can be used with Dub etc.

If you hop onto IRC (#d on Freenode), there may be somebody from time to time who can give you a hand, or at worst help solve some of your problems.

> If you hop onto IRC #d Freenode, there maybe somebody from time to time that can give you a hand. Or at worst help solve some of your problems.

Thanks!

Oh, I forgot to mention that the IAP Tools for D library will be open source, under the Apache 2 License.

On Wednesday, 16 December 2015 at 10:08:14 UTC, Jakob Jenkov wrote:
>> If you hop onto IRC #d Freenode, there maybe somebody from time to time that can give you a hand. Or at worst help solve some of your problems.
>
> Thanks!
>
> Oh, I forgot to tell that the IAP Tools for D library will be open source, Apache 2 License.

Sounds like an interesting thing. I will lend a hand.

> Sounds like an interesting thing. I will lend a hand.

Great! We probably won't get started until January, as we still have some documentation work to do on the Java library, and some more systematic benchmarks to run etc. We will announce it here again when we get there.

A GitHub repo would suffice, right?

On Wednesday, 16 December 2015 at 11:06:21 UTC, Jakob Jenkov wrote:
>> Sounds like an interesting thing. I will lend a hand.
>
> Great! We probably won't get started until January, as we have some documentation work to do on the Java library still, and some more systematic benchmarks to run etc. We will announce it here again when we get there.
>
> A GitHub repo would suffice, right?

Yeah, I think so.

On Wednesday, 16 December 2015 at 09:47:35 UTC, Jakob Jenkov wrote:
> Hi D Community, ION is similar to MessagePack and CBOR,
> but with a few additions. ION has a table mode which can be used to model tables (like CSV files) efficiently, and which can also be used in larger object graphs. Our early serialized length + performance benchmarks look promising (tables can be down to 1/5 of JSON, and up to 2 x the speed of parsing CBOR).

How does the performance of ION compare with Protocol Buffers (https://developers.google.com/protocol-buffers/?hl=en) and Apache Thrift (https://thrift.apache.org/)?

December 20, 2015

Re: IAP Tools for D

Posted by Jakob Jenkov
in reply to belkin

> How does the performance of ION compare with Protocol Buffers (https://developers.google.com/protocol-buffers/?hl=en) and Apache Thrift ( https://thrift.apache.org/)?

That depends on what API you use, and how much metadata (e.g. class names and property names) you write in the serialized ION data. ION is quite flexible about how much metadata you want to include.

If you remove property names and rely only on the sequence of fields, ION can write faster than Google Protocol Buffers. When reading, if you rely only on the sequence of fields, ION is a bit slower than Google Protocol Buffers. All in all I believe performance will be on par with Google Protocol Buffers.
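To illustrate the trade-off between writing property names and relying only on the sequence of fields, here is a minimal Python sketch. It is a toy encoding, not ION's actual wire format: the record, the field names, and the helpers `encode_named`, `encode_positional` and `decode_positional` are all made up for the example.

```python
import struct

# Hypothetical record; the field names and layout are illustrative only.
record = {"id": 42, "price": 19.99, "qty": 3}


def encode_named(rec):
    """Write each field as: name length (1 byte), name, 8-byte value."""
    out = bytearray()
    for name, value in rec.items():
        key = name.encode("utf-8")
        out += struct.pack("<B", len(key)) + key
        # Toy rule: floats become doubles, integers become signed 64-bit ints.
        out += struct.pack("<d" if isinstance(value, float) else "<q", value)
    return bytes(out)


def encode_positional(rec):
    """Write only the values; reader and writer must agree on field order."""
    return struct.pack("<qdq", rec["id"], rec["price"], rec["qty"])


def decode_positional(buf):
    """Rebuild the record by relying only on the sequence of fields."""
    rec_id, price, qty = struct.unpack("<qdq", buf)
    return {"id": rec_id, "price": price, "qty": qty}


named = encode_named(record)
positional = encode_positional(record)
print(len(named), len(positional))
```

The positional form is smaller and needs no name handling on write, but the reader has to know the field order in advance, which is the same kind of agreement a schema provides.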

We have some benchmarks here:

http://tutorials.jenkov.com/iap/ion-performance-benchmarks.html

We still have a few minor optimizations to do and more benchmarks to run, but we may also need to add some validations, so the benchmarks on this page (for Java) are probably not too far off from the final numbers.

Regarding Apache Avro and Thrift, I looked at them today. It seems that Avro's encoding is similar to ION (and MessagePack and CBOR), although without e.g. tables. According to Thrift's own docs their binary encoding is not compact. For compact encoding it seems they refer to Protobuf.

ION has several advantages over Protobuf as a general purpose data format. ION is self describing, so you can iterate it without a schema. This means that you can do pretty fast arbitrary hierarchical navigation of an ION "file/message".
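As a rough illustration of what "self describing" means here, the following Python sketch walks a buffer using only the type tags and lengths stored in the data itself, with no schema. The tag values and the `write_field` / `iter_fields` helpers are assumptions made for this example and do not reflect ION's real field types or encoding.

```python
import struct

# Toy type tags; these are assumptions for the sketch, not ION's field types.
T_INT, T_UTF8, T_BYTES = 1, 2, 3


def write_field(tag, payload):
    """Emit one field as: type tag (1 byte), length (2 bytes), payload."""
    return struct.pack("<BH", tag, len(payload)) + payload


def iter_fields(buf):
    """Walk the buffer using only the tags and lengths stored in it."""
    pos = 0
    while pos < len(buf):
        tag, length = struct.unpack_from("<BH", buf, pos)
        pos += 3
        payload = buf[pos:pos + length]
        pos += length
        if tag == T_INT:
            yield "int", int.from_bytes(payload, "little", signed=True)
        elif tag == T_UTF8:
            yield "utf8", payload.decode("utf-8")
        else:
            yield "bytes", payload


message = (
    write_field(T_INT, (2015).to_bytes(4, "little", signed=True))
    + write_field(T_UTF8, "hello".encode("utf-8"))
    + write_field(T_BYTES, b"\x00\x01\x02")
)

fields = list(iter_fields(message))
print(fields)
```

Because every field carries its own type and length, a generic tool can list or navigate the contents of any message, whereas a schema-based format needs the matching schema to interpret the bytes.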

Protobuf's own docs say that Protobuf is not good for large amounts of raw bytes (e.g. files). ION is capable of modeling raw binary data (e.g. files), JSON, XML and CSV efficiently. You could even convert ION to a restricted XML format, edit it in a text editor, and convert it back to ION (we have not implemented this yet, but we have planned it). We also believe that ION can support cyclic object graphs, but this is also not fully implemented and tested yet.

ION has a very compact encoding of arrays of objects in "Tables", which are similar to CSV files with only 1 header row and N value rows. It is very common to transport arrays of objects over the network, e.g. N search results from a service. Thus ION tables are a major advantage. Tables can also be used inside object graphs where an object has 0..N children (in an array).
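The idea behind tables (one header row, N value rows) can be sketched in Python using plain JSON as a stand-in. This is not ION's binary encoding; the `results` data and the `table_to_objects` helper are hypothetical and only show why writing the property names once saves space.

```python
import json

# Hypothetical search results; any array of same-shaped objects would do.
results = [
    {"title": f"Result {i}", "url": f"http://example.com/{i}", "score": i}
    for i in range(100)
]

# Object-per-row layout: every object repeats all property names.
as_objects = json.dumps(results, separators=(",", ":"))

# Table layout: property names appear once in a header row.
header = list(results[0].keys())
table = {"header": header, "rows": [[r[k] for k in header] for r in results]}
as_table = json.dumps(table, separators=(",", ":"))


def table_to_objects(tbl):
    """Rebuild the list of objects from the header row and value rows."""
    return [dict(zip(tbl["header"], row)) for row in tbl["rows"]]


print(len(as_objects), len(as_table))
```

The saving grows with the number of rows and with the length of the property names, since each name is written once instead of once per object.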

We have a comparison of ION to other data formats here:

http://tutorials.jenkov.com/iap/ion-vs-other-formats.html

> How does the performance of ION compare with Protocol Buffers (https://developers.google.com/protocol-buffers/?hl=en) and Apache Thrift ( https://thrift.apache.org/)?

Oh, one final thing: If you *really* want speed, you should not parse ION into objects before using the data. Since ION is self describing, you can just navigate through it, find the data you need, and ignore the rest. This should be faster than first parsing the data into objects, especially if you parse an array of objects which may end up scattered all over the heap and thus lead to cache misses. Accessing these objects directly in the message buffer might save you the ION-to-object parse time, and it might also play better with the L1, L2 and L3 caches. We have not yet benchmarked this, but we will before long. In this mode I expect the read+use time to be faster than Google Protocol Buffers.
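The access pattern described above, reading the needed field directly in the message buffer and skipping everything else, might look roughly like this in Python. The record layout, the key ids and the helpers `encode_record`, `find_field` and `sum_field` are assumptions for the sketch, not ION's format or API, and Python itself will not show the cache effects; the sketch only illustrates the navigation.

```python
import struct


def encode_record(fields):
    """Encode {key_id: bytes} as repeated (key id, length, payload) entries."""
    body = b"".join(
        struct.pack("<BH", key, len(value)) + value
        for key, value in fields.items()
    )
    # Prefix the record with its total length so readers can skip it whole.
    return struct.pack("<I", len(body)) + body


def find_field(buf, start, end, wanted_key):
    """Return a zero-copy view of one field, skipping every other field."""
    pos = start
    while pos < end:
        key, length = struct.unpack_from("<BH", buf, pos)
        pos += 3
        if key == wanted_key:
            return buf[pos:pos + length]
        pos += length  # skip the payload without decoding it
    return None


def sum_field(buf, wanted_key):
    """Sum one integer field across all records directly in the buffer."""
    view = memoryview(buf)
    total, pos = 0, 0
    while pos < len(view):
        (size,) = struct.unpack_from("<I", view, pos)
        pos += 4
        field = find_field(view, pos, pos + size, wanted_key)
        if field is not None:
            total += int.from_bytes(field, "little")
        pos += size
    return total


KEY_NAME, KEY_SCORE = 1, 2  # hypothetical key ids
buffer = b"".join(
    encode_record({
        KEY_NAME: f"item {i}".encode("utf-8"),
        KEY_SCORE: i.to_bytes(4, "little"),
    })
    for i in range(10)
)

print(sum_field(buffer, KEY_SCORE))
```

No per-record objects are created here: the name fields are never decoded, and the only work done per record is reading a few length prefixes and one integer.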

On Sunday, 20 December 2015 at 01:16:46 UTC, Jakob Jenkov wrote:
>> [...]
>
> That depends on what API you use, and how much "meta data" (e.g. class names and property names) you write in the serialized ION data. ION is quite flexible about how much meta you want to include.
>
> [...]

I suggest also comparing against this [1]. The author, Kenton Varda, was the primary author of Protocol Buffers version 2, which is the version that Google released as open source.

[1] https://capnproto.org

/Paolo
