March 09, 2022

Greetings D Community,

Do you have need for an efficient binary format for encoding and decoding data, but are hesitant to use things like Google Protocol Buffers due to the need to re-compile code in order to change formats?

Apache Avro addresses many of these problems by pairing a binary (and other) format with a JSON formatted schema which is part of the protocol.

Sadly, this protocol is not available in the D Programming language... until now.

The initial v0.1.0 of the Avro library in D can be found here: https://code.dlang.org/packages/avro-d

Features Implemented

Schema representation

A set of data classes exist to represent schemas for generation in code or the processing of data.

For example:

import avro.schema;
auto schema = new UnionSchema([
    Schema.createPrimitive(Type.STRING),
    Schema.createPrimitive(Type.INT)]);

Schema parsing & validation

Schemas may be parsed from files, text, or JSON.

For example:

import avro.parser;
auto parser = new Parser();
Schema schema =  parser.parseText(q"EOS
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
EOS");

Errors in the JSON format of a schema will lead to descriptive errors.

Generic Data types

Generic data objects may be created according to schemas with their values set to schema-appropriate
defaults and validation logic when setting values. Most GenericDatum objects make use of
.getValue!T() and .setValue(T)(T val) methods, however, many convenience functions also exist.

For example:

import avro.generic.genericdata;

// Initializes the GenericDatum according to the schema with default values.
GenericDatum datum = new GenericDatum(schema);
assert(datum.getType == Type.RECORD);

// Primitive values can be set and retrieved.
datum.getValue!(GenericRecord).getField("name").setValue("bob");

// Convenience shortcut using opIndex() and opAssign() for primitive types.
datum["name"] = "bob";

assert(datum["name"].getValue!string == "bob");

// Enums have convenience functions directly on GenericData.
datum["favorite_number"].setUnionIndex(0);
assert(datum["favorite_number"].getUnionIndex() == 0);

// Arrays also have convenience functions.
datum["scores"] ~= 1.23f;
datum["scores"] ~= 4.56f;
assert(datum["scores"].length == 2);
p
// Maps do as well.
datum["m"]["m1"] = 10L;
datum["m"]["m2"] = 20L;
assert(datum["m"]["m1"].getValue!long == 10L);

Binary Serialization/Deserialization

GenericData objects can be written using an encoder.

For example:

import avro.codec.binaryencoder;
import avro.generic.genericwriter;

ubyte[] data;
auto encoder = binaryEncoder(appender(&data));
GenericWriter writer = new GenericWriter(schema, encoder);
writer.write(datum);

assert(data == [
// Field: name
// len=3     b     o     b
    0x06, 0x62, 0x6F, 0x62,
// Field: favorite_number
// idx=0     8
    0x00, 0x10,
// Field: favorite_color
// idx=0 len=4     b     l     u     e
    0x00, 0x08, 0x62, 0x6C, 0x75, 0x65
]);

They may also be read using a decoder.

For example:

import avro.codec.binarydecoder;
import avro.generic.genericreader;

auto decoder = binaryDecoder(data);
GenericReader reader = new GenericReader(schema, decoder);
GenericDatum datum;
reader.read(datum);

assert(datum["name"].getValue!string() == "bob");
assert(datum["favorite_number"].getValue!int() == 8);
assert(datum["favorite_color"].getValue!string() == "blue");

Features Not Yet Implemented

  • Logical Type support
  • Specific Data types generated for schemas
  • JSON Serialization/Deserialization
  • Codex compression support
  • Object Container Files
  • Protocol wire format
  • Schema Resolution
March 16, 2022

On Wednesday, 9 March 2022 at 14:26:10 UTC, Vijay Nayar wrote:

>

The initial v0.1.0 of the Avro library in D can be found here: https://code.dlang.org/packages/avro-d

Version 0.2.2 has been released, which includes the ability to read and write generic objects in JSON format in addition to the aforementioned binary format.