Binary Serialization

Jul 13, 2019

Jul 14, 2019

I am receiving packets of data over the network from a C#/Java/dlang/etc. client. The client writes the data directly to a socket, one primitive value at a time, in little-endian format. I would like to receive this super easily in D. Here is an example of roughly what I want to do.

    class MoveCommand
    {
        byte serialNumber;
        int x;
        int y;
    }

    public void processMoveCommand(byte[] buffer, size_t offset)
    {
        MoveCommand command = *(cast(MoveCommand *)(buffer.ptr + offset));
    }

When I do MoveCommand.sizeof, it returns 4. When I changed MoveCommand to a struct, it returns 12. Is that going to be reliable on every machine and with every compiler?

It seems like I had to use __attributed_packed__ (something like this, it's been 10 years) to guarantee the actual byte layout of a struct in C when working with files. Do I even have to worry about this in D? If so, is there something comparable to attribute packed in D? What I fear is that one compiler would align fields on word boundaries while another would pack the data or do something else (I would hope not). I would rather know how this works now and not have it come back to bite me when I least suspect it on some other machine.

I am happy to read about this stuff, but I don't know where to start. I have the book "The D Programming Language" if there is something in there that I missed.

On Saturday, 13 July 2019 at 23:52:38 UTC, harakim wrote:
> class MoveCommand
> {
>     byte serialNumber;
>     int x;
>     int y;
> }
> When I do MoveCommand.sizeof, it returns 4.

It is important to understand a class in D is a reference type, so `MoveCommand` here is actually a pointer internally, meaning that 4 is the size of a pointer. Trying to directly read or write that to a socket is a mistake.

With struct though, there's potential. The reason you get 12 there though is that the byte is padded. The struct looks like

0: serialNumber
1: padding
2: padding
3: padding
4: x
5: x
6: x
7: x
8: y
9: y
10: y
11: y

To move the padding to the end, you can add `align(1):` inside, so given:

    struct MoveCommand
    {
      align(1):
        byte serialNumber;
        int x;
        int y;
    }

The layout will look like this:

0: serialNumber
1: x
2: x
3: x
4: x
5: y
6: y
7: y
8: y
9: padding
10: padding
11: padding

The align(1) is kinda like the __packed__ thing in C compilers. The sizeof will still read 12 there, but you can read and write an individual item direct off a binary thing reliably. But an array of them will have that padding still. To get rid of that, you put an align(1) on the *outside* of the struct:

    align(1) // this one added
    struct MoveCommand
    {
      align(1): // in ADDITION to this one
        byte serialNumber;
        int x;
        int y;
    }

And now the sizeof will read 9, with the padding cut off from the end too. You can do an array of these now totally packed.

This behavior is consistent across D compilers; it is defined by the spec.

Just remember types like `string` have a pointer embedded and probably shouldn't be memcpyed!
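A minimal sketch of the layouts described above, checked at compile time, plus one way to copy a fully packed command out of a buffer. The names `PaddedMoveCommand`, `MoveCommandClass`, and `readMoveCommand` are made up for illustration and are not from the thread. The copy assumes the host is little-endian like the wire format, so the runtime check is guarded by `version (LittleEndian)`.

```d
import core.stdc.string : memcpy;

// Default-aligned struct: 3 padding bytes follow the byte, so x starts at offset 4.
struct PaddedMoveCommand
{
    byte serialNumber;
    int x;
    int y;
}

// Fully packed struct: align(1) on the outside AND on the members.
align(1) struct MoveCommand
{
align(1):
    byte serialNumber;
    int x;
    int y;
}

// A class name denotes a reference, so .sizeof is the size of a pointer.
class MoveCommandClass
{
    byte serialNumber;
    int x;
    int y;
}

static assert(PaddedMoveCommand.x.offsetof == 4);
static assert(PaddedMoveCommand.sizeof == 12);
static assert(MoveCommand.x.offsetof == 1);
static assert(MoveCommand.y.offsetof == 5);
static assert(MoveCommand.sizeof == 9);
static assert(MoveCommandClass.sizeof == (void*).sizeof);

// Hypothetical helper: copy one packed command out of `buffer` at `offset`.
// Assumption: host byte order matches the little-endian wire format.
MoveCommand readMoveCommand(const(ubyte)[] buffer, size_t offset)
{
    assert(offset + MoveCommand.sizeof <= buffer.length, "buffer too short");
    MoveCommand command;
    memcpy(&command, buffer.ptr + offset, MoveCommand.sizeof);
    return command;
}

void main()
{
    // One unrelated leading byte to show a non-zero offset, then 7, 10, 20.
    const(ubyte)[] buffer = [0xFF, 7, 10, 0, 0, 0, 20, 0, 0, 0];
    version (LittleEndian)
    {
        auto command = readMoveCommand(buffer, 1);
        assert(command.serialNumber == 7);
        assert(command.x == 10);
        assert(command.y == 20);
    }
}
```

Using `memcpy` into a local value rather than dereferencing a cast pointer keeps the read independent of where the command happens to sit in the buffer.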

July 14, 2019

Re: Binary Serialization

Posted by harakim
in reply to Adam D. Ruppe


On Sunday, 14 July 2019 at 00:18:02 UTC, Adam D. Ruppe wrote:
> On Saturday, 13 July 2019 at 23:52:38 UTC, harakim wrote:
>> class MoveCommand
>> {
>> 	byte serialNumber;
>> 	int x;
>> 	int y;
>> }
>> When I do MoveCommand.sizeof, it returns 4.
>
>
> It is important to understand a class in D is a reference type, so `MoveCommand` here is actually a pointer internally, meaning that 4 is the size of a pointer. Trying to directly read or write that to a socket is a mistake.
>
> With struct though, there's potential. The reason you get 12 there though is that the byte is padded. The struct looks like
>
> 0: serialNumber
> 1: padding
> 2: padding
> 3: padding
> 4: x
> 5: x
> 6: x
> 7: x
> 8: y
> 9: y
> 10: y
> 11: y
>
>
> To move the padding to the end, you can add `align(1):` inside, so given:
>
> struct MoveCommand
> {
>   align(1):
>  	byte serialNumber;
> 	int x;
> 	int y;
> }
>
> The layout will look like this:
>
> 0: serialNumber
> 1: x
> 2: x
> 3: x
> 4: x
> 5: y
> 6: y
> 7: y
> 8: y
> 9: padding
> 10: padding
> 11: padding
>
>
> The align(1) is kinda like the __packed__ thing in C compilers. The sizeof will still read 12 there, but you can read and write an individual item direct off a binary thing reliably. But an array of them will have that padding still. To get rid of that, you put an align(1) on the *outside* of the struct:
>
> align(1) // this one added
> struct MoveCommand
> {
>   align(1): // in ADDITION to this one
>  	byte serialNumber;
> 	int x;
> 	int y;
> }
>
>
> And now the sizeof will read 9, with the padding cut off from the end too. You can do an array of these now totally packed.
>
>
> This behavior is consistent across D compilers; it is defined by the spec.
>
> Just remember types like `string` have a pointer embedded and probably shouldn't be memcpyed!

Awesome. I kind of figured the D language would put this type of thing in the spec. Since the layout in memory will be consistent, I think I'm just going to let D do its normal thing and keep the members aligned in the default manner at a cost of a few extra bytes.
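One thing to watch with default alignment: the struct is 12 bytes in memory, while a sender that writes one primitive at a time puts 9 bytes on the wire, so a pointer cast over the buffer would not line up with x and y. A possible way around that, sketched here under assumptions, is to decode field by field with `std.bitmanip.read`, which also makes the byte order explicit. The helper name `decodeMoveCommand` is hypothetical.

```d
import std.bitmanip : read;
import std.system : Endian;

// Default alignment: 12 bytes in memory, with padding after serialNumber.
// The decoder below never relies on that in-memory layout.
struct MoveCommand
{
    byte serialNumber;
    int x;
    int y;
}

// Hypothetical helper: consume 9 wire bytes (1 + 4 + 4) from the front of `data`.
MoveCommand decodeMoveCommand(ref const(ubyte)[] data)
{
    assert(data.length >= 9, "need 9 bytes for a MoveCommand");
    MoveCommand command;
    // Each read pops the bytes it used and converts from little-endian.
    command.serialNumber = data.read!(byte, Endian.littleEndian);
    command.x = data.read!(int, Endian.littleEndian);
    command.y = data.read!(int, Endian.littleEndian);
    return command;
}

void main()
{
    // serialNumber = 7, x = 10, y = -2 (0xFFFFFFFE) in little-endian order.
    const(ubyte)[] data = [7, 10, 0, 0, 0, 0xFE, 0xFF, 0xFF, 0xFF];
    auto command = decodeMoveCommand(data);
    assert(command.serialNumber == 7);
    assert(command.x == 10);
    assert(command.y == -2);
    assert(data.length == 0); // all 9 bytes were consumed
}
```

Because the slice is passed by `ref`, the caller's view of the buffer advances past each decoded command, which suits a stream of back-to-back packets.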

It's nice to know about align, though. In the unlikely event I end up running into this a lot, I will revisit that decision. I can just order my members for efficient space usage for now. As a side note, it would be cool if there were blank pages at the end of The D Programming Language book to put addendums like this in. I'll have to keep that in mind if I ever write a book.
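To illustrate what ordering members for space can buy under default alignment, here is a small sketch with two made-up structs (`Interleaved` and `Grouped` are not from the thread). They hold the same fields, but grouping the small members together leaves less padding.

```d
// Hypothetical structs with default alignment, for illustration only.
struct Interleaved
{
    byte a;   // offset 0, followed by 3 padding bytes
    int x;    // offset 4
    byte b;   // offset 8, followed by 3 trailing padding bytes
}

struct Grouped
{
    int x;    // offset 0
    byte a;   // offset 4
    byte b;   // offset 5, followed by 2 trailing padding bytes
}

static assert(Interleaved.sizeof == 12);
static assert(Grouped.sizeof == 8);
static assert(Interleaved.alignof == 4 && Grouped.alignof == 4);

void main() {}
```

The checks are `static assert`s, so a compiler that laid these out differently would fail at compile time rather than at run time.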
