How to read a C++ class from file into memory
March 22, 2007
I am coming from Python to D, so forgive my limited C/C++ knowledge.

What is the idiomatic way to read a heterogeneous binary structure in D?

In my C++ book, it shows examples of defining a class or struct with the appropriate types and then passing a pointer to this class to fread().

However, in Java or Python I could just read the types directly from a binary stream (including the padding bytes associated with the structure on disk).

How should I do this in D?

I did see this post:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=6071

Note that I ultimately want to store these data back into classes where I can work with it.

Thanks,

David




March 23, 2007

David Finlayson wrote:
> I am coming from Python to D, so forgive my limited C/C++ knowledge.

Don't worry; I'd be inclined to think it's a good thing :3

> What is the idiomatic way to read a heterogeneous binary structure in D?
> 
> In my C++ book, it shows examples of defining a class or struct with the appropriate types and then passing a pointer to this class to fread().

For my money, that's a bad idea because the binary representation of a struct or object isn't necessarily the same on different machines or even same machine, different operating system.

Quick example: real is a different size on Windows to Linux (IIRC).

> However, in Java or Python I could just read the types directly from a binary stream (including the padding bytes associated with the structure on disk).
> 
> How should I do this in D?
> 
> I did see this post:
> 
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=6071
> 
> Note that I ultimately want to store these data back into classes where I can work with it.
> 
> Thanks,
> 
> David

The nice thing about Python is the pickle protocol.  That's what I assume you were using.  Since Python can interactively inspect objects to find out what data is attached to them, this is really easy.  It's also nice because pickle isn't blind: it will serialise things in a predictable format based on type, not on in-memory layout.

So you can dump a bunch of Python objects to a file, send it to another machine, and read them back out again.

In D, we're kinda-sorta there.  The way I'm solving this is using the .tupleof property of structures.  For example:

struct Point { double x, y; }

void main()
{
    Point pt;
    // "ref" makes each loop variable refer to the actual field, not a copy
    foreach( ref member ; pt.tupleof )
        member = 0.0;
}

At which point, pt.x = pt.y = 0.  Combining this with templated functions lets you write out or read in any structure you please.
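A minimal sketch of how .tupleof might combine with a templated function, assuming the bytes are tightly packed and little-endian. This is not the serialisation library mentioned in this post; the helper name decode is made up for illustration, and std.bitmanip.peek does the per-field conversion.

```d
import std.bitmanip : nativeToLittleEndian, peek;
import std.system : Endian;

struct Point { double x, y; }

/// Hypothetical helper: fills each field of a plain struct, in declaration
/// order, from tightly packed little-endian bytes.
T decode(T)(const(ubyte)[] bytes)
if (is(T == struct))
{
    T result;
    size_t pos = 0;                       // read cursor, advanced by peek
    foreach (ref member; result.tupleof)
    {
        alias Field = typeof(member);
        member = bytes.peek!(Field, Endian.littleEndian)(&pos);
    }
    return result;
}

void main()
{
    // Build 16 bytes holding x = 1.5 and y = -2.0 in little-endian order.
    ubyte[8] xs = nativeToLittleEndian(1.5);
    ubyte[8] ys = nativeToLittleEndian(-2.0);
    const(ubyte)[] raw = xs[] ~ ys[];

    auto pt = decode!Point(raw);
    assert(pt.x == 1.5 && pt.y == -2.0);
}
```

Because the fields are converted one at a time, the result does not depend on how the compiler lays out Point in memory. It only handles field types that std.bitmanip can byte-swap (integers, characters, bool, float, double), so nested structs or arrays would need extra cases.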

Note: I haven't tried *any* of this with classes, because my feeling is that classes are often much more complex than structures (which are just plain old data, clumped together), plus they're far more likely to have references to other stuff; then what do you do?

I thought about including some code from my serialisation library, but
it tends to be "all or nothing".  Plus, this was written for a research
project, and I'm not sure who *actually* owns the code (me or the uni) X_X.

Anyway, hope this helps :)

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
March 23, 2007

Daniel Keep Wrote:

> For my money, that's a bad idea because the binary representation of a struct or object isn't necessarily the same on different machines or even same machine, different operating system.
> 

Agreed. However, the file is a binary storage format for a sonar system. I have been given enough code snippets from the company to read the file (and I have a working Python version); what I want to do now is convert this code to D.

Ultimately, it is my problem to conform to their file format. I don't even know the byte alignment. However, I do have a working Python prototype. (I guess if I thought about it, I could figure the alignment out now).

> The nice thing about Python is the pickle protocol.  That's what I assume you were using.  Since Python can interactively inspect objects to find out what data is attached to them, this is really easy.  It's also nice because pickle isn't blind: it will serialise things in a predictable format based on type, not on in-memory layout.
> 

In my case, I used Python's struct module (struct.unpack) to build a reader for each of the classes, structs and unions (yes, they used all three types). It took me a while, but I was able to identify where the padding bytes were placed to fill out the structures on disk. So, I understand exactly how the file is stored on disk. All I need to do is learn how to read the file efficiently in D.
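For comparison, the same field-by-field approach can be sketched in D with std.bitmanip.read, which consumes bytes from the front of a buffer. The Record layout below is hypothetical (the real sonar format is not shown in this thread) and little-endian data is assumed; it is roughly what struct.unpack("<IH2xd", ...) would do in Python.

```d
import std.bitmanip : nativeToLittleEndian, read;
import std.system : Endian;

// Hypothetical record; the real sonar format is not shown in this thread.
struct Record
{
    uint id;
    ushort gain;
    double depth;
}

/// Reads one record field by field, skipping two padding bytes after `gain`.
/// Each call to read consumes bytes from the front of `buf`.
Record parseRecord(ref const(ubyte)[] buf)
{
    Record r;
    r.id    = buf.read!(uint, Endian.littleEndian)();
    r.gain  = buf.read!(ushort, Endian.littleEndian)();
    buf     = buf[2 .. $];                       // skip padding bytes on disk
    r.depth = buf.read!(double, Endian.littleEndian)();
    return r;
}

void main()
{
    ubyte[] head = [42, 0, 0, 0,  3, 0,  0, 0];  // id, gain, 2 padding bytes
    ubyte[8] depth = nativeToLittleEndian(12.5);
    const(ubyte)[] buf = head ~ depth[];

    auto r = parseRecord(buf);
    assert(r.id == 42 && r.gain == 3 && r.depth == 12.5);
    assert(buf.length == 0);                     // everything was consumed
}
```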

Thanks for answering my post. Do you know how I might use std.stream to read these files?

David




March 23, 2007
"David Finlayson" <david.p.finlayson@gmail.com> wrote in message news:etuh4g$2ijp$1@digitalmars.com...
>I am coming from Python to D, so forgive my limited C/C++ knowledge.
>
> What is the idiomatic way to read a heterogeneous binary structure in D?
>
> In my C++ book, it shows examples of defining a class or struct with the appropriate types and then passing a pointer to this class to fread().
>
> However, in Java or Python I could just read the types directly from a binary stream (including the padding bytes associated with the structure on disk).
>
> How should I do this in D?
>
> I did see this post:
>
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=6071
>
> Note that I ultimately want to store these data back into classes where I can work with it.
>

If you haven't got too many classes/structures to serialize, I've attached a module with a simple serialization/deserialization mechanism that works with std.stream.  It will automatically serialize out all primitive and array types, as well as structures which have no unions.  You can specify custom serialization and deserialization methods for classes and structures, and you can make structures behave as though they were an opaque chunk of data (for performance when reading/writing).  It's very easy to use; to serialize anything, you just write:

Serialize(stream, data);

And to deserialize it again:

Deserialize(stream, data);

Defining the custom methods for classes and structs is easy.  The serialize function should just be declared as "void serialize(Stream s)" and the deserialize function as "static T deserialize(Stream s)", where T is the type for which you're defining the deserialize function.
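A rough sketch of what such a pair of custom methods could look like. The attached module is not reproduced here, so this is an assumption about the shape only: ByteStream is a made-up in-memory stand-in for Stream, and Ping is a made-up class.

```d
import std.exception : enforce;

/// Hypothetical stand-in for a Stream: an in-memory buffer with a read cursor.
class ByteStream
{
    private ubyte[] data;
    private size_t pos;

    /// Appends n raw bytes starting at p.
    void writeExact(const(void)* p, size_t n)
    {
        data ~= (cast(const(ubyte)*) p)[0 .. n];
    }

    /// Copies exactly n bytes to p, or throws if fewer remain.
    void readExact(void* p, size_t n)
    {
        enforce(pos + n <= data.length, "not enough data");
        (cast(ubyte*) p)[0 .. n] = data[pos .. pos + n];
        pos += n;
    }
}

class Ping
{
    uint id;
    double depth;

    this(uint id, double depth) { this.id = id; this.depth = depth; }

    // Instance method: writes this object's fields in a fixed order.
    void serialize(ByteStream s)
    {
        s.writeExact(&id, id.sizeof);
        s.writeExact(&depth, depth.sizeof);
    }

    // Static method: reads the fields back in the same order.
    static Ping deserialize(ByteStream s)
    {
        uint id;
        double depth;
        s.readExact(&id, id.sizeof);
        s.readExact(&depth, depth.sizeof);
        return new Ping(id, depth);
    }
}

void main()
{
    auto s = new ByteStream;
    auto original = new Ping(7, 12.5);
    original.serialize(s);

    auto copy = Ping.deserialize(s);
    assert(copy.id == 7 && copy.depth == 12.5);
}
```

The deserialize function is static because no object exists yet when reading starts; it builds and returns a new one.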

I extracted this code from a larger module, and I think it has everything it needs to work. If it doesn't, let me know!



March 23, 2007
"David Finlayson" <david.p.finlayson@gmail.com> wrote in message news:etvlr7$1a9p$1@digitalmars.com...

> Thanks for answering my post. Do you know how I might use std.stream to read these files?

If you know exactly how the data is structured, you may be able to define several structures which define the layout of the data, e.g.

align(1) struct Header
{
    uint magic;
    uint ver;            // "version" is a reserved word in D
    char[100] comments;
}

Or something along those lines, and then read it in with readExact:

Stream s = ...
Header h;
s.readExact(&h, Header.sizeof);

If the format is more complex, it'll probably take a bit more work, but that's the general idea.
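A sketch of the same idea using std.stdio.File.rawRead, in case std.stream is not available in your D installation. It assumes the file uses the same byte order as the machine reading it. The member name ver, the helper readHeader and the scratch file name are made up for illustration.

```d
import std.exception : enforce;
import std.file : remove, tempDir;
import std.path : buildPath;
import std.stdio : File;

align(1) struct Header
{
align(1):
    uint magic;
    uint ver;               // "version" is a reserved word in D
    char[100] comments;
}
static assert(Header.sizeof == 108);

/// Reads one Header from the current position of `f`.
Header readHeader(File f)
{
    Header h;
    // (&h)[0 .. 1] views the single struct as a one-element slice for rawRead.
    auto got = f.rawRead((&h)[0 .. 1]);
    enforce(got.length == 1, "file too short for a Header");
    return h;
}

void main()
{
    // Round trip through a scratch file (the file name is arbitrary).
    auto path = buildPath(tempDir(), "header_demo.bin");
    scope (exit) remove(path);

    Header original;
    original.magic = 0xCAFEBABE;
    original.ver = 2;
    original.comments[] = ' ';
    {
        auto f = File(path, "wb");
        f.rawWrite((&original)[0 .. 1]);
    }   // f goes out of scope here, which closes the file

    auto h = readHeader(File(path, "rb"));
    assert(h.magic == 0xCAFEBABE && h.ver == 2);
}
```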


March 23, 2007
For the moment, I just want to understand the key part of your Deserialize function.

The secret sauce is in this line (and others like it):

s.readExact(strptr, char.sizeof * len)

where s is a file stream. It looks like with this method, it is only possible to read a single variable or an array of the same type (as you are doing here). Is it possible to send a pointer to a struct and read THAT in from the stream? If so, how does it handle padding bytes? I know there is an align() attribute for structs that might apply here.
March 23, 2007
Jarrett Billingsley Wrote:

> "David Finlayson" <david.p.finlayson@gmail.com> wrote in message news:etvlr7$1a9p$1@digitalmars.com...
> 
> > Thanks for answering my post. Do you know how I might use std.stream to read these files?
> 
> If you know exactly how the data is structured, you may be able to define several structures which define the layout of the data, e.g.
> 
> align(1) struct Header
> {
>     uint magic;
>     uint ver;            // "version" is a reserved word in D
>     char[100] comments;
> }
> 
> Or something along those lines, and then read it in with readExact:
> 
> Stream s = ...
> Header h;
> s.readExact(&h, Header.sizeof);
> 
> If the format is more complex, it'll probably take a bit more work, but that's the general idea.
> 

OK, this is what I want. Question:

Header h creates a structure of type Header. Is h a pointer? It looks like you dereferenced it with &h in readExact(). I really don't understand how D uses pointers yet.

> 

March 23, 2007

David Finlayson wrote:
> Jarrett Billingsley Wrote:
> 
>> "David Finlayson" <david.p.finlayson@gmail.com> wrote in message news:etvlr7$1a9p$1@digitalmars.com...
>>
>>> Thanks for answering my post. Do you know how I might use std.stream to read these files?
>> If you know exactly how the data is structured, you may be able to define several structures which define the layout of the data, e.g.
>>
>> align(1) struct Header
>> {
>>     uint magic;
>>     uint ver;            // "version" is a reserved word in D
>>     char[100] comments;
>> }
>>
>> Or something along those lines, and then read it in with readExact:
>>
>> Stream s = ...
>> Header h;
>> s.readExact(&h, Header.sizeof);
>>
>> If the format is more complex, it'll probably take a bit more work, but that's the general idea.
>>
> 
> OK, this is what I want. Question:
> 
> Header h creates a structure of type Header. Is h a pointer? It looks like you dereferenced it with &h in readExact(). I really don't understand how D uses pointers yet.

No, h is not a pointer.  Structs in D are POD (Plain Old Data), so h holds the data itself.  &h does not dereference anything; it takes the address of h.

What readExact does is it takes a pointer, and a length, and reads exactly that many bytes, and puts them at that pointer.  &h works out *where* h is being stored (so readExact can write to it), and Header.sizeof tells it how many bytes to read.
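To make that concrete, here is a small sketch of the pointer-plus-length idea. copyExact is a made-up stand-in that copies from a byte array instead of a stream, and Pair is a made-up struct.

```d
struct Pair { ushort a; ushort b; }

/// Hypothetical stand-in for readExact: copies `n` bytes from `src`
/// to the memory that `dest` points at.
void copyExact(void* dest, size_t n, const(ubyte)[] src)
{
    assert(src.length >= n, "not enough bytes");
    (cast(ubyte*) dest)[0 .. n] = src[0 .. n];
}

void main()
{
    Pair h;                         // h is the struct itself, stored on the stack
    Pair* p = &h;                   // &h is a pointer to that storage

    ubyte[4] raw = [1, 0, 2, 0];
    copyExact(&h, Pair.sizeof, raw[]);

    // The struct's memory now holds exactly the bytes that were copied in.
    assert((cast(ubyte*) p)[0 .. Pair.sizeof] == raw[]);

    // On a little-endian machine those bytes decode as a = 1, b = 2.
    version (LittleEndian)
        assert(h.a == 1 && h.b == 2);
}
```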

	-- Daniel
March 23, 2007
"David Finlayson" <david.p.finlayson@gmail.com> wrote in message news:etvmlg$1auh$1@digitalmars.com...
> For the moment, I just want to understand the key part of your Deserialize function.
>
> The secret sauce is in this line (and others like it):
>
> s.readExact(strptr, char.sizeof * len)
>
> where s is a file stream. It looks like with this method, it is only possible to read a single variable or an array of the same type (as you are doing here). Is it possible to send a pointer to a struct and read THAT in from the stream? If so, how does it handle padding bytes? I know there is an align() attribute for structs that might apply here.

Yes, as you've seen in the other post, you can do that.  As for alignment issues, that's up to your structure to know the layout of your data.  Say you know the format is something like:

Header structure:
0x0000 4 bytes: Magic number
0x0004 2 bytes: Version
0x0006 1 byte: Flags
0x0007 1 byte: (padding)
0x0008 24 bytes: Comments
0x0020 4 bytes: offset in file to some table
0x0024 4 bytes: (reserved)
--------------------------
Total: 40 bytes

So you could write your structure like this.  Notice we use "align(1)" on the structure to signal to the compiler that the members should be packed in as tightly as possible.  This way we have complete control over how the data is laid out.

align(1) struct Header
{
    uint magic;
    ushort ver;          // "version" is a reserved word in D
    ubyte flags;

    ubyte _padding1;

    char[24] comments;
    uint tableOffset;

    uint _reserved;
}

// For good measure
static assert(Header.sizeof == 40);

That static assert is there to make sure that the structure's size matches the calculated size of the header, and to make sure that we don't inadvertently change the header struct and mess things up.  Of course, if the header is a variable length, that assert probably wouldn't be there.
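A sketch of two further checks, assuming a current D compiler. There, to my understanding, packing of the fields is requested with align(1): inside the struct body, while the attribute in front of the struct sets the alignment of the struct as a whole. The .offsetof property lets each field offset be checked, not just the total size. The member name ver and the Packed struct are made up for illustration.

```d
align(1) struct Header
{
align(1):                   // pack fields with no compiler-inserted gaps
    uint magic;             // 0x00
    ushort ver;             // 0x04 ("version" is a reserved word in D)
    ubyte flags;            // 0x06
    ubyte _padding1;        // 0x07
    char[24] comments;      // 0x08
    uint tableOffset;       // 0x20
    uint _reserved;         // 0x24
}

// Check each offset against the format description, then the total size.
static assert(Header.magic.offsetof == 0x00);
static assert(Header.ver.offsetof == 0x04);
static assert(Header.flags.offsetof == 0x06);
static assert(Header.comments.offsetof == 0x08);
static assert(Header.tableOffset.offsetof == 0x20);
static assert(Header._reserved.offsetof == 0x24);
static assert(Header.sizeof == 40);

// Made-up example where packing actually changes the layout.
align(1) struct Packed
{
align(1):
    ubyte tag;
    uint value;             // offset 1 instead of the usual 4
}
static assert(Packed.value.offsetof == 1);
static assert(Packed.sizeof == 5);

void main() {}
```

All of these checks run at compile time, so a layout mistake stops the build before any file is read.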


March 23, 2007
Thanks Dan and Jarrett -

Using readExact() to load a struct worked perfectly.