serialization library
November 08, 2006
Based on initial work from Tom S and clayasaurus, I've written this serialization library. I hope something like this doesn't already exist!

http://www.math.tu-berlin.de/~kamm/d/serialization.zip

Currently, it only provides binary file io through the Serializer class. It can
- write/read almost (hopefully) every type through a call to Serializer.describe
- track class references and pointers by default
- serialize classes and structs through a templated 'describe' member function
- write derived classes from base class reference*
- read derived classes into base class reference*
- serialize not default constructible classes*

(* for this to work, the class needs to be registered with the archive type)

It has far fewer features than boost::serialization but is already in a very usable state: FreeUniverse, a D game based on the Arc library, uses it for writing and loading savegames as well as other persistent state information.

What it does not do/is missing:
- exception safety / multithread safety
- out-of-class/struct serialization methods (is it possible to check whether a specific overload exists at compile time?)
- static arrays need to be serialized with describe_staticarray (static arrays can't be inout, so the general-purpose template method doesn't work... is there a way around the problem?)
- things I forgot right now
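
On the question of checking at compile time whether a specific overload exists: in current D this can be sketched with `static if` and `is(typeof(...))`. The block below is only an illustration under stated assumptions. `LogArchive`, `Point` and `describeExternal` are hypothetical names and not part of the library, and the archive only records which path it picked instead of writing any bytes.

```d
import std.stdio : writeln;

// Hypothetical archive (not the library's Serializer): it only records
// which path was chosen for each value it is asked to describe.
struct LogArchive
{
    string[] log;

    void describe(T)(ref T value)
    {
        static if (is(typeof(value.describe(this))))
        {
            // T has its own 'describe' member accepting this archive.
            log ~= "member:" ~ T.stringof;
            value.describe(this);
        }
        else static if (is(typeof(describeExternal(this, value))))
        {
            // No member, but a matching free-function overload exists.
            log ~= "free:" ~ T.stringof;
            describeExternal(this, value);
        }
        else
        {
            // Neither exists: treat the value as a primitive.
            log ~= "primitive:" ~ T.stringof;
        }
    }
}

struct Foo
{
    int a = 3;
    void describe(A)(ref A archive) { archive.describe(a); }
}

// A type we cannot (or do not want to) modify.
struct Point { int x, y; }

// Out-of-struct serialization method for Point (hypothetical name).
void describeExternal(ref LogArchive archive, ref Point p)
{
    archive.describe(p.x);
    archive.describe(p.y);
}

void main()
{
    LogArchive archive;
    Foo foo;
    Point p;
    archive.describe(foo);
    archive.describe(p);
    writeln(archive.log);
}
```

Each `static if` branch is resolved per instantiated type, so a type without either form of `describe` simply falls through to the primitive branch rather than causing a compile error.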

Documentation is still rather sparse. This short example shows the basic usage:

---
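// Assumes Serializer and FileMode from serializer.d have been imported.
struct Foo
{
  int a = 3;

  // Used for both writing and reading: list every member to serialize.
  void describe(T)(T archive)
  {
    archive.describe(a);
  }
}

void main()
{
  real bar = 3.141;
  Foo foo;

  // write data
  Serializer s = new Serializer("testfile", FileMode.Out);
  s.describe(bar);
  s.describe(foo);
  delete s; // destroy the writer before the file is opened for reading

  // read data: the same describe calls, in the same order
  s = new Serializer("testfile", FileMode.In);
  s.describe(bar);
  s.describe(foo);
}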
---

See the unittests in serializer.d for other details. Most of the logic is in basicarchive.d. Docs definitely need work.

Since FreeUniverse was its first real user, it is currently maintained in the FreeUniverse svn. However, if other people are interested, I will request a separate project for it on dsource.

Comments and improvements are of course welcome.

Best Regards,
Christian Kamm
November 09, 2006
Christian Kamm wrote:
> Based on initial work from Tom S and clayasaurus, I've written this serialization library. I hope something like this doesn't already exist!

Great! Can you do some of the suggestions on http://www.digitalmars.com/d/howto-promote.html?
November 09, 2006
Christian Kamm wrote:
> Based on initial work from Tom S and clayasaurus, I've written this serialization library. I hope something like this doesn't already exist!

Great!

> 
> http://www.math.tu-berlin.de/~kamm/d/serialization.zip
> 
> Currently, it only provides binary file io through the Serializer class. It can
> - write/read almost (hopefully) every type through a call to Serializer.describe
> - track class references and pointers by default
> - serialize classes and structs through a templated 'describe' member function
> - write derived classes from base class reference*
> - read derived classes into base class reference*
> - serialize not default constructible classes*
> 
> (* for this to work, the class needs to be registered with the archive type)
> 
> It has far fewer features than boost::serialization but is already in a very usable state: FreeUniverse, a D game based on the Arc library, uses it for writing and loading savegames as well as other persistent state information.

I'm using Boost::serialization but I'm not at all happy with it.  But the things that I don't like mostly have to do with versioning, which it looks like you don't support anyway.

> What it does not do/is missing:
> - exception safety / multithread safety
> - out-of-class/struct serialization methods (is it possible to check whether a specific overload exists at compile time?)

I could be mistaken, but I think this is the ADL / Koenig lookup territory that Walter doesn't want to go into.

> - static arrays need to be serialized with describe_staticarray (static arrays can't be inout, so the general-purpose template method doesn't work... is there a way around the problem?)
> - things I forgot right now

Endian issues?

> 
> Documentation is still rather sparse. This short example shows the basic usage


Just a wish list item, but I'd prefer an actual "file format" library as opposed to a serialization library.  Maybe a file format library would build on top of the serialization library, but anyway, the key difference is that a serialization lib aims to turn *particular* data structures into a binary format that can be losslessly loaded back into the same data structure later.

But that is not the way people design generic file formats, like say the Photoshop file format.  Things like that need to be very extensible and shouldn't be tied to particular data structures.  I think that's where boost::serialization gets into trouble.  Once you start talking about versioning, you're no longer talking about one specific data structure.

For instance Boost::serialization lacks a way to ignore blocks or skip chunks of data that are not recognized or obsolete.  You actually have to load the obsolete thing into the proper (possibly obsolete) data structure and then delete the unnecessary thing you just created.  This is not good from the forwards/backwards compatibility view.  Old code simply cannot read the file (even if it understands the majority of the chunks that matter), and new code is forced to maintain old data structures just for the purpose of loading up obsolete data and throwing it away.

How do you fix it?  Very simple really.  Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk.  If you get a chunk header with a tag you don't understand, just ignore it.  A particular chunk can have sub-chunks too.  I think it's similar in many ways to a grammar definition:

  file:
    header chunklist

  chunklist:
    chunk
    chunk chunklist

  header:
    typeIndicator versionNumber DataEndianness

  chunk:
    chunkHeader data

  chunkHeader:
    chunkType DataLength

  data:
    // Here's where you list all the types of data known to you

Or something like that.
I'd like a library that helps me read and write my data in that sort of data-structure independent format.
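
A minimal sketch of the chunk scheme described above, written in current D. It assumes a 4-byte tag and a 4-byte big-endian length as the fixed-size chunk header; the file header and sub-chunks are left out. The names `Chunk`, `writeChunk` and `readKnownChunks` and the tags are made up for illustration.

```d
import std.algorithm.searching : canFind;
import std.bitmanip : bigEndianToNative, nativeToBigEndian;
import std.stdio : writeln;

struct Chunk
{
    string tag;          // 4-character chunk type
    const(ubyte)[] data; // payload, without the header
}

// Append one chunk: 4-byte tag, 4-byte big-endian length, then the payload.
void writeChunk(ref ubyte[] buf, string tag, const(ubyte)[] payload)
{
    assert(tag.length == 4, "tags are exactly four bytes in this sketch");
    ubyte[4] len = nativeToBigEndian(cast(uint) payload.length);
    buf ~= cast(const(ubyte)[]) tag;
    buf ~= len[];
    buf ~= payload;
}

// Walk the chunk list and keep only chunks whose tag is known.
// Unknown chunks are skipped using the length stored in their header.
Chunk[] readKnownChunks(const(ubyte)[] buf, string[] knownTags)
{
    Chunk[] result;
    while (buf.length >= 8)
    {
        string tag = cast(string) buf[0 .. 4].idup;
        ubyte[4] lenBytes = buf[4 .. 8];
        size_t len = bigEndianToNative!uint(lenBytes);
        buf = buf[8 .. $];
        if (len > buf.length)
            break; // truncated file: stop instead of reading past the end
        if (knownTags.canFind(tag))
            result ~= Chunk(tag, buf[0 .. len]);
        buf = buf[len .. $];
    }
    return result;
}

void main()
{
    ubyte[] hull = [42];
    ubyte[] extra = [1, 2, 3];
    ubyte[] file;
    writeChunk(file, "NAME", cast(const(ubyte)[]) "ship");
    writeChunk(file, "XTRA", extra); // a chunk type the reader does not know
    writeChunk(file, "HULL", hull);

    foreach (c; readKnownChunks(file, ["NAME", "HULL"]))
        writeln(c.tag, ": ", c.data.length, " byte(s)");
}
```

The reader never needs to understand the payload of "XTRA" to get past it, which is what lets old code read files written by newer code.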

--bb
November 09, 2006
Bill Baxter wrote:
> How do you fix it?  Very simple really.  Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk.  If you get a chunk header with a tag you don't understand, just ignore it.  A particular chunk can have sub-chunks too.

Evolution of a file format:

1.0: Just spew the struct contents out into a file using something like fwrite().

2.0: Oops! Need to update 1.0 and retain backwards compatibility. Solution: 2.0 files put out 'illegal' values into the 1.0 format to signal it's a 2.0 file.

3.0: Doh! Find another set of illegal 2.0 values. This time, get smarter and have another field with a version number in it.

4.0: Get smart and implement your suggestion, so you can have both backwards and *forwards* compatibility.

Think I'm joking? Just look at a few! Everyone learns this the hard way.

Me, if practical, I like file formats to be in ascii so I can examine them easily to see if they're working right.
November 09, 2006
Walter Bright wrote:
> Bill Baxter wrote:
>> How do you fix it?  Very simple really.  Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk.  If you get a chunk header with a tag you don't understand, just ignore it.  A particular chunk can have sub-chunks too.
> 
> Evolution of a file format:
> 
> 1.0: Just spew the struct contents out into a file using something like fwrite().
> 
> 2.0: Oops! Need to update 1.0 and retain backwards compatibility. Solution: 2.0 files put out 'illegal' values into the 1.0 format to signal it's a 2.0 file.
> 
> 3.0: Doh! Find another set of illegal 2.0 values. This time, get smarter and have another field with a version number in it.
> 
> 4.0: Get smart and implement your suggestion, so you can have both backwards and *forwards* compatibility.
> 
> Think I'm joking? Just look at a few! Everyone learns this the hard way.

I guess I'm no exception.  ;-)  I've been through the 4 step program a few times myself.

> Me, if practical, I like file formats to be in ascii so I can examine them easily to see if they're working right.

That is one thing I do like about boost::serialization.  With basically one line of code I can switch between xml serialization and binary serialization.  Only thing I didn't like was I couldn't figure out how to keep some things binary.

--bb
November 09, 2006
>> What it does not do/is missing:
>
> Endian issues?

Oh, indeed. It does not take care of them yet. Additionally, classes are (if unregistered) identified by their mangled name, which might vary between compilers, I think.
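
One possible way to deal with the endian issue, sketched with Phobos' `std.bitmanip` from current D rather than anything in the library: always store integers in one fixed byte order and convert on the way in and out. `putInt` and `takeInt` are hypothetical helper names, and little-endian is an arbitrary choice here.

```d
import std.bitmanip : littleEndianToNative, nativeToLittleEndian;
import std.stdio : writefln;

// Append a 32-bit integer in little-endian order, whatever the host uses.
void putInt(ref ubyte[] buf, int value)
{
    ubyte[4] bytes = nativeToLittleEndian(value);
    buf ~= bytes[];
}

// Remove a 32-bit little-endian integer from the front of the buffer.
int takeInt(ref const(ubyte)[] buf)
{
    ubyte[4] bytes = buf[0 .. 4];
    buf = buf[4 .. $];
    return littleEndianToNative!int(bytes);
}

void main()
{
    ubyte[] file;
    putInt(file, 0x01020304);

    // The stored layout is the same on every platform.
    writefln("%(%02x %)", file); // prints: 04 03 02 01

    const(ubyte)[] input = file;
    int value = takeInt(input);
    assert(value == 0x01020304);
    assert(input.length == 0);
}
```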

> Just a wish list item, but I'd prefer an actual "file format" library as opposed to a serialization library.  ...

I agree that it is not very well suited for writing/reading user data or files with a long life-expectancy. It is very nice for temporarily swapping data to disk and similar tasks, where the same process reads back the data it wrote to a file earlier.

A full-fledged "file format" library, while being something very useful I'd love to see as well, would be a project for another day though.

Christian
November 09, 2006
> Great! Can you do some of the suggestions on http://www.digitalmars.com/d/howto-promote.html?

Sure, I wanted to wait until I got some responses and feedback from the community though.
November 09, 2006
> How do you fix it?  Very simple really.  Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk.  If you get a chunk header with a tag you don't understand, just ignore it.  A particular chunk can have sub-chunks too.  I think it's similar in many ways to a grammar definition:

Check out
http://www.math.tu-berlin.de/~kamm/d/serializationchunk.zip

The chunk.d file contains a hackish implementation of your chunk idea: when reading, it discards any chunk parts it doesn't understand. Once it gets to one it can process, it discards any other, older-versioned chunks of the same type. When writing, it is possible to write legacy chunks for older versions.

To test it, you need to compile with -version=V1_SER for the 1.0 version of the program and with -version=V1_SER -version=V2_SER for the 2.0 version of the program. Try running the v2 version, copy data_out to data_in and run the v1 version. (sorry for the complicated instructions, it's just a hack!)

Is this, approximately, what you had in mind? Personally, I'm not sure about all those classes required and how it would look in a larger project: maybe writing a version number and then having the user write a switch statement for it would have been ok too.
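
For comparison, the simpler alternative mentioned above (a version number followed by a user-written switch) might look roughly like this in current D. This is a sketch only and not what chunk.d does; `Ship`, `save` and `load` are invented for the example.

```d
import std.bitmanip : littleEndianToNative, nativeToLittleEndian;
import std.conv : to;

// Hypothetical savegame record; 'shield' was added in format version 2.
struct Ship
{
    int hull;
    int shield;
}

void putInt(ref ubyte[] buf, int value)
{
    ubyte[4] bytes = nativeToLittleEndian(value);
    buf ~= bytes[];
}

int takeInt(ref const(ubyte)[] buf)
{
    ubyte[4] bytes = buf[0 .. 4];
    buf = buf[4 .. $];
    return littleEndianToNative!int(bytes);
}

// Write the record in the requested format version, version number first.
ubyte[] save(Ship s, int formatVersion)
{
    ubyte[] buf;
    putInt(buf, formatVersion);
    putInt(buf, s.hull);
    if (formatVersion >= 2)
        putInt(buf, s.shield);
    return buf;
}

// Read the version number, then dispatch on it.
Ship load(const(ubyte)[] buf)
{
    Ship s;
    int formatVersion = takeInt(buf);
    switch (formatVersion)
    {
        case 1:
            s.hull = takeInt(buf);
            s.shield = 0; // field did not exist yet: pick a default
            break;
        case 2:
            s.hull = takeInt(buf);
            s.shield = takeInt(buf);
            break;
        default:
            throw new Exception("unsupported format version "
                ~ formatVersion.to!string);
    }
    return s;
}

void main()
{
    auto ship = Ship(100, 25);
    assert(load(save(ship, 2)) == ship);
    assert(load(save(ship, 1)) == Ship(100, 0));
}
```

Note the trade-off: the switch gives backwards compatibility only. A program that knows versions 1 and 2 has to reject a version 3 file outright, whereas the chunk approach lets it skip the parts it does not understand.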

Christian
November 09, 2006
Bill Baxter wrote:
<snip>
> How do you fix it?  Very simple really.  Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk.  If you get a chunk header with a tag you don't understand, just ignore it.  A particular chunk can have sub-chunks too.  I think it's similar in many ways to a grammar definition:
> 
>   file:
>     header chunklist
> 
>   chunklist:
>     chunk
>     chunk chunklist
> 
>   header:
>     typeIndicator versionNumber DataEndianness
> 
>   chunk:
>     chunkHeader data
> 
>   chunkHeader:
>     chunkType DataLength
> 
>   data:
>     // Here's where you list all the types of data known to you
> 
> Or something like that.
> I'd like a library that helps me read and write my data in that sort of data-structure independent format.
> 
What you want is a lib for reading and writing EA IFF-85 compatible files?

I actually have some code for D doing this around somewhere. Written to be able to read IFF graphics files and Lightwave 3D objects.

I shall dig up the code, clean it up into a presentable shape, and make it public.
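
For readers who have not met the format: an IFF chunk is a four-character ID, a 32-bit big-endian size, the data, and one pad byte if the size is odd; a FORM chunk starts its data with a four-character form type followed by nested chunks. The following is a rough sketch of that layout in current D, not the code mentioned above. The helper names and the "DEMO", "NAME" and "BODY" IDs are made up, and the size is treated as unsigned for simplicity.

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian;
import std.stdio : writeln;

struct IffChunk
{
    string id;           // four-character chunk ID, e.g. "FORM"
    const(ubyte)[] data; // chunk contents, excluding any pad byte
}

// Build one chunk: ID, 32-bit big-endian size, data, pad byte if size is odd.
ubyte[] makeChunk(string id, const(ubyte)[] data)
{
    assert(id.length == 4);
    ubyte[4] size = nativeToBigEndian(cast(uint) data.length);
    ubyte[] bytes;
    bytes ~= cast(const(ubyte)[]) id;
    bytes ~= size[];
    bytes ~= data;
    if (data.length & 1)
        bytes ~= cast(ubyte) 0; // pad to an even offset; not counted in size
    return bytes;
}

// Split a buffer into the chunks it contains, one nesting level at a time.
IffChunk[] splitChunks(const(ubyte)[] buf)
{
    IffChunk[] result;
    while (buf.length >= 8)
    {
        string id = cast(string) buf[0 .. 4].idup;
        ubyte[4] sizeBytes = buf[4 .. 8];
        size_t size = bigEndianToNative!uint(sizeBytes);
        buf = buf[8 .. $];
        if (size > buf.length)
            break; // truncated input
        result ~= IffChunk(id, buf[0 .. size]);
        size_t step = size + (size & 1); // skip the pad byte as well
        buf = buf[(step < buf.length ? step : buf.length) .. $];
    }
    return result;
}

void main()
{
    ubyte[] name = [1, 2, 3]; // odd size, so a pad byte follows
    ubyte[] pixels = [9, 9];
    ubyte[] content;
    content ~= cast(const(ubyte)[]) "DEMO"; // form type
    content ~= makeChunk("NAME", name);
    content ~= makeChunk("BODY", pixels);
    ubyte[] file = makeChunk("FORM", content);

    auto top = splitChunks(file);
    writeln(top[0].id, " ", cast(string) top[0].data[0 .. 4].idup);
    foreach (c; splitChunks(top[0].data[4 .. $]))
        writeln("  ", c.id, " ", c.data.length);
}
```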


// Fredrik Olsson

> --bb
November 09, 2006
Fredrik Olsson wrote:
> Bill Baxter wrote:
> <snip>

>>
> What you want is a lib for reading and writing EA IFF-85 compatible files?

I've never heard of EA IFF-85, but a brief skim of the description here:
http://www.newtek.com/lightwave/developer/LW80/8lwsdk/docs/filefmts/eaiff85.html
sounds good.

Is it something 3D-graphics specific though?  Electronic Arts created the standard, and the website I found above is for a 3D modeling package...  But it looks right.

But the truth is I don't know what I want exactly in terms of API.  I just want something that makes it easy to take my data structures -> extract the data into something generic and ESPECIALLY not intrinsically tied to the types in my program -> save it to disk -> load it back into whatever data structures I choose later.  It's ok if it's a little more painful than   MyData.serialize(archive); MyData.load(archive);  as long as it achieves the goal.

With boost::serialization I've ended up having to write upgrader programs a few times over the course of development.  It's always a pain because boost::serialization wants to be smart, so what I end up doing is taking an old version of my data structures' header file, wrapping it in an "oldversion" namespace, then loading via oldversion::type and saving via newversion::type.

> I actually have some code for D doing this around somewhere. Written to be able to read IFF graphics files and Lightwave 3D objects.

Oh, right, Lightwave.  So I guess it's not a coincidence that a Google search for EA IFF-85 turned up NewTek's page.

> I shall dig up the code, clean it up in a presentable shape, and make it public.

Cool, can you explain how the API works a little?  I guess I can imagine that loading such a file is not so different from loading an XML file. So like XML parsing there are a few ways to do it.


--bb