November 10, 2006
Bill Baxter wrote:
> Fredrik Olsson wrote:
>> Bill Baxter skrev:
>> <snip>
> 
>>>
>> What you want is a lib for reading and writing EA IFF-85 compatible files?
> 
> I've never heard of EA IFF-85, but a brief skim of the description here:
> http://www.newtek.com/lightwave/developer/LW80/8lwsdk/docs/filefmts/eaiff85.html 
> 
> sounds good.
> 
> Is it something 3D-graphics specific though?  Electronic Arts created the standard, and the website I found above is for a 3D modeling package...  But it looks right.
> 
No, it is not limited to any specific kind of file. EA made it as an attempt at a general file format structure for any use. Basically it just takes care of bundling chunks, marking them as required or optional. And byte-order independence! IFF as such has been used for images (ILBM, often just called IFF), audio (AIFF), 3D objects (Lightwave LWO), and lots more.

When Microsoft created RIFF (the container behind WAV and AVI) they more or less lifted the EA IFF design, but changed the byte order to little-endian.

EA IFF is a low level format. How to actually interpret the data that is contained is up to each application.
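
To make the chunk idea concrete, here is a minimal sketch in Python (not D, and not the code mentioned in this thread). It assumes only the basic EA IFF 85 layout: a 4-byte chunk ID, a 32-bit big-endian length, the data, and a pad byte when the length is odd. The helper names and the "DEMO" form type are made up for illustration.

```python
import struct

def pack_chunk(chunk_id: bytes, data: bytes) -> bytes:
    """Build one chunk: 4-byte ID, big-endian 32-bit length, data, pad byte."""
    assert len(chunk_id) == 4
    pad = b"\x00" if len(data) % 2 else b""  # chunks start on even offsets
    return chunk_id + struct.pack(">I", len(data)) + data + pad

def iter_chunks(buf: bytes):
    """Yield (id, data) for each chunk in buf without interpreting the data."""
    pos = 0
    while pos + 8 <= len(buf):
        chunk_id = buf[pos:pos + 4]
        (size,) = struct.unpack(">I", buf[pos + 4:pos + 8])
        yield chunk_id, buf[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # step over the pad byte, if any

# A FORM chunk holds a 4-byte form type followed by nested chunks.
body = (pack_chunk(b"NAME", b"abc")
        + pack_chunk(b"XYZW", b"??")
        + pack_chunk(b"CTAB", struct.pack(">i", 42)))
form = pack_chunk(b"FORM", b"DEMO" + body)

outer = list(iter_chunks(form))
inner = list(iter_chunks(outer[0][1][4:]))  # skip the form type
ids = [chunk_id for chunk_id, _ in inner]
ctab_value = struct.unpack(">i", dict(inner)[b"CTAB"])[0]
```

The walker never needs to know what "XYZW" means; the length field alone is enough to step over it. What the bytes inside a chunk mean stays up to the application.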


> But the truth is I don't know what I want exactly in terms of API.  I just want something that makes it easy to take my data structures -> extract the data into something generic and ESPECIALLY not intrinsically tied to the types in my program -> save it to disk -> load it back into whatever data structures I choose later.  It's ok if it's a little more painful than   MyData.serialize(archive); MyData.load(archive);  as long as it achieves the goal.
> 
> With Boost::serialization I've ended up having to write upgrader programs a few times over the course of development.  It's always a pain because boost::serialization wants to be smart, so what I end up doing is taking an old version of my data structures header file, wrapping it in an "oldversion" namespace, then loading via oldversion::type and saving via newversion::type.
> 
>> I actually have some code for D doing this around somewhere. Written to be able to read IFF graphics files and Lightwave 3D objects.
> 
> Oh, right, Lightwave.  So I guess it's not a coincidence that googling for EA IFF-85 turned up NewTek's page.
> 
EA IFF 85 was a joint venture of Electronic Arts and Commodore, for creating a universal file format for the Amiga. Lightwave 3D is an old Amiga application, so them using IFF as a base is kind of natural.

>> I shall dig up the code, clean it up in a presentable shape, and make it public.
> 
> Cool, can you explain how the API works a little?  I guess I can imagine that loading such a file is not so different from loading an XML file. So like XML parsing there are a few ways to do it.
> 
IFF is not quite as flexible as XML, much more flat. So the API is very simple.
My current implementation wraps a Stream instance and implements simple methods such as:
foo.seekNextChunk("CTAB");
auto bar = foo.readInt();
Etc., just for working with the basics as defined by EA IFF 85. NewTek, the creators of Lightwave 3D, have made some additions that would be nice to have as well.

But it was one of my first attempts at D, so I will rewrite it. Just how is something I shall think about. Having the chunks as independent instances over a seekable stream might be a good idea.
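
For readers who want to picture such an API, here is a rough sketch in Python rather than D. It is not the implementation described above: the class `ChunkReader` and its methods `seek_next_chunk` and `read_int` are hypothetical stand-ins modelled on the two calls shown, and it assumes flat chunks with big-endian lengths and even-byte padding.

```python
import io
import struct

class ChunkReader:
    """Hypothetical reader over a seekable stream of flat IFF-style chunks."""

    def __init__(self, stream):
        self.stream = stream
        self.chunk_end = stream.tell()  # offset just past the current chunk

    def seek_next_chunk(self, wanted: bytes) -> bool:
        """Position the stream at the data of the next chunk with ID `wanted`."""
        self.stream.seek(self.chunk_end)
        while True:
            header = self.stream.read(8)
            if len(header) < 8:
                return False  # ran out of chunks
            chunk_id, size = struct.unpack(">4sI", header)
            self.chunk_end = self.stream.tell() + size + (size & 1)
            if chunk_id == wanted:
                return True
            self.stream.seek(self.chunk_end)  # not ours: skip its data

    def read_int(self) -> int:
        """Read a big-endian signed 32-bit integer from the current chunk."""
        (value,) = struct.unpack(">i", self.stream.read(4))
        return value

raw = (b"JUNK" + struct.pack(">I", 3) + b"abc\x00"
       + b"CTAB" + struct.pack(">Ii", 4, 7))
foo = ChunkReader(io.BytesIO(raw))
found = foo.seek_next_chunk(b"CTAB")
bar = foo.read_int()
again = foo.seek_next_chunk(b"CTAB")
```

Because the reader remembers where the current chunk ends, a caller that reads only part of a chunk still lands on the next header afterwards.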

// Fredrik Olsson

> 
> --bb
November 10, 2006
Bill Baxter wrote:
> Walter Bright wrote:
> 
>> Bill Baxter wrote:
>>
>>> How do you fix it?  Very simple really.  Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk.  If you get a chunk header with a tag you don't understand, just ignore it.  A particular chunk can have sub-chunks too.
>>
>>
>> Evolution of a file format:
>>
>> 1.0: Just spew the struct contents out into a file using something like fwrite().
>>
>> 2.0: Oops! Need to update 1.0 and retain backwards compatibility. Solution: 2.0 files put out 'illegal' values into the 1.0 format to signal it's a 2.0 file.
>>
>> 3.0: Doh! Find another set of illegal 2.0 values. This time, get smarter and have another field with a version number in it.
>>
>> 4.0: Get smart and implement your suggestion, so you can have both backwards and *forwards* compatibility.
>>
>> Think I'm joking? Just look at a few! Everyone learns this the hard way.
> 
> 
> I guess I'm no exception.  ;-)  I've been through the 4 step program a few times myself.
> 
>> Me, if practical, I like file formats to be in ascii so I can examine them easily to see if they're working right.

Heh, something Microsoft is only now trying to learn, and Unix guys knew right from the start. Even most of the communications protocols are text-based.

> That is one thing I do like about boost::serialization.  With basically one line of code I can switch between xml serialization and binary serialization.  Only thing I didn't like was I couldn't figure out how to keep some things binary.

With a text file, you can tell what it is, even when the file has got misplaced or renamed, but with a binary it's pretty hopeless.

And of course you might look at a few languages ( ~ fileformats) especially made for serializing.

YAML looks very clean, and is easily readable by humans (XML is not)
JSON looks like ECMAScript

The following page, although only vaguely related, gives an excellent intro to the ideology, at the center:

http://mike.teczno.com/json.html

With these, you'll be right where Walter was talking about.
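
As a small illustration of the text-format route, here is a sketch in Python using its standard `json` module. The "mesh" document layout, the field names, and both helper functions are invented for this example. The point is only that the file carries generic values plus a version field, and the loader ignores keys it does not know.

```python
import json

def save_mesh(name, vertices):
    """Serialize to plain JSON values, not tied to any program type."""
    doc = {"format": "mesh", "version": 2, "name": name, "vertices": vertices}
    return json.dumps(doc, indent=2)

def load_mesh(text):
    """Load what we understand; unknown keys are simply ignored."""
    doc = json.loads(text)
    if doc.get("format") != "mesh":
        raise ValueError("not a mesh document")
    name = doc.get("name", "unnamed")
    vertices = [tuple(v) for v in doc.get("vertices", [])]
    return name, vertices

text = save_mesh("tri", [[0, 0], [1, 0], [0, 1]])
name, vertices = load_mesh(text)

# A file from a hypothetical newer writer with an extra field still loads.
newer = '{"format": "mesh", "version": 3, "name": "q", "vertices": [], "color": "red"}'
newer_name, newer_vertices = load_mesh(newer)
```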
November 11, 2006
Georg Wrede wrote:
> Bill Baxter wrote:
> 
>> That is one thing I do like about boost::serialization.  With basically one line of code I can switch between xml serialization and binary serialization.  Only thing I didn't like was I couldn't figure out how to keep some things binary.
> 
> 
> With a text file, you can tell what it is, even when the file has got misplaced or renamed, but with a binary it's pretty hopeless.

Having all the structure in ASCII is great, and maybe everything in ASCII while you're debugging, but some things just don't work well as ascii -- images, videos, audio files, 3D meshes, etc.  It makes sense to have the structure annotated in ascii, but when it comes to storing raw image data there's not much to be gained from storing that as a giant ASCII string.  With the boost::serialization's XML I wanted to be able to store that image as something like

<image width=1024 height=768 format=RGBA type="float">
  [big hunk o raw binary image data]
</image>

But I couldn't find any way to do that.

> And of course you might look at a few languages ( ~ fileformats) especially made for serializing.
> 
> YAML looks very clean, and is easily readable by humans (XML is not)

I took a look at that one before.  I agree that it would be nice if a more human-friendly alternative to XML caught on.

--bb
November 11, 2006
Bill Baxter wrote:
> Georg Wrede wrote:
> 
>> Bill Baxter wrote:
>>
>>> That is one thing I do like about boost::serialization.  With basically one line of code I can switch between xml serialization and binary serialization.  Only thing I didn't like was I couldn't figure out how to keep some things binary.
>>
>> With a text file, you can tell what it is, even when the file has got misplaced or renamed, but with a binary it's pretty hopeless.
> 
> Having all the structure in ASCII is great, and maybe everything in ASCII while you're debugging, but some things just don't work well as ascii -- images, videos, audio files, 3D meshes, etc.  It makes sense to have the structure annotated in ascii, but when it comes to storing raw image data there's not much to be gained from storing that as a giant ASCII string.  With the boost::serialization's XML I wanted to be able to store that image as something like
> 
> <image width=1024 height=768 format=RGBA type="float">
>   [big hunk o raw binary image data]
> </image>
> 
> But I couldn't find any way to do that.

Probably just as well, since then a normal parser could not handle it.

Or if you didn't care, you could invent your own tag type, like

<image width=1024 height=768 format=RGBA type="float">
   <binarydata length=2359296>*^%&ÄÖ/Ä%&*^Ä&%Ö/*^(ÄÖ%&*^/Ö(*%^&ÄÖ/*(^ÄÖ&%*/ÄÖ(*^&%ÄÖ/(*^ÄÖ&%*(/ÄÖ%&*^ÄÖ(/*^&%ÄÖ/*^(ÄÖ&%*^/ÄÖ(%Ä/&(*^ÄÖ*^%ÄÖ/&*fffTHIS_REPRESENTS_2MEGS_OF_BINARY_CRAP_RIGHT_HEREfff^#¤ÄÖ%*^&"ÄÖ¤*^%"Ä#Ö¤*^%ÄÖ</binarydata >
</image>

And, as you can see, you'd immediately lose the gains of having the thing as a text file because it gets unwieldy in a text viewer.

You could always serialize into a subdirectory and save the binaries (pictures, etc.) as separate files there. Then the XML would only contain their names. (Not my invention. Java, OpenOffice, and others use this.)

To save space you then zip the whole thing, thus getting your single serialization file, as originally wanted.

---

This can be very simple in the program. I remember seeing somewhere a library that made a zip file on disk look to the program like a subdirectory tree. Thus the zipping, creation of directories and other chores become transparent to the programmer.

Or you could just use the Phobos zip without a tree.
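
A rough sketch of the XML-plus-binaries-in-a-zip idea, written in Python with its standard `zipfile` module rather than the Phobos one. The entry names `manifest.xml` and `data/image.raw`, the `src` attribute, and both functions are assumptions made up for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def save_bundle(pixels: bytes, width: int, height: int) -> bytes:
    """Write an XML manifest plus the raw pixels into one zip archive."""
    root = ET.Element("image", width=str(width), height=str(height),
                      src="data/image.raw")  # the XML holds only the name
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.xml", ET.tostring(root))
        zf.writestr("data/image.raw", pixels)
    return buf.getvalue()

def load_bundle(blob: bytes):
    """Read the manifest, then fetch the binary entry it points at."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        root = ET.fromstring(zf.read("manifest.xml"))
        pixels = zf.read(root.get("src"))
    return int(root.get("width")), int(root.get("height")), pixels

blob = save_bundle(bytes(range(12)), 2, 2)
width, height, pixels = load_bundle(blob)
```

The manifest stays readable in any text viewer after unzipping, and the binary data is never pushed through an XML parser.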
November 11, 2006
Bill Baxter wrote:
> Georg Wrede wrote:
>> Bill Baxter wrote:
>>
>>> That is one thing I do like about boost::serialization.  With basically one line of code I can switch between xml serialization and binary serialization.  Only thing I didn't like was I couldn't figure out how to keep some things binary.
>>
>>
>> With a text file, you can tell what it is, even when the file has got misplaced or renamed, but with a binary it's pretty hopeless.
> 
> Having all the structure in ASCII is great, and maybe everything in ASCII while you're debugging, but some things just don't work well as ascii -- images, videos, audio files, 3D meshes, etc.  It makes sense to have the structure annotated in ascii, but when it comes to storing raw image data there's not much to be gained from storing that as a giant ASCII string.  With the boost::serialization's XML I wanted to be able to store that image as something like
> 
> <image width=1024 height=768 format=RGBA type="float">
>   [big hunk o raw binary image data]
> </image>
> 
> But I couldn't find any way to do that.

Base64 encoding :-p
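
A minimal sketch of what that looks like, in Python with the standard `base64` and `xml.etree.ElementTree` modules. The `encoding` attribute and the function names are made up for the example. Base64 produces 4 text characters for every 3 bytes, so the payload grows by about a third.

```python
import base64
import xml.etree.ElementTree as ET

def image_to_xml(pixels: bytes, width: int, height: int) -> str:
    """Store raw bytes as base64 text so the result stays well-formed XML."""
    elem = ET.Element("image", width=str(width), height=str(height),
                      encoding="base64")
    elem.text = base64.b64encode(pixels).decode("ascii")
    return ET.tostring(elem, encoding="unicode")

def image_from_xml(text: str) -> bytes:
    """Recover the original bytes from the element text."""
    elem = ET.fromstring(text)
    return base64.b64decode(elem.text or "")

raw_pixels = bytes([0, 255, 60, 62, 38, 0])  # includes '<', '>' and '&' bytes
xml_text = image_to_xml(raw_pixels, 3, 2)
round_trip = image_from_xml(xml_text)
```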


Sean
January 06, 2007
On Thu, 09 Nov 2006 02:06:21 +0100, Bill Baxter <dnewsgroup@billbaxter.com> wrote:

> Christian Kamm wrote:
>> Based on initial work from Tom S and clayasaurus, I've written this serialization library. I hope something like this doesn't already exist!
>
> Great!
>
>>  http://www.math.tu-berlin.de/~kamm/d/serialization.zip
>>  Currently, it only provides binary file io through the Serializer class. It can
>> - write/read almost (hopefully) every type through a call to Serializer.describe
>> - track class references and pointers by default
>> - serialize classes and structs through a templated 'describe' member function
>> - write derived classes from base class reference*
>> - read derived classes into base class reference*
>> - serialize not default constructible classes*
>>  (* for this to work, the class needs to be registered with the archive type)
>>  It has far fewer features than boost::serialization but is already in a very usable state: FreeUniverse, a D game based on the Arc library, uses it for writing and loading savegames as well as other persistent state information.
>
> I'm using Boost::serialization but I'm not at all happy with it.  But the things that I don't like mostly have to do with versioning, which it looks like you don't support anyway.
>
>> What it does not do/is missing:
>> - exception safety / multithread safety
>> - out-of-class/struct serialization methods (is it possible to check whether a specific overload exists at compile time?)
>
> I could be mistaken but I think this is that ADL / Koenig Lookup territory that Walter doesn't want to go into.
>
>> - static arrays need to be serialized with describe_staticarray (static arrays can't be inout, so the general-purpose template method doesn't work... is there a way around the problem?)
>> - things I forgot right now
>
> Endian issues?
>
>>  Documentation is still rather sparse. This short example shows the basic usage
>
>
> Just a wish list item, but I'd prefer an actual "file format" library as opposed to a serialization library.  Maybe a file format library would build on top of the serialization library, but anyway, the key difference is that a serialization lib aims to turn *particular* data structures into a binary format that can be losslessly loaded back into the same data structure later.
>
> But that is not the way people design generic file formats, like say the Photoshop file format.  Things like that need to be very extensible and shouldn't be tied to particular data structures.  I think that's where boost::serialization gets into trouble.  Once you start talking about versioning, you're no longer talking about one specific data structure.
>
> For instance Boost::serialization lacks a way to ignore blocks or skip chunks of data that are not recognized or obsolete.  You actually have to load the obsolete thing into the proper (possibly obsolete) data structure and then delete the unnecessary thing you just created.  This is not good from the forwards/backwards compatibility view.  Old code simply cannot read the file (even if it understands the majority of the chunks that matter), and new code is forced to maintain old data structures just for the purpose of loading up obsolete data and throwing it away.
>
> How do you fix it?  Very simple really.  Just store the file as a series of chunks with fixed length headers, and each header contains the length of the data in that chunk.  If you get a chunk header with a tag you don't understand, just ignore it.  A particular chunk can have sub-chunks too.  I think it's similar in many ways to a grammar definition:
>
>    file:
>      header chunklist
>
>    chunklist:
>      chunk
>      chunk chunklist
>
>    header:
>      typeIndicator versionNumber DataEndianness
>
>    chunk:
>      chunkHeader data
>
>    chunkHeader:
>      chunkType DataLength
>
>    data:
>      // Here's where you list all the types of data known to you
>
> Or something like that.
> I'd like a library that helps me read and write my data in that sort of data-structure independent format.
>
> --bb
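
The grammar quoted above can be sketched roughly as follows, in Python for brevity. Everything concrete here is an assumption for illustration: the "DEMO" type indicator, the 2-byte version, the 1-byte endianness flag, and the 4-byte tags with 32-bit lengths. The part that matters is the `else` branch, where a reader steps over a tag it does not recognize.

```python
import struct

MAGIC = b"DEMO"  # hypothetical typeIndicator

def write_file(chunks, version=1, little_endian=True):
    """header = typeIndicator, versionNumber, DataEndianness; then chunks."""
    order = "<" if little_endian else ">"
    out = MAGIC + struct.pack(">HB", version, 1 if little_endian else 0)
    for tag, data in chunks:
        # chunkHeader = chunkType, DataLength
        out += tag + struct.pack(order + "I", len(data)) + data
    return out

def read_file(blob, handlers):
    """Decode known chunks; record and skip the ones we do not understand."""
    if blob[:4] != MAGIC:
        raise ValueError("wrong file type")
    version, le_flag = struct.unpack(">HB", blob[4:7])
    order = "<" if le_flag else ">"
    pos, result, skipped = 7, {}, []
    while pos + 8 <= len(blob):
        tag = blob[pos:pos + 4]
        (size,) = struct.unpack(order + "I", blob[pos + 4:pos + 8])
        data = blob[pos + 8:pos + 8 + size]
        if tag in handlers:
            result[tag] = handlers[tag](order, data)
        else:
            skipped.append(tag)  # unknown tag: the length lets us move on
        pos += 8 + size
    return version, result, skipped

blob = write_file([(b"NAME", b"cube"),
                   (b"FUTR", b"\x01\x02\x03"),  # from a "newer" writer
                   (b"SIZE", struct.pack("<I", 8))])
handlers = {
    b"NAME": lambda order, d: d.decode("ascii"),
    b"SIZE": lambda order, d: struct.unpack(order + "I", d)[0],
}
version, result, skipped = read_file(blob, handlers)
```

Old readers keep working on new files because they never need a data structure for "FUTR", and new readers can drop a handler without keeping the obsolete type around.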

Take a look at the HDF file format that is used to serialize huge amounts of scientific data.
It implements a format that is very similar to the one you described.

http://www.hdfgroup.org/

Paulo


-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
December 01, 2009
boost serialization has an object wrapper for binary data called (surprise) binary_object.  In text-based formats it uses base64 encoding. It's in the documentation, and there is also a specific test which shows how to use it.  And it's extremely easy to use.

Robert Ramey