Thread overview
persistence
May 07, 2003  Sean L. Palmer
May 08, 2003  Walter
May 09, 2003  C. Sauls
May 09, 2003  Sean L. Palmer
May 09, 2003  Helmut Leitner
May 09, 2003  Mark Evans
May 30, 2003  anderson
May 30, 2003  Georg Wrede
May 30, 2003  anderson

May 07, 2003
I've been thinking a lot about persistence lately. I saw a few other languages recently that offer this, and realized that a significant portion of my work is bogged down in the details of persisting object state.

One trick that we pull in games-programming land is to save off the entire stack and heap of the program (and OS if necessary) to a big binary file and compress it, then put it on the CD as a save game.  With creative use of setjmp/longjmp or equivalent, this allows one to "suspend" a program indefinitely and restore it at any time.  It's also blazingly fast compared to fiddling around with individual structure members, dealing with endianness and alignment, manipulating file pointers, size conversions, etc, etc ad nauseam.

Some other languages recently allow you to suspend the program to disk at any time and resume it later.

This seems to me to be the ideal form of persistence.

All you need is a way for an object to transform its binary image in memory in such a way that if stored to disk, it could rebuild the object from memory accessible elsewhere in the program dataspace and the object's disk image.

Most objects would be able to store enough information that they could be rebuilt, and fit that into the same amount of bits.

The compiler could obviously help out here by keeping track of where the objects are and what type they are, and providing enough hooks into its low level typeinfo structures that the objects could recouple themselves with other objects that had been saved.  Stuff like transforming object IDs into references on load, and references into object IDs on save.
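
A minimal sketch of that ID/reference swap, written in Python rather than D just to keep it short. The `Node` class and the `save`/`load` functions are hypothetical names, and the sketch assumes every referenced object is itself in the list being saved:

```python
# Hypothetical sketch: swap references for object IDs on save, and back on load.
import json

class Node:
    def __init__(self, name, target=None):
        self.name = name
        self.target = target  # reference to another Node, or None

def save(nodes):
    """Replace each reference with the index (ID) of the object it points to."""
    ids = {id(n): i for i, n in enumerate(nodes)}
    records = [
        {"name": n.name,
         "target": None if n.target is None else ids[id(n.target)]}
        for n in nodes
    ]
    return json.dumps(records)

def load(text):
    """Rebuild all objects first, then turn the stored IDs back into references."""
    records = json.loads(text)
    nodes = [Node(r["name"]) for r in records]
    for node, r in zip(nodes, records):
        if r["target"] is not None:
            node.target = nodes[r["target"]]
    return nodes
```

Loading in two passes (create everything, then reconnect) is what lets cyclic references survive the round trip.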

Then adding a simple command keyword ("yield"? "suspend"?) or a library call would allow us to suspend program execution at any time in a way that allows the program, when run again on the same machine, to pick up exactly where it left off (probably in a function call off the main program loop.)

This is extremely powerful as it allows one to write database-like applications with nary a stream in sight.  No printfs, no fwrites, no sizeof's, no byte order, zilch.

The language could even go so far as to support running the program out of virtual memory.  It asks for memory, if there is none, it saves off some old objects and reclaims their space.  When the old objects are needed again, it (thru some page faulting mechanism) loads them back again.  On Win32 this would be an ideal situation for memory-mapped files.  D programs then wouldn't be bound by the amount of RAM on the target machine, but merely total backing store.  The OS could be paging stuff to network store or who knows where behind that;  potentially limitless storage.  With little to no work on the part of the application programmer.

That's the tricky part.  We applications programmers are lazy as all hell. We get off on finding out ways to *not* write code.  I freely admit it.

Instead we think of ways to do more work with less code.  The less code the better.

Similarly individual objects could be presented to the OS as separate files. You override the extension, you can specify where to put them at time of storage.  You can tell individual objects, go to storage, summon from storage.  Imagine manipulating arrays of objects which may or may not be loaded, and loading/unloading them from their corresponding files.
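
To make the memory-mapped backing store idea a bit more concrete, here is a rough Python sketch (not D). The record layout, file name handling, and function names are all assumptions made for illustration; it only shows fixed-size records being read and written in place through a mapped file:

```python
import mmap
import struct

RECORD = struct.Struct("<ii")  # assumed layout: two little-endian 32-bit ints (x, y)

def create_store(path, count):
    """Create a zero-filled backing file big enough for `count` records."""
    with open(path, "wb") as f:
        f.write(b"\x00" * (RECORD.size * count))

def write_record(path, index, x, y):
    """Map the file and write one record in place; the OS handles the paging."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as view:
        RECORD.pack_into(view, index * RECORD.size, x, y)
        view.flush()

def read_record(path, index):
    """Map the file and read a single record back by its index."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as view:
        return RECORD.unpack_from(view, index * RECORD.size)
```

The program never issues an explicit read or write on the data itself; it just touches memory and leaves the rest to the OS, which is the appeal of the approach.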

You could write code, run functions that generate objects, load files into memory, get everything set up and then, when you're ready to ship, cause a yield function call, zip up the directory, wrap an install program around it and ship it onto a CD along with the language runtime package.

For this to all work we need a couple things to happen:

You can't have objects changing memory layout without versioning problems. So the compiler would have to keep track of generations of object classes. One way to limit this is that each time a program is successfully recompiled, all data from existing running copies gets converted to the new format immediately.  The programmers have to provide these "conversion" functions, but they only have to do it once and then the code can be tossed if you wish, because then all the old objects are in the new format. Obviously this could go wrong so maybe the compiler could keep backups of all the old versions of these conversion functions in case it runs into old files or the programmer has to restore from backup.

Each time the memory structure of an object changes (thru successful recompile) the compiler could analyze the change (was it just moving this data from here to there, or was it something more complicated?) and perhaps auto generate these conversion functions.  It might only prompt for them when necessary.  If you had a good IDE all this could be done as you drag stuff around, with undo and everything.  Arguably this would be the trickiest part of the whole process.  The physical act of keeping track of the bits and moving them to/from disk is almost trivial by comparison to what it would take to make such a system not inherently brittle.  Admittedly I have not given this part of it much perusal yet.
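
One way the chain of conversion functions could be organised, sketched in Python rather than D. The version numbers, field names, and the `CONVERTERS` table are invented for the example; a real system would have the compiler or IDE generate most of this:

```python
# Hypothetical sketch: one conversion function per layout change, applied in order.

def v1_to_v2(rec):
    # v2 split the single "pos" pair into separate "x" and "y" fields.
    x, y = rec["pos"]
    return {"x": x, "y": y}

def v2_to_v3(rec):
    # v3 added a "z" field; old objects get a default value.
    return {**rec, "z": 0}

# Maps a stored version to the function that lifts it one version up.
CONVERTERS = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3

def upgrade(version, rec):
    """Run every conversion between the stored version and the current one."""
    while version < CURRENT_VERSION:
        rec = CONVERTERS[version](rec)
        version += 1
    return rec
```

Keeping every old converter around is what allows a file from any earlier generation to be brought forward, at the cost of never being able to throw that code away.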

Anyway there's also the issue of OS-level handles;  how to reconstruct things such as open files, reload textures from disk, etc.  Then there is the problem of figuring out how many things are taking up way more space in memory than they need to on disk (for instance, is the data *already* on the disk somewhere, and doesn't need to be resaved as a new copy just because it was in RAM when the program suspended).  For this we'd need some declaration helpers;  storage classes perhaps.  const, readonly_file, readwrite_file. Ways to specify the filename extensions or directories when saving.

This technology would be great for debugging too.  Imagine saving off program states in various stages of execution and analyzing them to determine the location of a bug.  Or to help reproduce errors (save right before the crash, then do what it takes to get it to crash).

Does this sound like it would make your life easier?  If so, help me flesh it out.

Sean




May 08, 2003
"Sean L. Palmer" <palmer.sean@verizon.net> wrote in message news:b9a3jf$1eeh$1@digitaldaemon.com...
> One trick that we pull in games-programming land is to save off the entire stack and heap of the program (and OS if necessary) to a big binary file and compress it, then put it on the CD as a save game.  With creative use of setjmp/longjmp or equivalent, this allows one to "suspend" a program indefinitely and restore it at any time.  It's also blazingly fast compared to fiddling around with individual structure members, dealing with endianness and alignment, manipulating file pointers, size conversions, etc, etc ad nauseam.

That trick goes back at least to the 1970's! I used it myself in game programming. It's also how the DMC compiler does precompiled headers.


> Some other languages recently allow you to suspend the program to disk at any time and resume it later.
> This seems to me to be the ideal form of persistence.
> All you need is a way for an object to transform its binary image in memory in such a way that if stored to disk, it could rebuild the object from memory accessible elsewhere in the program dataspace and the object's disk image.

If it can be reloaded at a different address, then what you need is a way to find all the pointers and add an offset.
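
A small illustration of that fix-up step, in Python rather than D. It assumes the saved image comes with a list of byte offsets where pointers live (the `pointer_offsets` argument is hypothetical; producing that list is the part that needs compiler or GC help) and that pointers are 64-bit little-endian:

```python
import struct

PTR = struct.Struct("<Q")  # assumed: one 64-bit little-endian pointer slot

def relocate(image, pointer_offsets, old_base, new_base):
    """Return a copy of `image` with every listed pointer shifted to the new base."""
    delta = new_base - old_base
    out = bytearray(image)
    for off in pointer_offsets:
        (addr,) = PTR.unpack_from(out, off)
        if addr != 0:  # leave null pointers alone
            PTR.pack_into(out, off, addr + delta)
    return bytes(out)
```

Everything that is not in the offset list is treated as plain data and copied through untouched.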

> Does this sound like it would make your life easier?  If so, help me flesh it out.

It sounds complicated!


May 09, 2003
I'm a big fan of this, and as I'm getting into the game-programming world myself I would love to see this feature for precisely that purpose, if not for others.  A server project of mine, for example, could definitely benefit from this as a potential method of "true-state" database backups (small dbs, so the memory footprint is not an issue).

My only questions are how to implement it directly, and whether it should necessarily be a part of the language core.

I could imagine a syntax likened unto:
    import persistance;
    ...
    File statefile = new File("mstate.dat");
    ...
    memstate_dump(statefile);
    ...
    memstate_load(statefile);
    ...

-- C. Sauls


May 09, 2003
Minus all the registration and possible pointer fixups, that's essentially it.

Oh well, like Walter said, it's pretty complicated for something intended for the language core.  How do you support that on an embedded system for instance?

A library would be the ideal place for it, but it'd need serious cooperation from the GC and the OS would have to help, probably.  You'd also definitely need an easy way to register conversion and update functions for when you want to change the memory layout (frequently;  that's why having the IDE watch for it and do conversion functions automatically would be great).

Maybe this is something you'd only enable toward the very end of your project, when object structures aren't changing much.

Also you need to do some kind of setjmp/longjmp type of thing.  Ideally you break the code up into 3 versions:  one to load raw data and build the in-memory working set, one to deal with being started from that point (the "real" app), and one that mainly serves to update files from old to new formats.

The hard part is writing all the functions to save and restore OS state (you will need to recreate hardware state as well, BTW, plus recreate all OS-allocated objects).

It's not trivial.  I guess it was a stupid request.  But other languages seem to have pulled off something similar;  maybe their way is less complicated somehow.  I should check some of that out.

Sean


May 09, 2003

"C. Sauls" wrote:
> 
> I'm a big fan of this, and as I'm getting into the game-programming world myself I would love to see this feature for precisely that purpose, if not for others.  A server project of mine, for example, could definitely benefit from this as a potential method of "true-state" database backups (small dbs, so the memory footprint is not an issue).
> 
> My only thing is, how to implement it directly, and should it necessarily be a part of the language core.
> 
> I could imagine a syntax likened unto:
>     import persistance;
>     ...
>     File statefile = new File("mstate.dat");
>     ...
>     memstate_dump(statefile);
>     ...
>     memstate_load(statefile);

I think

     FileSetMemstate("mstate.dat");
     FileGetMemstate("mstate.dat");

would be better. Except if you want to handle states internally, then

     String s=MemstateRetString();
     MemstateSetString(s);

would be better, allowing trivial

    void FileSetMemstate(String filename) {
      String ms=MemstateRetString();
      FileSetString(filename,ms);
    }

    void FileGetMemstate(String filename) {
      String ms=FileRetString(filename);
      MemstateSetString(ms);
    }

Sorry for throwing my LOP at you. :-)

-- 
Helmut Leitner    leitner@hls.via.at
Graz, Austria   www.hls-software.com


May 09, 2003
There are limits to this proposal but language introspection seems apropos.  A metaclass facility could list all instances of a given class.  A metaclass method could reveal an object's binary layout and serialize it to hex or base64 encoding, for example.

Personally I would use such encoding inside an XML format that is easy to maintain across versions of the program.  I might even use actual string translations instead of encoding, making the XML human-readable, at least for some objects.  Binary formats are too version-brittle.  A language able to emit objects in an XML format might be quite useful.
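
As a rough illustration of the encoded-binary-inside-XML idea, here is a Python sketch (not D). The `<objects>`/`<object>` element names, the `version` attribute, and the two-int layout are all assumptions made up for the example:

```python
import base64
import struct
import xml.etree.ElementTree as ET

LAYOUT = struct.Struct("<ii")  # assumed binary layout: two 32-bit ints (x, y)

def to_xml(points):
    """Emit each (x, y) pair as an <object> element holding base64 of its bytes."""
    root = ET.Element("objects", version="1")
    for x, y in points:
        el = ET.SubElement(root, "object", {"class": "Point"})
        el.text = base64.b64encode(LAYOUT.pack(x, y)).decode("ascii")
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Decode every <object> element back into an (x, y) tuple."""
    root = ET.fromstring(text)
    return [LAYOUT.unpack(base64.b64decode(el.text)) for el in root.iter("object")]
```

The XML wrapper carries the class name and a version number, so a later program can at least tell what it is looking at before deciding how to decode the payload.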

Ultimately no combination of language and library will produce hassle-free persistence.  At minimum you always have OS resources to reconstruct.  Still the more support for it the better, and metaclasses plus XML seem promising to me.

Mark


May 30, 2003
"Sean L. Palmer" <palmer.sean@verizon.net> wrote in message news:b9a3jf$1eeh$1@digitaldaemon.com...
> Does this sound like it would make your life easier?  If so, help me flesh it out.

Yes, that'd cut my code down by 20%. It's such a common task, and serialization methods still require quite a bit of work.  I think it may be possible mainly as a library, but that'd be even more complex than building it into the compiler.



May 30, 2003
"Sean L. Palmer" <palmer.sean@verizon.net> wrote in message news:b9a3jf$1eeh$1@digitaldaemon.com...
> I've been thinking a lot about persistence lately, saw a few other languages recently that offer this, and realized that a significant portion of my work is bogged down in the details of persistence of objects state.

My .02c.

Just an idea to toss around, just in case Walter ever changes his mind on persistent objects in D.

I think that at least some conversion could be generated by the compiler. If the conversion for a member object (or a pointer to an object) can be generated, then the conversion for the parent object can be generated too.

//The serialization methods for these can be generated
class A
{
private(archive): //This is only prototype
    int x;
    int y;
};

class B
{
private(archive):
    B *b;
};

With pointers (I guess this is kindergarten stuff to you guys), the first time an object is referenced it'd be saved to the archive, and subsequently only the reference would be archived.

Things that are connections would need to be wrapped (in the standard lib).

//Connection file
connection class FileC : File //File would be treated more like a friend
{
private(archive):
    char *filename;

private:

    //The serialize method would be like a constructor.
    //default will cause the default variables to save as per normal
    //(statically detects members and builds that code)
    void serialize(archive X):default(X)
    {
        //Do any other data conversions necessary
        switch (X.mode)
        {
        case SAVE:
            //Stuff related to output (filename is done by default(X))
        break;
        case LOAD:
            //Stuff related to input (filename is done by default(X))
        break;
        case OPEN:
            //Stuff to do with opening connections
            open(filename);
        break;
        case CLOSE:
            //Stuff to do with closing connections
            close(filename);
        break;
        }
    }
};


class D : E
{

    FileC A;

    //Serialize this class
    //default will cause the default variables to save as per normal
    void serialize(archive X)
    {
        default(X);
        //Do any other data conversions necessary
        switch (X.mode)
        {
        case SAVE:
            //Stuff related to output
        break;
        case LOAD:
            //Stuff related to input
        break;
        case OPEN:
            //Stuff to do with opening connections
        break;
        case CLOSE:
            //Stuff to do with closing connections
        break;
        }
    }
};

On creation an object would first be created in memory, and then its connections would be opened.

static members should also be serializable (they wouldn't need a method).

Anyway most of that appears to be something that could be in the standard lib except for

connection   //Indicates that the class is a special connection type
default(...) //Method that saves/loads/opens/closes the appropriate information.

Another idea.
default(...) could be part of the standard lib if you could loop through every variable in a class and determine its type information.  The problem here is that variables can be different sizes.

//Has the for each statement been determined for D yet?
foreach(member; theObject)
{
    ...
    if (Class) //Pseudo
    {
        member.serialize(X);
    }
    else if (int) //Pseudo
    {
        X.put(member);
    }
    else //... othertypes
}
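
For comparison, in a language that already has runtime introspection the loop above is short. This is a Python sketch, not D; `Archive`, `serialize`, and the example classes are hypothetical names, and it does not handle cyclic references:

```python
# Hypothetical sketch of a library-level default(): walk the members, dispatch on type.

class Archive:
    def __init__(self):
        self.items = []  # flat list of (member path, value) pairs

    def put(self, name, value):
        self.items.append((name, value))

def serialize(obj, archive, prefix=""):
    """Store ints and strings directly; recurse into members that are objects."""
    for name, member in vars(obj).items():
        path = prefix + name
        if isinstance(member, (int, str)):
            archive.put(path, member)
        elif hasattr(member, "__dict__"):
            serialize(member, archive, path + ".")
        else:
            raise TypeError(f"no default serialization for {path}")

class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Holder:
    def __init__(self):
        self.name = "h"
        self.a = A(1, 2)
```

Calling `serialize(Holder(), Archive())` walks `name`, then descends into `a` and stores `a.x` and `a.y`, which is roughly what a generated default(X) would have to do.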

Anyway, perhaps that'll spark some more ideas.

PS - I brought this topic up a while ago as well.


May 30, 2003
>>
>> This technology would be great for debugging too.  Imagine saving off program states in various stages of execution and analyzing them to determine the location of a bug.  Or to help reproduce errors (save right before the crash, then do what it takes to get it to crash)
>>
>> Does this sound like it would make your life easier?  If so, help me flesh it out.
>
>Yes, that'd cut my code down by 20%. It's such a common task, and serialization methods still require quite a bit of work.  I think it may be possible mainly as a library, but that'd be even more complex than building it into the compiler.

This is standard issue in Unix. You can force your program to save its entire state on disk. Actually, even if you don't, Unix will save it for you if your program crashes hard. (That's why you find all those large files called "core".) Examining this file with standard Unix debug tools then lets you see exactly what was going on at the time it crashed.