Thread overview
Advanced features (for future)
Jan 29, 2003
Ilya Minkov
Jan 29, 2003
Bill Cox
Mar 05, 2003
Ilya Minkov
Mar 06, 2003
Bill Cox
Mar 09, 2003
Walter
Mar 06, 2003
Bill Cox
Mar 09, 2003
Walter
Jan 29, 2003
Burton Radons
Jan 29, 2003
Ilya Minkov
January 29, 2003
Hello.

It would be very good to be able to save classes to disk in a safe manner, so that (maybe only public?) fields can be saved and then read back in, even if a class has been subclassed or expanded (not too hard with the current memory model), or even if the underlying machine is different (hard). But even saving would probably become much harder if powerful data reordering for arrays of classes is implemented.

For this, I think unions are a special problem. A smart union type would have to be introduced (switch?), which would keep track of the active field and thus provide debugging capabilities. BTW, a parsing library and many other uses would profit from such a "switch", which is shorter to write and easier to maintain than a union.
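
A minimal sketch of such a "switch" type, written in Python rather than D. The class name Switch and its set/get/save methods are hypothetical names chosen for illustration, not an existing library or a proposed D syntax. The point it shows is that the value carries the name of its active field, so reading the wrong field fails loudly and saving only needs the active state.

```python
class Switch:
    """A union-like value that remembers which field is active."""

    def __init__(self, **fields):
        # Allowed field names and their types, e.g. number=int, text=str.
        self._types = dict(fields)
        self._active = None   # name of the field currently holding a value
        self._value = None

    def set(self, name, value):
        expected = self._types[name]   # KeyError for an unknown field
        if not isinstance(value, expected):
            raise TypeError(f"{name} expects {expected.__name__}")
        self._active, self._value = name, value

    def get(self, name):
        # Reading an inactive field is an error instead of silent garbage.
        if name != self._active:
            raise ValueError(
                f"field {name!r} is not active (active: {self._active!r})")
        return self._value

    @property
    def active(self):
        return self._active

    def save(self):
        # Only the active field and its value need to be written out.
        return (self._active, self._value)


token = Switch(number=int, text=str)
token.set("number", 42)
print(token.active, token.get("number"))
```

A plain union would let a parser read `text` after storing `number`; here that mistake raises an error at the point of the bad read.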

Another useful thing is ML-style pattern matching, which I have already wished for. I was thinking about a possible implementation, but then I got busy with other things. Yesterday I stumbled over a document describing *exactly this*: a C++ extension for this feature. I have only looked briefly at the document. Maybe their syntax is overwrought, but it might be worth a look anyway.

http://citeseer.nj.nec.com/leung96cbased.html

-i.

January 29, 2003
Hi, Ilya.

Ilya Minkov wrote:
> Hello.
> 
> It would be very good to be able to save classes to disk in a safe manner, so that (maybe only public?) fields can be saved and then read back in, even if a class has been subclassed or expanded (not too hard with the current memory model), or even if the underlying machine is different (hard). But even saving would probably become much harder if powerful data reordering for arrays of classes is implemented.
> 
> For this, I think unions are a special problem. A smart union type would have to be introduced (switch?), which would keep track of the active field and thus provide debugging capabilities. BTW, a parsing library and many other uses would profit from such a "switch", which is shorter to write and easier to maintain than a union.

Some of the code generators we use at work automatically create binary load and save functions.  In the early 90's we used them at QuickLogic, but we ran into difficulties maintaining binary backwards compatibility with our simple binary dumps.  We also found that a simple memory image of binary data structures typically takes up more space than a carefully designed ASCII format (which takes up more than a carefully designed binary format).

As a result, no one has used the binary load/save feature in a decade. It sounds cool. I even wrote code in one of the generators to do it.  It just hasn't been as useful as I thought it would be.

Instead of building functions like binary load/save into the language, I'd recommend providing the hooks for users to do it with code generators.  Even if there's no direct generation capability in the language, there are a few things that could make D work better than C++ does with code generators.  In particular:

- Having a way to split up class definitions into multiple parts.

For example, an 'extend' keyword in front of a class could mean we're adding to an existing class.  This isn't inheritance.  We'd be modifying a class directly rather than creating a new one.

- Do the same thing for modules, functions, variables, and class methods.

It's kind of nice for a code generator to be able to put a few fields here, add a few statements there, and add a couple of functions to an existing module.  For example, the auto-generated recursive destructors we use were hell to write for C.  Every kind of class relationship supported had to be considered in big switch statements to generate all the different parts of the function.  Really ugly.  When targeting a language that supports these after-the-fact extensions, the complexity of the code generator was reduced tremendously.  The same code that adds fields to the parent and child classes also adds a few statements to the recursive destructor.  It's much nicer.

Extensions like these allow code generators like ClassWizard to simply add files to your project, and not need to modify your handwritten files.  No more parsing the whole language to do a simple generator.  No more ugly /* !!! Do not edit this !!! */ machine generated crud in my files.
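
The 'extend' idea can be approximated in a dynamic language, which gives a feel for how a generator would use it. This is a rough Python sketch, not D and not a proposed syntax: the decorator name extend is hypothetical, and it only handles ordinary (non-dunder) members. The generated part adds a method to Node without touching the handwritten definition.

```python
def extend(target):
    """Class decorator: merge the decorated class body into `target`."""
    def merge(extension):
        for name, attr in vars(extension).items():
            # This sketch skips dunder entries (__module__, __dict__, ...),
            # so special methods cannot be added this way.
            if name.startswith("__") and name.endswith("__"):
                continue
            setattr(target, name, attr)
        # Return the original class, so the name keeps referring to it.
        return target
    return merge


# --- handwritten file ---
class Node:
    def __init__(self, name):
        self.name = name


# --- generated file: adds to Node, does not subclass it ---
@extend(Node)
class Node:
    def describe(self):
        return f"Node({self.name})"


print(Node("A").describe())
```

Because the decorator returns the original class, every existing reference to Node sees the new method; no new class is created, which matches the "this isn't inheritance" point.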

If you were to go the whole 9 yards, you might also allow a similar feature:  not just extensions... replacement!  You could use something like a replace keyword in front of your module or class or function or method or variable.

With this syntax, you can add little edit files to your projects that fix problems in a library you've been handed.

For example, if you run into a performance problem with a third party library (like that never happens ;-)) and track it down to the use of a singly linked list instead of doubly linked, you type a few lines of code in a patch file, and problem solved!  For the next ten years that it takes your library vendor to get around to fixing the problem, you have a workaround that usually works with their new releases.
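
A rough Python analogue of such a patch file, as a sketch only. VendorQueue stands in for third-party code, and the replace decorator is a hypothetical helper, not an existing API. A list versus a deque plays the role of the singly versus doubly linked list here: the patch swaps the slow operation for a fast one without editing the vendor's source.

```python
from collections import deque


class VendorQueue:
    """Stand-in for a third-party class we cannot edit."""

    def __init__(self):
        self.items = []

    def pop_front(self):
        # O(n): a list shifts every remaining element on each call.
        return self.items.pop(0)


def replace(target, name):
    """Decorator: install the decorated function as target.<name>."""
    def install(func):
        if not hasattr(target, name):
            # Refuse to "replace" something a new release no longer has.
            raise AttributeError(
                f"{target.__name__} has no attribute {name!r}")
        setattr(target, name, func)
        return func
    return install


# --- patch file ---
@replace(VendorQueue, "__init__")
def _init(self):
    self.items = deque()


@replace(VendorQueue, "pop_front")
def _pop_front(self):
    return self.items.popleft()   # O(1) on a deque


q = VendorQueue()
q.items.extend([1, 2, 3])
print(q.pop_front())
```

The hasattr check is what makes the workaround only "usually" survive new releases: if the vendor renames the method, the patch fails at load time rather than silently doing nothing.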

What do you think?

Bill Cox

January 29, 2003
Ilya Minkov wrote:
> It would be very good to be able to save classes to disk in a safe manner, so that (maybe only public?) fields can be saved and then read back in, even if a class has been subclassed or expanded (not too hard with the current memory model), or even if the underlying machine is different (hard). But even saving would probably become much harder if powerful data reordering for arrays of classes is implemented.

This is in DLI under the pickle.d module.  It transfers a class field image, so new and reordered fields don't matter, and handles single transference of pointers, references, and arrays.  The only non-portable part is a dependency on IEEE.
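
To illustrate why a field image tolerates new and reordered fields, here is a small Python sketch. It is not the pickle.d format, which is not described here; JSON and the PointV1/PointV2 classes are assumptions made for the example, and it ignores pointers, references, and arrays. Each field is stored under its name rather than at a fixed offset.

```python
import json


class PointV1:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y


class PointV2:
    # Compared to PointV1: fields reordered and a new one added.
    def __init__(self, label="", y=0, x=0):
        self.label, self.y, self.x = label, y, x


def save(obj):
    # Store every field under its name, not by position.
    return json.dumps(vars(obj))


def load(cls, data):
    obj = cls()   # start from the class defaults
    for name, value in json.loads(data).items():
        if hasattr(obj, name):   # ignore fields the class no longer has
            setattr(obj, name, value)
    return obj


old_image = save(PointV1(3, 4))
p = load(PointV2, old_image)
print(p.x, p.y, repr(p.label))
```

The old image loads into the newer class with x and y intact and label left at its default.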

> For this, I think unions are a special problem. A smart union type would have to be introduced (switch?), which would keep track of the active field and thus provide debugging capabilities. BTW, a parsing library and many other uses would profit from such a "switch", which is shorter to write and easier to maintain than a union.

Unions don't get serialisation.  If you want to save a union, save the active state.

January 29, 2003
Burton Radons wrote:
> Ilya Minkov wrote:
> 
>> It would be very good to be able to save classes to disk in a safe manner, so that (maybe only public?) fields can be saved and then read back in, even if a class has been subclassed or expanded (not too hard with the current memory model), or even if the underlying machine is different (hard). But even saving would probably become much harder if powerful data reordering for arrays of classes is implemented.
> 
> 
> This is in DLI under the pickle.d module.  It transfers a class field image, so new and reordered fields don't matter, and handles single transference of pointers, references, and arrays.  The only non-portable part is a dependency on IEEE.
> 

Cool. Thanks.
So it handles endianness.

> 
> Unions don't get serialisation.  If you want to save a union, save the active state.
> 

OK...
But do you doubt the usefulness of a switching union?


Thanks a lot.

-i.

March 05, 2003
Hello. Sorry it took me so long to become aware of this post. :)

Comments embedded.

-i.

Bill Cox wrote:
> Hi, Ilya.
> 
> Ilya Minkov wrote:
> 
>> Hello.
>>
>> It would be very good to be able to save classes to disk in a safe manner, so that (maybe only public?) fields can be saved and then read back in, even if a class has been subclassed or expanded (not too hard with the current memory model), or even if the underlying machine is different (hard). But even saving would probably become much harder if powerful data reordering for arrays of classes is implemented.
>>
>> For this, I think unions are a special problem. A smart union type would have to be introduced (switch?), which would keep track of the active field and thus provide debugging capabilities. BTW, a parsing library and many other uses would profit from such a "switch", which is shorter to write and easier to maintain than a union.
> 
> 
> Some of the code generators we use at work automatically create binary load and save functions.  In the early 90's we used them at QuickLogic, but we ran into difficulties maintaining binary backwards compatibility with our simple binary dumps.  We also found that a simple memory image of binary data structures typically takes up more space than a carefully designed ASCII format (which takes up more than a carefully designed binary format).

Hm. You mentioned dynamic properties a while ago. With them, you probably wouldn't have such difficulties.
There also has to be some framework that allows extending the format, even if the serialisation code is written manually. Basic support for it would mean that the base class has a (stub) method for converting it into a stream of data (.Serialize?, analogous to the current ToHash and ToString). In the simplest case you would then implement this method with statements like "serstream ~ thisproperty.Serialize". This would also imply that .Serialize is implemented for the basic types. The same goes for reading.
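
A minimal Python sketch of that scheme, under stated assumptions: the names serialize, Serializable, and Account are hypothetical, the byte layout (little-endian, length-prefixed strings) is just one possible choice, and only writing is shown. A free function covers the basic types, the base class provides the stub, and a class appends its fields to the stream one by one.

```python
import struct


def serialize(value):
    """Serialize a basic type, or defer to the object's own method."""
    if isinstance(value, bool):    # checked before int: bool subclasses int
        return struct.pack("<?", value)
    if isinstance(value, int):
        return struct.pack("<i", value)    # 32-bit little-endian
    if isinstance(value, float):
        return struct.pack("<d", value)    # IEEE 754 double
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return struct.pack("<I", len(raw)) + raw   # length prefix, then bytes
    return value.serialize()


class Serializable:
    def serialize(self):
        # Stub in the base class; each subclass lists its own fields.
        raise NotImplementedError


class Account(Serializable):
    def __init__(self, owner, balance, active):
        self.owner, self.balance, self.active = owner, balance, active

    def serialize(self):
        stream = b""
        stream += serialize(self.owner)     # like "serstream ~ owner.Serialize"
        stream += serialize(self.balance)
        stream += serialize(self.active)
        return stream


data = serialize(Account("ilya", 100, True))
print(len(data), data)
```

Because the field order is written out by hand, the author of the class controls the format and can extend it deliberately, at the cost of keeping the reader in step with the writer.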

Languages with only dynamic object methods seem to have one problem fewer here. However, an implicit serialisation sequence would also make it possible to interpret some data that can no longer be represented in the object directly because of changes.

As for the framework, XML is one example. Though I consider it appropriate for such things, I would also prefer to have an equivalent binary format (with conversion utilities back and forth), since it would work faster and take up less space.

BTW, I could make such an XML-like framework... make a function like ToXMLData, which would be overloaded for the basic types. A user can overload it for his own types. And for classes, it should call the corresponding method of the class. It should be doable with interfaces. Then there would need to be a way to compose one XMLData out of many and to save it all in binary, or convert it into real XML.
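
The overloading part of this idea can be sketched in Python with functools.singledispatch standing in for D function overloading. The function name to_xml_data, the tag names, and the Point class are assumptions for illustration; the binary form and the conversion utilities are not shown.

```python
from functools import singledispatch
from xml.sax.saxutils import escape


@singledispatch
def to_xml_data(value):
    # Fallback for classes: call the object's own method.
    return value.to_xml_data()


@to_xml_data.register(int)
def _(value):
    return f"<int>{value}</int>"


@to_xml_data.register(str)
def _(value):
    return f"<string>{escape(value)}</string>"


@to_xml_data.register(list)
def _(value):
    # Composing one piece of XML data out of many.
    return "<list>" + "".join(to_xml_data(v) for v in value) + "</list>"


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def to_xml_data(self):
        return f"<Point>{to_xml_data(self.x)}{to_xml_data(self.y)}</Point>"


print(to_xml_data([Point(1, 2), "a<b"]))
```

A user adds support for a new basic type by registering one more overload, and for a new class by giving it a to_xml_data method.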

And I have to consider the Pizza contest. Don't expect much though, since I'm not the major brain here and I'm only 20; I just started to study CS. And since I *never* eat at Pizza Hut, but rather at Restaurant Italy, Asado Steak, and some others. I still have over 100 restaurants to explore. :)

> As a result, no one has used the binary load/save feature in a decade. It sounds cool. I even wrote code in one of the generators to do it.  It just hasn't been as useful as I thought it would be.

For static languages, binary dumps are much less useful than for dynamic ones.

> Instead of building functions like binary load/save into the language, I'd recommend providing the hooks for users to do it with code generators.  Even if there's no direct generation capability in the language, there are a few things that could make D work better than C++ does with code generators.  In particular:
> 
> - Having a way to split up class definitions into multiple parts.
> 
> For example, an 'extend' keyword in front of a class could mean we're adding to an existing class.  This isn't inheritance.  We'd be modifying a class directly rather than creating a new one.
> 
> - Do the same thing for modules, functions, variables, and class methods.
> 
> It's kind of nice for a code generator to be able to put a few fields here, add a few statements there, and add a couple of functions to an existing module.  For example, the auto-generated recursive destructors we use were hell to write for C.  Every kind of class relationship supported had to be considered in big switch statements to generate all the different parts of the function.  Really ugly.  When targeting a language that supports these after-the-fact extensions, the complexity of the code generator was reduced tremendously.  The same code that adds fields to the parent and child classes also adds a few statements to the recursive destructor.  It's much nicer.

These are all good ideas. Also consider that one could have very few classes in the application, but very many methods to add to them. Then it would make sense to split up the class across multiple files for easy navigation and editing. This means, however, that all these units have to be compiled simultaneously. Dependencies can be awful to track.

> Extensions like these allow code generators like ClassWizard to simply add files to your project, and not need to modify your handwritten files.  No more parsing the whole language to do a simple generator.  No more ugly /* !!! Do not edit this !!! */ machine generated crud in my files.
> 
> If you were to go the whole 9 yards, you might also allow a similar feature:  not just extensions... replacement!  You could use something like a replace keyword in front of your module or class or function or method or variable.

Ouch.

> With this syntax, you can add little edit files to your projects that fix problems in a library you've been handed.
> 
> For example, if you run into a performance problem with a third party library (like that never happens ;-)) and track it down to the use of a singly linked list instead of doubly linked, you type a few lines of code in a patch file, and problem solved!  For the next ten years that it takes your library vendor to get around to fixing the problem, you have a workaround that usually works with their new releases.

Cool :)

> What do you think?
> 
> Bill Cox
> 

March 06, 2003
> And I have to consider the Pizza contest. Don't expect much though, since I'm not the major brain here and I'm only 20; I just started to study CS. And since I *never* eat at Pizza Hut, but rather at Restaurant Italy, Asado Steak, and some others. I still have over 100 restaurants to explore. :)

You've got a lot of knowledge about computer languages for being only 20.  Pretty impressive.  I'm 39, just old enough to have actually had a job programming in Fortran on a PDP-11/45.

-- Bill

March 06, 2003
>> We also found that a simple memory image of binary data structures typically takes up more space than a carefully designed ASCII format (which takes up more than a carefully designed binary format).
>
> Hm. You have mentioned dynamic properties a while ago. With them, you probably wouldn't have such difficulties.

I thought a simple example might illustrate the trouble I had with binary save formats.  Suppose we're saving a directed graph to disk.  Its classes look like:

class Node {
    LinkedList<Edge> inEdges, outEdges;
    bool visited, marked;
    char *name;
}

class Edge {
    Node fromNode, toNode;
}

Now, let's assume I have a graph that in a text file would be represented as:

A B C
B C E
C A D
D A B C
E B D E

The first column is node names, and the remaining symbols are destinations of edges.  This takes 34 bytes.

If we stream binary to the disk, I assume all Edges and Nodes wind up there.  Assume each LinkedList is just a head pointer, and that a Node also carries a name and two Booleans that I could pack into 1 byte.  Each Node would take 7 bytes. Each Edge has two Node pointers and two next pointers.  They would take 16 bytes.

On disk, the simple binary dump takes 5*7 + 12*16 = 227 bytes.  That's a whole lot worse than 34 bytes.
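
The arithmetic can be checked with a few lines of Python. This is only a sketch of the estimate above: the per-record sizes of 7 and 16 bytes are taken from the post as given, and one-byte line endings are assumed.

```python
graph_text = "A B C\nB C E\nC A D\nD A B C\nE B D E\n"

# Size of the text form, one byte per character including newlines.
text_bytes = len(graph_text.encode("ascii"))

# First symbol on each line is the node; the rest are edge destinations.
rows = [line.split() for line in graph_text.splitlines()]
node_count = len(rows)
edge_count = sum(len(row) - 1 for row in rows)

NODE_BYTES = 7    # per-node estimate from the post
EDGE_BYTES = 16   # two Node pointers + two next pointers, 4 bytes each
binary_bytes = node_count * NODE_BYTES + edge_count * EDGE_BYTES

print(text_bytes, node_count, edge_count, binary_bytes)
```

With 5 nodes and 12 edges this reproduces the 34 bytes of text against 227 bytes for the simple binary dump.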

As for compatibility, suppose we later on convert our LinkedList relationships to DoublyLinkedList.  First, the binary size gets worse, while the text file doesn't.  Second, we now have to write converters to be able to load the old binary files.  We could gain some backward compatibility by using an even larger binary format that tags all the fields, but what's the point?  Are we trying to be efficient, or just trying to avoid writing a parser?

File size isn't important for most apps.  Look at how large MS Word files are.  No one cares.  I work with design files representing .13u chips.  A small file for us might be 100 meg.  Not only does the text version reduce the size, but our users demand text so they can hack our data structures with Perl scripts.

Bill


March 09, 2003
"Bill Cox" <bill@viasic.com> wrote in message news:3E6734FB.5060406@viasic.com...
> I'm 39, just old enough to have actually had a
> job programming in Fortran on a PDP-11/45.

Been there, done that <g>.


March 09, 2003
"Bill Cox" <Bill_member@pathlink.com> wrote in message news:b47fib$5io$1@digitaldaemon.com...
> File size isn't important for most apps.  Look at how large MS Word files are.  No one cares.  I work with design files representing .13u chips.  A small file for us might be 100 meg.  Not only does the text version reduce the size, but our users demand text so they can hack our data structures with Perl scripts.

You hit on a big advantage with text files - they can be checked visually for correctness, and can be edited with ordinary text editors. Binary files require a custom dumper/editor to be written.

One reason I don't use .doc files is because I need a specific version of the word processor installed to read them. 20 years from now, who will have that? (Yes, I have 20-year-old files I still use.) With ASCII text format, I'm covered.