Void-safety (and related things)
August 11, 2009
Found on the Lambda the Ultimate blog: void-safety in the Eiffel language, another attempt at solving this problem: http://docs.eiffel.com/sites/default/files/void-safe-eiffel.pdf


I think that to solve this problem a language like D can use three different strategies at the same time. Three kinds of object references can be defined:
1) The default one (its syntax is the shortest; such references are declared like the current ones) is the "non-nullable object reference". Many objects in a program are like this. The type system guarantees correctness, so you don't need to test such references for null. As in C#, the compiler watches for uses of uninitialized references of this kind. (This is a situation where "good" is better than "perfect": C# seems to work well enough at spotting uninitialized variables.)
2) The second kind is the current one, the "unsafe nullable object reference": it's faster, its syntax is a bit longer, and it is meant to be used only where maximum performance is necessary.
3) The third kind is the "safe nullable object reference". You can define it using the syntax "Foo? f;". It's a "fat" reference: besides the pointer, it contains an integer that identifies the class. If your program has 500 classes, you need 500 different values for it. On the other hand, a specific reference usually can't refer to instances of 500 different classes, so that maximum can be reduced, and at runtime you can keep some conversion tables that map subsets of those numbers back to a full pointer to the class info. Such tables are a bit slow to use (but they don't need much memory), and the program uses them only when a reference of the third kind is null, so that's not a big problem. On 64-bit systems such a numeric tag can be put into the most significant bits of the pointer itself (so when the pointer isn't null you just need a test and a mask; the shift is required only in the uncommon case of null). This also means the maximum number of possible class instances decreases, but not by much (in most programs some conversion tables can reduce the numeric tag to 2-5 bits). When the code calls a method on a null reference of this kind, the program may call the corresponding method of a "default" instance of that class (or even a user-specified instance).
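
The fallback-on-null behaviour of point 3 could be approximated today at the library level. Below is a hypothetical D sketch (SafeRef, Greeter and the other names are invented for illustration, not existing or proposed syntax); the actual proposal is a tagged pointer handled by the compiler, which a library type can only imitate:

struct SafeRef(T) if (is(T == class))
{
    private T payload;
    private static T defaultInstance;   // lazily created fallback object

    this(T p) { payload = p; }

    bool isNull() const { return payload is null; }

    // Forward access: a null reference dispatches to the default instance.
    @property T get()
    {
        if (payload !is null)
            return payload;
        if (defaultInstance is null)
            defaultInstance = new T;
        return defaultInstance;
    }

    alias get this;   // allow transparent use as a T
}

class Greeter
{
    string greet() { return "hello"; }
}

void main()
{
    SafeRef!Greeter g;             // never assigned, so internally null
    assert(g.greet() == "hello");  // dispatches to the default Greeter
}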

Do you like it? :-)

Bye,
bearophile
August 11, 2009
I've recently convinced myself that nullability should be the exception instead of the norm. So much of the code I write in C#/D uses reference objects assuming they're non-null. Only in certain special cases do I handle null explicitly. The issue is that if any special case is missed/mishandled, it can spread to other code.

I'm also too lazy to write non-null contracts in D. They also have far less value since violations are not caught at compile time (or better yet, in my IDE as I write code).

It may be as simple as having the following 3 types:
T // non-nullable
T? // nullable, safe
T* //  nullable, unsafe

I'd also like to remove all default initialization in favor of uninitialized-variable errors. Default initialization in D is cute, but it is not a solution for programmer oversight. Single-threaded code will reproducibly do the wrong thing, but the problem may be harder to notice in the first place. The very fact that the signalling NaN change has made it into D shows that people want this type of behavior!
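
For reference, a small sketch (not part of the original post) of the default-initialization values being objected to here: integers silently become 0, floating-point values become NaN, and class references become null.

import std.math : isNaN;

class Foo {}

void main()
{
    int i;       // default-initialized to 0: code silently "works"
    double d;    // default-initialized to NaN: poisons later arithmetic
    Foo f;       // default-initialized to null: fails only when dereferenced

    assert(i == 0);
    assert(isNaN(d));
    assert(f is null);
}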


bearophile Wrote:

> Found on the Lambda the Ultimate blog: void-safety in the Eiffel language, another attempt at solving this problem: http://docs.eiffel.com/sites/default/files/void-safe-eiffel.pdf
> 
> 
> I think that to solve this problem a language like D can use three different strategies at the same time. Three kinds of object references can be defined:
> 1) The default one (its syntax is the shortest; such references are declared like the current ones) is the "non-nullable object reference". Many objects in a program are like this. The type system guarantees correctness, so you don't need to test such references for null. As in C#, the compiler watches for uses of uninitialized references of this kind. (This is a situation where "good" is better than "perfect": C# seems to work well enough at spotting uninitialized variables.)
> 2) The second kind is the current one, the "unsafe nullable object reference": it's faster, its syntax is a bit longer, and it is meant to be used only where maximum performance is necessary.
> 3) The third kind is the "safe nullable object reference". You can define it using the syntax "Foo? f;". It's a "fat" reference: besides the pointer, it contains an integer that identifies the class. If your program has 500 classes, you need 500 different values for it. On the other hand, a specific reference usually can't refer to instances of 500 different classes, so that maximum can be reduced, and at runtime you can keep some conversion tables that map subsets of those numbers back to a full pointer to the class info. Such tables are a bit slow to use (but they don't need much memory), and the program uses them only when a reference of the third kind is null, so that's not a big problem. On 64-bit systems such a numeric tag can be put into the most significant bits of the pointer itself (so when the pointer isn't null you just need a test and a mask; the shift is required only in the uncommon case of null). This also means the maximum number of possible class instances decreases, but not by much (in most programs some conversion tables can reduce the numeric tag to 2-5 bits). When the code calls a method on a null reference of this kind, the program may call the corresponding method of a "default" instance of that class (or even a user-specified instance).
> 
> Do you like it? :-)
> 
> Bye,
> bearophile

August 11, 2009
Jason House wrote:
> I've recently convinced myself that nullability should be the exception instead of the norm. So much of the code I write in C#/D uses reference objects assuming they're non-null. Only in certain special cases do I handle null explicitly. The issue is that if any special case is missed/mishandled, it can spread to other code.
> 
> I'm also too lazy to write non-null contracts in D. They also have far less value since violations are not caught at compile time (or better yet, in my IDE as I write code).
> 
> It may be as simple as having the following 3 types:
> T // non-nullable
> T? // nullable, safe
> T* //  nullable, unsafe
> 
> I'd also like to remove all default initialization in favor of uninitialized-variable errors. Default initialization in D is cute, but it is not a solution for programmer oversight. Single-threaded code will reproducibly do the wrong thing, but the problem may be harder to notice in the first place. The very fact that the signalling NaN change has made it into D shows that people want this type of behavior!

Yes. Default initialization is really weak compared to uninitialized-variable errors: with the former you notice the errors at runtime, with the latter at compile time.

But I don't see that changing anytime soon... (I think it's because "it gets hard").
August 11, 2009
Ary Borenszweig:
>(I think it's because "it gets hard").<

You can't ask a single person to be able to do everything. Are you able to implement that thing? Probably I am not. If someone here is able and willing to do it, then I suggest that person ask Walter for permission to implement it.

Bye,
bearophile
August 11, 2009
bearophile wrote:

> You can't ask a single person to be able to do everything. Are you able to implement that thing? Probably I am not. If someone here is able and willing to do it, then I suggest that person ask Walter for permission to implement it.

I doubt it's the direction D wants to go. Because proving correctness at compile-time requires the holy grail, and testing correctness at runtime requires extra space for each variable and extra time for each access.

-- 
Michiel Helvensteijn

August 11, 2009
Michiel Helvensteijn wrote:
> bearophile wrote:
> 
>> You can't ask a single person to be able to do everything. Are you able to implement that thing? Probably I am not. If someone here is able and willing to do it, then I suggest that person ask Walter for permission to implement it.
> 
> I doubt it's the direction D wants to go. Because proving correctness at
> compile-time requires the holy grail, and testing correctness at runtime
> requires extra space for each variable and extra time for each access.

What do you mean by "holy grail"?
August 11, 2009
Ary Borenszweig wrote:

>> I doubt it's the direction D wants to go. Because proving correctness at compile-time requires the holy grail, and testing correctness at runtime requires extra space for each variable and extra time for each access.
> 
> What do you mean by "holy grail"?

You missed that discussion, didn't you? Basically, if you want to know at compile time whether a variable is initialized, there are several possibilities:

* Be overly conservative: Make sure every possible computational path has an assignment to the variable, otherwise give an error. This would throw out the baby with the bathwater. Many valid programs would cause an error.

* Actually analyze the control flow: Make sure that exactly all reachable states have the variable initialized, otherwise give an error. Dubbed "holy grail", because this sort of analysis is still some time off, and would allow some very cool correctness verification.
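
To make the difference concrete, here is a small D example (not from the original post; "= void" and the names pick and cond are just for illustration, standing in for the absence of default initialization). The variable is assigned on every execution path, but a conservative definite-assignment check that looks at each branch in isolation would still reject it; only real control-flow analysis proves it safe:

int pick(bool cond)
{
    int x = void;   // pretend there is no default initialization
    if (cond)
        x = 1;
    if (!cond)
        x = 2;
    return x;       // initialized on every path, but proving it needs flow analysis
}

void main()
{
    assert(pick(true) == 1);
    assert(pick(false) == 2);
}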

-- 
Michiel Helvensteijn

August 21, 2009
Michiel Helvensteijn wrote:
>>> I doubt it's the direction D wants to go. Because proving correctness at compile-time requires the holy grail, and testing correctness at runtime requires extra space for each variable and extra time for each access.
> 
> Basically, if you want to know at compile-time whether a variable is initialized, there are several possibilities:
> 
> * Be overly conservative: Make sure every possible computational path has an assignment to the variable, otherwise give an error. This would throw out the baby with the bathwater. Many valid programs would cause an error.
> 
> * Actually analyze the control flow: Make sure that exactly all reachable states have the variable initialized, otherwise give an error. Dubbed "holy grail", because this sort of analysis is still some time off, and would allow some very cool correctness verification.

Third (stop-gap) option:
• Be conservative, but trust the programmer: Allow some sort of pragma to tell the compiler that the programmer has done the flow analysis and the variable really is set (or non-null, or…). It will be an unchecked error to lie to the compiler--until the holy grail is implemented, when it will become a checked error.

This is a feature of the Plan 9 C compilers (cf. “The compile-time environment” in <http://plan9.bell-labs.com/sys/doc/comp.html>).

“If you lie to the compiler, it will get its revenge.” —Henry Spencer
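
D already has a narrow analogue of this for the initialization case: "= void" skips default initialization and trusts the programmer to assign before use, with no check if the promise is broken. A small sketch (not from the original post; fill and scratch are invented names):

void fill(int[] buf)
{
    foreach (i, ref b; buf)
        b = cast(int) i;
}

void main()
{
    int[16] scratch = void;  // no default initialization; we promise to fill it
    fill(scratch[]);
    assert(scratch[3] == 3);
}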

—Joel Salomon
August 21, 2009
Joel C. Salomon:

>http://plan9.bell-labs.com/sys/doc/comp.html<

Thank you for that link.
I can see some interesting things in that very C-like language:

>The #if directive was omitted because it greatly complicates the preprocessor, is never necessary, and is usually abused. Conditional compilation in general makes code hard to understand; the Plan 9 source uses it sparingly. Also, because the compilers remove dead code, regular if statements with constant conditions are more readable equivalents to many #ifs.<

Can the "static if" be removed from D then?

------------------

Variables inside functions can have any order; do D compilers do this too?

>Unlike its counterpart on other systems, the Plan 9 loader rearranges data to optimize access. This means the order of variables in the loaded program is unrelated to its order in the source. Most programs don’t care, but some assume that, for example, the variables declared by
int a;
int b;
will appear at adjacent addresses in memory. On Plan 9, they won’t.<


------------------

Plan 9 uses this strategy to solve endianness-induced problems in integer I/O:

>Plan 9 is a heterogeneous environment, so programs must expect that external files will be written by programs on machines of different architectures. The compilers, for instance, must handle without confusion object files written by other machines. The traditional approach to this problem is to pepper the source with #ifdefs to turn byte-swapping on and off. Plan 9 takes a different approach: of the handful of machine-dependent #ifdefs in all the source, almost all are deep in the libraries. Instead programs read and write files in a defined format, either (for low volume applications) as formatted text, or (for high volume applications) as binary in a known byte order. If the external data were written with the most significant byte first, the following code reads a 4-byte integer correctly regardless of the architecture of the executing machine (assuming an unsigned long holds 4 bytes):

ulong getlong(void) {
    ulong l;
    l = (getchar()&0xFF)<<24;
    l |= (getchar()&0xFF)<<16;
    l |= (getchar()&0xFF)<<8;
    l |= (getchar()&0xFF)<<0;
    return l;
}

Note that this code does not ‘swap’ the bytes; instead it just reads them in the correct order. Variations of this code will handle any binary format and also avoid problems involving how structures are padded, how words are aligned, and other impediments to portability. Be aware, though, that extra care is needed to handle floating point data.<
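
For comparison, the same most-significant-byte-first read written in D might look like the sketch below (not from the Plan 9 paper; getUint is an invented name, and in practice Phobos's std.bitmanip.bigEndianToNative does the conversion, but the byte-by-byte form keeps the idea explicit):

import std.stdio : File;

// Read a 4-byte big-endian unsigned integer, byte by byte, so the result
// is the same regardless of the host machine's endianness.
uint getUint(File f)
{
    ubyte[4] buf;
    f.rawRead(buf[]);
    return (cast(uint) buf[0] << 24)
         | (cast(uint) buf[1] << 16)
         | (cast(uint) buf[2] << 8)
         |  cast(uint) buf[3];
}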

------------------

I don't fully understand this:

>the declaration
extern register reg;
(this appearance of the register keyword is not ignored) allocates a global register to hold the variable reg. External registers must be used carefully: they need to be declared in all source files and libraries in the program to guarantee the register is not allocated temporarily for other purposes. Especially on machines with few registers, such as the i386, it is easy to link accidentally with code that has already usurped the global registers and there is no diagnostic when this happens. Used wisely, though, external registers are powerful. The Plan 9 operating system uses them to access per-process and per-machine data structures on a multiprocessor. The storage class they provide is hard to create in other ways.<

Bye,
bearophile
August 21, 2009
bearophile wrote, re. <http://plan9.bell-labs.com/sys/doc/comp.html>:
> I can see some interesting things in that very C-like language:
> 
>> The #if directive was omitted because it greatly complicates the preprocessor, is never necessary, and is usually abused. Conditional compilation in general makes code hard to understand; the Plan 9 source uses it sparingly. Also, because the compilers remove dead code, regular if statements with constant conditions are more readable equivalents to many #ifs.
> 
> Can the "static if" be removed from D then?

D uses "static if" for things other than versioning.  But this attitude is relevant when considering “enhancements” to D’s version(foo).

> I don't fully understand this:
> 
>> the declaration
>>     extern register reg;
>> (this appearance of the register keyword is not ignored) allocates a global register to hold the variable reg. External registers must be used carefully: they need to be declared in all source files and libraries in the program to guarantee the register is not allocated temporarily for other purposes. Especially on machines with few registers, such as the i386, it is easy to link accidentally with code that has already usurped the global registers and there is no diagnostic when this happens. Used wisely, though, external registers are powerful. The Plan 9 operating system uses them to access per-process and per-machine data structures on a multiprocessor. The storage class they provide is hard to create in other ways.

Generally, the Plan 9 C compilers ignore the "register" keyword, preferring to handle this sort of optimization themselves.  The "extern register" declaration is not for optimization, but to allocate a register as a global variable.  This register will never be used by the compiler as a temporary, or to pass arguments, or whatever compilers use registers for; it has been completely given over for the programmer’s use.  Apparently, this was helpful in writing the Plan 9 kernel.

—Joel Salomon