September 28, 2009
BCS wrote:

[...]
> I wouldn't want to hire a programer that *habitually* (and unnecessarily) hacks past a feature designed to prevent bugs.

In the short time of an interview it's not possible to test for habits (or the necessity) of hacking past a feature designed to prevent bugs.

Therefore the only measures of code quality are the number of bugs detected by the users---or the number of WTF's exclaimed during a code review.

Are you able to give an upper limit for the number of WTF's during a code review for which the coder is not fired?

-manfred

September 28, 2009
On 28/09/2009 12:05, Jeremie Pelletier wrote:
> Nick Sabalausky wrote:
>> "Jeremie Pelletier" <jeremiep@gmail.com> wrote in message
>> news:h9mmre$1i8j$1@digitalmars.com...
>>> Ary Borenszweig wrote:
>>>> Object is not-nullable, Object? (or whatever syntax you like) is
>>>> nullable. So that line is a compile-time error: you can't cast a
>>>> null to an Object (because Object *can't* be null).
>>>>
>>> union A {
>>>     Object foo;
>>>     Object? bar;
>>> }
>>>
>>> Give me a type system, and I will find backdoors :)
>>>
>>
>> Unions are nothing more than an alternate syntax for a reinterpret
>> cast. And it's an arguably worse syntax because, unlike casts, uses of
>> it are indistinguishable from normal safe code; there's nothing to
>> grep for. As such, unions should never be considered any more safe
>> than cast(x)y. The following is just as dangerous as your example
>> above and doesn't even touch the issue of nullability/non-nullability:
>>
>> union A {
>>     int foo;
>>     float bar;
>> }
>>
>
> Yet it's the only way I know of to do bitwise logic on floating-point
> values in D, to extract the exponent, sign, and mantissa for example.
>
> And yes, they are much, much more than a simple reinterpret cast; a
> simple set of casts will not set the size of the union to its largest
> member. Unions make for elegant types which can have many valid
> representations:
>
> union Vec3F {
>     struct { float x, y, z; }
>     float[3] v;
> }
>
> I just can't picture D without unions :)

Here's a type-safe alternative (note: untested):

struct Vec3F {
  float[3] v;
  alias v[0] x;
  alias v[1] y;
  alias v[2] z;
}

D provides alignment control for structs; why do we need a separate union construct if it is just a special case of struct alignment?

IMO the use cases for unions are very rare, and they can all be redesigned in a type-safe manner.
When software was small and simple, hand-tuning code with low-level mechanisms (such as unions, or even assembly) made a lot of sense. Today's software is typically far more complex and way too big to risk losing safety features for marginal performance gains.

Micro-optimizations simply don't scale.
September 28, 2009
Yigal Chripun wrote:
> On 28/09/2009 12:05, Jeremie Pelletier wrote:
>> Nick Sabalausky wrote:
>>> "Jeremie Pelletier" <jeremiep@gmail.com> wrote in message
>>> news:h9mmre$1i8j$1@digitalmars.com...
>>>> Ary Borenszweig wrote:
>>>>> Object is not-nullable, Object? (or whatever syntax you like) is
>>>>> nullable. So that line is a compile-time error: you can't cast a
>>>>> null to an Object (because Object *can't* be null).
>>>>>
>>>> union A {
>>>>     Object foo;
>>>>     Object? bar;
>>>> }
>>>>
>>>> Give me a type system, and I will find backdoors :)
>>>>
>>>
>>> Unions are nothing more than an alternate syntax for a reinterpret
>>> cast. And it's an arguably worse syntax because, unlike casts, uses of
>>> it are indistinguishable from normal safe code; there's nothing to
>>> grep for. As such, unions should never be considered any more safe
>>> than cast(x)y. The following is just as dangerous as your example
>>> above and doesn't even touch the issue of nullability/non-nullability:
>>>
>>> union A {
>>>     int foo;
>>>     float bar;
>>> }
>>>
>>
>> Yet it's the only way I know of to do bitwise logic on floating-point
>> values in D, to extract the exponent, sign, and mantissa for example.
>>
>> And yes, they are much, much more than a simple reinterpret cast; a
>> simple set of casts will not set the size of the union to its largest
>> member. Unions make for elegant types which can have many valid
>> representations:
>>
>> union Vec3F {
>>     struct { float x, y, z; }
>>     float[3] v;
>> }
>>
>> I just can't picture D without unions :)
> 
> Here's a type-safe alternative (note: untested):
> 
> struct Vec3F {
>   float[3] v;
>   alias v[0] x;
>   alias v[1] y;
>   alias v[2] z;
> }
> 
> D provides alignment control for structs; why do we need a separate union construct if it is just a special case of struct alignment?

These aliases won't compile, and that was only one out of many union uses.
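
For what it's worth, a variant of that alternative should compile in D2 if the aliases are replaced with ref-returning members (a sketch, untested):

struct Vec3F {
  float[3] v;
  // Each accessor returns a reference into v, so property-style
  // reads and writes (vec.x = 1.0f) go straight to the array.
  ref float x() { return v[0]; }
  ref float y() { return v[1]; }
  ref float z() { return v[2]; }
}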

> IMO the use cases for unions are very rare, and they can all be redesigned in a type-safe manner.

Not always true.

> When software was small and simple, hand-tuning code with low-level mechanisms (such as unions, or even assembly) made a lot of sense. Today's software is typically far more complex and way too big to risk losing safety features for marginal performance gains.
> 
> Micro-optimizations simply don't scale.

Again, that's a lazy view on programming. High-level constructs are useful for isolating small, simple algorithms which are implemented at a low level.

These aren't just marginal performance gains; they can easily be 15-30% improvements, sometimes 50% or more. If this is too complex or the risk is too high for you, then don't use a systems language :)
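
For illustration, the float bit twiddling mentioned above usually looks something like this (a minimal sketch, assuming IEEE-754 single precision; untested):

import std.stdio;

// Reinterpret the float's bits as a uint, then mask out the fields.
union FloatBits {
  float f;
  uint u;
}

void main() {
  FloatBits fb;
  fb.f = -6.25f;
  uint sign     = fb.u >> 31;          // 1 sign bit
  uint exponent = (fb.u >> 23) & 0xFF; // 8 exponent bits, biased by 127
  uint mantissa = fb.u & 0x7F_FFFF;    // 23 mantissa bits
  writefln("sign=%s exponent=%s mantissa=%s", sign, exponent, mantissa);
}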
September 28, 2009
On 28-9-2009 18:09, Jeremie Pelletier wrote:
> Max Samukha wrote:
>> Lionello Lunesu wrote:
>>
>>> On 27-9-2009 9:20, Walter Bright wrote:
>>>> language_fan wrote:
>>>>> The idea behind non-nullable types and other contracts is to catch
>>>>> these errors on compile time. Sure, the code is a bit harder to write,
>>>>> but it is safe and never segfaults. The idea is to minimize the amount
>>>>> of runtime errors of all sorts. That's also how other features of
>>>>> statically typed languages work.
>>>>
>>>> I certainly agree that catching errors at compile time is preferable by
>>>> far. Where I disagree is the notion that non-nullable types achieve
>>>> this. I've argued extensively here that they hide errors, not fix them.
>>>>
>>>> Also, by "safe" I presume you mean "memory safe" which means free of
>>>> memory corruption. Null pointer exceptions are memory safe. A null
>>>> pointer could be caused by memory corruption, but it cannot *cause*
>>>> memory corruption.
>>> // t.d
>>> void main()
>>> {
>>>     int* a;
>>>     a[20000] = 2;
>>> }
>>>
>>> [C:\Users\Lionello] dmd -run t.d
>>>
>>> [C:\Users\Lionello]
>>>
>>> This code passes on Vista. Granted, it needs a big enough offset and
>>> some luck, but indexing null will never be secure in the current flat
>>> memory models.
>>>
>>> L.
>>
>> That is a strong argument. If an object is big enough, modifying it
>> via a null reference may still cause memory corruption. Initializing
>> references to null does not guarantee memory safety.
>
> How is that corruption? These pointers were purposely set to 0x00000002;
> corruption, I believe, is when memory is modified without the programmer
> being aware of it. For example, if the GC were to free memory that is
> still reachable, that would cause corruption.
>
> Corruption is near impossible to trace back; this case is trivial.

Uh? What pointer is being set to 0x00000002?

I'm indexing a pointer that happens to be uninitialized, which means: null. The code passes without problems, but modifies a 'random' address, with unpredictable consequences.

According to Walter, a compile-time check is not needed because at run time it is guaranteed that the program will abort when a null pointer is about to be used. But that's not always the case; see my example.

L.
September 28, 2009
Jeremie Pelletier:

> Not always true.

I agree; I'm using D also because it offers unions. Sometimes they are useful.

But besides the normal C unions, which I don't want removed, it can also be useful to have the safe automatic tagged unions of Cyclone. They are safer and give just a little less performance compared to C unions. In D they could be denoted with "record" or "tunion" or just "safe union" to save keywords. They would always contain an invisible tag (which can be read with a special built-in union method, like Unionname.tagcheck). Such "safe unions" might even become the only ones allowed in SafeD modules!

The following is from Cyclone docs:
<<
The C Standard says that if you read out any member of a union other than the last one written, the result is undefined.
To avoid this problem, Cyclone provides a built-in form of tagged union and always ensures that the tag is correlated with the last member written in the union. In particular, whenever a tagged union member is updated, the compiler inserts code to update the tag associated with the union. Whenever a member is read, the tag is consulted to ensure that the member was the last one written. If not, an exception is thrown.

Thus, the aforementioned example can be rewritten in Cyclone like this:

@tagged union U { int i; int *p; };
void pr(union U x) {
  if (tagcheck(x.i))
    printf("int(%d)",x.i);
  else
    printf("ptr(%d)",*x.p);
}

The @tagged qualifier indicates to the compiler that U should be a tagged union. The operation tagcheck(x.i) returns true when i was the last member written so it can be used to extract the value.
>>
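
Something similar can already be hand-rolled in D as a library type (a sketch mimicking Cyclone's example; the TaggedU name and its methods are illustrative, untested):

import std.stdio;

struct TaggedU {
  private enum Tag { none, isInt, isPtr }
  private Tag tag = Tag.none;
  private union {
    int i_;
    int* p_;
  }

  void opAssign(int v)  { i_ = v; tag = Tag.isInt; }
  void opAssign(int* v) { p_ = v; tag = Tag.isPtr; }

  // Like Cyclone's tagcheck(x.i): true when i was the last member written.
  bool tagcheckInt() { return tag == Tag.isInt; }

  int i()  { assert(tag == Tag.isInt, "i was not the last member written"); return i_; }
  int* p() { assert(tag == Tag.isPtr, "p was not the last member written"); return p_; }
}

// Usage mirroring Cyclone's pr():
void pr(TaggedU x) {
  if (x.tagcheckInt())
    writefln("int(%d)", x.i);
  else
    writefln("ptr(%d)", *x.p);
}

A built-in version could of course hide the tag entirely and check it on every member access.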


> Again, that's a lazy view on programming. High-level constructs are useful for isolating small, simple algorithms which are implemented at a low level.

Software is inherently multi-scale. In probably 90-95% of a program's code, micro-optimizations aren't necessary, because those operations are performed only once in a while. But it often happens that certain loops run an enormous number of times, so even small inefficiencies inside them lead to low performance. That's why profiling helps.

This can be seen in how HotSpot (and modern dynamic-language JITters) work: usually virtual calls like those you find in a D program are quick; they don't slow down code. Yet if a dynamic call prevents the compiler from performing a critical inlining, or is left in the middle of critical code, it may lead to a slower program. That's why I have seen Java code go 10-30% faster than D code compiled with LDC: not because of the GC and memory allocations, but just because LDC isn't smart enough to inline certain virtual methods.
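
As an aside, the usual workaround on the D side is to seal methods you know won't be overridden, so a static compiler can devirtualize and inline them (a sketch; whether inlining actually happens depends on the compiler):

class Shape {
  // Virtual by default: calls go through the vtable, which can
  // block inlining in a hot loop.
  float area() { return 0; }
}

class Circle : Shape {
  float r;
  this(float r) { this.r = r; }
  // final: no subclass may override area(), so calls through a
  // Circle reference can be devirtualized and inlined.
  final override float area() { return 3.14159265f * r * r; }
}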

------------------------------------

More quotations from the Cyclone documentation:

>In contrast, Cyclone's analysis extends to struct and union members and pointer contents to ensure everything is initialized before it is used. This has two benefits: First, we tend to catch more bugs this way, and second, programmers don't pay for the overhead of automatic initialization on top of their own initialization code.<


This is right on-topic:
>This requires little effort from the programmer, but the NULL checks slow down getc. To repair this, we have extended Cyclone with a new kind of pointer, called a “never-NULL” pointer, and indicated with ‘@’ instead of ‘*’. For example, in Cyclone you can declare
int getc(FILE @);
indicating that getc expects a non-NULL FILE pointer as its argument. This one-character change tells Cyclone that it does not need to insert NULL checks into the body of getc. If getc is called with a possibly-NULL pointer, Cyclone will insert a NULL check at the call site.<



>Goto: C's goto statements can lead to safety violations when they are used to jump into scopes. Here is a simple example:

int z;
{ int x = 0xBAD; goto L; }
{ int *y = &z;
L: *y = 3; // Possible segfault
}

Cyclone's static analysis detects this situation and signals an error. A goto that does not enter a scope is safe, and is allowed in Cyclone. We apply the same analysis to switch statements, which suffer from a similar vulnerability in C.<

Bye,
bearophile
September 28, 2009
Jari-Matti M.:

> It depends on the boolean representation. I see no reason why a built-in feature should be slower than some bitwise logic operation in user code. After all, the set of operations the language provides for the user is a subset of all possible operations the language implementation can do.

I agree. One of the best qualities of C++ is that it often allows programmers to build abstractions with no or minimal cost. A good systems language allows you to define a built-in-looking construct (for example, a function) that lets you access and use parts of a floating-point number with the same efficiency as C/asm code.
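
As a concrete illustration, here is a sketch of such a built-in-looking construct in D: a function that extracts the biased exponent of a float, which a decent optimizer should reduce to the same shift-and-mask an assembly programmer would write (untested):

// Function-shaped abstraction with no expected runtime cost:
// reinterpret the float's bits and mask out the 8 exponent bits.
uint biasedExponent(float f) {
  return (*cast(uint*)&f >> 23) & 0xFF;
}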

Bye,
bearophile
September 28, 2009
bearophile wrote:
> Jeremie Pelletier:
>> Again, that's a lazy view on programming. High-level constructs are useful for isolating small, simple algorithms which are implemented at a low level.
> 
> Software is inherently multi-scale. In probably 90-95% of a program's code, micro-optimizations aren't necessary, because those operations are performed only once in a while. But it often happens that certain loops run an enormous number of times, so even small inefficiencies inside them lead to low performance. That's why profiling helps.
> 
> This can be seen in how HotSpot (and modern dynamic-language JITters) work: usually virtual calls like those you find in a D program are quick; they don't slow down code. Yet if a dynamic call prevents the compiler from performing a critical inlining, or is left in the middle of critical code, it may lead to a slower program. That's why I have seen Java code go 10-30% faster than D code compiled with LDC: not because of the GC and memory allocations, but just because LDC isn't smart enough to inline certain virtual methods.

Certainly agreed on virtual calls: on my machine, I timed a simple example as executing 65 interface calls per microsecond, 85 virtual calls per microsecond, and 210 non-member function calls per microsecond. So you should almost never worry about the cost of interface calls since they're so cheap, but they are 3.5 times slower than non-member functions.

In most cases, the body of a method is a lot more expensive than the method call, so even when optimizing, it won't often benefit you to use free functions rather than class or interface methods.
September 28, 2009
Christopher Wright wrote:
> bearophile wrote:
>> Jeremie Pelletier:
>>> Again, that's a lazy view on programming. High-level constructs are useful for isolating small, simple algorithms which are implemented at a low level.
>>
>> Software is inherently multi-scale. In probably 90-95% of a program's code, micro-optimizations aren't necessary, because those operations are performed only once in a while. But it often happens that certain loops run an enormous number of times, so even small inefficiencies inside them lead to low performance. That's why profiling helps.
>>
>> This can be seen in how HotSpot (and modern dynamic-language JITters) work: usually virtual calls like those you find in a D program are quick; they don't slow down code. Yet if a dynamic call prevents the compiler from performing a critical inlining, or is left in the middle of critical code, it may lead to a slower program. That's why I have seen Java code go 10-30% faster than D code compiled with LDC: not because of the GC and memory allocations, but just because LDC isn't smart enough to inline certain virtual methods.
> 
> Certainly agreed on virtual calls: on my machine, I timed a simple example as executing 65 interface calls per microsecond, 85 virtual calls per microsecond, and 210 non-member function calls per microsecond. So you should almost never worry about the cost of interface calls since they're so cheap, but they are 3.5 times slower than non-member functions.

Thanks for posting these interesting numbers. I seem to recall that interface dispatch in D does a linear search in the interfaces list, so you may want to repeat your tests with a variable number of interfaces, and a variable position of the interface being used.
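
A harness along those lines might look like this (a sketch, untested; MonoTime and the interface names are illustrative, from present-day druntime rather than the libraries of the time):

import core.time : MonoTime;
import std.stdio;

interface I0 { int f0(); }
interface I1 { int f1(); }
interface I2 { int f2(); }

// C implements three interfaces; timing calls through each one shows
// whether the interface's position in the list affects dispatch cost.
class C : I0, I1, I2 {
  int f0() { return 0; }
  int f1() { return 1; }
  int f2() { return 2; }
}

void main() {
  auto c = new C;
  I2 last = c; // the interface furthest down the list
  enum N = 100_000_000;
  int sum = 0;
  auto start = MonoTime.currTime;
  foreach (i; 0 .. N)
    sum += last.f2();
  writefln("%s calls in %s (sum=%s)", N, MonoTime.currTime - start, sum);
}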

Andrei

September 28, 2009
Rainer Deyke Wrote:

> You could argue that assigning a 'B' to a variable that is declared to hold an 'A' is already a memory safety violation.

Yeah, it was brought to my attention by a friend that "type safety" could be another form. bearophile also brings up a good example.

> If so, then the exact argument also applies to assigning 'null' to the same variable.

I think that is what Walter is getting at: you're not dealing with memory that is correct; when this happens, the program should halt and be dealt with from outside the program.
September 28, 2009
language_fan Wrote:

> > Now if you really want to throw some sticks into the spokes, you would say that if the program crashes due to a null pointer, it is still likely that the programmer will just initialize/set the value to a "default" that still isn't valid just to get the program to continue to run.
> 
> Why should it crash in the first place? I hate crashes. You like them? I can prove by structural induction that you do not like them when you can avoid crashes with static checking.

No one likes programs that crash, but doesn't that mean it is incorrect behavior?

> Have you ever used functional languages? When you develop in Haskell or SML, how often do you feel there is a good chance something will be initialized to the wrong value? Can you show some statistics on how unsafe this practice is?

So isn't that the question? Does/can "default" initialization (by human or machine) create an incorrect state? If it does, do we continue working as if nothing were wrong, or crash? I don't know how often the initialization would be incorrect, but I don't think Walter is concerned with its frequency, only that it is possible.