April 19, 2010
Re: Undefined behaviours in D and C
bearophile wrote:
> Walter Bright:
> 
> Sorry for the delay, I was away.
> In this post I try to write in a quite explicit way.
> 
> 
>> I don't see any way to make conversions between pointers and ints implementation defined,<
> 
> I see. Thank you for the explanation, I'm often ignorant enough.
> 
> 
> In my original post I was talking about all places where C standard leaves things undefined. I'm not a C language lawyer, so I don't know all the things the C standard leaves undefined, but I know there are other undefined things in C beside the pointer <-> int conversion. That's why I was saying that it can be quite positive to write down a list of such things. So even if there is no hope to fix this pointer <-> int hole, maybe there are other C holes that can be fixed. I will not be able to write down a complete list, but I think having a complete list can be a good starting point.
> 
> In my original post I have listed two more things that I think the C standard leaves undefined:
> - Pointer aliasing;
> - Read of an enum field different from the last field written;
> 
> The first of them is fixed in C99 with the 'restrict' keyword. I guess the D compiler has to assume all pointers can be an alias to each other (but I don't remember if the D docs say this explicitly somewhere) because I think D prefers to not give keywords that the compiler itself can't then test and make sure they are correct.
> 
> The second of them is relative to code like:
> 
> enum SI { short s; int i; }
> void main() {
>   SI e;
>   e.i = 1_000_000;
>   int foo = e.s;
> }


Don't you mean 'union' here, not 'enum'?

-Lars
April 19, 2010
Re: Undefined behaviours in D and C
Lars T. Kyllingstad:
> > enum SI { short s; int i; }
> > void main() {
> >   SI e;
> >   e.i = 1_000_000;
> >   int foo = e.s;
> > }
> 
> Don't you mean 'union' here, not 'enum'?

Yes, sorry -.- In Python newsgroups, most code snippets people show are run before posting. It's a habit that I must keep in D newsgroups too.
This whole thread is mostly showing how smart I am not.

Bye and thank you,
bearophile
April 19, 2010
Re: Undefined behaviours in D and C
On 19-apr-10, at 08:23, Pelle wrote:

> On 04/18/2010 02:46 PM, Walter Bright wrote:
>> Michel Fortin wrote:
>>> So you shouldn't be able to cast a value to a pointer. The reverse,
>>> casting a pointer to a value, makes sense in my opinion: you may  
>>> want
>>> to print the pointer value in a debug output of some sort. There's
>>> nothing unsafe with that so it should be allowed.
>>
>> These are allowed in safe functions.
>
> Just checking, this is allowed:
>
> @safe void crash_maybe() {
>    int* p = cast(int*)uniform(size_t.min, size_t.max);
>    *p = 14;
> }
>
> right?

No, the opposite is safe (pointer -> size_t), but there is no way size_t -> pointer can be safe...
April 19, 2010
Re: Undefined behaviours in D and C
Fawzi Mohamed:
> no the opposite is safe (pointer -> size_t) but there is no way size_t -> pointer can be safe...

In the stdint.h of C99 there is (optionally) uintptr_t, an unsigned integer type that is large enough to contain a pointer (there is a signed intptr_t too). In C99 you use that to convert a pointer to an integer.

I don't know if D specs assert that D size_t is wide enough to represent a pointer.
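
A minimal C99 sketch of the pointer -> integer -> pointer round trip through uintptr_t. The function names are hypothetical, and the sketch assumes an implementation that provides the optional uintptr_t type; C99 only promises that converting a valid void pointer to uintptr_t and back yields a pointer that compares equal to the original.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a pointer to an integer wide enough to hold it.
   uintptr_t is optional in C99; this sketch assumes it exists. */
uintptr_t pointer_to_integer(const void *p) {
    return (uintptr_t)p;
}

/* Convert the integer back to a pointer. This is only meaningful when
   the value was obtained from a valid pointer in the first place. */
const void *integer_to_pointer(uintptr_t value) {
    return (const void *)value;
}

/* Return 1 when the round trip gives back a pointer equal to p. */
int round_trip_preserves(const void *p) {
    const void *back = integer_to_pointer(pointer_to_integer(p));
    assert(back == p);
    return back == p;
}
```

Calling round_trip_preserves(&x) on any live object x is expected to return 1. Turning an arbitrary integer into a pointer carries no such guarantee, which is the direction discussed above as unsafe.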

Bye,
bearophile
April 19, 2010
Re: Undefined behaviours in D and C
> In the stdint.h of C99 there is (optionally) uintptr_t, an unsigned integer type that is large enough to contain a pointer (there is a signed intptr_t too). In C99 you use that to convert a pointer to an integer.
> 
> I don't know if D specs assert that D size_t is wide enough to represent a pointer.

There's uintptr_t in D std lib too, I have to start using it:
http://www.digitalmars.com/d/2.0/phobos/std_stdint.html

Bye,
bearophile
April 19, 2010
Re: Undefined behaviours in D and C
On 19-apr-10, at 12:32, bearophile wrote:

>> In the stdint.h of C99 there is (optionally) uintptr_t, an unsigned
>> integer type that is large enough to contain a pointer (there is a
>> signed intptr_t too). In C99 you use that to convert a pointer to
>> an integer.
>>
>> I don't know if D specs assert that D size_t is wide enough to  
>> represent a pointer.
>
> There's uintptr_t in D std lib too, I have to start using it:
> http://www.digitalmars.com/d/2.0/phobos/std_stdint.html

that is for C compatibility, D has always defined size_t and ptrdiff_t  
(without needing to import anything) exactly like that.
April 19, 2010
Re: Undefined behaviours in D and C
On 04/19/2010 11:47 AM, Fawzi Mohamed wrote:
> no the opposite is safe (pointer -> size_t) but there is no way
> size_t->pointer can be safe...

Michel Fortin wrote:
> So you shouldn't be able to  *cast a value to a pointer*.  The reverse,
> casting a pointer to a value, makes sense in my opinion:

On 04/18/2010 02:46 PM, Walter Bright wrote:
>  *These*  are allowed in safe functions.

(emphasis mine)

I was trying to visualize a point.
April 19, 2010
Re: Undefined behaviours in D and C
Walter Bright:

>D doesn't have this problem because D doesn't have the restrict qualifier.<

So the D2 specs have to explicitly state that all D pointers can be an alias of each other (and this will make D code slower than Fortran77 code).


>If restrict is used incorrectly, however, undefined behavior can result.<

One of the few ways out of this, while keeping the language safe, is the ownership/lent/etc. extensions to the type system. They are cute, but they are not so easy to learn and can become a small burden for the D programmer.

Another solution is the restrict keyword as in C. In a D program the restrict keyword can be useful only in a few numeric kernels, often less than 30 lines of code, that perform tons of computations in a few loops. In such loops the knowledge that pointers are distinct can be significantly useful to improve the code. In all other parts of the program such a keyword is useless or not essential (such loops can even enjoy a harder form of compilation, almost a supercompilation. The programmer can even give an attribute like @hot to this loop/function. GCC too has a 'hot' function attribute, but I think in GCC it's not very useful).

I don't know what to think about this. D being a systems language, it is expected to offer unsafe features too, such as this one. So maybe offering restrict, to be used in very limited situations, can be acceptable in D too.

In many situations the numerical kernels work over arrays, and D arrays have both a pointer and a length, so it's easy to test if a pointer is inside such an interval and if two intervals are fully distinct. Such tests can be done in nonrelease mode to give a little more safety to the restrict keyword. Some of these tests can even be kept in release mode if they are outside the heavy loops.
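
A rough C99 sketch of that idea: a debug-only overlap test placed in front of a restrict-qualified kernel. The names ranges_disjoint and scale_into are hypothetical, and comparing addresses as uintptr_t values assumes a flat address space, which the C standard does not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 when [a, a + a_len) and [b, b + b_len) do not overlap.
   The addresses are compared as integers because relational comparison
   of pointers into different objects is undefined in C.
   Assumption: integer order matches address order (flat memory). */
int ranges_disjoint(const double *a, size_t a_len,
                    const double *b, size_t b_len) {
    uintptr_t a_lo = (uintptr_t)a, a_hi = (uintptr_t)(a + a_len);
    uintptr_t b_lo = (uintptr_t)b, b_hi = (uintptr_t)(b + b_len);
    return a_hi <= b_lo || b_hi <= a_lo;
}

/* Numerical kernel: restrict promises the compiler that dst and src
   do not alias. The assert checks that promise outside the hot loop
   and disappears when the code is compiled with NDEBUG defined. */
void scale_into(double *restrict dst, const double *restrict src,
                size_t n, double factor) {
    assert(ranges_disjoint(dst, n, src, n));
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * factor;
}
```

With two separate arrays, scale_into(dst, src, 4, 2.0) passes the check and fills dst. Passing overlapping slices of one buffer trips the assert in a debug build instead of silently invoking undefined behaviour.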

Maybe something like restrict but more limited can be invented, that works on D arrays only. An extension of the D type system that's useful for numerical kernels that work on arrays. Something like:

@enforce_restrict(array1, array2, ...) {
   // numerical kernel that uses the arrays
}

Inside that enforce block the D type system knows the arrays are distinct; it's like a restrict applied to their pointers. I don't know if this can work in practical situations. Maybe there's an acceptable solution to this problem of D2.

---------------

I think in C you can't reliably cast a pointer from one type to a different type. I think this is because the C compiler (and the D compiler, I presume) can optimize away some things, making this unsafe/undefined.

This conversion is sometimes done using a union, that's a bit safer than the reinterpret cast:

union Foo2Bar {
  int* iptr;
  double* dptr;
}

But I think the C standard says that from a union you can't read a field different from the last field you have written, so that too is unsafe:

import std.stdio;
union U { int i; float f; }
void main() {
  U u;
  u.i = 10;
  writeln(u.i); // defined
  u.f = 10;
  writeln(u.f); // defined
  writeln(u.i); // undefined
}


I think this is not because of endianness problems, but because the compiler can keep values in registers and optimize away the read/write inside the union. The D language can state this is defined, making unions a safer way to statically convert ints to floats, or it can follow the C way to make code a little faster.
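
For comparison, a small C sketch of a conversion that sidesteps both the union question and pointer casts by copying the bytes with memcpy, which is defined for any object type. The function names are hypothetical, and the sketch assumes float is 32 bits wide on the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a float into a 32-bit integer.
   Assumption: sizeof(float) == sizeof(uint32_t) on this platform. */
uint32_t float_bits(float f) {
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Inverse operation: rebuild the float from its stored bit pattern. */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

A round trip such as bits_to_float(float_bits(10.0f)) gives back 10.0f, and optimizing compilers typically turn the fixed-size memcpy into a plain register move.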

Strict aliasing means the compiler may assume that two pointers to objects of different types do not refer to the same location in memory.

See also the -fno-strict-aliasing GCC compiler switch, and related matters:
>>In C99, it is illegal to create an alias of a different type than the original. This is often refered to as the strict aliasing rule.<<
I don't know if D here follows C99 or not.
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

Bye and thank you,
bearophile
April 19, 2010
Re: Undefined behaviours in D and C
bearophile wrote:
> Walter Bright:
> 
>> D doesn't have this problem because D doesn't have the restrict qualifier.<
> 
> So the D2 specs have to explicitly state that all D pointers can be an alias of each other (and this will make D code slower than Fortran77 code).

Array operations address the same issue as restrict, but are much 
easier for the compiler. (They don't completely overlap in 
functionality, but the most important cases are covered by both).
AFAIK 'restrict' hasn't been a terribly successful feature in the C world.
April 20, 2010
Re: Undefined behaviours in D and C
Don:
> (They don't completely overlap in 
> functionality, but the most important cases are covered by both).

I will need to use array ops more to see if you are right.


> AFAIK 'restrict' hasn't been a terribly successful feature in the C world.

I agree. (And I am not sure compilers use it well).

Bye,
bearophile