Undefined behaviours in D and C
April 14, 2010
This recent blog post says nothing new for people who know C; it contains just a few notes about some undefined C behaviours, but it's a starting point for what I want to say in this post:

http://james-iry.blogspot.com/2010/04/c-is-not-assembly.html

Undefined behaviours help adapt the language to different CPUs, but today's PC CPUs are more similar to each other than the CPUs in use when C was defined (in an evolutionary tree most diversity is located near the root, in space or time). Java and C# show that with a very good JIT compiler you can have an efficient enough C-family language even if you remove many or most undefined behaviours from it (a JIT compiler can do better than a static compiler here).

D's semantics are largely based on C's, but of course D has no written formal language spec yet, unlike C. Undefined behaviours are a rich source of bugs in programs (to avoid some of them you can try to have your compiler or lint warn about each undefined behaviour of your language).

D already defines some behaviours that are left undefined in C; for example, I think operations like 5%(-2) and 5/(-2) are defined in D, as well as shifts << >> when the number of bits shifted is larger than the number of bits of the value. The removal of some other undefined C behaviours, like the evaluation order of function arguments, is planned for D.
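
A minimal sketch of the division and remainder case (the function names `quotient` and `remainder` are made up for illustration). It assumes the rule in the D spec that integer division rounds toward zero and the remainder takes the sign of the dividend; C89 left the rounding direction implementation-defined for negative operands, and C99 later fixed it to truncation as well. The shift case is not shown, since the post only offers it tentatively.

```d
// Hypothetical helpers, named only for this example.
int quotient(int a, int b) { return a / b; }
int remainder(int a, int b) { return a % b; }

unittest
{
    // Division truncates toward zero.
    assert(quotient(5, -2) == -2);
    assert(quotient(-5, 2) == -2);

    // The remainder has the sign of the dividend.
    assert(remainder(5, -2) == 1);
    assert(remainder(-5, 2) == -1);

    // The usual identity a == (a / b) * b + a % b holds in both cases.
    assert(quotient(5, -2) * -2 + remainder(5, -2) == 5);
    assert(quotient(-5, 2) * 2 + remainder(-5, 2) == -5);
}
```

The unittest block can be run with `dmd -unittest -main -run example.d`, where `example.d` is whatever file the sketch is saved in.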

But I think some other undefined holes coming from C remain in D, for example regarding:
- Static casts between size_t/ptrdiff_t and pointers;
- Pointer aliasing;
- Read of a union field different from the last field written;
- etc.

It would be useful to write down a complete list of such undefined C behaviours, decide for each whether it should stay undefined in D too, and define those where the answer is no. The C# language spec can also give some good suggestions here.

D Bugzilla shows that there are a few 'undefined behaviours' in some D constructs too, but starting from the C ones is good because there is already a lot of experience in using C to write programs.

Bye,
bearophile
April 15, 2010
bearophile wrote:
> This recent blog post says nothing new for people who know C; it contains just a few notes about some undefined C behaviours, but it's a starting point for what I want to say in this post:
> 
> http://james-iry.blogspot.com/2010/04/c-is-not-assembly.html
> 
> Undefined behaviours help adapt the language to different CPUs, but today's PC CPUs are more similar to each other than the CPUs in use when C was defined (in an evolutionary tree most diversity is located near the root, in space or time). Java and C# show that with a very good JIT compiler you can have an efficient enough C-family language even if you remove many or most undefined behaviours from it (a JIT compiler can do better than a static compiler here).
> 
> D's semantics are largely based on C's, but of course D has no written formal language spec yet, unlike C. Undefined behaviours are a rich source of bugs in programs (to avoid some of them you can try to have your compiler or lint warn about each undefined behaviour of your language).

Some time ago, I believe Walter decided to let @safe mean "no undefined behaviour".  Hopefully, this will reduce the number of undefined-behaviour related bugs.  After all, most D code should be marked @safe.

Here it is:
http://www.digitalmars.com/d/archives/digitalmars/D/Safety_undefined_behavior_safe_trusted_100138.html

-Lars
April 15, 2010
Lars T. Kyllingstad:

Thank you for your answer & thread link.

> Some time ago, I believe Walter decided to let @safe mean "no undefined behaviour".

I find it hard to believe that safe modules can define, for example, the semantics of static casts between size_t and a pointer, while unsafe modules can leave them undefined as in C :-) To me this will lead to a mess even worse than the C situation.

So a better solution is to define such behaviours in both kinds of modules, or leave them undefined in both. I prefer the first possibility. To make this happen, a starting point is to list everything the C standard leaves undefined.

Bye,
bearophile
April 15, 2010
bearophile wrote:
> Lars T. Kyllingstad:
> 
> Thank you for your answer & thread link.
> 
>> Some time ago, I believe Walter decided to let @safe mean "no undefined behaviour".
> 
> I find it hard to believe that safe modules can define, for example, the semantics of static casts between size_t and a pointer, while unsafe modules can leave them undefined as in C :-) To me this will lead to a mess even worse than the C situation.
> 
> So a better solution is to define such behaviours in both kinds of modules, or leave them undefined in both. I prefer the first possibility. To make this happen, a starting point is to list everything the C standard leaves undefined.

The effect of @safe would be to forbid code that leads to undefined behaviour, not make it well-defined.
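
A small sketch of what "forbid rather than define" looks like in practice, assuming a current D compiler and the documented @safe rule against pointer indexing (the names `second` and `secondUnchecked` are hypothetical). The bounds-checked slice version is accepted in @safe code; the raw-pointer version compiles only when marked @system.

```d
// Slice indexing is bounds-checked, so it is allowed in @safe code.
int second(int[] a) @safe { return a[1]; }

// Pointer indexing has no bounds to check against, so it stays in @system code.
int secondUnchecked(int* p) @system { return p[1]; }

unittest
{
    // The same pointer indexing is rejected at compile time inside a @safe literal.
    static assert(!__traits(compiles, (int* p) @safe { return p[1]; }));
    static assert( __traits(compiles, (int* p) @system { return p[1]; }));

    int[] data = [10, 20, 30];
    assert(second(data) == 20);
    assert(secondUnchecked(data.ptr) == 20);
}
```

Nothing about `p[1]` becomes well-defined here; the @safe attribute only moves the construct out of reach.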

-Lars
April 15, 2010
Part of the reason D leaves undefined behavior is because you are breaking compiler guarantees. Such as:

    char[] s = ...;
    immutable(char)[] p = cast(immutable)s;     // undefined behavior

I think it would be more helpful to propose what the undefined behavior should be defined as, and push that. Walter doesn't like undefined behavior, so I'm sure either he doesn't know what it should be defined as or he has a good reason to leave it.
April 15, 2010
On Thu, 15 Apr 2010 10:24:07 -0400, Jesse Phillips <jessekphillips+D@gmail.com> wrote:

> Part of the reason D leaves undefined behavior is because you are
> breaking compiler guarantees. Such as:
>
>     char[] s = ...;
>     immutable(char)[] p = cast(immutable)s;     // undefined behavior

This is not undefined behavior.  Continuing to use s would be.

I just wanted to make that clear.  Except for strings, there is currently no way to generate immutable data except via casting.  Don't use idup except on pure value types; it is currently unsafe, see http://d.puremagic.com/issues/show_bug.cgi?id=3550
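
A hedged sketch of the distinction drawn above, assuming Phobos' `std.exception.assumeUnique` (the helper name `buildGreeting` is hypothetical). The cast to immutable is fine as long as no mutable reference to the data is used afterwards; `assumeUnique` performs the cast and sets the source array to null, which removes the temptation to keep using `s`.

```d
import std.exception : assumeUnique;

// Builds a string in a mutable buffer, then hands it out as immutable.
string buildGreeting()
{
    char[] buf = "hello".dup;   // fresh mutable copy; nothing else refers to it
    buf[0] = 'H';
    // assumeUnique casts to immutable and nulls `buf`, so the mutable
    // alias cannot be written through afterwards.
    return assumeUnique(buf);
}

unittest
{
    assert(buildGreeting() == "Hello");

    char[] s = "abc".dup;
    string p = assumeUnique(s);
    assert(p == "abc");
    assert(s is null);          // the mutable reference is gone
}
```

The function still relies on the programmer's promise that the buffer really is unique; the compiler does not verify it.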

-Steve
April 15, 2010
Lars T. Kyllingstad:
> The effect of @safe would be to forbid code that leads to undefined behaviour, not make it well-defined.

Right, but that's not the solution I was looking for, and it's not going to solve the problems inherited from C, because people that use D want to use unsafe code too; otherwise they would use C#/Java. Having safe modules in D is a good idea, but safe modules can't be a replacement for efforts to make the low-level code safer too.

Bye,
bearophile
April 15, 2010
Hello bearophile,

> [...] people that
> use D want to use unsafe code too, otherwise they use C#/Java.

Wrong! >90% of the time, when I want to use D over some other language, it is because of features other than D's unsafe stuff. And regarding C#/Java, I have never wanted to use them for any language-related reasons. The only thing I prefer C# for over D is the .NET libs and the tool sets (I've never liked Java, but that's a personal preference thing).

> Bye,
> bearophile
-- 
... <IXOYE><



April 16, 2010
bearophile wrote:
> I find it hard to believe that safe modules can define for example
> the semantics of static casts between size_t and a pointer, while
> unsafe modules can leave it undefined as in C :-) To me this will
> lead to a mess even worse than the C situation.

You won't be able to cast pointers from integral types in safe functions.
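
As a compile-time sketch of that rule, assuming a current D compiler (the parameter name `address` is just illustrative): a function literal that casts an integer to a pointer is rejected when marked @safe and accepted when marked @system.

```d
unittest
{
    // Integer-to-pointer casts are not allowed in @safe code.
    static assert(!__traits(compiles,
        (size_t address) @safe { return cast(int*) address; }));

    // The identical cast is accepted in @system code.
    static assert( __traits(compiles,
        (size_t address) @system { return cast(int*) address; }));
}
```

Both checks happen during compilation, so the block has no runtime effect beyond confirming the two assertions hold.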
April 16, 2010
Walter Bright:
> You won't be able to cast pointers from integral types in safe functions.

That doesn't solve the problem, because I will surely want to use unsafe code in D, and unsafe modules will keep having the same undefined-behaviour bugs inherited from C. What I was asking for in this thread is to fix some of the C holes, not just to forbid the things I was looking for in D in the first place. If I use D instead of, for example, Python, it is because D has unions and pointers, which let me create tight data structures with good performance. I am not interested in using D just as a Java.

This may be an irreducible difference between my ideal language and D. Maybe my purpose is hopeless, who knows. My ideal system language is like a C that helps me avoid a large percentage of possible bugs: a language with lower-level features whose behaviour the programmer can predict. Maybe someday I'll try to create this language :-)

Bye,
bearophile