GCC Undefined Behavior Sanitizer
October 16, 2014
Just found on Reddit. C seems one step ahead of D with this:

http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

Bye,
bearophile
October 17, 2014
On Thursday, 16 October 2014 at 21:00:18 UTC, bearophile wrote:
> Just found on Reddit. C seems one step ahead of D with this:
>
> http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
>
> Bye,
> bearophile

The sad thing about these tools is that they are all about fixing the holes C introduced into the wild.

So in the end, when using C and C++, we need compiler + static analyzer + sanitizers (a real-life example of "Worse is Better") instead of fixing the languages.

At least C++ is on the path to having fewer undefined behaviors, as the working group clearly saw that the benefits don't outweigh the costs and is now in the process of cleaning up the standard in that regard.

As an outsider, I think D would be better off having only defined behaviors.

--
Paulo
October 17, 2014
On Fri, 17 Oct 2014 08:38:11 +0000,
"Paulo Pinto" <pjmlp@progtools.org> wrote:

> On Thursday, 16 October 2014 at 21:00:18 UTC, bearophile wrote:
> > Just found on Reddit. C seems one step ahead of D with this:
> >
> > http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
> >
> > Bye,
> > bearophile
> 
> The sad thing about these tools is that they are all about fixing the holes C introduced into the wild.
> 
> So in the end, when using C and C++, we need compiler + static analyzer + sanitizers (a real-life example of "Worse is Better") instead of fixing the languages.
> 
> At least C++ is on the path to having fewer undefined behaviors, as the working group clearly saw that the benefits don't outweigh the costs and is now in the process of cleaning up the standard in that regard.
> 
> As an outsider, I think D would be better off having only defined behaviors.
> 
> --
> Paulo

I have a feeling that back then the C designers weren't quite sure how the language would work out on current and future architectures, so they gave implementations some freedom here and there. Now that C/C++ is the primary language on every architecture, the tables have turned and the hardware designers build chips that behave "as expected" in some cases that C/C++ left undefined. That in turn allows C/C++ to become more restrictive. Or maybe I don't know what I'm talking about.

What behavior is undefined in D? I'm not kidding, I don't really know of any list of undefined behaviors. The only thing I remember is that casting away immutable and modifying the contents is undefined behavior. Similar to C/C++, I think this is to allow current and future compilers to perform as-yet-unknown optimizations on immutable data structures.
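
For example, something like this (a minimal sketch of my own; since the behavior is undefined, the output could be anything):

import std.stdio;

void main()
{
    immutable int x = 42;
    // Casting away immutable and then writing through the pointer is
    // undefined behavior: the compiler may assume x never changes.
    int* p = cast(int*) &x;
    *p = 1;
    writeln(x); // might print 42, might print 1, might be anything
}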

Once such optimizations become well known in 10 to 20 years or so, D will define that behavior, too. Just like C/C++.

-- 
Marco

October 17, 2014
On Friday, 17 October 2014 at 08:38:12 UTC, Paulo  Pinto wrote:
> As an outsider, I think D would be better off having only defined behaviors.

Actually, this is the first thing I would change about D, to make it less dependent on x86. I think a system-level language should enable maximum optimization on basic types and instead inject integrity tests for debugging/testing, or support debug exceptions where available.

The second thing I would change is to make whole program analysis mandatory so that you can deduce and constrain value ranges. I don't believe the argument about separate compilation and commercial needs (and even then, augmented object code is a distinct possibility). Even FFI is not a great argument; you should be able to specify what can happen in a foreign function.

It is just plain wrong to let integers wrap by default in an accessible result. That is not integer behaviour.  The correct thing to do is to inject overflow checks in debug mode and let overflow in results (that are accessed) be undefined. Otherwise you end up giving the compiler a difficult job:

uint y=x+1;
if (x < y){…}

Should be optimized to:

{…}

In D (and C++) you would get:

if (x < ((x+1)&0xffffffff)){…}

As a result you are encouraged to use signed int everywhere in C++, since unsigned ints use modulo-arithmetic. Unsigned ints in C++ are only meant for bit-field stuff. And the C++ designers admit that the C++ library is ill-specified because it uses unsigned ints for integers that cannot be negative, while that is now considered a bad practice…

In D it is even worse since you are forced to use a fixed size modulo even for int, so you cannot do 32 bit arithmetic in a 64 bit register without getting extra modulo operations.

So, "undefined behaviour" is not so bad, as long as you qualify it. You could for instance say that overflow on ints leads to an unknown value, but no other side effects. That was probably the original intent for C, but compiler writers have taken it a step further…

D has locked itself to Pentium-style x86 behaviour. Unfortunately it is very difficult to have everything be well-defined in a low level programming language. It isn't even obvious that a byte should be 8 bits, although the investment in creating UTF-8 resources on the Internet has probably locked us into it for the next 100 years… :)
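
On the "inject overflow checks in debug mode" point above: druntime's core.checkedint exposes the overflow flag explicitly, so the check can at least be written by hand today (a small sketch of my own; whether a compiler should insert it automatically is exactly the question):

import core.checkedint : addu;
import std.stdio;

void main()
{
    uint x = uint.max;
    bool overflow = false;
    // addu performs the wrapping add and reports whether it wrapped,
    // so a debug build could assert on the flag instead of ignoring it.
    uint y = addu(x, 1, overflow);
    writeln(y, " (overflowed: ", overflow, ")");
}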

October 17, 2014
On 16 October 2014 22:00, bearophile via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> Just found on Reddit. C seems one step ahead of D with this:
>
> http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
>


*cough* GDC *cough*  :o)
October 17, 2014
On Friday, 17 October 2014 at 09:46:49 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 17 October 2014 at 08:38:12 UTC, Paulo  Pinto wrote:

>
> The second thing I would change is to make whole program analysis mandatory so that you can deduce and constrain value ranges.

Nice idea, but how to persuade libraries to play that game?
October 17, 2014
On Friday, 17 October 2014 at 10:30:14 UTC, eles wrote:
> On Friday, 17 October 2014 at 09:46:49 UTC, Ola Fosheim Grøstad wrote:
>> The second thing I would change is to make whole program analysis mandatory so that you can deduce and constrain value ranges.
>
> Nice idea, but how to persuade libraries to play that game?

1. Provide a meta-language for writing propositions that describe what foreign libraries do (pre/post conditions); see the contract sketch after this list. It could be used for "asserts" too.

2. Provide a C compiler that compiles to the same internal representation as the new language, so you can run the same analysis on C code.

3. Remove int, so that you have to specify the range, and make typedefs local to the library.

4. Provide the ability to specify additional constraints, or even probabilistic information, on the library functions you use in your project.
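
To illustrate point 1: D's in/out contracts already express pre/post conditions, though only as runtime checks on D code rather than the static meta-language for foreign functions I have in mind (a sketch of my own, names invented):

// The contract tells callers (and, ideally, an analyzer) what the
// function assumes and what it guarantees.
int clampIndex(int i, int length)
in { assert(length > 0); }
out (result) { assert(result >= 0 && result < length); }
do
{
    if (i < 0) return 0;
    if (i >= length) return length - 1;
    return i;
}

void main()
{
    assert(clampIndex(-3, 10) == 0);
    assert(clampIndex(42, 10) == 9);
}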

Essentially it is a cultural thing, so the standard library has to be very well written.

Point 4 above could let you specify properties on the input to a sort function on the call site and let the compiler use that information for optimization. E.g. if one million values are evenly distributed over a range of 0..100000 then a quick sort could break it down without using pivots. If the range is 0..1000 then it could switch to an array of counters. If the input is 99% sorted then it could switch to some insertion-sort based scheme.
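
The "array of counters" case is just a counting sort; a rough sketch of my own (function name invented, values assumed to lie in 0 .. maxValue):

import std.stdio;

// Sorts values known to lie in 0 .. maxValue by counting occurrences;
// no comparisons are needed.
uint[] countingSort(const(uint)[] input, uint maxValue)
{
    auto counts = new uint[maxValue + 1];
    foreach (v; input)
        ++counts[v];

    auto result = new uint[input.length];
    size_t pos = 0;
    foreach (value, count; counts)
        foreach (_; 0 .. count)
            result[pos++] = cast(uint) value;
    return result;
}

void main()
{
    uint[] data = [5, 2, 900, 2, 0];
    writeln(countingSort(data, 1000)); // [0, 2, 2, 5, 900]
}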

If you allow both absolute and probabilistic meta-information, then the probabilistic information can be captured from a corpus of representative test data. You could run the algorithm within the "measured probable range" and switch to a slower algorithm when you detect values outside it.

Lots of opportunities for improving "state-of-the-art".
October 17, 2014
On Thursday, 16 October 2014 at 21:00:18 UTC, bearophile wrote:
> Just found on Reddit. C seems one step ahead of D with this:
>
> http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
>
> Bye,
> bearophile

"Not every software bug has as serious consequences as seen in the Ariane 5 rocket crash."

"if ubsan detects any problem, it outputs a “runtime error:” message, and in most cases continues executing the program."

The latter won't really solve the former...
October 17, 2014
On Fri, 17 Oct 2014 09:46:48 +0000
via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> It is just plain wrong to let integers wrap by default in an accessible result. That is not integer behaviour.
do you know any widespread hardware which doesn't work this way?

yet i know a very widespread language which doesn't care. by a strange coincidence, programs in this language tend to have endless problems with overflows.

> The correct thing to do is to inject overflow checks in debug mode and let overflow in results (that are accessed) be undefined.
the correct thing is not to turn perfectly defined operations into undefined ones.

> Otherwise you end up giving the compiler a difficult job:
> 
> uint y=x+1;
> if (x < y){…}
> 
> Should be optimized to:
> 
> {…}
no, it shouldn't. at least not until there is something like 'if_carry_set'.

> In D (and C++) you would get:
> 
> if (x < ((x+1)&0xffffffff)){…}
perfect. nice and straightforward way to do overflow checks.

> In D it is even worse since you are forced to use a fixed size modulo even for int, so you cannot do 32 bit arithmetic in a 64 bit register without getting extra modulo operations.
why should i, as programmer, care? what i *really* care about is portable code. having size of base types not strictly defined is not helping at all.

> So, "undefined behaviour" is not so bad
yes, it's not bad, it's terrible. having "undefined behavior" in language is like saying "hey, we don't know what to do with this, and we don't want to think about it. so we'll turn our problem into your problem. have a nice day, sucker!"

> You could for instance say that overflow on ints leads to an unknown value, but no other side effects. That was probably the original intent for C, but compiler writers have taken it a step further…
how does this differ from the current interpretation?

> D has locked itself to Pentium-style x86 behaviour.
oops. 2's complement integer arithmetic is "pentium-style x86" now... i bet x86_64 does everything in ternary, right? oh, and how about pre-pentium era?

> Unfortunately it is very difficult to have everything be well-defined in a low level programming language. It isn't even obvious that a byte should be 8 bits
it is very easy. take current hardware, evaluate its popularity, do what the most popular hardware does. that's it. i, for myself, don't need a language for "future hardware", i need to work with what i have now. if we have some drastic changes in the future... well, we can always emulate old HW to work with old code, and rewrite that old code for new HW.


October 17, 2014
On Friday, 17 October 2014 at 13:44:24 UTC, ketmar via Digitalmars-d wrote:
> On Fri, 17 Oct 2014 09:46:48 +0000
> via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
>> It is just plain wrong to let integers wrap by default in an accessible result. That is not integer behaviour.
> do you know any widespread hardware which doesn't work this way?

Yes, the carry flag is set when an add carries out. It means you SHOULD add it into another hi-word with carry.  :P

You can also do clamped (saturating) adds with SSE, so the result clamps to max/min. Too bad languages don't support it. I've always thought it would be nice to have clamp operators, so you can say x(+)y and have the result clamped to the max/min values. Useful for stuff like DSP on integers.
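
Lacking such an operator, the clamped add has to be spelled out by hand; a tiny sketch for the unsigned case (the function name is my own invention):

// Saturating ("clamped") unsigned add: instead of wrapping around,
// the result is pinned to uint.max on overflow.
uint satAddU(uint a, uint b)
{
    uint sum = a + b;              // wrapping add (defined behavior in D)
    return sum < a ? uint.max : sum;
}

unittest
{
    assert(satAddU(10, 20) == 30);
    assert(satAddU(uint.max, 1) == uint.max);
}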

>> if (x < ((x+1)&0xffffffff)){…}
> perfect. nice and straightforward way to do overflow checks.

Uh, so you want it slow? If you want this, you should also check the overflow flag so that you can catch overflows and throw an exception.

But then you have a high level language. All high level languages should do this.

>> In D it is even worse since you are forced to use a fixed size modulo even for int, so you cannot do 32 bit arithmetic in a 64 bit register without getting extra modulo operations.
> why should i, as programmer, care? what i *really* care about is
> portable code. having size of base types not strictly defined is not
> helping at all.

So you want to have lots of masking on your shiny new 64-bit-register-only CPU, because D is stuck on promoting to 32 bits by spec?

That's not portable, that is "portable".

>> So, "undefined behaviour" is not so bad
> yes, it's not bad, it's terrible. having "undefined behavior" in
> language is like saying "hey, we don't know what to do with this, and

Nah, it is saying: if your code is wrong then you will get wrong results unless you turn on runtime checks.

What D is saying is: nothing is wrong even if you get something you never wanted to express, because we specify all operations to be boundless (circular) so that nothing can be wrong by definition (but your code will still crash and burn).

That also means that you cannot turn on runtime checks, since it is by definition valid. No way for the compiler to figure out if it is intentional or not.

>> D has locked itself to Pentium-style x86 behaviour.
> oops. 2's complement integer arithmetic is "pentium-style x86" now... i
> bet x86_64 does everything in ternary, right? oh, and how about
> pre-pentium era?

The overhead for doing 64-bit calculations is marginal. Locking yourself to 32 bits is a bad idea.

> it is very easy. take current hardware, evaluate its popularity, do
> what the most popular hardware does. that's it. i, for myself, don't need
> a language for "future hardware", i need to work with what i have now.

My first computer had no division or multiply, had 8-bit registers, and was insanely popular. It was inconceivable that I could afford anything more advanced in the next decade. Within the next 5 years I had two 16-bit computers, one with 16x the RAM and a GPU… and at a much lower price…

> if we have some drastic changes in the future... well, we can always
> emulate old HW to work with old code, and rewrite that old code for new
> HW.

Most of the work on a codebase is done after it ships.

Interesting things may happen on the hardware side in the next few years:

- You'll find info on the net that Intel has planned buffered transactional memory for around 2017.

- AMD is interested in CPU/GPU integration/convergence

- Intel has a many-core "co-processor"

- SIMD registers are getting wider and wider… 512 bits is a lot!

etc...