February 03, 2022

On Thursday, 3 February 2022 at 21:23:10 UTC, Dukc wrote:

> We cannot allow undefined behaviour in @safe code.

Why not, make it implementation defined, with the requirement that memory safety is upheld by compiled code.

No need to overthink this.

> That means that any integer that would have undefined semantics for overflows could not be used at @safe.

The language standard can leave this to the compiler, while still imposing generic memory-safety requirements on it.

Anyway, I tested overflow with -O3, and it did not remove the "bounds check". So there is no reason to believe that the optimization passes cannot be tuned in such a way that the compiler cannot upheld memory safety.
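For context, a minimal sketch of the kind of test meant here (my reconstruction; the exact code tested is not shown in this post):

    @safe void f(int[] a, int i)
    {
        int j = i + int.max; // may overflow and wrap around
        a[j] = 1;            // the array bounds check must still catch a bad j
    }

The point is that even if j ends up with a "garbage" value, the bounds check on a[j] has to remain in place for memory safety to hold.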

February 03, 2022

On Thursday, 3 February 2022 at 21:36:19 UTC, Ola Fosheim Grøstad wrote:

> "bounds check". So there is no reason to believe that the optimization passes cannot be tuned in such a way that the compiler cannot upheld memory safety.

Typo: what I tried to say is that it is up to the compiler vendor to make sure that the optimization passes are tuned such that they uphold memory safety.

February 03, 2022

On Thursday, 3 February 2022 at 21:36:19 UTC, Ola Fosheim Grøstad wrote:

> On Thursday, 3 February 2022 at 21:23:10 UTC, Dukc wrote:
>
>> We cannot allow undefined behaviour in @safe code.
>
> Why not, make it implementation defined, with the requirement that memory safety is upheld by compiled code.

That is a different solution. Implementation defined != undefined.

With the implementation-defined solution, there is the issue that potentially any change may break memory safety. Some other function's memory safety may depend on the correct behaviour of a @safe function that has an overflowing integer.

So you'd have to start defining arbitrary rules about what the compiler can and cannot do on overflow. Just saying "preserve memory safety" does not work, because what is necessary for memory safety depends on the situation.

Even without that issue, I would not be supportive. D is old enough, and used widely enough, that any change to the overflow semantics of D integers is too disruptive to be worth it.

February 03, 2022

On Thursday, 3 February 2022 at 22:12:10 UTC, Dukc wrote:

> With the implementation-defined solution, there is the issue that potentially any change may break memory safety. Some other function's memory safety may depend on the correct behaviour of a @safe function that has an overflowing integer.

You mean in @trusted code, but then you need to be more specific. If it actually was an overflow, that same argument could be made about a wrap-around. Maybe the @trusted code did not expect a negative value…

If there is an overflow in computing x, then it makes sense that the value of x is an arbitrary bit-pattern constrained to the bit-width. You can constrain it further if that turns out to be needed.

Of course, this will only be relevant in @safe code sections where you disable trapping of overflow.

February 03, 2022

On Thursday, 3 February 2022 at 22:12:10 UTC, Dukc wrote:

> On Thursday, 3 February 2022 at 21:36:19 UTC, Ola Fosheim Grøstad wrote:
>
>> On Thursday, 3 February 2022 at 21:23:10 UTC, Dukc wrote:
>>
>>> We cannot allow undefined behaviour in @safe code.
>>
>> Why not, make it implementation defined, with the requirement that memory safety is upheld by compiled code.
>
> That is a different solution. Implementation defined != undefined.

"implementation defined" means that the vendor must document the semantics.

"undefined behaviour" means that the vendor isn't required to document the behaviour, but that does not mean that they are discouraged from doing so. This was introduced in the C language spec to account for situations where hardware has undefined behaviour. Competition between C++ compilers made them exploit this for the most hardcore optimization options.

February 03, 2022
On 2/3/2022 8:25 AM, Paul Backus wrote:
> And yet:
> 
>      int a, b, c;
>      a = b + c;
> 
> `b + c` may create a value that does not fit in an int, but instead of rejecting the code, the compiler accepts it and allows the result to wrap around.

Yup.


> The inconsistency is the problem here. Having integer types behave differently depending on their width makes the language harder to learn,

It's not really that hard - it's about two or three sentences. As long as one understands 2s-complement arithmetic. If one doesn't understand 2s-complement, and assumes it works like 3rd grade arithmetic, I agree it can be baffling.

There's really no fix for that other than making the effort to understand 2s-complement. Some noble attempts:

Java: disallowed all unsigned types. Wound up having to add that back in as a hack.

Python: numbers can grow arbitrarily large without loss of precision. Makes your code slow, though.

Javascript: everything is a double precision floating point value! Makes for all kinds of other problems. If there's anything people understand less (a lot less) than 2s-complement, it's floating point.


> and forces generic code to add special cases for narrow integers, like this one in `std.math.abs`:
> 
>      static if (is(immutable Num == immutable short) || is(immutable Num == immutable byte))
>          return x >= 0 ? x : cast(Num) -int(x);
>      else
>          return x >= 0 ? x : -x;
> 
> (Source: https://github.com/dlang/phobos/blob/v2.098.1/std/math/algebraic.d#L56-L59)

That's because adding abs(short) and abs(byte) was a giant mistake. There's good reason these functions never appeared in C.

Trying to hide the reality of how computer integer arithmetic works, and how integral promotions work, is a prescription for endless frustration and inevitable failure.
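For illustration, a minimal D example of those two behaviours (this sketch is mine, not part of the original post):

    void example()
    {
        int a = int.max;
        int b = 1;
        int c = a + b;          // 2s-complement wrap-around: c is int.min
        assert(c == int.min);

        byte x = 100, y = 100;
        auto z = x + y;         // integral promotion: both operands become int
        static assert(is(typeof(z) == int));
        assert(z == 200);       // no truncation to byte width happens here
    }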

If anybody has questions about how 2s complement arithmetic works, and how integral promotions work, I'll be happy to answer them.
February 03, 2022
On 2/3/2022 8:35 AM, Adam D Ruppe wrote:
> On Thursday, 3 February 2022 at 05:50:24 UTC, Walter Bright wrote:
>> VRP makes many implicit conversions to bytes safely possible.
> 
> It also *causes* bugs. When code gets refactored, and the types change, those forced casts may not be doing what is desired, and can do things like unexpectedly truncating integer values.

No, then the VRP will emit an error.
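For example (my illustration, not from the original post), VRP only allows an implicit narrowing conversion when the value range provably fits:

    void vrpDemo(int i)
    {
        byte ok = i & 0x7F;  // VRP: range is 0..127, fits in a byte, accepted
        // byte bad = i;     // Error: cannot implicitly convert int to byte
    }

A refactoring that widens the range of the source expression turns the formerly accepted conversion into a compile-time error rather than a silent truncation.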
February 04, 2022

On Thursday, 3 February 2022 at 05:50:24 UTC, Walter Bright wrote:

> On 2/2/2022 6:25 PM, Siarhei Siamashka wrote:
>
>> On Thursday, 3 February 2022 at 01:05:15 UTC, Walter Bright wrote:
>>
>>> I find it works well. For example,
>>>
>>>     int i;
>>>     byte b = i & 0xFF;
>>>
>>> passes without complaint with VRP.
>>
>> No, it doesn't pass: Error: cannot implicitly convert expression i & 255 of type int to byte.
>
> My mistake. b should have been declared as ubyte.

Regarding your original example with the byte type: maybe the use of the following code can be encouraged as a good, idiomatic, overflow-safe way to do it in the D language?

    import std.conv : to;

    int i;
    byte b = i.to!byte;

    i = -129;
    b = i.to!byte; // throws std.conv.ConvOverflowException

This is 2 characters shorter and IMHO nicer looking than `byte b = cast(byte)i;`. An overflow check is done at runtime to catch bugs, but good optimizing compilers are actually smart enough to eliminate it when the range of possible values of `i` is known at compile time. For example:

    import std.conv : to;

    void foobar(byte[] a)
    {
        foreach (i; 0 .. a.length)
            a[i] = (i % 37).to!byte;
    }

Gets compiled into:

$ gdc-12.0.1 -O3 -fno-weak-templates -c test.d && objdump -d test.o

0000000000000000 <_D4test6foobarFAgZv>:
   0:	48 85 ff             	test   %rdi,%rdi
   3:	74 37                	je     3c <_D4test6foobarFAgZv+0x3c>
   5:	49 b8 8b 7c d6 0d a6 	movabs $0xdd67c8a60dd67c8b,%r8
   c:	c8 67 dd
   f:	31 c9                	xor    %ecx,%ecx
  11:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  18:	48 89 c8             	mov    %rcx,%rax
  1b:	49 f7 e0             	mul    %r8
  1e:	48 c1 ea 05          	shr    $0x5,%rdx
  22:	48 8d 04 d2          	lea    (%rdx,%rdx,8),%rax
  26:	48 8d 14 82          	lea    (%rdx,%rax,4),%rdx
  2a:	48 89 c8             	mov    %rcx,%rax
  2d:	48 29 d0             	sub    %rdx,%rax
  30:	88 04 0e             	mov    %al,(%rsi,%rcx,1)
  33:	48 83 c1 01          	add    $0x1,%rcx
  37:	48 39 cf             	cmp    %rcx,%rdi
  3a:	75 dc                	jne    18 <_D4test6foobarFAgZv+0x18>
  3c:	c3                   	retq

Slow division is replaced by multiplication and shifts, and conditional branches are only used to compare `i` with the array length. The `.to!byte` part doesn't have any overhead at all, and bytes are written directly to the destination array via the `mov %al,(%rsi,%rcx,1)` instruction.

February 04, 2022

On Friday, 4 February 2022 at 04:28:37 UTC, Walter Bright wrote:

> On 2/3/2022 8:25 AM, Paul Backus wrote:
>
>> The inconsistency is the problem here. Having integer types behave differently depending on their width makes the language harder to learn,
>
> It's not really that hard - it's about two or three sentences. As long as one understands 2s-complement arithmetic. If one doesn't understand 2s-complement, and assumes it works like 3rd grade arithmetic, I agree it can be baffling.

I don't think this is limited to learning. I don't think programmers with decades of C/C++ experience have a problem understanding 2s-complement, but it still creates annoyances and friction.

Maybe it is time to acknowledge that most of the D user base uses the language for high level programming. I would do the following:

  1. make 64 bit signed integers with overflow checks the "default" type across the board

  2. provide a library type for range-constrained integers that uses intrinsic "assume" directives to give the compiler information about the constraints. This type would choose a storage type that the constrained integer fits in (see the sketch after this list).

  3. add some clean syntax for disabling runtime checks where higher speed is required.
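A minimal sketch of what point 2 could look like (my illustration, with hypothetical names; the asserts merely stand in for a real compiler "assume" intrinsic):

    /// Hypothetical range-constrained integer type.
    struct Ranged(long min, long max)
    {
        // Choose the smallest signed storage type the range fits in.
        static if (min >= byte.min && max <= byte.max)
            alias Storage = byte;
        else static if (min >= short.min && max <= short.max)
            alias Storage = short;
        else static if (min >= int.min && max <= int.max)
            alias Storage = int;
        else
            alias Storage = long;

        private Storage value;

        this(long v)
        {
            assert(v >= min && v <= max); // stand-in for an "assume" directive
            value = cast(Storage) v;
        }

        long get() const
        {
            assert(value >= min && value <= max); // informs the optimizer
            return value;
        }
    }

    unittest
    {
        auto percent = Ranged!(0, 100)(42);
        assert(percent.get == 42);
        static assert(is(Ranged!(0, 100).Storage == byte)); // fits in a byte
    }

The idea is only that the bounds live in the type, so storage can shrink and checks can be elided wherever the compiler can prove them.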

With that, plus ARC and a local GC, D could become competitive.

D should try to improve at higher level programming, as well as the ability to transition from high level to system level in a metamorphosis-like evolution process.

February 04, 2022

On Friday, 4 February 2022 at 09:19:38 UTC, Ola Fosheim Grøstad wrote:

> On Friday, 4 February 2022 at 04:28:37 UTC, Walter Bright wrote:
>
>> On 2/3/2022 8:25 AM, Paul Backus wrote:
>>
>>> The inconsistency is the problem here. Having integer types behave differently depending on their width makes the language harder to learn,
>>
>> ...
>
> I don't think this is limited to learning. I don't think programmers with decades of C/C++ experience have a problem understanding 2s-complement, but it still creates annoyances and friction.
> ...

Gosling's experience kind of proved otherwise:

"In programming language design, one of the standard problems is that the language grows so complex that nobody can understand it. One of the little experiments I tried was asking people about the rules for unsigned arithmetic in C. It turns out nobody understands how unsigned arithmetic in C works. There are a few obvious things that people understand, but many people don't understand it."

https://www.artima.com/articles/james-gosling-on-java-may-2001

Then again, maybe Sun lacked enough people with decades of C and C++ experience, and someone with the track record of Gosling across the computing industry doesn't have any clue about what he was talking about.