June 22, 2016
On Wednesday, 22 June 2016 at 07:23:54 UTC, Dan Olson wrote:
> Testsuite passes with LLVM 3.8.0 now.

Thx Dan, great stuff.

June 22, 2016
On Tuesday, 21 June 2016 at 19:41:59 UTC, Dan Olson wrote:
> On Tuesday, 21 June 2016 at 07:00:35 UTC, Joakim wrote:
>> On Tuesday, 21 June 2016 at 05:51:56 UTC, Joakim wrote:
>>> On Monday, 20 June 2016 at 20:15:52 UTC, Rainer Schuetze wrote:
>>>>
>>>>
>>>> On 19.06.2016 08:58, Joakim wrote:
>>>>> [...]
>>>>
>>>> If it fails in a specific optimization pass, it has already passed a number of verification passes. When I posted some bug reports, the LLVM people (who have been very helpful) considered it an LLVM bug if an invalid IR passes the verifier. So even if some assumptions are broken, I would suggest to report to the LLVM bug tracker.
>>>
>>> OK, will do.  I think the fact that the exact same ldc frontend linked with llvm 3.7.1 doesn't have this problem, only when linked against llvm 3.8.0, indicates this most likely isn't a problem on our end.
>>
>> Filed: https://llvm.org/bugs/show_bug.cgi?id=28224
>
> Hmmm.  I looked at failing code in std.conv.  I think it does a negative shift, which is undefined in C.  It is the opSlice() method.

I just tried the three-line reduced test case on linux/x64 with the official ldc 1.0.0 release linked against 3.8.0 and it asserts there too.  However, if I compile all of the std.conv unittests and run them all, they all pass.

It appears that you've found a clear off-by-one bug.  I didn't investigate our end because the tests were all passing otherwise- I now see that dmd doesn't assert even against the reduced test case- but I should have examined that code more closely.  It would have helped if llvm and/or the ddmd frontend had flagged that Phobos was shifting by a negative number, rather than passing it through silently for either correct or junk codegen.

I agree that the real fix for this bug should go in Phobos, but perhaps we can also improve the spots where this negative shift was most silently discarded so far, whether ddmd or llvm.
June 22, 2016
On Wednesday, 22 June 2016 at 08:54:22 UTC, Joakim wrote:
> On Tuesday, 21 June 2016 at 19:41:59 UTC, Dan Olson wrote:
>> On Tuesday, 21 June 2016 at 07:00:35 UTC, Joakim wrote:
>>> On Tuesday, 21 June 2016 at 05:51:56 UTC, Joakim wrote:
>>>> On Monday, 20 June 2016 at 20:15:52 UTC, Rainer Schuetze wrote:
>>>>>
>>>>>
>>>>> On 19.06.2016 08:58, Joakim wrote:
>>>>>> [...]
>>>>>
>>>>> If it fails in a specific optimization pass, it has already passed a number of verification passes. When I posted some bug reports, the LLVM people (who have been very helpful) considered it an LLVM bug if an invalid IR passes the verifier. So even if some assumptions are broken, I would suggest to report to the LLVM bug tracker.
>>>>
>>>> OK, will do.  I think the fact that the exact same ldc frontend linked with llvm 3.7.1 doesn't have this problem, only when linked against llvm 3.8.0, indicates this most likely isn't a problem on our end.
>>>
>>> Filed: https://llvm.org/bugs/show_bug.cgi?id=28224
>>
>> Hmmm.  I looked at failing code in std.conv.  I think it does a negative shift, which is undefined in C.  It is the opSlice() method.
>
> I just tried the three-line reduced test case on linux/x64 with the official ldc 1.0.0 release linked against 3.8.0 and it asserts there too.  However, if I compile all of the std.conv unittests and run them all, they all pass.
>
> It appears that you've found a clear off-by-one bug.  I didn't investigate our end because the tests were all passing otherwise- I now see that dmd doesn't assert even against the reduced test case- but I should have examined that code more closely.  It would have helped if llvm and/or the ddmd frontend had flagged that Phobos was shifting by a negative number, rather than passing it through silently for either correct or junk codegen.
>
> I agree that the real fix for this bug should go in Phobos, but perhaps we can also improve the spots where this negative shift was most silently discarded so far, whether ddmd or llvm.

I experimented a bit with this, using the latest dmd 2.071.0 release for linux/x64 and the following test that mimics what std.conv was doing wrong:

int check_shift(int x) { return 16 >>> ((2 - x) * 4);}
unittest
{
    assert(check_shift(3)  ==  16);
    //assert(( 16 >>> (2-3) * 4)  ==  16);
}

If the second assert isn't commented out, dmd always evaluates that expression at compile-time and gives this error:

shift.d(5): Error: shift by -4 is outside the range 0..31

If it's left commented out and the file is compiled with this command,

./2.071.0/linux/bin64/dmd -O -unittest -main shift.d

the resulting binary asserts at runtime on line 4, ie the first assert.  If I compile again with inlining,

./2.071.0/linux/bin64/dmd -O -inline -unittest -main shift.d

the test passes!

The codegen for dmd seems all over the place here.
June 22, 2016
Joakim <dlang@joakim.fea.st> writes:
> shift.d(5): Error: shift by -4 is outside the range 0..31

Agree, should be explored more to understand why that error message didn't appear in this case.
June 29, 2016
On Wednesday, 22 June 2016 at 16:06:13 UTC, Dan Olson wrote:
> Joakim <dlang@joakim.fea.st> writes:
>> shift.d(5): Error: shift by -4 is outside the range 0..31
>
> Agree, should be explored more to understand why that error message didn't appear in this case.

I looked into this and it appears that dmd doesn't do the static analysis necessary to catch such bugs nor insert the runtime checks that could bail out with an exact error:

https://issues.dlang.org/show_bug.cgi?id=7604
https://issues.dlang.org/show_bug.cgi?id=4835

Walter says he doesn't want runtime checks for performance reasons (comment 11 in the second linked bug), which is understandable.  I'm not sure we can do anything but be more careful about such bugs in our D code.

I've filed a dmd bug for the unrelated optimization issue I ran into above:

https://issues.dlang.org/show_bug.cgi?id=16217
June 29, 2016
On Wednesday, 29 June 2016 at 08:18:46 UTC, Joakim wrote:
> On Wednesday, 22 June 2016 at 16:06:13 UTC, Dan Olson wrote:
>> Joakim <dlang@joakim.fea.st> writes:
>>> shift.d(5): Error: shift by -4 is outside the range 0..31
>>
>> Agree, should be explored more to understand why that error message didn't appear in this case.
>
> I looked into this and it appears that dmd doesn't do the static analysis necessary to catch such bugs nor insert the runtime checks that could bail out with an exact error:
>
> https://issues.dlang.org/show_bug.cgi?id=7604
> https://issues.dlang.org/show_bug.cgi?id=4835
>
> Walter says he doesn't want runtime checks for performance reasons (comment 11 in the second linked bug), which is understandable.  I'm not sure we can do anything but be more careful about such bugs in our D code.

This is LDC, not DMD, so for this it doesn't matter what Walter says :-)
If you can implement a runtime check (default off) for undefined shifts, how is that bad? Clang's -fsanitize flags look great. http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html

cheers,
  Johan

June 30, 2016
On Wednesday, 29 June 2016 at 15:09:33 UTC, Johan Engelen wrote:
> On Wednesday, 29 June 2016 at 08:18:46 UTC, Joakim wrote:
>> On Wednesday, 22 June 2016 at 16:06:13 UTC, Dan Olson wrote:
>>> Joakim <dlang@joakim.fea.st> writes:
>>>> shift.d(5): Error: shift by -4 is outside the range 0..31
>>>
>>> Agree, should be explored more to understand why that error message didn't appear in this case.
>>
>> I looked into this and it appears that dmd doesn't do the static analysis necessary to catch such bugs nor insert the runtime checks that could bail out with an exact error:
>>
>> https://issues.dlang.org/show_bug.cgi?id=7604
>> https://issues.dlang.org/show_bug.cgi?id=4835
>>
>> Walter says he doesn't want runtime checks for performance reasons (comment 11 in the second linked bug), which is understandable.  I'm not sure we can do anything but be more careful about such bugs in our D code.
>
> This is LDC, not DMD, so for this it doesn't matter what Walter says :-)
> If you can implement a runtime check (default off) for undefined shifts, how is that bad? Clang's -fsanitize flags look great. http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html

I'll look into what clang's doing there.

Speaking of shift errors, I took a look at why even ldc 1.0.0 fails for that undefined shift optimization bug I just reported for dmd and it does the same thing, simply removes the test block after inlining and optimization.

Here's the relevant IR before and after inlining with -O1 -enable-inlining on linux/x64:

*** IR Dump Before Function Integration/Inlining ***
; Function Attrs: uwtable
define void @_D5shift14__unittestL2_1FZv() #1 comdat {
  %1 = call i32 @_D5shift11check_shiftFiZi(i32 3) #1
  %2 = icmp eq i32 %1, 16
  br i1 %2, label %assertPassed, label %assertFailed

*** IR Dump After Function Integration/Inlining ***
; Function Attrs: uwtable
define void @_D5shift14__unittestL2_1FZv() #1 comdat {
%1 = icmp eq i32 undef, 16
br i1 %1, label %assertPassed, label %assertFailed

Eventually that gets optimized away to nothing, just like with dmd.

Would this be considered an llvm bug or the expected result from passing an undefined right shift to llvm?  It should at least warn about that undef after inlining, shouldn't it?
June 30, 2016
Just now I discovered that the optimization bug in my inlining branch is exactly this: a shift outside the defined range happens, and LLVM happily optimizes the program using the undefined behavior and terrible things happen.

A runtime check would have been a great help here, saving me hours of bug hunting.

The bug is in std.range.primitives, BitsSet(T)::popFront():
```
    void popFront()
    {
        assert(_value, "Cannot call popFront on empty range.");

        _value >>>= 1;
        uint n = countTrailingZeros(_value); // returns sizeof(_value) * 8 when _value==0
        _value >>>= n; // BOOM!
```
June 30, 2016
On Thursday, 30 June 2016 at 11:50:15 UTC, Joakim wrote:
>
> Speaking of shift errors, I took a look at why even ldc 1.0.0 fails for that undefined shift optimization bug I just reported for dmd and it does the same thing, simply removes the test block after inlining and optimization.
>
> Here's the relevant IR before and after inlining with -O1 -enable-inlining on linux/x64:
>
> *** IR Dump Before Function Integration/Inlining ***
> ; Function Attrs: uwtable
> define void @_D5shift14__unittestL2_1FZv() #1 comdat {
>   %1 = call i32 @_D5shift11check_shiftFiZi(i32 3) #1
>   %2 = icmp eq i32 %1, 16
>   br i1 %2, label %assertPassed, label %assertFailed
>
> *** IR Dump After Function Integration/Inlining ***
> ; Function Attrs: uwtable
> define void @_D5shift14__unittestL2_1FZv() #1 comdat {
> %1 = icmp eq i32 undef, 16
> br i1 %1, label %assertPassed, label %assertFailed
>
> Eventually that gets optimized away to nothing, just like with dmd.
>
> Would this be considered an llvm bug or the expected result from passing an undefined right shift to llvm?  It should at least warn about that undef after inlining, shouldn't it?

I am not a big LLVM expert, but I would not expect a warning at all. I think "undef" is used e.g. by frontends (e.g. LDC) to allow LLVM to optimize certain constructs. I'm guessing LLVM is thinking: "undef", great!, let's optimize the hell out of it!

June 30, 2016
On Thursday, 30 June 2016 at 12:33:21 UTC, Johan Engelen wrote:
> On Thursday, 30 June 2016 at 11:50:15 UTC, Joakim wrote:
>>
>> Speaking of shift errors, I took a look at why even ldc 1.0.0 fails for that undefined shift optimization bug I just reported for dmd and it does the same thing, simply removes the test block after inlining and optimization.
>>
>> Here's the relevant IR before and after inlining with -O1 -enable-inlining on linux/x64:
>>
>> *** IR Dump Before Function Integration/Inlining ***
>> ; Function Attrs: uwtable
>> define void @_D5shift14__unittestL2_1FZv() #1 comdat {
>>   %1 = call i32 @_D5shift11check_shiftFiZi(i32 3) #1
>>   %2 = icmp eq i32 %1, 16
>>   br i1 %2, label %assertPassed, label %assertFailed
>>
>> *** IR Dump After Function Integration/Inlining ***
>> ; Function Attrs: uwtable
>> define void @_D5shift14__unittestL2_1FZv() #1 comdat {
>> %1 = icmp eq i32 undef, 16
>> br i1 %1, label %assertPassed, label %assertFailed
>>
>> Eventually that gets optimized away to nothing, just like with dmd.
>>
>> Would this be considered an llvm bug or the expected result from passing an undefined right shift to llvm?  It should at least warn about that undef after inlining, shouldn't it?
>
> I am not a big LLVM expert, but I would not expect a warning at all. I think "undef" is used e.g. by frontends (e.g. LDC) to allow LLVM to optimize certain constructs. I'm guessing LLVM is thinking: "undef", great!, let's optimize the hell out of it!

I assumed that undef was some kind of poison value, but it looks like it's normally used:

http://www.playingwithpointers.com/problem-with-undef.html

I guess the issue then is whether the inlining pass should just be returning undef from evaluating check_shift(3), and moving on happily from there.  Since this is at compile-time, I don't think it should.

Does anyone know what llvm's stance is on this?  Are we supposed to be running sanitizers or something else to avoid these bugs?