Jump to page: 1 2
Thread overview
[dmd-internals] [D-Programming-Language/dmd] 48ed15: Remove import of std.random:rand from test22
Aug 16, 2011
Don Clugston
Aug 16, 2011
Jonathan M Davis
Aug 16, 2011
Don Clugston
Aug 16, 2011
Brad Roberts
Aug 22, 2011
Don Clugston
Aug 22, 2011
Walter Bright
Aug 23, 2011
Don Clugston
Aug 23, 2011
Don Clugston
Aug 23, 2011
Brad Roberts
Aug 23, 2011
Jason House
Aug 23, 2011
Brad Roberts
Aug 23, 2011
Walter Bright
August 15, 2011
  Branch: refs/heads/master
  Home:   https://github.com/D-Programming-Language/dmd

  Commit: 48ed15159bb36410ef91ffe6967a6fb7cd606d78
      https://github.com/D-Programming-Language/dmd/commit/48ed15159bb36410ef91ffe6967a6fb7cd606d78
  Author: Don Clugston <dclugston at googlemail.com>
  Date:   2011-08-15 (Mon, 15 Aug 2011)

  Changed paths:
    M test/runnable/test22.d

  Log Message:
  -----------
  Remove import of std.random:rand from test22

I checked this reduced test case on DMD 0.120, to make sure it accurately reproduced the bug.


  Commit: cfab198ee186f6e69c364aaf4206434220d83204
      https://github.com/D-Programming-Language/dmd/commit/cfab198ee186f6e69c364aaf4206434220d83204
  Author: Walter Bright <walter at walterbright.com>
  Date:   2011-08-15 (Mon, 15 Aug 2011)

  Changed paths:
    M test/runnable/test22.d

  Log Message:
  -----------
  Merge pull request #310 from donc/trivialstuff

Remove import of std.random:rand from test22


Compare: https://github.com/D-Programming-Language/dmd/compare/9c23e89...cfab198
August 16, 2011
And now it fails on all platforms with -O. Yikes.  It seems that the original bug was never actually fixed. It's an optimizer bug, too. We've been running with a defective test case for years.


On 15 August 2011 10:48,  <noreply at github.com> wrote:
> ?Branch: refs/heads/master
> ?Home: ? https://github.com/D-Programming-Language/dmd
>
> ?Commit: 48ed15159bb36410ef91ffe6967a6fb7cd606d78
> ? ? ?https://github.com/D-Programming-Language/dmd/commit/48ed15159bb36410ef91ffe6967a6fb7cd606d78
> ?Author: Don Clugston <dclugston at googlemail.com>
> ?Date: ? 2011-08-15 (Mon, 15 Aug 2011)
>
> ?Changed paths:
> ? ?M test/runnable/test22.d
>
> ?Log Message:
> ?-----------
> ?Remove import of std.random:rand from test22
>
> I checked this reduced test case on DMD 0.120, to make sure it accurately reproduced the bug.
>
>
> ?Commit: cfab198ee186f6e69c364aaf4206434220d83204
> ? ? ?https://github.com/D-Programming-Language/dmd/commit/cfab198ee186f6e69c364aaf4206434220d83204
> ?Author: Walter Bright <walter at walterbright.com>
> ?Date: ? 2011-08-15 (Mon, 15 Aug 2011)
>
> ?Changed paths:
> ? ?M test/runnable/test22.d
>
> ?Log Message:
> ?-----------
> ?Merge pull request #310 from donc/trivialstuff
>
> Remove import of std.random:rand from test22
>
>
> Compare: https://github.com/D-Programming-Language/dmd/compare/9c23e89...cfab198
> _______________________________________________
> dmd-internals mailing list
> dmd-internals at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/dmd-internals
>
August 15, 2011
On Tuesday, August 16, 2011 04:16:10 Don Clugston wrote:
> And now it fails on all platforms with -O. Yikes.  It seems that the original bug was never actually fixed. It's an optimizer bug, too. We've been running with a defective test case for years.

Well, all the better that someone who actually has some clue what the tests are for is fixing them rather than us using my attempt at fixing them.

- Jonathan M Davis
August 16, 2011
I've added it to bugzilla. http://d.puremagic.com/issues/show_bug.cgi?id=6505

On 16 August 2011 04:20, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> On Tuesday, August 16, 2011 04:16:10 Don Clugston wrote:
>> And now it fails on all platforms with -O. Yikes. ?It seems that the original bug was never actually fixed. It's an optimizer bug, too. We've been running with a defective test case for years.
>
> Well, all the better that someone who actually has some clue what the tests are for is fixing them rather than us using my attempt at fixing them.

It's funny -- I had thought I was wasting my time in tracking it down.
August 16, 2011
On Tuesday, August 16, 2011 4:41:30 AM, Don Clugston wrote:
> I've added it to bugzilla. http://d.puremagic.com/issues/show_bug.cgi?id=6505
> 
> On 16 August 2011 04:20, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
>> On Tuesday, August 16, 2011 04:16:10 Don Clugston wrote:
>>> And now it fails on all platforms with -O. Yikes.  It seems that the original bug was never actually fixed. It's an optimizer bug, too. We've been running with a defective test case for years.
>>
>> Well, all the better that someone who actually has some clue what the tests are for is fixing them rather than us using my attempt at fixing them.
> 
> It's funny -- I had thought I was wasting my time in tracking it down.

This thread is also really good indicators why having a pointer back to
the original conversation / report / etc is valuable.  Being able to go
back and re-examine what the original report was is rarely needed, but
when it is, it's pretty useful.  The trend to name the t
est function
after the bugzilla number is a good one, imho.

I did it a handful of times when converting the original private test suite into the public one.  There were a few tests that were large enough that I was concerned about the privacy issues.  Being able to find that they'd been gathered through public postings to a news group or bugzilla was enough to allow them to be made public.

Good stuff..
Brad
August 23, 2011
On 16 August 2011 20:08, Brad Roberts <braddr at puremagic.com> wrote:
> On Tuesday, August 16, 2011 4:41:30 AM, Don Clugston wrote:
>> I've added it to bugzilla. http://d.puremagic.com/issues/show_bug.cgi?id=6505
>>
>> On 16 August 2011 04:20, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
>>> On Tuesday, August 16, 2011 04:16:10 Don Clugston wrote:
>>>> And now it fails on all platforms with -O. Yikes. ?It seems that the original bug was never actually fixed. It's an optimizer bug, too. We've been running with a defective test case for years.

Walter -- are you working on fixing this?
If not, I'll have a go at it. It looks like a failure to spill FP
registers, and I don't think the backend currently has any capability
to spill, so a fix is probably not easy.
A quick hack fix would be to disable optimisation entirely when this happens.
August 22, 2011

On 8/22/2011 3:29 PM, Don Clugston wrote:
> On 16 August 2011 20:08, Brad Roberts<braddr at puremagic.com>  wrote:
>> On Tuesday, August 16, 2011 4:41:30 AM, Don Clugston wrote:
>>> I've added it to bugzilla. http://d.puremagic.com/issues/show_bug.cgi?id=6505
>>>
>>> On 16 August 2011 04:20, Jonathan M Davis<jmdavisProg at gmx.com>  wrote:
>>>> On Tuesday, August 16, 2011 04:16:10 Don Clugston wrote:
>>>>> And now it fails on all platforms with -O. Yikes.  It seems that the original bug was never actually fixed. It's an optimizer bug, too. We've been running with a defective test case for years.
> Walter -- are you working on fixing this?
> If not, I'll have a go at it. It looks like a failure to spill FP
> registers, and I don't think the backend currently has any capability
> to spill, so a fix is probably not easy.
> A quick hack fix would be to disable optimisation entirely when this happens.
>

The backend does spill the x87 stack when it overflows. I suspect the problem may lie in the scheduler. This is easily checked by disabling the scheduler and trying it.
August 23, 2011
On 23 August 2011 00:41, Walter Bright <walter at digitalmars.com> wrote:
>
>
> On 8/22/2011 3:29 PM, Don Clugston wrote:
>>
>> On 16 August 2011 20:08, Brad Roberts<braddr at puremagic.com> ?wrote:
>>>
>>> On Tuesday, August 16, 2011 4:41:30 AM, Don Clugston wrote:
>>>>
>>>> I've added it to bugzilla. http://d.puremagic.com/issues/show_bug.cgi?id=6505
>>>>
>>>> On 16 August 2011 04:20, Jonathan M Davis<jmdavisProg at gmx.com> ?wrote:
>>>>>
>>>>> On Tuesday, August 16, 2011 04:16:10 Don Clugston wrote:
>>>>>>
>>>>>> And now it fails on all platforms with -O. Yikes. ?It seems that the original bug was never actually fixed. It's an optimizer bug, too. We've been running with a defective test case for years.
>>
>> Walter -- are you working on fixing this?
>> If not, I'll have a go at it. It looks like a failure to spill FP
>> registers, and I don't think the backend currently has any capability
>> to spill, so a fix is probably not easy.
>> A quick hack fix would be to disable optimisation entirely when this
>> happens.
>>
>
> The backend does spill the x87 stack when it overflows. I suspect the problem may lie in the scheduler. This is easily checked by disabling the scheduler and trying it.

You're right, it's the scheduler. It loads 8 values on the x87 stack,
then does fdecstp.
Then, the 9th load lands on top of an existing value, causing a stack
overflow exception. The loaded value gets turned into a NaN.
I'll try to track it down further.
August 23, 2011
On 23 August 2011 08:59, Don Clugston <dclugston at googlemail.com> wrote:
> On 23 August 2011 00:41, Walter Bright <walter at digitalmars.com> wrote:
>>
>>
>> On 8/22/2011 3:29 PM, Don Clugston wrote:
>>>
>>> On 16 August 2011 20:08, Brad Roberts<braddr at puremagic.com> ?wrote:
>>>>
>>>> On Tuesday, August 16, 2011 4:41:30 AM, Don Clugston wrote:
>>>>>
>>>>> I've added it to bugzilla. http://d.puremagic.com/issues/show_bug.cgi?id=6505
>>> Walter -- are you working on fixing this?

>> The backend does spill the x87 stack when it overflows. I suspect the problem may lie in the scheduler. This is easily checked by disabling the scheduler and trying it.
>
> You're right, it's the scheduler. It loads 8 values on the x87 stack,
> then does fdecstp.
> Then, the 9th load lands on top of an existing value, causing a stack
> overflow exception. The loaded value gets turned into a NaN.
> I'll try to track it down further.

OK, here's the problem. The scheduler keeps track of the number of
used x87 registers via the "fpustackused" variable.
Each instruction has a "fpuadjust" value which says how many x87
registers it uses.
But, the problem is CALL instructions. They have fpuadjust = 0.
But, a function returning float returns it in ST(0) --- so fpuadjust
should be +1.
And if it's a complex result, fpuadjust should be +2.
I think it should be possible to put an assert in cgsched.c, line 2491:

            if (tbl[j])
            {
                fpu += tbl[j]->fpuadjust;
+ assert(fpu >= 0);
                if (fpu >= 8)                   // if FPU stack overflow

but this would fail at the moment, because the total goes negative
after a call followed by an FSTP.
So it needs to distinguish between each type of call, depending on how
it affects the FPU stack.
I think the bug lies in funccall() in cod1.c; it should be setting
something in 'code' to record how many x87 registers are returned.

Then, there'll be another minor problem: the number of FPU values CAN
reach 8. This is because of the code in push87() in cg87.c, which does
a
fdecstp; fstp; when the stack is full. So the condition above will
need to change to:
                if (fpu > 8)                   // if FPU stack overflow

This is too invasive a change for me to make and test.
August 23, 2011
On Tuesday, August 23, 2011 5:46:40 AM, Don Clugston wrote:
> On 23 August 2011 08:59, Don Clugston <dclugston at googlemail.com> wrote:
>> On 23 August 2011 00:41, Walter Bright <walter at digitalmars.com> wrote:
>>> On 8/22/2011 3:29 PM, Don Clugston wrote:
>>>> On 16 August 2011 20:08, Brad Roberts<braddr at puremagic.com>  wrote:
>>>>> On Tuesday, August 16, 2011 4:41:30 AM, Don Clugston wrote:
>>>>>> I've added it to bugzilla. http://d.puremagic.com/issues/show_bug.cgi?id=6505
>>>> Walter -- are you working on fixing this?
> 
>>> The backend does spill the x87 stack when it overflows. I suspect the problem may lie in the scheduler. This is easily checked by disabling the scheduler and trying it.
>>
>> You're right, it's the scheduler. It loads 8 values on the x87 stack,
>> then does fdecstp.
>> Then, the 9th load lands on top of an existing value, causing a stack
>> overflow exception. The loaded value gets turned into a NaN.
>> I'll try to track it down further.
>

> OK, here's the problem. The scheduler keeps track of the number of
> used x87 registers via the "fpustackused" variable.
> Each instruction has a "fpuadjust" value which says how many x87
> registers it uses.
> But, the problem is CALL instructions. They have fpuadjust = 0.
> But, a function returning float returns it in ST(0) --- so fpuadjust
> should be +1.
> And if it's a complex result, fpuadjust should be +2.
> I think it should be possible to put an assert in cgsched.c, line 2491:
> 
>             if (tbl[j])
>             {
>                 fpu += tbl[j]->fpuadjust;
> + assert(fpu >= 0);
>                 if (fpu >= 8)                   // if FPU stack overflow
> 
> but this would fail at the moment, because the total goes negative
> after a call followed by an FSTP.
> So it needs to distinguish between each type of call, depending on how
> it affects the FPU stack.
> I think the bug lies in funccall() in cod1.c; it should be setting
> something in 'code' to recor
d how many x87 registers are returned.
> 
> Then, there'll be another minor problem: the number of FPU values CAN
> reach 8. This is because of the code in push87() in cg87.c, which does
> a
> fdecstp; fstp; when the stack is full. So the condition above will
> need to change to:
>                 if (fpu > 8)                   // if FPU stack overflow
> 
> This is too invasive a change for me to make and test.

For the time being, since this has been broken for a long time, can we revert/disable the failing test so that it doesn't hide other failures?
« First   ‹ Prev
1 2