Jump to page: 1 2
Thread overview
Numeric limits tracking
May 13, 2013
Manu
May 13, 2013
John Colvin
May 13, 2013
Manu
May 13, 2013
Walter Bright
May 13, 2013
bearophile
May 13, 2013
Manu
May 13, 2013
Walter Bright
May 13, 2013
Manu
May 13, 2013
David Nadlinger
May 13, 2013
deadalnix
May 16, 2013
Marco Leise
May 16, 2013
Simen Kjaeraas
May 13, 2013
So, here's an issue that constantly drives me nuts, and an elegant solution seems so do-able.

void func(int x)
{
  x &= 0xFFFF;
  short s = x; // Error! (but we know x is 0 .. 65535)

  if(x < 256)
  {
    byte b = x; // Error! (we also know x is 0 .. 255)
  }
}

It would be really nice if the compiler would carry around the known
possible range of values, and refer to that information when performing
down-casting assignments.
It would also be very useful information for the back end, it can choose
more efficient types if it knows the range, or produce jump tables rather
than branch sequences.

I think the compiler would need a min, max, and mask for any integers, and update them as it works.

There are many sources of this information.
Comparisons effectively limit the range:
if(x > 0)
{
  // x = 0 .. int.max
}
else
{
  // x = int.min .. -1
}

Masks, obviously:
x &= 15;

Also contracts are a really great source of seeding this information on function entry.

Has this been discussed? It seems simple, and it's invisible to the language. It would just reduce some boilerplate (tedious casts), and also offer some great optimisation opportunities.


May 13, 2013
On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
> So, here's an issue that constantly drives me nuts, and an elegant solution
> seems so do-able.
>
> void func(int x)
> {
>   x &= 0xFFFF;
>   short s = x; // Error! (but we know x is 0 .. 65535)
>
>   if(x < 256)
>   {
>     byte b = x; // Error! (we also know x is 0 .. 255)
>   }
> }
>
> It would be really nice if the compiler would carry around the known
> possible range of values, and refer to that information when performing
> down-casting assignments.
> It would also be very useful information for the back end, it can choose
> more efficient types if it knows the range, or produce jump tables rather
> than branch sequences.
>
> I think the compiler would need a min, max, and mask for any integers, and
> update them as it works.
>
> There are many sources of this information.
> Comparisons effectively limit the range:
> if(x > 0)
> {
>   // x = 0 .. int.max
> }
> else
> {
>   // x = int.min .. -1
> }
>
> Masks, obviously:
> x &= 15;
>
> Also contracts are a really great source of seeding this information on
> function entry.
>
> Has this been discussed? It seems simple, and it's invisible to the
> language. It would just reduce some boilerplate (tedious casts), and also
> offer some great optimisation opportunities.

I've been thinking about this for a while from an optimisation point of view. Is this not a common practice for an optimising compiler?
May 13, 2013
It is, but there are limits to what it can do.
D's contracts potentially offer a lot of information that the optimiser
wouldn't naturally have (any function receiving blind arguments can't make
any assumptions without a contract present), but more from a
usability/safety point of view, I really hate casting when the compiler
should know the assignment is safe.

It's not only an inconvenience, it's also a safety thing.
Consider:
I have an int that I assign to a byte, I know the value fits in a byte, but
I'm forced to type the manual cast anyway.
Once that cast is written, if the valid range of my int happens to change
in outer code, I will not receive an error informing me that the new range
doesn't fit within a byte anymore, it'll just truncate it anyway, since a
blind cast is already there in the code.
It would be much better for the compiler to start complaining that it can
no longer assure that the int fits into the byte. With the cast present, my
bug has been hidden until I eventually crash.


On 13 May 2013 10:18, John Colvin <john.loughran.colvin@gmail.com> wrote:

> On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
>
>> So, here's an issue that constantly drives me nuts, and an elegant
>> solution
>> seems so do-able.
>>
>> void func(int x)
>> {
>>   x &= 0xFFFF;
>>   short s = x; // Error! (but we know x is 0 .. 65535)
>>
>>   if(x < 256)
>>   {
>>     byte b = x; // Error! (we also know x is 0 .. 255)
>>   }
>> }
>>
>> It would be really nice if the compiler would carry around the known
>> possible range of values, and refer to that information when performing
>> down-casting assignments.
>> It would also be very useful information for the back end, it can choose
>> more efficient types if it knows the range, or produce jump tables rather
>> than branch sequences.
>>
>> I think the compiler would need a min, max, and mask for any integers, and update them as it works.
>>
>> There are many sources of this information.
>> Comparisons effectively limit the range:
>> if(x > 0)
>> {
>>   // x = 0 .. int.max
>> }
>> else
>> {
>>   // x = int.min .. -1
>> }
>>
>> Masks, obviously:
>> x &= 15;
>>
>> Also contracts are a really great source of seeding this information on function entry.
>>
>> Has this been discussed? It seems simple, and it's invisible to the language. It would just reduce some boilerplate (tedious casts), and also offer some great optimisation opportunities.
>>
>
> I've been thinking about this for a while from an optimisation point of view. Is this not a common practice for an optimising compiler?
>


May 13, 2013
On 5/12/2013 5:00 PM, Manu wrote:
> So, here's an issue that constantly drives me nuts, and an elegant solution
> seems so do-able.

It's a form of "data flow analysis". This is done by the optimizer, although it doesn't track ranges.

It isn't quite as simple as you suggest, there are issues like dealing with arbitrary control flow.

  x &= 0xFFFF;
  short s = x; // Error! (but we know x is 0 .. 65535)

Ok, but what about:

  x &= 0xFFFF;
  while (expr) {
      short s = x;
      x += 1;
  }

There's a forward reference going on. We cannot semantically evaluate s=x until we semantically evaluate the rest of the function in order to know that x is or is not reassigned.
May 13, 2013
Walter Bright:

> There's a forward reference going on. We cannot semantically evaluate s=x until we semantically evaluate the rest of the function in order to know that x is or is not reassigned.

A first step to improve the situation is to propagate the range of const/immutable values inside a function:

void main() {
    uint x = 1000;
    immutable y = x % 256;
    ubyte z = y; // currently error.
}

Bye,
bearophile
May 13, 2013
On 13 May 2013 11:03, Walter Bright <newshound2@digitalmars.com> wrote:

> On 5/12/2013 5:00 PM, Manu wrote:
>
>> So, here's an issue that constantly drives me nuts, and an elegant
>> solution
>> seems so do-able.
>>
>
> It's a form of "data flow analysis". This is done by the optimizer, although it doesn't track ranges.
>
> It isn't quite as simple as you suggest, there are issues like dealing with arbitrary control flow.
>
>
>   x &= 0xFFFF;
>   short s = x; // Error! (but we know x is 0 .. 65535)
>
> Ok, but what about:
>
>   x &= 0xFFFF;
>   while (expr) {
>       short s = x;
>       x += 1;
>   }
>
> There's a forward reference going on. We cannot semantically evaluate s=x until we semantically evaluate the rest of the function in order to know that x is or is not reassigned.
>

It is done by the optimiser, but I'm suggesting that it doesn't only have value in terms of optimisation. It could improve language usability if performed in the front end.

In that context, unless the analysis is very powerful(/mature) and can
statically determine 'expr', then x would need to be relaxed to unlimited
at that point, since it obviously can't know the limits within (or after)
that loop.
In this case, the compiler would complain that it can't determine x and you
would need an explicit cast as usual.

I can see the effective forward reference issue here, which could be
addressed in time, but even just reasonably simple limits tracking (ie,
within sequential code) would make a massive difference. I think
programmers would intuitively understand this situation and wouldn't expect
magic from the compiler.
The places where it matters the most are places where it's dead obvious
what the numeric limits are, and it should be equally obvious to the
compiler.

It's the sort of thing that could be improved with time, but even a very simple start would be a big help in many cases.


May 13, 2013
On 5/12/2013 6:23 PM, Manu wrote:
> It is done by the optimiser, but I'm suggesting that it doesn't only have value
> in terms of optimisation. It could improve language usability if performed in
> the front end.

I understand.

> In that context, unless the analysis is very powerful(/mature) and can
> statically determine 'expr', then x would need to be relaxed to unlimited at
> that point, since it obviously can't know the limits within (or after) that loop.
> In this case, the compiler would complain that it can't determine x and you
> would need an explicit cast as usual.
>
> I can see the effective forward reference issue here, which could be addressed
> in time, but even just reasonably simple limits tracking (ie, within sequential
> code) would make a massive difference. I think programmers would intuitively
> understand this situation and wouldn't expect magic from the compiler.
> The places where it matters the most are places where it's dead obvious what the
> numeric limits are, and it should be equally obvious to the compiler.
>
> It's the sort of thing that could be improved with time, but even a very simple
> start would be a big help in many cases.

It's true you could do a sequential only, very conservative form of data flow analysis.

But be aware there are other problems, such as:

  int *p = ...;
  ...
  x &= 0xFFFF;
  ++(*p);      // hmm, is x affected?
  short s = x;

May 13, 2013
On 13 May 2013 13:46, Walter Bright <newshound2@digitalmars.com> wrote:

> On 5/12/2013 6:23 PM, Manu wrote:
>
>> It is done by the optimiser, but I'm suggesting that it doesn't only have
>> value
>> in terms of optimisation. It could improve language usability if
>> performed in
>> the front end.
>>
>
> I understand.
>
>
>  In that context, unless the analysis is very powerful(/mature) and can
>> statically determine 'expr', then x would need to be relaxed to unlimited
>> at
>> that point, since it obviously can't know the limits within (or after)
>> that loop.
>> In this case, the compiler would complain that it can't determine x and
>> you
>> would need an explicit cast as usual.
>>
>> I can see the effective forward reference issue here, which could be
>> addressed
>> in time, but even just reasonably simple limits tracking (ie, within
>> sequential
>> code) would make a massive difference. I think programmers would
>> intuitively
>> understand this situation and wouldn't expect magic from the compiler.
>> The places where it matters the most are places where it's dead obvious
>> what the
>> numeric limits are, and it should be equally obvious to the compiler.
>>
>> It's the sort of thing that could be improved with time, but even a very
>> simple
>> start would be a big help in many cases.
>>
>
> It's true you could do a sequential only, very conservative form of data flow analysis.
>
> But be aware there are other problems, such as:
>
>   int *p = ...;
>   ...
>   x &= 0xFFFF;
>   ++(*p);      // hmm, is x affected?
>   short s = x;
>

Yes indeed, there are certainly many ways to interfere with it, but I
certainly hope that most lines of code you deal with on a daily basis
aren't like this ;)
Obviously it would grow in maturity over time. In that example, just revert
to unlimited assumptions when it encounters the pointer operation. Limits
may then be refined again in following code.
I can imagine things like proper escaping analysis as planned could
eventually help in cases like this.

Anyway, it's just something to think about. I think it would offer additional safety/correctness to the language; anywhere you type a blind cast, you are potentially hiding a future bug. I'd rather never cast when I don't absolutely need/intend to.


May 13, 2013
On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
> Has this been discussed? It seems simple, and it's invisible to the language.

There is no such thing as a language feature invisible to the language – or at least there should not be. Even a feature, such as this, which only makes the language more permissive needs exact specifications, for all D compilers have to implement it in the same way.

David
May 13, 2013
On Monday, 13 May 2013 at 12:37:34 UTC, David Nadlinger wrote:
> On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
>> Has this been discussed? It seems simple, and it's invisible to the language.
>
> There is no such thing as a language feature invisible to the language – or at least there should not be. Even a feature, such as this, which only makes the language more permissive needs exact specifications, for all D compilers have to implement it in the same way.
>
> David

is(typeof()) say that any permissive change can also break code in D. This is our little specificity ;)
« First   ‹ Prev
1 2