July 07, 2009
Robert Jacques wrote:
> On Tue, 07 Jul 2009 03:33:24 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>> Robert Jacques wrote:
>>>  That's really cool. But I don't think that's actually happening (Or are these the bugs you're talking about?):
>>>      byte x,y;
>>>     short z;
>>>     z = x+y;  // Error: cannot implicitly convert expression (cast(int)x + cast(int)y) of type int to short
>>>      // Repeat for ubyte, bool, char, wchar and *, -, /
>>
>> http://d.puremagic.com/issues/show_bug.cgi?id=3147 You may want to add to it.
> 
> Added. In summary, + * - / % >> >>> don't work for types 8 bits and under. << is inconsistent (x<<1 errors, but x<<y compiles). All the op-assigns (+= *= -= /= %= >>= <<= >>>=) and pre/post increments (++ --) compile, which is maddeningly inconsistent, particularly when the spec defines ++x as sugar for x = x + 1, which doesn't compile.
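> 
> For example (a minimal illustration, as of 2.031):
> 
>     byte x, y;
>     x = x + y;  // error: int result doesn't implicitly convert to byte
>     x += y;     // compiles
>     ++x;        // compiles, though the spec defines it as x = x + 1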
> 
>>> And by that logic shouldn't the following happen?
>>>      int x,y;
>>>     int z;
>>>     z = x+y;  // Error: cannot implicitly convert expression (cast(long)x + cast(long)y) of type long to int
>>
>> No. Int remains "special", i.e. arithmetic operations on it don't automatically grow to become long.
>>
>>> i.e. why the massive inconsistency between byte/short and int/long? (This is particularly a pain for generic i.e. templated code)
>>
>> I don't find it a pain. It's a practical decision.
> 
> Andrei, I have a short vector template (think vec!(byte,3), etc) where I've had to wrap the majority of lines of code in cast(T)( ... ), because I support bytes and shorts. I find that both a kludge and a pain.

Well suggestions for improving things are welcome. But I don't think it will fly to make int+int yield a long.

>>> BTW: this means byte and short are not closed under arithmetic operations, which drastically limits their usefulness.
>>
>> I think they shouldn't be closed because they overflow for relatively small values.
> 
> Andrei, consider anyone who wants to do image manipulation (or computer vision, video, etc). Since images are one of the few areas that use bytes extensively, and have to map back into themselves, they are basically out of luck.

I understand, but also keep in mind that making small integers closed is the less safe option. So we'd be hurting everyone for the sake of the image manipulation folks.


Andrei
July 07, 2009
Andrei Alexandrescu, on July 7 at 00:48, you wrote:
> Robert Jacques wrote:
> >On Mon, 06 Jul 2009 01:05:10 -0400, Walter Bright <newshound1@digitalmars.com> wrote:
> >>Something for everyone here.
> >>
> >>
> >>http://www.digitalmars.com/d/1.0/changelog.html http://ftp.digitalmars.com/dmd.1.046.zip
> >>
> >>
> >>http://www.digitalmars.com/d/2.0/changelog.html http://ftp.digitalmars.com/dmd.2.031.zip
> >Thanks for another great release.
> >Also, I'm not sure if this is a bug or a feature with regard to the new
> >integer rules:
> >   byte x,y,z;
> >   z = x+y;    // Error: cannot implicitly convert expression
> >               // (cast(int)x + cast(int)y) of type int to byte
> > which makes sense, in that a byte can overflow, but also doesn't make
> > sense, since integer behaviour is different.
> 
> Walter has implemented an ingenious scheme for disallowing narrowing conversions while at the same time minimizing the number of casts required. He hasn't explained it, so I'll sketch an explanation here.
> 
> The basic approach is "value range propagation": each expression is associated with a minimum possible value and a maximum possible value. As complex expressions are assembled out of simpler expressions, the ranges are computed and propagated.
> 
> For example, this code compiles:
> 
> int x = whatever();
> bool y = x & 1;
> 
> The compiler figures that the range of x is int.min to int.max, the
> range of 1 is 1 to 1, and (here's the interesting part), the range of
> x & 1 is 0 to 1. So it lets the code go through. However, it won't allow
> this:
> 
> int x = whatever();
> bool y = x & 2;
> 
> because x & 2 has range between 0 and 2, which won't fit in a bool.
> 
> The approach generalizes to arbitrarily complex expressions. Now here's the trick though: the value range propagation is local, i.e. all ranges are forgotten beyond one expression. So as soon as you move on to the next statement, the ranges are gone.
> 
> Why? Simply put, increased implementation difficulties and increased
> compiler memory footprint for diminishing returns. Both Walter and
> I noticed that expression-level value range propagation gets rid of all
> dangerous cases and the vast majority of required casts. Indeed, his
> test suite, Phobos, and my own codebase required surprisingly few
> changes with the new scheme. Moreover, we both discovered bugs due to
> the new feature, so we're happy with the status quo.
> 
> Now consider your code:
> 
> byte x,y,z;
> z = x+y;
> 
> The first line initializes all values to zero. With intra-procedural value range propagation, these zeros would be propagated to the next statement, which would range-check. However, in the current approach, the ranges of x, y, and z are forgotten at the first semicolon. Then, as far as the type checker knows, x+y has range byte.min+byte.min up to byte.max+byte.max, i.e. -256 up to 254. That would fit in a short (and by the way I just found a bug on that occasion) but not in a byte.
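> 
> A minimal sketch of the consequence (illustration only):
> 
>     byte x, y;
>     short s = x + y;  // fits per the rules; the bug just mentioned rejects it
>     byte  b = x + y;  // error: range -256 .. 254 does not fit in byte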

This seems nice. It would be nice if this kind of thing were discussed in the NG before a compiler release, to allow community input and discussion.

I think things like this are the ones that deserve some kind of RFC (like Python PEPs), as someone suggested a couple of days ago.

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
July 07, 2009
Leandro Lucarella wrote:
> This seems nice. I think it would be nice if this kind of things are
> commented in the NG before a compiler release, to allow community input
> and discussion.

Yup, that's what happened with case ranges :o).

> I think this kind of things are the ones that deserves some kind of RFC
> (like Python PEPs) like someone suggested a couple of days ago.

I think that's a good idea. Who has the time and resources to set that up?


Andrei

July 07, 2009
On Tue, Jul 7, 2009 at 11:33 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>
> Well 32-bit architectures may be a historical relic but I don't think 32-bit integers are. And I think it would be too disruptive a change to promote results of arithmetic operation between integers to long.
>
> ...
>
> This is a different beast. We simply couldn't devise a satisfactory scheme within the constraints we have. No simple solution we could think of has worked, nor have a number of sophisticated solutions. Ideas would be welcome, though I need to warn you that the devil is in the details, so the ideas must be fully baked; too many good-sounding high-level ideas fail when analyzed in detail.

Hm.  Just throwing this out there, as a possible solution for both problems.

Suppose you kept the current set of integer types, but made all of them "open" (i.e. byte+byte=short, int+int=long etc.).  Furthermore, you made it impossible to implicitly convert between the signed and unsigned types of the same size (the int<>uint hole disappears).

But then you introduce two new native-size integer types.  Well, we already have them - ptrdiff_t and size_t - but give them nicer names, like word and uword.  Unlike the other integer types, these would be implicitly convertible to one another.  They'd more or less take the place of 'int' and 'uint' in most code, since most of the time, the size of the integer isn't that important.
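Roughly, the idea might look like this (a sketch only; word and uword are hypothetical names, and plain aliases like these wouldn't by themselves add the proposed implicit signed/unsigned convertibility):

alias ptrdiff_t word;   // hypothetical signed native-size integer
alias size_t    uword;  // hypothetical unsigned native-size integer

void main() {
    int[] arr = [1, 2, 3];
    uword n = arr.length;        // no width assumption baked into the code
    word  i = cast(word) n - 1;  // cast needed here; the proposal would
                                 // make word <-> uword implicit
}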
July 07, 2009
On Tue, 07 Jul 2009 08:53:49 +0200, Lars T. Kyllingstad wrote:

> Ary Borenszweig wrote:
>> のしいか (noshiika) wrote:
>>> Thank you for the great work, Walter and all the other contributors.
>>>
>>> But I am a bit disappointed with the CaseRangeStatement syntax. Why is
>>> it
>>>    case 0: .. case 9:
>>> instead of
>>>    case 0 .. 9:
>>>
>>> With the latter notation, ranges can be easily used together with
>>> commas, for example:
>>>    case 0, 2 .. 4, 6 .. 9:
>>>
>>> And CaseRangeStatement includes the endpoint, which is inconsistent
>>> with the other syntaxes that use the .. operator, i.e. slicing and
>>> ForeachRangeStatement.
>>> Shouldn't D make use of another operator to express ranges that
>>> include the endpoints, as Ruby or Perl6 does?
>> 
>> I agree.
>> 
>> I think this syntax is yet another one of those things that will make people looking at D say "ugly" and turn their heads away.
> 
> 
> When the discussion first came up in the NG, I was a bit sceptical about Andrei's suggestion for the case range statement as well. Now, I definitely think it's the best choice, and it's only because I realised it can be written like this:
> 
>      case 1:
>      ..
>      case 4:
>          // do stuff
> 
[snip]

I think it looks much better that way, and users are more likely to be
comfortable with the syntax. I hope the examples will display it that
way.

Still, the syntax as a whole looks a bit alien, since it's a syntax
addition.
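For completeness, a full switch with the new syntax might read like this (a small sketch; the function name is made up):

int classify(int digit) {
    switch (digit) {
        case 0: .. case 4:
            return 0;   // low digits
        case 5: .. case 9:
            return 1;   // high digits
        default:
            return -1;  // not a decimal digit
    }
}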
July 07, 2009
On Tue, 07 Jul 2009 11:36:26 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Robert Jacques wrote:
>>  Andrei, I have a short vector template (think vec!(byte,3), etc) where I've had to wrap the majority lines of code in cast(T)( ... ), because I support bytes and shorts. I find that both a kludge and a pain.
>
> Well suggestions for improving things are welcome. But I don't think it will fly to make int+int yield a long.

Suggestion 1:
Loft the right-hand side of the expression (when lofting is valid) to the size of the left-hand side, i.e.:

byte a,b,c;
c = a + b;  => c = a + b;

short d;
d = a + b;  => d = cast(short) a + cast(short) b;

int e, f;
e = a + b;  => e = cast(short) a + cast(short) b;
e = a + b + d; => e = cast(int)(cast(short) a + cast(short) b) + cast(int) d; Or e = cast(int) a + (cast(int) b + cast(int)d);

long g;
g = e + f;  => g = cast(long) e + cast(long) f;

When choosing operator overloads or auto, prefer the ideal lofted interpretation (as per the new rules, but without the exception for int/long), over truncated variants. i.e.
auto h = a + b; => short h = cast(short) a + cast(short) b;

This would also properly handle some of the corner/inconsistent cases with the current rules:
ubyte  i;
ushort j;
j = -i;    => j = -cast(short)i; (This currently evaluates to j = cast(short)(-i);)

And
a += a;
is equivalent to
a = a + a;
and is logically consistent with
byte[] k,l,m;
m[] = k[] + l[];

Essentially, instead of trying to prevent overflows except for those from int and long, this scheme attempts to minimize the risk of overflows, including those from int (and from long, once cent exists; maybe long+long => bigInt?).


Suggestion 2:
Enable the full rules as part of SafeD and allow non-promotion in unsafe D. Note this could be synergistically combined with Suggestion 1.




>>>> BTW: this means byte and short are not closed under arithmetic operations, which drastically limits their usefulness.
>>>
>>> I think they shouldn't be closed because they overflow for relatively small values.
>>  Andrei, consider anyone who wants to do image manipulation (or computer vision, video, etc). Since images are one of the few areas that use bytes extensively, and have to map back into themselves, they are basically out of luck.
>
> I understand, but also keep in mind that making small integers closed is the less safe option. So we'd be hurting everyone for the sake of the image manipulation folks.
>
> Andrei

Well, how often does everyone else use bytes?
July 07, 2009
Andrei Alexandrescu wrote:
> Derek Parnell wrote:
>> It seems that D would benefit from having a standard syntax format for
>> expressing various range sets;
>>  a. Include begin Include end, i.e. []
>>  b. Include begin Exclude end, i.e. [)
>>  c. Exclude begin Include end, i.e. (]
>>  d. Exclude begin Exclude end, i.e. ()
> 
> I'm afraid this would majorly mess with pairing of parens.
> 
	I think Derek's point was to have *some* syntax to mean this, not
necessarily the one he showed (which he showed because I believe
that's the "standard" mathematical way to express it for English
speakers). For example, we could say that [] is always inclusive and
have another character which makes it exclusive like:
  a. Include begin Include end, i.e. [  a .. b  ]
  b. Include begin Exclude end, i.e. [  a .. b ^]
  c. Exclude begin Include end, i.e. [^ a .. b  ]
  d. Exclude begin Exclude end, i.e. [^ a .. b ^]


		Jerome

PS: If you *really* want messed-up paren pairing, try it with the French convention:   []   [[   ]]   ][       ;)
-- 
mailto:jeberger@free.fr
http://jeberger.free.fr
Jabber: jeberger@jabber.fr



July 07, 2009
Jérôme M. Berger wrote:
> Andrei Alexandrescu wrote:
>> Derek Parnell wrote:
>>> It seems that D would benefit from having a standard syntax format for
>>> expressing various range sets;
>>>  a. Include begin Include end, i.e. []
>>>  b. Include begin Exclude end, i.e. [)
>>>  c. Exclude begin Include end, i.e. (]
>>>  d. Exclude begin Exclude end, i.e. ()
>>
>> I'm afraid this would majorly mess with pairing of parens.
>>
>     I think Derek's point was to have *some* syntax to mean this, not necessarily the one he showed (which he showed because I believe that's the "standard" mathematical way to express it for English speakers). For example, we could say that [] is always inclusive and have another character which makes it exclusive like:
>  a. Include begin Include end, i.e. [  a .. b  ]
>  b. Include begin Exclude end, i.e. [  a .. b ^]
>  c. Exclude begin Include end, i.e. [^ a .. b  ]
>  d. Exclude begin Exclude end, i.e. [^ a .. b ^]

I think Walter's message really rendered the whole discussion moot. Post of the year:

=========================
I like:

   a .. b+1

to mean inclusive range.
=========================

Consider "+1]" a special symbol that means the range is to be closed to the right :o).
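And for what it's worth, half-open ranges plus the occasional +1 already express all four interval kinds (a sketch, assuming integer bounds):

void main() {
    int a = 2, b = 7;
    foreach (i; a .. b+1)   {}  // [a, b]  closed on both ends
    foreach (i; a .. b)     {}  // [a, b)  open on the right
    foreach (i; a+1 .. b+1) {}  // (a, b]  open on the left
    foreach (i; a+1 .. b)   {}  // (a, b)  open on both ends
}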


Andrei
July 07, 2009
Robert Jacques wrote:
> On Tue, 07 Jul 2009 11:36:26 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>> Robert Jacques wrote:
>>>  Andrei, I have a short vector template (think vec!(byte,3), etc) where I've had to wrap the majority of lines of code in cast(T)( ... ), because I support bytes and shorts. I find that both a kludge and a pain.
>>
>> Well suggestions for improving things are welcome. But I don't think it will fly to make int+int yield a long.
> 
> Suggestion 1:
> Loft the right-hand side of the expression (when lofting is valid) to the size of the left-hand side, i.e.:

What does loft mean in this context?

> byte a,b,c;
> c = a + b;  => c = a + b;

Unsafe.

> short d;
> d = a + b;  => d = cast(short) a + cast(short) b;

Should work today modulo bugs.

> int e, f;
> e = a + b;  => e = cast(short) a + cast(short) b;

Why cast to short? e has type int.

> e = a + b + d; => e = cast(int)(cast(short) a + cast(short) b) + cast(int) d; Or e = cast(int) a + (cast(int) b + cast(int)d);

I don't understand this.

> long g;
> g = e + f;  => g = cast(long) e + cast(long) f;

Works today.

> When choosing operator overloads or auto, prefer the ideal lofted interpretation (as per the new rules, but without the exception for int/long), over truncated variants. i.e.
> auto h = a + b; => short h = cast(short) a + cast(short) b;

This would yield semantics incompatible with C expressions.

> This would also properly handle some of the corner/inconsistent cases with the current rules:
> ubyte  i;
> ushort j;
> j = -i;    => j = -cast(short)i; (This currently evaluates to j = cast(short)(-i);)

That should not compile, sigh. Walter wouldn't listen...

> And
> a += a;
> is equivalent to
> a = a + a;

Well, not quite equivalent; in D2 they aren't. The former clarifies that you want to reassign the expression to a, and no cast is necessary. The latter would not compile if a is shorter than int.
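
Concretely (a minimal sketch of the behavior just described):

byte a;
a += a;     // compiles: the op-assign reassigns to a, no cast needed
a = a + a;  // error: a + a is an int and doesn't fit back into a byte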

> and is logically consistent with
> byte[] k,l,m;
> m[] = k[] + l[];
> 
> Essentially, instead of trying to prevent overflows except for those from int and long, this scheme attempts to minimize the risk of overflows, including those from int (and from long, once cent exists; maybe long+long => bigInt?).

But if you close operations for types smaller than int, you end up with a scheme even more error-prone than C!

> Suggestion 2:
> Enable the full rules as part of SafeD and allow non-promotion in unsafe D. Note this could be synergistically combined with Suggestion 1.

Safe D is concerned with memory safety only.


Andrei
July 07, 2009
Robert Jacques wrote:
> On Tue, 07 Jul 2009 03:33:24 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>> Robert Jacques wrote:
>>> BTW: this means byte and short are not closed under arithmetic operations, which drastically limits their usefulness.
>>
>> I think they shouldn't be closed because they overflow for relatively small values.
> 
> Andrei, consider anyone who wants to do image manipulation (or computer vision, video, etc). Since images are one of the few areas that use bytes extensively, and have to map back into themselves, they are basically out of luck.
> 
	Wrong example: in most cases, when doing image manipulation, you
don't want the overflow to wrap; you want it clipped. Having the
compiler notify you when there is a risk of an overflow, and require
you to be explicit about how you want it handled, is actually a good
thing IMO.
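
For example, the explicit handling could be as simple as this (an
untested sketch; addSat is a made-up name):

// Saturating (clipped) add for 8-bit pixel values.
ubyte addSat(ubyte a, ubyte b) {
    int sum = a + b;  // operands promote to int, so no overflow here
    return cast(ubyte)(sum > 255 ? 255 : sum);
}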

		Jerome
-- 
mailto:jeberger@free.fr
http://jeberger.free.fr
Jabber: jeberger@jabber.fr