February 13, 2007
Walter Bright wrote:
> Derek Parnell wrote:
>> On Mon, 12 Feb 2007 16:03:14 -0800, Andrei Alexandrescu (See Website For
>> Email) wrote:
>>
>>> http://erdani.org/d-implicit-conversions.pdf
>>  
>>> Did I forget something?
>>
>> Characters are not numbers. 
> 
> That's an enticing point of view, and it sounds good. But Pascal has that view, and my experience with it is it's one of the reasons Pascal sucks.
> 
> Examples:
> 
> 1) converting text <=> integers
> 2) converting case
> 3) doing compression/encryption code
> 4) using characters as indices (isspace() for example)
> 
> Take away the implicit conversions, and such code gets littered with ugly casts.

Ionno. Probably instead of dealing with data as a stream/string of characters, you handle it as integers, and that's just one cast. Pascal didn't offer you that.

How about the infamous automatic bool -> int conversion? Now that's a sucker that caused a ton of harm to C++.
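For anyone who has not been bitten by it, here is a minimal C++ illustration of the bool -> int problem. A comparison yields a bool, and bool promotes silently to int. As a result, a chained comparison compiles but does not mean what it looks like. The function names are hypothetical and exist only for this sketch.

```cpp
#include <cassert>

// (a < b) is a bool, which is then promoted to int (false -> 0, true -> 1)
// and compared with c. So this is NOT "a < b && b < c".
// Compilers usually accept it, at most with a warning.
bool chained_less(int a, int b, int c) {
    return a < b < c;
}

// What the author almost certainly meant.
bool strictly_increasing(int a, int b, int c) {
    return a < b && b < c;
}
```

With 3, 2, 1 the first function returns true, because (3 < 2) becomes 0 and 0 < 1 holds. The second function correctly returns false.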


Andrei
February 13, 2007
Walter Bright wrote:
> Derek Parnell wrote:
>> On Mon, 12 Feb 2007 16:03:14 -0800, Andrei Alexandrescu (See Website For
>> Email) wrote:
>>
>>> http://erdani.org/d-implicit-conversions.pdf
>> 
>>> Did I forget something?
>>
>> Characters are not numbers.
> 
> That's an enticing point of view, and it sounds good. But Pascal has that view, and my experience with it is it's one of the reasons Pascal sucks.
> 
> Examples:
> 
> 1) converting text <=> integers
> 2) converting case
> 3) doing compression/encryption code
> 4) using characters as indices (isspace() for example)
> 
> Take away the implicit conversions, and such code gets littered with ugly casts.

I don't find that; for reasons we don't need to go into
(except to say that I'm glad C++09 will have better Unicode
support than C++03), I've been using a separate type for
characters in a significant body of C++ code, and find very
little need for casts.  Certainly not enough to dispense
with the advantages of type safety.  When the code gets
low level enough to need integral values, I don't mind
doing the conversion manually as there will typically be
a need to handle byte ordering issues or similar too.  But
that's just in the cases I've seen in the last {mumble}
years.

The examples you give are real, make up a tiny fraction
of code that handles characters, and aren't, in my experience,
significantly adversely affected by the elimination of
these implicit conversions.

-- James
February 13, 2007
On Mon, 12 Feb 2007 18:59:49 -0800, Walter Bright wrote:

> Derek Parnell wrote:
>> On Mon, 12 Feb 2007 16:03:14 -0800, Andrei Alexandrescu (See Website For
>> Email) wrote:
>> 
>>> http://erdani.org/d-implicit-conversions.pdf
>> 
>>> Did I forget something?
>> 
>> Characters are not numbers.
> 
> That's an enticing point of view, and it sounds good. But Pascal has that view, and my experience with it is it's one of the reasons Pascal sucks.
> 
> Examples:
> 
> 1) converting text <=> integers
> 2) converting case
> 3) doing compression/encryption code
> 4) using characters as indices (isspace() for example)
> 
> Take away the implicit conversions, and such code gets littered with ugly casts.

D has a neat property sub-system already. For example, it is used to get at the underlying implementation data for arrays. So why not call a spade a spade and stop encouraging bugs? You have recently done this with the implicit conversion of arrays to pointers. If characters had a property called, for example, ".numval" then ugly casts would not be needed *and* such special character usage would be documented.

In the examples above, (1) and (2) are really far too complex in the Unicode world to be done by simple arithmetic on the implementation value of a character. They really need table lookups or similar to be done well. As you know, not all strings are ASCII.

Compression/encryption is best done using unsigned bytes so I would cast the 'string' to that. And by doing so, it highlights to the code reader that something special is going on here.

   ubyte[] res = encrypt(cast(ubyte[]) stringdata);

Note the result of encryption/compression is most certainly not going to be a valid UTF string so a ubyte[] would be a better choice.

Finally, the fourth example lends itself to the .numval property very nicely ...

  ulong a = char_prop[ somechar.numval ];


If our aim is to make writing and reading D code as easy as possible, while also helping the coder to implement their algorithms correctly, then the compiler should at least highlight inappropriate implicit conversions such as ...

   return lowchar + 'A' - 'a';

If one really feels they must do this, then at least let the code reader know that this is odd.

  return lowchar + 'A'.numval - 'a'.numval;
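Here is a rough sketch of the ".numval" idea. It is written in C++ rather than D because it needs a user-defined type. The sketch defines a character class with no implicit conversion to or from integers, so the code point is reachable only through an explicit numval() member, which stands in for the proposed property. The names Char, numval, is_space_ascii and ascii_to_upper are all hypothetical. The case mapping is ASCII-only; as noted above, real Unicode case conversion needs tables.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical strong character type (not a real D or C++ library type).
// No implicit conversion to or from integers; numval() plays the role of
// the proposed ".numval" property.
class Char {
public:
    explicit constexpr Char(char32_t c) : value_(c) {}
    constexpr std::uint32_t numval() const { return value_; }
    friend constexpr bool operator==(Char a, Char b) { return a.value_ == b.value_; }

private:
    char32_t value_;
};

// Example 4 from the thread: a character used as a table index.
// The table covers ASCII only, so the range is checked before indexing.
bool is_space_ascii(Char c) {
    static const std::array<bool, 128> table = []() -> std::array<bool, 128> {
        std::array<bool, 128> t{};  // all false
        for (char ch : {' ', '\t', '\n', '\v', '\f', '\r'})
            t[static_cast<unsigned char>(ch)] = true;
        return t;
    }();
    const std::uint32_t n = c.numval();
    return n < table.size() && table[n];
}

// Example 2 from the thread, ASCII only: the arithmetic is spelled out
// through numval(), so a reader can see something unusual is going on.
Char ascii_to_upper(Char c) {
    const std::uint32_t n = c.numval();
    if (n >= Char(U'a').numval() && n <= Char(U'z').numval())
        return Char(n - Char(U'a').numval() + Char(U'A').numval());
    return c;
}
```

Under this design, an expression such as `Char(U'a') + 1` simply does not compile. Every place that treats a character as a number has to say numval() explicitly.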

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
13/02/2007 2:27:39 PM
February 13, 2007
James Dennett wrote:
> Walter Bright wrote:
>> Derek Parnell wrote:
>>> On Mon, 12 Feb 2007 16:03:14 -0800, Andrei Alexandrescu (See Website For
>>> Email) wrote:
>>>
>>>> http://erdani.org/d-implicit-conversions.pdf
>>>  
>>>> Did I forget something?
>>> Characters are not numbers. 
>> That's an enticing point of view, and it sounds good. But Pascal has
>> that view, and my experience with it is it's one of the reasons Pascal
>> sucks.
>>
>> Examples:
>>
>> 1) converting text <=> integers
>> 2) converting case
>> 3) doing compression/encryption code
>> 4) using characters as indices (isspace() for example)
>>
>> Take away the implicit conversions, and such code gets littered with
>> ugly casts.
> 
> I don't find that; for reasons we don't need to go into
> (except to say that I'm glad C++09 will have better Unicode
> support than C++03), I've been using a separate type for
> characters in a significant body of C++ code, and find very
> little need for casts.  Certainly not enough to dispense
> with the advantages of type safety.  When the code gets
> low level enough to need integral values, I don't mind
> doing the conversion manually as there will typically be
> a need to handle byte ordering issues or similar too.  But
> that's just in the cases I've seen in the last {mumble}
> years.
> 
> The examples you give are real, make up a tiny fraction
> of code that handles characters, and aren't, in my experience,
> significantly adversely affected by the elimination of
> these implicit conversions.

I agree. To add insult to injury, the inverse automated conversion would allow me to call toupper(a * b /c) without a cast in sight. What the hell is that needed for? Dammit.

Andrei
February 13, 2007
Walter Bright wrote:
> Derek Parnell wrote:
>> Characters are not numbers. 
> 
> That's an enticing point of view, and it sounds good. But Pascal has that view, and my experience with it is it's one of the reasons Pascal sucks.
> 
> Examples:
> 
> 1) converting text <=> integers
> 2) converting case

Neither are pointers numbers, but
	&a - &b
yields a usable number.  So long as
	x += c - 'a'
works, I don’t care if
	'a' * '3'
breaks.

> 4) using characters as indices (isspace() for example)

Is there a way to declare the index type of an array?

--Joel
February 13, 2007
Joel C. Salomon wrote:
> Walter Bright wrote:
>> Derek Parnell wrote:
>>> Characters are not numbers. 
>>
>> That's an enticing point of view, and it sounds good. But Pascal has that view, and my experience with it is it's one of the reasons Pascal sucks.
>>
>> Examples:
>>
>> 1) converting text <=> integers
>> 2) converting case
> 
> Neither are pointers numbers, but
>     &a - &b
> yields a usable number.  So long as
>     x += c - 'a'
> works, I don’t care if
>     'a' * '3'
> breaks.

Thank $DEITY. If nobody made this point before I read all of this subthread I would have to write a long post explaining this.

Let me reiterate that: characters are to integers as pointers are to integers.
The difference between two pointers is an integer, and you can add integers to pointers. These are the only arithmetic operations allowed on pointers.
The same should hold if you substitute 'character' for 'pointer' everywhere in the previous two sentences.


Now, a short comment for each of the cases:

Walter Bright wrote:
> Examples:
>
> 1) converting text <=> integers

I don't see any reason why disallowing conversions from characters to integers would prevent one from adding integers to, or subtracting them from, characters.
So (for c of type char/wchar/dchar) c - '0' can still be an integer, for example.
But it makes absolutely no sense to be able to say c + '0'. Or c * '0'.

> 2) converting case

As above, (c - 'A') + 'a' can still be allowed. (c - 'A') is an integer, add 'a' to get a character again.

> 3) doing compression/encryption code

These should probably use void[] for input and ubyte[] for output.

> 4) using characters as indices (isspace() for example)

If you're using Unicode this is a bad idea anyway. Except perhaps if you use a sparse associative array, in which case this isn't a problem anyway.

If you insist on using a regular array (and make sure the character value is suitably small) you don't necessarily have to use a cast, you can also subtract '\0' if you prefer.
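The pointer-style rules above can be sketched as a small C++ class. The class is hypothetical and not an existing D or C++ type. Only two kinds of arithmetic are defined: Char - Char, which gives an int, and Char +/- int, which gives a Char. As a result, c - '0', (c - 'A') + 'a' and c - '\0' all work, while c + '0' and c * '0' fail to compile. The helper functions assume ASCII input and do no validation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical character type following the pointer analogy:
//   Char - Char -> int   (distance between two characters)
//   Char + int  -> Char  (also int + Char and Char - int)
// There is no Char + Char, Char * Char, etc., so those fail to compile.
class Char {
public:
    explicit constexpr Char(char32_t c) : value_(c) {}
    friend constexpr bool operator==(Char a, Char b) { return a.value_ == b.value_; }
    friend constexpr std::int32_t operator-(Char a, Char b) {
        return static_cast<std::int32_t>(a.value_) - static_cast<std::int32_t>(b.value_);
    }
    friend constexpr Char operator+(Char c, std::int32_t n) {
        return Char(static_cast<char32_t>(static_cast<std::int32_t>(c.value_) + n));
    }
    friend constexpr Char operator+(std::int32_t n, Char c) { return c + n; }
    friend constexpr Char operator-(Char c, std::int32_t n) { return c + (-n); }

private:
    char32_t value_;
};

// Case 1: text -> integer using only Char - Char.
// Assumes s is null-terminated and holds only the digits 0-9.
std::int32_t parse_decimal(const char32_t* s) {
    std::int32_t result = 0;
    for (; *s; ++s) result = result * 10 + (Char(*s) - Char(U'0'));
    return result;
}

// Case 2 (ASCII only): (c - 'A') is an int, and int + Char is a Char.
Char ascii_to_lower(Char c) {
    const Char A(U'A'), Z(U'Z');
    if (c - A >= 0 && Z - c >= 0) return (c - A) + Char(U'a');
    return c;
}

// Case 4: subtracting '\0' yields the code point as an int, usable as an
// array index once the caller has checked that it is in range.
std::int32_t code_of(Char c) { return c - Char(U'\0'); }
```

As a design note, the constructor is explicit, so an int can never be silently turned into a Char. That is what keeps Char - int from being confused with Char - Char.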


Back to Joel:
>> 4) using characters as indices (isspace() for example)
> 
> Is there a way to declare the index type of an array?

Only if you use an associative array.
February 13, 2007
Frits van Bommel wrote:
> Joel C. Salomon wrote:
>> Walter Bright wrote:
>>> Derek Parnell wrote:
>>>> Characters are not numbers. 
>>>
>>> That's an enticing point of view, and it sounds good. But Pascal has that view, and my experience with it is it's one of the reasons Pascal sucks.
>>>
>>> Examples:
>>>
>>> 1) converting text <=> integers
>>> 2) converting case
>>
>> Neither are pointers numbers, but
>>     &a - &b
>> yields a usable number.  So long as
>>     x += c - 'a'
>> works, I don’t care if
>>     'a' * '3'
>> breaks.
> 
> Thank $DEITY. If nobody made this point before I read all of this subthread I would have to write a long post explaining this.
> 
> Let me reiterate that: characters are to integers as pointers are to integers.
> The difference between two pointers is an integer, and you can add integers to pointers. These are the only arithmetic operations allowed on pointers.
> The same should hold if you substitute 'character' for 'pointer' everywhere in the previous two sentences.
> 
> 
> Now, a short comment for each of the cases:
> 
> Walter Bright wrote:
>  > Examples:
>  >
>  > 1) converting text <=> integers
> 
> I don't see any reason why disallowing conversions from characters to integers would prevent one from adding integers to, or subtracting them from, characters.
> So (for c of type char/wchar/dchar) c - '0' can still be an integer, for example.
> But it makes absolutely no sense to be able to say c + '0'. Or c * '0'.
> 
>  > 2) converting case
> 
> As above, (c - 'A') + 'a' can still be allowed. (c - 'A') is an integer, add 'a' to get a character again.
> 
>  > 3) doing compression/encryption code
> 
> These should probably use void[] for input and ubyte[] for output.
> 
>  > 4) using characters as indices (isspace() for example)
> 
> If you're using Unicode this is a bad idea anyway. Except perhaps if you use a sparse associative array, in which case this isn't a problem anyway.
> 
> If you insist on using a regular array (and make sure the character value is suitably small) you don't necessarily have to use a cast, you can also subtract '\0' if you prefer.
> 
> 
> Back to Joel:
>>> 4) using characters as indices (isspace() for example)
>>
>> Is there a way to declare the index type of an array?
> 
> Only if you use an associative array.

I think these are great ideas that could help us rethink the whole character handling business.


Andrei
February 13, 2007
Derek Parnell wrote:
> On Mon, 12 Feb 2007 18:59:49 -0800, Walter Bright wrote:
> 
>> Derek Parnell wrote:
>>> On Mon, 12 Feb 2007 16:03:14 -0800, Andrei Alexandrescu (See Website For
>>> Email) wrote:
>>>
>>>> http://erdani.org/d-implicit-conversions.pdf
>>>  
>>>> Did I forget something?
>>> Characters are not numbers. 
>> That's an enticing point of view, and it sounds good. But Pascal has that view, and my experience with it is it's one of the reasons Pascal sucks.
>>
>> Examples:
>>
>> 1) converting text <=> integers
>> 2) converting case
>> 3) doing compression/encryption code
>> 4) using characters as indices (isspace() for example)
>>
>> Take away the implicit conversions, and such code gets littered with ugly casts.
> 
> D has a neat property sub-system already. For example, it is used to get at
> the underlying implementation data for arrays. So why not call a spade a
> spade and stop encouraging bugs? You have recently done this with the
> implicit conversion of arrays to pointers. If characters had a
> property called, for example, ".numval" then ugly casts would not be needed
> *and* such special character usage would be documented.
> 

I was thinking pretty much the same. That (the ".numval" property), together with the idea that Joel Salomon presented (that we could still allow subtraction of characters without casts), would neatly solve any problems with disallowing the implicit conversion of char to numbers.

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
February 13, 2007
Andrei Alexandrescu (See Website For Email) wrote:
> Walter Bright wrote:
>> Andrei Alexandrescu (See Website For Email) wrote:
>>> So the way things should be is: all meaning-preserving integral promotions should be kept; then, all implicit integral->floating point promotions should be severed; then, all implicit floating point->complex should go.
>>>
>>> Right?
>>
>> Yes. Also disallow implicit conversion of Object to void*.
> 
> How iz zis:
> 
> http://erdani.org/d-implicit-conversions.pdf
> 
> I put Object and void* in there for your sake :o).
> 
> Did I forget something?
> 
> 
> Andrei

That chart looks nice. Perhaps it should be put in the official D doc (when it is finished)?

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
February 14, 2007
Andrei Alexandrescu (See Website For Email) wrote:
> Henning Hasemann wrote:
>> I know this is a more general question as it applies to C and C++ as well,
>> but somewhere I have to ask and actually D is what I'm coding in:
>>
>> Should one try to use uint in favor of int whenever one knows for sure the value
>> won't be negative? That would be a bit more expressive but on the other hand
>> sometimes leads to type problems.
>> For example, when having things like this:
>>
>> T min(T)(T a, T b) {
>>   return a < b ? a : b;
>> }
>>
>> Here you would need to cast values so they share a common type.
>>
>> How do you code? Do you use uint whenever it suitably reflects the data to
>> store (e.g. an x-y position on the screen) or only when necessary?
> 
> Current D botches quite a few of the arithmetic conversions. Basically all conversions that may lose value, meaning, or precision should not be allowed implicitly. Walter is willing to fix D in accordance to that rule, which would yield an implicit conversion graph as shown in:
> 
> http://erdani.org/d-implicit-conversions.pdf
> 
> Notice that there is no arrow e.g. between int and uint (loss of meaning), or between int and float (loss of precision). But there is an arrow from int and uint to double, because double is able to represent them faithfully.
> 
> If we are nice, we may convince Walter to implement that soon (maybe in 1.006?) but it must be understood that the tighter rules will prompt changes in existing code.
> 
> To answer your question, with the new rules in hand, using unsigned types will considerably increase your expressiveness and your ability to detect bugs statically. Also, by the new rules ordering comparisons between mixed-sign types will be disallowed.
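For comparison, here is how C++ behaves today with some of the conversions the chart forbids. This is a minimal sketch that assumes IEEE 754 float and double and a 32-bit int. It shows three things: int -> float rounds 16777217 (2^24 + 1), int -> double is exact, and a mixed-sign < silently turns -1 into a huge unsigned value. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumes IEEE 754 float (24-bit significand), double (53-bit significand)
// and a 32-bit int.

// int -> float can lose precision: 16777217 (2^24 + 1) is the smallest
// positive integer a float cannot represent, so it is rounded.
double via_float(std::int32_t x) { return static_cast<float>(x); }

// int -> double is exact for every 32-bit int and unsigned int,
// which is why an int -> double arrow is safe.
double via_double(std::int32_t x) { return static_cast<double>(x); }

// Mixed-sign comparison: the usual arithmetic conversions turn the int
// into an unsigned value first, so -1 compares as 4294967295.
bool mixed_sign_less(std::int32_t a, std::uint32_t b) { return a < b; }
```

Under these assumptions, via_float(16777217) yields 16777216.0 while via_double keeps 16777217.0, and mixed_sign_less(-1, 1u) is false. The last result is the kind of silent surprise that disallowing mixed-sign ordering comparisons would catch at compile time.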

When this change occurs (since it seems like it will) is there any chance that the default opEquals method in Object could have its signature changed from:

    int opEquals(Object o);

to:

    bool opEquals(Object o);

Not doing so would disallow the following seemingly legal statement:

    bool c = a == b;

This issue has come up before, and it was shown that the bool rval case can be made no less efficient than the int rval case, so the only remaining problem is all the code that would break.  However, since a lot of code will probably break anyway with the tighter implicit conversion rules, perhaps it would be a good time to address this issue as well?


Sean