May 19, 2022
On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:
> I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.


Does that mean you prefer:

    a = 3 + cast(int)(b < c) * 5;

? If so, I don't see what is gained by that.
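
For reference, the idiom as a complete, compilable sketch (the values are illustrative); in D, bool converts implicitly to int, with false as 0 and true as 1:

    void main()
    {
        int b = 2, c = 7;

        int a1 = b < c ? 8 : 3;            // branching form
        int a2 = 3 + (b < c) * 5;          // branchless: bool promotes to int
        int a3 = 3 + cast(int)(b < c) * 5; // same thing, conversion spelled out

        assert(a1 == a2 && a2 == a3);      // all three yield 8 here
    }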
May 19, 2022
On 5/19/2022 12:46 AM, ab wrote:
> This can be solved by a cast with explicit source and destination type, e.g.

This can indeed work, but when people complain about adding attributes (valid complaints), how are they going to react to having to do this?
May 19, 2022
On 5/19/2022 12:57 AM, bauss wrote:
> On Thursday, 19 May 2022 at 03:46:42 UTC, Walter Bright wrote:
>>
>> Casts are a common source of bugs, not correctness. This is because a cast is a forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.
> 
> I'd argue that implicit casts are more so in some cases.

D added some constraints to C's rules to prevent loss of data with implicit conversions. I don't see how D's implicit casts are a dangerous source of bugs.
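
A small sketch of what those constraints look like in practice (the variable names are mine):

    void main()
    {
        int i = 1000;
        long l = i;             // widening conversions stay implicit, as in C
        // short s = i;         // error: possible loss of data, so D rejects it
        short s = i & 0xff;     // fine: value range propagation proves it fits
        s = cast(short)i;       // a lossy conversion must be spelled out
    }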


> And also, you shouldn't really do arithmetic operations on chars anyway, at least not with Unicode, and D is supposed to be a Unicode language.

It turns out that for performance reasons, you definitely want to treat UTF-8 as individual code units. Autodecode taught us that the hard way.


> Upper-casing in Unicode is not as simple as an addition, because the rules for doing so are language-specific.

I'm painfully aware that the Unicode Consortium made it impossible to do "correct" Unicode without a megabyte library.

> Even with ASCII you can't just rely on a mathematical computation, because not all characters can change case, such as symbols.

Yes, you can. I posted the code in another post in this thread. ASCII hasn't changed in my professional lifetime, and I seriously doubt it will change in yours.

> If D is to ever attract more users, then it must not surprise new users.

The only problem we've had with D chars is autodecoding, which ironically does what you propose: treat everything as Unicode code points rather than code units.

It's a great idea, but it simply does not work, and it took us years to become convinced of that.
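
To make the code unit/code point distinction concrete (the string is just an example):

    void main()
    {
        import std.range : walkLength;

        string s = "Åmål";
        assert(s.length == 6);     // .length counts UTF-8 code units
        assert(s.walkLength == 4); // range iteration autodecodes to code points
    }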
May 19, 2022
On 5/19/2022 1:05 AM, bauss wrote:
> On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:
>>
>> There's nothing wrong with:
>>
>>     if ('A' <= c && c <= 'Z')
>>         c = c | 0x20;
>>
> 
> There is: this assumes that the character is ASCII and not Unicode.

It does not assume it; it tests whether it would be valid.


> What about say 'Å' -> 'å'?
> 
> It won't work for that.

I know. And for many applications (like dev tools), it is fine.
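
For concreteness, here is the quoted snippet as a runnable sketch showing the behavior under discussion (the function name is mine):

    char asciiToLower(char c)
    {
        if ('A' <= c && c <= 'Z')
            c = c | 0x20;
        return c;
    }

    unittest
    {
        assert(asciiToLower('D') == 'd'); // ASCII letters are converted
        assert(asciiToLower('?') == '?'); // symbols fail the range test, pass through
        // 'Å' is two UTF-8 code units, both >= 0x80, so neither unit
        // matches the range test and the character is left unchanged.
    }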
May 19, 2022
On Thursday, 19 May 2022 at 18:20:26 UTC, Walter Bright wrote:
> On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:
>> I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.
>
>
> Does that mean you prefer:
>
>     a = 3 + cast(int)(b < c) * 5;
>
> ? If so, I don't see what is gained by that.

    a = 3 + int(b < c) * 5;

avoids forcing it with an explicit cast, with a lower risk of writing a bug (or of creating one later in a refactor).
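
A sketch of the difference (names and values are hypothetical); constructor-style conversion only accepts what would already convert implicitly:

    void main()
    {
        int b = 1, c = 2;
        int a = 3 + int(b < c) * 5; // fine: bool -> int is implicit
        assert(a == 8);

        double d = 1.5;
        // int e = int(d);          // error: double doesn't implicitly convert to int
        int f = cast(int)d;         // compiles, but silently truncates to 1
    }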
May 19, 2022
On Thursday, 19 May 2022 at 18:35:42 UTC, Walter Bright wrote:
> On 5/19/2022 1:05 AM, bauss wrote:
>> On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:
>>>
>>> There's nothing wrong with:
>>>
>>>     if ('A' <= c && c <= 'Z')
>>>         c = c | 0x20;
>>>
>> 
>> There is: this assumes that the character is ASCII and not Unicode.
>
> It does not assume it; it tests whether it would be valid.

"However, the assumption that setting bit 5 of the representation will convert uppercase letters to lowercase is not valid for EBCDIC." [1]

[1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters?
    https://ogeek.cn/qa/?qa=669486/
May 19, 2022

On 5/19/22 2:20 PM, Walter Bright wrote:
> On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:
>> I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.
>
> Does that mean you prefer:
>
>     a = 3 + cast(int)(b < c) * 5;
>
> ? If so, I don't see what is gained by that.

No, I find that nearly unreadable. I prefer the original:

    a = b < c ? 8 : 3;

And let the compiler come up with whatever funky stuff it wants to in order to make it fast.

-Steve

May 19, 2022
On Thu, May 19, 2022 at 10:33:14AM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> On 5/19/22 12:35 AM, Walter Bright wrote:
[...]
> > The APL language relies on bools being integers so conditional operations can be carried out without branching :-)
> > 
> > I.e.
> >      a = (b < c) ? 8 : 3;
> > 
> > becomes:
> > 
> >      a = 3 + (b < c) * 5;   // I know this is not APL syntax
> > 
> > That works in D, too!
> 
> I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.
[...]

IME, gcc and ldc2 are well able to convert the above ?: expression into the latter, without uglifying the code.  Why are we promoting (or even allowing) this kind of ugly code just because dmd's optimizer is so lackluster you have to manually spell things out this way?


T

-- 
Without outlines, life would be pointless.
May 19, 2022
On 5/19/22 12:13, kdevel wrote:
> On Thursday, 19 May 2022 at 18:35:42 UTC, Walter Bright wrote:
>> On 5/19/2022 1:05 AM, bauss wrote:
>>> On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:
>>>>
>>>> There's nothing wrong with:
>>>>
>>>>     if ('A' <= c && c <= 'Z')
>>>>         c = c | 0x20;
>>>>
>>>
>>> There is: this assumes that the character is ASCII and not Unicode.
>>
>> It does not assume it; it tests whether it would be valid.
>
> "However, the assumption that setting bit 5 of the representation will
> convert uppercase letters to lowercase is not valid for EBCDIC." [1]
>
> [1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters?
>      https://ogeek.cn/qa/?qa=669486/

In D, char is a UTF-8 code unit, and ASCII is a subset of UTF-8. Walter's code above is valid without making any ASCII assumption.
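
As a quick sketch of why that holds: every code unit of a multi-byte UTF-8 sequence has its high bit set, so it can never satisfy an ASCII range test.

    unittest
    {
        // 'Å' is encoded as the code units 0xC3 0x85.
        foreach (char u; "Å")   // iterates code units, no decoding
            assert(u >= 0x80);  // never lands in 'A' .. 'Z'
    }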

Ali

May 19, 2022
On 5/19/2022 12:13 PM, kdevel wrote:
> [1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters?
>      https://ogeek.cn/qa/?qa=669486/

No. But D does.