May 18, 2022
On 5/18/2022 5:57 PM, H. S. Teoh wrote:
> How is that any different from the current situation where arithmetic
> involving short ints require casts all over the place?  Even things
> like this require a cast:
> 
> 	short s = 123;
> 	//s = -s; // NG
> 	s = cast(short)-s; // required excess verbiage

I generally avoid using shorts. I agree the situation is hardly ideal, but there is no ideal way I've ever seen. The various schemes just shift the deck chairs around.
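
To make the promotion rule under discussion concrete, here is a minimal sketch. The helper names `negateShort` and `negateInt` are invented for illustration; the point is only that negating a `short` yields an `int`, so narrowing back needs a cast, while the `int` version needs none.

```d
import std.stdio;

// Hypothetical helper: -s is computed as an int, so the result has to be
// narrowed back to short explicitly.
short negateShort(short s)
{
    return cast(short) -s;
}

// With int there is no narrowing step and hence no cast.
int negateInt(int i)
{
    return -i;
}

void main()
{
    writeln(negateShort(123)); // -123
    writeln(negateInt(123));   // -123
}
```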


> It got so out of hand that I wrote nopromote.d, specifically to "poison"
> expressions involving short ints with a custom struct with overloaded
> ops that always truncate, just so I don't have to litter my code with
> casts in just about every expression involving short ints.

The only reason to ever use shorts is to save memory in a frequently allocated data structure. Short local variables do not save memory or time (in fact, they're larger and slower). If you're doing all these casts, perhaps look into using ints instead.
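
For readers who have not seen the wrapper approach mentioned in the quoted text, a rough sketch of the idea follows. This is not the actual nopromote.d; the struct name `Trunc` and its layout are assumptions made up here. It only shows how overloaded operators can truncate the result so the use site needs no cast.

```d
import std.stdio;

// Hypothetical stand-in for the idea behind nopromote.d; not its actual code.
struct Trunc(T)
{
    T value;

    // Unary operators: compute in the promoted type, then truncate back to T.
    Trunc opUnary(string op)() const
        if (op == "-" || op == "+" || op == "~")
    {
        return Trunc(cast(T) mixin(op ~ "value"));
    }

    // Binary operators with another Trunc!T, truncating the result the same way.
    Trunc opBinary(string op)(Trunc rhs) const
    {
        return Trunc(cast(T) mixin("value " ~ op ~ " rhs.value"));
    }
}

void main()
{
    auto s = Trunc!short(123);
    s = -s;                  // no cast at the use site
    s = s + Trunc!short(1);
    writeln(s.value);        // -122
}
```

The casts have not gone away; they have moved into one place. Whether that is preferable to simply using `int` locals, as suggested above, is a judgment call.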


> In the case of char + int arithmetic, my opinion is that usually people
> do *not* (or *should* not) do char arithmetic directly -- with Unicode,
> it makes much less sense than the bad ole days of ASCII. These days,
> you'd call one of the std.uni functions for proper case mapping instead
> of a slipshod hack job of adding or subtracting some magic constant
> (which is wrong in anything except ASCII anyway).  In today's day and
> age, strings are best treated as opaque data that are manipulated by
> properly-implemented string functions in the standard library.  Having a
> few extra char/int casts in std.uni isn't the end of the world.  It
> shouldn't usually be done in user code anyway.  (And having to write
> lots of casts may motivate people to actually use proper string
> manipulation functions instead of winging it themselves with wrong
> implementations involving char arithmetic.)

There's nothing wrong with:

    if ('A' <= c && c <= 'Z')
        c = c | 0x20;

D doesn't have C's problems with optionally signed chars, 10 bit chars, EBCDIC, RADIX50 and other dead technologies.
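
As a self-contained sketch of the snippet above (the function name `toLowerAscii` is made up for this example), the range check limits the bit trick to the 26 ASCII capitals, so every other code unit passes through unchanged:

```d
import std.stdio;

// Hypothetical helper: lowercases ASCII 'A'..'Z' only, leaves everything else alone.
char toLowerAscii(char c)
{
    if ('A' <= c && c <= 'Z')
        c |= 0x20; // set bit 5, the difference between 'A' and 'a'
    return c;
}

void main()
{
    char[] text = "Hello, D!".dup;
    foreach (ref c; text)
        c = toLowerAscii(c);
    writeln(text); // hello, d!
}
```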
May 18, 2022
On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
> But I have little hope for it, as Walter treats a boolean as an integer.

They *are* integers.

The APL language relies on bools being integers so conditional operations can be carried out without branching :-)

I.e.
    a = (b < c) ? 8 : 3;

becomes:

    a = 3 + (b < c) * 5;   // I know this is not APL syntax

That works in D, too!

Branchless code is a thing, it is used in GPUs, and in security code to make it resistant to timing attacks.

You'll also see this in the SIMD instructions, although they set all bits instead of just 1, because & is faster than *.

    a = 3 + (-(b < c) & 5);
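
The three forms can be compared side by side in D. This is only a sketch: the function names are invented, and the explicit `cast(int)` before the negation is a precaution added here, on the assumption that the compiler may reject unary minus applied directly to a `bool`.

```d
import std.stdio;

// Reference version with a conditional.
int selectBranch(int b, int c) { return (b < c) ? 8 : 3; }

// Multiplication form: the comparison result is used as 0 or 1.
int selectMul(int b, int c) { return 3 + (b < c) * 5; }

// Mask form: 0 or 1 is turned into 0 or all bits set, then ANDed with 5.
int selectMask(int b, int c)
{
    int mask = -cast(int)(b < c);
    return 3 + (mask & 5);
}

void main()
{
    writeln(selectBranch(1, 2), " ", selectMul(1, 2), " ", selectMask(1, 2)); // 8 8 8
    writeln(selectBranch(2, 1), " ", selectMul(2, 1), " ", selectMask(2, 1)); // 3 3 3
}
```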
May 19, 2022
On Thursday, 19 May 2022 at 03:46:42 UTC, Walter Bright wrote:
> On 5/18/2022 5:55 PM, max haughton wrote:
>> People do indeed (I'd question whether it's routine in a good D program, I'd flag it in code review) manipulate characters as integers, but I think there's something to be said for forcing people to go char -> suitable integer -> char.
>
> Casts are a common source of bugs, not correctness. This is because it is forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.

This can be solved by a cast with explicit source and destination type, e.g.


auto cast_from(From, To, Real)(Real a)
{
	// Reject the call at compile time if the argument's type is no longer From.
	static assert(is(From == Real), "Wrong types");
	return cast(To) a;
}

void main()
{
	import std.stdio;

	short a = 1;
	int b = cast_from!(short, int)(a);

	bool c = 1;
	// int d = cast_from!(short, int)(c);	// compile time error: c is bool, not short

	writeln("Test: ", b);
}

May 19, 2022
On Thursday, 19 May 2022 at 03:46:42 UTC, Walter Bright wrote:
>
> Casts are a common source of bugs, not correctness. This is because it is forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.

I'd argue that implicit casts are more so in some cases.

This is one of those cases.

Also, you shouldn't really do arithmetic operations on chars anyway, at least not with Unicode, and D is supposed to be a Unicode language.

Upper-casing in unicode is not as simple as an addition, because the rules for doing so are language specific.

Changing case in one language isn't always the same as in another language.

Even with ASCII you can't just rely on a mathematical computation, because not all characters can change case, such as symbols.

That's why string/char manipulation should __always__ be a library solution, not a user-code solution. The library should handle all these rules.
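
As an illustration of the library route, Phobos provides `std.uni.toLower` and `std.uni.toUpper`. The sample word below is just an example chosen here, and this sketch makes no claim about language-specific rules being covered.

```d
import std.stdio;
import std.uni : toLower, toUpper;

void main()
{
    string word = "\u00C5ngstr\u00F6m"; // "Ångström"

    // The library maps non-ASCII letters as well as ASCII ones.
    writeln(word.toLower);
    writeln(word.toUpper);

    assert(word.toLower == "\u00E5ngstr\u00F6m");

    dchar z = 'z';
    assert(toUpper(z) == 'Z');
}
```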

The user should absolutely not be able to (or have to) mess this up by accident, unless they really, really want to.

Sure, a char might be represented by an integer type, but so is every single data type you can ever think of, since they all convert to bytes.

If D is to ever attract more users, then it must not surprise new users.
May 19, 2022
On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:
>
> There's nothing wrong with:
>
>     if ('A' <= c && c <= 'Z')
>         c = c | 0x20;
>

There is: this assumes that the character is ASCII and not Unicode.

What about say 'Å' -> 'å'?

It won't work for that.

So your code is wrong in D because D isn't an ASCII language, but a Unicode language.

As specified by the spec:

char	'\xFF'	unsigned 8 bit (UTF-8 code unit)
wchar	'\uFFFF'	unsigned 16 bit (UTF-16 code unit)
dchar	'\U0000FFFF'	unsigned 32 bit (UTF-32 code unit)
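
To see what "code unit" means for the 'Å' example, here is a small sketch. It writes the character as the escape `\u00C5` so the result does not depend on how the source file is encoded.

```d
import std.range : walkLength;
import std.stdio;

void main()
{
    string  s = "\u00C5";  // 'Å' as UTF-8
    wstring w = "\u00C5"w; // 'Å' as UTF-16
    dstring d = "\u00C5"d; // 'Å' as UTF-32

    writeln(s.length);     // 2 code units
    writeln(w.length);     // 1 code unit
    writeln(d.length);     // 1 code unit
    writeln(s.walkLength); // 1 code point

    // Both UTF-8 code units are >= 0x80, so an ASCII range check such as
    // 'A' <= c && c <= 'Z' never matches them.
    assert(s[0] >= 0x80 && s[1] >= 0x80);
}
```

So a per-`char` ASCII check leaves 'Å' untouched rather than converting it, which is the "won't work for that" behavior described above.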
May 19, 2022
On Thursday, 19 May 2022 at 00:27:24 UTC, Walter Bright wrote:
> People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.

It is indeed, but let's be honest: having builtin char, wchar, and dchar is only useful for overload resolution and string literals.

ubyte c = 's'; // OK
ubyte[] a = "s".dup; // NG

Without the string literal problem, there's only the overload resolution one, and for that one the builtin character types could be library types, e.g. structs wrapping ubyte, ushort, uint.
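
A rough sketch of what such a library type could look like. The names `Char8` and `describe` are hypothetical; nothing like this is claimed to exist in Phobos.

```d
import std.stdio;

// Hypothetical library character type: a distinct struct around a ubyte.
struct Char8
{
    ubyte value;
}

// Overload resolution can now tell "byte" and "character" apart by type alone.
string describe(ubyte b) { return "integer"; }
string describe(Char8 c) { return "character"; }

void main()
{
    ubyte raw = 's';
    auto ch = Char8('s');
    writeln(describe(raw)); // integer
    writeln(describe(ch));  // character
}
```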

May 19, 2022
On 5/19/22 12:35 AM, Walter Bright wrote:
> On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
>> But I have little hope for it, as Walter treats a boolean as an integer.
>
> They *are* integers.
>
> The APL language relies on bools being integers so conditional operations can be carried out without branching :-)
>
> I.e.
>     a = (b < c) ? 8 : 3;
>
> becomes:
>
>     a = 3 + (b < c) * 5;   // I know this is not APL syntax
>
> That works in D, too!

I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.

-Steve

May 19, 2022
On Thursday, 19 May 2022 at 14:33:14 UTC, Steven Schveighoffer wrote:
> On 5/19/22 12:35 AM, Walter Bright wrote:
>> On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
>>> But I have little hope for it, as Walter treats a boolean as an integer.
>>
>> They *are* integers.
>>
>> The APL language relies on bools being integers so conditional operations can be carried out without branching :-)
>>
>> I.e.
>>     a = (b < c) ? 8 : 3;
>>
>> becomes:
>>
>>     a = 3 + (b < c) * 5;   // I know this is not APL syntax
>>
>> That works in D, too!
>
> I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.
>
> -Steve

That and you can have the underlying type without exposing it to the programmer. "Bools are integers", as opposed to bools not having a memory representation at all?

Basically, any discussion of these peephole optimizations (if this is more than just a nice-to-have) is a bit silly in an age where GCC and LLVM will both arrive at this kind of code anyway, because they want to eliminate branches like the plague (even if they couldn't do it in the first place, given that you'd need to tell them what a bool is).

May 19, 2022
On Thursday, 19 May 2022 at 14:33:14 UTC, Steven Schveighoffer wrote:
> On 5/19/22 12:35 AM, Walter Bright wrote:
>> On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
>>> But I have little hope for it, as Walter treats a boolean as an integer.
>>
>> They *are* integers.
>>
>> The APL language relies on bools being integers so conditional operations can be carried out without branching :-)
>>
>> I.e.
>>     a = (b < c) ? 8 : 3;
>>
>> becomes:
>>
>>     a = 3 + (b < c) * 5;   // I know this is not APL syntax
>>
>> That works in D, too!
>
> I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.
>
> -Steve

If you use bools as integer types in LLVM,

true + true

overflows and you basically get 0, because only 1 bit is read.
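
For contrast, a small D sketch of the two behaviors. The function names are invented, and the `& 1` merely imitates a one-bit result in ordinary D code; it is not LLVM IR.

```d
import std.stdio;

// In D the operands are promoted to int before the addition.
int addPromoted(bool a, bool b) { return a + b; }

// Keeping only the lowest bit imitates a one-bit wide result.
int addOneBit(bool a, bool b) { return (a + b) & 1; }

void main()
{
    writeln(addPromoted(true, true)); // 2
    writeln(addOneBit(true, true));   // 0
}
```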

May 19, 2022
On Thursday, 19 May 2022 at 04:35:45 UTC, Walter Bright wrote:
> On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
>> But I have little hope for it, as Walter treats a boolean as an integer.
>
> They *are* integers.
> 

I always thought of them as integers. Yesterday I was adding some new features to adam_d_ruppe's IRC client and I did:

   auto pgdir = (ev.key == KeyboardEvent.Key.PageDown) - (ev.key == KeyboardEvent.Key.PageUp);

That gives -1, 0 or 1, and the next action is chosen according to the input given by the user.
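
A self-contained version of the same trick, for anyone who wants to try it without the IRC client. The `Key` enum and `pageDirection` are stand-ins invented for this sketch, not the client's real types.

```d
import std.stdio;

// Hypothetical stand-in for the key codes used by the real client.
enum Key { PageUp, PageDown, Other }

// Each comparison yields a bool that is promoted to 0 or 1 in the subtraction.
int pageDirection(Key k)
{
    return (k == Key.PageDown) - (k == Key.PageUp);
}

void main()
{
    writeln(pageDirection(Key.PageDown)); // 1
    writeln(pageDirection(Key.PageUp));   // -1
    writeln(pageDirection(Key.Other));    // 0
}
```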

Matheus.