February 06, 2018
On Tuesday, 6 February 2018 at 00:18:08 UTC, Jonathan M Davis wrote:
> On Monday, February 05, 2018 15:27:45 H. S. Teoh via Digitalmars-d wrote:
>> On Mon, Feb 05, 2018 at 01:56:33PM -0800, Walter Bright via Digitalmars-d wrote:
>> > The idea is a byte can be implicitly converted to a dchar, [...]
>>
>> This is the root of the problem.  Character types should never have been implicitly convertible to/from arithmetic integral types in the first place.
>
> +1
>
> Occasionally, it's useful, but in most cases, it just causes bugs - especially when you consider stuff like appending to a string.
>
> - Jonathan M Davis

I remember a fairly old defect, or maybe it was just a post in the Learn forum. Doing "string" ~ 0 would append '\0' to a string, because the int was auto-converted to a char. This still works today:

import std.stdio;
void main()
{
    string msg = "Hello" ~ 0 ~ " D" ~ '\0';
    writeln(msg);
    writeln(cast(ubyte[])msg);
    writeln(cast(ubyte[])"Hello D");
}
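
A minimal sketch of what the snippet above ends up storing, assuming the compiler still accepts `string ~ int` (the check is guarded with `__traits(compiles, ...)` in case it does not). The helper name `withExplicitNul` is made up for illustration; it builds the same bytes with an explicit character literal.

```d
import std.stdio : writeln;
import std.string : representation;

// Hypothetical helper: the same text, with the NUL spelled out as a char.
string withExplicitNul()
{
    return "Hello" ~ '\0' ~ " D";
}

void main()
{
    static if (__traits(compiles, "Hello" ~ 0))
    {
        // The literal 0 becomes the char '\0', not the text "0".
        string msg = "Hello" ~ 0 ~ " D";
        assert(msg == withExplicitNul());
        assert(msg.length == 8);     // 5 chars + one NUL + 2 chars
        writeln(msg.representation); // raw code units, including the 0
    }
    else
    {
        writeln("this compiler rejects string ~ int");
    }
}
```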
February 05, 2018
On 2/5/18 7:47 PM, Adam D. Ruppe wrote:
> On Tuesday, 6 February 2018 at 00:08:12 UTC, Steven Schveighoffer wrote:
>> I think the CPU has to do extra work to throw away that high bit, no?
> 
> No, the x86 has never had any trouble with this, and I don't think ARM does either (worst case you load it as int, then save it as byte).

But you are saving to an int, not a byte. Anyway, I don't really know the specifics of it. I'm not enough of an assembly buff to know the differences between what instructions are used or how much they cost.

-Steve
February 06, 2018
On Tuesday, 6 February 2018 at 01:07:21 UTC, Meta wrote:
> On Tuesday, 6 February 2018 at 00:18:08 UTC, Jonathan M Davis wrote:
>> On Monday, February 05, 2018 15:27:45 H. S. Teoh via Digitalmars-d wrote:
>>> On Mon, Feb 05, 2018 at 01:56:33PM -0800, Walter Bright via Digitalmars-d wrote:
>>> > The idea is a byte can be implicitly converted to a dchar, [...]
>>>
>>> This is the root of the problem.  Character types should never have been implicitly convertible to/from arithmetic integral types in the first place.
>>
>> +1
>>
>> Occasionally, it's useful, but in most cases, it just causes bugs - especially when you consider stuff like appending to a string.
>>
>> - Jonathan M Davis
>
> I remember a fairly old defect, or maybe it was just a post in the Learn forum. Doing "string" ~ 0 would append '\0' to a string, because the int was auto-converted to a char. This still works today:
>
> import std.stdio;
> void main()
> {
>     string msg = "Hello" ~ 0 ~ " D" ~ '\0';
>     writeln(msg);
>     writeln(cast(ubyte[])msg);
>     writeln(cast(ubyte[])"Hello D");
> }

Yeah, that functionality should be deprecated. I don't know how realistic that is, since there might be a lot of code that relies on the implicit conversion, but at the very least the conversion should be disabled when it is part of a concatenation with a string. One of the reasons for the "~" operator (IIRC) was to remove confusion with the "+" operator. This goes against that.

Maybe in the future this could be possible:

static assert("Hello " ~ 10 ~ " world" == "Hello 10 world");

It'd help with CTFE code, where I find myself having to convert integers to strings a lot. This is cleaner syntax and doesn't rely on std.conv.to, so nothing needs to be imported. I'm not sure whether DMD uses an intrinsic for that as well; I hope it does, but I can't be certain.
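
For comparison, here is a rough sketch of how the same compile-time string can be built today with std.conv. The names `viaTo` and `viaText` are only illustrative.

```d
import std.conv : text, to;

// Both initializers are evaluated at compile time (CTFE),
// because enum manifest constants require it.
enum viaTo   = "Hello " ~ 10.to!string ~ " world";
enum viaText = text("Hello ", 10, " world");

static assert(viaTo == "Hello 10 world");
static assert(viaText == "Hello 10 world");

void main() {}
```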
February 05, 2018
On 2/5/2018 3:18 PM, Timon Gehr wrote:
> Neither byte nor dchar are C types.

"byte" is a C "signed char". On Posix systems, a dchar maps to wchar_t, although wchar_t is a typedef not a distinct type. It's a bit complicated :-)
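
A small sketch of how that mapping can be inspected from D, assuming druntime's C bindings in `core.stdc`. It only asserts the sizes and prints whether `wchar_t` is literally `dchar` on the current platform, without claiming the result.

```d
import core.stdc.stddef : wchar_t;
import core.stdc.stdint : int8_t;
import std.stdio : writeln;

// D's byte is an 8-bit signed integer, like C's signed char.
static assert(is(int8_t == byte));
static assert(byte.sizeof == 1 && byte.min == -128);

// On Posix, wchar_t has the same width as dchar (32 bits).
version (Posix) static assert(wchar_t.sizeof == dchar.sizeof);

void main()
{
    writeln("wchar_t is dchar here: ", is(wchar_t == dchar));
}
```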


> The overloading rules are fine, but byte should not implicitly convert to char/dchar, and char should not implicitly convert to byte.

Maybe not, but casting back and forth between them is ugly. Pascal works this way, and it was one of the things I wound up hating about Pascal, all those ORD and CHR casts.

A reasonable case could be made for getting rid of all implicit conversions. But those are there for a reason - it makes writing code more natural and easy. It allows generic code to work without special casing. And the cost of it is sometimes you might make a mistake.
February 06, 2018
On Tuesday, 6 February 2018 at 02:30:03 UTC, Walter Bright wrote:
> Maybe not, but casting back and forth between them is ugly. Pascal works this way, and it was one of the things I wound up hating about Pascal, all those ORD and CHR casts.

It is a bit ironic hearing this from the guy who made a language where you have to cast a byte back to a byte! LOL
February 05, 2018
On 02/05/2018 04:21 PM, H. S. Teoh wrote:
> On Mon, Feb 05, 2018 at 09:20:16PM +0000, Nick Sabalausky via Digitalmars-d wrote:
>> But still, I thought we had value range propagation rules to avoid
>> this sort of nonsense when possible (such as the example above)?
> 
> VRP doesn't help when the code doesn't have compile-time known values,
> such as in the non-reduced code my example snippet was reduced from.
> 

Right, it wouldn't always get rid of the message, but I would think it should when it's known the value cannot be -128, such as the specific example you posted.
February 05, 2018
On Monday, February 05, 2018 18:30:03 Walter Bright via Digitalmars-d wrote:
> On 2/5/2018 3:18 PM, Timon Gehr wrote:
> > Neither byte nor dchar are C types.
>
> "byte" is a C "signed char". On Posix systems, a dchar maps to wchar_t, although wchar_t is a typedef not a distinct type. It's a bit complicated :-)
>
> > The overloading rules are fine, but byte should not implicitly convert to char/dchar, and char should not implicitly convert to byte.
>
> Maybe not, but casting back and forth between them is ugly. Pascal works this way, and it was one of the things I wound up hating about Pascal, all those ORD and CHR casts.
>
> A reasonable case could be made for getting rid of all implicit conversions. But those are there for a reason - it makes writing code more natural and easy. It allows generic code to work without special casing. And the cost of it is sometimes you might make a mistake.

In my experience, relatively little code needs to do arithmetic on characters. Some definitely does, but far more code does not, and that code ends up with bugs far too often because of integral types converting to characters. That's particularly true when stuff like string concatenation gets involved. I don't know how big a deal it is that characters can implicitly convert to integral types, and maybe that's okay, but I'm quite convinced that having integral types implicitly convert to characters was a mistake.

And as for generic code, my experience is that implicit conversions are an utter disaster there. It's way too easy to do something like have is(T : U) in your template constraint and have the code work great with a U but fail with types that implicitly convert to U. Ideally, all conversions would be done before the function is called, and if not, the conversion needs to be forced internally. Otherwise, you either get compilation errors with types that implicitly convert, or you get subtle bugs.
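
To make that concrete, here is a minimal sketch with made-up names (`Greeting`, `tailNaive`, `tail`). The constraint accepts an enum whose base type is string, but the naive body only works when T is exactly string; forcing the conversion inside the function avoids the problem.

```d
enum Greeting : string { hello = "hello" }

// The constraint admits anything that converts to string...
T tailNaive(T)(T s) if (is(T : string))
{
    return s[1 .. s.length]; // ...but the slice is a string, not a T
}

// Forcing the conversion up front makes the body independent of T.
string tail(T)(T s) if (is(T : string))
{
    string str = s;
    return str[1 .. $];
}

static assert( __traits(compiles, tailNaive("hello")));
static assert(!__traits(compiles, tailNaive(Greeting.hello)));

void main()
{
    assert(tail("hello") == "ello");
    assert(tail(Greeting.hello) == "ello");
}
```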

There are times when implicit conversions can be really nice, but IMHO, they have no business in generic code.

Also, the fact that (u)bytes and (u)shorts get promoted to (u)ints with arithmetic is the sort of thing that does not play at all nicely with generic code. That doesn't necessarily mean that it's a mistake for them to work that way, but it is a case where you're likely going to be forced to use explicit casts in generic code just to make those smaller integral types work when the code would work just fine for other types without the casts.
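
A short sketch of that problem, using hypothetical function names. The naive template compiles for int but not for byte, because `x + x` is an int after promotion; the cast is only there for the small types.

```d
// Arithmetic on small integral types yields int.
static assert(is(typeof(byte.init + byte.init) == int));
static assert(is(typeof(ushort.init * ushort.init) == int));

// Works for int and long, fails to instantiate for byte or short.
T doubledNaive(T)(T x) { return x + x; }

// The explicit cast is what makes the small types work.
T doubled(T)(T x) { return cast(T)(x + x); }

static assert( __traits(compiles, doubledNaive(1)));
static assert(!__traits(compiles, doubledNaive(byte.init)));

void main()
{
    assert(doubled(byte(21)) == 42);
    assert(doubled(21L) == 42);
}
```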

So, I really don't see arguments with regards to the implicit conversion of integral types to characters in generic code as holding much water, given what we're doing with the smaller integral types. The implicit conversion to character types also keeps resulting in folks posting in D.Learn about bugs caused by it, especially when string appending and concatenation get involved. Pretty much no one wants something like str ~= 0 to work, but it does, and particularly when you start throwing in stuff like the ternary operator, folks screw up and end up appending integers to strings.

- Jonathan M Davis

February 05, 2018
On 02/05/2018 09:30 PM, Walter Bright wrote:
> On 2/5/2018 3:18 PM, Timon Gehr wrote:
>> The overloading rules are fine, but byte should not implicitly convert to char/dchar, and char should not implicitly convert to byte.
> 
> Maybe not, but casting back and forth between them is ugly.

It *should* be ugly, it's conflating numerics with partial-characters.

Which, depending on the situation, you should either A. not be doing at all, or B. Be really freaking explicit about the fact that "yes, I know I'm mixing numerics with partial-characters here and it's for this very good reason XYZ." This isn't the age of ASCII. I can see how it could've been a pain in ASCII-land, but D doesn't live there.
February 05, 2018
On 2/5/18 10:05 PM, Nick Sabalausky (Abscissa) wrote:

> Right, it wouldn't always get rid of the message, but I would think it should when it's known the value cannot be -128, such as the specific example you posted.

VRP is only for one statement. That is, once you get to the next statement it "forgets" what the value range is.
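
A minimal sketch of that limitation; `lowByte` is just an illustrative name. Within one expression the compiler can prove the masked value fits in a ubyte, but once the result is stored in an int variable the range is no longer tracked.

```d
// Accepted: the range of (i & 0xFF) is known to be 0 .. 255.
ubyte lowByte(int i)
{
    return i & 0xFF;
}

// Same expression, directly initializing a ubyte: accepted.
static assert(__traits(compiles, {
    int i = 300;
    ubyte low = i & 0xFF;
}));

// Split across two statements: the range of `masked` is forgotten.
static assert(!__traits(compiles, {
    int i = 300;
    int masked = i & 0xFF;
    ubyte low = masked;
}));

void main()
{
    assert(lowByte(300) == 44); // 300 - 256
}
```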

-Steve
February 05, 2018
I had filed that last week:

https://issues.dlang.org/show_bug.cgi?id=18346
Issue 18346 - implicit conversion from int to char in `"foo" ~ 255` should be illegal

We should deprecate it.


On Mon, Feb 5, 2018 at 7:14 PM, Nick Sabalausky (Abscissa) via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 02/05/2018 09:30 PM, Walter Bright wrote:
>>
>> On 2/5/2018 3:18 PM, Timon Gehr wrote:
>>>
>>> The overloading rules are fine, but byte should not implicitly convert to char/dchar, and char should not implicitly convert to byte.
>>
>>
>> Maybe not, but casting back and forth between them is ugly.
>
>
> It *should* be ugly, it's conflating numerics with partial-characters.
>
> Which, depending on the situation, you should either A. not be doing at all, or B. Be really freaking explicit about the fact that "yes, I know I'm mixing numerics with partial-characters here and it's for this very good reason XYZ." This isn't the age of ASCII. I can see how it could've been a pain in ASCII-land, but D doesn't live there.