Thread overview
Concatenation of ubyte[] to char[] works, but assignation doesn't
Oct 04, 2015
skilion
Oct 04, 2015
Jonathan M Davis
Oct 05, 2015
skilion
Oct 05, 2015
Marc Schütz
Oct 05, 2015
Jonathan M Davis
Oct 05, 2015
Marc Schütz
Oct 06, 2015
Jonathan M Davis
Oct 06, 2015
Marc Schütz
Oct 06, 2015
Jonathan M Davis
October 04, 2015
Is this allowed by the language, or is it a compiler bug?

void main() {
   char[] a = "abc".dup;
   ubyte[] b = [1, 2, 3];
   a = b;   // cannot implicitly convert expression (b) of type ubyte[] to char[]
   a ~= b;  // works
}
October 04, 2015
On Sunday, October 04, 2015 16:13:47 skilion via Digitalmars-d-learn wrote:
> Is this allowed by the language, or is it a compiler bug?
>
> void main() {
>     char[] a = "abc".dup;
>     ubyte[] b = [1, 2, 3];
>     a = b;   // cannot implicitly convert expression (b) of type ubyte[] to char[]
>     a ~= b;  // works
> }

When appending b to a, the elements in b are copied onto the end of a, and presumably it works in this case because a ubyte is implicitly convertible to char. But all it's doing is converting the individual elements. It's not converting the array.

On the other hand, assigning b to a would require converting the array, and array types don't implicitly convert to one another, even if their elements do.

Honestly, I think that the fact that the character types implicitly convert to and from the integral types of the corresponding size is problematic at best and error-prone at worst, since it almost never makes sense to do something like append a ubyte to a string. However, if it didn't work, then you'd have to do a lot more casting when you do math on characters, which would cause its own set of potential bugs. So, we're kind of screwed either way.

- Jonathan M Davis
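
A minimal sketch of the distinction described in the reply above, assuming the compiler behaves as reported in this thread. The helper name `appendBytes` is hypothetical and exists only for illustration.

```d
// Hypothetical helper: appends raw bytes to a character buffer.
char[] appendBytes(char[] text, ubyte[] bytes)
{
    // Each ubyte is implicitly converted to char and copied onto the end.
    text ~= bytes;
    return text;
}

void main()
{
    char[] a = "abc".dup;
    ubyte[] b = [1, 2, 3];

    // Assignment would need an array-to-array conversion, which is never implicit.
    static assert(!__traits(compiles, a = b));

    a = appendBytes(a, b);
    assert(a.length == 6 && a[3] == 1 && a[5] == 3);

    // An explicit cast reinterprets the same memory as char[]; nothing is copied.
    char[] c = cast(char[]) b;
    assert(c.ptr is cast(char*) b.ptr);
}
```

The cast states the intent explicitly, but it does not check that the bytes form valid UTF-8.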

October 05, 2015
On Sunday, 4 October 2015 at 21:57:44 UTC, Jonathan M Davis wrote:
> When appending b to a, the elements in b are copied onto the end of a, and presumably it works in this case because a ubyte is implicitly convertible to char. But all it's doing is converting the individual elements. It's not converting the array.
>
> ...
>

It makes sense now, thanks.
October 05, 2015
On Sunday, 4 October 2015 at 21:57:44 UTC, Jonathan M Davis wrote:
> On Sunday, October 04, 2015 16:13:47 skilion via Digitalmars-d-learn wrote:
>> Is this allowed by the language, or is it a compiler bug?
>>
>> void main() {
>>     char[] a = "abc".dup;
>>     ubyte[] b = [1, 2, 3];
>>     a = b;   // cannot implicitly convert expression (b) of type ubyte[] to char[]
>>     a ~= b;  // works
>> }
>
> When appending b to a, the elements in b are copied onto the end of a, and presumably it works in this case because a ubyte is implicitly convertible to char. But all it's doing is converting the individual elements. It's not converting the array.
>
> On the other hand, assigning b to a would require converting the array, and array types don't implicitly convert to one another, even if their elements do.
>
> Honestly, I think that the fact that the character types implicitly convert to and from the integral types of the corresponding size is problematic at best and error-prone at worst, since it almost never makes sense to do something like append a ubyte to a string. However, if it didn't work, then you'd have to do a lot more casting when you do math on characters, which would cause its own set of potential bugs. So, we're kind of screwed either way.

I don't think math would be a problem. There are some obvious rules that would likely just work with most existing code:

char + int = char
char - int = char
char - char = int
char + char = ERROR
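
One way to experiment with the four rules listed above is a small wrapper type. This is only a sketch: `StrictChar` is a hypothetical name, not part of D or Phobos, and the cast inside it simply assumes that the result fits in a char.

```d
// Hypothetical wrapper type; it models the proposed rules, not D's built-in char.
struct StrictChar
{
    char value;

    // char + int = char, char - int = char
    StrictChar opBinary(string op)(int rhs) const
        if (op == "+" || op == "-")
    {
        // Assumption: the result fits in a char; the cast truncates otherwise.
        return StrictChar(cast(char)(mixin("value " ~ op ~ " rhs")));
    }

    // char - char = int
    int opBinary(string op : "-")(StrictChar rhs) const
    {
        return value - rhs.value;
    }
}

void main()
{
    auto upper = StrictChar('A');
    auto diff = StrictChar('a') - upper;

    static assert(is(typeof(diff) == int));
    assert(diff == 32);
    assert((upper + diff).value == 'a');

    // char + char = ERROR: no overload accepts a StrictChar on the right of '+'.
    static assert(!__traits(compiles, upper + upper));
}
```

Because the narrowing happens inside a cast, out-of-range results are silently truncated rather than rejected.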
October 05, 2015
On Monday, October 05, 2015 09:07:34 Marc Schütz via Digitalmars-d-learn wrote:
> On Sunday, 4 October 2015 at 21:57:44 UTC, Jonathan M Davis wrote:
> > On Sunday, October 04, 2015 16:13:47 skilion via Digitalmars-d-learn wrote:
> >> Is this allowed by the language, or is it a compiler bug?
> >>
> >> void main() {
> >>     char[] a = "abc".dup;
> >>     ubyte[] b = [1, 2, 3];
> >>     a = b;   // cannot implicitly convert expression (b) of type ubyte[] to char[]
> >>     a ~= b;  // works
> >> }
> >
> > When appending b to a, the elements in b are copied onto the end of a, and presumably it works in this case because a ubyte is implicitly convertible to char. But all it's doing is converting the individual elements. It's not converting the array.
> >
> > On the other hand, assigning b to a would require converting the array, and array types don't implicitly convert to one another, even if their elements do.
> >
> > Honestly, I think that the fact that the character types implicitly convert to and from the integral types of the corresponding size is problematic at best and error-prone at worst, since it almost never makes sense to do something like append a ubyte to a string. However, if it didn't work, then you'd have to do a lot more casting when you do math on characters, which would cause its own set of potential bugs. So, we're kind of screwed either way.
>
> I don't think math would be a problem. There are some obvious rules that would likely just work with most existing code:
>
> char + int = char
> char - int = char
> char - char = int
> char + char = ERROR

That depends on whether VRP (value range propagation) can figure out that the result will fit. Otherwise, you'd be stuck with int. As it stands, all of these would just end up with int, I believe, though if they're assigned to a char and the int is a constant, then VRP may kick in and make a cast unnecessary.

But discussions on whether types like char, wchar, dchar or bool should be treated like integral types can get interesting. Walter clearly favors treating them as integral types, whereas a number of us would much prefer that they weren't. But both sides of the argument can show where the other side's desired behavior would be problematic.

- Jonathan M Davis
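
A small sketch of the "stuck with int" case from the reply above, as the language stands today. The helper name `shift` is hypothetical.

```d
// Hypothetical helper: shifts a character by an offset, narrowing explicitly.
char shift(char c, int offset)
{
    // c + offset has type int, so the cast back to char must be spelled out.
    return cast(char)(c + offset);
}

void main()
{
    char c = 'A';
    int diff = 'a' - 'A'; // char - char yields int (32 here)

    // Arithmetic on a char operand is carried out in int.
    static assert(is(typeof(c + diff) == int));

    // diff is a runtime variable, so the compiler cannot prove the sum fits.
    static assert(!__traits(compiles, { char d = c + diff; }));

    assert(shift(c, diff) == 'a');
}
```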


October 05, 2015
On Monday, 5 October 2015 at 10:30:02 UTC, Jonathan M Davis wrote:
> On Monday, October 05, 2015 09:07:34 Marc Schütz via Digitalmars-d-learn wrote:
>> I don't think math would be a problem. There are some obvious rules that would likely just work with most existing code:
>>
>> char + int = char
>> char - int = char
>> char - char = int
>> char + char = ERROR
>
> That depends on whether VRP can figure out that the result will fit. Otherwise, you'd be stuck with int. As it stands, all of these would just end up with int, I believe, though if they're assigned to a char and the int is a constant, then VRP may kick in and make a cast unnecessary.
>

I think Walter's argument for allowing the int <-> char conversions was that they are necessary to allow arithmetic. My rules show that it works without these implicit conversions.

VRP is a different problem, though. AFAICS, in the following code, VRP either is smart enough, or it isn't, no matter whether char implicitly converts to int.

int diff = 'a' - 'A';
char c = 'A';
char d = c + diff;
October 06, 2015
On Monday, October 05, 2015 11:48:51 Marc Schütz via Digitalmars-d-learn wrote:
> On Monday, 5 October 2015 at 10:30:02 UTC, Jonathan M Davis wrote:
> > On Monday, October 05, 2015 09:07:34 Marc Schütz via Digitalmars-d-learn wrote:
> >> I don't think math would be a problem. There are some obvious rules that would likely just work with most existing code:
> >>
> >> char + int = char
> >> char - int = char
> >> char - char = int
> >> char + char = ERROR
> >
> > That depends on whether VRP can figure out that the result will fit. Otherwise, you'd be stuck with int. As it stands, all of these would just end up with int, I believe, though if they're assigned to a char and the int is a constant, then VRP may kick in and make a cast unnecessary.
> >
>
> I think Walter's argument for allowing the int <-> char conversions was that they are necessary to allow arithmetic. My rules show that it works without these implicit conversions.
>
> VRP is a different problem, though. AFAICS, in the following code, VRP either is smart enough, or it isn't, no matter whether char implicitly converts to int.
>
> int diff = 'a' - 'A';
> char c = 'A';
> char d = c + diff;

Your suggestion only works by assuming that the result will fit in a char, which doesn't fit at all with how conversions are currently done in D. It would allow narrowing conversions that lose data. And there's no way that Walter would go for that (and I don't think that he should). VRP solves the problem insofar as it can guarantee that the result will fit in the target type and thus reduces the need for casting, but simply assuming that char + int will fit in a char just doesn't work unless we're going to allow narrowing conversions to lose data, which we aren't.

If we were to allow the specific conversions that you're suggesting but only when VRP was used, then that could work, though it does make the implicit rules even screwier, because it becomes very dependent on how the int that you're trying to assign to a char was generated in the first place (straight assignment wouldn't work, but '0' - 40 would, whereas 'a' + 500 wouldn't, etc.). VRP already makes it a bit funky as it is, though mostly in a straightforward manner.

- Jonathan M Davis
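
The three cases named in the reply above can be checked against the compiler's existing handling of constants. This is a sketch of current behavior as I understand it, not of the proposed rules; the name `folded` is hypothetical.

```d
// '0' - 40 is constant-folded to 8, which fits in a char, so no cast is needed.
enum char folded = '0' - 40;

void main()
{
    static assert(folded == 8);

    // 'a' + 500 folds to 597, which cannot fit in a char.
    static assert(!__traits(compiles, { char bad = 'a' + 500; }));

    // A plain int variable may hold any int value, so assigning it is rejected too.
    int i = 65;
    static assert(!__traits(compiles, { char bad = i; }));

    // An explicit cast is always accepted.
    char c = cast(char) i;
    assert(c == 'A');
}
```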


October 06, 2015
On Tuesday, 6 October 2015 at 05:38:36 UTC, Jonathan M Davis wrote:
> Your suggestion only works by assuming that the result will fit in a char, which doesn't fit at all with how conversions are currently done in D. It would allow narrowing conversions that lose data. And there's no way that Walter would go for that (and I don't think that he should). VRP solves the problem insofar as it can guarantee that the result will fit in the target type and thus reduces the need for casting, but simply assuming that char + int will fit in a char just doesn't work unless we're going to allow narrowing conversions to lose data, which we aren't.
>
> If we were to allow the specific conversions that you're suggesting but only when VRP was used, then that could work, though it does make the implicit rules even screwier, because it becomes very dependent on how the int that you're trying to assign to a char was generated in the first place (straight assignment wouldn't work, but '0' - 40 would, whereas 'a' + 500 wouldn't, etc.). VRP already makes it a bit funky as it is, though mostly in a straightforward manner.

I see, this is a new problem introduced by `char + int = char`. But at least the following could be disallowed without introducing problems:

    int a = 'a';
    char b = 32;

But strictly speaking, we already accept overflow (i.e. loss of precision) for ints, so it's a bit inconsistent to disallow it for chars.
October 06, 2015
On Tuesday, 6 October 2015 at 09:28:29 UTC, Marc Schütz wrote:
> I see, this is a new problem introduced by `char + int = char`. But at least the following could be disallowed without introducing problems:
>
>     int a = 'a';
>     char b = 32;
>
> But strictly speaking, we already accept overflow (i.e. loss of precision) for ints, so it's a bit inconsistent to disallow it for chars.

Yes, D does not have overflow; it has modular arithmetic. The same question arises for an enumeration (such as a character range): do you want it to be modular (a circle) or monotonic (a line)? Neither is a good fit for Unicode. It would probably make the most sense to split the Unicode universe into multiple typed ranges, some enumerations and some non-enumerations, and avoid char altogether.
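
A short sketch of the "circle" behavior mentioned above, using the built-in types. The helper name `next` is hypothetical.

```d
// Hypothetical helper: steps a char forward by one code unit.
char next(char c)
{
    c++; // wraps around at char.max instead of raising an error
    return c;
}

void main()
{
    assert(next('a') == 'b');
    assert(next(char.max) == 0); // 0xFF wraps to 0x00

    // Unsigned integers behave the same way.
    uint zero = 0;
    assert(zero - 1 == uint.max);
}
```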

October 06, 2015
On Tuesday, October 06, 2015 09:28:27 Marc Schütz via Digitalmars-d-learn wrote:
> I see, this is a new problem introduced by `char + int = char`. But at least the following could be disallowed without introducing problems:
>
>      int a = 'a';
>      char b = 32;

Sure, it would be nice, but I rather doubt that Walter would go for it. He seems to be fully in the camp of folks who think that life is better if character types and bool are always treated as integral types, which unfortunately creates fun problems like this:

import std.stdio;

void foo(bool b) { writeln("bool"); }
void foo(long l) { writeln("long"); }

void main() { foo(1); } // prints bool

In this case, adding an overload that takes int fixes the problem, because integer literals default to int, but in general, it's just stupid IMHO. But when it was last brought up, Walter didn't think that there was any problem with it and thought that it was just fine to require that the int overload be added to fix it.
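
A sketch of the workaround described above, with the overloads returning a string so that the chosen one can be checked. The names `pick` and `which` are hypothetical.

```d
// Without an int overload: the literal 1 fits in bool, and bool is the
// more specialized match, so it wins over long.
string pick(bool b) { return "bool"; }
string pick(long l) { return "long"; }

// With an int overload: integer literals are int, so this is an exact match.
string which(bool b) { return "bool"; }
string which(int i)  { return "int"; }
string which(long l) { return "long"; }

void main()
{
    assert(pick(1) == "bool");
    assert(pick(2) == "long"); // 2 does not fit in bool

    assert(which(1) == "int");
    assert(which(true) == "bool");
    assert(which(1L) == "long");
}
```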

> But strictly speaking, we already accept overflow (i.e. loss of precision) for ints, so it's a bit inconsistent to disallow it for chars.

Overflow is only permitted when doing arithmetic operations (when you really can't do anything about it except maybe throw an exception, which would be too expensive to be worth it) or when casting explicitly (in which case, you're telling the compiler that you don't care). Overflow is never allowed via implicit conversions.

- Jonathan M Davis
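
The three situations distinguished in the last paragraph above can be sketched as follows. The helper name `lowByte` is hypothetical.

```d
// Hypothetical helper: explicit narrowing that keeps only the low 8 bits.
ubyte lowByte(int value)
{
    return cast(ubyte) value;
}

void main()
{
    int big = 300;

    // Implicit narrowing from a runtime int is rejected at compile time.
    static assert(!__traits(compiles, { ubyte u = big; }));

    // An explicit cast is accepted and truncates: 300 mod 256 == 44.
    assert(lowByte(big) == 44);

    // Arithmetic itself wraps without any diagnostic.
    int m = int.max;
    m += 1;
    assert(m == int.min);
}
```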