Thread overview
Strange implicit conversion integers on concatenation
Nov 05, 2018  uranuz
Nov 05, 2018  Adam D. Ruppe
Nov 05, 2018  uranuz
Nov 05, 2018  H. S. Teoh
Nov 05, 2018  Jonathan M Davis
Nov 05, 2018  bachmeier
Nov 05, 2018  Jonathan M Davis
Nov 06, 2018  Neia Neutuladh
Nov 05, 2018  12345swordy
Nov 05, 2018  Jonathan M Davis
Nov 06, 2018  12345swordy
Nov 06, 2018  Jonathan M Davis
Nov 05, 2018  H. S. Teoh
Nov 06, 2018  Jonathan M Davis
Nov 05, 2018  lngns
Nov 05, 2018  Paul Backus
Nov 05, 2018  lithium iodate
Nov 12, 2018  12345swordy
Nov 12, 2018  Adam D. Ruppe
Nov 12, 2018  12345swordy
Nov 13, 2018  bachmeier
Nov 13, 2018  Jonathan M Davis
November 05, 2018
Hello to everyone! By mistake I typed some code like the following without using [std.conv: to] and got a strange result. I believe the following code shouldn't even compile, but it does, and it gives a non-printable symbol appended at the end of the string.
The same problem is encountered even without [enum]; just using a plain integer value gives the same result. Is it a bug, or could someone really rely on this behaviour?

```
import std.stdio;

enum TestEnum: ulong {
   Item1 = 2,
   Item3 = 5
}

void main()
{
    string res = `Number value: ` ~ TestEnum.Item1;
    writeln(res);
}
```

Output:
Number value: 
November 05, 2018
On Monday, 5 November 2018 at 15:36:31 UTC, uranuz wrote:
> I believe the following code shouldn't even compile, but it does, and it gives a non-printable symbol appended at the end of the string.

Me too, this is a design flaw in the language. Following C's example, int and char can convert to/from each other. So string ~ int will convert int to char (as in reinterpret cast) and append that.

It is just the way it is, alas.
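
To make the difference concrete, here is a minimal sketch contrasting the implicit append with an explicit conversion through std.conv (the fix the original post was reaching for):

```
import std.conv : to;
import std.stdio : writeln;

void main()
{
    // The integer is implicitly converted to a char and appended as a single
    // code unit, not formatted as text.
    writeln("Number value: " ~ 2);           // appends the control character U+0002

    // An explicit conversion produces the textual representation instead.
    writeln("Number value: " ~ 2.to!string); // prints "Number value: 2"
}
```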
November 05, 2018
It adds the equivalent char value, which is interpreted as ASCII when printing.
You can append a value of an array's element type (or a compatible type) to the array.

```
void main()
{
    import std.stdio : writeln;

    writeln("hello " ~ 42); //hello *
    writeln([42, 56] ~ 7); //[42, 56, 7]
}
```
November 05, 2018
On Monday, 5 November 2018 at 15:36:31 UTC, uranuz wrote:
> Hello to everyone! By mistake I typed some code like the following without using [std.conv: to] and got a strange result. I believe the following code shouldn't even compile, but it does, and it gives a non-printable symbol appended at the end of the string.
> The same problem is encountered even without [enum]; just using a plain integer value gives the same result. Is it a bug, or could someone really rely on this behaviour?
>
> import std.stdio;
>
> enum TestEnum: ulong {
>    Item1 = 2,
>    Item3 = 5
> }
>
> void main()
> {
>     string res = `Number value: ` ~ TestEnum.Item1;
>     writeln(res);
> }
>
> Output:
> Number value: 

It seems like the integer 2 is being implicitly converted to a char--specifically, the character U+0002 Start of Text.

Normally, a ulong wouldn't be implicitly convertible to a char, but compile-time constants appear to get special treatment as long as their values are in the correct range. If you try it with a number too big to fit in a char, you get an error:

```
void main()
{
    import std.stdio;
    writeln("test " ~ 256);
}
```

Error: incompatible types for ("test ") ~ (256): string and int
November 05, 2018
On Monday, 5 November 2018 at 15:36:31 UTC, uranuz wrote:
> Hello to everyone! By mistake I typed some code like the following without using [std.conv: to] and got a strange result. I believe the following code shouldn't even compile, but it does, and it gives a non-printable symbol appended at the end of the string.
> The same problem is encountered even without [enum]; just using a plain integer value gives the same result. Is it a bug, or could someone really rely on this behaviour?

As long as the integral value is statically known to be a valid code point and fits into the numerical range of the char type (plain 'char' in this case), the automatic conversion is done. If you replace the values inside your enum with something bigger (> 255) or negative, you will see that the compiler doesn't universally allow all such automatic integral-to-char conversions. You can also see this effect when you declare a mutable vs. an immutable integer and try to append it to a regular string: the mutable one will fail. (Anything that can be larger than 1114111 will always fail, as far as I can tell.)
Some consider this useful behavior, but it's not uncontroversial.
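
A rough sketch of the cases described above; the commented-out lines reflect the behaviour claimed in this post rather than anything verified here:

```
void main()
{
    enum e = 65;          // compile-time constant that fits in char
    string a = "x" ~ e;   // compiles: appends 'A'

    immutable int i = 66; // value statically known
    int m = 67;           // mutable, value not statically known
    // string b = "x" ~ i;   // per the text above: accepted
    // string c = "x" ~ m;   // per the text above: rejected (string and int)
    // string d = "x" ~ 300; // rejected: 300 does not fit in char
}
```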
November 05, 2018
On Monday, 5 November 2018 at 15:58:40 UTC, Adam D. Ruppe wrote:
> On Monday, 5 November 2018 at 15:36:31 UTC, uranuz wrote:
>> I believe the following code shouldn't even compile, but it does, and it gives a non-printable symbol appended at the end of the string.
>
> Me too, this is a design flaw in the language. Following C's example, int and char can convert to/from each other. So string ~ int will convert int to char (as in reinterpret cast) and append that.
>
> It is just the way it is, alas.

Ok. It's because a string is an array of char, and an int can be implicitly converted to char if it fits the range.
November 05, 2018
On Mon, Nov 05, 2018 at 03:58:40PM +0000, Adam D. Ruppe via Digitalmars-d wrote:
> On Monday, 5 November 2018 at 15:36:31 UTC, uranuz wrote:
> > I believe the following code shouldn't even compile, but it does, and it gives a non-printable symbol appended at the end of the string.
> 
> Me too, this is a design flaw in the language. Following C's example, int and char can convert to/from each other. So string ~ int will convert int to char (as in reinterpret cast) and append that.
> 
> It is just the way it is, alas.

I have said before, and will continue to say, that I think implicit conversion between char and non-char types in D does not make sense.

In C, converting between char and int is very common because of the conflation of char with byte, but in D we have explicit types for byte and ubyte, which should take care of any of those kinds of use cases, and char is explicitly defined to be a UTF8 code unit.  Now sure, there are cases where you want to get at the numerical value of a char -- that's what cast(int) and cast(char) is for.  But *implicitly* converting between char and int, especially when we went through the trouble of defining a separate type for char that stands apart from byte/ubyte, does not make any sense to me.
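
For instance, an explicit round-trip looks like this (a trivial sketch, just to show where the casts already live):

```
void main()
{
    char c = 'A';
    int code = cast(int) c;        // 65: take the numeric value explicitly
    char next = cast(char)(c + 1); // arithmetic promotes to int, so narrowing
                                   // back down to char already requires a cast
}
```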

This problem is especially annoying with function overloads that take char vs. byte: because of implicit conversion, often the wrong overload ends up getting called WITHOUT ANY WARNING.  Once, while refactoring some code, I changed a representation of an object from char to a byte ID, but in order to do the refactoring piecemeal, I needed to overload between byte and char so that older code will continue to compile while the refactoring is still in progress.  Bad idea.  All sorts of random problems and runtime crashes happened because C's stupid int conversion rules were liberally applied to D types, causing a gigantic mess where you never know which overload will get called. (Well OK, it's predictable if you sit down and work it out, but it's just plain annoying when a lousy char literal calls the byte overload whereas a char variable calls the char overload.)  I ended up having to wrap the type in a struct just to stop the implicit conversion from tripping me up.
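
For illustration, a bare-bones sketch of that kind of wrapper (the names here are invented); a struct never converts implicitly from an integer or a char, so the surprising overload matches disappear at the call sites that use it:

```
struct ObjectId
{
    byte value;
}

void process(ObjectId id) { /* new code path */ }
void process(char c)      { /* old code path */ }

void main()
{
    process(ObjectId(42)); // unambiguous: only the ObjectId overload matches
    process('x');          // unambiguous: only the char overload matches
    // process(42);        // an int no longer silently becomes an ObjectId;
    //                     // there is no implicit construction of a struct
}
```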


T

-- 
Some days you win; most days you lose.
November 05, 2018
On Monday, November 5, 2018 9:31:56 AM MST H. S. Teoh via Digitalmars-d wrote:
> On Mon, Nov 05, 2018 at 03:58:40PM +0000, Adam D. Ruppe via Digitalmars-d wrote:
> > On Monday, 5 November 2018 at 15:36:31 UTC, uranuz wrote:
> > > I believe the following code shouldn't even compile, but it does, and it gives a non-printable symbol appended at the end of the string.
> >
> > Me too, this is a design flaw in the language. Following C's example, int and char can convert to/from each other. So string ~ int will convert int to char (as in reinterpret cast) and append that.
> >
> > It is just the way it is, alas.
>
> I have said before, and will continue to say, that I think implicit conversion between char and non-char types in D does not make sense.
>
> In C, converting between char and int is very common because of the conflation of char with byte, but in D we have explicit types for byte and ubyte, which should take care of any of those kinds of use cases, and char is explicitly defined to be a UTF8 code unit.  Now sure, there are cases where you want to get at the numerical value of a char -- that's what cast(int) and cast(char) is for.  But *implicitly* converting between char and int, especially when we went through the trouble of defining a separate type for char that stands apart from byte/ubyte, does not make any sense to me.
>
> This problem is especially annoying with function overloads that take char vs. byte: because of implicit conversion, often the wrong overload ends up getting called WITHOUT ANY WARNING.  Once, while refactoring some code, I changed a representation of an object from char to a byte ID, but in order to do the refactoring piecemeal, I needed to overload between byte and char so that older code will continue to compile while the refactoring is still in progress.  Bad idea.  All sorts of random problems and runtime crashes happened because C's stupid int conversion rules were liberally applied to D types, causing a gigantic mess where you never know which overload will get called. (Well OK, it's predictable if you sit down and work it out, but it's just plain annoying when a lousy char literal calls the byte overload whereas a char variable calls the char overload.)  I ended up having to wrap the type in a struct just to stop the implicit conversion from tripping me up.

+1

Unfortunately, I don't know how reasonable it is to fix it at this point, much as I would love to see it fixed. Historically, I don't think that Walter could have been convinced, but based on some of the stuff he's said in recent years, I think that he'd be much more open to the idea now. However, even if he could now be convinced that ideally the conversion wouldn't exist, I don't know how easy it would be to get a DIP through when you consider the potential code breakage. But maybe it's possible to do it in a smooth enough manner that it could work - especially when many of the kind of cases where you might actually _want_ such a conversion already require casting anyway thanks to the rules about integer promotions and narrowing conversions (e.g. when adding or subtracting from chars). Regardless, it would have to be well-written DIP with a clean transition scheme. Having that DIP on removing the implicit conversion of integer and character literals to bool be accepted would be a start in the right direction though. If that gets rejected (which I sure hope that it isn't), then there's probably no hope for a DIP fixing the char situation.

- Jonathan M Davis



November 05, 2018
On Monday, 5 November 2018 at 21:11:27 UTC, Jonathan M Davis wrote:

> I don't know how reasonable it is to fix it at this point, much as I would love to see it fixed.

It's hard for me to see how it would be reasonable not to fix it. This is one of those ugly parts of the language that need to be evolved out. If there's a reason to support this, it should be done with a compiler switch. I'm pretty sure this was one of the weird things that hit me when I started with the language; it was frustrating, and it didn't make a good impression.

November 05, 2018
On Monday, 5 November 2018 at 21:11:27 UTC, Jonathan M Davis wrote:
> On Monday, November 5, 2018 9:31:56 AM MST H. S. Teoh via Digitalmars-d wrote:
>> On Mon, Nov 05, 2018 at 03:58:40PM +0000, Adam D. Ruppe via Digitalmars-d wrote:
>> > On Monday, 5 November 2018 at 15:36:31 UTC, uranuz wrote:
>> > > I believe the following code shouldn't even compile, but it does, and it gives a non-printable symbol appended at the end of the string.
>> >
>> > Me too, this is a design flaw in the language. Following C's example, int and char can convert to/from each other. So string ~ int will convert int to char (as in reinterpret cast) and append that.
>> >
>> > It is just the way it is, alas.
>>
>> I have said before, and will continue to say, that I think implicit conversion between char and non-char types in D does not make sense.
>>
>> In C, converting between char and int is very common because of the conflation of char with byte, but in D we have explicit types for byte and ubyte, which should take care of any of those kinds of use cases, and char is explicitly defined to be a UTF8 code unit.  Now sure, there are cases where you want to get at the numerical value of a char -- that's what cast(int) and cast(char) is for.  But *implicitly* converting between char and int, especially when we went through the trouble of defining a separate type for char that stands apart from byte/ubyte, does not make any sense to me.
>>
>> This problem is especially annoying with function overloads that take char vs. byte: because of implicit conversion, often the wrong overload ends up getting called WITHOUT ANY WARNING.
>>  Once, while refactoring some code, I changed a representation of an object from char to a byte ID, but in order to do the refactoring piecemeal, I needed to overload between byte and char so that older code will continue to compile while the refactoring is still in progress.  Bad idea.  All sorts of random problems and runtime crashes happened because C's stupid int conversion rules were liberally applied to D types, causing a gigantic mess where you never know which overload will get called. (Well OK, it's predictable if you sit down and work it out, but it's just plain annoying when a lousy char literal calls the byte overload whereas a char variable calls the char overload.)  I ended up having to wrap the type in a struct just to stop the implicit conversion from tripping me up.
>
> +1
>
> Unfortunately, I don't know how reasonable it is to fix it at this point, much as I would love to see it fixed. Historically, I don't think that Walter could have been convinced, but based on some of the stuff he's said in recent years, I think that he'd be much more open to the idea now. However, even if he could now be convinced that ideally the conversion wouldn't exist, I don't know how easy it would be to get a DIP through when you consider the potential code breakage. But maybe it's possible to do it in a smooth enough manner that it could work - especially when many of the kind of cases where you might actually _want_ such a conversion already require casting anyway thanks to the rules about integer promotions and narrowing conversions (e.g. when adding or subtracting from chars). Regardless, it would have to be well-written DIP with a clean transition scheme. Having that DIP on removing the implicit conversion of integer and character literals to bool be accepted would be a start in the right direction though. If that gets rejected (which I sure hope that it isn't), then there's probably no hope for a DIP fixing the char situation.
>
> - Jonathan M Davis
We need to avoid a situation where we have to create a DIP for every unwanted implicit conversion that ends up calling the wrong overload; we need a better way of doing this. No one wants to wait a year for DIP approval for something very minor, such as deprecating an implicit conversion for native data types.

I think a better course of action is to introduce the keywords explicit and implicit. Not as attributes though! I don't want to see functions with @nogc @nothrow safe pure @explicit as that is too much verbiage and hard to read! Which brings up the question of which parameter exactly is explicit?

It is much easier to read: void example(int bar, explicit int bob)

The explicit keyword will become very important if we are to introduce the implicit keyword, as both of them are instrumental in creating types with structs.
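
For reference, implicit conversion on struct types today goes through alias this, and there is no way to mark that conversion (or a parameter) as explicit-only; a minimal sketch with made-up names:

```
struct Celsius
{
    double degrees;
    alias degrees this; // Celsius now implicitly converts to double everywhere
}

void printTemp(double d) {}

void main()
{
    auto t = Celsius(21.5);
    printTemp(t); // accepted through the alias this conversion; no opt-out today
}
```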

I don't mind writing a DIP regarding this, as I think it would be much easier to get this DIP accepted than the other one that I currently have.

-Alexander