Proposal for Implicit Conversion of Types

April 04, 2006

Posted by Rioshin an'Harthen

(Note: This post is best viewed with a fixed-width font)

Proposal for Implicit Conversion of Types
=========================================

This proposal grew out of the discussion "No more implicit conversion real->complex?!" between Don Clugston and me after the change introduced in D version 0.150.


References
----------

See the thread "No more implicit conversion real->complex?!" http://www.digitalmars.com/d/archives/digitalmars/D/35512.html


Rationale
---------

D tries to be mathematically correct, and does a good job of it, e.g. by initializing floating point variables to NaN. Although this has raised some controversy on the newsgroup, Walter's take on it has remained constant: if we could, we would have a corresponding initialization for integral values.

Mathematical correctness (at least for me) implies implicit conversion from real values to complex values, which was removed in version 0.150. Hence the discussion between Don Clugston and me in the aforementioned thread, during which this proposal was mainly hashed out.


Problem
-------

The problem which 0.150 fixed was the following:

real sin(real x);
creal sin(creal c);

sin(3.0); // error - multiple matching overloads


Proposal
--------

The D language has 24 basic data types, which can be divided into 8 type families, that is, groups of types sharing the same archetype. These are:

 archetype | types (in order of smallest to largest)
-----------+-----------------------------------------
 void      | void
 bool      | bool
 cent      | byte, short, int, long, cent
 ucent     | ubyte, ushort, uint, ulong, ucent
 real      | float, double, real
 ireal     | ifloat, idouble, ireal
 creal     | cfloat, cdouble, creal
 dchar     | char, wchar, dchar
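The family table can be written down as plain data, which makes the "same archetype" and "next larger type" notions easy to check. The following is only a sketch in Python, used here as a neutral modelling language; the names FAMILIES, archetype_of and wider_types are made up for illustration and are not part of D or of any compiler.

```python
# Type families from the table above, each listed smallest to largest.
FAMILIES = {
    "void":  ["void"],
    "bool":  ["bool"],
    "cent":  ["byte", "short", "int", "long", "cent"],
    "ucent": ["ubyte", "ushort", "uint", "ulong", "ucent"],
    "real":  ["float", "double", "real"],
    "ireal": ["ifloat", "idouble", "ireal"],
    "creal": ["cfloat", "cdouble", "creal"],
    "dchar": ["char", "wchar", "dchar"],
}


def archetype_of(type_name):
    """Return the archetype (family name) of a basic type (hypothetical helper)."""
    for archetype, members in FAMILIES.items():
        if type_name in members:
            return archetype
    raise ValueError(f"unknown basic type: {type_name}")


def wider_types(type_name):
    """Return the strictly wider types of the same family, smallest first."""
    members = FAMILIES[archetype_of(type_name)]
    return members[members.index(type_name) + 1:]
```

For instance, wider_types("float") lists double and then real, which is the order a widening conversion would try them in.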

On a function call with multiple overloaded versions matching, do the following:

1. Try an exact match

   Look for a function with the signature (disregarding return type)
   exactly matching the call to said function.

   In the problem above, the 3.0 in the call to sin is double. We do not
   have a version of sin for double, so we can't match exactly.

2. Try a widening conversion

   Look for a function whose signature (disregarding return type)
   takes the smallest type larger than the current one within the
   same family of types.

   In the problem above, since 3.0 is a double, and we can't match an
   exact double, try looking for a real version, which we find. Use it.

3. Try a semantic changing conversion

   Look for a function with a signature where the semantics change
   according to the rules specified below. This time we switch to a
   type that does not have the same archetype as the current type. We
   still prefer the smallest type possible that loses no data, e.g.
   double cannot be converted into cfloat (loss of data), but cdouble
   is possible.

   In the example above, remove the function real sin(real x). Now
   call it with the same 3.0 double. We can't match a double parameter,
   nor a widened floating point value, so try with a change in semantics,
   and since complex numbers are a superset of real numbers, try complex
   numbers. We match creal sin(creal c).
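The three steps above can be sketched as a small lookup routine for one-argument functions such as sin. This is a model in Python, not compiler code, and it rests on assumptions: only the floating point families are included, the SEMANTIC entries anticipate the floating point rows of the conversion table given further down, and "no loss of data" is approximated by requiring the target to sit at the same or a later position in its family. The names locate and resolve are hypothetical.

```python
# Floating point families, smallest to largest (from the table above).
FAMILIES = {
    "real":  ["float", "double", "real"],
    "ireal": ["ifloat", "idouble", "ireal"],
    "creal": ["cfloat", "cdouble", "creal"],
}
# Allowed archetype changes; assumption: floating point rows only.
SEMANTIC = {"real": ["creal"], "ireal": ["creal"], "creal": []}


def locate(type_name):
    """Return (archetype, rank) of a type; rank 0 is the smallest member."""
    for archetype, members in FAMILIES.items():
        if type_name in members:
            return archetype, members.index(type_name)
    raise ValueError(f"unknown type: {type_name}")


def resolve(arg_type, overloads):
    """Pick a one-parameter overload; return (step, parameter type) or None."""
    archetype, rank = locate(arg_type)
    # Step 1: exact match.
    if arg_type in overloads:
        return ("exact", arg_type)
    # Step 2: widening within the same family, smallest wider type first.
    for wider in FAMILIES[archetype][rank + 1:]:
        if wider in overloads:
            return ("widening", wider)
    # Step 3: semantic change, never to a member smaller than the source.
    for target in SEMANTIC[archetype]:
        for candidate in FAMILIES[target][rank:]:
            if candidate in overloads:
                return ("semantic", candidate)
    return None  # the proposal asks for a compile error in this case
```

With overloads for real and creal, resolve("double", {"real", "creal"}) picks the real version by widening; with only the creal version available, the same call falls through to the semantic step and picks creal.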

Semantic Changing Conversion

Allow the following semantic-changing conversions implicitly. The list is my recollection of what D currently allows, plus those that would be good to have:

 original | new archetypes
----------+----------------
 void     | -
 bool     | -
 cent     | bool, real, creal
 ucent    | bool, real, creal
 real     | creal
 ireal    | creal
 creal    | -
 dchar    | bool, cent

The main feature of this table is that narrowing conversions must be specified explicitly, while widening ones should go through as is. Boolean conversions, however, work the opposite way; see below for more information.

For example, if the type float is to be converted, the preference is to convert within the family, first trying double, then real. If this is not possible, then try converting into complex numbers, first cfloat, second cdouble, and finally creal.

As another example, for double to be converted, we prefer trying for a real. If this fails, we try complex numbers. cfloat is too small a type to hold the double, so we can't convert to it implicitly; we try cdouble first, and then creal.

Boolean Conversions

The conversion from bool to integral values is not a part of
this proposal. The conversion of signed and unsigned integers
to bool is allowed implicitly, mainly because a lot of code
relies on it. This proposal also allows implicit conversion of
character types to bool, to make it easier to test for the
null character when reading characters from a stream.

This is because a boolean is not an integer or a character, but integers and characters can be treated as booleans. Real numbers can easily be treated as booleans as well, but it is much less common to see code written that way, and treating imaginary and complex numbers as booleans makes even less sense. Thus, the decision was made not to allow implicit conversion of any floating point type, whether real, imaginary, or complex, into boolean true/false values.
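The conversion table and the boolean rules can be checked mechanically once the table is written as data. The sketch below is in Python purely for illustration; SEMANTIC copies the table of allowed archetype changes from this post, and may_change_archetype is a made-up helper name, not an existing API.

```python
# Allowed implicit archetype changes, copied from the table in this post:
# key = original archetype, value = archetypes it may convert to.
SEMANTIC = {
    "void":  [],
    "bool":  [],
    "cent":  ["bool", "real", "creal"],
    "ucent": ["bool", "real", "creal"],
    "real":  ["creal"],
    "ireal": ["creal"],
    "creal": [],
    "dchar": ["bool", "cent"],
}


def may_change_archetype(source, target):
    """True if the proposal lets `source` convert implicitly to `target`."""
    return target in SEMANTIC[source]


# Archetypes that may be used directly as a boolean condition.
TESTABLE_AS_BOOL = sorted(a for a in SEMANTIC if may_change_archetype(a, "bool"))
```

Here TESTABLE_AS_BOOL contains only cent, dchar and ucent: integers and characters convert to bool, the floating point archetypes do not, and bool itself converts to nothing.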

Multi-Argument Functions

Multi-argument functions pose a problem for implicit
conversion. Given (example by Don Clugston):

1: func(real, creal);
2: func(creal, real);
3: func(creal, creal);

which alternative should

func(3.0, 2.0)

match, or should it match at all? Originally, I said the third one; now I'm not so sure about that. After thinking about this some more for the purpose of writing this proposal, I'm more inclined to having the compiler give an error in this case.

However, giving an error is not an ideal solution; some scheme to select a function to call might be better. At least an error forces the developer to think about which version he wants to call, and either add explicit casts where necessary or write a wrapper like

real func(real a, real b)
{
   return cast(real) func(cast(creal) a, cast(creal) b);
}

for it.
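One way to picture the error case is the following Python model of a multi-argument lookup. It is a sketch under stated assumptions, not a description of any compiler: each argument is classified as exact (1), widening (2) or semantic (3); an overload is ranked by the worst step any of its arguments needs; and the call is rejected when several overloads share the best rank. That ranking rule is my own guess at how the single-argument steps might extend, and the names step_needed, resolve_call and AmbiguousCall are hypothetical.

```python
# Only the two families used in the example, smallest to largest.
FAMILIES = {
    "real":  ["float", "double", "real"],
    "creal": ["cfloat", "cdouble", "creal"],
}
SEMANTIC = {"real": ["creal"], "creal": []}


class AmbiguousCall(Exception):
    """Raised when several overloads match at the same step."""


def locate(type_name):
    """Return (archetype, rank) of a type; rank 0 is the smallest member."""
    for archetype, members in FAMILIES.items():
        if type_name in members:
            return archetype, members.index(type_name)
    raise ValueError(f"unknown type: {type_name}")


def step_needed(arg_type, param_type):
    """Return 1 (exact), 2 (widening), 3 (semantic) or None (not implicit)."""
    src_family, src_rank = locate(arg_type)
    dst_family, dst_rank = locate(param_type)
    if dst_rank < src_rank:
        return None  # narrowing is never implicit
    if src_family == dst_family:
        return 1 if dst_rank == src_rank else 2
    return 3 if dst_family in SEMANTIC[src_family] else None


def resolve_call(arg_types, overloads):
    """Return the single best signature (a tuple of types) or raise."""
    costs = {}
    for signature in overloads:
        if len(signature) != len(arg_types):
            continue
        steps = [step_needed(a, p) for a, p in zip(arg_types, signature)]
        if None not in steps:
            costs[signature] = max(steps)  # assumption: worst argument decides
    for level in (1, 2, 3):
        matches = [sig for sig, cost in costs.items() if cost == level]
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            raise AmbiguousCall(f"{len(matches)} overloads match: {matches}")
    raise LookupError("no matching overload")
```

Under this model the call func(3.0, 2.0), that is two double arguments, is ambiguous against the three overloads listed above, while adding a (real, real) wrapper makes the call resolve to the wrapper by widening alone.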


Errors and Warnings
-------------------

I am going to describe command line flags passed to the compiler
as if the compiler were gcc, which I know best. This is just
for my own ease; the flags should be easy to understand and to
convert to dmd-specific ones.

The compiler should generate an error if steps 1 to 3 in the above proposal fail.

The compiler should also generate an error if there are multiple legal matches to a multi-argument function, see the topic above.

It would be good to have a compiler flag, e.g. -Wsemantic, to optionally display warnings for semantic-changing implicit conversions. This would let the author of the software spot possible bugs caused by implicit conversions across archetype boundaries.

-- 
  Mikael Segercrantz
  software engineer

"Don Clugston" <dac@nospam.com.au> wrote in message news:e104fr$1hgk$1@digitaldaemon.com...
> Rioshin an'Harthen wrote:
>> Proposal for Implicit Conversion of Types
>> =========================================
>>
>> This proposal spawned from the discussion "No more implicit conversion real->complex?!" between myself and Don Clugston after the change introduced in D version 0.150.
>
>>  archetype | types (in order of smallest to largest)
>> -----------+-----------------------------------------
>>  void      | void
>>  bool      | bool
>>  cent      | byte, short, int, long, cent
>>  ucent     | ubyte, ushort, uint, ulong, ucent
>>  real      | float, double, real
>>  ireal     | ifloat, idouble, ireal
>>  creal     | cfloat, cdouble, creal
>>  dchar     | char, wchar, dchar
>
>
> Very well presented!

Thank you. :)

> There's one aspect that I think could be a problem -- conversions between
> signed and unsigned types.
> As written, that would mean that ushort -> ulong is preferred over
> ushort ->short. Since the language currently allows short and ushort to be
> interchanged without error (unless you enable warnings), I don't think the
> lookup rules can be different to that.
>
> So I think that right now, the cent/ucent categories will need to be combined: if more than one such conversion is possible, it's an error. Otherwise we end up in the 'should signed/unsigned conversions be an error' debate which has historically been unfruitful.

True, it is a problem with this proposal. It is not enough to add implicit semantic-changing conversions between the cent and ucent archetypes, since it would still prefer the widening conversions.

However, I feel doubtful about combining the archetypes together. I would actually prefer to force signed to unsigned conversions to be explicit, since negative numbers can be "lost". Unsigned to signed is not that much of a problem, since (unless the type is ucent), it is possible to convert to a wider type, eg. ushort to int.

Still, I am willing to allow the combination of those archetypes, even though I feel the doubt I mentioned. It would allow existing code to continue working, which is one of the main intents of the proposal, although I took some liberty with the boolean type in this case.
