May 29, 2014
Okay. That seriously got munged. Let's try that again...

On Tue, 27 May 2014 06:42:41 -1000
Andrei Alexandrescu via Digitalmars-d-announce
<digitalmars-d-announce@puremagic.com> wrote:

> http://www.reddit.com/r/programming/comments/26m8hy/scott_meyers_dconf_2014_keynote_the_last_thing_d/
>
> https://news.ycombinator.com/newest (search that page, if not found
> click "More" and search again)
>
> https://www.facebook.com/dlang.org/posts/855022447844771
>
> https://twitter.com/D_Programming/status/471330026168651777

Fortunately, for the most part, I think that we've avoided the types of inconsistencies that Scott describes for C++, but we do definitely have some of our own. The ones that come to mind at the moment are:

1. The order of the dimensions of multi-dimensional static arrays is backwards in comparison to what most everyone expects.

    int[4][5][6] foo;

is the same as

    int foo[6][5][4];

and has the same dimensions as

    auto bar = new int[][][](6, 5, 4);

The reasons for it stem from the fact that the compiler reads types outward from the variable name (which is very important to understand in C because of its function pointer syntax but not so important in D). However, once we did

    const(int)* foo;

and didn't allow

    (int)const* foo;

I think that we threw that particular bit of consistency with C/C++ out the window, and we really should have just made static array dimensions be read from left-to-right. Unfortunately, I don't think that we can fix that at this point, because doing so would cause silent breakage (or at minimum, would be silent until RangeErrors were thrown at runtime).
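To make the order concrete, here it is restated as a runnable check:

    unittest
    {
        int[4][5][6] foo;
        auto bar = new int[][][](6, 5, 4);

        // The rightmost static dimension ends up being the outermost one.
        assert(foo.length == 6 && bar.length == 6);
        assert(foo[0].length == 5 && bar[0].length == 5);
        assert(foo[0][0].length == 4 && bar[0][0].length == 4);
    }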


2. We're inconsistent with dynamic array dimensions.

    auto foo = new int[5];

is the same as

    auto foo = new int[](5);

but once you get into multi-dimensional arrays, it's just confusing, because

    auto foo = new int[4][5][6];

does _not_ declare a multi-dimensional dynamic array but rather a dynamic array of length 6 whose elements are multi-dimensional static arrays with dimensions 4 and 5. Instead, what you need to do is

    auto foo = new int[][][](4, 5, 6);

IMHO, we should have made it illegal to have dynamic array dimensions inside of the brackets rather than the parens, but I don't know if we can change that. It wouldn't be silent breakage, but it _would_ make it so that a lot of existing code would be broken - especially because so many people put the array dimensions between the brackets for single-dimension dynamic arrays.
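Spelled out (assuming that I have the resulting types right):

    unittest
    {
        auto foo = new int[4][5][6];
        static assert(is(typeof(foo) == int[4][5][])); // dynamic array of static int[4][5]s
        assert(foo.length == 6);

        auto bar = new int[][][](4, 5, 6);             // what you actually want
        static assert(is(typeof(bar) == int[][][]));
        assert(bar.length == 4 && bar[0].length == 5 && bar[0][0].length == 6);
    }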


3. const, immutable, and inout on the left-hand side of a function declaration are unfortunately legal. This inevitably trips people up, because they think that the attribute applies to the return type, when it actually applies to the function itself. This is for consistency with the other function attributes, all of which can go on either side, but the result is that it's essentially bad practice to ever put any attribute on the left-hand side which could apply to the return type, because it looks like a bug. If we just made it illegal for those attributes to go on the left, the problem would be solved, and the result would be far less confusing and bug-prone. I think that we can make that change with minimal breakage (since it's already bad practice to put them on the left-hand side), but AFAIK, Walter is against the idea.
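To illustrate the trap:

    struct S
    {
        int i;

        // Looks like it returns const(int), but the const applies to the
        // function itself (i.e. to the implicit this parameter)...
        const int foo() { return i; }

        // ...exactly like this, which is the far less confusing way to write it:
        int bar() const { return i; }

        // To actually make the return type const, it has to be parenthesized:
        const(int) baz() { return i; }
    }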


4. There are some cases (such as with static constructors and unittest blocks) where the attributes have to go on the left for some reason. I don't remember the reasons for it, but it's an inconsistency which definitely trips up even seasoned D programmers from time to time.
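For unittest blocks, at least, it looks like this (going from memory on the exact rules):

    // Attributes on a unittest block have to go on the left:
    @safe pure nothrow unittest
    {
        assert(1 + 1 == 2);
    }

    // whereas a normal function will happily take them on the right:
    void foo() @safe pure nothrow { }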


5. The fact that pure is called pure is very problematic at this point as far as explaining things to folks goes. We should probably consider renaming it to something like @noglobal, but I'm not sure that that would go over very well given the amount of breakage involved. It _does_ require a lot of explaining though.
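The thing that constantly has to be explained is that pure by itself just means "doesn't touch mutable global state," not pure in the functional sense:

    int counter;

    // pure int next() { return ++counter; }  // error: accesses mutable global state

    // "Weakly pure": touches no globals, but it can still mutate its argument,
    // which is not what most people expect "pure" to mean.
    pure void increment(ref int x) { ++x; }

    // "Strongly pure": with value (or immutable) parameters, calls really can
    // be treated like pure functions in the functional sense.
    pure int square(int x) { return x * x; }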


6. The situation with ranges and string is kind of ugly, with them being treated as ranges of code points. I don't know what the correct solution to this is, since treating them as ranges of code units promotes efficiency but makes code more error-prone, whereas treating them as ranges of graphemes would just cost too much. Ranges of code points is _mostly_ correct but still incorrect and _more_ efficient than graphemes but still quite a bit less efficient than code units. So, it's kind of like it's got the best and worst of both worlds. The current situation causes inconsistencies with everything else (forcing us to use isNarrowString all over the place) and definitely requires frequent explaining, but it does prevent some classes of problems. So, I don't know. I used to be in favor of the current situation, but at this point, if we could change it, I think that I'd argue in favor of just treating them as ranges of code units and then have wrappers for ranges of code points or graphemes. It seems like the current situation promotes either using ubyte[] (if you care about efficiency) or the new grapheme facilities in std.uni if you care about correctness, whereas just using strings as ranges of dchar is probably a bad idea unless you just don't want to deal with any of the Unicode stuff, don't care all that much about efficiency, and are willing to have bugs in the areas where operating at the code point level is incorrect.
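To show the kind of inconsistency that I mean (all with the current Phobos):

    import std.range : ElementType, hasLength, walkLength;

    unittest
    {
        string s = "weiß";

        static assert(is(ElementType!string == dchar)); // ranges see code points
        static assert(!hasLength!string);               // .length counts code units, so it
                                                        // doesn't count as a range length
        assert(s.length == 5);                          // 5 UTF-8 code units
        assert(s.walkLength == 4);                      // 4 code points
    }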


7. There are several minor inconsistencies with local imports and nested functions in comparison to module-level imports or free functions, and I think that some of those should be fixed, but I'm not sure that all of them can be.
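One example on the nested-function side of that (there are others that I'm not thinking of right now): unlike free functions, nested functions can't be forward-referenced or overloaded.

    void outer()
    {
        // bar();            // error: undefined identifier - nested functions
        //                   // have to be declared before they're used
        void bar() { }
        bar();               // fine

        // void bar(int) { } // error: nested functions can't be overloaded
    }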


That's what I can think of at the moment (though I'm sure that there are others, and this post is already probably too long). So, we definitely have our own consistency issues, but I do think that we're still far better off than C++ in that regard. Fortunately, while Phobos still has some naming issues, a lot of the naming inconsistencies were sorted out a couple of years ago, and we have solved a number of other inconsistencies in the language and library over time, so if anything, we've probably been _reducing_ the number of inconsistencies that we have rather than increasing them. But we should look at reducing them further if we can and should _definitely_ keep an eye out for areas where more inconsistencies could creep in.

- Jonathan M Davis
May 29, 2014
On Thursday, 29 May 2014 at 03:29:31 UTC, Jonathan M Davis via Digitalmars-d-announce wrote:
> 1. The order of the dimensions of multi-dimensional static arrays is backwards
> in comparison to what most everyone expects.
>
>     int[4][5][6] foo;
>
> is the same as
>
>     int foo[6][5][4];
>
> and has the same dimensions as
>
>     auto bar = new int[][][](6, 5, 4);
>
> The reasons for it stem from the fact that the compiler reads types outward
> from the variable name (which is very important to understand in C because of
> its function pointer syntax but not so important in D). However, once we did
>
>     const(int)* foo;
>
> and didn't allow
>
>     (int)const* foo;
>
> I think that we threw that particular bit of consistency with C/C++ out the
> window, and we really should have just made static array dimensions be read
> from left-to-right. Unfortunately, I don't think that we can fix that at this
> point, because doing so would cause silent breakage (or at minimum, would be
> silent until RangeErrors were thrown at runtime).

I don't see this as an inconsistency. Just read it as follows:

    int[6][5]* foo;

- start with the type int
- make an array from it
- make an array from that
- and finally, turn it into a pointer.

    const(int)* bar;

Just read `const(int)` as one entity here (as its form suggests, some kind of "function call"):

- start with a const(int)
- make a pointer from it
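Which works out to:

    void example()
    {
        int x;
        const(int)* bar = &x; // start with a const(int), make a pointer from it

        bar = null;           // fine: the pointer itself is mutable
        // *bar = 1;          // error: what it points to is const
    }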

> 3. const, immutable, and inout on the left-hand side of a function declaration are unfortunately legal.

Agreed. At least it's possible to avoid the problem by convention (but see 4.).

> 4. There are some cases (such as with static constructors and unittest blocks)
> where the attributes have to go on the left for some reason. I don't remember
> the reasons for it, but it's an inconsistency which definitely trips up even
> seasoned D programmers from time to time.

I don't know these cases, but the reason might be that function declarations and unittests need to be followed by braces (or a semicolon in the case of functions), whereas some other keywords also allow non-compound statements. This could therefore lead to ambiguities as to whether the type qualifier applies to the declaration or to the following statement.

> 5. The fact that pure is called pure is very problematic at this point as far
> as explaining things to folks goes. We should probably consider renaming it to
> something like @noglobal, but I'm not sure that that would go over very well
> given the amount of breakage involved. It _does_ require a lot of explaining
> though.

Well, it's just a name, and it's for hysterical raisins ;-) I don't think it's so bad, because the purity concept already differs from language to language.

> 6. The situation with ranges and string is kind of ugly, with them being
> treated as ranges of code points. I don't know what the correct solution to
> this is, since treating them as ranges of code units promotes efficiency but
> makes code more error-prone, whereas treating them as ranges of graphemes
> would just cost too much. Ranges of code points is _mostly_ correct but still
> incorrect and _more_ efficient than graphemes but still quite a bit less
> efficient than code units. So, it's kind of like it's got the best and worst
> of both worlds. The current situation causes inconsistencies with everything
> else (forcing us to use isNarrowString all over the place) and definitely
> requires frequent explaining, but it does prevent some classes of problems.
> So, I don't know. I used to be in favor of the current situation, but at this
> point, if we could change it, I think that I'd argue in favor of just treating
> them as ranges of code units and then have wrappers for ranges of code points
> or graphemes. It seems like the current situation promotes either using
> ubyte[] (if you care about efficiency) or the new grapheme facilities in
> std.uni if you care about correctness, whereas just using strings as ranges of
> dchar is probably a bad idea unless you just don't want to deal with any of
> the Unicode stuff, don't care all that much about efficiency, and are willing to
> have bugs in the areas where operating at the code point level is incorrect.

My preferred solution would be to disallow iterating over bare char/wchar/dchar ranges and instead require an explicit .byCodeUnit, .byCodePoint or .byGrapheme. Probably not going to happen, though...
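Something like this, where you always have to say explicitly which level you want (byGrapheme is already in std.uni; the byCodeUnit adapter in std.utf is assumed for the sketch):

    import std.range : walkLength;
    import std.uni : byGrapheme;
    import std.utf : byCodeUnit; // assumed - see above

    unittest
    {
        auto s = "e\u0301"; // 'e' followed by U+0301 (combining acute accent)

        assert(s.byCodeUnit.walkLength == 3); // UTF-8 code units
        assert(s.walkLength == 2);            // code points (today's implicit default)
        assert(s.byGrapheme.walkLength == 1); // one user-perceived character
    }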