Updating D beyond Unicode 2.0 (page 5)

On 9/22/2018 6:01 PM, Jonathan M Davis wrote:
> For better or worse, English is the international language of science and
> engineering, and that includes programming.

In the earlier days of D, I put on the web pages a Google widget that would automatically translate the page into any language Google supported. This was eventually removed (not by me) because nobody wanted it. Nobody (besides me) even noticed it was removed. And the D community is a very international one.

Supporting Unicode in identifiers gives users a false sense that it's a good idea to use them. Lots of programming tools don't work well with Unicode. Even Windows doesn't by default - you've got to run "chcp 65001" each time you open a console window. Filesystems don't work reliably with Unicode. Heck, the reason module names should be lower case in D is because mixed case doesn't work reliably across filesystems.

D supports Unicode in identifiers because C and C++ do, and we want to be able to interoperate with them. Extending Unicode identifier support off into other directions, especially ones that break such interoperability, is just doing a disservice to users.

On Sunday, 23 September 2018 at 21:12:13 UTC, Walter Bright wrote:
> D supports Unicode in identifiers because C and C++ do, and we want to be able to interoperate with them. Extending Unicode identifier support off into other directions, especially ones that break such interoperability, is just doing a disservice to users.

I always thought D supported Unicode with the goal of going forward with it while C was stuck with ASCII:
http://www.drdobbs.com/cpp/time-for-unicode/228700405

"The D programming language has already driven stakes in the ground, saying it will not support 16 bit processors, processors that don't have 8 bit bytes, and processors with crippled, non-IEEE floating point. Is it time to drive another stake in and say the time for Unicode has come?"

Have you changed your mind since?

On 9/23/2018 3:23 PM, Neia Neutuladh wrote:
> Okay, that's why you previously selected C99 as the standard for what characters to allow. Do you want to update to match C11? It's been out for the better part of a decade, after all.

I wasn't aware it changed in C11.

On 23/09/18 15:38, sarn wrote:
> On Sunday, 23 September 2018 at 06:53:21 UTC, Shachar Shemesh wrote:
>> On 23/09/18 04:29, sarn wrote:
>>> You can find a lot more Japanese D code on this blogging platform:
>>> https://qiita.com/tags/dlang
>>>
>>> Here's the most recent post to save you a click:
>>> https://qiita.com/ShigekiKarita/items/9b3aa8f716848278ef62
>>
>> Comments in Japanese. Identifiers in English. Not advancing your point, I think.
>>
>> Shachar
>
> Well, I knew that when I posted, so I honestly have no idea what point you assumed I was making.

I don't know what point you were trying to make. That's precisely why I posted.

I don't think D currently or ever enforces what type of (legal UTF-8) text you could use in comments or strings. This thread is about what's legal to use in identifiers.

The example you brought does not use Unicode in identifiers, and is, therefore, irrelevant to the discussion we're having. That was the point *I* was trying to make.

Shachar

On Monday, 24 September 2018 at 01:39:43 UTC, Walter Bright wrote:
> On 9/23/2018 3:23 PM, Neia Neutuladh wrote:
>> Okay, that's why you previously selected C99 as the standard for what characters to allow. Do you want to update to match C11? It's been out for the better part of a decade, after all.
>
> I wasn't aware it changed in C11.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf page 522 (PDF numbering) or 504 (internal numbering).

Outside the BMP, almost everything is allowed, including many things that are not currently mapped to any Unicode value. Within the BMP, a heck of a lot of stuff is allowed, including a lot that D doesn't currently allow.

GCC hasn't even updated to the C99 standard here, as far as I can tell, but clang-5.0 is up to date.

On Monday, 24 September 2018 at 01:32:38 UTC, Walter Bright wrote:
> D the language is well suited to the development of Unicode apps. D source code is another matter.

But in the article you specifically talk about the use of Unicode in the context of source code instead of apps:

"With the D programming language, we continuously run up against the problem that ASCII has reached its expressivity limits."

"There are the chevrons « and » which serve as another set of brackets to lighten the overburdened ambiguities of ( ). There are the dot-product and cross-product characters · and × which would make lovely infix operator tokens for math libraries. The greek letters would be great for math variable names."

September 24, 2018

Re: Updating D beyond Unicode 2.0

Posted by Jonathan M Davis
in reply to Dennis

On Monday, September 24, 2018 4:19:31 AM MDT Dennis via Digitalmars-d wrote:
> On Monday, 24 September 2018 at 01:32:38 UTC, Walter Bright wrote:
> > D the language is well suited to the development of Unicode apps. D source code is another matter.
>
> But in the article you specifically talk about the use of Unicode in the context of source code instead of apps:
>
> "With the D programming language, we continuously run up against the problem that ASCII has reached its expressivity limits."
>
> "There are the chevrons « and » which serve as another set of brackets to lighten the overburdened ambiguities of ( ). There are the dot-product and cross-product characters · and × which would make lovely infix operator tokens for math libraries. The greek letters would be great for math variable names."

Given that the typical keyboard has none of those characters, maintaining code that used any of them would be a royal pain. It's one thing if they're used in the occasional string as data, but it's quite another if they're used as identifiers or operators. I don't see how that would be at all maintainable. You'd be forced to constantly copy and paste rather than type.

- Jonathan M Davis

On Monday, 24 September 2018 at 10:36:50 UTC, Jonathan M Davis wrote:
> Given that the typical keyboard has none of those characters, maintaining code that used any of them would be a royal pain.

Note that I'm not trying to argue either way, it's just that I used to think of Walter's stance on D and Unicode as:
"D would fully embrace Unicode if only editors/debuggers etc. would embrace it too"

But now I read:
> D supports Unicode in identifiers because C and C++ do, and we want to be able to interoperate with them.

So I wonder what changed. I guess it's mostly answered in the first reply:
> When I originally started with D, I thought non-ASCII identifiers with Unicode was a good idea. I've since slowly become less and less enthusiastic about it.
