Why not extend do to allow unicode in ID's? (page 2)

On Monday, July 1, 2019 8:56:55 PM MDT rikki cattermole via Digitalmars-d wrote:
> On 02/07/2019 2:17 PM, Jonathan M Davis wrote:
> > On Saturday, June 29, 2019 4:38:06 PM MDT Bert via Digitalmars-d wrote:
> >> It would greatly expand the coverage.
> >>
> >> It would be nice to use certain characters that are truly meaningful.
> >
> > ...
> >
> > Like most major languages, D supports identifiers with alphanumeric characters plus underscore with the first character not being allowed to be numeric. However, unlike most languages, it expands that to include Unicode alpha characters, meaning that quite a lot of Unicode is supported in identifiers. So, it already goes far beyond what most languages do.
> >
> > That being said, I think that you'll find that most folks will not be in favor of using Unicode in identifiers outside of code intended for people of a specific language who actually use those characters normally (e.g. Japanese characters when all of the programmers involved read and write Japanese and have keyboards that support it). The fact that a character is not a key on a typical keyboard means that anyone using an identifier with that character in it will almost certainly have to copy-paste it, and that's really not going to go over well with most people. If you really feel strongly about the matter, you can always create a DIP to propose a language change to allow more Unicode characters in identifiers, but I would not expect it to be accepted.
> >
> > - Jonathan M Davis
>
> No DIP is required. The lexer just needs updating to match to the (current) Unicode spec.
>
> https://github.com/dlang/dmd/blob/master/src/dmd/lexer.d#L1082

If a character is a Unicode alpha character, then yes. However, if it's not, then that would definitely be a language change and would require a DIP. The spec is quite specific about it requiring Unicode alpha characters, and the code does the same.

Without looking at the Unicode spec, I have no clue which characters are alpha characters, but I'd be extremely surprised if a character like ± or ♥ qualified.

- Jonathan M Davis
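The question of whether ± or ♥ counts as an alpha character can be checked quickly. The following is a minimal sketch that assumes Phobos's `std.uni.isAlpha` as a stand-in for "Unicode alpha character". The compiler's lexer keeps its own table of identifier characters, so the two may not agree on every code point; treat this as an approximation of the rule, not as the lexer's actual check.

```d
import std.stdio : writefln;
import std.uni : isAlpha;

void main()
{
    // Characters taken from the thread: letters from "Krejčiřík", then ± and ♥.
    foreach (dchar c; "čří±♥"d)
    {
        writefln("U+%04X  %s  isAlpha: %s", cast(uint) c, c, isAlpha(c));
    }

    // Letters with diacritics are alphabetic; the two symbols are not.
    // Note: this is std.uni's view, which only approximates the lexer's rule.
    assert(isAlpha('č'));
    assert(!isAlpha('±'));
    assert(!isAlpha('♥'));
}
```

Under that assumption, the letters with diacritics pass and the two symbols do not, which matches the expectation stated in the post.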

On Monday, 1 July 2019 at 23:52:25 UTC, Bert wrote:
> It's time to grow up? How can progress be made if we don't progress. 99% of all modern text editors support UTF-8... with your logic we could say that ascii characters only complicate

Editors maybe, but what about keyboards? Can you easily write my name (Krejčiřík) without copy and paste or a character selector tool?

On 02.07.19 11:10, Martin Krejcirik wrote:
> On Monday, 1 July 2019 at 23:52:25 UTC, Bert wrote:
>> It's time to grow up? How can progress be made if we don't progress. 99% of all modern text editors support UTF-8... with your logic we could say that ascii characters only complicate
>
> Editors maybe, but what about keyboards ? Can you easily write my name (Krejčiřík) without copy and paste or character selector tool ?

Of course. I can easily write this even on a US keyboard. My editor is set up to translate Krej\vci\vr\'ik to Krejčiřík as I type.

This is not a hard problem.

On Tuesday, 2 July 2019 at 04:34:42 UTC, Jonathan M Davis wrote:
>
> a character like ±

A good example of a character that should not be allowed in identifiers, because it already has a meaning as an operator (and in general, in theory, we may want to reserve it for such future use).

ISO or Unicode define which characters, not all of them, are letters or alphanumeric:

https://dlang.org/spec/lex.html#identifiers

https://docs.microsoft.com/en-us/dotnet/api/system.char.isletter#remarks

On Tuesday, 2 July 2019 at 18:28:06 UTC, Timon Gehr wrote:
>
> Of course. I can easily write this even on a US keyboard. My editor is set up to translate Krej\vci\vr\'ik to Krejčiřík as I type.
>
> This is not a hard problem.

auto Krejčiřík = 0;
static assert(is(typeof(Krejčiřík)));

Already supported :)

July 05, 2019

Re: Why not extend do to allow unicode in ID's?

Posted by Bert
in reply to XavierAP

On Wednesday, 3 July 2019 at 23:21:19 UTC, XavierAP wrote:
> On Tuesday, 2 July 2019 at 04:34:42 UTC, Jonathan M Davis wrote:
>>
>> a character like ±
>
> A good example of a character that should not be allowed in identifiers, because it has a meaning of operator (and in general in theory we may want to reserve it for such future use).
>
> ISO or Unicode define which characters, not all of them, are letters or alphanumeric:
>
> https://dlang.org/spec/lex.html#identifiers
>
> https://docs.microsoft.com/en-us/dotnet/api/system.char.isletter#remarks

Maybe, maybe not. It could be useful in some contexts. It would probably be more confusing, but -, +, and ± can be very useful as subscripts or superscripts in special mathematical situations (I've seen this used many times, such as to represent the even and odd sets of things, or for lowering and raising operations that are encoded in symbolic form, such as momentum operations that can be computed by multiplication).

It may not be worth allowing because s_-*s_++3 would be very ambiguous... as would s±4+3. Especially if ± is also defined as an operator...

But ± should be allowed to be used as an operator as that is the most useful case.

4 ± 3

could be a mathematical object containing two values.

a ± b could be a mathematical object containing up to 2mn values, where m and n are the numbers of values that a and b contain.

(4 ± 3)*(±6) contains 4 values: 42, -42, 6, -6.
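The arithmetic above can be tried in today's D without any new operator. This is only a sketch: the `Multi` struct and the two `pm` functions are hypothetical names made up for illustration, and ordinary function calls stand in for the binary and unary forms of ±, since D has no such operator.

```d
import std.algorithm.sorting : sort;
import std.stdio : writeln;

/// Hypothetical multi-valued number: every value an expression like "4 ± 3" can take.
struct Multi
{
    int[] values;

    /// Multiply every value on the left by every value on the right.
    Multi opBinary(string op : "*")(Multi rhs) const
    {
        int[] result;
        foreach (a; values)
            foreach (b; rhs.values)
                result ~= a * b;
        return Multi(result);
    }
}

/// Stand-in for the binary form "a ± b".
Multi pm(int a, int b)
{
    return Multi([a + b, a - b]);
}

/// Stand-in for the unary form "±a".
Multi pm(int a)
{
    return Multi([a, -a]);
}

void main()
{
    // (4 ± 3) * (±6): {7, 1} times {6, -6}.
    auto product = pm(4, 3) * pm(6);

    auto sorted = product.values.dup;
    sort(sorted);
    assert(sorted == [-42, -6, 6, 42]);
    writeln(sorted);
}
```

The sketch keeps duplicates and does not define the other arithmetic operators; a real design would have to decide both.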

So D could go through the Unicode list, determine which symbols are best suited for operators and which for identifiers, and then enable their usage. Many symbols that are not appropriate for IDs would be appropriate for operators: ▌╚█

These are ugly in some sense but they could have good meaning in relation to operations. █ could mean boxing: █a means box a.

But they could also be useful for IDs... █ could mean rectangle.

Symbols are arbitrary. We know millions of symbols. Our brain has no issues decoding them after we learn the meaning. The only problem is that it's nice to have consistency so we don't have to learn many different purposes for the same symbol (but we already do; it's not a huge deal, it does slow us down a little, but usually context is clear).

I think having it more open ended is better. It might require people to exercise their neurons a little bit, but that is a good thing in the long run. Obviously people could make it very difficult by making code very terse, but I doubt that would happen much. People don't code in D to make their life more difficult, they do it to make it easier. Virtually everyone will choose the symbols in a logical way that will make sense.

What could be done is that any Unicode character in an ID could have some ASCII equivalent.

someÆx is also

some::432::x

or whatever, if a good symbol could be found instead of ::. Then IDEs could learn to support the syntax and convert between the two forms. A simple hotkey could switch between the two, and code pages could be flipped to change the keyboard. A pragma(codepage, 43) could tell the IDE which code page to use. These might have issues, but without trying different things the optimal solution can't be found.
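The conversion an IDE would have to do between the two spellings is small. Below is a rough sketch under stated assumptions: the function names `toAsciiForm`, `toUnicodeForm`, and `decodeGroup` are hypothetical, the `::` delimiter is taken from the post, and the number between the delimiters is assumed to be the character's decimal code point. Under that assumption Æ (U+00C6) is spelled `::198::`, not the illustrative 432 used above.

```d
import std.conv : to;
import std.regex : Captures, regex, replaceAll;
import std.utf : encode;

/// Hypothetical ASCII spelling: each non-ASCII character becomes "::<decimal code point>::".
string toAsciiForm(string id)
{
    string result;
    foreach (dchar c; id)
    {
        if (c < 0x80)
            result ~= cast(char) c;
        else
            result ~= "::" ~ to!string(cast(uint) c) ~ "::";
    }
    return result;
}

/// Turn one matched "::<digits>::" group back into the character it names.
/// encode throws if the number is not a valid code point.
string decodeGroup(Captures!string m)
{
    char[4] buf;
    immutable len = encode(buf, cast(dchar) to!uint(m[1]));
    return buf[0 .. len].idup;
}

/// Inverse of toAsciiForm.
string toUnicodeForm(string ascii)
{
    return replaceAll!decodeGroup(ascii, regex(`::(\d+)::`));
}

void main()
{
    assert(toAsciiForm("someÆx") == "some::198::x");
    assert(toUnicodeForm("some::198::x") == "someÆx");
    assert(toUnicodeForm(toAsciiForm("Krejčiřík")) == "Krejčiřík");
}
```

Since `:` cannot appear in a D identifier, the delimiter cannot collide with an ordinary name, but `::198::` is not itself valid D today. The lexer, or a preprocessing step in the IDE, would still have to accept it.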
