Why not extend do to allow unicode in ID's? - D Programming Language Discussion Forum


Thread overview

Why not extend do to allow unicode in ID's?
Jun 29, 2019 Bert
Jun 30, 2019 sarn
Jun 30, 2019 Bert
Jun 30, 2019 Dennis
Jun 30, 2019 Bert
Jul 01, 2019 Dennis
Jul 01, 2019 Martin Krejcirik
Jul 01, 2019 Bert
Jul 02, 2019 Martin Krejcirik
Jul 02, 2019 Timon Gehr
Jul 03, 2019 XavierAP
Jul 02, 2019 Jonathan M Davis
Jul 02, 2019 rikki cattermole
Jul 02, 2019 Jonathan M Davis
Jul 03, 2019 XavierAP
Jul 05, 2019 Bert

June 29, 2019

Posted by Bert

It would greatly expand the coverage.

It would be nice to use certain characters that are truly meaningful.

In fact, it would be nice for ops too.

░▒▓│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼
∩ε≡φ±≥≤⌠⌡÷≈∙·√ⁿ²■☺♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼

I realize the excuse is going to be "It makes the code look ugly or hard to read", not all editors will support it, etc...

So? Those are lame excuses. People can abuse anything; you can't police the world. Stopping all legitimate uses because someone might use it illegitimately is ignorant and harmful (which is why it is ignorant).

It is best to enable Unicode support and then have standards and guidelines, and let some people shoot themselves in the foot if they want... that is the best way to learn not to do it again.

Imagine being able to write proper mathematical formula ID's:

∞
δ
Ω
Θ
Φ
τ
µ
σ
ε
φ

or using valid mathematical operators:

∩
≡
±
≥
≤
÷
≈
∙
√
ⁿ
²

or when you write a card game:

♥
♦
♣
♠


These are much better than the verbose words that we have to use now. I know some will say the opposite, but they can say it and be wrong. Trying to stop me from shooting myself in the foot when I don't own a gun is abusive to me and just like shooting me in the foot! I don't write any code that anyone else reads so let me make the choice for myself rather than create arbitrary rules that limit expressiveness. There is a reason the First Amendment exists in the US: the founders knew what limitations of expression would do. The same applies to all things.

Maybe one could use a switch to enable such a language with a compiler warning about such use. Maybe we can have a special D code page for useful symbols so there is a standard code for each that one could properly map using their editor of choice?

For example, we could have each symbol map to a long name that one could use to replace the source:

♥ = Symbol_Heart_0x2660    // or even just __Symbol__0x2660
♦ = Symbol_Diamond_0x2661
♣ = Symbol_Club_0x2662
♠ = Symbol_Spade_0x2663

And one could then change any source code between the symbolic form and the verbose form using a command line utility.

E.g.,


int ♥ = 3;

Can be converted to

int Symbol_Heart_0x2660 = 3;

and back without issue (99.9999999999% of code).

This would potentially cause issues with metaprogramming when comparing strings of the identifier names, but this is a minor issue. In fact, internally D could just use the long symbol name and require the programmer to use them.

E.g.,

static if (id == "♥") // invalid if id gets converted to long name internally.

static if (id == #"♥") // valid: #(or whatever) converts the symbol to its long name

There are solutions to the problems... let's work on finding one to make D better.
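A minimal sketch of the two-way conversion proposed above, written in D. The table and the `toVerbose`/`toSymbolic` function names are hypothetical. The long names are copied from the post; note that their numeric suffixes are not the symbols' Unicode code points (♥ is U+2665, ♦ U+2666, ♣ U+2663, ♠ U+2660). Because this is plain text replacement, symbols inside string literals and comments get rewritten too; a real tool would have to tokenize the source first.

```d
import std.array : replace;
import std.stdio : writeln;

// Hypothetical symbol table: [symbolic form, verbose ASCII form].
// The long names are taken verbatim from the post; their numeric
// suffixes are NOT the real Unicode code points of the symbols.
immutable string[2][] symbolTable = [
    ["♥", "Symbol_Heart_0x2660"],
    ["♦", "Symbol_Diamond_0x2661"],
    ["♣", "Symbol_Club_0x2662"],
    ["♠", "Symbol_Spade_0x2663"],
];

// Symbolic -> verbose, e.g. as a build step before invoking the compiler.
// Naive textual replacement: string literals and comments are rewritten too.
string toVerbose(string source)
{
    foreach (pair; symbolTable)
        source = source.replace(pair[0], pair[1]);
    return source;
}

// Verbose -> symbolic, for editing.
string toSymbolic(string source)
{
    foreach (pair; symbolTable)
        source = source.replace(pair[1], pair[0]);
    return source;
}

void main()
{
    immutable verbose = toVerbose("int ♥ = 3;");
    writeln(verbose);             // int Symbol_Heart_0x2660 = 3;
    writeln(toSymbolic(verbose)); // int ♥ = 3;
}
```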

June 30, 2019

Posted by sarn
in reply to Bert

D already allows non-latin characters in identifiers, just not arbitrary symbols:

import std.stdio;

void main()
{
	double φ = 1.61803398874989484820;
	writeln(φ);
}

June 30, 2019

Posted by Dennis
in reply to Bert

On Saturday, 29 June 2019 at 22:38:06 UTC, Bert wrote:
> Imagine being able to write proper mathematical formula ID's:

Currently D allows "universal alphas" in identifiers, so Greek letters are allowed already. See: https://dlang.org/spec/lex.html#identifiers
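To illustrate (a sketch, not code from the thread): Greek letters can be used directly as identifiers in a formula, but a character like ² is a superscript, neither a letter nor a decimal digit, so a name such as σ² is rejected and the variance below is spelled `σ2` instead. The function name `variance` is just for illustration.

```d
import std.stdio : writeln;

// μ (Greek small letter mu, U+03BC) and σ are Unicode letters, so D
// accepts them as identifiers. σ² would not lex: ² is a superscript,
// not a letter or decimal digit, so the variance is named σ2 instead.
// Population variance of a non-empty array.
double variance(const(double)[] xs)
{
    double μ = 0;
    foreach (x; xs)
        μ += x;
    μ /= xs.length;

    double σ2 = 0;
    foreach (x; xs)
        σ2 += (x - μ) * (x - μ);
    return σ2 / xs.length;
}

void main()
{
    writeln(variance([2.0, 4, 4, 4, 5, 5, 7, 9])); // 4
}
```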

> or using valid mathematical operators:

In D you can't add operators, but if you want math notation on existing ones, you might be interested in fonts with programming ligatures such as:
https://github.com/tonsky/FiraCode

> or when you write a card game:

Custom literals can be added with templates, for example:
octal!377
(https://github.com/dlang/phobos/blob/d57be4690fc923a1974a4ef4d8b84a951131d219/std/conv.d#L4062)
tok!"if"
(https://github.com/dlang-community/libdparse/blob/5270739bcd1962418784c7760773e24d28b6009b/src/dparse/lexer.d#L115)

Since in strings any Unicode is allowed, you can do something similar:
suit!"♥"
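A minimal sketch of such a `suit!"♥"` template, assuming a hypothetical `Suit` enum with ASCII member names. The symbol only ever appears inside a string literal, and an unknown symbol fails at compile time via `static assert`.

```d
import std.stdio : writeln;

// Hypothetical enum: plain ASCII member names usable as ordinary identifiers.
enum Suit { hearts, diamonds, clubs, spades }

// Hypothetical helper: maps a suit symbol, given as a string literal, to a
// Suit at compile time. String literals may contain any Unicode, so "♥" is
// fine here even though ♥ itself cannot be part of an identifier.
template suit(string symbol)
{
    static if (symbol == "♥")
        enum suit = Suit.hearts;
    else static if (symbol == "♦")
        enum suit = Suit.diamonds;
    else static if (symbol == "♣")
        enum suit = Suit.clubs;
    else static if (symbol == "♠")
        enum suit = Suit.spades;
    else
        static assert(0, "unknown suit symbol: " ~ symbol);
}

void main()
{
    Suit trump = suit!"♠";
    writeln(trump);    // spades
    writeln(suit!"♥"); // hearts
}
```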

> I don't write any code that anyone else reads so let me make the choice for myself rather than create arbitrary rules that limit expressiveness.

If it's only for yourself, you can add a build step that substitutes your custom symbols with valid identifiers before compiling. Or use your own fork of the compiler, you probably only need to remove this line:
https://github.com/dlang/dmd/blob/2599559d624275bfcff298b3a8b31f9d82ae534f/src/dmd/lexer.d#L524

Finally, if you truly long for ultimate freedom in how you write code, then Nim might be the right language for you since it aligns more with your "putting full trust in the programmer" view than D. In Nim, any non-ascii character is valid for identifiers, so even invalid Unicode characters are allowed.
https://nim-lang.org/docs/manual.html#lexical-analysis-identifiers-amp-keywords

June 30, 2019

Posted by Bert
in reply to sarn

On Sunday, 30 June 2019 at 03:12:55 UTC, sarn wrote:
> D already allows non-latin characters in identifiers, just not arbitrary symbols:
>
> import std.stdio;
>
> void main()
> {
> 	double φ = 1.61803398874989484820;
> 	writeln(φ);
> }

Yeah, I noticed some work but many do not and I'm not even sure what does or doesn't ;/

June 30, 2019

Posted by Bert
in reply to Dennis

On Sunday, 30 June 2019 at 10:10:41 UTC, Dennis wrote:
> On Saturday, 29 June 2019 at 22:38:06 UTC, Bert wrote:
> ...
>> I don't write any code that anyone else reads so let me make the choice for myself rather than create arbitrary rules that limit expressiveness.
>
> If it's only for yourself, you can add a build step that substitutes your custom symbols with valid identifiers before compiling. Or use your own fork of the compiler, you probably only need to remove this line:
> https://github.com/dlang/dmd/blob/2599559d624275bfcff298b3a8b31f9d82ae534f/src/dmd/lexer.d#L524
>

Thanks. I guess I could create a small routine that patches the binary to reverse the if check. This would be easiest to maintain, as I wouldn't have to recompile dmd every release, just install the new one and patch it.

> Finally, if you truly long for ultimate freedom in how you write code, then Nim might be the right language for you since it aligns more with your "putting full trust in the programmer" view than D. In Nim, any non-ascii character is valid for identifiers, so even invalid Unicode characters are allowed.
> https://nim-lang.org/docs/manual.html#lexical-analysis-identifiers-amp-keywords

I've heard of Nim but never really looked into it much... but every time I hear about it I am more and more enticed.

It seems well put together, but the syntax is a little off-putting. I'm sure I could get used to it.

I have a few questions:

1. There doesn't seem to be good IDE support. I mainly use Visual Studio and I see a nim for VSC which I don't use ;/  Is there any really good IDE support?

2. How does meta programming of Nim compare to D's? The main reason I use D is its meta programming.

3. Nim seems to have somewhat of a strong categorical and functional foundation. Is it more like Haskell than D? (In the sense of catering to strongly structured programming (functors, natural transformations, etc.))

I'll try to read over the manual. Maybe my next program will be in Nim.

July 01, 2019

Posted by Dennis
in reply to Bert

On Sunday, 30 June 2019 at 23:27:56 UTC, Bert wrote:
> I have a few questions:
>
> 1. There doesn't seem to be good IDE support. I mainly use Visual Studio and I see a nim for VSC which I don't use ;/  Is there any really good IDE support?

I don't have much Nim experience myself, so maybe you should ask on the Nim forum.

> 2. How does meta programming of Nim compare to D's? The main reason I use D is its meta programming.

It also has static if ('when'), CTFE, type reflection ('typedesc') and templates. In addition, it has AST macros, which D will not have. (You can find long past discussions about why, or Google 'The Lisp Curse' for something related.)

> 3. Nim seems to have somewhat of a strong categorical and functional foundation. Is it more like Haskell than D? (In the sense of catering to strongly structured programming (functors, natural transformations, etc.))

Both are system programming languages that support mutation, loops and pointers, so you can write C-style procedural code in either language. Whether Nim's higher level constructs are similar to Haskell is something I cannot judge.
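For comparison, a short D sketch of the D counterparts of the features listed above: `static if`, CTFE, templates, and compile-time reflection (`is(...)` and `std.traits.FieldNameTuple`). The names `factorial`, `Point` and `describe` are illustrative only, and this does not show how Nim's versions behave.

```d
import std.stdio : writeln;
import std.traits : FieldNameTuple;

// CTFE: an ordinary function, run by the compiler when its result is
// needed at compile time (here, to initialize an enum).
ulong factorial(uint n)
{
    ulong result = 1;
    foreach (i; 2 .. n + 1)
        result *= i;
    return result;
}

enum factorialOf10 = factorial(10);

struct Point { int x; int y; }

// Template + static if + reflection: the branch is chosen at compile time
// and the field names are read from the type itself.
string describe(T)()
{
    static if (is(T == struct))
    {
        string text = T.stringof ~ " {";
        foreach (name; FieldNameTuple!T) // unrolled at compile time
            text ~= " " ~ name;
        return text ~ " }";
    }
    else
    {
        return T.stringof;
    }
}

void main()
{
    writeln(factorialOf10);     // 3628800
    writeln(describe!Point());  // Point { x y }
    writeln(describe!double()); // double
}
```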

July 01, 2019

Posted by Martin Krejcirik
in reply to Bert

On Saturday, 29 June 2019 at 22:38:06 UTC, Bert wrote:
> It would greatly expand the coverage.
>
> It would be nice to use certain characters that are truly meaningful.

I think source code should be easily editable by anyone using a keyboard and a plain editor. Extended characters only complicate things.

July 01, 2019

Posted by Bert
in reply to Martin Krejcirik

On Monday, 1 July 2019 at 17:14:08 UTC, Martin Krejcirik wrote:
> On Saturday, 29 June 2019 at 22:38:06 UTC, Bert wrote:
>> It would greatly expand the coverage.
>>
>> It would be nice to use certain characters that are truly meaningful.
>
>
> I think source code should be easily editable by anyone using a keyboard and a plain editor. Extended characters only complicate things.

It's time to grow up? How can progress be made if we don't progress? 99% of all modern text editors support UTF-8... with your logic we could say that ASCII characters only complicate things, so why not just force everyone to code in binary? That would be the simplest thing to do, right?

What you are telling me is that you want to force me to use your view but you don't want me to force you to use mine.

What you are actually doing is assuming it would be a problem without actually knowing or having any evidence it would be. You should ponder that a little.

July 01, 2019

Posted by Jonathan M Davis
in reply to Bert

On Saturday, June 29, 2019 4:38:06 PM MDT Bert via Digitalmars-d wrote:
> It would greatly expand the coverage.
>
> It would be nice to use certain characters that are truly meaningful.
...

Like most major languages, D supports identifiers with alphanumeric characters plus underscore with the first character not being allowed to be numeric. However, unlike most languages, it expands that to include Unicode alpha characters, meaning that quite a lot of Unicode is supported in identifiers. So, it already goes far beyond what most languages do.
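The rule described above can be approximated in a few lines. This is a sketch under an explicit assumption: it uses `std.uni.isAlpha`, whereas the compiler checks characters against its own table of permitted "universal alphas" (see the lexer spec linked earlier in the thread), so unusual characters may be classified differently. Keywords are not rejected, and `looksLikeIdentifier` is a hypothetical name.

```d
import std.ascii : isDigit;
import std.stdio : writefln;
import std.uni : isAlpha;

// Approximates D's identifier rule: the first character is '_' or a
// letter; later characters may also be ASCII digits. Assumption: Unicode
// "alpha" is judged by std.uni.isAlpha, not by the compiler's own table.
// Keywords such as "int" are not excluded.
bool looksLikeIdentifier(string s)
{
    if (s.length == 0)
        return false;

    bool first = true;
    foreach (dchar c; s) // a dchar loop variable decodes UTF-8 into code points
    {
        immutable allowed = c == '_' || isAlpha(c) || (!first && isDigit(c));
        if (!allowed)
            return false;
        first = false;
    }
    return true;
}

void main()
{
    foreach (id; ["φ", "_x1", "σ2", "1x", "♥", "∞", "√"])
        writefln("%s -> %s", id, looksLikeIdentifier(id));
}
```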

That being said, I think that you'll find that most folks will not be in favor of using Unicode in identifiers outside of code intended for people of a specific language who actually use those characters normally (e.g. Japanese characters when all of the programmers involved read and write Japanese and have keyboards that support it). The fact that a character is not a key on a typical keyboard means that anyone using an identifier with that character in it will almost certainly have to copy-paste it, and that's really not going to go over well with most people. If you really feel strongly about the matter, you can always create a DIP to propose a language change to allow more Unicode characters in identifiers, but I would not expect it to be accepted.

- Jonathan M Davis

July 02, 2019

Posted by rikki cattermole
in reply to Jonathan M Davis

On 02/07/2019 2:17 PM, Jonathan M Davis wrote:
> On Saturday, June 29, 2019 4:38:06 PM MDT Bert via Digitalmars-d wrote:
> ...
> If you really feel strongly about the matter, you can always create a DIP
> to propose a language change to allow more Unicode characters in
> identifiers, but I would not expect it to be accepted.
>
> - Jonathan M Davis

No DIP is required. The lexer just needs updating to match the (current) Unicode spec.

https://github.com/dlang/dmd/blob/master/src/dmd/lexer.d#L1082


Copyright © 1999-2021 by the D Language Foundation