Updating D beyond Unicode 2.0 (page 4)

On 23/09/18 04:29, sarn wrote:
> On Sunday, 23 September 2018 at 00:18:06 UTC, Adam D. Ruppe wrote:
>> I have seen Japanese D code before on twitter, but cannot find it now (surely because the search engines also share this bias).
>
> You can find a lot more Japanese D code on this blogging platform:
> https://qiita.com/tags/dlang
>
> Here's the most recent post to save you a click:
> https://qiita.com/ShigekiKarita/items/9b3aa8f716848278ef62

Comments in Japanese. Identifiers in English. Not advancing your point, I think.

Shachar

On 09/21/2018 04:18 PM, Adam D. Ruppe wrote:
> Well, for example, with a Chinese company, they may very well find
> forced English identifiers to be an annoyance.

Fully agreed, but as far as I know, Turkish companies use English in source code.

The Turkish alphabet is Latin-based, and the dotted and undotted versions of Latin letters are distinct and produce different meanings. Quick examples:

sık: dense (n), squeeze (v), ...
sik: penis (n), f*ck (v) [1]
şık: one of multiple choices (1), swanky (2)

döndür: return
dondur: make frozen

sök: disassemble, dismantle, ...
sok: insert, install, ...
şok: shock

Hence, non-Unicode is unacceptable in Turkish code unless we reserve programming to English speakers only, which is unacceptable because it would be exclusionary and would produce English identifiers that are frequently amusing. I've seen the latter in code of English learners. :)

Ali

[1] https://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail
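To make the collision in the word list above concrete, here is a minimal sketch in Python (not D, and not anything posted in the thread). It assumes a hypothetical helper, `ascii_fold`, that mimics what an ASCII-only toolchain would force on a Turkish programmer: dropping the dots, cedillas, and the dotless ı. The helper name and the folding rule are illustrative assumptions only.

```python
import unicodedata


def ascii_fold(word: str) -> str:
    """Hypothetical helper: naively reduce Turkish letters to ASCII."""
    # The dotless i (U+0131) has no Unicode decomposition, so map it by hand.
    word = word.replace("ı", "i")
    # NFD splits letters such as ş, ö, ü into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The example words from the post above.
words = ["sık", "sik", "şık", "döndür", "dondur", "sök", "sok", "şok"]

# Group the original words by their ASCII-folded spelling.
collisions = {}
for w in words:
    collisions.setdefault(ascii_fold(w), []).append(w)

for folded, originals in collisions.items():
    print(folded, "<-", ", ".join(originals))
```

Under this folding rule the eight words end up with only three distinct spellings, which is the ambiguity the post describes.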

On 09/22/2018 09:27 AM, Neia Neutuladh wrote:
> Logographic writing systems. There is one logographic writing system
> still in common use, and it's the standard writing system for Chinese
> and Japanese.

I had the misconception of each Chinese character meaning a word until I read "The Chinese Language, Fact and Fantasy" by John DeFrancis. One thing I learned was that Chinese is not purely logographic.

Ali

On Friday, 21 September 2018 at 23:17:42 UTC, Seb wrote:
> A: Wait. Using emojis as identifiers is not a good idea?
> B: Yes.
> A: But the cool kids are doing it:
>
> https://codepen.io/andresgalante/pen/jbGqXj

It's not like we have a lot of good fonts (I know only one), and even fewer of them are suitable for code. They can't realistically be expected to cover everything; monospace fonts are often even ASCII-only.

On Sunday, 23 September 2018 at 11:18:42 UTC, Ali Çehreli wrote:
> Hence, non-Unicode is unacceptable in Turkish code

You even contributed to http://code.google.com/p/trileri/source/browse/trunk/tr/yazi.d

On Sunday, 23 September 2018 at 06:53:21 UTC, Shachar Shemesh wrote:
> On 23/09/18 04:29, sarn wrote:
>> You can find a lot more Japanese D code on this blogging platform:
>> https://qiita.com/tags/dlang
>>
>> Here's the most recent post to save you a click:
>> https://qiita.com/ShigekiKarita/items/9b3aa8f716848278ef62
>
> Comments in Japanese. Identifiers in English. Not advancing your point, I think.
>
> Shachar

Well, I knew that when I posted, so I honestly have no idea what point you assumed I was making.

On Friday, 21 September 2018 at 20:25:54 UTC, Walter Bright wrote:
> When I originally started with D, I thought non-ASCII identifiers with Unicode was a good idea. I've since slowly become less and less enthusiastic about it.
>
> First off, D source text simply must (and does) fully support Unicode in comments, characters, and string literals. That's not an issue.
>
> But identifiers? I haven't seen hardly any use of non-ascii identifiers in C, C++, or D. In fact, I've seen zero use of it outside of test cases. I don't see much point in expanding the support of it. If people use such identifiers, the result would most likely be annoyance rather than illumination when people who don't know that language have to work on the code.

Not seeing identifiers in languages you don't program in or can read in is expected. If it's supported it will be used:

Japanese Swift: https://speakerdeck.com/codelynx/programming-swift-in-japanese

> Extending it further will also cause problems for all the tools that work with D object code, like debuggers, disassemblers, linkers, filesystems, etc.
>
> Absent a much more compelling rationale for it, I'd say no.

More compelling than: "there're 6 billion people in this world who don't speak english?"

Allowing people to program in their own language while reducing the cognitive friction for people who want to learn programming in the majority of the world seems like a no-brainer thing to do.

On Saturday, 22 September 2018 at 19:59:42 UTC, Erik van Velzen wrote:
> If there was a contingent of Japanese or Chinese users doing that then surely they would speak up here or in Bugzilla to advocate for this feature?

https://forum.dlang.org/post/piwvbtetcwyxlalocxkw@forum.dlang.org

September 23, 2018

Re: Updating D beyond Unicode 2.0

Posted by Abdulhaq
in reply to Jonathan M Davis

On Saturday, 22 September 2018 at 08:52:32 UTC, Jonathan M Davis wrote:

>
> Honestly, I was horrified to find out that emojis were even in Unicode. It makes no sense whatsoever. Emojis are supposed to be sequences of characters that can be interpreted as images. Treating them like Unicode symbols is like treating entire words like Unicode symbols. It's just plain stupid and a clear sign that Unicode has gone completely off the rails (if it was ever on them). Unfortunately, it's the best tool that we have for the job.

According to the Unicode website, http://unicode.org/standard/WhatIsUnicode.html,

"""
Support of Unicode forms the foundation for the representation of languages and symbols in all major operating systems, search engines, browsers, laptops, and smart phones—plus the Internet and World Wide Web (URLs, HTML, XML, CSS, JSON, etc.)"""

Note that Unicode supports symbols, not just characters.

The smiley face symbol predates its ':-)' usage in ASCII text, https://www.smithsonianmag.com/arts-culture/who-really-invented-the-smiley-face-2058483/. It's fundamentally a symbol, not a sequence of characters. Therefore it is not unreasonable for it to be encoded with a Unicode code point. I do agree though, of course, that it would seem bizarre to use an emoji as a D identifier.

The early history of computer science is completely dominated by cultures that use Latin-script characters, and hence, quite reasonably, text encoding and its automated visual representation by computer-based devices is dominated by the requirements of Latin-script languages. However, the world keeps turning and, despite DT's best efforts, China et al. look to become dominant. Even if not China, the chances are that eventually a language written in a non-Latin script will become very important. Parochial views like "all open source code should be in ASCII" will look silly.

However, until that time D developers have to spend their time where it can be most useful. Hence the decision whether to apply Neia's patch / ideas or not mainly depends on how much downstream effort there will be (debuggers etc., as Walter pointed out), and how much the gain is. As Unicode 2.0 is already supported, I would take a guess that the vast majority of people with access to a computer can already enter identifiers in D that are rich enough for them. As Adam said though, it would be a good idea to at least ask!

On 9/23/2018 9:52 AM, aliak wrote:
> Not seeing identifiers in languages you don't program in or can read in is expected.

On the other hand, I've been programming for 40 years. I've customized my C++ compiler to emit error messages in various languages:

https://github.com/DigitalMars/Compiler/blob/master/dm/src/dmc/msgsx.c

I've implemented SHIFT-JIS encodings, along with .950 (Chinese) and .949 (Korean) code pages in the C++ compiler. I've worked in Japan writing software for Japanese companies. I've sold compilers internationally for 30 years (mostly to Germany and Japan). I did the tech support, meaning I'd see their code.

---

There's a reason why dmd doesn't have international error messages. My experience with it is that international users don't want it. They prefer the English messages.

I'm sure if you look hard enough you'll find someone using non-ASCII characters in identifiers.

---

When I visited Remedy Games in Finland a few years back, I was surprised that everyone in the company was talking in English. I asked if they were doing that out of courtesy to me. They laughed, and said no, they talked in English because they came from all over the world, and English was the only language they had in common.
