September 22, 2018
On Friday, September 21, 2018 10:54:59 PM MDT Joakim via Digitalmars-d wrote:
> I'm torn. I completely agree with Adam and others that people should be able to use any language they want. But the Unicode spec is such a tire fire that I'm leery of extending support for it.

Unicode identifiers may make sense in a code base that is going to be used solely by a group of developers who speak a particular language that uses a number of non-ASCII characters (especially languages like Chinese or Japanese), but they have no business in any code that's intended for international use. They just cause problems. At best, a particular regional keyboard may be able to produce a particular symbol, but most other keyboards won't be able to. So, using that symbol causes problems for all of the developers from other parts of the world, even if those developers also have Unicode symbols in their native languages.

> Someone linked this Swift chapter on Unicode handling in an earlier forum thread, read the section on emoji in particular:
>
> https://oleb.net/blog/2017/11/swift-4-strings/
>
> I was laughing out loud when reading about composing "family" emojis with zero-width joiners. If you told me that was a tech parody, I'd have believed it.

Honestly, I was horrified to find out that emojis were even in Unicode. It makes no sense whatsoever. Emojis are supposed to be sequences of characters that can be interpreted as images. Treating them like Unicode symbols is like treating entire words like Unicode symbols. It's just plain stupid and a clear sign that Unicode has gone completely off the rails (if it was ever on them). Unfortunately, it's the best tool that we have for the job.
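For the curious, the "family" composition the linked article describes takes only a few lines to reproduce. This is a Python sketch (the article itself uses Swift); how the result renders depends entirely on the font:

```python
# Build a "family" emoji from three person emoji glued together with
# U+200D ZERO WIDTH JOINER, as described in the Swift article.
man, woman, girl = "\U0001F468", "\U0001F469", "\U0001F467"
zwj = "\u200D"

family = man + zwj + woman + zwj + girl
print(family)       # renders as a single family glyph in capable fonts
print(len(family))  # 5 -- five code points behind one visible "character"
```

So what a user perceives as one emoji is five code points, which is exactly the kind of grapheme-vs-code-point mismatch the article dwells on.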

- Jonathan M Davis



September 22, 2018
On Saturday, 22 September 2018 at 01:08:26 UTC, Neia Neutuladh wrote:
> ...you *do* know that not every codebase has people working on it who only know English, right?

This topic boils down to diversity vs. productivity.

Whether supporting diversity is worth it in this case is questionable.

I work at a German-speaking company, and for now we have no developers who don't speak German. In fact, all are native speakers.
Still, we write our code, comments, and commit messages in English.
Even at university you learn that you should code in English.

The reasoning is simple: you never know who will work on your code in the future.
If a company writes code in Chinese, it will have a hard time expanding the development of its codebase, even though Chinese is spoken by so many people.

So even though you could use all sorts of characters, in a productive environment you are better off not doing so.
You might end up shooting yourself in the foot in the long run.

Diversity is important in other areas, but I don't see much advantage here.
At least not for now, because today's spoken languages don't differ tremendously in what they are capable of expressing.

This is also true of today's programming languages. Most of them are just different syntax for the very same ideas and concepts. That's not very helpful for bringing people together and advancing.

My understanding is that even life, with all its great diversity, has just one language (DNA) to define it.

September 22, 2018
On 22/09/18 11:52, Jonathan M Davis wrote:
> 
> Honestly, I was horrified to find out that emojis were even in Unicode. It
> makes no sense whatsoever. Emojis are supposed to be sequences of characters
> that can be interpreted as images. Treating them like Unicode symbols is
> like treating entire words like Unicode symbols. It's just plain stupid and
> a clear sign that Unicode has gone completely off the rails (if it was ever
> on them). Unfortunately, it's the best tool that we have for the job.
> 
> - Jonathan M Davis

Thank Allah that someone said it before I had to. I could not agree more. Encoding whole words as single Unicode code points makes no sense.

U+FDF2

Shachar
September 22, 2018
On Saturday, 22 September 2018 at 10:24:48 UTC, Shachar Shemesh wrote:
> Thank Allah that someone said it before I had to. I could not agree more. Encoding whole words as single Unicode code points makes no sense.

The goal of Unicode is to support diversity; if you argue against that, you don't need Unicode at all.
What you are saying is basically that you would remove Chinese too.

Emojis are not my world either, but they are an expression system / a language.

September 22, 2018
On Saturday, September 22, 2018 4:51:47 AM MDT Thomas Mader via Digitalmars-d wrote:
> On Saturday, 22 September 2018 at 10:24:48 UTC, Shachar Shemesh
>
> wrote:
> > Thank Allah that someone said it before I had to. I could not agree more. Encoding whole words as single Unicode code points makes no sense.
>
> The goal of Unicode is to support diversity, if you argue against
> that you don't need Unicode at all.
> What you are saying is basically that you would remove Chinese
> too.
>
> Emojis are not my world either but it is an expression system / language.

Unicode is supposed to be a universal way of representing every character in every language. Emojis are not characters. They are sequences of characters that people use to represent images. I do not understand how an argument can even be made that they belong in Unicode. As I said, it's exactly the same as arguing that words should be represented in Unicode. Unfortunately, however, at least some of them are in there. :|

- Jonathan M Davis



September 22, 2018
On 22/09/18 14:28, Jonathan M Davis wrote:
> As I said, it's exactly the same
> as arguing that words should be represented in Unicode. Unfortunately,
> however, at least some of them are in there. :|
> 
> - Jonathan M Davis

To be fair to them, that word is part of the "Arabic Presentation Forms" block. The "Presentation Forms" blocks are meant for backwards compatibility with encodings that predate Unicode, and their code points are not meant to be generated by Unicode-aware applications.
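That compatibility status is easy to verify: NFKC normalization decomposes the ligature back into ordinary letters. A Python sketch using the standard `unicodedata` module:

```python
import unicodedata

# U+FDF2 is the single code point Shachar referred to.
word = "\uFDF2"
print(unicodedata.name(word))  # ARABIC LIGATURE ALLAH ISOLATED FORM

# NFKC applies the compatibility decomposition, replacing the one
# presentation-form code point with the ordinary Arabic letters.
letters = unicodedata.normalize("NFKC", word)
print([f"U+{ord(c):04X}" for c in letters])
```

The decomposition is what marks it as a compatibility character: a conforming application should produce the individual letters, not the ligature code point.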

Shachar
September 22, 2018
On Saturday, 22 September 2018 at 11:28:48 UTC, Jonathan M Davis wrote:
> Unicode is supposed to be a universal way of representing every character in every language. Emojis are not characters. They are sequences of characters that people use to represent images. I do not understand how an argument can even be made that they belong in Unicode. As I said, it's exactly the same as arguing that words should be represented in Unicode. Unfortunately, however, at least some of them are in there. :|

At least since the incorporation of emojis, it's no longer supposed to be a universal way of representing characters. :-)
Maybe there was a time when that was true, I don't know, but I think they see Unicode as a way to express all language symbols.
And emoji are nothing other than a language where each symbol stands for an emotion/word/sentence.
If Unicode only allowed languages whose characters are combined to form words, it would exclude languages that use other ways of expressing something.

Would you suggest removing such writing systems from Unicode?
What should a museum do that needs software to manage Egyptian hieroglyphs?

Unicode was made to support all sorts of writing systems, and building words out of multiple characters is just one way a writing system can work.
September 22, 2018
On 22/09/18 15:13, Thomas Mader wrote:
> Would you suggest to remove such writing systems out of Unicode?
> What should a museum do which is in need of a software to somehow manage Egyptian hieroglyphs?

If memory serves me right, hieroglyphs actually represent consonants (vowels are implicit), and as such, are most definitely "characters".

The only language I can think of, off the top of my head, where words have distinct signs is sign language. Whether Unicode should include such a language is a good question (the difficulty of representing motion in a font aside).

Shachar
September 22, 2018
On 9/21/18 9:08 PM, Neia Neutuladh wrote:
> On Friday, 21 September 2018 at 20:25:54 UTC, Walter Bright wrote:
>> But identifiers? I have hardly seen any use of non-ASCII identifiers in C, C++, or D. In fact, I've seen zero use of it outside of test cases. I don't see much point in expanding the support of it. If people use such identifiers, the result would most likely be annoyance rather than illumination when people who don't know that language have to work on the code.
> 
> ....you *do* know that not every codebase has people working on it who only know English, right?
> 
> If I took a software development job in China, I'd need to learn Chinese. I'd expect the codebase to be in Chinese. Because a Chinese company generally operates in Chinese, and they're likely to have a lot of employees who only speak Chinese.
> 
> And no, you can't just transcribe Chinese into ASCII.
> 
> Same for Spanish, Norwegian, German, Polish, Russian -- heck, it's almost easier to list out the languages you *don't* need non-ASCII characters for.
> 
> Anyway, here's some more D code using non-ASCII identifiers, in case you need examples: https://git.ikeran.org/dhasenan/muzikilo

But aren't we arguing about the wrong thing here? D already accepts non-ASCII identifiers. What languages need an upgrade to Unicode symbol names? In other words, what symbols aren't possible with the current support?

Or maybe I'm misunderstanding something.
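As a point of comparison (a Python sketch; D's exact identifier table differs in detail), languages that allow non-ASCII identifiers typically gate them on Unicode character classes rather than a fixed symbol list, so letters from any script qualify but symbols like emoji do not:

```python
# str.isidentifier() applies Python's identifier rules (PEP 3131),
# which are based on Unicode's XID_Start/XID_Continue properties.
print("π".isidentifier())    # True  -- a Greek letter may start an identifier
print("変数".isidentifier())  # True  -- CJK ideographs count as letters
print("👍".isidentifier())   # False -- emoji are symbols, not letters
```

So "supporting Unicode identifiers" in practice means supporting identifier-class characters, not every code point.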

-Steve
September 22, 2018
On 9/22/18 4:52 AM, Jonathan M Davis wrote:
>> I was laughing out loud when reading about composing "family"
>> emojis with zero-width joiners. If you told me that was a tech
>> parody, I'd have believed it.
> 
> Honestly, I was horrified to find out that emojis were even in Unicode. It
> makes no sense whatsover. Emojis are supposed to be sequences of characters
> that can be interepreted as images. Treating them like Unicode symbols is
> like treating entire words like Unicode symbols. It's just plain stupid and
> a clear sign that Unicode has gone completely off the rails (if it was ever
> on them). Unfortunately, it's the best tool that we have for the job.

But aren't some (many?) Chinese/Japanese characters representing whole words?

-Steve