Updating D beyond Unicode 2.0 (page 9)

On 9/26/2018 5:46 AM, Steven Schveighoffer wrote:
> This is a non-starter. We can't break people's code, especially for trivial reasons like 'you shouldn't code that way because others don't like it'. I'm pretty sure Walter would be against removing Unicode support for identifiers.

We're not going to remove it, because there's not much to gain from it.

But expanding it seems of vanishingly little value. Note that each thing that gets added to D adds weight to it, and it needs to pull its weight. Nothing is free.

I don't see a scenario where someone would be learning D and not know English. Non-English D instructional material is nearly non-existent. dlang.org is all in English. Don't most languages have a Romanji-like representation?

C/C++ have made efforts in the past to support non-ASCII coding - digraphs, trigraphs, and alternate keywords. They've all failed miserably. The only people who seem to know those features even exist are language lawyers.

On 9/26/2018 5:46 AM, Steven Schveighoffer wrote:
>>> Does this need a DIP?

Feel free to write one, but its chances of getting incorporated are remote and would require a pretty strong rationale that I haven't seen yet.

On Wednesday, 26 September 2018 at 20:43:47 UTC, Walter Bright wrote:
> I don't see a scenario where someone would be learning D and not know English. Non-English D instructional material is nearly non-existent.

http://ddili.org/ders/d/

On 9/26/18 4:43 PM, Walter Bright wrote:
> But expanding it seems of vanishingly little value. Note that each thing that gets added to D adds weight to it, and it needs to pull its weight. Nothing is free.

It may be the weight is already there in the form of unicode symbol support, just the range of the characters supported isn't good enough for some languages. It might be like replacing your refrigerator -- you get an upgrade, but it's not going to take up any more space because you get rid of the old one.

I would like to see the PR before passing judgment on the heft of the change.

The value is simply in the consistency -- when some of the words for your language can be valid symbols but others can't, then it becomes a weird guessing game as to what is supported. It would be like saying all identifiers can have any letters except `q`. Sure, you can get around that, but it's weirdly exclusive.

I claim complete ignorance as to what is required, it hasn't been technically laid out what is at stake, and I'm not bilingual anyway. It could be true that I'm completely misunderstanding the positions of others.

-Steve

On 09/26/2018 01:43 PM, Walter Bright wrote:
> Don't most languages have a Romanji-like representation?

Yes, a lot of languages that don't use the Latin alphabet have standard transcriptions into the Latin alphabet. Standard transcriptions into ASCII are much less common, and newer Unicode versions include more Latin characters to better support languages (and other use cases) using the Latin alphabet.

On Sunday, September 23, 2018 2:49:39 PM MDT Walter Bright via Digitalmars-d wrote:
> There's a reason why dmd doesn't have international error messages. My experience with it is that international users don't want it. They prefer the english messages.

It reminds me of one of the reasons that Bryan Cantrill thinks that many folks use Linux - they want to be able to google their stack traces. Of course, that same argument would be a reason to use C/C++ rather than switching to D. Still, having an error in a more common format can be valuable: it is more likely to have been posted somewhere, so you are more likely to find a discussion of it and maybe a solution. And that's without even getting into all of the translation issues discussed elsewhere in this thread.

And it's not like compiler error messages - or programming speak in general - are really traditional English anyway.

- Jonathan M Davis

A delicious Turkish dessert is "kabak tatlısı", made of squash. Now, it so happens that "kabak" also means "zucchini" in Turkish. Imagine my shock when I came across that dessert recipe in English that used zucchini as the ingredient! :)

Ali

On Wednesday, September 26, 2018 11:15:01 PM MDT Ali Çehreli via Digitalmars-d wrote:
> A delicious Turkish dessert is "kabak tatlısı", made of squash. Now, it so happens that "kabak" also means "zucchini" in Turkish. Imagine my shock when I came across that dessert recipe in English that used zucchini as the ingredient! :)

Was it any good? ;)

- Jonathan M Davis

On Thursday, 27 September 2018 at 05:15:01 UTC, Ali Çehreli wrote:
> A delicious Turkish dessert is "kabak tatlısı", made of squash. Now, it so happens that "kabak" also means "zucchini" in Turkish. Imagine my shock when I came across that dessert recipe in English that used zucchini as the ingredient! :)
>
> Ali

You can't even imagine how many Italian words and recipes are distorted...

Andrea

September 27, 2018

Re: Updating D beyond Unicode 2.0

Posted by aliak
in reply to Walter Bright

On Wednesday, 26 September 2018 at 20:43:47 UTC, Walter Bright wrote:
> On 9/26/2018 5:46 AM, Steven Schveighoffer wrote:
>> This is a non-starter. We can't break people's code, especially for trivial reasons like 'you shouldn't code that way because others don't like it'. I'm pretty sure Walter would be against removing Unicode support for identifiers.
>
> We're not going to remove it, because there's not much to gain from it.
>
> But expanding it seems of vanishingly little value. Note that each thing that gets added to D adds weight to it, and it needs to pull its weight. Nothing is free.
>
> I don't see a scenario where someone would be learning D and not know English. Non-English D instructional material is nearly non-existent. dlang.org is all in English. Don't most languages have a Romanji-like representation?

It's not that they don't know English. It's that non-English speakers can process words and sentences in non-English much more efficiently than in English. Knowing a language is not binary.

Here's an example from this year's spring semester at NTNU (a Norwegian university): http://folk.ntnu.no/frh/grprog/eksempel/eks_20.cpp

... That's the basic programming course. Whether the professor would use that would, I guess, depend on the ratio of English to non-English speakers. But it's there nonetheless.

Of course Norway is a bad example because the English level here is, arguably, higher than in many English-speaking countries :p But it's also a great example because even if you're great at English, sometimes people are still more comfortable/confident/efficient in their own native language.

Some tech meetups from different countries try and do things in English and mostly it works. But it's been seen consistently with non-English audiences that presentations given in English result in silence whereas if it's in their native language you have actual engagement.

I fail to understand why support for a version of Unicode from 3 billion decades ago (I'm not sure when it was actually released) should just be left as is, when it can't be removed and there's someone who's willing to update it.

>
> C/C++ have made efforts in the past to support non-ASCII coding - digraphs, trigraphs, and alternate keywords. They've all failed miserably. The only people who seem to know those features even exist are language lawyers.

This is not relevant. Trigraphs and digraphs did indeed fail miserably, but they do not represent any non-ASCII characters. Those abominations existed for different reasons.

Anyway, on a related note: D itself (not identifiers, but std) also supports Unicode 6 or something. That's from 2010. That's nearly a decade ago. We're at Unicode 11 now. And I've already had someone tell me (while I was trying to get them to use D): "hold on, it supports unicode from a decade ago? Nah, I'm not touching it". Not that it's the same as supporting identifiers in code, but still the reaction is relevant.

Cheers,
- Ali
