Why UTF-8/16 character encodings? (page 18) - D Programming Language Discussion Forum

May 30, 2013

Re: Why UTF-8/16 character encodings?

Posted by Marco Leise
in reply to Joakim

On Thu, 30 May 2013 09:19:32 +0200, "Joakim" <joakim@airpost.net> wrote:

> Your point is?  121 results, including false positives like "utf-8 is the best guess."  If you look at the results, almost all make the pragmatic recommendation that UTF-8 is the best _for now_, because it is better supported than other multi-language formats.  That's like saying Windows is the best OS because it's easier to find one in your local computer store.
> 
> Yet again, the fact that even this somewhat ambiguous search string has only 121 results is damning of anyone liking UTF-8, nothing else, given the many thousands of programmers that are forced to use Unicode if they want to internationalize.

Alright, for me it said ~6,570,000 results, which I found funny. I'm not trying to make a point, just to troll. If there is a point to be made, it's that the search result count is a _very_ rough estimate.

-- 
Marco

May 31, 2013

Re: Why UTF-8/16 character encodings?

Posted by Peter Williams
in reply to Walter Bright

On 31/05/13 05:07, Walter Bright wrote:
> On 5/30/2013 4:24 AM, Manu wrote:
>> We don't all know English. Plenty of people don't.
>> I've worked a lot with Sony and Nintendo code/libraries, for instance, it almost always looks like this:
>>
>> {
>>    // E: I like cake.
>>    // J: ケーキが好きです。
>>    player.eatCake();
>> }
>>
>> Clearly someone doesn't speak English in these massive codebases that power an industry worth 10s of billions.
>
> Sure, but the code itself is written using ASCII!

Because they had no choice.

Peter

May 31, 2013

Re: Why UTF-8/16 character encodings?

Posted by Manu
in reply to Walter Bright

On 31 May 2013 05:07, Walter Bright <newshound2@digitalmars.com> wrote:

> On 5/30/2013 4:24 AM, Manu wrote:
>
>> We don't all know English. Plenty of people don't.
>> I've worked a lot with Sony and Nintendo code/libraries, for instance, it almost always looks like this:
>>
>> {
>>    // E: I like cake.
>>    // J: ケーキが好きです。
>>    player.eatCake();
>> }
>>
>> Clearly someone doesn't speak English in these massive codebases that power an industry worth 10s of billions.
>>
>
> Sure, but the code itself is written using ASCII!
>

But that doesn't make it English, or any more readable...
The only benefit of forcing users to use ASCII is that everyone can physically type it.
But that comes with disadvantages:
 1. It's not natural to type a word when you don't know what it means or how to spell it; you'll end up copy-pasting anyway rather than trying to remember/copy it letter by letter and risking misspellings.
 2. It's less natural for the people who CAN read it, because they have to mentally transliterate too. (And what if they're kids/amateurs who don't even know the Latin alphabet?)

I.e., forcing someone who doesn't speak English to write ASCII serves neither party.
Add that to the points I (and others) made earlier about education, or children learning to code. There's no compelling reason to force identifiers to be ASCII.
Currently, D offers a unique advantage; leave it that way.

May 31, 2013

Re: Why UTF-8/16 character encodings?

Posted by Manu
in reply to Entry

On 31 May 2013 01:48, Entry <no@no.com> wrote:

> On Thursday, 30 May 2013 at 14:49:12 UTC, monarch_dodra wrote:
>
>> On Thursday, 30 May 2013 at 14:13:47 UTC, Entry wrote:
>>
>>> Take a minute to think about why we're all communicating in English here. Let's see if you can figure it out.
>>>
>>
>> Well that's condescending :/ and fallacious.
>>
>> To answer your question, it may have something to do with the fact that these are the English forums? Just a wild hunch. Oh. And because we *can* speak English? That could also have something to do with it.
>>
>> There are tons of non-English speaking programming forums out there. Maybe those that don't speak English are over there? Heck, there are a few non-English threads in learn.
>>
>> Oh. And did you know TDPL was published in Japanese? Why bother, right?
>>
>>> I just think that it's better to focus on two very specific languages with two very specific purposes (D for programming and English for communication). 'Twas just an idea, I don't care if you write your code in hieroglyphs.
>>>
>>
>> I really really agree with you.
>>
>> Yet, I think they are orthogonal concepts, and that the D programming language has no business choosing which communication vector its users should use.
>>
>> It's not just a matter (imo) of "I wouldn't force it upon anyone", but "I think everyone should choose what's best for them".
>>
>> Yeah. I know. Same conclusion, but there is a nuance.
>>
>
> I'm glad you agree, though I believe that I never said anything about D 'choosing' which human languages are compatible with it. I just expressed my belief that should people choose to construct something, be it a ship or a computer program, the usage of a single language will greatly enhance their progress (ever heard the story of the Tower of Babel? wink wink). Sorry if my previous comment seemed hostile, that was not my intention.
>

This is the definition of a *convention*, not a rule.

May 31, 2013

Re: Why UTF-8/16 character encodings?

Posted by Manu
in reply to Entry

On 31 May 2013 03:08, Entry <no@no.com> wrote:

> On Thursday, 30 May 2013 at 16:05:13 UTC, Jakob Ovrum wrote:
>
>> On Thursday, 30 May 2013 at 15:48:12 UTC, Entry wrote:
>>
>>> I'm glad you agree, though I believe that I never said anything about D 'choosing' which human languages are compatible with it. I just expressed my belief that should people choose to construct something, be it a ship or a computer program, the usage of a single language will greatly enhance their progress (ever heard the story of the Tower of Babel? wink wink). Sorry if my previous comment seemed hostile, that was not my intention.
>>>
>>
>> If the programmers who are going to be working on that code don't understand the "Single Language", then what use is it?
>>
>
> Then there's no helping it. Though I wonder what kind of a programmer doesn't understand English enough to at least read the code and comments.
>

A child, or a student.

May 31, 2013

Re: Why UTF-8/16 character encodings?

Posted by Manu

On 31 May 2013 10:00, Peter Williams <pwil3058@bigpond.net.au> wrote:

> On 31/05/13 05:07, Walter Bright wrote:
>
>> On 5/30/2013 4:24 AM, Manu wrote:
>>
>>> We don't all know English. Plenty of people don't.
>>> I've worked a lot with Sony and Nintendo code/libraries, for instance, it almost always looks like this:
>>>
>>> {
>>>    // E: I like cake.
>>>    // J: ケーキが好きです。
>>>    player.eatCake();
>>> }
>>>
>>> Clearly someone doesn't speak English in these massive codebases that power an industry worth 10s of billions.
>>>
>>
>> Sure, but the code itself is written using ASCII!
>>
>
> Because they had no choice.


Indeed, and believe me, the variable names often make NO sense, or worse, they're misunderstood and quite misleading.
I.e., you think a variable is one thing, but then you realise it's the inverse, or something completely different.

May 31, 2013

Re: Why UTF-8/16 character encodings?

Posted by Walter Bright
in reply to Peter Williams

On 5/30/2013 5:00 PM, Peter Williams wrote:
> On 31/05/13 05:07, Walter Bright wrote:
>> On 5/30/2013 4:24 AM, Manu wrote:
>>> We don't all know English. Plenty of people don't.
>>> I've worked a lot with Sony and Nintendo code/libraries, for instance, it almost always looks like this:
>>>
>>> {
>>>    // E: I like cake.
>>>    // J: ケーキが好きです。
>>>    player.eatCake();
>>> }
>>>
>>> Clearly someone doesn't speak English in these massive codebases that power an industry worth 10s of billions.
>>
>> Sure, but the code itself is written using ASCII!
>
> Because they had no choice.

Not true, D supports Unicode identifiers.
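For illustration, a minimal sketch of the cake example with the identifiers themselves written in Japanese. The type and member names are hypothetical, and the sketch assumes a D compiler that accepts non-ASCII letters in identifiers, as stated above:

```d
import std.stdio : writeln;

// Hypothetical names: プレイヤー = "player", ケーキの数 = "number of cakes",
// ケーキを食べる = "eat cake". D accepts these as ordinary identifiers.
struct プレイヤー {
    int ケーキの数; // how many cakes have been eaten (defaults to 0)

    // "eat cake": increments the counter
    void ケーキを食べる() { ++ケーキの数; }
}

void main() {
    プレイヤー p;
    p.ケーキを食べる();
    p.ケーキを食べる();
    writeln(p.ケーキの数); // prints 2
}
```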

May 31, 2013

Re: Why UTF-8/16 character encodings?

Posted by Walter Bright
in reply to Manu

On 5/30/2013 5:04 PM, Manu wrote:
> Currently, D offers a unique advantage; leave it that way.

I am going to leave it that way based on the comments here; I only wanted to point out that the example didn't make a case for Unicode identifiers.

May 31, 2013

Re: Why UTF-8/16 character encodings?

Posted by Simen Kjaeraas
in reply to Walter Bright

On Fri, 31 May 2013 07:57:37 +0200, Walter Bright <newshound2@digitalmars.com> wrote:

> On 5/30/2013 5:00 PM, Peter Williams wrote:
>> On 31/05/13 05:07, Walter Bright wrote:
>>> On 5/30/2013 4:24 AM, Manu wrote:
>>>> We don't all know English. Plenty of people don't.
>>>> I've worked a lot with Sony and Nintendo code/libraries, for instance, it almost always looks like this:
>>>>
>>>> {
>>>>    // E: I like cake.
>>>>    // J: ケーキが好きです。
>>>>    player.eatCake();
>>>> }
>>>>
>>>> Clearly someone doesn't speak English in these massive codebases that power an industry worth 10s of billions.
>>>
>>> Sure, but the code itself is written using ASCII!
>>
>> Because they had no choice.
>
> Not true, D supports Unicode identifiers.

I doubt Sony and Nintendo use D extensively.

-- 
Simen

June 06, 2013

Re: Why UTF-8/16 character encodings?

Posted by Timothee Cour
in reply to Walter Bright

On Thu, May 30, 2013 at 10:57 PM, Walter Bright <newshound2@digitalmars.com> wrote:

> On 5/30/2013 5:00 PM, Peter Williams wrote:
>
>> On 31/05/13 05:07, Walter Bright wrote:
>>
>>> On 5/30/2013 4:24 AM, Manu wrote:
>>>
>>>> We don't all know English. Plenty of people don't.
>>>> I've worked a lot with Sony and Nintendo code/libraries, for instance, it almost always looks like this:
>>>>
>>>> {
>>>>    // E: I like cake.
>>>>    // J: ケーキが好きです。
>>>>    player.eatCake();
>>>> }
>>>>
>>>> Clearly someone doesn't speak English in these massive codebases that power an industry worth 10s of billions.
>>>>
>>>
>>> Sure, but the code itself is written using ASCII!
>>>
>>
>> Because they had no choice.
>>
>
> Not true, D supports Unicode identifiers.
>


Currently, std.demangle.demangle doesn't work with Unicode identifiers (see the example below).

If we decide to keep allowing Unicode symbols (as opposed to just Unicode strings/comments), we must address this issue. Will supporting this negatively impact performance (at both compile time and run time)?

Likewise, will linkers and other tools (gdb, etc.) be happy with Unicode in mangled names?

----
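module util.demangle_funs; // module name as it appears in the reported output
import std.demangle : demangle;
import std.stdio : writeln;
import std.traits : mangledName;

struct A {
    int z;
    void foo(int x) {}
    void さいごの果実(int x) {}
    void ªå(int x) {}
}
void main() {
    mangledName!(A.さいごの果実).demangle.writeln;
    // Reported output (returned unchanged, i.e. not demangled):
    // _D4util13demangle_funs1A18さいごの果実MFiZv
}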
----
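In the reported output, the `18` in front of `さいごの果実` appears to be the identifier's length in UTF-8 code units (bytes), not in characters: each of its six characters takes three bytes in UTF-8. A quick sketch checking that arithmetic:

```d
import std.range : walkLength;
import std.stdio : writeln;

void main() {
    enum name = "さいごの果実";
    // .length counts UTF-8 code units (bytes)...
    writeln(name.length);     // prints 18
    // ...while walkLength decodes the string and counts code points.
    writeln(name.walkLength); // prints 6
}
```

So any tool that reads such a mangled name has to treat the length prefix as a byte count over UTF-8 data.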

Copyright © 1999-2021 by the D Language Foundation