September 22, 2006 Re: identifiers & "unialpha"
Posted in reply to Walter Bright

Walter Bright wrote on 2006-09-22:
> Thomas Kuehne wrote:
>> Walter Bright wrote on 2006-09-22:
>>> What is CJK?
>>
>> CJK: Chinese, Japanese & Korean
>> 0x20000 .. 0x2A6D6 CJK Ideograph Extension B
>> 0x2F800 .. 0x2FA1D CJK COMPATIBILITY IDEOGRAPHS
>
> Thank-you.
>
>>> As it is now, it matches standard C's definition of identifiers, which is the intent of the reference. I haven't checked, but I think it matches Java's idea of an identifier character, too.
>>
>> ISO/IEC 9899:1999 (E) Appendix D
>> # 1) This clause lists the hexadecimal code values that are valid in
>> # universal character names in identifiers.
>>
>> Whereas Appendix D defines valid characters in identifiers, D uses it as a source for "universal alpha". As a consequence std.uni.isUniAlpha claims that \u00B7 (MIDDLE DOT) is a letter...
>
> I guess I don't see why C99 would say · is a valid identifier character if it isn't an alpha. It's all confusing to me, and I think needlessly complicated. Is \u00B7 the only difference?
No, see attachment.
Format: "[first_in_range, last_in_range],"
Thomas
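
For readers without the attachment, here is a minimal Python sketch of how a table in the "[first_in_range, last_in_range]," format can be searched, and of the mismatch discussed above. The entries in RANGES are a hypothetical excerpt chosen for illustration, not the contents of the attachment, and in_table and is_unicode_letter are made-up helper names. Only unicodedata.category is a real library call; it reports the Unicode general category, which is "Po" (other punctuation) for \u00B7.

```python
import bisect
import unicodedata

# Hypothetical excerpt in the "[first_in_range, last_in_range]," format.
# Illustrative only; this is NOT the attachment from the post.
RANGES = [
    [0x00AA, 0x00AA],
    [0x00B5, 0x00B5],
    [0x00B7, 0x00B7],  # MIDDLE DOT, the code point discussed above
    [0x00BA, 0x00BA],
    [0x00C0, 0x00D6],
]

# Sorted list of range starts, used for binary search.
FIRSTS = [first for first, _ in RANGES]


def in_table(cp):
    """Return True if code point cp falls inside one of the inclusive ranges."""
    i = bisect.bisect_right(FIRSTS, cp) - 1  # last range starting at or before cp
    return i >= 0 and RANGES[i][0] <= cp <= RANGES[i][1]


def is_unicode_letter(cp):
    """Return True if the Unicode general category of cp is a letter (L*)."""
    return unicodedata.category(chr(cp)).startswith("L")


if __name__ == "__main__":
    for cp in (0x00B7, 0x00C5, 0x00D7):
        print(hex(cp), in_table(cp), is_unicode_letter(cp))
```

A code point such as \u00B7 can therefore sit in an identifier table without being a letter by general category, which is the distinction the thread is about.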

September 23, 2006 Re: identifiers & "unialpha"
Posted in reply to Walter Bright

Walter Bright wrote:
> Thomas Kuehne wrote:
>> Walter Bright wrote on 2006-09-22:
>>> What is CJK?
>>
>> CJK: Chinese, Japanese & Korean
>> 0x20000 .. 0x2A6D6 CJK Ideograph Extension B
>> 0x2F800 .. 0x2FA1D CJK COMPATIBILITY IDEOGRAPHS
>
> Thank-you.
>
...
>>
>> Task at hand: Create a table of all characters used by humans all over
>> the world and minimize friction due to political issues
>> (e.g. characters' names). Except for bug fixes (typos...) the unicode people
>> usually only extend previous versions of the standard.
>
> Chinese, Japanese, and Korean are hardly obscure so I don't see why the character sets for them seem to need large numbers of additions this late in the game.

I think the big-alphabet languages tend to coin new letters somewhat like other languages do words (but maybe less frequently), but I'm not sure about that.

I have heard, though, that Chinese was simplified to a smaller set with different appearances during the revolution and the various political upheavals since. They have been adding letters back since as they discover they are really needed -- so these get put into Unicode.

If you've read "1984" by Orwell, it's something like the motivation for Newspeak. Old literature is written in the old letters, and is disappearing because the public can't read it.
It's a kind of history censorship - you can't translate the old Chinese literature because they want to destroy the old culture as it competes philosophically with Communism.

Essentially, they didn't have to burn all the old books -- they just burned all the old printing presses.

Kevin

September 23, 2006 Re: identifiers & "unialpha"
Posted in reply to Kevin Bealer

On Sat, 23 Sep 2006 11:40:08 +0300, Kevin Bealer <kevinbealer@gmail.com> wrote:
> Walter Bright wrote:
>> Thomas Kuehne wrote:
>>> Walter Bright wrote on 2006-09-22:
>>>> What is CJK?
>>>
>>> CJK: Chinese, Japanese & Korean
>>> 0x20000 .. 0x2A6D6 CJK Ideograph Extension B
>>> 0x2F800 .. 0x2FA1D CJK COMPATIBILITY IDEOGRAPHS
>> Thank-you.
>>
> ...
>>>
>>> Task at hand: Create a table of all characters used by humans all over
>>> the world and minimize friction due to political issues
>>> (e.g. characters' names). Except for bug fixes (typos...) the unicode people
>>> usually only extend previous versions of the standard.
>> Chinese, Japanese, and Korean are hardly obscure so I don't see why the character sets for them seem to need large numbers of additions this late in the game.
>
> I think the big-alphabet languages tend to coin new letters somewhat like other languages do words (but maybe less frequently), but I'm not sure about that.
>
> I have heard, though, that Chinese was simplified to a smaller set with different appearances during the revolution and the various political upheavals since. They have been adding letters back since as they discover they are really needed -- so these get put into Unicode.
>
> If you've read "1984" by Orwell, it's something like the motivation for NewSpeak. Old literature is written in the old letters, and is disappearing because the public can't read it.
> It's a kind of history censorship - you can't translate the old Chinese literature because they want to destroy the old culture as it competes philosophically with Communism.
>
> Essentially, they didn't have to burn all the old books -- they just burned all the old printing presses.
>
> Kevin

If that's the case, I'm very sorry to hear that! :(

September 23, 2006 [OT] Re: identifiers & "unialpha"
Posted in reply to Kristian

Kristian wrote:
> On Sat, 23 Sep 2006 11:40:08 +0300, Kevin Bealer <kevinbealer@gmail.com> wrote:
>
>> It's a kind of history censorship - you can't translate the old Chinese literature because they want to destroy the old culture as it competes philosophically with Communism.
>>
>> Essentially, they didn't have to burn all the old books -- they just burned all the old printing presses.
>
> If that's the case, I'm very sorry to hear that! :(
This is completely off-topic, but if you're interested in learning a bit about the Communist Revolution in China the fun way, go find the movie "To Live" in the foreign film section of your favorite video store. It's an excellent film that spans maybe 30 years of Chinese history, including the Communist Revolution.
Sean

September 25, 2006 Re: [OT] Re: identifiers & "unialpha"
Posted in reply to Sean Kelly

On Sat, 23 Sep 2006 19:01:36 +0300, Sean Kelly <sean@f4.ca> wrote:
> Kristian wrote:
>> On Sat, 23 Sep 2006 11:40:08 +0300, Kevin Bealer <kevinbealer@gmail.com> wrote:
>>
>>> It's a kind of history censorship - you can't translate the old Chinese literature because they want to destroy the old culture as it competes philosophically with Communism.
>>>
>>> Essentially, they didn't have to burn all the old books -- they just burned all the old printing presses.
>> If that's the case, I'm very sorry to hear that! :(
>
> This is completely off-topic, but if you're interested in learning a bit about the Communist Revolution in China the fun way, go find the movie "To Live" in the foreign film section of your favorite video store. It's an excellent film that spans maybe 30 years of Chinese history, including the Communist Revolution.
>
>
> Sean
Thanks for the tip.

September 26, 2006 Re: [OT] Re: identifiers & "unialpha"
Posted in reply to Kristian

Kristian wrote:
> On Sat, 23 Sep 2006 19:01:36 +0300, Sean Kelly <sean@f4.ca> wrote:
>
>> Kristian wrote:
>>> On Sat, 23 Sep 2006 11:40:08 +0300, Kevin Bealer <kevinbealer@gmail.com> wrote:
>>>
>>>> It's a kind of history censorship - you can't translate the old Chinese literature because they want to destroy the old culture as it competes philosophically with Communism.
>>>>
>>>> Essentially, they didn't have to burn all the old books -- they just burned all the old printing presses.
>>> If that's the case, I'm very sorry to hear that! :(
>>
>> This is completely off-topic, but if you're interested in learning a bit about the Communist Revolution in China the fun way, go find the movie "To Live" in the foreign film section of your favorite video store. It's an excellent film that spans maybe 30 years of Chinese history, including the Communist Revolution.
>>
>>
>> Sean
>
> Thanks for the tip.

Yes - I really enjoyed that movie.

The site where I got the history of this, which I tried to summarize above, was a unicode related article. What I wrote above is somewhat negative (intentionally) toward the PRC -- I don't take any of that back, but I thought I should post the link as well.

It also has some interesting unicode related info (which is maybe marginally on-topic?) but the technical stuff might be out-dated.

http://www.hastingsresearch.com/net/04-unicode-limitations.shtml

Kevin

September 26, 2006 Re: [OT] Re: identifiers & "unialpha"
Posted in reply to Kevin Bealer

Kevin Bealer wrote on 2006-09-26:
> Kristian wrote:
<snip>
> It also has some interesting unicode related info (which is maybe marginally on-topic?) but the technical stuff might be out-dated.
>
> http://www.hastingsresearch.com/net/04-unicode-limitations.shtml

The technical stuff is way outdated. The article is based on Unicode version 3; the current one is 5. Version 4 did fix most of the CJK issues. However, the compatibility ideographs and variant selectors might turn out to be monsters like the infamous tags (0xE0001, 0xE0020 - 0xE007F).

Thomas
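
As a rough illustration of the code point ranges quoted in this thread, the Python sketch below maps a code point to the label the posters used for it. THREAD_RANGES and classify are hypothetical names; the bounds are copied from the posts as written and have not been checked against any particular Unicode version.

```python
# Inclusive ranges exactly as quoted in this thread, with the posters' labels.
# Assumption: bounds are taken from the posts, not from a Unicode data file.
THREAD_RANGES = [
    (0x20000, 0x2A6D6, "CJK Ideograph Extension B"),
    (0x2F800, 0x2FA1D, "CJK Compatibility Ideographs"),
    (0xE0001, 0xE0001, "tag"),
    (0xE0020, 0xE007F, "tag"),
]


def classify(cp):
    """Return the thread's label for code point cp, or None if it is not listed."""
    for first, last, label in THREAD_RANGES:
        if first <= cp <= last:
            return label
    return None


if __name__ == "__main__":
    for cp in (0x20000, 0x2F900, 0xE0041, 0x0041):
        print(hex(cp), classify(cp))
```

Every range listed lies above 0xFFFF, so none of these code points fits in a single 16-bit code unit.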