January 25, 2011
Precomposed Character & Grapheme on wikipedia
Hello,

I stumbled upon Wikipedia's article 
http://en.wikipedia.org/wiki/Precomposed_character which is, imo, excellent. 
(It does not (yet) cover the consequences for programming with Unicode that we 
debated on this list.)
An enigmatic point is "Precomposed characters are the legacy solution for 
representing many special letters in various character sets." I still fail to 
see how precomposed characters help in solving the issues posed by texts encoded in 
legacy character sets (since those texts need to be decoded anyway). Explanations welcome.
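One commonly cited rationale is round-trip compatibility: a legacy set such as ISO-8859-1 (Latin-1) stores 'é' as a single byte, and having a precomposed code point (U+00E9) lets that byte decode to exactly one Unicode character and re-encode to the very same byte. A minimal Python sketch of this, using only the standard library (the sample word "café" is just an illustration):

```python
import unicodedata

# In ISO-8859-1 (Latin-1), 'é' is the single byte 0xE9.
legacy_bytes = b"caf\xe9"

# Decoding maps that byte to the precomposed code point U+00E9.
text = legacy_bytes.decode("latin-1")
precomposed = [f"U+{ord(c):04X}" for c in text]

# The canonically equivalent decomposed form: base 'e' + U+0301 COMBINING ACUTE ACCENT.
decomposed_text = unicodedata.normalize("NFD", text)
decomposed = [f"U+{ord(c):04X}" for c in decomposed_text]

# Round trip: the precomposed string re-encodes to the original legacy bytes...
roundtrip_ok = text.encode("latin-1") == legacy_bytes

# ...but the decomposed string cannot be encoded in Latin-1 at all,
# because U+0301 has no Latin-1 counterpart.
try:
    decomposed_text.encode("latin-1")
    decomposed_encodable = True
except UnicodeEncodeError:
    decomposed_encodable = False

if __name__ == "__main__":
    print(precomposed)
    print(decomposed)
    print(roundtrip_ok, decomposed_encodable)
    # NFC recombines the decomposed spelling into the precomposed one.
    print(unicodedata.normalize("NFC", decomposed_text) == text)
```

Both spellings are canonically equivalent, but only the precomposed one maps one-to-one onto the legacy byte, so a converter handed decomposed text would have to normalize it to NFC before it could write it back out as Latin-1.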

This article brought me to http://en.wikipedia.org/wiki/Grapheme. It seems I was 
partially wrong in stating that using "grapheme" to denote what we commonly 
think of as a character is an error. Possibly "grapheme" in English and "graphème" 
in French are not quite synonyms. For instance, "ph" is commonly regarded as a 
single grapheme in French (<--> phoneme /f/ indeed), so that grapheme and 
character are not synonyms at all; while according to the English Wikipedia article it 
may be 2 in English. What do you think?
There still remains the point that the notion of grapheme only applies to elements of 
writing systems (letters, syllables...), used to write 'words'. What we need 
is a term which, just like "character" in the context of computing (for both 
users and programmers), encompasses thingies like tab or newline marks, 
copyright or paragraph signs, and much more... even the null character ;-).
"Grapheme" is usable provided it is clearly defined as meaning precisely that 
in the context of UCS/Unicode, which neither the Unicode literature nor the literature 
about Unicode does, AFAIK. Otherwise, it just adds confusion to confusion.
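In Unicode terms, the closest thing to this all-encompassing "character" is probably the (extended) grapheme cluster of Unicode Standard Annex #29, which is defined over all code points: a tab, a newline, a NUL or a copyright sign each forms a cluster of its own, a decomposed 'é' is one cluster, and French "ph" counts as two. The rough Python sketch below approximates that with one simplified rule (a combining mark attaches to the preceding non-control code point). `rough_clusters`, `is_mark` and `is_control` are hypothetical names, and the rule deliberately ignores much of UAX #29 (CR LF pairs, Hangul syllables, emoji ZWJ sequences, regional-indicator flags, prepended marks).

```python
import unicodedata


def is_mark(ch: str) -> bool:
    # General categories Mn, Mc, Me: nonspacing, spacing and enclosing marks.
    return unicodedata.category(ch).startswith("M")


def is_control(ch: str) -> bool:
    # Cc covers tab, newline, NUL and the other C0/C1 control characters.
    return unicodedata.category(ch) == "Cc"


def rough_clusters(text: str) -> list:
    """Split text into approximate user-perceived characters.

    Simplified rule: a mark extends the previous cluster unless that
    cluster is a control character; every other code point starts a
    new cluster. This is NOT a full UAX #29 implementation.
    """
    clusters = []
    for ch in text:
        if clusters and is_mark(ch) and not is_control(clusters[-1][0]):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    samples = ["ph", "e\u0301", "\u00e9", "a\tb\n", "\u00a9\u00b6\x00"]
    for s in samples:
        print(repr(s), len(s), "code points ->", len(rough_clusters(s)), "clusters")
```

For real text, a complete UAX #29 implementation (for example the `\X` pattern of the third-party `regex` module) would be the safer choice; the point here is only that the Unicode notion spans controls and symbols as well as letters.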

Denis
-- 
_________________
vita es estrany
spir.wikidot.com
January 25, 2011
Re: Precomposed Character & Grapheme on wikipedia
"spir" <denis.spir@gmail.com> wrote in message 
news:mailman.940.1295974243.4748.digitalmars-d@puremagic.com...
> Hello,
>
> I stumbled upon Wikipedia's article 
> http://en.wikipedia.org/wiki/Precomposed_character which is, imo, 
> excellent. (It does not (yet) cover the consequences for programming with 
> Unicode that we debated on this list.)
> An enigmatic point is "Precomposed characters are the legacy solution for 
> representing many special letters in various character sets." I still fail 
> to see how precomposed characters help in solving the issues posed by texts 
> encoded in legacy character sets (since those texts need to be decoded anyway). 
> Explanations welcome.
>

My guess, and this is only a guess, would be that they felt it would make 
rendering easier since (1) many fonts already had precomposed characters but 
may not have had any of the "modifier" markings by themselves, and (2) font 
rendering libraries probably didn't support characters with "overlays".


> This article brought me to http://en.wikipedia.org/wiki/Grapheme. It seems I 
> was partially wrong in stating that using "grapheme" to denote what we 
> commonly think of as a character is an error. Possibly "grapheme" in English 
> and "graphème" in French are not quite synonyms. For instance, "ph" is 
> commonly regarded as a single grapheme in French (<--> phoneme /f/ 
> indeed), so that grapheme and character are not synonyms at all; while 
> according to the English Wikipedia article it may be 2 in English. What do you 
> think?

No, a grapheme is the common notion of character:

A phoneme is an atomic unit of vocal *sound*. So all that article is saying 
is that a grapheme (single written unit) can represent either:

- One specific sound
- No particular sound (like '&' or Chinese characters)
- Different sounds depending on context (like the English 'c')
- Or, as with the French 'ph', the Japanese 'kyou', or the German 'sch', 
multiple graphemes can form one sound. These are known as digraphs and 
trigraphs.
January 25, 2011
Re: Precomposed Character & Grapheme on wikipedia
Nick Sabalausky wrote:
> "spir" <denis.spir@gmail.com> wrote in message

>> This article brought me to http://en.wikipedia.org/wiki/Grapheme. It seems I
>> was partially wrong in stating that using "grapheme" to denote what we
>> commonly think of as a character is an error. Possibly "grapheme" in English
>> and "graphème" in French are not quite synonyms. For instance, "ph" is
>> commonly regarded as a single grapheme in French (<--> phoneme /f/
>> indeed), so that grapheme and character are not synonyms at all; while
>> according to the English Wikipedia article it may be 2 in English. What do you
>> think?
>
> No, a grapheme is the common notion of character:

That's my understanding too. I think the article spends too much time 
comparing graphemes and phonemes. The former is about writing, the 
latter is about speech.

Ali
January 25, 2011
Re: Precomposed Character & Grapheme on wikipedia
On 01/25/2011 10:43 PM, Ali Çehreli wrote:
> Nick Sabalausky wrote:
>>  "spir" <denis.spir@gmail.com> wrote in message
>
>> > This article brought me to http://en.wikipedia.org/wiki/Grapheme. It seems I
>> > was partially wrong in stating that using "grapheme" to denote what we
>> > commonly think of as a character is an error. Possibly "grapheme" in English
>> > and "graphème" in French are not quite synonyms. For instance, "ph" is
>> > commonly regarded as a single grapheme in French (<--> phoneme /f/
>> > indeed), so that grapheme and character are not synonyms at all; while
>> > according to the English Wikipedia article it may be 2 in English. What do you
>> > think?
>>
>>  No, a grapheme is the common notion of character:
>
> That's my understanding too. I think the article spends too much time comparing
> graphemes and phonemes. The former is about writing, the latter is about speech.

Yep, that's what I understood as well. But it's not the notion I learnt when 
studying linguistics (in French). For instance, the corresponding French Wikipedia 
article "Graphème" explicitly states that "au" is a grapheme in French (<--> 
phoneme /o/). But it is indeed two characters (even for Frenchmen ;-). That is why 
I initially thought Unicode's use of "grapheme" was so wrong.
Anyway, we still need to extend the meaning of this term to encompass many 
kinds of characters other than plain word-writing ones: work that has already 
been done for the term "character", by both users and programmers, over generations 
of computing. Even more so than for "character", since the original sense of 
"grapheme" is far narrower. Too bad!

Denis