January 14, 2011
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
"Nick Sabalausky" <a@a.a> wrote in message 
news:igori7$1ovh$1@digitalmars.com...
> "Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message 
> news:igoqrm$1n5r$1@digitalmars.com...
>> On 1/13/11 10:26 PM, Nick Sabalausky wrote:
>> [snip]
>>> [ 'f', {u with the umlaut}, 'n', 'f' ]
>>>
>>> Or:
>>>
>>> [ 'f', 'u', {umlaut combining character}, 'n', 'f' ]
>>>
>>> Those *both* get rendered exactly the same, and both represent the same
>>> four-letter sequence. In the second example, the 'u' and the {umlaut
>>> combining character} combine to form one grapheme. The f's and n's just
>>> happen to be single-code-point graphemes.
>>>
>>> Note that while some characters exist in pre-combined form (such as the
>>> {u with the umlaut} above), legend has it there are others that can only be
>>> represented using a combining character.
>>>
>>> It's also my understanding, though I'm not certain, that sometimes multiple
>>> combining characters can be used together on the same "root" character.
>>
>> Thanks. One further question is: in the above example with u-with-umlaut, 
>> there is one code point that corresponds to the entire combination. Are 
>> there combinations that do not have a unique code point?
>>
>
> My understanding is "yes". At least that's what I've heard, and I've never 
> heard any claims of "no". I don't know of any specific ones offhand, 
> though. Actually, it might be possible to use any combining character with 
> any old letter or number (like maybe a 7 with an umlaut), though I'm not 
> certain.
>
> FWIW, the Wikipedia article might help, or at least link to other things 
> that might help: http://en.wikipedia.org/wiki/Combining_character
>
> Michel or spir might have better links though.
>

Heh, as if that wasn't bad enough, there are also digraphs, which, from what I
can tell, seem to be single code points that represent more than one
glyph/character/grapheme:

http://en.wikipedia.org/wiki/Digraph_(orthography)#Digraphs_in_Unicode

This page may be helpful too:
http://en.wikipedia.org/wiki/Precomposed_character
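
The digraph case can be checked with Python's standard `unicodedata` module. Python is used here only for illustration, since the thread itself is about D. In this sketch, U+01C6 (ǆ) and U+0133 (ĳ) are single code points. Canonical normalization (NFC) leaves them alone, and only compatibility normalization (NFKC/NFKD) splits them into separate letters.

```python
import unicodedata

dz = "\u01c6"  # LATIN SMALL LETTER DZ WITH CARON: a single code point
ij = "\u0133"  # LATIN SMALL LIGATURE IJ: a single code point

# Canonical normalization does not split the digraph...
print(unicodedata.normalize("NFC", dz) == dz)  # True

# ...but compatibility decomposition turns it into several code points.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKD", dz)])
# ['U+0064', 'U+007A', 'U+030C']

print(unicodedata.normalize("NFKC", ij))  # ij
```

Because compatibility normalization loses distinctions (ǆ vs. dž), it is usually reserved for searching and matching rather than for storing text.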
January 14, 2011
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 14.01.2011 08:00, Nick Sabalausky wrote:
> [snip]
>
> Heh, as if that wasn't bad enough, there's also digraphs which, from what I
> can tell, seem to be single code-points that represent more than one
> glyph/character/grapheme:
>
> http://en.wikipedia.org/wiki/Digraph_(orthography)#Digraphs_in_Unicode
>
> This page may be helpful too:
> http://en.wikipedia.org/wiki/Precomposed_character
>

OMG, this is really fucked up.
Can't we just go back to 8-bit charsets like ISO 8859-* etc.? :/
January 14, 2011
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On Fri, 14 Jan 2011 01:44:19 -0500, Nick Sabalausky <a@a.a> wrote:

> "Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message
> news:igoqrm$1n5r$1@digitalmars.com...
>> On 1/13/11 10:26 PM, Nick Sabalausky wrote:
>> [snip]
>> Thanks. One further question is: in the above example with u-with-umlaut,
>> there is one code point that corresponds to the entire combination. Are
>> there combinations that do not have a unique code point?
>>
>
> My understanding is "yes". At least that's what I've heard, and I've never
> heard any claims of "no". I don't know of any specific ones offhand, though.
> Actually, it might be possible to use any combining character with any old
> letter or number (like maybe a 7 with an umlaut), though I'm not certain.
>
> FWIW, the Wikipedia article might help, or at least link to other things
> that might help: http://en.wikipedia.org/wiki/Combining_character

http://en.wikipedia.org/wiki/Unicode_normalization

Linked from that page, the normalization process is probably something we
need to look at. Using decomposed canonical form would mean we need more
state than just which code unit we are on, plus it makes it more likely
that a match will be found with part of a grapheme (spir or Michel brought
it up earlier). So I think the correct choice is to use composed canonical
form. This is after just reading that page, so maybe I'm missing
something.
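
The composed-versus-decomposed point can be illustrated in a few lines of Python. The standard `unicodedata.normalize` function serves as a stand-in here; it is not a D or Phobos API. The sketch shows three things:

* the two spellings of "fünf" from earlier in the thread,
* their equivalence after NFC/NFD,
* how a search in the decomposed form matches part of a grapheme.

```python
import unicodedata

precomposed = "f\u00fcnf"  # 'f', U+00FC (u with diaeresis), 'n', 'f'
decomposed = "fu\u0308nf"  # 'f', 'u', U+0308 (combining diaeresis), 'n', 'f'

# Same rendered word, different code point sequences.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 4 5

# Canonical composition (NFC) and decomposition (NFD) bring both spellings to one form.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == decomposed)  # True True

# A plain code point search in the decomposed form matches part of the grapheme "ü".
print("u" in nfd)  # True
print("u" in nfc)  # False
```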

Non-composable combinations would be a problem.  The string range is  
formed on the basis that the element type is a dchar.  If there are  
combinations that cannot be composed into a single dchar, then the element  
type has to be a dchar array (or some other type which contains all the  
info).  The other option is to simply leave them decomposed.  Then you  
risk things like partial matches.
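
Here is a quick check of the non-composable case, again using Python's `unicodedata` as a stand-in. NFC turns u + combining diaeresis into a single code point. A 7 with a diaeresis has no precomposed form, so it stays at two code points. A u carrying two diaereses also needs more than one code point after NFC. In both of those cases, a single dchar-sized element cannot hold the whole grapheme.

```python
import unicodedata

def nfc_length(s: str) -> int:
    """Number of code points left after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", s))

samples = {
    "u + diaeresis": "u\u0308",            # composes to U+00FC
    "7 + diaeresis": "7\u0308",            # no precomposed form exists
    "u + two diaereses": "u\u0308\u0308",  # only the first mark can compose
}

for label, s in samples.items():
    print(f"{label}: {len(s)} code points -> {nfc_length(s)} after NFC")
# u + diaeresis: 2 code points -> 1 after NFC
# 7 + diaeresis: 2 code points -> 2 after NFC
# u + two diaereses: 3 code points -> 2 after NFC
```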

I'm leaning towards a solution like this: While iterating a string, it  
should output dchars in normalized composed form.  But a specialized  
comparison function should be used when doing things like searches or  
regex, because it might not be possible to compose two combining  
characters.
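
One possible shape for such a "specialized comparison function" is sketched below in Python. It is a hypothetical helper: `find_whole_grapheme` is not an existing API. The function works in two steps:

1. It normalizes both strings to NFD, so that canonically equivalent spellings match.
2. It rejects matches that start or end next to a combining mark.

Treating general category M* as "combining mark" is a simplification of real grapheme boundaries (Unicode UAX #29). The returned index refers to the NFD form of the haystack, not to the original string.

```python
import unicodedata

def is_mark(ch: str) -> bool:
    """Treat any code point of general category M* (Mn, Mc, Me) as a combining mark."""
    return unicodedata.category(ch).startswith("M")

def find_whole_grapheme(haystack: str, needle: str) -> int:
    """Hypothetical helper: find needle in haystack, comparing NFD forms and
    skipping matches that would cut a grapheme in two.
    Returns an index into the NFD form of haystack, or -1 if there is no match."""
    h = unicodedata.normalize("NFD", haystack)
    n = unicodedata.normalize("NFD", needle)
    start = h.find(n)
    while start != -1:
        end = start + len(n)
        starts_clean = start == 0 or not is_mark(h[start])
        ends_clean = end == len(h) or not is_mark(h[end])
        if starts_clean and ends_clean:
            return start
        start = h.find(n, start + 1)
    return -1

print(find_whole_grapheme("f\u00fcnf", "u"))         # -1: 'u' is only part of 'ü'
print(find_whole_grapheme("f\u00fcnf", "fu\u0308"))  # 0: canonically equivalent match
print(find_whole_grapheme("funf", "u"))              # 1
```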

The drawback to this is that a dchar might not be able to represent a  
grapheme (only if it cannot be composed), but I think it's too much of a  
hit in complexity and performance to make the element type of a string  
larger than a dchar.

Those who wish to work with a more comprehensive string type can use a  
more complex string type such as the one created by spir.

Does that sound reasonable?

-Steve
January 14, 2011
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 14.01.2011 07:26, Nick Sabalausky wrote:
> "Andrei Alexandrescu"<SeeWebsiteForEmail@erdani.org>  wrote in message
> news:igoj6s$17r6$1@digitalmars.com...
>>
>> I'm not so sure about that. What do you base this assessment on? Denis
>> wrote a library that according to him does grapheme-related stuff nobody
>> else does. So apparently graphemes is not what people care about (although
>> it might be what they should care about).
>>
>
> It's what they want, they just don't know it.
>
> Graphemes are what many people *think* code points are.
>

Agreed. Up until spir mentioned graphemes in this newsgroup I always 
thought that one Unicode code point == one character on the screen.

I guess in the majority of use cases you want to operate on user 
perceived characters.
January 14, 2011
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On Friday 14 January 2011 04:47:59 Steven Schveighoffer wrote:
> [snip]
> 
> The drawback to this is that a dchar might not be able to represent a
> grapheme (only if it cannot be composed), but I think it's too much of a
> hit in complexity and performance to make the element type of a string
> larger than a dchar.

Well, there's plenty in std.string that already deals in strings rather than 
dchar, and for the most part, any case where you couldn't fit a grapheme in a 
dchar could be covered by using a string.

> Those who wish to work with a more comprehensive string type can use a
> more complex string type such as the one created by spir.
> 
> Does that sound reasonable?

We really should have something along those lines it seems. From what little _I_ 
know, the basic approach that you suggest seems like the correct one, but 
perhaps someone more knowledgeable will be able to come up with a reason why 
it's not a good idea. Certainly, I think that any solution that I'd come up with 
would be similar to what you're suggesting.

- Jonathan M Davis
January 14, 2011
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 01/14/2011 05:23 AM, Andrei Alexandrescu wrote:

>> That's forgetting that most of the time people care about graphemes
>> (user-perceived characters), not code points.
>
> I'm not so sure about that. What do you base this assessment on? Denis
> wrote a library that according to him does grapheme-related stuff nobody
> else does. So apparently graphemes is not what people care about
> (although it might be what they should care about).

I'm aware of that, and I have no definitive answer to the question. The
issue *does* exist, as shown even by trivial examples such as Michel's
below, not only by corner cases. The actual question is _not_ whether codes
or "graphemes" are the proper level of abstraction. To that, the answer is
clear: codes are simply meaningless in 99% of cases. (All historic software
deals with characters, conceptually, but they happened to be encoded as
single codes.)
(And what about Objective-C? Why did its designers even bother with that?)

The question is rather: why do we nearly all happily go on ignoring the 
issue? My present guess is a combination of factors:

* The issue is masked by the misleading use of "abstract character" in
Unicode literature. "Abstract" is quite correct, but they should have
found another term than "character", say "abstract scripting mark". This
misleading terminological choice lets most programmers believe that
code points encode characters, as in historic charsets.
(Even worse: some documentation explicitly states that ICU's notion of
character matches the programming notion of character.)
* Unicode added precomposed codes for a bunch of characters, supposedly for
backward compatibility with said charsets. (But where is the gain? We need
to decode them anyway...) The pedagogical consequence is very bad: most
text-producing software (such as editors) uses precomposed codes when they
are available for a given character, so programmers can happily go on
believing in the code=character myth. (Note: the space saved is negligible
for Western text.)
* Most characters that appear in Western texts (at least "official"
characters of natural languages) have precomposed forms.
* Programmers can very easily be unaware their code is incorrect: how do
you even notice it in test output?
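
The last point is easy to demonstrate with a Python sketch. The helper names are hypothetical, and `clusters` only approximates grapheme boundaries by attaching M* marks to the preceding code point. Reversing a string by code points gives the right answer for precomposed input. A test written with an editor that emits precomposed "ü" therefore passes. The same word in decomposed form, however, comes out with the diaeresis on the "n".

```python
import unicodedata

def clusters(s: str) -> list[str]:
    """Simplified grapheme split: a code point plus any following M* marks
    (not full UAX #29; prepend marks, Hangul jamo and ZWJ sequences are ignored)."""
    out: list[str] = []
    for ch in s:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out

def naive_reverse(s: str) -> str:
    return s[::-1]  # reverses code points, not graphemes

def grapheme_reverse(s: str) -> str:
    return "".join(reversed(clusters(s)))

# With precomposed input, the naive version looks correct in test output...
print(naive_reverse("f\u00fcnf") == "fn\u00fcf")       # True
# ...but with decomposed input the diaeresis moves onto the 'n'.
print(naive_reverse("fu\u0308nf") == "fn\u0308uf")     # True
print(grapheme_reverse("fu\u0308nf") == "fnu\u0308f")  # True
```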

Thus, in practice, programmers may (1) simply not know about the issue,
(2) have code that really works in the typical use cases of their
software, or (3) not notice that their code runs incorrectly.
There is also an intermediate situation between (2) and (3), similar to
the old problems with ASCII-only apps: they work wrongly when used in a
non-English environment, but what can users concretely do? Most often,
they just have to cope with the incorrectness, reinterpret the output,
and/or find workarounds by cheating with the interface.

The responsibility of designers of tools for programmers is, IMO,
important. We should first make the issue clear (very difficult: it's a
ubiquitous myth to break down), and provide services that run correctly
in situations where the issue is relevant, here the manipulation of
universal text, even if they are not very efficient at first.
On my side, and about D, I wish that most D programmers (1) are aware of
the problem, (2) understand its whys and hows, and (3) know there is a
correct solution. Then (4) whether they actually use it is their choice
(and I don't care whether or not they do).

>>>> It also supports this:
>>>>
>>>> foreach(i, d; s)
>>>> {
>>>>     writeln("The character in position ", i, " is ", d);
>>>> }
>>>>
>>>> where i is the index (might not be sequential)
>>>
>>> Well string supports that too, albeit with the nit that you need to
>>> specify dchar.
>>
>> Except it breaks with combining characters. For instance, take the
>> string "t̃", which is two code points -- 't' followed by combining tilde
>> (U+0303) -- and you'll get the following output:
>>
>> The character in position 0 is t
>> The character in position 1 is ̃
>>
>> (Note that the tilde becomes combined with the preceding space
>> character.)
>>
>> The conception of character that normal people have does not match the
>> notion of code points when combining characters enters the equation.
>
> This might be a good time to see whether we need to address graphemes
> systematically. Could you please post a few links that would educate me
> and others in the mysteries of combining characters?

Beware: very long text!
https://bitbucket.org/denispir/denispir-d/src/c572ccaefa33/U%20missing%20level%20of%20abstraction
(the directory above contains the current rough implementation of Text, 
plus a bit of its brother package DUnicode)

> Thanks,
>
> Andrei

Denis
_________________
vita es estrany
spir.wikidot.com
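
Michel's `foreach` example quoted above can be mimicked in Python to see what goes wrong. This is a sketch; `code_point_offsets` is a hypothetical helper. It pairs each code point with the UTF-8 code-unit offset where that code point starts. This offset is what the index should correspond to in D's `foreach (i, dchar d; s)` over a UTF-8 `string`. The output shows that the combining tilde is reported as a "character" of its own.

```python
def code_point_offsets(s: str) -> list[tuple[int, str]]:
    """Hypothetical helper: pair each code point with the UTF-8 code-unit
    offset at which it starts (offsets are not sequential for multi-byte code points)."""
    pairs: list[tuple[int, str]] = []
    offset = 0
    for ch in s:
        pairs.append((offset, ch))
        offset += len(ch.encode("utf-8"))
    return pairs

for i, d in code_point_offsets("t\u0303"):  # 't' + U+0303 COMBINING TILDE
    print(f"The character in position {i} is U+{ord(d):04X}")
# The character in position 0 is U+0074
# The character in position 1 is U+0303
```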
January 14, 2011
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 01/14/2011 07:26 AM, Nick Sabalausky wrote:
> "Andrei Alexandrescu"<SeeWebsiteForEmail@erdani.org>  wrote in message
> news:igoj6s$17r6$1@digitalmars.com...
>>
>> I'm not so sure about that. What do you base this assessment on? Denis
>> wrote a library that according to him does grapheme-related stuff nobody
>> else does. So apparently graphemes is not what people care about (although
>> it might be what they should care about).
>>
>
> It's what they want, they just don't know it.
>
> Graphemes are what many people *think* code points are.
>
>>
>> This might be a good time to see whether we need to address graphemes
>> systematically. Could you please post a few links that would educate me
>> and others in the mysteries of combining characters?
>>
>
> Maybe someone else has a link to an explanation (I don't), but it's
> basically just this:

If anyone finds a pointer to such an explanation, bravo, and thank you.
(You will certainly not find it in the Unicode literature, for instance.)
Nick's explanation below is good and concise. (Just 2 notes added.)

> Three levels of abstraction from lowest to highest:
> - Code Unit (ie, encoding)
> - Code Point (ie, what Unicode assigns distinct numbers to)
> - Grapheme (ie, what we think of as a "character")
>
> A code-point can be made up of one or more code-units. Likewise, a grapheme
> can be made up of one or more code-points.
>
> There are (at least) two types of code points:
>
> - Regular ones, such as letters, digits, and punctuation.
>
> - "Combining Characters", such as accent marks (or if you're familiar with
> Japanese, the little things in the upper-right corner that change an "s" to
> a "z" or an "h" to a "p". Or like German's umlaut - the two dots above a
> vowel). Ie, things that are not characters in their own right, but merely
> modify other characters. These can often (always?) be thought of as being
> like overlays.

You can also say there are 2 kinds of characters: simple ones like "u", and
composite ones like "ü" or "ṵ̈̈". The former are coded with a single (base)
code, the latter with one (rarely more) base code plus an arbitrary number
of combining codes.

For most _common_ characters made of 2 or 3 codes (Western-language
letters, Korean Hangul syllables, ...), precomposed codes have been added
to the set, so they can be coded with a single code, like simple
characters.

[Also note, lest things be too simple ;-), that a few codes with the
"Prepend" property come _before_ the base in the raw code sequence...]
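
Nick's three levels, and the split between simple and composite characters, can be counted side by side in Python. `graphemes` below is a hypothetical, simplified splitter: it groups a code point with any following M* marks. It is not the full UAX #29 algorithm. For example, it does not group decomposed Hangul jamo, so the Hangul lines only demonstrate normalization.

```python
import unicodedata

def graphemes(s: str) -> list[str]:
    """Simplified grapheme split: a base code point plus any following M* marks.
    Not full UAX #29: prepend marks, Hangul jamo and ZWJ sequences are not handled."""
    out: list[str] = []
    for ch in s:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out

word = "fu\u0308nf"
print(len(word.encode("utf-8")))           # 6 UTF-8 code units
print(len(word.encode("utf-16-le")) // 2)  # 5 UTF-16 code units
print(len(word))                           # 5 code points
print(len(graphemes(word)))                # 4 graphemes

# A u with a tilde below and two diaereses: one base, three marks, one grapheme.
print(len(graphemes("u\u0330\u0308\u0308")))  # 1

# A precomposed Hangul syllable is one code point, but two under NFD.
print(len(unicodedata.normalize("NFD", "\uac00")))  # 2
```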

> If a code point representing a "combining character" exists in a string,
> then instead of being displayed as a character it merely modifies whatever
> code-point came before it.
>
> So, for instance, if you want to store the German word for five (in all
> lower-case), there are two ways to do it:
>
> [ 'f', {u with the umlaut}, 'n', 'f' ]
>
> Or:
>
> [ 'f', 'u', {umlaut combining character}, 'n', 'f' ]

Note: the second form is the basic form for Unicode. There are reasons for
having chosen it (see my text), and reasons why UCS does not, and simply
cannot, provide precomposed codes for all possible composite characters.

> Those *both* get rendered exactly the same, and both represent the same
> four-letter sequence. In the second example, the 'u' and the {umlaut
> combining character} combine to form one grapheme. The f's and n's just
> happen to be single-code-point graphemes.
>
> Note that while some characters exist in pre-combined form (such as the {u
> with the umlaut} above), legend has it there are others that can only be
> represented using a combining character.
>
> It's also my understanding, though I'm not certain, that sometimes multiple
> combining characters can be used together on the same "root" character.

There is no logical limit, only practical ones, such as how to display 3
diacritics above the same base. You can invent a script for a mythical
people's language if you like :-)
Also, some examples of real language characters (Hebrew, IIRC) in 
Unicode test data sets hold up to 8 codes.

> Caveat: There may very well be further complications that I'm not aware of.
> Heck, knowing Unicode, there probably are.

Denis
January 14, 2011
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On Fri, 14 Jan 2011 08:14:02 -0500, spir <denis.spir@gmail.com> wrote:

> On 01/14/2011 05:23 AM, Andrei Alexandrescu wrote:
> [snip]
> The question is rather: why do we nearly all happily go on ignoring the
> issue? My present guess is a combination of factors:
>
> [snip]
> * Most characters that appear in western texts (at least "official"  
> characters of natural languages) have precomposed forms.
> * Programmers can very easily be unaware their code is incorrect: how do  
> you even notice it in test output?

* I don't even know how to make a grapheme that is more than one  
code-unit, let alone more than one code-point :)  Every time I try, I get  
'invalid utf sequence'.
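
One way to build such a grapheme without typing it is to use escape sequences. The sketch below is in Python. D string literals also accept `\u` escapes, so `"u\u0308"` should produce the same two code points there.

```python
s = "u\u0308"                    # 'u' followed by U+0308 COMBINING DIAERESIS
print(len(s))                    # 2 code points, rendered as one "ü"
print(s.encode("utf-8"))         # b'u\xcc\x88': 3 UTF-8 code units
print("\u00fc".encode("utf-8"))  # b'\xc3\xbc': one code point, 2 code units
```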

I feel significantly ignorant on this issue, and I'm slowly getting enough  
knowledge to join the discussion, but being a dumb American who only  
speaks English, I have a hard time grasping how this shit all works.

-Steve
January 14, 2011
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 01/14/2011 07:33 AM, Andrei Alexandrescu wrote:
> Thanks. One further question is: in the above example with
> u-with-umlaut, there is one code point that corresponds to the entire
> combination. Are there combinations that do not have a unique code point?

See my previous follow-up to Nick's explanation. But the answer is yes,
not only for usual characters, but also because a user is, theoretically
and practically, totally free to combine base and combining codes, even
to invent characters. The only limit is that fonts will not know how to
display improbable combinations.
(See also my presentation text, which shows an example of dots below and
above Greek letters.)

Denis
January 14, 2011
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 01/14/2011 07:44 AM, Nick Sabalausky wrote:
> "Andrei Alexandrescu"<SeeWebsiteForEmail@erdani.org>  wrote in message
> news:igoqrm$1n5r$1@digitalmars.com...
>> [snip]
>
> My understanding is "yes". At least that's what I've heard, and I've never
> heard any claims of "no". I don't know of any specific ones offhand, though.
> Actually, it might be possible to use any combining character with any old
> letter or number (like maybe a 7 with an umlaut), though I'm not certain.

The problem is then whether a font knows how to display it. My usual 
fonts (DejaVu series, pretty good with Unicode) show:
	7̈
meaning they do not know how to combine digits with diacritics (though they
handle other, rather strange combinations well).

But one of the relevant advantages of decomposed forms is that when a
font doesn't know the character, it can still show at least the
component marks, here '7' and '¨'. That is better than nothing for a user
who knows the writing system. If I try to display, for instance, a
_precomposed_ syllable from a language my font does not know, I will get
instead either a little square with the code point written inside in
tiny digits, or a placeholder like an inverse-video "?".


denis