November 17, 2004 switch (dchar[])
Using dchar[] as case keys within a switch results in:
Internal error: s2ir.c 670
http://svn.kuehne.cn/dstress/nocompile/switch_14.d
Using multiple identical dchar[]s as case keys within a switch results in:
expression.c:1367: virtual int StringExp::compare(Object*): Assertion `0' failed
http://svn.kuehne.cn/dstress/nocompile/switch_13.d
I don't know why, but the current documentation states that only
"integral types or char[] or wchar[]" are allowed for switch statements.
It is certainly useful if wchar[] and floating types are allowed too.
Thomas
November 17, 2004 Re: switch (dchar[])
Posted in reply to Thomas Kuehne
On Wed, 17 Nov 2004 10:09:41 +0100, Thomas Kuehne <thomas-dloop@kuehne.thisisspam.cn> wrote:
> <some bugs>
> I don't know why, but the current documentation states that only
> "integral types or char[] or wchar[]" are allowed for switch statements.
> It is certainly useful if wchar[] and floating types are allowed too.
>
> Thomas
you mean dchar[]. Floating points cannot be compared very well,
unfortunately.
(1.0/3.0 != 0.2/0.6 is possible, for example) But you probably knew that.
--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
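A small sketch of why exact-match case labels and floating point sit badly together. It is written in Python, not D, and is not how any D compiler lowers a switch: a dict stands in for a switch, so lookup is exact equality. A value that is mathematically 0.3 but computed as 0.1 + 0.2 misses the 0.3 key. The tolerance-based `tolerant_switch` is a hypothetical workaround for illustration only.

```python
import math

# A dict standing in for a switch with floating-point case keys.
# dict lookup uses exact equality, just like a case label would.
cases = {0.3: "three tenths", 0.5: "one half"}

def exact_switch(x: float) -> str:
    return cases.get(x, "default")

def tolerant_switch(x: float, rel_tol: float = 1e-9) -> str:
    # Hypothetical alternative: accept a key within a relative tolerance.
    for key, label in cases.items():
        if math.isclose(x, key, rel_tol=rel_tol):
            return label
    return "default"

computed = 0.1 + 0.2              # mathematically 0.3, rounded in binary
print(computed == 0.3)            # False
print(exact_switch(computed))     # default
print(tolerant_switch(computed))  # three tenths

# The thread's own example: "one third" computed two different ways.
print(1.0 / 3.0 == 0.2 / 0.6)
```

The tolerance is where it gets messy: two case keys closer together than the tolerance would both match, so a language would need a rule for that case.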
November 17, 2004 Re: switch (dchar[])
Posted in reply to Simon Buchan
Simon Buchan wrote on Thu, 18 Nov 2004 01:01:50 +1300:
>> I don't know why, but the current documentation states that only "integral types or char[] or wchar[]" are allowed for switch statements. It is certainly useful if wchar[] and floating types are allowed too.
>
> you mean dchar[]. Floating points cannot be compared very well,
> unfortunately.
> (1.0/3.0 != 0.2/0.6 is possible, for example) But you probably knew that.
I'm aware of this problem.
IEEE 754 defines a set of different rounding modes for exactly
this purpose. D's specification is pretty vague, so there are no
hard facts for this discussion.
float.html:
# Rounding Control
# [blah, blah, blah]
Thomas
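Python floats don't expose IEEE 754 rounding control, so this sketch uses Python's `decimal` module as a stand-in (decimal arithmetic, not binary floating point). It only illustrates the point under discussion: the same expression, (1/3) * 3, gives different results depending on the rounding mode, so whether a computed value matches a literal case key depends on which mode is in effect. The six-digit precision is an arbitrary choice for readability.

```python
from decimal import Decimal, localcontext, ROUND_HALF_EVEN, ROUND_FLOOR, ROUND_CEILING

def third_times_three(rounding: str, digits: int = 6) -> Decimal:
    """Compute (1/3) * 3 with `digits` significant digits and the given rounding mode."""
    with localcontext() as ctx:        # the previous context is restored on exit
        ctx.prec = digits
        ctx.rounding = rounding
        third = Decimal(1) / Decimal(3)  # first rounding step
        return third * 3                 # second rounding step

for mode in (ROUND_HALF_EVEN, ROUND_FLOOR, ROUND_CEILING):
    print(mode, third_times_three(mode))
# ROUND_HALF_EVEN 0.999999
# ROUND_FLOOR 0.999999
# ROUND_CEILING 1.00001
```

None of the three results equals 1, and they don't all agree with each other either.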
November 17, 2004 Re: switch (dchar[])
Posted in reply to Thomas Kuehne
Thomas Kuehne wrote:
> I don't know why, but the current documentation states that only
> "integral types or char[] or wchar[]" are allowed for switch statements.
> It is certainly useful if wchar[] and floating types are allowed too.
Isn't dchar[] a pretty useless type?
(dchar isn't, but a UTF-32 string...)
OTOH, good if it doesn't crash anything!
But I suspect that wchar[] is better for
storing a bunch of (dchar) code points?
--anders
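One rough way to weigh the "wchar[] is better for storage" suspicion is to count bytes. In this Python sketch a Python `str` (a sequence of code points) stands in for dchar[], and the three encodings stand in for char[], wchar[] and dchar[]; the sample strings are arbitrary. UTF-16 is half the size of UTF-32 for text inside the Basic Multilingual Plane (BMP), while a code point outside it, such as U+20000 from CJK Extension B, costs four bytes in all three.

```python
# Bytes needed per encoding; the -le variants avoid adding a byte order mark.
samples = {
    "ascii": "switch",
    "latin": "Björklund",
    "cjk_bmp": "日本語",          # three BMP ideographs
    "cjk_ext_b": "\U00020000",   # U+20000, outside the BMP
}

def sizes(s: str) -> tuple:
    """Return (UTF-8, UTF-16, UTF-32) sizes of s in bytes."""
    return (len(s.encode("utf-8")),
            len(s.encode("utf-16-le")),
            len(s.encode("utf-32-le")))

for name, text in samples.items():
    print(name, len(text), "code points; utf-8/16/32 bytes:", sizes(text))
```

For mostly-ASCII text UTF-8 is smaller still; what UTF-32 buys is a fixed width per code point, not compactness.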
November 17, 2004 Re: switch (dchar[])
Posted in reply to Anders F Björklund
Anders F Björklund wrote on Wed, 17 Nov 2004 14:09:54 +0100:
>
> Isn't dchar[] a pretty useless type?
> (dchar isn't, but a UTF-32 string...)
>
> But I suspect that wchar[] is better for
> storing a bunch of (dchar) code points?
>
When you are dealing with extended CJK, ancient or private scripts
dchar is useful. For simple operations you might use wchar, but
as soon as you start extensive text processing you add a huge amount
of overhead (checking whether each unit is a surrogate).
Thomas
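To make the surrogate overhead concrete, here is a Python sketch; the helper names are made up for illustration, and the UTF-16 units are produced by encoding a Python string. Finding the n-th code point in UTF-16 units means checking every unit before it for a high surrogate, which is O(n), whereas with UTF-32 (dchar[]) it is a plain index.

```python
import struct

def utf16_units(s: str) -> list:
    """Return the UTF-16 code units of s (what a wchar[] would hold)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def is_high_surrogate(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF

def code_point_at(units: list, n: int) -> int:
    """Return the n-th code point; every preceding unit has to be inspected."""
    i = 0
    for _ in range(n):
        # A high surrogate starts a two-unit pair; anything else is one unit.
        i += 2 if is_high_surrogate(units[i]) else 1
    u = units[i]
    if is_high_surrogate(u):
        low = units[i + 1]
        return 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
    return u

text = "a\U00020000b"            # 'a', U+20000 (a surrogate pair), 'b'
units = utf16_units(text)
print([hex(u) for u in units])   # ['0x61', '0xd840', '0xdc00', '0x62']
print(hex(code_point_at(units, 1)))

# With UTF-32 the same lookup is simply ord(text[1]).
```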
November 17, 2004 Re: switch (dchar[])
Posted in reply to Thomas Kuehne
On Wed, 17 Nov 2004 13:47:46 +0100, Thomas Kuehne <thomas-dloop@kuehne.thisisspam.cn> wrote:
at least (punchline drums) we get operators like !<>= <g>
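As I understand D's NaN-aware comparison operators, `a !<>= b` reads as "not less, greater or equal", so it is true exactly when the operands are unordered, i.e. at least one of them is NaN. Python has no such operator; this sketch spells out that meaning with a hypothetical `unordered` helper, relying on the IEEE 754 rule that every ordered comparison involving NaN is false.

```python
import math

def unordered(a: float, b: float) -> bool:
    """Mimic the meaning of `a !<>= b`: a and b are neither <, >, nor ==."""
    return not (a < b or a > b or a == b)

nan = float("nan")
print(unordered(1.0, 2.0))   # False
print(unordered(nan, 2.0))   # True
print(unordered(nan, nan))   # True: NaN is not even equal to itself

# Equivalent formulation: unordered exactly when either operand is NaN.
print(unordered(nan, 0.0) == (math.isnan(nan) or math.isnan(0.0)))  # True
```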
November 17, 2004 Re: switch (dchar[])
Posted in reply to Thomas Kuehne
Thomas Kuehne wrote:
> When you are dealing with extended CJK, ancient or private scripts
> dchar is useful. For simple operations you might use wchar, but
> as soon as you start extensive text processing you add a huge amount
> of overhead (checking whether each unit is a surrogate).
OK. My legacy Unicode code is all in Java, which does wchar only...
(Java only got support for surrogates in the brand-new 1.5 version)
I was actually talking about the array representation: dchar[],
not the variable, which might as well be declared dchar (32-bit registers anyway).
To be honest, I just used "foreach (dchar c; str)" and let D worry
about the implementation. Then again, str is just a standard char[].
My texts are just ISO-8859-1, with about 90-95% of it being US-ASCII.
(actually most of them are in "MacRoman"*, but that's about the same)
--anders
PS. * = http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT
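`foreach (dchar c; str)` over a char[] has to decode UTF-8 on the fly. The Python generator below is a simplified sketch of that decoding step, not D's actual runtime code, just to show what "let D worry about it" involves: classifying lead bytes, folding in continuation bytes, and range checks.

```python
def decode_utf8(data: bytes):
    """Yield code points from UTF-8 bytes (a simplified, illustrative decoder)."""
    i = 0
    while i < len(data):
        b = data[i]
        # Classify the lead byte: its payload bits and how many bytes follow.
        if b < 0x80:
            cp, extra = b, 0
        elif 0xC0 <= b < 0xE0:
            cp, extra = b & 0x1F, 1
        elif 0xE0 <= b < 0xF0:
            cp, extra = b & 0x0F, 2
        elif 0xF0 <= b < 0xF8:
            cp, extra = b & 0x07, 3
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + extra >= len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        # Each continuation byte must look like 10xxxxxx and adds 6 bits.
        for k in range(1, extra + 1):
            c = data[i + k]
            if (c & 0xC0) != 0x80:
                raise ValueError("bad continuation byte at offset %d" % (i + k))
            cp = (cp << 6) | (c & 0x3F)
        # Reject overlong forms, surrogate code points and values past U+10FFFF.
        if cp < (0, 0x80, 0x800, 0x10000)[extra]:
            raise ValueError("overlong sequence at offset %d" % i)
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError("invalid code point at offset %d" % i)
        yield cp
        i += extra + 1

# Walk a UTF-8 byte string one code point at a time.
for cp in decode_utf8("Björklund".encode("utf-8")):
    print("U+%04X" % cp, chr(cp))
```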
November 17, 2004 Re: switch (dchar[])
Posted in reply to Thomas Kuehne
In article <h00s62-v9a.ln1@kuehne.cn>, Thomas Kuehne says...
>
>Anders F Björklund wrote on Wed, 17 Nov 2004 14:09:54 +0100:
>>
>> Isn't dchar[] a pretty useless type?
>> (dchar isn't, but a UTF-32 string...)
>>
>> But I suspect that wchar[] is better for
>> storing a bunch of (dchar) code points?
>
>When you are dealing with extended CJK, ancient or private scripts
>dchar is useful. For simple operations you might use wchar, but
>as soon as you start extensive text processing you add a huge amount
>of overhead (checking whether each unit is a surrogate).
Semi-related question. Is it possible for there to be multiple UTF-8 (or UTF-16) sequences which represent the same UTF-32 character? I would assume not, but don't want to make any assumptions.
Sean
November 17, 2004 [OT] Re: switch (dchar[])
Posted in reply to Sean Kelly
Sean Kelly wrote on Wed, 17 Nov 2004 19:45:54 +0000 (UTC):
>>> Isn't dchar[] a pretty useless type?
>>> (dchar isn't, but a UTF-32 string...)
>>>
>>> But I suspect that wchar[] is better for
>>> storing a bunch of (dchar) code points?
>>When you are dealing with extended CJK, ancient or private scripts
>>dchar is useful. For simple operations you might use wchar, but
>>as soon as you start extensive text processing you add a huge amount
>>of overhead (checking whether each unit is a surrogate).
> Semi-related question. Is it possible for there to be multiple UTF-8 (or UTF-16) sequences which represent the same UTF-32 character? I would assume not, but don't want to make any assumptions.
The encodings could technically represent one codepoint with different UTF-16/UTF-8 sequences. But the standards require you to use the shortest possible sequence.
Please don't confuse characters and codepoints.
e.g. "small Latin letter a with accent grave" can be represented
with 2 different codepoint sequences and thus with different UTF-8/16
sequences.
Thomas
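The "shortest possible sequence" rule is what rules out so-called overlong UTF-8 forms. A minimal Python sketch: the two bytes C0 AF carry the bit pattern of "/" (U+002F), but Python's built-in UTF-8 codec, like any conforming decoder, rejects them.

```python
# "/" (U+002F) in its shortest form, and a two-byte overlong encoding of it.
shortest = b"\x2f"
overlong = b"\xc0\xaf"   # 110 00000, 10 101111

# Plain bit arithmetic on the overlong form yields 0x2F...
payload = ((overlong[0] & 0x1F) << 6) | (overlong[1] & 0x3F)
print(hex(payload))      # 0x2f

# ...but a conforming decoder refuses it.
try:
    overlong.decode("utf-8")
    accepted = True
except UnicodeDecodeError:
    accepted = False
print(shortest.decode("utf-8"), accepted)   # / False
```

Accepting overlong forms is the classic way to sneak a "/" past a filter that only inspects bytes, which is one reason the standards forbid them.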
November 17, 2004 Re: [OT] Re: switch (dchar[])
Posted in reply to Thomas Kuehne
In article <bens62-32r.ln1@kuehne.cn>, Thomas Kuehne says...
>
>
>Sean Kelly wrote on Wed, 17 Nov 2004 19:45:54 +0000 (UTC):
>>>> Isn't dchar[] a pretty useless type?
>>>> (dchar isn't, but a UTF-32 string...)
>>>>
>>>> But I suspect that wchar[] is better for
>>>> storing a bunch of (dchar) code points?
>
>>>When you are dealing with extended CJK, ancient or private scripts
>>>dchar is useful. For simple operations you might use wchar, but
>>>as soon as you start extensive text processing you add a huge amount
>>>of overhead (checking whether each unit is a surrogate).
>
>> Semi-related question. Is it possible for there to be multiple UTF-8 (or UTF-16) sequences which represent the same UTF-32 character? I would assume not, but don't want to make any assumptions.
>
>The encodings could technically represent one codepoint with different UTF-16/UTF-8 sequences. But the standards require you to use the shortest possible sequence.
>
>Please don't confuse characters and codepoints.
>e.g. "small Latin letter a with accent grave" can be represented
>with 2 different codepoint sequences and thus with different UTF-8/16
>sequences.
The reason I asked was for string matching. I wanted to be sure there was no advantage to doing comparisons in UTF-32 vs. UTF-8, for example. So you're saying that while it's theoretically possible to have two different UTF-8/16 sequences represent the same codepoint, the requirements of the standard make this effectively impossible. Is that correct?
Sean
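On the string-matching question: for valid, shortest-form text, UTF-8 is a one-to-one encoding, so comparing UTF-8 bytes gives the same equality verdict as comparing UTF-32 code points, and UTF-8 byte order even matches code point order. What neither comparison handles is the character-vs-code-point distinction raised above: "à" as U+00E0 and as U+0061 U+0300 differ in both encodings unless the strings are normalized first. A Python sketch, using NFC as one possible normalization form; `same_by` is a made-up helper.

```python
import unicodedata

def same_by(a: str, b: str, encoding: str) -> bool:
    """Compare two strings byte-for-byte in the given encoding."""
    return a.encode(encoding) == b.encode(encoding)

precomposed = "\u00e0"     # à as a single code point
decomposed = "a\u0300"     # a + COMBINING GRAVE ACCENT

# UTF-8 and UTF-32 give the same verdict: different.
print(same_by(precomposed, decomposed, "utf-8"))      # False
print(same_by(precomposed, decomposed, "utf-32-le"))  # False

# After normalizing both to NFC they match in either encoding.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(same_by(nfc_a, nfc_b, "utf-8"), same_by(nfc_a, nfc_b, "utf-32-le"))  # True True

# Sorting by UTF-8 bytes agrees with sorting by code points.
words = ["z", "\u00e9", "\U00020000", "a"]
print(sorted(words, key=lambda w: w.encode("utf-8")) == sorted(words))     # True
```

The ordering property does not hold for UTF-16 code units: surrogates (0xD800-0xDFFF) sort below U+E000-U+FFFF, so code points above U+FFFF land before those BMP characters.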
Copyright © 1999-2021 by the D Language Foundation