What am I missing here? Is this some UTF conversion issue?
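```d
import std.range; // provides take, and publicly imports std.range.primitives (front)
string a;
char[] b;
pragma(msg, typeof(a.take(1).front)); // dchar
pragma(msg, typeof(b.take(1).front)); // dchar
void main() {}
```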
February 22, 2022 Odd behaviour of std.range
February 22, 2022 Re: Odd behaviour of std.range
Posted in reply to frame

On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
> What am I missing here? Is this some UTF conversion issue?

`front` is a Phobos function, and Phobos treats char arrays differently from all other arrays.

It was a naive design flaw that nobody has the courage to fix.

Either just don't use Phobos on strings (the language itself treats them sanely; you can `foreach` over them, etc.), call `.representation` (from std.string) on them before putting them into any range, or ask why you're doing range operations on a string in the first place and see if the behavior actually kinda makes sense for you.
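A minimal sketch of the workarounds mentioned above, assuming current DMD/Phobos behavior: `representation` (std.string) reinterprets the string as bytes, `byCodeUnit` (std.utf, not named in the post) wraps it in a range of plain chars, and a language-level `foreach` only decodes when you explicitly ask for `dchar`. The helper names `countWithForeach` and `countDecoded` are hypothetical, chosen only for illustration.

```d
import std.range : take;
import std.range.primitives : front;
import std.string : representation;
import std.utf : byCodeUnit;

// Auto decoding: for a char array, Phobos' `front` yields a decoded dchar.
static assert(is(typeof("x".take(1).front) == dchar));
// `representation` reinterprets the string as immutable(ubyte)[], which is not decoded.
static assert(is(typeof("x".representation.take(1).front) == immutable(ubyte)));
// `byCodeUnit` wraps the string in a range whose elements are the chars themselves.
static assert(is(typeof("x".byCodeUnit.take(1).front) == immutable(char)));

/// Hypothetical helper: counts what a plain `foreach` visits, one element per char (UTF-8 code unit).
size_t countWithForeach(string s)
{
    size_t n;
    foreach (c; s) // c is immutable(char); the language does no decoding here
        ++n;
    return n;
}

/// Hypothetical helper: counts what `foreach` visits when asked for dchar, one element per code point.
size_t countDecoded(string s)
{
    size_t n;
    foreach (dchar c; s) // explicitly requesting dchar makes foreach decode
        ++n;
    return n;
}

void main()
{
    import std.stdio : writeln;
    writeln(countWithForeach("été"), " ", countDecoded("été")); // 5 3
}
```

The string "été" is five UTF-8 code units ("é" takes two) but three code points, which is exactly the difference auto decoding introduces.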
February 22, 2022 Re: Odd behaviour of std.range
Posted in reply to frame

On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
> What am I missing here? Is this some UTF conversion issue?

This is a feature of the D standard library known as "auto decoding":

> as a convenience, when iterating over a string using the range functions, each element of strings and wstrings is converted into a UTF-32 code-point as each item. This practice, known as auto decoding, means that
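To make the quoted description concrete, here is a small sketch, assuming current Phobos semantics: `ElementType` reports the decoded element type that the range functions see, `ElementEncodingType` reports the array's real element type, and `.length` versus `walkLength` count code units versus code points. The function names `codeUnits`, `codePoints`, and `firstElement` are hypothetical.

```d
import std.range.primitives : front, walkLength, ElementType, ElementEncodingType;

// Under auto decoding, a narrow string's range element type is dchar,
// while the array's actual element (encoding) type stays a char or wchar.
static assert(is(ElementType!string == dchar));
static assert(is(ElementEncodingType!string == immutable(char)));
static assert(is(ElementType!wstring == dchar));
static assert(is(ElementEncodingType!wstring == immutable(wchar)));

/// Hypothetical helper: `.length` is the array length, i.e. the number of UTF-8 code units.
size_t codeUnits(string s) { return s.length; }

/// Hypothetical helper: `walkLength` iterates with the range primitives, so it counts decoded code points.
size_t codePoints(string s) { return s.walkLength; }

/// Hypothetical helper: `front` decodes the first code point, even when it spans several chars.
dchar firstElement(string s) { return s.front; }

void main()
{
    import std.stdio : writeln;
    writeln(codeUnits("été"), " ", codePoints("été"), " ", firstElement("été")); // 5 3 é
}
```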
February 22, 2022 Re: Odd behaviour of std.range
Posted in reply to frame

On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
> What am I missing here? Is this some UTF conversion issue?

Welcome to the world of auto decoding, D's million dollar mistake.
February 22, 2022 Re: Odd behaviour of std.range
Posted in reply to Adam D Ruppe

On Tuesday, 22 February 2022 at 12:53:03 UTC, Adam D Ruppe wrote:
> On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
>> What am I missing here? Is this some UTF conversion issue?

Ah, ok. It directly attaches

> It was a naive design flaw that nobody has the courage to fix.

> ... or ask why you're doing range operations on a string in the first place and see if the behavior actually kinda makes sense for you.

Because I needed a similar function to
February 22, 2022 Re: Odd behaviour of std.range
Posted in reply to bauss

On Tuesday, 22 February 2022 at 13:25:16 UTC, bauss wrote:
> Welcome to the world of auto decoding, D's million dollar mistake.

Well, I think it's ok for strings, but it shouldn't do it for simple arrays where it's intentional that I want to process the character and not a UTF-8 codepoint.

Thank you all.
February 22, 2022 Re: Odd behaviour of std.range
Posted in reply to frame

On Tue, Feb 22, 2022 at 05:25:18PM +0000, frame via Digitalmars-d-learn wrote:
> On Tuesday, 22 February 2022 at 13:25:16 UTC, bauss wrote:
> > Welcome to the world of auto decoding, D's million dollar mistake.
>
> Well, I think it's ok for strings but it shouldn't do it for simple arrays
[...]

In D, a string *is* an array. `string` is just an alias for `immutable(char)[]`.

T

-- 
Gone Chopin. Bach in a minuet.
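The alias point can be checked at compile time. This sketch, assuming `std.traits.isNarrowString` and `std.range.primitives.ElementType` as found in current Phobos, also shows that mutability plays no role: `char[]` is auto decoded just like `string`, while `ubyte[]` and `dchar[]` are not.

```d
import std.range.primitives : ElementType;
import std.traits : isNarrowString;

// `string` is only an alias, so anything true of immutable(char)[] is true of string.
static assert(is(string == immutable(char)[]));

// Phobos special-cases every array of char or wchar ("narrow strings"),
// regardless of mutability, so char[] is auto decoded just like string.
static assert(isNarrowString!(char[]) && isNarrowString!string);
static assert(is(ElementType!(char[]) == dchar));

// Arrays of other element types, including raw bytes, are left alone.
static assert(!isNarrowString!(ubyte[]));
static assert(is(ElementType!(ubyte[]) == ubyte));

// dchar[] is not narrow: each element already is a whole code point.
static assert(!isNarrowString!(dchar[]));
static assert(is(ElementType!(dchar[]) == dchar));

void main() {}
```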
February 22, 2022 Re: Odd behaviour of std.range
Posted in reply to H. S. Teoh

On Tuesday, 22 February 2022 at 17:33:18 UTC, H. S. Teoh wrote:
> On Tue, Feb 22, 2022 at 05:25:18PM +0000, frame via Digitalmars-d-learn wrote:
>> On Tuesday, 22 February 2022 at 13:25:16 UTC, bauss wrote:
>>> Welcome to the world of auto decoding, D's million dollar mistake.
>>
>> Well, I think it's ok for strings but it shouldn't do it for simple arrays
>
> In D, a string is an array.

I know, but it's also a type that says "this data belongs together, characters will not change, it's finalized", and it makes sense that it can contain combined bytes for a code point.
February 22, 2022 Re: Odd behaviour of std.range
Posted in reply to frame

On 2/22/22 09:25, frame wrote:
> Well, I think it's ok for strings but it shouldn't do it for simple
> arrays

string is a simple array as well, just with immutable(char) as elements. It is just an alias:

`alias string = immutable(char)[];`

> where it's intentional that I want to process the character and
> not a UTF-8 codepoint.

I understand how auto decoding can be bad, but I doubt you need to process a char. char is a UTF-8 code unit, likely one of multiple bytes that represent a Unicode character: an information-encoding byte, not the information itself. That code unit includes encoding bits that tell the decoder whether it is the first code unit of a character or a continuation code unit. Not many programmers will ever need to write code to decode UTF-8.

Ali
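To illustrate the encoding bits Ali describes, here is a hedged sketch: in UTF-8, a continuation code unit always has the bit pattern `10xxxxxx`, and every other code unit starts a new code point. `isContinuation` and `countCodePoints` are hypothetical helpers, and the counting assumes valid UTF-8 (it does not validate); `std.utf.stride` is the Phobos function that reports how many code units a code point occupies.

```d
import std.utf : stride;

/// Hypothetical helper: continuation code units match 10xxxxxx; all others start a code point.
bool isContinuation(char c) { return (c & 0xC0) == 0x80; }

/// Hypothetical helper: counts code points by counting code units that are not continuations.
/// Assumes `s` is valid UTF-8; this is an illustration, not a validating decoder.
size_t countCodePoints(const(char)[] s)
{
    size_t n;
    foreach (c; s)
        if (!isContinuation(c))
            ++n;
    return n;
}

void main()
{
    import std.stdio : writefln;
    string s = "été";
    // Print each code unit and whether it starts a code point or continues one.
    foreach (i, c; s)
        writefln("%d: 0x%02X %s", i, cast(ubyte) c, isContinuation(c) ? "continuation" : "lead");
    // stride reports how many code units the code point starting at index 0 occupies.
    writefln("stride at 0: %d, code points: %d", stride(s, 0), countCodePoints(s));
}
```

For "été" this reports the bytes 0xC3 0xA9 0x74 0xC3 0xA9, where 0xA9 is the only continuation value, giving a stride of 2 at index 0 and 3 code points in total.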