Odd behaviour of std.range
Thread of Feb 22, 2022. Participants: frame, Adam D Ruppe, Paul Backus, bauss, H. S. Teoh, Ali Çehreli
February 22, 2022

What am I missing here? Is this some UTF conversion issue?

import std.range;

string a;
char[] b;

pragma(msg, typeof(a.take(1).front)); // dchar
pragma(msg, typeof(b.take(1).front)); // dchar
February 22, 2022
On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
> What am I missing here? Is this some UTF conversion issue?

`front` is a Phobos function. Phobos treats arrays of char specially, unlike all other arrays.

It was a naive design flaw that nobody has the courage to fix.

Either just don't use Phobos on strings (the language itself treats them sanely; you can foreach over them, etc.), call .representation on them before passing them to any range function, or ask why you're doing range operations on a string in the first place and see whether the behavior actually kind of makes sense for you.
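
The options above can be sketched as follows. This is only an illustration, assuming a recent Phobos; the sample string is made up, and `byCodeUnit` from std.utf is an extra option not mentioned above.

```d
import std.range : take;
import std.string : representation;
import std.utf : byCodeUnit;

void main()
{
    string a = "h\u00E9llo";   // sample data, chosen for this sketch
    char[] b = a.dup;

    // Default Phobos behaviour: front on a char array decodes to dchar.
    static assert(is(typeof(a.take(1).front) == dchar));
    static assert(is(typeof(b.take(1).front) == dchar));

    // .representation views the same data as unsigned bytes, so nothing is decoded.
    static assert(is(typeof(a.representation.take(1).front) == immutable(ubyte)));
    static assert(is(typeof(b.representation.take(1).front) == ubyte));

    // byCodeUnit keeps char as the element type but also skips decoding.
    static assert(is(typeof(a.byCodeUnit.take(1).front) == immutable(char)));

    // The language itself does not decode: foreach yields one code unit at a time.
    size_t units;
    foreach (c; a)
    {
        static assert(is(typeof(c) == immutable(char)));
        ++units;
    }
    assert(units == a.length);
}
```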
February 22, 2022

On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
> What am I missing here? Is this some UTF conversion issue?
>
> string a;
> char[] b;
>
> pragma(msg, typeof(a.take(1).front)); // dchar
> pragma(msg, typeof(b.take(1).front)); // dchar

This is a feature of the D standard library known as "auto decoding":

> as a convenience, when iterating over a string using the range functions, each element of strings and wstrings is converted into a UTF-32 code-point as each item. This practice, known as auto decoding, means that
>
> static assert(is(typeof(utf8.front) == dchar));

Source: https://tour.dlang.org/tour/en/gems/unicode
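
A small sketch of what auto decoding means in practice, assuming a recent Phobos. The variable name `utf8` follows the quoted passage; the sample text is made up.

```d
import std.range : ElementEncodingType, ElementType, walkLength;

void main()
{
    string utf8 = "\u00E9";   // one code point (é), stored as two UTF-8 code units

    // What the range primitives report versus what the array actually stores.
    static assert(is(ElementType!string == dchar));
    static assert(is(ElementType!wstring == dchar));
    static assert(is(ElementEncodingType!string == immutable(char)));

    assert(utf8.length == 2);       // array length counts code units
    assert(utf8.walkLength == 1);   // range iteration counts decoded code points
}
```

The mismatch between `length` and `walkLength` is the visible side effect of the decoding step.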

February 22, 2022

On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
> What am I missing here? Is this some UTF conversion issue?
>
> string a;
> char[] b;
>
> pragma(msg, typeof(a.take(1).front)); // dchar
> pragma(msg, typeof(b.take(1).front)); // dchar

Welcome to the world of auto decoding, D's million dollar mistake.

February 22, 2022

On Tuesday, 22 February 2022 at 12:53:03 UTC, Adam D Ruppe wrote:
> On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
> > What am I missing here? Is this some UTF conversion issue?
>
> `front` is a Phobos function. Phobos treats arrays of char specially, unlike all other arrays.

Ah, ok. It directly attaches front to the string, regardless of the function. That is the problem.

> It was a naive design flaw that nobody has the courage to fix.

> ... or ask why you're doing range operations on a string in the first place and see if the behavior actually kinda makes sense for you.

Because I needed a function similar to tail that takes care of the length. Even though it's trivial to implement myself, I thought it would be better to use a function that already exists.
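
For illustration, a hand-rolled version could look like the sketch below. `lastN` is a hypothetical name, not a Phobos function. It slices by array element, so on a char array it counts code units and may cut through a multi-byte sequence.

```d
import std.algorithm.comparison : min;

/// Hypothetical helper: returns the last `n` elements of `arr`,
/// or the whole array when it has fewer than `n` elements.
T[] lastN(T)(T[] arr, size_t n)
{
    return arr[$ - min(n, arr.length) .. $];
}

void main()
{
    string a = "hello";
    char[] b = a.dup;

    assert(lastN(a, 2) == "lo");
    assert(lastN(b, 1) == "o");
    assert(lastN(a, 10) == "hello");   // n is clamped to the array length

    // No decoding is involved: the result keeps the array's own type.
    static assert(is(typeof(lastN(a, 1)) == string));
    static assert(is(typeof(lastN(b, 1)) == char[]));
}
```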

February 22, 2022

On Tuesday, 22 February 2022 at 13:25:16 UTC, bauss wrote:
> Welcome to the world of auto decoding, D's million dollar mistake.

Well, I think it's ok for strings but it shouldn't do it for simple arrays where it's intentional that I want to process the character and not a UTF-8 codepoint.

Thank you all.

February 22, 2022
On Tue, Feb 22, 2022 at 05:25:18PM +0000, frame via Digitalmars-d-learn wrote:
> On Tuesday, 22 February 2022 at 13:25:16 UTC, bauss wrote:
> 
> > Welcome to the world of auto decoding, D's million dollar mistake.
> 
> Well, I think it's ok for strings but it shouldn't do it for simple arrays
[...]

In D, a string *is* an array. `string` is just an alias for
`immutable(char)[]`.


T

-- 
Gone Chopin. Bach in a minuet.
February 22, 2022

On Tuesday, 22 February 2022 at 17:33:18 UTC, H. S. Teoh wrote:
> On Tue, Feb 22, 2022 at 05:25:18PM +0000, frame via Digitalmars-d-learn wrote:
> > On Tuesday, 22 February 2022 at 13:25:16 UTC, bauss wrote:
> > > Welcome to the world of auto decoding, D's million dollar mistake.
> >
> > Well, I think it's ok for strings but it shouldn't do it for simple arrays
> [...]
>
> In D, a string *is* an array. `string` is just an alias for
> `immutable(char)[]`.

I know, but it's also a type that says "this data belongs together, characters will not change, it's finalized" and it makes sense that it can contain combined bytes for a code point. char[] is just an array to work with. It should be seen as a collection of single characters. If you want auto decoding, use a string instead.
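
For reference, Phobos currently draws no such distinction between the two types. A quick compile-time check, assuming a recent Phobos:

```d
import std.range : ElementType;
import std.traits : isNarrowString;

// string is only an alias, so both spellings name the same type.
static assert(is(string == immutable(char)[]));

// Phobos classifies every char array as a narrow string, whatever its qualifier...
static assert(isNarrowString!string);
static assert(isNarrowString!(char[]));
static assert(isNarrowString!(const(char)[]));

// ...and therefore auto-decodes all of them to dchar.
static assert(is(ElementType!string == dchar));
static assert(is(ElementType!(char[]) == dchar));

// Arrays of other element types are left alone.
static assert(is(ElementType!(ubyte[]) == ubyte));

void main() {}
```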

February 22, 2022
On 2/22/22 09:25, frame wrote:

> Well, I think it's ok for strings but it shouldn't do it for simple
> arrays

string is a simple array as well, just with immutable(char) as elements. It is just an alias:

  alias string = immutable(char)[];

> where it's intentional that I want to process the character and
> not a UTF-8 codepoint.

I understand how auto decoding can be bad, but I doubt you need to process a char. A char is a UTF-8 code unit, likely one of multiple bytes that together represent a Unicode character; it is a byte of encoded information, not the information itself. That code unit includes encoding bits that tell the decoder whether it is the first code unit of a sequence or a continuation code unit. Not many programmers will ever need to write code to decode UTF-8.
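
To make the encoding bits concrete, here is a minimal sketch. `isContinuation` is a hypothetical helper written for this example; `stride` and `decode` are from std.utf.

```d
import std.utf : decode, stride;

/// Hypothetical helper: true if `c` is a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool isContinuation(char c)
{
    return (c & 0xC0) == 0x80;
}

void main()
{
    string s = "\u20AC";   // EURO SIGN, encoded in UTF-8 as E2 82 AC

    assert(s.length == 3);
    assert(!isContinuation(s[0]));   // lead byte 0xE2 announces a 3-byte sequence
    assert(isContinuation(s[1]));
    assert(isContinuation(s[2]));

    assert(stride(s, 0) == 3);       // length of the sequence starting at index 0

    size_t index = 0;
    dchar cp = decode(s, index);     // decodes one code point and advances index
    assert(cp == 0x20AC);
    assert(index == 3);
}
```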

Ali