evenChunks on a string - hasLength constraint fails?

amarillion
March 14, 2023

Hey

I'm trying to split a string down the middle. I thought the function std.range.evenChunks would be perfect for this:

#!/usr/bin/env -S rdmd -I..

import std.range;

void main() {
	string line = "abcdef";
	auto parts = evenChunks(line, 2);
	assert(parts == ["abc", "def"]);
}

But I'm getting a compiler error:

/usr/include/dmd/phobos/std/range/package.d(8569):        Candidate is: `evenChunks(Source)(Source source, size_t chunkCount)`
  with `Source = string`
  whose parameters have the following constraints:
  `~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
`    isForwardRange!Source
  > hasLength!Source
`  `~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
./test.d(7):        All possible candidates are marked as `deprecated` or `@disable`

I'm trying to understand why this doesn't work. I don't really understand the error. If I interpret this correctly, it's missing a length attribute on a string, but shouldn't length be there?

Paul Backus
March 14, 2023

On Tuesday, 14 March 2023 at 08:21:00 UTC, amarillion wrote:

> I'm trying to understand why this doesn't work. I don't really understand the error. If I interpret this correctly, it's missing a length attribute on a string, but shouldn't length be there?

By default, D's standard library treats a string as a range of Unicode code points (i.e., a range of dchars), encoded in UTF-8. Because UTF-8 is a variable-length encoding, it's impossible to know how many code points there are in a string without iterating it--which means that, as far as the standard library is concerned, string does not have a valid .length property.
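The difference between the built-in array length and what the range primitives report can be checked directly. The following is a minimal sketch, assuming a default Phobos build with auto decoding enabled; the sample string is only an illustration:

```d
import std.range.primitives : ElementType, hasLength, walkLength;

void main()
{
    // The range primitives see a string as a range of dchar (code points)...
    static assert(is(ElementType!string == dchar));
    // ...so the hasLength constraint that evenChunks checks is not satisfied.
    static assert(!hasLength!string);

    string s = "a\u00F1b";       // U+00F1 takes two bytes in UTF-8
    assert(s.length == 4);       // built-in array length: UTF-8 code units
    assert(s.walkLength == 3);   // code points, found by decoding the whole string
}
```

The built-in .length still exists on the array; it is only the range-level hasLength trait that reports false for string.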

This behavior is known as "auto decoding", and is described in more detail in this article by Jack Stouffer:

https://jackstouffer.com/blog/d_auto_decoding_and_you.html

If you do not want the standard library to treat your string as an array of code points, you must use a wrapper like std.utf.byCodeUnit (to get a range of chars) or std.string.representation (to get a range of ubytes). For example:

auto parts = evenChunks(line.byCodeUnit, 2);
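As a complete program, that one-line change might look like the sketch below. It assumes the input is known to be ASCII, and it compares the chunks with std.algorithm.comparison.equal, because the chunks are ranges of char rather than strings and so cannot be compared against a string[] with ==:

```d
import std.algorithm.comparison : equal;
import std.range : evenChunks;
import std.utf : byCodeUnit;

void main()
{
    string line = "abcdef";
    // byCodeUnit exposes the string as a range of char that has a length,
    // so the hasLength constraint is satisfied.
    auto parts = evenChunks(line.byCodeUnit, 2);

    // Compare each chunk element by element.
    assert(parts.front.equal("abc"));
    parts.popFront();
    assert(parts.front.equal("def"));
    parts.popFront();
    assert(parts.empty);
}
```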

Of course, if you do this, there is a risk that you will split a code point in half and end up with invalid Unicode. If your program needs to handle Unicode input, you would be better off finding a different solution—for example, you could use std.range.primitives.walkLength to compute the midpoint of the range by hand, and split it using std.range.chunks:

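size_t length = line.walkLength;              // counts code points by decoding the string
auto parts = chunks(line, (length + 1) / 2);  // round up so an odd length still gives two parts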
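A runnable version of the walkLength approach, tried on non-ASCII input, could look like the following sketch. The helper name splitMiddle is hypothetical (it is not a Phobos function), and the chunk size is rounded up so that an odd number of code points still produces two chunks rather than three:

```d
import std.algorithm.comparison : equal;
import std.range : chunks;
import std.range.primitives : walkLength;

// splitMiddle is a hypothetical helper name, not part of Phobos.
auto splitMiddle(string s)
{
    immutable n = s.walkLength;   // number of code points (decodes the string)
    // Round up so that an odd count still yields two chunks.
    // Assumes s is non-empty: chunks requires a non-zero chunk size.
    return chunks(s, (n + 1) / 2);
}

void main()
{
    auto parts = splitMiddle("h\u00E9llo!");   // 6 code points, 7 bytes
    assert(parts.front.equal("h\u00E9l"));     // the two-byte character stays intact
    parts.popFront();
    assert(parts.front.equal("lo!"));
    parts.popFront();
    assert(parts.empty);
}
```

This splits on code point boundaries only; characters built from several code points, such as a base letter followed by a combining accent, could still be separated.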

amarillion
March 16, 2023

On Tuesday, 14 March 2023 at 18:41:50 UTC, Paul Backus wrote:

> On Tuesday, 14 March 2023 at 08:21:00 UTC, amarillion wrote:
>
>> I'm trying to understand why this doesn't work. I don't really understand the error. If I interpret this correctly, it's missing a length attribute on a string, but shouldn't length be there?
>
> By default, D's standard library treats a string as a range of Unicode code points (i.e., a range of dchars), encoded in UTF-8. Because UTF-8 is a variable-length encoding, it's impossible to know how many code points there are in a string without iterating it--which means that, as far as the standard library is concerned, string does not have a valid .length property.

Thanks for the clear explanation! I was already aware that you could iterate by codepoint with foreach(dchar c; s), but it just didn't cross my mind that the same concept was playing a role here. I guess it's just one of those things that you just have to know.

regards,
Amarillion