fixedstring: a @safe, @nogc string type (page 2)

On Tuesday, 11 January 2022 at 17:55:28 UTC, H. S. Teoh wrote:
> Generally, I'd advise not conflating your containers with ranges over your containers: I'd make .opSlice return a traditional D slice (i.e., const(char)[]) instead of a FixedString, and just require writing `[]` when you need to iterate over the string as a range:
>
>     FixedString!64 mystr;
>     foreach (ch; mystr[]) { // <-- iterates over const(char)[]
>         ...
>     }
>
> This way, no redundant copying of data is done during iteration.

It already does this. In D2, `[]` is handled by a zero-argument `opIndex` overload, not by `opSlice`. [1] FixedString has such an overload [2], and it does, in fact, return a slice.

[1] https://dlang.org/spec/operatoroverloading.html#slice
[2] https://github.com/Moth-Tolias/fixedstring/blob/v1.0.0/source/fixedstring.d#L105
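
For readers unfamiliar with that lowering, here is a minimal sketch of a zero-argument `opIndex` that returns a slice. `Buf` is a hypothetical stand-in written for this illustration; it is not FixedString's actual source, and the field names are assumptions.

```d
// Hypothetical fixed-capacity buffer, used only to illustrate `buf[]`.
struct Buf(size_t maxSize)
{
    private char[maxSize] data;
    private size_t len;

    this(in char[] s) @safe @nogc nothrow pure
    {
        assert(s.length <= maxSize, "input exceeds capacity");
        data[0 .. s.length] = s[];
        len = s.length;
    }

    // `buf[]` is rewritten to `buf.opIndex()`. Returning a slice gives the
    // caller a view of the stored characters instead of a copy of the struct.
    const(char)[] opIndex() const return @safe @nogc nothrow pure
    {
        return data[0 .. len];
    }
}

void main() @safe
{
    auto buf = Buf!8("abc");

    size_t count;
    foreach (ch; buf[]) // iterates over const(char)[]
        ++count;

    assert(count == 3);
    assert(buf[] == "abc");
}
```

The `return` attribute on `opIndex` tells the compiler that the result refers to memory owned by `this`, which matters once the caller is `@safe`.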

January 12, 2022

Re: fixedstring: a @safe, @nogc string type

Posted by Moth
in reply to WebFreak001


On Tuesday, 11 January 2022 at 12:22:36 UTC, WebFreak001 wrote:

[snip]

> You can relatively easily find out how many bytes a string takes up with std.utf. You can also iterate by code points or graphemes there if you want to translate some kind of character index to a byte position.
>
> HOWEVER, it's not clear what a character is. Sure, for the cases posted here it's no problem, but when it comes to languages based on combining glyphs together to form new glyphs, it's no longer clear what a character is. There are graphemes (grapheme clusters), which are probably the closest to what everybody would think a character is, but IIRC there are edge cases that a programmer wouldn't expect, like adding a character not increasing the character count of the string because it merges with the last grapheme. Additionally, there is a performance impact when using graphemes over simpler things like code points, which fit 98% of use cases with strings. Code points in D map 1:1 to dchar and take up to 2 wchars or up to 4 chars. You can use std.utf to compute the byte length of a code point in a given string.
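
A short sketch of the distinctions described above, using Phobos. Note that `byGrapheme` lives in `std.uni` rather than `std.utf`. The `byteOffset` helper is hypothetical and written only for this example.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : codeLength, stride;

// Hypothetical helper: byte offset of the code point at index `cpIndex`.
size_t byteOffset(in char[] str, size_t cpIndex)
{
    size_t offset;
    foreach (_; 0 .. cpIndex)
        offset += stride(str, offset); // bytes used by the code point at `offset`
    return offset;
}

void main()
{
    // Escapes keep the literal precomposed whatever the editor does to it.
    string s = "\u00E1\u00E9\u00ED\u00F3\u00FA"; // "áéíóú"

    assert(s.length == 10);        // UTF-8 code units, i.e. bytes
    assert(s.walkLength == 5);     // code points (strings decode to dchar)
    assert(byteOffset(s, 2) == 4); // third code point starts at byte 4

    // Code units needed to encode a single code point.
    assert(codeLength!char('\u00E1') == 2);
    assert(codeLength!wchar('\u00E1') == 1);
    assert(codeLength!char('\U0001F600') == 4);
    assert(codeLength!wchar('\U0001F600') == 2);

    // Appending a combining mark adds a code point but not a grapheme.
    string e = "e";
    string combined = e ~ "\u0301"; // 'e' followed by a combining acute accent
    assert(combined.walkLength == 2);
    assert(e.byGrapheme.walkLength == 1);
    assert(combined.byGrapheme.walkLength == 1);
}
```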

aha, i think i might have miscommunicated here - i was talking about an error i thought i was having where a fixedstring of "áéíóú" wasn't equal to a string literal of the same, but as it turned out i was misreading the error message [i had been trying to assign a literal larger than the fixedstring could take]. to tell the truth, unicode awareness is... not something i really want to mess with right now, lol. it would be nice to have the option at some point in the future though.

> I would rather suggest you support FixedString with types other than char (wchar, dchar; heck, users could even use any arbitrary type and use this as an array class). For languages that commonly use more than 1 byte per code point, or for interop with Win32 Unicode APIs, JavaScript strings, C# strings, UTF-16 files in general, etc., programmers might opt to use FixedString with wchar.
>
> With D's templates that should be quite easy to do (add a template parameter to the struct, like `struct FixedString(size_t maxSize, CharT = char)`, and replace all usage of char in your code with CharT).
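
A minimal sketch of that suggestion, assuming the struct signature given in the quote. The members below are illustrative assumptions and do not reflect what the repo actually contains.

```d
// Sketch only: signature taken from the suggestion above, body assumed.
struct FixedString(size_t maxSize, CharT = char)
{
    private CharT[maxSize] data;
    private size_t len;

    this(in CharT[] s)
    {
        assert(s.length <= maxSize, "input exceeds capacity");
        data[0 .. s.length] = s[];
        len = s.length;
    }

    size_t length() const
    {
        return len;
    }

    // View of the stored code units, typed by the chosen character type.
    const(CharT)[] opIndex() const return
    {
        return data[0 .. len];
    }
}

void main()
{
    auto a = FixedString!8("hello");            // CharT defaults to char
    auto w = FixedString!(8, wchar)("hello"w);  // UTF-16 code units
    auto d = FixedString!(8, dchar)("hello"d);  // UTF-32 code units

    static assert(is(typeof(a[]) == const(char)[]));
    static assert(is(typeof(w[]) == const(wchar)[]));
    static assert(is(typeof(d[]) == const(dchar)[]));

    assert(a.length == 5 && w.length == 5 && d.length == 5);
    assert(w[] == "hello"w);
}
```

Here `maxSize` counts code units of `CharT`, not bytes and not code points, which is one reason the change is more than a plain search and replace.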

i've pushed an update to the repo for this! =] it was a bit more complicated than a simple replace all, but not too hard.

On Tuesday, 11 January 2022 at 17:55:28 UTC, H. S. Teoh wrote:

[snip]

> One minor usability issue I found just glancing over the code: many of your methods take `char[]` as argument. Generally, you want `const(char)[]` instead, so that it will work with both `char[]` and `immutable(char)[]`. No reason why you can't copy some immutable chars into a FixedString, for example.

they should all already be `in char[]`? i've added a test to confirm it works with both `char[]` and `immutable(char)[]` and it compiles fine.
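
A small sketch of why an `in char[]` parameter already covers both cases: `in` makes the parameter `const`, and both mutable and immutable arrays convert implicitly to `const(char)[]`. The `countSpaces` function is a hypothetical example, not part of fixedstring.

```d
// Hypothetical function with an `in` parameter, i.e. a const view of the input.
size_t countSpaces(in char[] s) @safe @nogc nothrow pure
{
    size_t n;
    foreach (c; s)
        if (c == ' ')
            ++n;
    return n;
}

void main() @safe
{
    char[] mutableText = "a b".dup;            // mutable copy
    immutable(char)[] immutableText = "a b c"; // same type as `string`
    const(char)[] constText = mutableText;

    // All three qualifiers are accepted by the same signature.
    assert(countSpaces(mutableText) == 1);
    assert(countSpaces(immutableText) == 2);
    assert(countSpaces(constText) == 1);
}
```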

[snip]

> Another issue is the way concatenation is implemented. Since FixedStrings have compile-time size, this potentially means every time you concatenate a string in your code you get another instantiation of FixedString. This can lead to a LOT of template bloat if you're not careful, which may quickly outweigh any benefits you may have gained from not using the built-in strings.

oh dear, that doesn't sound good. i hadn't considered that at all. i'm not sure how to even begin going about fixing that...
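
To make the concern concrete, here is a sketch that assumes concatenation returns a type whose capacity is the sum of the operands' capacities. `FixedStr` is a hypothetical stand-in; the real library's `~` may be implemented differently.

```d
// Hypothetical stand-in; the capacity rule for `~` is an assumption.
struct FixedStr(size_t maxSize)
{
    char[maxSize] data;
    size_t len;

    this(in char[] s)
    {
        assert(s.length <= maxSize, "input exceeds capacity");
        data[0 .. s.length] = s[];
        len = s.length;
    }

    // The result type depends on both capacities, so every distinct pair of
    // sizes used with `~` makes the compiler generate another instantiation.
    auto opBinary(string op, size_t n)(in FixedStr!n rhs) const
        if (op == "~")
    {
        FixedStr!(maxSize + n) result;
        result.data[0 .. len] = data[0 .. len];
        result.data[len .. len + rhs.len] = rhs.data[0 .. rhs.len];
        result.len = len + rhs.len;
        return result;
    }
}

void main()
{
    auto a = FixedStr!4("ab");
    auto b = FixedStr!8("cd");

    auto ab = a ~ b;   // instantiates FixedStr!12
    auto aab = a ~ ab; // instantiates FixedStr!16

    static assert(is(typeof(ab) == FixedStr!12));
    static assert(is(typeof(aab) == FixedStr!16));
    static assert(!is(typeof(ab) == typeof(aab)));

    assert(ab.data[0 .. ab.len] == "abcd");
    assert(aab.data[0 .. aab.len] == "ababcd");
}
```

Two concatenations already yield four distinct struct types here, each with its own copy of every member function that gets used.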

On Wed, Jan 12, 2022 at 07:55:41PM +0000, Moth via Digitalmars-d-announce wrote:
> On Tuesday, 11 January 2022 at 17:55:28 UTC, H. S. Teoh wrote:
[...]
> > One minor usability issue I found just glancing over the code: many of your methods take char[] as argument. Generally, you want const(char)[] instead, so that it will work with both char[] and immutable(char)[]. No reason why you can't copy some immutable chars into a FixedString, for example.
>
> they should all already be `in char[]`? i've added a test to confirm it works with both `char[]` and `immutable(char)[]` and it compiles fine.
[...]

Oh you're right! I totally missed that. Sorry, my bad.

T

-- 
Talk is cheap. Whining is actually free. -- Lars Wirzenius
