Why is there no lazy `format`?

Oct 20, 2020

burt

Oct 20, 2020

rikki cattermole

Hello,

I noticed that there is the function `formattedWrite`, which outputs its resulting strings to an output range, as follows:

```
unittest
{
    import std.array : appender;
    import std.format : formattedWrite;

    auto output = appender!string();
    output.formattedWrite!"%s %s"(1, 2);
    assert(output.data == "1 2");
}
```

But why is there no formatting function that returns a lazy input range? That way, string formatting with (barely) any allocation would be possible, in the following way:

```
@nogc unittest
{
    auto range = formatRange!"%s %s"(42, 43);
    assert(range.front == "42");
    range.popFront();
    assert(range.front == " ");
    range.popFront();
    assert(range.front == "43");
    range.popFront();
    assert(range.empty);
}
```

The range returned by `formatRange` could have an internal buffer of maybe 16 characters that stores small strings, e.g. for small integers. It would also allow chaining with other range algorithms: you would call `.joiner()` on it to get an input range of chars.

Is this something worth including in the standard library (presumably in std.format)?

(The same may also be possible for `std.conv.text` but I did not look into this.)

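You are describing the purpose of an output range.

I.e.

```
import std.format : formattedWrite;
import std.stdio : stdout;

void test()
{
    InPlaceAppender appender;
    appender.formattedWrite!"%d: %d"(123, 456);
    stdout.rawWrite(appender.get);
}

// Fixed-capacity output range: formatted text lands in an in-place buffer
// instead of on the GC heap.
struct InPlaceAppender
{
    private
    {
        char[ushort.max] buffer;
        size_t used;
    }

    // Copying would duplicate the whole buffer, so forbid it.
    @disable this(this);

    void put(char c)
    {
        assert(used < buffer.length);
        buffer[used++] = c;
    }

    // Slice of the buffer written so far.
    scope char[] get()
    {
        return buffer[0 .. used];
    }

    void reset()
    {
        used = 0;
        buffer[] = '\0';
    }
}
```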

On Tuesday, 20 October 2020 at 13:45:20 UTC, rikki cattermole wrote:
> You are describing the purpose of an output range.
>
> I.e.
>
> [...]

I see. However, this still feels wrong; after all, we also do not use an output range for algorithms like map:

```
OutputRange output;
[1, 2, 3].map!((x) => x + 1)(output);
```

Mostly because it does not allow chaining like `lazyFormat("%d plus %d is %d", 1, 2, 3).joiner().map!toUpper()`. It still feels inconsistent to me. An input range could achieve the same goals, but it would be much more flexible and pleasing to use.

On 10/20/20 9:28 AM, burt wrote:
> The range returned by `formatRange` could have an internal buffer of maybe 16 characters that stores small strings, e.g. for small integers. It would also allow chaining with other range algorithms: you would call `.joiner()` on it to get an input range of chars.
>
> Is this something worth including in the standard library (presumably in std.format)?
>
> (The same may also be possible for `std.conv.text` but I did not look into this.)

I think it's possible, but also it needs a buffer. Which means it needs to allocate. Even a 16 character buffer might not be enough.

std.format is not designed around tracking an in-progress conversion, so you would have to convert whole things at once. It might not be that desirable.

For example:

formatRange("%s", someLargeArrayOrStruct);

this is going to have to buffer the *whole thing*, and then give you lazy access to the buffer.

In order for this to work, I think you would have to redesign how format works. It's not an easy thing, but could be an interesting way of looking at it.

Note that you can probably mimic this with fibers, but that's really heavy for this task. And you still need to allocate a buffer.

-Steve
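To make the "buffer the whole thing" point concrete, here is a minimal sketch of what an input-range-returning formatter amounts to on top of today's `formattedWrite`. The name `bufferedFormatRange` is hypothetical and not part of Phobos. The sketch assumes the result fits comfortably in a GC-allocated `Appender`; all formatting happens eagerly, and only the consumption of the buffer is lazy.

```d
import std.algorithm : equal, map;
import std.array : appender;
import std.format : formattedWrite;
import std.uni : toUpper;

// Hypothetical helper, not a Phobos function: formats everything up front
// into a GC-backed buffer and returns that buffer. A string is already an
// input range of characters, so range algorithms can be chained onto it.
string bufferedFormatRange(Args...)(string fmt, Args args)
{
    auto buffer = appender!string();
    buffer.formattedWrite(fmt, args); // the whole conversion happens here
    return buffer.data;
}

unittest
{
    // Chaining works, but only the map step is lazy.
    auto upper = bufferedFormatRange("%d plus %d is %d", 1, 2, 3).map!toUpper;
    assert(upper.equal("1 PLUS 2 IS 3"));
}
```

This is essentially `std.format.format` under another name, which is the point: without redesigning the formatter, the input range can only be a view onto an already filled buffer.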

October 20, 2020

Re: Why is there no lazy `format`?

Posted by H. S. Teoh
in reply to Steven Schveighoffer

On Tue, Oct 20, 2020 at 01:10:12PM -0400, Steven Schveighoffer via Digitalmars-d wrote: [...]
> std.format is not designed around tracking an in-progress conversion, so you would have to convert whole things at once. It might not be that desirable.
> 
> For example:
> 
> formatRange("%s", someLargeArrayOrStruct);
> 
> this is going to have to buffer the *whole thing*, and then give you lazy access to the buffer.

Yeah, I think std.format's design isn't really conducive to lazy access. Also, the way the OP wrote the example code isn't really consistent, because it appears to be returning segments of the formatted string rather than characters in the string, i.e., it behaves like `string[]` rather than `string`, which isn't how std.format is designed to work.

If anything, perhaps what's closer to what the OP wants is a lazy version of text(), because there you can actually individually format arguments lazily.  But nonetheless, as Steven said, you still need a buffer of arbitrary size because the .toString of an arbitrary user-defined type can return an arbitrary amount of formatted data.  You also cannot impose @nogc, because .toString methods can potentially be allocating (complex ones almost certainly will).
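A rough sketch of what such a lazy `text()` might look like follows. `LazyText` and `lazyText` are hypothetical names, not Phobos symbols. The sketch assumes each argument is converted with `std.conv.to!string` only when `front` is requested, so it still allocates per segment and cannot be `@nogc`, as noted above.

```d
import std.algorithm : equal, joiner;
import std.conv : to;

// Hypothetical lazy counterpart to std.conv.text: a range of string
// segments, one per argument, each converted only on access.
struct LazyText(Args...)
if (Args.length > 0)
{
    private Args _args;
    private size_t index;

    this(Args args)
    {
        static foreach (i; 0 .. Args.length)
            _args[i] = args[i];
    }

    bool empty() const { return index >= Args.length; }

    void popFront() { ++index; }

    // Converts the current argument on every call; nothing is cached.
    string front()
    {
        static foreach (i; 0 .. Args.length)
        {
            if (index == i)
                return _args[i].to!string;
        }
        assert(0, "front called on an empty LazyText");
    }
}

auto lazyText(Args...)(Args args)
{
    return LazyText!Args(args);
}

unittest
{
    auto segments = lazyText(42, " ", 43);
    assert(segments.front == "42");

    // joiner flattens the segments into a range of characters.
    assert(lazyText(42, " ", 43).joiner.equal("42 43"));
}
```

Each segment is still produced in one piece, so a large struct or array argument is buffered whole, exactly as described for `format`.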

In such scenarios, output ranges are a much better way to control allocations -- the caller specifies the allocation scheme (by passing in an output range that implements the desired allocation scheme).
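As an illustration of the caller choosing the allocation scheme, the sketch below sends the same format call to three different sinks: a GC-backed `Appender`, a caller-owned stack buffer via `std.format.sformat`, and a delegate that only counts characters. The helper `formattedLength` is a hypothetical name introduced here; the Phobos calls are existing ones.

```d
import std.array : appender;
import std.format : formattedWrite, sformat;

// Hypothetical helper: the sink is a delegate that discards the text and
// only counts how many characters the formatter produced.
size_t formattedLength(Args...)(string fmt, Args args)
{
    size_t count;
    auto sink = (const(char)[] chunk) { count += chunk.length; };
    formattedWrite(sink, fmt, args);
    return count;
}

unittest
{
    // Scheme 1: let the GC grow the buffer as needed.
    auto gcBacked = appender!string();
    gcBacked.formattedWrite("%s %s", 42, 43);
    assert(gcBacked.data == "42 43");

    // Scheme 2: caller-owned stack buffer; sformat returns the used slice
    // and fails if the buffer is too small.
    char[16] stack;
    assert(sformat(stack[], "%s %s", 42, 43) == "42 43");

    // Scheme 3: no storage for the text at all, just a count.
    assert(formattedLength("%s %s", 42, 43) == 5);
}
```

The formatting code is identical in all three cases; only the output range differs.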

What *would* be nice is a standard library construct for inverting an output range into an input range. Fibers are one way of doing this. Basically, the pipeline up to the output range runs in its own fiber, which is initially suspended. As data is requested from the input range end of the interface, it context-switches to the output range fiber, which generates data that gets saved into a buffer. At some point that fiber calls Fiber.yield(), and the input range end then starts spooling the generated data to the caller.  Once the buffered data is exhausted, it context-switches to the output range fiber again, and so on.

Note that this does not alleviate the need for buffering, and it's not 100% lazy; what it primarily does is to give a nice input range interface for stuff written into an output range.  I don't expect it will do very well performance-wise either, unless the data generators are designed to cooperate with the inverter -- but in that case, they would have been written to return an input range instead of requiring an output range in the first place. So this construct is really more for convenience than anything.
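Phobos already has a fiber-backed building block that comes close to this: `std.concurrency.Generator`. The following is only a sketch of the inversion idea under some assumptions: `fiberFormatRange` is a hypothetical name, the sink yields one character at a time (so every character costs a context switch), and no extra buffering is done beyond whatever chunk the formatter hands to the sink.

```d
import std.algorithm : equal;
import std.concurrency : Generator, yield;
import std.format : formattedWrite;

// Hypothetical inverter: formattedWrite runs inside the Generator's fiber
// and writes into a delegate sink; the sink yields each character to the
// consumer, which sees an ordinary input range of char.
Generator!char fiberFormatRange(Args...)(string fmt, Args args)
{
    return new Generator!char({
        formattedWrite((const(char)[] chunk) {
            foreach (char c; chunk)
                yield(c); // suspend the fiber until the consumer wants more
        }, fmt, args);
    });
}

unittest
{
    auto chars = fiberFormatRange("%d plus %d is %d", 1, 2, 3);
    assert(chars.equal("1 plus 2 is 3"));
}
```

As the post above says, this is a convenience rather than a performance win: the fiber and its stack are allocated, and arguments with large `toString` output are still produced in whatever chunks the formatter chooses.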

T

-- 
If you like sledding, you must also like hauling the sled.

On Tuesday, 20 October 2020 at 18:03:32 UTC, H. S. Teoh wrote:
> [...]
>
> Yeah, I think std.format's design isn't really conducive to lazy access. Also, the way the OP wrote the example code isn't really consistent, because it appears to be returning segments of the formatted string rather than characters in the string, i.e., it behaves like `string[]` rather than `string`, which isn't how std.format is designed to work.

Well, the idea was that you could call `joiner()` (or `flatten()`, or whatever it is called) to turn it into an input range of chars. But it could also do that directly. I understand now why returning an input range could be problematic though.

> [...]
> What *would* be nice is a standard library construct for inverting an output range into an input range. Fibers are one way of doing this. Basically, the pipeline up to the output range runs in its own fiber, which is initially suspended. As data is requested from the input range end of the interface, it context-switches to the output range fiber, which generates data that gets saved into a buffer. At some point that fiber calls Fiber.yield(), and the input range end then starts spooling the generated data to the caller. Once the buffered data is exhausted, it context-switches to the output range fiber again, and so on.
>
> Note that this does not alleviate the need for buffering, and it's not 100% lazy; what it primarily does is to give a nice input range interface for stuff written into an output range. I don't expect it will do very well performance-wise either, unless the data generators are designed to cooperate with the inverter -- but in that case, they would have been written to return an input range instead of requiring an output range in the first place. So this construct is really more for convenience than anything.

Interesting idea.

Although maybe it doesn't even have to use fibers to work, if you're willing to give up the laziness part:

```
import std.algorithm : map;
import std.array : appender, array;
import std.format : formattedWrite;
import std.range.primitives : isOutputRange;
import std.uni : toUpper;

auto pipeRange(alias fn, O, T...)(O output, T args)
if (isOutputRange!(O, char))
{
    fn(output, args); // runs eagerly; nothing here is lazy
    // Appender is not an input range itself, so hand back what it collected.
    return output.data;
}

auto thing = appender!string()
    .pipeRange!formattedWrite("%d plus %d is %d", 1, 2, 3)
    .map!toUpper()
    .array();
```

Or something like that.
