December 10, 2015
I want to combine block reads with lazy conversion of UTF-8 characters to dchars. The solution I came up with is in the program below. It works fine and has good performance.

My question is whether there is a better way to do this, for example a different way to construct the lazy 'decodeUTF8Range' rather than writing it out by hand like this. There is quite a bit of power in the library and I'm still learning it, so I'm wondering if I overlooked a useful alternative.

--Jon

Program:
-----------

import std.algorithm: each, joiner, map;
import std.conv;
import std.range;
import std.stdio;
import std.traits;
import std.utf: decodeFront;

auto decodeUTF8Range(Range)(Range charSource)
    if (isInputRange!Range && is(Unqual!(ElementType!Range) == char))
{
    static struct Result
    {
        private Range source;
        private dchar next;

        bool empty = false;
        dchar front() @property { return next; }
        void popFront() {
            if (source.empty) {
                empty = true;
                next = dchar.init;
            } else {
                next = source.decodeFront;
            }
        }
    }
    auto r = Result(charSource);
    r.popFront;
    return r;
}

void main(string[] args)
{
    if (args.length != 2) { writeln("Provide one file name."); return; }

    ubyte[1024*1024] rawbuf;
    auto inputStream = args[1].File();
    inputStream
        .byChunk(rawbuf)        // Read in blocks
        .joiner                 // Join the blocks into a single input char range
        .map!(a => to!char(a))  // Convert ubyte to char for decodeFront. Any better ways?
        .decodeUTF8Range        // utf8 to dchar conversion.
        .each;                  // Real work goes here.
    writeln("done");
}
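
For comparison, here is a minimal sketch of one possible alternative to the hand-written range: std.utf.byDchar accepts an input range of char and decodes it lazily. Two assumptions to note: the in-memory `blocks` range is only a stand-in for File.byChunk so the example runs without a file, and byDchar substitutes U+FFFD for invalid sequences by default rather than throwing as decodeFront does, so behaviour on malformed input differs from the program above.

```d
import std.algorithm : equal, joiner, map;
import std.range : chunks;
import std.string : representation;
import std.utf : byDchar;

void main()
{
    // Stand-in for File.byChunk: a range of ubyte blocks. The block size of 4
    // is chosen so that some multi-byte sequences straddle block boundaries.
    auto blocks = "héllo wörld".representation.chunks(4);

    auto decoded = blocks
        .joiner                     // Join the blocks into a single ubyte range
        .map!(b => cast(char) b)    // Reinterpret each ubyte as a UTF-8 code unit
        .byDchar;                   // Lazy UTF-8 to dchar conversion

    assert(decoded.equal("héllo wörld"d));
}
```

Because the blocks are joined before decoding, a code point split across two blocks is still decoded as a single dchar. The plain `cast(char)` is also a lighter-weight way to turn each ubyte into a char than `to!char`.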

December 10, 2015
On Thursday, 10 December 2015 at 00:36:27 UTC, Jon D wrote:
> Question I have is if there is a better way to do this. For example, a different way to construct the lazy 'decodeUTF8Range' rather than writing it out in this fashion.

A further thought: the decodeUTF8Range function is basically constructing a lazy wrapper range around decodeFront, which effectively combines a 'front' and a 'popFront' operation. So perhaps there is a generic way to compose a wrapper for such functions.

>
> auto decodeUTF8Range(Range)(Range charSource)
>     if (isInputRange!Range && is(Unqual!(ElementType!Range) == char))
> {
>     static struct Result
>     {
>         private Range source;
>         private dchar next;
>
>         bool empty = false;
>         dchar front() @property { return next; }
>         void popFront() {
>             if (source.empty) {
>                 empty = true;
>                 next = dchar.init;
>             } else {
>                 next = source.decodeFront;
>             }
>         }
>     }
>     auto r = Result(charSource);
>     r.popFront;
>     return r;
> }
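
The generic-wrapper idea raised in this reply could be sketched roughly as follows. `consumingRange` and `nextCodePoint` are hypothetical names, not Phobos functions. The sketch assumes the wrapped function takes the source by ref, removes one logical element from its front, and returns it.

```d
import std.algorithm : equal;
import std.range.primitives : empty, isInputRange;
import std.utf : byCodeUnit, decodeFront;

// Hypothetical helper: consumes one code point from the front of `r`.
dchar nextCodePoint(R)(ref R r) { return r.decodeFront; }

// Hypothetical generic wrapper (not part of Phobos): turns any function that
// both reads and pops the front of `source` into a lazy input range.
auto consumingRange(alias fun, Range)(Range source)
    if (isInputRange!Range)
{
    alias Elem = typeof(fun(source));   // element type is whatever `fun` returns

    static struct Result
    {
        private Range source;
        private Elem next;

        bool empty = false;
        Elem front() @property { return next; }
        void popFront()
        {
            if (source.empty)
                empty = true;
            else
                next = fun(source);     // read and advance in one step
        }
    }

    auto r = Result(source);
    r.popFront();                       // prime the first element
    return r;
}

void main()
{
    // byCodeUnit gives a range of char without auto-decoding.
    auto decoded = consumingRange!nextCodePoint("héllo".byCodeUnit);
    assert(decoded.equal("héllo"d));
}
```

This has the same structure as decodeUTF8Range, with the call to decodeFront factored out into a template alias parameter so that other read-and-advance functions could be plugged in.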