Appending char[] to char[][] has unexpected results
May 01, 2013
Not sure whether this is a bug, or perhaps I'm misunderstanding something, but it seems like this should work:

import std.stdio;

void main()
{
	char[][] outBuf;
	auto f = File("testData.txt", "r");
	char[] buf;

	writeln("\n**** RAW OUTPUT *****");

	while (f.readln(buf))
	{
		write(buf);
		outBuf ~= buf;
	}

	writeln("\n**** BUFFERED OUTPUT *****");

	foreach (line; outBuf)
	{
		write(line);
	}
}

testData.txt is just a couple of lines of miscellaneous text. The expectation is that the raw output and the buffered output should be exactly the same... but they are not. (If anyone would like to see this for themselves, I put it on GitHub: https://github.com/MrTact/CharBug.)

Changing the types of outBuf and buf to dchar works as expected. Changing outBuf to a string[] and appending buf.idup does as well.
May 01, 2013
On Wednesday, 1 May 2013 at 03:54:23 UTC, Tim Keating wrote:
> Not sure whether this is a bug, or perhaps I'm misunderstanding something, but it seems like this should work:
>
> [...]
>
> Changing the types of outBuf and buf to dchar works as expected. Changing outBuf to a string[] and appending buf.idup does as well.

Just outBuf ~= buf.dup; works, too. Without .dup you're overwriting and appending the same chunk of memory again and again.
From the documentation on File.readln
(<http://dlang.org/phobos/std_stdio#readln>): "Note that reusing the buffer means that the previous contents of it has to be copied if needed."
I'm a bit puzzled as to why it behaves differently with dchar.
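
A minimal sketch of the aliasing described above, without any file I/O. The helper name appendThenOverwrite is made up for this illustration, and the in-place overwrite only stands in for what readln may do to a reused buffer:

```d
import std.stdio;

// Hypothetical helper: appends buf to a list either as a slice or as a copy,
// then overwrites buf in place and returns what the list now holds.
char[] appendThenOverwrite(bool copy)
{
    char[] buf = "first\n".dup;
    char[][] outBuf;

    if (copy)
        outBuf ~= buf.dup;  // independent copy of the current contents
    else
        outBuf ~= buf;      // only a slice (pointer + length) into buf's memory

    buf[0 .. 5] = "FIRST";  // overwrite in place, standing in for the next readln
    return outBuf[0];
}

void main()
{
    assert(appendThenOverwrite(false) == "FIRST\n"); // the stored slice saw the overwrite
    assert(appendThenOverwrite(true) == "first\n");  // the copy did not
    write(appendThenOverwrite(false), appendThenOverwrite(true));
}
```

The stored element and buf refer to the same memory unless one of them is copied, which is why every entry of outBuf can end up showing whatever was read last.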
May 01, 2013
On Wednesday, 1 May 2013 at 04:33:28 UTC, anonymous wrote:
> Just outBuf ~= buf.dup; works, too. Without .dup you're overwriting and appending the same chunk of memory again and again.
> From the documentation on File.readln
> (<http://dlang.org/phobos/std_stdio#readln>): "Note that reusing the buffer means that the previous contents of it has to be copied if needed."
> I'm a bit puzzled as to why it behaves differently with dchar.

Okay, that was obviously the bit I was missing. The dchar situation IS baffling -- if that hadn't worked, I would have been more certain I was simply doing something wrong.
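
For reference, a sketch of the read loop with the .dup fix applied. The function name readLines and the scratch file charbug_demo.txt are assumptions made for this example so that it runs on its own; they are not part of the original program:

```d
import std.stdio;
import std.file : writeFile = write, removeFile = remove, tempDir;
import std.path : buildPath;

// Hypothetical helper: reads every line of a file, copying each one
// before the next readln call is allowed to reuse buf.
char[][] readLines(string path)
{
    char[][] outBuf;
    char[] buf;
    auto f = File(path, "r");

    while (f.readln(buf))
        outBuf ~= buf.dup;  // use buf.idup instead if outBuf is a string[]

    return outBuf;
}

void main()
{
    // Scratch file standing in for testData.txt.
    auto path = buildPath(tempDir(), "charbug_demo.txt");
    writeFile(path, "alpha\nbeta\ngamma\n");
    scope (exit) removeFile(path);

    auto lines = readLines(path);
    assert(lines.length == 3);
    assert(lines[0] == "alpha\n");
    assert(lines[2] == "gamma\n");

    foreach (line; lines)
        write(line);
}
```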
May 01, 2013
On Wednesday, 1 May 2013 at 13:56:48 UTC, Tim Keating wrote:
> On Wednesday, 1 May 2013 at 04:33:28 UTC, anonymous wrote:
>> Just outBuf ~= buf.dup; works, too. Without .dup you're overwriting and appending the same chunk of memory again and again.
>> From the documentation on File.readln
>> (<http://dlang.org/phobos/std_stdio#readln>): "Note that reusing the buffer means that the previous contents of it has to be copied if needed."
>> I'm a bit puzzled as to why it behaves differently with dchar.
>
> Okay, that was obviously the bit I was missing. The dchar situation IS baffling -- if that hadn't worked, I would have been more certain I was simply doing something wrong.

The wchar and dchar versions don't reuse the buffer. Not sure why. Here's the implementation, complete with relevant TODO:

    size_t readln(C)(ref C[] buf, dchar terminator = '\n') if (isSomeChar!C && !is(C == enum))
    {
        static if (is(C == char))
        {
            enforce(_p && _p.handle, "Attempt to read from an unopened file.");
            return readlnImpl(_p.handle, buf, terminator);
        }
        else
        {
            // TODO: optimize this
            string s = readln(terminator);
            if (!s.length) return 0;
            buf.length = 0;
            foreach (wchar c; s)
            {
                buf ~= c;
            }
            return buf.length;
        }
    }

Oh dear!
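
A small sketch of why the else branch above ends up with fresh memory on every call. Setting the length to 0 keeps the pointer, but the runtime will not append over data that other slices may still reference, so the first append moves to a new block. The helper name refillReusesMemory is invented for this illustration:

```d
// Hypothetical helper mirroring the else branch: reset the length, then
// append element by element. Returns true if the original block was reused.
bool refillReusesMemory()
{
    dchar[] buf = "hello"d.dup;
    auto before = buf.ptr;

    buf.length = 0;          // the slice shrinks, the old data stays in the block
    foreach (dchar c; "hi")
        buf ~= c;            // appending here would stomp "hello", so it reallocates

    return buf.ptr is before;
}

void main()
{
    assert(!refillReusesMemory());
}
```

That would also account for the dchar variant of the original program appearing to work: each line lands in its own allocation, so the slices already stored in outBuf are never overwritten.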
May 02, 2013
On Wed, 01 May 2013 13:19:16 -0700, Peter Alexander <peter.alexander.au@gmail.com> wrote:

> The wchar and dchar versions don't reuse the buffer. Not sure why. Here's the implementation, complete with relevant TODO:

Wow, that is really awful.

Needs immediate improvement.  I would say the following code would be at least a bandaid-fix:

            ...
            if (buf.length == buf.capacity) {
                buf.length = 0;
                buf.assumeSafeAppend();
            } else {
                buf.length = 0;
            }
            foreach (wchar c; s)
            {
                buf ~= c;
            }
            ...

Refactor as desired.

-Steve
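
A sketch of what assumeSafeAppend changes in the snippet above, assuming the buffer was allocated by the GC. The helper name refillInPlace is hypothetical, and this shows only the reuse itself, not the capacity check from the proposed fix:

```d
// Hypothetical helper: reset the length, tell the runtime the old contents
// may be overwritten, then append in place.
dchar[] refillInPlace(dchar[] buf, string s)
{
    buf.length = 0;
    buf.assumeSafeAppend();  // promise that no other slice needs the old data
    foreach (dchar c; s)
        buf ~= c;
    return buf;
}

void main()
{
    dchar[] buf = "hello"d.dup;
    dchar[] stale = buf;     // a slice that still refers to the old contents

    auto refilled = refillInPlace(buf, "hi");

    assert(refilled.ptr is stale.ptr);  // same memory block, no reallocation
    assert(refilled == "hi"d);
    assert(stale == "hillo"d);          // the old slice was overwritten in place
}
```

The last assertion is the trade-off: once the buffer really is reused, the wchar and dchar versions would show the same overwriting as the char version, and callers would need .dup there as well.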
May 02, 2013
On Wed, 01 May 2013 06:56:47 -0700, Tim Keating <mrtact@gmail.com> wrote:

> On Wednesday, 1 May 2013 at 04:33:28 UTC, anonymous wrote:
>> Just outBuf ~= buf.dup; works, too. Without .dup you're overwriting and appending the same chunk of memory again and again.
>> From the documentation on File.readln
>> (<http://dlang.org/phobos/std_stdio#readln>): "Note that reusing the buffer means that the previous contents of it has to be copied if needed."
>> I'm a bit puzzled as to why it behaves differently with dchar.
>
> Okay, that was obviously the bit I was missing. The dchar situation IS baffling -- if that hadn't worked, I would have been more certain I was simply doing something wrong.

Note that it could have worked even with UTF-8 (char[]), depending on the input file. Although I agree the library code is not ideal, this is not an excuse ;)

-Steve