Thread overview
RE: std.array.array broken?
  Andrej Mitrovic, Feb 01, 2014
  bearophile, Feb 02, 2014
Re: std.array.array broken?
  deed, Feb 02, 2014
  Peter Alexander, Feb 02, 2014
  Stanislav Blinov, Feb 02, 2014
  Andrej Mitrovic, Feb 02, 2014
February 01, 2014
In reference to this thread:
http://forum.dlang.org/thread/ouyuujnzzvfkvxbfzyak@forum.dlang.org#post-ouyuujnzzvfkvxbfzyak:40forum.dlang.org

Personally I think it was a mistake to provide unsafe APIs by default. If I'd had it my way, I would have introduced:

byLine -> safe, doesn't reuse a buffer
byLineBuffer -> reuses a buffer

That way you get safe-by-default operations for the vast majority of users, and a speedy version for those who need it when they need it.

This is similar to how the new regex APIs encode exactly what they do in their names: matchAll is self-describing, whereas with match() you have to guess whether its default mode is "g" (match all) or not.

It's probably too late to change byLine now. But warnings and notes in the documentation have so far been unfruitful. I doubt many people read the warnings, and some are so ridiculously long that they make you wonder why a function was designed with so many caveats. For a classic example, read the warnings for toUTFz: http://dlang.org/phobos/std_utf.html#.toUTFz

Safe and simple should be the default; leave the "if ((cast(size_t)p & 3) && *p == '\0') return str.ptr" wizardry to a separately named function that trades safety for speed.
February 02, 2014
Andrej Mitrovic:

> Personally I think it was a mistake to provide unsafe APIs by default. If I'd had it my way, I would have introduced:
>
> byLine -> safe, doesn't reuse a buffer
> byLineBuffer -> reuses a buffer

I agree. I proposed something related a long time ago (the original title of this ER was "Safer stdin.byLine()"):
http://d.puremagic.com/issues/show_bug.cgi?id=4474

Bye,
bearophile
February 02, 2014
On 2/1/14, 3:07 PM, Andrej Mitrovic wrote:
> In reference to this thread:
> http://forum.dlang.org/thread/ouyuujnzzvfkvxbfzyak@forum.dlang.org#post-ouyuujnzzvfkvxbfzyak:40forum.dlang.org
>
>
> Personally I think it was a mistake to provide unsafe APIs by default. If
> I'd had it my way, I would have introduced:
>
> byLine -> safe, doesn't reuse a buffer
> byLineBuffer -> reuses a buffer

No. Too much breakage.

Andrei

February 02, 2014
On Sunday, 2 February 2014 at 01:03:25 UTC, Andrei Alexandrescu wrote:
> On 2/1/14, 3:07 PM, Andrej Mitrovic wrote:
>> In reference to this thread:
>> http://forum.dlang.org/thread/ouyuujnzzvfkvxbfzyak@forum.dlang.org#post-ouyuujnzzvfkvxbfzyak:40forum.dlang.org
>>
>>
>> Personally I think it was a mistake to provide unsafe APIs by default. If
>> I'd had it my way, I would have introduced:
>>
>> byLine -> safe, doesn't reuse a buffer
>> byLineBuffer -> reuses a buffer
>
> No. Too much breakage.
>
> Andrei


From the docs it appears that array() will handle the required copying.

std.array.array doc:
---
Returns a newly-allocated dynamic array consisting of a copy of the input range, static array, dynamic array, or class or struct with an opApply function r. Note that narrow strings are handled as a special case in an overload.
---

std.stdio.byLine doc:
---
Returns an input range set up to read from the file handle one line at a time.

The element type for the range will be Char[]. Range primitives may throw StdioException on I/O error.

Note:
Each front will not persist after popFront is called, so the caller must copy its contents (e.g. by calling to!string) if retention is needed.
---
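A minimal sketch of how the two doc excerpts above interact, assuming a recent Phobos and a hypothetical scratch file named `byline_demo.txt` in the temp directory. array() allocates a new array of the range's elements, but byLine's elements are `char[]` slices, so only the slices are copied, not the characters behind them. The byLine note's advice to copy each line therefore still applies; `readAllLines` is a hypothetical helper name used only for illustration.

```d
import std.algorithm : map;
import std.array : array;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;

/// Hypothetical helper: reads every line of `path`, copying each one.
/// byLine may hand out slices of a single reused buffer, so every
/// line is duplicated with idup before array() stores it.
string[] readAllLines(string path)
{
    return File(path).byLine().map!(line => line.idup).array();
}

void main()
{
    // Hypothetical scratch file, used only for this demonstration.
    auto path = buildPath(tempDir(), "byline_demo.txt");
    write(path, "one\ntwo\nthree\n");
    scope (exit) remove(path);

    // Pitfall: array() copies the char[] slices, not the characters they
    // point to, so these entries may alias byLine's reused buffer. What
    // this prints depends on the Phobos version and is not reliable.
    auto aliased = File(path).byLine().array();
    writeln(aliased);

    // Safe: every element owns its own immutable copy.
    writeln(readAllLines(path)); // ["one", "two", "three"]
}
```

Calling `to!string` on each line, as the byLine note suggests, works the same way as `idup` here: both produce an independent `string`.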
February 02, 2014
On Sunday, 2 February 2014 at 01:03:25 UTC, Andrei Alexandrescu wrote:
> On 2/1/14, 3:07 PM, Andrej Mitrovic wrote:
>> In reference to this thread:
>> http://forum.dlang.org/thread/ouyuujnzzvfkvxbfzyak@forum.dlang.org#post-ouyuujnzzvfkvxbfzyak:40forum.dlang.org
>>
>>
>> Personally I think it was a mistake to provide unsafe APIs by default. If
>> I'd had it my way, I would have introduced:
>>
>> byLine -> safe, doesn't reuse a buffer
>> byLineBuffer -> reuses a buffer
>
> No. Too much breakage.
>
> Andrei

Agreed.

I wonder if the problem can be fixed another way:

1. Introduce a new function ("File.lines" perhaps) which is like byLine, but safe, and has an option to re-use a buffer (but isn't default).
2. After a while, remove documentation for byLine, but leave it in Phobos.

This way, newcomers will never see byLine and will get safe behaviour by default with "lines", and existing code will continue to work using the undocumented byLine.
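One way the proposed function could look, sketched under the assumption that buffer reuse is selected at compile time, so each mode can return its own range type. Everything here is hypothetical: the name `lines` comes from the proposal, and the `ReuseBuffer` flag is invented for illustration. Neither is part of Phobos. The sketch simply wraps the existing byLine.

```d
import std.algorithm : map;
import std.array : array;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;
import std.typecons : Flag, No, Yes;

/// Hypothetical opt-in flag for buffer reuse.
alias ReuseBuffer = Flag!"reuseBuffer";

/// Hypothetical `lines` from the proposal; NOT part of Phobos.
/// Default: every element is a freshly allocated immutable string.
/// Yes.reuseBuffer: returns byLine unchanged (fast, but elements alias).
auto lines(ReuseBuffer reuse = No.reuseBuffer)(File f)
{
    static if (reuse)
        return f.byLine();
    else
        return f.byLine().map!(line => line.idup);
}

void main()
{
    // Hypothetical scratch file, used only for this demonstration.
    auto path = buildPath(tempDir(), "lines_demo.txt");
    write(path, "alpha\nbeta\n");
    scope (exit) remove(path);

    // Safe by default: collecting the lines needs no explicit copy.
    string[] kept = File(path).lines().array();
    writeln(kept); // ["alpha", "beta"]

    // Opt-in buffer reuse: fine when each line is processed immediately.
    foreach (line; File(path).lines!(Yes.reuseBuffer)())
        writeln(line.length);
}
```

Making the flag a template parameter rather than a runtime argument keeps the fast path free of any per-line branching and gives each mode an honest element type: `string` for the safe default, `char[]` for the reusing variant.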
February 02, 2014
On Sunday, 2 February 2014 at 01:03:25 UTC, Andrei Alexandrescu wrote:

>> If I'd had it my way, I would have introduced:
>>
>> byLine -> safe, doesn't reuse a buffer
>> byLineBuffer -> reuses a buffer
>
> No. Too much breakage.

How exactly would it break anything? The user code:

- will not stop compiling
- will not stop linking
- will still produce the expected results

The only thing that can "break" is performance: code that already makes an explicit copy would now copy twice. This could be addressed by emitting a message with pragma(msg) that directs users to remove the unnecessary copy. The message could stick around for several releases.
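A rough sketch of the pragma(msg) idea, using a hypothetical stand-in called `copyingLines` that splits an in-memory string instead of reading a file. This is not the real byLine. Making the function a template (note the empty `()` parameter list) matters: the compiler analyzes its body, and prints the message, when user code instantiates it, not only when the library itself is compiled. The notice is unconditional; it cannot detect whether a given call site actually makes a redundant copy.

```d
import std.algorithm : map, splitter;
import std.array : array;
import std.stdio : writeln;

/// Hypothetical stand-in for a byLine that now copies each line;
/// NOT the Phobos implementation.
auto copyingLines()(const(char)[] text)
{
    // Printed by the compiler when this template is instantiated,
    // i.e. while user code is being compiled, not at run time.
    pragma(msg, "Notice: copyingLines now returns copies; an extra .idup per line is no longer needed.");
    return text.splitter('\n').map!(line => line.idup);
}

void main()
{
    string[] kept = copyingLines("one\ntwo").array();

    // The now-redundant copy the notice asks users to remove:
    string[] twice = copyingLines("one\ntwo").map!(l => l.idup).array();

    writeln(kept); // ["one", "two"]
    assert(kept == twice);
}
```

For a real transition, D's `deprecated("message")` attribute is the more conventional tool, but it applies to a whole symbol, whereas the proposal here keeps the symbol and only changes its copying behavior.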
February 02, 2014
On 2/2/14, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> On 2/1/14, 3:07 PM, Andrej Mitrovic wrote:
>> byLine -> safe, doesn't reuse a buffer
>> byLineBuffer -> reuses a buffer
>
> No. Too much breakage.

No, I meant before the function was even introduced. But we could be more careful with future APIs. byLine producing "strange results" is one of the most frequently asked-about topics on IRC and the D forums. Here's a short list of threads I found in a quick search:

std.array.array broken?
http://forum.dlang.org/thread/ouyuujnzzvfkvxbfzyak@forum.dlang.org#post-ouyuujnzzvfkvxbfzyak:40forum.dlang.org

Reading file by line, weird result
http://forum.dlang.org/thread/iklwhshvwqbubzpvfcgu@forum.dlang.org

csvReader byLine
http://forum.dlang.org/thread/mailman.1694.1340281202.24740.digitalmars-d@puremagic.com#post-mailman.1713.1340376472.24740.digitalmars-d:40puremagic.com

persistent byLine
http://forum.dlang.org/thread/ksj7b6$86b$1@digitalmars.com

array(file.byLine()) is a problem
http://forum.dlang.org/thread/bug-6495-3@http.d.puremagic.com%2Fissues%2F

std.stdio.ByLine is not true input range
http://forum.dlang.org/thread/bug-8084-3@http.d.puremagic.com%2Fissues%2F

Read Complete File to Array of Lines
http://forum.dlang.org/thread/aimdwqgymyuajjbsycfj@forum.dlang.org#post-mefabsmxvzwahzdlkvnp:40forum.dlang.org

File.byLine should return dups?
http://forum.dlang.org/thread/hubkh9$1k6$1@digitalmars.com

Safer stdin.byLine()
http://forum.dlang.org/thread/bug-4474-3@http.d.puremagic.com/issues/