March 16, 2014
On Sunday, 16 March 2014 at 17:51:31 UTC, Andrei Alexandrescu
wrote:
> On 3/16/14, 10:49 AM, bearophile wrote:
>> "byDupLines"
>
> It introduces the notion of "dup" to newbies. I'd rather go with a natural name.
>

If the newbie already knows that there is a difference between
byLine and byLineWhatever, doesn't s/he already know about the
concept of dup?

Anyway
byLineCopy or
byLineCopied
sounds natural to me (native German speaker).
March 16, 2014
On 3/16/14, 11:20 AM, HeiHon wrote:
> On Sunday, 16 March 2014 at 17:51:31 UTC, Andrei Alexandrescu
> wrote:
>> On 3/16/14, 10:49 AM, bearophile wrote:
>>> "byDupLines"
>>
>> It introduces the notion of "dup" to newbies. I'd rather go with a
>> natural name.
>>
>
> If the newbie already knows that there is a difference between
> byLine and byLineWhatever, doesn't s/he already know about the
> concept of dup?

Depends on the situation. Consider a D tutorial. It would feature little programs like "copy a file" or "put each line in a hashtable" etc.
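
A minimal sketch of the "put each line in a hashtable" tutorial program, assuming the current byLine behavior of reusing its buffer. The function name countLines and the scratch file name are made up for illustration; the point is only where the copy has to happen.

```d
import std.file : remove, write;
import std.stdio : File, writeln;

// Counts how many times each line occurs in the file at `path`.
// (Hypothetical helper, named for this example only.)
size_t[string] countLines(string path)
{
    size_t[string] counts;
    foreach (line; File(path).byLine)
    {
        // `line` is a char[] view of a buffer that byLine reuses,
        // so take an immutable copy before keeping it as a key.
        ++counts[line.idup];
    }
    return counts;
}

void main()
{
    enum path = "count_lines_demo.txt"; // hypothetical scratch file
    write(path, "a\nb\na\n");
    scope (exit) remove(path);
    writeln(countLines(path));
}
```

The explicit idup is the step a tutorial would have to explain before the reader knows what a buffer is.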

> Anyway
> byLineCopy or
> byLineCopied
> sounds natural to me (native German speaker).

Thanks.


Andrei

March 16, 2014
HeiHon:

>> byLineCopy or
>> byLineCopied
>> sounds natural to me (native German speaker).

byLineCopy sounds good.

Bye,
bearophile
March 16, 2014
On Sunday, 16 March 2014 at 16:58:36 UTC, Andrei Alexandrescu wrote:
> A classic idiom for reading lines and keeping them is f.byLine.map!(x => x.idup) to get strings instead of the buffer etc.
>
> The current behavior trips new users on occasion, and the idiom solving it is very frequent. So what the heck - let's put that in a function, expose and document it nicely, and call it a day.
>
> A good name would help a lot. Let's paint that bikeshed!
>
>
> Andrei

f.lines

March 16, 2014
On Sunday, 16 March 2014 at 16:58:36 UTC, Andrei Alexandrescu wrote:
> A classic idiom for reading lines and keeping them is f.byLine.map!(x => x.idup) to get strings instead of the buffer etc.
>
> The current behavior trips new users on occasion, and the idiom solving it is very frequent. So what the heck - let's put that in a function, expose and document it nicely, and call it a day.
>
> A good name would help a lot. Let's paint that bikeshed!
>
>
> Andrei

I'm for it in the sense that "f.byLine.map!(x => x.idup)" is *WRONG*.

It'll allocate on *every* call to front. Pipe it into a filter, and you have massive gratuitous memory allocations.

A named function will do better than the sloppy code above. So there's my vote...

...or... we could merge my "cache" proposal (https://github.com/D-Programming-Language/phobos/pull/1364). I'll admit I wrote it with *that* particular case in mind. Then, we promote:

"f.byLine.map!(x => x.idup).cache()"

But wait! Now it's correct, but *definitely* not newbie-friendly. So yes, my vote is to have a named function.
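
The "allocates on every call to front" point can be checked with a small sketch. It assumes an in-memory char[][] as a stand-in for f.byLine, and idupCalls is a made-up name; the counter shows that map re-runs its function each time front is read, without the range being advanced.

```d
import std.algorithm : map;
import std.stdio : writeln;

// Returns how often the mapping function runs when `front` is read
// `reads` times in a row. The char[][] input stands in for f.byLine.
size_t idupCalls(size_t reads)
{
    char[][] lines = ["alpha".dup, "beta".dup];
    size_t calls = 0;
    auto r = lines.map!((char[] x) { ++calls; return x.idup; });
    foreach (i; 0 .. reads)
    {
        auto s = r.front; // map recomputes the lambda on each access
    }
    return calls;
}

void main()
{
    writeln(idupCalls(1));
    writeln(idupCalls(3));
}
```

Any adapter that reads front more than once per element, such as a filter that tests the element and then hands it on, multiplies the idup allocations in the same way.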

On Sunday, 16 March 2014 at 17:51:31 UTC, Andrei Alexandrescu wrote:
> On 3/16/14, 10:49 AM, bearophile wrote:
>> A good function name for the copying version is:
>>
>> "byDupLines"
>
> It introduces the notion of "dup" to newbies. I'd rather go with a natural name.
>
> Andrei

There comes a point where you have to learn the language to use it. "dup" is an array built-in; it's not ridiculous to expect the user to know it.

On Sunday, 16 March 2014 at 18:19:01 UTC, bearophile wrote:
> Once dup/idup become free functions in object you can also write:
>
> f.byLine.map!idup

Indeed, I've been wanting to write this before. IMO, not being able to write it is a serious inconsistency.
March 16, 2014
On 3/16/14, 2:50 PM, Andrei Alexandrescu wrote:
> On 3/16/14, 10:49 AM, bearophile wrote:
>> So I think byLine should dup on default, and it should not dup on
>> request.
>
> That will not happen.
>
> Andrei
>

Why not?

Don't optimize upfront. If later you find out that was the bottleneck then you change byLine to something more efficient and that's it. And new users don't get unexpected results.

I'm with bearophile here.
March 16, 2014
On 3/16/14, 12:28 PM, Ary Borenszweig wrote:
> On 3/16/14, 2:50 PM, Andrei Alexandrescu wrote:
>> On 3/16/14, 10:49 AM, bearophile wrote:
>>> So I think byLine should dup on default, and it should not dup on
>>> request.
>>
>> That will not happen.
>>
>> Andrei
>>
>
> Why not?

Performance regression.

Andrei

March 16, 2014
On Sunday, 16 March 2014 at 19:45:21 UTC, Andrei Alexandrescu wrote:
> On 3/16/14, 12:28 PM, Ary Borenszweig wrote:
>> On 3/16/14, 2:50 PM, Andrei Alexandrescu wrote:
>>> On 3/16/14, 10:49 AM, bearophile wrote:
>>>> So I think byLine should dup on default, and it should not dup on
>>>> request.
>>>
>>> That will not happen.
>>>
>>> Andrei
>>>
>>
>> Why not?
>
> Performance regression.
>
> Andrei

Not just that, but it would be a breaking change if we make the element type `string` instead of `char[]`, and if we *don't* do that, we end up in a poor compromise where it's neither optimally usable nor performant.
March 16, 2014
On Sunday, 16 March 2014 at 18:14:18 UTC, Jakob Ovrum wrote:
> On Sunday, 16 March 2014 at 18:06:00 UTC, Vladimir Panteleev wrote:
>> On Sunday, 16 March 2014 at 16:58:36 UTC, Andrei Alexandrescu wrote:
>>> A classic idiom for reading lines and keeping them is f.byLine.map!(x => x.idup) to get strings instead of the buffer etc.
>>>
>>> The current behavior trips new users on occasion, and the idiom solving it is very frequent. So what the heck - let's put that in a function, expose and document it nicely, and call it a day.
>>>
>>> A good name would help a lot. Let's paint that bikeshed!
>>
>> For the record, if you want to keep all lines in memory anyway, it's more efficient to just read the whole file at once then split it with splitLines(), because you avoid doing one memory allocation per line. The downside is if you want to keep only some of the lines on the heap in a long-running program - with this approach, the slices pin the entire file content.
>
> Reading all at once is also a problem for really big files.

It is no different from:

f.byLine.map!(x => x.idup).array

...which is why I said "if you want to keep all lines in memory anyway".
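
A rough sketch of the two approaches side by side, assuming a small scratch file; the function names are invented for this example. Both produce the same string[] and both hold the whole file's content in memory; they differ in how the memory is allocated.

```d
import std.algorithm : map;
import std.array : array;
import std.file : readText, remove, write;
import std.stdio : File;
import std.string : splitLines;

// The file's text is read into one block; each element is a slice of it,
// so keeping any one line alive keeps the whole block alive.
string[] linesBySplitting(string path)
{
    return readText(path).splitLines();
}

// Each line is copied into its own allocation as it is read.
string[] linesByCopying(string path)
{
    return File(path).byLine.map!(x => x.idup).array;
}

void main()
{
    enum path = "lines_demo.txt"; // hypothetical scratch file
    write(path, "one\ntwo\nthree\n");
    scope (exit) remove(path);
    assert(linesBySplitting(path) == linesByCopying(path));
}
```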
March 16, 2014
On Sunday, 16 March 2014 at 17:49:56 UTC, bearophile wrote:
> Andrei Alexandrescu:
>
>> A classic idiom for reading lines and keeping them is f.byLine.map!(x => x.idup) to get strings instead of the buffer etc.
>
> This is essentially this issue, I will reopen it if you want:
> https://d.puremagic.com/issues/show_bug.cgi?id=4474
>
>
>> The current behavior trips new users on occasion,
>
> In D the default behaviors should be "not tripping". And the optimized behavior should be on explicit request.
>
> So I think byLine should dup on default, and it should not dup on request. As I explained in Issue 4474.
>
> Perhaps this breaking change (which I asked for in Issue 4474) in byLine can't happen now.
>
>
>> and the idiom solving it is very frequent. So what the heck - let's put that in a function, expose and document it nicely, and call it a day.
>>
>> A good name would help a lot. Let's paint that bikeshed!
>
> A good function name for the copying version is:
>
> "byDupLines"
>
>
> An alternative solution is the opposite of what I was suggesting in Issue 4474:
>
> byLine => not dup
>
> byLine!true => copies every line with dup
>
> Bye,
> bearophile.

Can't it be as simple as adding a new overload for byLine?

auto byLine(Terminator = char, Char = char, Flag!"cacheLines" cacheLines = Yes.cacheLines)(KeepTerminator keepTerminator = KeepTerminator.no, Terminator terminator = '\x0a')

Otherwise, we should probably just merge MonarchDodra's cache range adapter, and could even add a specialization for byLine's ByLine struct.
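
To make the flag idea concrete, here is a sketch written as a free function rather than a change to File.byLine. The name linesOf and the flag name "copyLines" are hypothetical and not part of Phobos, and the Yes/No shorthands assume a std.typecons that provides them. Because the flag is a template parameter, the element type can differ per instantiation: string when copying, char[] when not.

```d
import std.algorithm : map;
import std.array : array;
import std.file : remove, write;
import std.stdio : File, writeln;
import std.typecons : Flag, No, Yes;

// Hypothetical helper, not a Phobos API: copies each line by default,
// and hands out byLine's reused buffer only on explicit request.
auto linesOf(Flag!"copyLines" copyLines = Yes.copyLines)(File f)
{
    static if (copyLines)
        return f.byLine.map!(x => x.idup); // element type: string
    else
        return f.byLine;                   // element type: char[]
}

void main()
{
    enum path = "flag_demo.txt"; // hypothetical scratch file
    write(path, "one\ntwo\n");
    scope (exit) remove(path);

    string[] kept = linesOf(File(path)).array; // safe to store
    writeln(kept);

    foreach (line; linesOf!(No.copyLines)(File(path)))
        writeln(line); // valid only until the next iteration
}
```

The copying branch is still the plain map-plus-idup idiom, so it inherits the allocation on every front access; a real implementation would want to copy once per line instead.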