Thread overview
Making byLine faster: we should be able to delegate this
Mar 22, 2015
weaselcat
Mar 22, 2015
Sönke Ludwig
Mar 22, 2015
Sönke Ludwig
Mar 22, 2015
Sad panda
Mar 22, 2015
Vladimir Panteleev
Mar 22, 2015
tcak
Mar 23, 2015
John Colvin
Mar 23, 2015
rumbu
Mar 23, 2015
Tobias Pankrath
Mar 23, 2015
rumbu
Mar 29, 2015
bioinfornatics
Mar 29, 2015
bioinfornatics
Mar 31, 2015
bioinfornatics
March 22, 2015
I just took a look at making byLine faster. It took less than one evening:

https://github.com/D-Programming-Language/phobos/pull/3089

I confess I am a bit disappointed with the leadership being unable to delegate this task to a trusty lieutenant in the community. There's been a bug opened on this for a long time, it gets regularly discussed here (with the wrong conclusions ("we must redo D's I/O because FILE* is killing it!") about performance bottlenecks drawn from unverified assumptions), and the techniques used to get a marked improvement in the diff above are trivial fare for any software engineer. The following factors each had a significant impact on speed:

* On OSX (which I happened to test with) getdelim() exists but wasn't being used. I made the implementation use it.

* There was one call to fwide() per line read. I used simple caching (a stream's width cannot be changed once set, making it a perfect candidate for caching).

(As an aside there was some unreachable code in ByLineImpl.empty, which didn't impact performance but was overdue for removal.)

* For each line read there was a call to malloc() and one to free(). I set things up so that the buffer used for reading is reused, simply by making it static.

* assumeSafeAppend() was unnecessarily used once per line read. Its removal led to a whopping 35% improvement on top of everything else. I'm not sure what it does, but boy, it does take its sweet time. Maybe someone should look into it.
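
To make the getdelim() and buffer-reuse points concrete at the C level, here is a minimal sketch. It is not the code from the pull request: `count_lines` is a hypothetical helper invented for illustration, and it assumes a POSIX system where getdelim() is available. The idea it shows is that one buffer is handed back to getdelim() on every call, so there is no malloc()/free() pair per line.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes getdelim() on POSIX systems */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper (not from the pull request): counts the lines in fp
   and stores the length of the longest one, terminator excluded. */
size_t count_lines(FILE *fp, size_t *longest)
{
    assert(fp != NULL && longest != NULL);

    char *buf = NULL;   /* getdelim() allocates this on first use */
    size_t cap = 0;     /* capacity of buf, maintained by getdelim() */
    size_t lines = 0;
    ssize_t len;

    *longest = 0;
    /* The same buffer is passed in on every call, so it is only
       reallocated when a line does not fit in the current capacity. */
    while ((len = getdelim(&buf, &cap, '\n', fp)) != -1) {
        if (len > 0 && buf[len - 1] == '\n')
            --len;      /* do not count the terminator */
        if ((size_t)len > *longest)
            *longest = (size_t)len;
        ++lines;
    }

    free(buf);          /* one free() for the whole stream */
    return lines;
}
```

A caller would pass any open stream, for example `count_lines(stdin, &longest)`. The buffer here lives for one call rather than being static, which is a deliberate simplification of the approach described above.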

Destroy.


Andrei
March 22, 2015
On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
> I just took a look at making byLine faster. It took less than one evening:
>
> https://github.com/D-Programming-Language/phobos/pull/3089
>
> I confess I am a bit disappointed with the leadership being unable to delegate this task to a trusty lieutenant in the community. There's been a bug opened on this for a long time,

There are thousands of open bugs, and no real ranking of high-priority bugs versus minor things.
March 22, 2015
On 22.03.2015 at 08:18, weaselcat wrote:
> On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
>> I just took a look at making byLine faster. It took less than one
>> evening:
>>
>> https://github.com/D-Programming-Language/phobos/pull/3089
>>
>> I confess I am a bit disappointed with the leadership being unable to
>> delegate this task to a trusty lieutenant in the community. There's
>> been a bug opened on this for a long time,
>
> There are thousands of open bugs, and no real ranking of high-priority
> bugs versus minor things.

We have votes and the importance field:
https://issues.dlang.org/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&columnlist=product%2Ccomponent%2Cassigned_to%2Cbug_status%2Cresolution%2Cshort_desc%2Cchangeddate%2Cvotes&list_id=199241&query_format=advanced&votes=1&votes_type=greaterthaneq

However, the byLine issue does not have particularly high priority by any of those measures.
March 22, 2015
On 3/22/15 12:03 AM, Andrei Alexandrescu wrote:
> I just took a look at making byLine faster. It took less than one evening:
>
> https://github.com/D-Programming-Language/phobos/pull/3089
>
> I confess I am a bit disappointed with the leadership being unable to
> delegate this task to a trusty lieutenant in the community. There's been
> a bug opened on this for a long time, it gets regularly discussed here
> (with the wrong conclusions ("we must redo D's I/O because FILE* is
> killing it!") about performance bottlenecks drawn from unverified
> assumptions), and the techniques used to get a marked improvement in the
> diff above are trivial fare for any software engineer. The following
> factors each had a significant impact on speed:
>
> * On OSX (which I happened to test with) getdelim() exists but wasn't
> being used. I made the implementation use it.
>
> * There was one call to fwide() per line read. I used simple caching (a
> stream's width cannot be changed once set, making it a perfect candidate
> for caching).
>
> (As an aside there was some unreachable code in ByLineImpl.empty, which
> didn't impact performance but was overdue for removal.)
>
> * For each line read there was a call to malloc() and one to free(). I
> set things up so that the buffer used for reading is reused, simply by
> making it static.
>
> * assumeSafeAppend() was unnecessarily used once per line read. Its
> removal led to a whopping 35% improvement on top of everything else. I'm
> not sure what it does, but boy, it does take its sweet time. Maybe
> someone should look into it.
>
> Destroy.
>
>
> Andrei

* Avoid most calls to GC.sizeOf.

Andrei

March 22, 2015
On 22.03.2015 at 08:43, Sönke Ludwig wrote:
> On 22.03.2015 at 08:18, weaselcat wrote:
>> On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
>>> I just took a look at making byLine faster. It took less than one
>>> evening:
>>>
>>> https://github.com/D-Programming-Language/phobos/pull/3089
>>>
>>> I confess I am a bit disappointed with the leadership being unable to
>>> delegate this task to a trusty lieutenant in the community. There's
>>> been a bug opened on this for a long time,
>>
>> There are thousands of open bugs, and no real ranking of high-priority
>> bugs versus minor things.
>
> We have votes and the importance field:
> https://issues.dlang.org/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&columnlist=product%2Ccomponent%2Cassigned_to%2Cbug_status%2Cresolution%2Cshort_desc%2Cchangeddate%2Cvotes&list_id=199241&query_format=advanced&votes=1&votes_type=greaterthaneq
>
>
> However, the byLine issue does not have particularly high priority by
> any of those measures.

Oh, and bounties of course: https://www.bountysource.com/teams/d/issues
March 22, 2015
On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
> I confess I am a bit disappointed with the leadership being unable to delegate this task to a trusty lieutenant in the community. There's been a bug opened on this for a long time, it gets regularly discussed here (with the wrong conclusions ("we must redo D's I/O because FILE* is killing it!") about performance bottlenecks drawn from unverified assumptions), and the techniques used to get a marked improvement in the diff above are trivial fare for any software engineer. The following factors each had a significant impact on speed:

Lack of developer itch in a comparatively small developer base making the complement of no one dealing with it too small. :c

Cheers for taking the time, though! All the love for devs.
March 22, 2015
On 3/22/15 1:26 AM, Sad panda wrote:
> On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
>> I confess I am a bit disappointed with the leadership being unable to
>> delegate this task to a trusty lieutenant in the community. There's
>> been a bug opened on this for a long time, it gets regularly discussed
>> here (with the wrong conclusions ("we must redo D's I/O because FILE*
>> is killing it!") about performance bottlenecks drawn from unverified
>> assumptions), and the techniques used to get a marked improvement in
>> the diff above are trivial fare for any software engineer. The
>> following factors each had a significant impact on speed:
>
> Lack of developer itch in a comparatively small developer base making
> the complement of no one dealing with it too small. :c

Heh, nicely put :o).

> Cheers for taking the time, though! All the love for devs.

Thanks!


Andrei
March 22, 2015
On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
> * For each line read there was a call to malloc() and one to free(). I set things up so that the buffer used for reading is reused, simply by making it static.

What about e.g.

zip(File("a.txt").byLine, File("b.txt").byLine)
March 22, 2015
On 3/22/15 3:10 AM, Vladimir Panteleev wrote:
> On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
>> * For each line read there was a call to malloc() and one to free(). I
>> set things up so that the buffer used for reading is reused, simply by
>> making it static.
>
> What about e.g.
>
> zip(File("a.txt").byLine, File("b.txt").byLine)

No matter, the static buffer is copied into the result. -- Andrei
March 22, 2015
On Sunday, 22 March 2015 at 16:03:11 UTC, Andrei Alexandrescu wrote:
> On 3/22/15 3:10 AM, Vladimir Panteleev wrote:
>> On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
>>> * For each line read there was a call to malloc() and one to free(). I
>>> set things up so that the buffer used for reading is reused, simply by
>>> making it static.
>>
>> What about e.g.
>>
>> zip(File("a.txt").byLine, File("b.txt").byLine)
>
> No matter, the static buffer is copied into the result. -- Andrei

I haven't seen the code, but won't using a "static" buffer make the function thread-unsafe?

I think we should also document thread safety somewhere. Phobos doesn't have any such documentation.
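
On the thread-safety question, one relevant fact is that in D a variable declared with plain `static` is thread-local by default unless it is marked `shared` or `__gshared`, so each thread gets its own copy. Whether the pull request declares its buffer that way is not shown in this thread, so treat that as an assumption. The closest C analogue is a `_Thread_local` static, sketched below; `scratch_buffer` and `SCRATCH_SIZE` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { SCRATCH_SIZE = 4096 };  /* hypothetical size, chosen for illustration */

/* Returns a scratch buffer that is reused across calls made by the same
   thread. With C11 _Thread_local, every thread has its own instance of
   buf, so concurrent callers do not overwrite each other's data. */
char *scratch_buffer(size_t *size)
{
    static _Thread_local char buf[SCRATCH_SIZE];

    assert(size != NULL);
    *size = sizeof buf;
    return buf;
}
```

Within one thread, repeated calls return the same storage, which is what makes the reuse cheap. It also means a caller that wants to keep a line must copy it out before the next read.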