|
March 22, 2015 Making byLine faster: we should be able to delegate this | ||||
---|---|---|---|---|
| ||||
I just took a look at making byLine faster. It took less than one evening:

https://github.com/D-Programming-Language/phobos/pull/3089

I confess I am a bit disappointed with the leadership being unable to delegate this task to a trusty lieutenant in the community. There's been a bug opened on this for a long time, it gets regularly discussed here (with the wrong conclusions ("we must redo D's I/O because FILE* is killing it!") about performance bottlenecks drawn from unverified assumptions), and the techniques used to get a marked improvement in the diff above are trivial fare for any software engineer. The following factors each had a significant impact on speed:

* On OSX (which I happened to test with) getdelim() exists but wasn't being used. I made the implementation use it.

* There was one call to fwide() per line read. I used simple caching (a stream's width cannot be changed once set, making it a perfect candidate for caching).

(As an aside there was some unreachable code in ByLineImpl.empty, which didn't impact performance but was overdue for removal.)

* For each line read there was a call to malloc() and one to free(). I set things up so that the buffer used for reading is reused by simply making the buffer static.

* assumeSafeAppend() was unnecessarily used once per line read. Its removal led to a whopping 35% on top of everything else. I'm not sure what it does, but boy it does take its sweet time. Maybe someone should look into it.

Destroy.

Andrei
|
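As a rough illustration of the getdelim() and buffer-reuse points above (this is not the code from the pull request), the sketch below reads a file line by line with getdelim() and lets the C library keep reusing one buffer across calls. It assumes a POSIX system; `readLines`, the temporary file name, and the hand-written `extern (C)` declaration are assumptions made for the example.

```d
import core.stdc.stdio : FILE, fclose, fopen;
import core.stdc.stdlib : free;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.string : toStringz;

// POSIX getdelim(), declared by hand for this sketch (assumption: the
// target libc provides it, as glibc and OS X do).
extern (C) ptrdiff_t getdelim(char** lineptr, size_t* n, int delim, FILE* stream);

// Hypothetical helper: returns every line of `path`, terminators included.
string[] readLines(string path)
{
    FILE* f = fopen(path.toStringz, "r");
    if (f is null)
        return null;
    scope (exit) fclose(f);

    char* buf = null; // owned by getdelim: allocated once, grown on demand
    size_t cap = 0;
    scope (exit) free(buf);

    string[] lines;
    for (;;)
    {
        // The same buffer is handed back on every call, so there is no
        // malloc/free pair per line.
        immutable len = getdelim(&buf, &cap, '\n', f);
        if (len < 0)
            break; // end of file or read error
        lines ~= buf[0 .. len].idup; // copy out before the next call overwrites buf
    }
    return lines;
}

void main()
{
    auto path = buildPath(tempDir, "byline_sketch.txt");
    write(path, "a\nbb\nccc");
    scope (exit) remove(path);

    assert(readLines(path) == ["a\n", "bb\n", "ccc"]);
}
```

The copy with `idup` is only there so the example can return the lines; a caller that processes each line before reading the next could work on the buffer directly.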
March 22, 2015 Re: Making byLine faster: we should be able to delegate this | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
> I just took a look at making byLine faster. It took less than one evening:
>
> https://github.com/D-Programming-Language/phobos/pull/3089
>
> I confess I am a bit disappointed with the leadership being unable to delegate this task to a trusty lieutenant in the community. There's been a bug opened on this for a long time,
there's thousands of open bugs, and no real ranking of high priority bugs or just minor things.
|
March 22, 2015 Re: Making byLine faster: we should be able to delegate this | ||||
---|---|---|---|---|
| ||||
Posted in reply to weaselcat | On 22.03.2015 at 08:18, weaselcat wrote:
> On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
>> I just took a look at making byLine faster. It took less than one
>> evening:
>>
>> https://github.com/D-Programming-Language/phobos/pull/3089
>>
>> I confess I am a bit disappointed with the leadership being unable to
>> delegate this task to a trusty lieutenant in the community. There's
>> been a bug opened on this for a long time,
>
> there's thousands of open bugs, and no real ranking of high priority
> bugs or just minor things.
We have votes and the importance field:
https://issues.dlang.org/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&columnlist=product%2Ccomponent%2Cassigned_to%2Cbug_status%2Cresolution%2Cshort_desc%2Cchangeddate%2Cvotes&list_id=199241&query_format=advanced&votes=1&votes_type=greaterthaneq
However, the byLine issue does not have particularly high priority by any of those measures.
|
March 22, 2015 Re: Making byLine faster: we should be able to delegate this | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 3/22/15 12:03 AM, Andrei Alexandrescu wrote:
> I just took a look at making byLine faster. It took less than one evening:
>
> https://github.com/D-Programming-Language/phobos/pull/3089
>
> I confess I am a bit disappointed with the leadership being unable to
> delegate this task to a trusty lieutenant in the community. There's been
> a bug opened on this for a long time, it gets regularly discussed here
> (with the wrong conclusions ("we must redo D's I/O because FILE* is
> killing it!") about performance bottlenecks drawn from unverified
> assumptions), and the techniques used to get a marked improvement in the
> diff above are trivial fare for any software engineer. The following
> factors each had a significant impact on speed:
>
> * On OSX (which I happened to test with) getdelim() exists but wasn't
> being used. I made the implementation use it.
>
> * There was one call to fwide() per line read. I used simple caching (a
> stream's width cannot be changed once set, making it a perfect candidate
> for caching).
>
> (As an aside there was some unreachable code in ByLineImpl.empty, which
> didn't impact performance but was overdue for removal.)
>
> * For each line read there was a call to malloc() and one to free(). I
> set things up that the buffer used for reading is reused by simply
> making the buffer static.
>
> * assumeSafeAppend() was unnecessarily used once per line read. Its
> removal led to a whopping 35% on top of everything else. I'm not sure
> what it does, but boy it does take its sweet time. Maybe someone should
> look into it.
>
> Destroy.
>
>
> Andrei
* Avoid most calls to GC.sizeOf.
Andrei
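For readers wondering, like the quoted post, what assumeSafeAppend() does: going by the documented behavior of D's built-in arrays, it tells the runtime that the elements past the end of a slice are no longer in use, so a later append may overwrite them in place instead of reallocating. The sketch below is illustrative only; it is not taken from byLine, and `appendStaysInPlace` is a hypothetical name.

```d
// Hypothetical helper: shrink a 4-element array to 2, optionally call
// assumeSafeAppend, append one element, and report whether the append
// reused the original memory block.
bool appendStaysInPlace(bool callAssumeSafeAppend)
{
    int[] a = new int[](4);
    const original = a.ptr;

    a = a[0 .. 2]; // the last two elements still sit right after the slice
    if (callAssumeSafeAppend)
        a.assumeSafeAppend(); // promise: nobody else refers to those elements

    a ~= 5;
    return a.ptr is original;
}

void main()
{
    // Without the call the runtime must not stomp on the old elements,
    // so the append moves the data to a new block.
    assert(!appendStaysInPlace(false));
    // With the call the append is allowed to reuse the existing block.
    assert(appendStaysInPlace(true));
}
```

The call itself has to look up the memory block's metadata in the runtime, which is one plausible reason a per-line call shows up in profiles; that explanation is an assumption, not something measured in this thread.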
|
March 22, 2015 Re: Making byLine faster: we should be able to delegate this | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sönke Ludwig | On 22.03.2015 at 08:43, Sönke Ludwig wrote:
> On 22.03.2015 at 08:18, weaselcat wrote:
>> On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
>>> I just took a look at making byLine faster. It took less than one
>>> evening:
>>>
>>> https://github.com/D-Programming-Language/phobos/pull/3089
>>>
>>> I confess I am a bit disappointed with the leadership being unable to
>>> delegate this task to a trusty lieutenant in the community. There's
>>> been a bug opened on this for a long time,
>>
>> there's thousands of open bugs, and no real ranking of high priority
>> bugs or just minor things.
>
> We have votes and the importance field:
> https://issues.dlang.org/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&columnlist=product%2Ccomponent%2Cassigned_to%2Cbug_status%2Cresolution%2Cshort_desc%2Cchangeddate%2Cvotes&list_id=199241&query_format=advanced&votes=1&votes_type=greaterthaneq
>
> However, the byLine issue does not have particularly high priority by
> any of those measures.
Oh, and bounties of course: https://www.bountysource.com/teams/d/issues
|
March 22, 2015 Re: Making byLine faster: we should be able to delegate this | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
> I confess I am a bit disappointed with the leadership being unable to delegate this task to a trusty lieutenant in the community. There's been a bug opened on this for a long time, it gets regularly discussed here (with the wrong conclusions ("we must redo D's I/O because FILE* is killing it!") about performance bottlenecks drawn from unverified assumptions), and the techniques used to get a marked improvement in the diff above are trivial fare for any software engineer. The following factors each had a significant impact on speed:
Lack of developer itch in a comparatively small developer base making the complement of no one dealing with it too small. :c
Cheers for taking the time, though! All the love for devs.
|
March 22, 2015 Re: Making byLine faster: we should be able to delegate this | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sad panda | On 3/22/15 1:26 AM, Sad panda wrote:
> On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
>> I confess I am a bit disappointed with the leadership being unable to
>> delegate this task to a trusty lieutenant in the community. There's
>> been a bug opened on this for a long time, it gets regularly discussed
>> here (with the wrong conclusions ("we must redo D's I/O because FILE*
>> is killing it!") about performance bottlenecks drawn from unverified
>> assumptions), and the techniques used to get a marked improvement in
>> the diff above are trivial fare for any software engineer. The
>> following factors each had a significant impact on speed:
>
> Lack of developer itch in a comparatively small developer base making
> the complement of no one dealing with it too small. :c
Heh, nicely put :o).
> Cheers for taking the time, though! All the love for devs.
Thanks!
Andrei
|
March 22, 2015 Re: Making byLine faster: we should be able to delegate this | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
> * For each line read there was a call to malloc() and one to free(). I set things up that the buffer used for reading is reused by simply making the buffer static.
What about e.g.
zip(File("a.txt").byLine, File("b.txt").byLine)
|
March 22, 2015 Re: Making byLine faster: we should be able to delegate this | ||||
---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | On 3/22/15 3:10 AM, Vladimir Panteleev wrote:
> On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
>> * For each line read there was a call to malloc() and one to free(). I
>> set things up that the buffer used for reading is reused by simply
>> making the buffer static.
>
> What about e.g.
>
> zip(File("a.txt").byLine, File("b.txt").byLine)
No matter, the static buffer is copied into the result. -- Andrei
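A small sketch of the usage being asked about, assuming two throwaway files created just for the example (the file names, contents, and the `pairedLines` helper are hypothetical). It only shows that zipping two byLine ranges yields matching lines from each file; it does not reproduce the implementation in the pull request. Each line is copied with `idup` because byLine is documented to reuse its buffer between iterations.

```d
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.range : zip;
import std.stdio : File;

// Hypothetical helper: joins line N of one file with line N of the other.
string[] pairedLines(string pathA, string pathB)
{
    string[] result;
    foreach (a, b; zip(File(pathA).byLine, File(pathB).byLine))
        result ~= (a ~ " | " ~ b).idup; // copy: byLine may reuse its buffer
    return result;
}

void main()
{
    auto a = buildPath(tempDir, "byline_a.txt");
    auto b = buildPath(tempDir, "byline_b.txt");
    write(a, "a1\na2\n");
    write(b, "b1\nb2\n");
    scope (exit) { remove(a); remove(b); }

    assert(pairedLines(a, b) == ["a1 | b1", "a2 | b2"]);
}
```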
|
March 22, 2015 Re: Making byLine faster: we should be able to delegate this | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Sunday, 22 March 2015 at 16:03:11 UTC, Andrei Alexandrescu wrote:
> On 3/22/15 3:10 AM, Vladimir Panteleev wrote:
>> On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
>>> * For each line read there was a call to malloc() and one to free(). I
>>> set things up that the buffer used for reading is reused by simply
>>> making the buffer static.
>>
>> What about e.g.
>>
>> zip(File("a.txt").byLine, File("b.txt").byLine)
>
> No matter, the static buffer is copied into the result. -- Andrei
I haven't looked at the code, but won't using a "static" buffer make the function thread-unsafe?
I think we should also document thread safety somewhere. Phobos doesn't have any such documentation.
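One language fact that bears on this question: in D, `static` and module-level variables are thread-local by default unless they are declared `shared` or `__gshared`. Whether that applies to the buffer in the pull request depends on how it was declared, which this thread does not show. A minimal sketch with hypothetical names:

```d
import core.thread : Thread;

// Hypothetical helper: counts its own calls in a function-local static.
int bump()
{
    static int calls; // thread-local by default in D
    return ++calls;
}

// Runs bump() once on a fresh thread and returns what that thread saw.
int firstCallOnNewThread()
{
    int seen;
    auto t = new Thread({ seen = bump(); });
    t.start();
    t.join();
    return seen;
}

void main()
{
    bump();
    bump();
    // The new thread has its own copy of `calls`, starting from zero.
    assert(firstCallOnNewThread() == 1);
    // The main thread's copy was not touched by the other thread.
    assert(bump() == 3);
}
```

A thread-local buffer avoids data races between threads, but it does not by itself make it safe to hold on to a line while another range on the same thread refills the buffer.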
|