Making byLine faster: we should be able to delegate this (page 3)

On 3/23/15 10:59 AM, Andrei Alexandrescu wrote:
> On 3/23/15 7:52 AM, Steven Schveighoffer wrote:
>> On 3/22/15 3:03 AM, Andrei Alexandrescu wrote:
>>
>>> * assumeSafeAppend() was unnecessarily used once per line read. Its
>>> removal led to a whopping 35% on top of everything else. I'm not sure
>>> what it does, but boy it does takes its sweet time. Maybe someone should
>>> look into it.
>>
>> That's not expected. assumeSafeAppend should be pretty quick, and
>> DEFINITELY should not be a significant percentage of reading lines. I
>> will look into it.
>
> Thanks!
>
>> Just to verify, your test application was a simple byline loop?
>
> Yes, the code was that in
> http://stackoverflow.com/questions/28922323/improving-line-wise-i-o-operations-in-d/29153508#29153508

My investigation seems to suggest that assumeSafeAppend is not using that much time for what it does. The reason for the "35%" is that you are talking 35% of a very small value. At that level, and with these numbers of calls, combined with the fact that the calls MUST occur (these are opaque functions), I think we are talking about a non issue here.

This is what assumeSafeAppend does:

1. Access TypeInfo and convert array to "void[]" array (this could probably be adjusted to avoid using the TypeInfo, since assumeSafeAppend is a template).
2. Look up block info, which should be a loop through 8 array cache elements.
3. Verify the block has the APPENDABLE flag, and write the new "used" space into the right place.

I suspect some combination of memory cache failures, or virtual function calls on the TypeInfo, or failure to inline some functions is what's slowing it down. But let's not forget that the 35% savings was AFTER all the original savings. On my system, using a 2 million line file, the original took 2.2 seconds, the version with the superfluous assumeSafeAppend took .3 seconds, without it takes .15 seconds.

Still should be examined further, but I'm not as concerned as I was before.

-Steve
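For readers unfamiliar with the call under discussion, here is a minimal sketch of what assumeSafeAppend changes from the caller's side. It is modelled on the usual slice-and-append illustration, not on the byLine code from the thread; the function names appendRelocates and appendInPlace are made up for this example, and the pointer checks assume a normal (non-SENTINEL) druntime build.

```d
import std.stdio : writeln;

/// Appending to a slice that ends before the block's recorded "used"
/// length must not stomp on the data after it, so the runtime reallocates.
bool appendRelocates()
{
    int[] a = [1, 2, 3, 4];
    int[] b = a[0 .. 3];
    b ~= 5;                       // b moves to a new block; a[3] is untouched
    return a.ptr !is b.ptr && a[3] == 4;
}

/// assumeSafeAppend rewrites the block's "used" size to end at the slice,
/// so the next append reuses the same memory.
bool appendInPlace()
{
    int[] a = [1, 2, 3, 4];
    int[] c = a[0 .. 3];
    c.assumeSafeAppend();         // promise: nobody cares about a[3] any more
    c ~= 5;                       // overwrites a[3] in place
    return a.ptr is c.ptr && a[3] == 5;
}

void main()
{
    writeln(appendRelocates(), " ", appendInPlace());
}
```

A line reader that reuses one buffer per line is in the second situation on every iteration, which is why the call showed up once per line read.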

March 23, 2015

Re: Making byLine faster: we should be able to delegate this

Posted by Andrei Alexandrescu
in reply to Steven Schveighoffer


On 3/23/15 2:42 PM, Steven Schveighoffer wrote:
> On 3/23/15 10:59 AM, Andrei Alexandrescu wrote:
>> On 3/23/15 7:52 AM, Steven Schveighoffer wrote:
>>> On 3/22/15 3:03 AM, Andrei Alexandrescu wrote:
>>>
>>>> * assumeSafeAppend() was unnecessarily used once per line read. Its
>>>> removal led to a whopping 35% on top of everything else. I'm not sure
>>>> what it does, but boy it does takes its sweet time. Maybe someone
>>>> should
>>>> look into it.
>>>
>>> That's not expected. assumeSafeAppend should be pretty quick, and
>>> DEFINITELY should not be a significant percentage of reading lines. I
>>> will look into it.
>>
>> Thanks!
>>
>>> Just to verify, your test application was a simple byline loop?
>>
>> Yes, the code was that in
>> http://stackoverflow.com/questions/28922323/improving-line-wise-i-o-operations-in-d/29153508#29153508
>>
>
> My investigation seems to suggest that assumeSafeAppend is not using
> that much time for what it does. The reason for the "35%" is that you
> are talking 35% of a very small value.

I don't see the logic here. Unless the value is so small that noise margins become significant (it isn't), 35% is large.

> At that level, and with these
> numbers of calls, combined with the fact that the calls MUST occur
> (these are opaque functions), I think we are talking about a non issue
> here.

I disagree with this assessment. In this case it takes us from losing to winning to Python.

> This is what assumeSafeAppend does:
>
> 1. Access TypeInfo and convert array to "void[]" array (this could
> probably be adjusted to avoid using the TypeInfo, since assumeSafeAppend
> is a template).
> 2. Look up block info, which should be a loop through 8 array cache
> elements.
> 3. Verify the block has the APPENDABLE flag, and write the new "used"
> space into the right place.
>
> I suspect some combination of memory cache failures, or virtual function
> calls on the TypeInfo, or failure to inline some functions is what's
> slowing it down. But let's not forget that the 35% savings was AFTER all
> the original savings. On my system, using a 2 million line file, the
> original took 2.2 seconds, the version with the superfluous
> assumeSafeAppend took .3 seconds, without it takes .15 seconds.
>
> Still should be examined further, but I'm not as concerned as I was before.

We should.


Andrei
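As a rough aid to the disagreement above, the timings Steven quotes for his machine (2.2 s original, 0.3 s with the superfluous call, 0.15 s without it, 2 million lines) can be put side by side. This assumes one assumeSafeAppend call per line and attributes the whole 0.15 s difference to that call, which is a simplification.

```latex
\begin{align*}
\text{speedup from the earlier optimizations} &= \frac{2.2}{0.3} \approx 7.3\times \\
\text{saving relative to the optimized version} &= \frac{0.3 - 0.15}{0.3} = 50\% \\
\text{saving relative to the original version} &= \frac{0.3 - 0.15}{2.2} \approx 6.8\% \\
\text{implied cost per call} &= \frac{0.15\ \text{s}}{2 \times 10^{6}} = 75\ \text{ns}
\end{align*}
```

Both readings are visible here: the saving is small against the original runtime, yet it halves the runtime of the already optimized loop.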

On 3/23/15 7:33 PM, Andrei Alexandrescu wrote:
> On 3/23/15 2:42 PM, Steven Schveighoffer wrote:
>> On 3/23/15 10:59 AM, Andrei Alexandrescu wrote:
>>> On 3/23/15 7:52 AM, Steven Schveighoffer wrote:
>>>> On 3/22/15 3:03 AM, Andrei Alexandrescu wrote:
>>>>
>>>>> * assumeSafeAppend() was unnecessarily used once per line read. Its
>>>>> removal led to a whopping 35% on top of everything else. I'm not sure
>>>>> what it does, but boy it does takes its sweet time. Maybe someone
>>>>> should
>>>>> look into it.
>>>>
>>>> That's not expected. assumeSafeAppend should be pretty quick, and
>>>> DEFINITELY should not be a significant percentage of reading lines. I
>>>> will look into it.
>>>
>>> Thanks!
>>>
>>>> Just to verify, your test application was a simple byline loop?
>>>
>>> Yes, the code was that in
>>> http://stackoverflow.com/questions/28922323/improving-line-wise-i-o-operations-in-d/29153508#29153508
>>>
>>
>> My investigation seems to suggest that assumeSafeAppend is not using
>> that much time for what it does. The reason for the "35%" is that you
>> are talking 35% of a very small value.
>
> I don't see the logic here. Unless the value is so small that noise
> margins become significant (it isn't), 35% is large.
>
>> At that level, and with these
>> numbers of calls, combined with the fact that the calls MUST occur
>> (these are opaque functions), I think we are talking about a non issue
>> here.
>
> I disagree with this assessment. In this case it takes us from losing to
> winning to Python.
>

Yes, rethinking, you are right. I was jolted by the 35% thinking it was 35% of the original problem.

I re-examined and found something interesting -- assumeSafeAppend doesn't cache the block, it only uses the cache if it's ALREADY cached.

So a large chunk of that 35% is the runtime looking up that block info in the heap. On my machine, this brings the time from .3 down to .2 s.

I also found a bad memory corruption bug you introduced. I'll make some PRs.

-Steve
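The difference between "use the cache only if the block is already there" and "insert on a miss" can be sketched with a toy model. Everything below is hypothetical: the names ToyBlockCache, slowQuery, lookupOnly and lookupAndInsert, the round-robin replacement, and the 256-byte blocks are invented for illustration and are not druntime's data structures; only the figure of 8 cache entries comes from the thread.

```d
import std.stdio : writeln;

// Toy model only; not druntime's actual block-info cache.
struct BlockInfo { size_t base; size_t size; }

struct ToyBlockCache
{
    enum N = 8;                   // the thread mentions 8 cache elements
    BlockInfo[N] slots;
    size_t next;                  // round-robin replacement index (assumption)
    size_t slowLookups;           // how often the simulated heap query ran

    // Stand-in for the expensive heap lookup.
    BlockInfo slowQuery(size_t addr)
    {
        ++slowLookups;
        return BlockInfo(addr & ~cast(size_t) 0xFF, 256);
    }

    bool find(size_t addr, out BlockInfo info)
    {
        foreach (ref s; slots)
            if (s.size && addr >= s.base && addr < s.base + s.size)
            {
                info = s;
                return true;
            }
        return false;
    }

    // Behaviour described in the post: a miss never populates the cache.
    BlockInfo lookupOnly(size_t addr)
    {
        BlockInfo info;
        return find(addr, info) ? info : slowQuery(addr);
    }

    // Alternative: remember the block after the first miss.
    BlockInfo lookupAndInsert(size_t addr)
    {
        BlockInfo info;
        if (find(addr, info))
            return info;
        info = slowQuery(addr);
        slots[next] = info;
        next = (next + 1) % N;
        return info;
    }
}

/// Counts slow lookups when the same address is queried `calls` times.
size_t countSlow(bool insertOnMiss, size_t calls)
{
    ToyBlockCache c;
    foreach (i; 0 .. calls)
    {
        if (insertOnMiss)
            c.lookupAndInsert(0x1010);
        else
            c.lookupOnly(0x1010);
    }
    return c.slowLookups;
}

void main()
{
    writeln(countSlow(false, 1000), " vs ", countSlow(true, 1000));
}
```

In this model a loop that touches the same buffer once per line pays the slow lookup every time with lookupOnly, and only once with lookupAndInsert, which matches the shape of the slowdown described above.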

On 3/23/15 9:17 PM, Steven Schveighoffer wrote:
> I re-examined and found something interesting -- assumeSafeAppend
> doesn't cache the block, it only uses the cache if it's ALREADY cached.
>
> So a large chunk of that 35% is the runtime looking up that block info
> in the heap. On my machine, this brings the time from .3 down to .2 s.
>
> I also found a bad memory corruption bug you introduced. I'll make some
> PRs.

https://github.com/D-Programming-Language/druntime/pull/1198
https://github.com/D-Programming-Language/phobos/pull/3098

Note, this doesn't affect performance in this case, as assumeSafeAppend isn't used any more.

-Steve

On 3/23/15 6:44 PM, Steven Schveighoffer wrote:
> On 3/23/15 9:17 PM, Steven Schveighoffer wrote:
>> I re-examined and found something interesting -- assumeSafeAppend
>> doesn't cache the block, it only uses the cache if it's ALREADY cached.
>>
>> So a large chunk of that 35% is the runtime looking up that block info
>> in the heap. On my machine, this brings the time from .3 down to .2 s.
>>
>> I also found a bad memory corruption bug you introduced. I'll make some
>> PRs.
>
> https://github.com/D-Programming-Language/druntime/pull/1198
> https://github.com/D-Programming-Language/phobos/pull/3098
>
> Note, this doesn't affect performance in this case, as assumeSafeAppend
> isn't used any more.

Can't tell how much I appreciate this work! -- Andrei

Java has the Disruptor, a ring-buffer design that aims to provide the fastest way to process a file.

Website: http://lmax-exchange.github.io/disruptor/
Technical information: http://lmax-exchange.github.io/disruptor/files/Disruptor-1.0.pdf

Little ping: I am hoping for an answer about IO in D and the Disruptor from the Java world.

The Disruptor seems to provide a smart implementation between IO and its buffer. What do you think about it?

D could provide a high-level way to process a file efficiently (using ranges, forward ranges, ... would be better). I think D should be batteries-included for this kind of common task. It would be really awesome to have an abstraction layer for this, so that there is no need to know whether you are on an SSD or an HDD, whether the page size is 4096, whether huge pages are enabled, and so on.
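As a point of reference for the range-based processing asked about here, this is a minimal sketch of what Phobos already allows with File.byLine composed with std.algorithm. It is not a Disruptor equivalent and says nothing about page sizes or storage type; the function name totalNonEmptyLineLength and the scratch file name are made up for the example.

```d
static import std.file;
import std.algorithm : filter, map, sum;
import std.stdio : File, writeln;

/// Sums the lengths of all non-empty lines in the file at `path`.
/// byLine lazily yields slices of one reused buffer, so nothing here
/// keeps a line alive beyond the current iteration.
size_t totalNonEmptyLineLength(string path)
{
    return File(path)
        .byLine()
        .filter!(line => line.length > 0)
        .map!(line => line.length)
        .sum;
}

void main()
{
    enum path = "byline_demo.txt";            // hypothetical scratch file
    std.file.write(path, "ab\n\ncde\n");
    scope (exit) std.file.remove(path);
    writeln(totalNonEmptyLineLength(path));
}
```

Because byLine reuses its buffer, a caller that needs to keep a line has to copy it (for example with .idup); that reuse is the same design choice that made assumeSafeAppend relevant earlier in the thread.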
