Jump to page: 1 2
Thread overview
LRUCache - a simple least recently used cache
Nov 04, 2007
Charles D Hixson
Nov 04, 2007
renoX
Nov 04, 2007
Charles D Hixson
Nov 04, 2007
Bill Baxter
Nov 04, 2007
Charles D Hixson
Nov 04, 2007
renoX
Nov 04, 2007
Charles D Hixson
Nov 04, 2007
Charles D Hixson
Nov 08, 2007
janderson
Nov 08, 2007
Charles D Hixson
Nov 05, 2007
BCS
Nov 05, 2007
Charles D Hixson
Nov 10, 2007
Charles D Hixson
Nov 11, 2007
BCS
November 04, 2007
In the hopes that someone find this useful:
http://www.prowiki.org/wiki4d/wiki.cgi?CharlesHixson

This particular version presumes D2.x code, but I think the adjustments are trivial for D1.x.
November 04, 2007
Charles D Hixson a écrit :
> In the hopes that someone find this useful:
> http://www.prowiki.org/wiki4d/wiki.cgi?CharlesHixson
> 
> This particular version presumes D2.x code, but I think the adjustments are trivial for D1.x.

I think that there is a flaw in your code: the access counter is a ushort (16bit value) and is incremented each time you access an element in the cache: it can overflow quickly and will wrap around..

This means that recently used elements could have a _nodeId smaller than nodes used less recently, which means that the overflow handling code will drop the wrong elements..

This may not be the case in your usage, but this is either a bug or at least a pretty big restriction in the usage which should be clearly indicated.


Regards,
renoX
November 04, 2007
renoX wrote:
> Charles D Hixson a écrit :
>> In the hopes that someone find this useful:
>> http://www.prowiki.org/wiki4d/wiki.cgi?CharlesHixson
>>
>> This particular version presumes D2.x code, but I think the adjustments are trivial for D1.x.
> 
> I think that there is a flaw in your code: the access counter is a ushort (16bit value) and is incremented each time you access an element in the cache: it can overflow quickly and will wrap around..
> 
> This means that recently used elements could have a _nodeId smaller than nodes used less recently, which means that the overflow handling code will drop the wrong elements..
> 
> This may not be the case in your usage, but this is either a bug or at least a pretty big restriction in the usage which should be clearly indicated.
> 
> 
> Regards,
> renoX

Well, two things:
1)  This code isn't really adapted for large sets of data.  If you want that, you'd better hand optimize it for your intended application.
2)  Whenever the cache is cleared, all the cache nodeId's are decreased by the minimum nodeId (so the new minimum is zero...or one, I'd need to check again.  This should decrease the rate at which the maximum nodeId grows.

However I *did* take your suggestion about documenting it.

OTOH, it would be better if I could renumber the nodes sequentially, but doing that without a sort is difficult (i.e., I haven't figured out how).

A different organization would be to keep nodeIds in a queue, with pointers to the hash table, but that has other costs. Still, I SHOULD add a routine to renumber everything whenever the maxT value starts to get withing maxSize of ushort.max. This routine will probably be introduced at some later version.

Any suggestions for a quick and simple way to do this?
Would it be reasonable to arbitrarily renumber everything in the cache serially (i.e., without considering the current nodeIds) whenever the maximum nodeId started to approach ushort.max?  (That's the direction that I'm leaning in...but I haven't decided whether or not it's reasonable.)
November 04, 2007
Charles D Hixson wrote:
> renoX wrote:
>> Charles D Hixson a écrit :
>>> In the hopes that someone find this useful:
>>> http://www.prowiki.org/wiki4d/wiki.cgi?CharlesHixson
>>>
>>> This particular version presumes D2.x code, but I think the adjustments are trivial for D1.x.
>>
>> I think that there is a flaw in your code: the access counter is a ushort (16bit value) and is incremented each time you access an element in the cache: it can overflow quickly and will wrap around..
>>
> Well, two things:
> 1)  This code isn't really adapted for large sets of data.  If you want that, you'd better hand optimize it for your intended application.
> 2)  Whenever the cache is cleared, all the cache nodeId's are decreased by the minimum nodeId (so the new minimum is zero...or one, I'd need to check again.  This should decrease the rate at which the maximum nodeId grows.
> 

I don't know why you chose to go with a short, but I have had situations in the past where using a short instead of an int had significantly worse performance.  In my case it was a big array of shorts vs a big array of ints, so probably there was extra alignment mojo that had to happen to fetch data from the short array into a register and then put it back.  But anyway CPUs seem to like working on chunks of 32-bits better than 16.  You might try making it an int just to see how it performs.

--bb
November 04, 2007
Bill Baxter wrote:
> Charles D Hixson wrote:
>> renoX wrote:
>>> Charles D Hixson a écrit :
>>>> In the hopes that someone find this useful:
>>>> http://www.prowiki.org/wiki4d/wiki.cgi?CharlesHixson
>>>>
>>>> This particular version presumes D2.x code, but I think the adjustments are trivial for D1.x.
>>>
>>> I think that there is a flaw in your code: the access counter is a ushort (16bit value) and is incremented each time you access an element in the cache: it can overflow quickly and will wrap around..
>>>
>> Well, two things:
>> 1)  This code isn't really adapted for large sets of data.  If you want that, you'd better hand optimize it for your intended application.
>> 2)  Whenever the cache is cleared, all the cache nodeId's are decreased by the minimum nodeId (so the new minimum is zero...or one, I'd need to check again.  This should decrease the rate at which the maximum nodeId grows.
>>
> 
> I don't know why you chose to go with a short, but I have had situations in the past where using a short instead of an int had significantly worse performance.  In my case it was a big array of shorts vs a big array of ints, so probably there was extra alignment mojo that had to happen to fetch data from the short array into a register and then put it back.  But anyway CPUs seem to like working on chunks of 32-bits better than 16.  You might try making it an int just to see how it performs.
> 
> --bb
Well, an int seemed like overkill.  I *did* consider using a ubyte.... but that seemed like cutting things too close. (It's certainly easy enough to change it in the code, but that wouldn't be my preferred solution.)

November 04, 2007
Charles D Hixson a écrit :
> renoX wrote:
>> Charles D Hixson a écrit :
>>> In the hopes that someone find this useful:
>>> http://www.prowiki.org/wiki4d/wiki.cgi?CharlesHixson
>>>
>>> This particular version presumes D2.x code, but I think the adjustments are trivial for D1.x.
>>
>> I think that there is a flaw in your code: the access counter is a ushort (16bit value) and is incremented each time you access an element in the cache: it can overflow quickly and will wrap around..
>>
>> This means that recently used elements could have a _nodeId smaller than nodes used less recently, which means that the overflow handling code will drop the wrong elements..
>>
>> This may not be the case in your usage, but this is either a bug or at least a pretty big restriction in the usage which should be clearly indicated.
>>
>>
>> Regards,
>> renoX
> 
> Well, two things:
> 1)  This code isn't really adapted for large sets of data.  If you want that, you'd better hand optimize it for your intended application.

I'm not talking about a large set of data, just about a program which runs for a long time and/or which make heavy usage of the data in the cache: each call to 'contains' or 'opIndex' increase maxNode so it could overflow quite quickly (an ushort isn't that big).

> Any suggestions for a quick and simple way to do this?

Just make the access counter 32bit, it should be enough for a 'normal' program (not an Apache style program though).

> Would it be reasonable to arbitrarily renumber everything in the cache serially (i.e., without considering the current nodeIds) whenever the maximum nodeId started to approach ushort.max?  (That's the direction that I'm leaning in...but I haven't decided whether or not it's reasonable.)

I'm not an expert in D's but I think that an array of
struct Node{ Key key; Body bdy; ushort _nodeId; } or an array of
struct Node{ Key key; Body bdy; uint   _nodeId; } have exactly the same size due to the padding..

And maybe on x86-64, you could use ulong without increasing the size of the array..

Could someone confirm/infirm?
For me, key and bdy are reference, so they have the same size as pointers, am I right?

Regards,
renoX


November 04, 2007
renoX wrote:
> Charles D Hixson a écrit :
>> renoX wrote:
>>> Charles D Hixson a écrit :
>>>> In the hopes that someone find this useful:
>>>> http://www.prowiki.org/wiki4d/wiki.cgi?CharlesHixson
>>>>
>>>> This particular version presumes D2.x code, but I think the adjustments are trivial for D1.x.
>>>
>>> I think that there is a flaw in your code: the access counter is a ushort (16bit value) and is incremented each time you access an element in the cache: it can overflow quickly and will wrap around..
>>>
>>> This means that recently used elements could have a _nodeId smaller than nodes used less recently, which means that the overflow handling code will drop the wrong elements..
>>>
>>> This may not be the case in your usage, but this is either a bug or at least a pretty big restriction in the usage which should be clearly indicated.
>>>
>>>
>>> Regards,
>>> renoX
>>
>> Well, two things:
>> 1)  This code isn't really adapted for large sets of data.  If you want that, you'd better hand optimize it for your intended application.
> 
> I'm not talking about a large set of data, just about a program which runs for a long time and/or which make heavy usage of the data in the cache: each call to 'contains' or 'opIndex' increase maxNode so it could overflow quite quickly (an ushort isn't that big).
> 
>> Any suggestions for a quick and simple way to do this?
> 
> Just make the access counter 32bit, it should be enough for a 'normal' program (not an Apache style program though).
> 
>> Would it be reasonable to arbitrarily renumber everything in the cache serially (i.e., without considering the current nodeIds) whenever the maximum nodeId started to approach ushort.max?  (That's the direction that I'm leaning in...but I haven't decided whether or not it's reasonable.)
> 
> I'm not an expert in D's but I think that an array of
> struct Node{ Key key; Body bdy; ushort _nodeId; } or an array of
> struct Node{ Key key; Body bdy; uint   _nodeId; } have exactly the same size due to the padding..
> 
> And maybe on x86-64, you could use ulong without increasing the size of the array..
> 
> Could someone confirm/infirm?
> For me, key and bdy are reference, so they have the same size as pointers, am I right?
> 
> Regards,
> renoX
> 
> 
Well, they aren't necessarily pointers/references.  Look at the test examples.  OTOH, they easily COULD be.

I've done a slight redesign (not yet posted) that should answer your objections (I decided to renumber when necessary, but I made NodeId a type, so it's easy for anyone to change if they want to).

OTOH, Body MIGHT be rather large, so I feel that I need to split things up so that foreach won't necessarily copy the entire structure when only an access to the nodeId, e.g., is needed.
November 04, 2007
Charles D Hixson wrote:
> renoX wrote:
>> Charles D Hixson a écrit :
>>> renoX wrote:
>>>> Charles D Hixson a écrit :
>>>>> In the hopes that someone find this useful:
>>>>> http://www.prowiki.org/wiki4d/wiki.cgi?CharlesHixson
>>>>>
>>>>> This particular version presumes D2.x code, but I think the adjustments are trivial for D1.x.
>>>>
>>>> I think that there is a flaw in your code: the access counter is a ushort (16bit value) and is incremented each time you access an element in the cache: it can overflow quickly and will wrap around..
>>>>
>>>> This means that recently used elements could have a _nodeId smaller than nodes used less recently, which means that the overflow handling code will drop the wrong elements..
>>>>
>>>> This may not be the case in your usage, but this is either a bug or at least a pretty big restriction in the usage which should be clearly indicated.
>>>>
>>>>
>>>> Regards,
>>>> renoX
>>>
>>> Well, two things:
>>> 1)  This code isn't really adapted for large sets of data.  If you want that, you'd better hand optimize it for your intended application.
>>
>> I'm not talking about a large set of data, just about a program which runs for a long time and/or which make heavy usage of the data in the cache: each call to 'contains' or 'opIndex' increase maxNode so it could overflow quite quickly (an ushort isn't that big).
>>
>>> Any suggestions for a quick and simple way to do this?
>>
>> Just make the access counter 32bit, it should be enough for a 'normal' program (not an Apache style program though).
>>
>>> Would it be reasonable to arbitrarily renumber everything in the cache serially (i.e., without considering the current nodeIds) whenever the maximum nodeId started to approach ushort.max?  (That's the direction that I'm leaning in...but I haven't decided whether or not it's reasonable.)
>>
>> I'm not an expert in D's but I think that an array of
>> struct Node{ Key key; Body bdy; ushort _nodeId; } or an array of
>> struct Node{ Key key; Body bdy; uint   _nodeId; } have exactly the same size due to the padding..
>>
>> And maybe on x86-64, you could use ulong without increasing the size of the array..
>>
>> Could someone confirm/infirm?
>> For me, key and bdy are reference, so they have the same size as pointers, am I right?
>>
>> Regards,
>> renoX
>>
>>
> Well, they aren't necessarily pointers/references.  Look at the test examples.  OTOH, they easily COULD be.
> 
> I've done a slight redesign (not yet posted) that should answer your objections (I decided to renumber when necessary, but I made NodeId a type, so it's easy for anyone to change if they want to).
> 
> OTOH, Body MIGHT be rather large, so I feel that I need to split things up so that foreach won't necessarily copy the entire structure when only an access to the nodeId, e.g., is needed.
OK.
Version 0.2 has now been posted.  I switched from using structures to using arrays.  It's now on a page linked off my main page, because I wasn't able to get changes to the main page to stick.
http://www.prowiki.org/wiki4d/wiki.cgi?LruCache

I still haven't tested the renumbering...that will have to wait for something that uses it for an extended period of time, but the code is in there, if you need it.  (It's also easy to change the NodeId type from ushort to ulong if that's your preference.)
November 05, 2007
Reply to Charles,

> In the hopes that someone find this useful:
> http://www.prowiki.org/wiki4d/wiki.cgi?CharlesHixson
> This particular version presumes D2.x code, but I think the
> adjustments are trivial for D1.x.
> 

Would you like to add this to scrapple?

I will get you access if you would.


November 05, 2007
BCS wrote:
> Reply to Charles,
> 
>> In the hopes that someone find this useful:
>> http://www.prowiki.org/wiki4d/wiki.cgi?CharlesHixson
>> This particular version presumes D2.x code, but I think the
>> adjustments are trivial for D1.x.
>>
> 
> Would you like to add this to scrapple?
> 
> I will get you access if you would.
> 
> 
Sure.  I may do a few more small pieces, too, though I don't have any in mind right now.
« First   ‹ Prev
1 2