January 10, 2009
On 2009-01-09 23:14:07 -0500, dsimcha <dsimcha@yahoo.com> said:

> Wouldn't this also cause threading issues?

Well, I'd trust that compiler-generated code wouldn't cause threading issues. :-)


> The obvious solution would be to use TLS, but this requires duplicating the cache across threads.

Well, there aren't many choices: either it's TLS, or the cache needs to be synchronized (using either locks or lock-free algorithms; both add some overhead anyway).

That said, if you memoize a member function of a class, the memoization data could be added to the class instead of being global. If the object is not shared across threads, then the memoization data isn't either. If the object is shared, then you'd need either TLS or synchronization.
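
To sketch the idea (in C++ for illustration, since the compiler support being discussed is hypothetical; the class and names are invented):

```cpp
#include <cmath>
#include <cstddef>
#include <map>

// The memoization cache is a plain member, so it has exactly the
// same sharing/threading story as the object that owns it: no
// globals, and no TLS or locks needed unless the object itself
// crosses threads.
class Curve {
public:
    double value(double x) {
        auto it = cache_.find(x);
        if (it != cache_.end())
            return it->second;          // cache hit
        double r = std::exp(x) * 2.0;   // stand-in for an expensive computation
        cache_[x] = r;
        return r;
    }
    std::size_t cacheSize() const { return cache_.size(); }
private:
    std::map<double, double> cache_;    // per-object memoization data
};
```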


> Also, using AAs internally like this would lead to very deeply hidden memory allocations, and
> therefore possibly more frequent GC, etc.

Perhaps the cache needs to be a little smarter than a regular AA. You may not want to keep each and every value that was computed. Depending on the situation, keeping only the last 100 results may be enough, in which case you can dedicate a fixed amount of memory to caching.
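
For instance, a fixed-capacity cache that forgets the oldest entries first might look like this (a rough C++ sketch with invented names; a real one might prefer LRU eviction, but the fixed memory bound is the point):

```cpp
#include <cstddef>
#include <deque>
#include <unordered_map>

// A bounded memoization cache: once the capacity is reached, the
// oldest inserted entry is evicted (FIFO). Memory use is capped
// regardless of how many distinct inputs the program sees.
class BoundedCache {
public:
    explicit BoundedCache(std::size_t cap) : cap_(cap) {}

    bool get(int key, long& out) const {
        auto it = map_.find(key);
        if (it == map_.end()) return false;
        out = it->second;
        return true;
    }

    void put(int key, long value) {
        if (map_.count(key)) { map_[key] = value; return; }
        if (order_.size() == cap_) {        // full: evict the oldest entry
            map_.erase(order_.front());
            order_.pop_front();
        }
        order_.push_back(key);
        map_[key] = value;
    }

    std::size_t size() const { return map_.size(); }

private:
    std::size_t cap_;
    std::deque<int> order_;                 // keys in insertion order
    std::unordered_map<int, long> map_;
};
```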

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

January 10, 2009
Michel Fortin:

> So to determine if a function is worth memoizing, you have to say yes
> to those two things:
> 1.	is it faster than recomputing the return value?
> 2.	is this function called often with the same inputs?
> While the compiler could take an educated guess at answering 1 given
> the code of a function, question 2 can only be answered by knowing the
> usage pattern of various functions.

GCC has profile-driven optimization. I presume it could help with such problems too.


> Perhaps the cache needs to be a little smarter than a regular AA. You may not want to keep each and every value that was computed. Depending on the situation, keeping only the last 100 results may be enough, in which case you can dedicate a fixed amount of memory to caching.

Right. There are time-limited memoization strategies, and other ones as well. In Python I have seen plenty of them:
http://code.activestate.com/recipes/325905/
http://code.activestate.com/recipes/498110/

Bye,
bearophile
January 10, 2009
Michel Fortin wrote:
> On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
> 
>>> The problem is identifying if this would be faster than recomputing the return value.
>>
>> I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.
> 
> That's only true because, in your specific context, those functions were called often with the same input. In a program that rarely uses the same inputs, your memoizing functions will just waste time and memory.
> 
> So to determine if a function is worth memoizing, you have to say yes to those two things:
> 
> 1.    is it faster than recomputing the return value?
> 2.    is this function called often with the same inputs?
> 
> While the compiler could take an educated guess at answering 1 given the code of a function, question 2 can only be answered by knowing the usage pattern of various functions.

... so let's tell it what the usage patterns are!

1. Compile with profiling.

$ dmd -profile pure_program.d

2. Feed the profiled executable a big gob of test data.  Om nom nom.

$ for I in {0..99}; do ./pure_program < test_data_$I.dat; done

3. Recompile, giving the compiler the trace log, telling it which functions got called frequently, and where time was spent.

$ dmd -trace=trace.log pure_program.d

Voilà; now the compiler can make a well-informed guess at compile time.  All it really lacks is information on how many distinct argument sets any given pure function gets, but I imagine that could be added.

I think feeding the program test data with profiling enabled and then re-compiling would be a very acceptable trade off.

  -- Daniel
January 10, 2009
Michel Fortin wrote:
> On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
> 
>>> The problem is identifying if this would be faster than recomputing the return value.
>>
>> I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.
> 
> That's only true because, in your specific context, those functions were called often with the same input. In a program that rarely uses the same inputs, your memoizing functions will just waste time and memory.

No, because I use interpolation.
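
Roughly this kind of thing (a generic C++ sketch of table-plus-interpolation, not the actual code): every input in range hits the table at fixed cost, at the price of a small, controllable error.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Tabulated exp() over [lo, hi) with linear interpolation between
// sample points. Unlike memoization, this answers *every* input in
// the range in constant time; accuracy is set by the table size.
class ExpTable {
public:
    ExpTable(double lo, double hi, std::size_t n)
        : lo_(lo), step_((hi - lo) / n), tab_(n + 1) {
        for (std::size_t i = 0; i <= n; ++i)
            tab_[i] = std::exp(lo + i * step_);
    }
    double operator()(double x) const {
        double pos = (x - lo_) / step_;
        std::size_t i = static_cast<std::size_t>(pos);
        double frac = pos - i;                     // position between samples
        return tab_[i] * (1.0 - frac) + tab_[i + 1] * frac;
    }
private:
    double lo_, step_;
    std::vector<double> tab_;
};
```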

Andrei
January 10, 2009
Walter Bright wrote:
> Michel Fortin wrote:
>> Hum, could the compiler be trusted to add the memoization code to pure functions so they can stay pure?
> 
> If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values.
<snip>

Walter, before you go and implement any of this, I must point out that it's spelt "memorization" and "memorize".  (Or "memorisation" and "memorise" if you're British, but that's an aside.)

Stewart.
January 10, 2009
Stewart Gordon wrote:
<snip>
> Walter, before you go and implement any of this, I must point out that it's spelt "memorization" and "memorize".  (Or "memorisation" and "memorise" if you're British, but that's an aside.)

Or maybe not...
http://en.wikipedia.org/wiki/Memoization

Still, I'm confused....

Stewart.
January 10, 2009
Stewart Gordon wrote:
> Stewart Gordon wrote:
> <snip>
>> Walter, before you go and implement any of this, I must point out that it's spelt "memorization" and "memorize".  (Or "memorisation" and "memorise" if you're British, but that's an aside.)
> 
> Or maybe not...
> http://en.wikipedia.org/wiki/Memoization
> 
> Still, I'm confused....
> 
> Stewart.

I thought you were joking. What's confusing you?
January 10, 2009
Walter Bright wrote:
<snip>
> Memoization and purity don't mix. For each function, you'll have to pick which way you want to go. I don't see what that has to do with D's future, though, the language allows either.

Actually, what they don't mix with is the requirement that it still be callable on a mutable object.

There's
http://d.puremagic.com/issues/show_bug.cgi?id=1824
and, while it's a nice idea, it precludes caching of the result, unless you want to keep the cache separate from the object.

A pure toString may be a nice idea; however, this precludes using any mutable members of the class to generate it.  But it would also be nice to be able to call such methods on all objects of the class, mutable or not.

One possibility is to have separate functions

    string toString()
    string toString() invariant

and let the programmer implement memo(r)ization on the mutable version if desired, whereas the invariant version would have it generated by the compiler.  But there are still problems:
- nobody would know which to call if the reference is const, rather than mutable or invariant
- it would be necessary to maintain two versions of toString in parallel
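
For contrast, C++ papers over this exact tension with the `mutable` keyword, which lets a const method update a cache - precisely the loophole that D's transitive const/invariant rules out (a C++ sketch with an invented class):

```cpp
#include <string>

// C++'s escape hatch: a `mutable` member may be written even from a
// const method, so the cached representation can live in the object
// and toString can still be called on const objects. D's transitive
// const intentionally has no such loophole.
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}
    const std::string& toString() const {
        if (str_.empty())
            str_ = "(" + std::to_string(x_) + ", " + std::to_string(y_) + ")";
        return str_;                 // cached after the first call
    }
private:
    int x_, y_;
    mutable std::string str_;        // memoized string representation
};
```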

Maybe there's a solution somewhere out there....

Stewart.
January 10, 2009
Andrei Alexandrescu wrote:
> Walter Bright wrote:
>> Michel Fortin wrote:
>>> Hum, could the compiler be trusted to add the memoization code to pure functions so they can stay pure?
>>
>> If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values.
> 
> Not the bits, the full objects if they have indirection. (Pure functions should accept const parameters!)

You're right, the "just the bits on the stack" is if the arguments are immutable. Const arguments would have to include whatever was indirectly referenced, which I argue is impractical for the runtime to handle automatically.
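
A tiny illustration (an invented example, in C++): with indirection, two distinct pointers to equal contents must map to the same cache entry, so the key has to be built from the referenced data - which means walking and copying it, not just hashing the pointer bits.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Memoizing a function of a string: keying on the pointer value
// ("the bits on the stack") would treat two equal strings stored at
// different addresses as different inputs. Keying on the contents
// requires copying the referenced data - the cost being argued about.
static std::unordered_map<std::string, std::size_t> cache;
static int computations = 0;

std::size_t lengthSquared(const char* s) {
    std::string key(s);              // deep copy of the referenced data
    auto it = cache.find(key);
    if (it != cache.end()) return it->second;
    ++computations;                  // count actual (non-cached) computations
    std::size_t n = key.size();
    std::size_t r = n * n;
    cache[key] = r;
    return r;
}
```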
January 10, 2009
Christopher Wright wrote:
> Stewart Gordon wrote:
>> Stewart Gordon wrote:
>> <snip>
>>> Walter, before you go and implement any of this, I must point out that it's spelt "memorization" and "memorize".  (Or "memorisation" and "memorise" if you're British, but that's an aside.)
>>
>> Or maybe not...
>> http://en.wikipedia.org/wiki/Memoization
>>
>> Still, I'm confused....
>>
>> Stewart.
> 
> I thought you were joking. What's confusing you?

I forgot.