On Feb 9, 2012, at 2:17 PM, Sean Kelly wrote:

Best first-order optimization would be to allocate the list node deterministically.

Neat idea.  I think I can make that change fairly trivially.

$ time abc

real 0m0.556s
user 0m0.555s
sys 0m0.001s

So another 100ms improvement.  Switching to a (__gshared, no mutex) free-list that falls back on malloc yields:

$ time abc

real 0m0.505s
user 0m0.503s
sys 0m0.001s

Not as much of a gain there, and I believe we've eliminated all the allocations (though I'd have to do a profile build to verify).  Still, that's approaching twice as fast as before, which is definitely something.
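
For anyone following along, here is a rough sketch of the kind of free list described above: nodes are recycled through a global list, and malloc is only hit when the list is empty. This is not the actual runtime code. It is written in C rather than D, the Node layout and the names node_alloc/node_release are made up for illustration, and a plain C global stands in for a __gshared variable. As with the no-mutex version measured above, nothing here is thread-safe.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node; the real node layout is not shown in this thread. */
typedef struct Node {
    struct Node *next;
    int payload;
} Node;

/* Head of the free list. A plain C global plays the role of a D __gshared
   variable here: one instance shared by all threads, with no locking. */
static Node *free_head = NULL;

/* Pop a recycled node if one is available, otherwise fall back on malloc. */
Node *node_alloc(void)
{
    Node *n = free_head;
    if (n != NULL)
        free_head = n->next;      /* reuse: no allocator call */
    else
        n = malloc(sizeof *n);    /* free list empty: fall back on malloc */
    if (n != NULL) {
        n->next = NULL;
        n->payload = 0;
    }
    return n;
}

/* Push a node back onto the free list instead of calling free. */
void node_release(Node *n)
{
    if (n == NULL)
        return;
    n->next = free_head;
    free_head = n;
}
```

Once the list has warmed up, a release followed by an alloc hands back the same node without touching the allocator, which is presumably where the remaining gain comes from.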