July 25, 2013
On 7/25/2013 11:49 AM, Dmitry S wrote:
> I am also confused by the numbers. What I see at the end of the article is
> "21.56 seconds, and the latest development version does it in 12.19", which is
> really a 43% improvement. (Which is really great too.)

21.56/12.19 is 1.77, i.e. a >75% improvement in speed.

A reduction in time would be the reciprocal of that.
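
For concreteness, here is a quick check of both readings of those timings (a small D snippet, illustrative only):

import std.stdio;

void main()
{
    double before = 21.56, after = 12.19;
    // How many times faster the new compiler runs.
    writefln("speedup: %.2fx (about %.0f%% faster)",
             before / after, (before / after - 1) * 100);
    // How much less wall-clock time the same job takes.
    writefln("time reduction: about %.0f%%", (1 - after / before) * 100);
    // Prints roughly 1.77x (77% faster) and a 43% time reduction --
    // two descriptions of the same measurement.
}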

July 25, 2013
On Thursday, July 25, 2013 14:25:20 Walter Bright wrote:
> Also, computing the hash is done exactly once, in the lexer. Thereafter, all identifiers are known only by their handles, which are (not coincidentally) the pointers to the identifiers, and which by their very nature are unique.

I've always thought that that was a neat trick. But you seem to have quite a few of those in the compiler that you've come up with or learned about over the years. We're definitely benefiting from your experience.

- Jonathan M Davis
July 25, 2013
On Thursday, 25 July 2013 at 21:25:17 UTC, Walter Bright wrote:
> Also, computing the hash is done exactly once, in the lexer. Thereafter, all identifiers are known only by their handles, which are (not coincidentally) the pointers to the identifiers, and which by their very nature are unique.

Like string interning?
July 25, 2013
On 7/25/2013 3:54 PM, Vladimir Panteleev wrote:
> Like string interning?

Exactly.
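
For readers unfamiliar with the idea, a minimal sketch of interning (hypothetical names, not DMD's actual identifier table): each distinct lexeme is stored exactly once, and every later lookup returns the same pointer, so identifier comparison becomes pointer comparison.

import std.stdio;

struct Identifier
{
    string text;
}

Identifier*[string] table;   // lexeme -> unique Identifier*

Identifier* intern(string lexeme)
{
    if (auto p = lexeme in table)
        return *p;               // already seen: hand back the same pointer
    auto id = new Identifier(lexeme);
    table[lexeme] = id;
    return id;
}

void main()
{
    auto a = intern("foo");
    auto b = intern("foo");
    assert(a is b);   // same handle: pointer equality is identifier equality
    writeln(a is b);  // true
}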
July 26, 2013
Walter Bright wrote to me on July 25 at 14:27:
> On 7/25/2013 11:49 AM, Dmitry S wrote:
> >I am also confused by the numbers. What I see at the end of the article is "21.56 seconds, and the latest development version does it in 12.19", which is really a 43% improvement. (Which is really great too.)
> 
> 21.56/12.19 is 1.77, i.e. a >75% improvement in speed.

This is certainly misleading; it is very easy to confuse it with a 75% reduction in time, which one would expect to mean 1/4 of the original time. :)

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
"CIRILO" Y "SIRACUSA" DE "SEÑORITA MAESTRA": UNO MUERTO Y OTRO PRESO
	-- Crónica TV
July 26, 2013
On 7/25/2013 4:15 PM, Leandro Lucarella wrote:
> Walter Bright wrote to me on July 25 at 14:27:
>> On 7/25/2013 11:49 AM, Dmitry S wrote:
>>> I am also confused by the numbers. What I see at the end of the article is
>>> "21.56 seconds, and the latest development version does it in 12.19", which is
>>> really a 43% improvement. (Which is really great too.)
>>
>> 21.56/12.19 is 1.77, i.e. a >75% improvement in speed.
>
> This is certainly misleading; it is very easy to confuse it with a 75%
> reduction in time, which one would expect to mean 1/4 of the original
> time. :)

I don't think it's misleading at all. Speed is distance/time. A change in speed (which is the title) is the reciprocal of a change in time.

For example, a doubling of speed (100% increase) is a halving of time (50% reduction).

July 26, 2013
In IT speed is time. Weight, volume and size are bytes. Kilo- is 1024. And other non-SI weird stuff.
July 26, 2013
> In IT speed is time.

That's probably because IT folks are loose with units. Though pedantry is OK.
July 26, 2013
Walter Bright:

> Hash collisions are not the problem - I sized the hash bucket array to make it fairly sparse. Neither is the hash algorithm.
>
> The slowness was in the frackin' "convert the hash to an index in the bucket", which is a modulus operation.

Thankfully, in that thread Paul Hsieh has given more precise suggestions; he's something of an expert on such matters.

Bye,
bearophile
July 26, 2013
On 26-Jul-2013 01:25, Walter Bright wrote:
> On 7/25/2013 1:00 PM, bearophile wrote:
>> Walter Bright:
>>
>>> It's not the hashing that's slow. It's the lookup that is.
>>
>> By "different hashing scheme" I meant different strategies in
>> resolving hash
>> collisions, likes double hashing, internal hashing, cuckoo hashing,
>> and so on
>> and on. Maybe one of such alternative strategies is more fit for the
>> needs of
>> dmd compilation. (I think that currently the Python dicts are using a
>> hashing
>> strategy different from the built-in dictionaries of D. The Python
>> style of
>> hashing was implemented in D some months ago, but I don't remember
>> what happened
>> to that project later).
>
> Hash collisions are not the problem - I sized the hash bucket array to
> make it fairly sparse. Neither is the hash algorithm.
>
> The slowness was in the frackin' "convert the hash to an index in the
> bucket", which is a modulus operation.

Then it's long past time to stop the madness of the modulo-prime table and use a power-of-2 size. In essence, all that modulo prime does is enhance the quality of your hash function w.r.t. collisions (it helps distribute values more evenly).

Any decent modern hash is good enough to work with the plain "slice off the lower bits" approach.
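
In other words, the win is replacing the division hidden in % with a single AND. A minimal sketch (illustrative only, not DMD's code), assuming the hash function already mixes its low bits well, since masking simply throws the high ones away:

// Table length is a prime: computing the index needs an integer division.
size_t indexModPrime(size_t hash, size_t primeLen)
{
    return hash % primeLen;
}

// Table length is a power of two: the index is a single AND with length-1.
size_t indexMask(size_t hash, size_t pow2Len)
{
    return hash & (pow2Len - 1);
}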

>
> Also, computing the hash is done exactly once, in the lexer.

All the more reason to use a good hash function and kill the modulo prime.

> Thereafter,
> all identifiers are known only by their handles, which are (not
> coincidentally) the pointers to the identifiers, and which by their
> very nature are unique.
>


-- 
Dmitry Olshansky