July 30, 2013
On Tuesday, 30 July 2013 at 10:08:22 UTC, Leandro Lucarella wrote:
> JS, el 29 de July a las 22:32 me escribiste:
>> On Monday, 29 July 2013 at 19:38:51 UTC, Walter Bright wrote:
>> >On 7/29/2013 12:08 PM, JS wrote:
>> >>Trying to use distance and speed as a measure of performance of
>> >>a program is
>> >>just ridiculous.
>> >
>> >If you google "program execution speed" you'll find it's a
>> >commonly used term. "Lines per second" is a common measure of
>> >compiler execution speed - google "compiler lines per second" and
>> >see.
>> >
>> >
>> >>(again, if we started with 12 second and went to 21 seconds, it
>> >>would be a near
>> >>75% increase. But a 75% increase is not a 75% decrease!!!!!!!!)
>> >
>> >Speed is the reciprocal of time, meaning a decrease in time is an
>> >increase in speed.
>> 
>> You are right, sorry. There is no difference.
>> 
>> I think the issue is interpretation. When I read "X% increase in
>> speed" I think "X% faster [in time]".
>
> I just want to point out that being so much people getting this wrong
> (and even fighting to convince other people that the wrong
> interpretation is right) might be an indication that the message you
> wanted to give in that blog is not extremely clear :)
>
> That was my whole point. If you used some easier measure to understand
> (like using time instead of speed) you could have avoided all this
> confusion :)

It depends on the audience, but to write an article that is informal then use a very formal definition of speed is just begging for problems.

It's hard for me to really imagine that lines is a measurement of distance and speed is lines per second... but even if I allow this I don't understand why it is a better metric than just time alone(even though it is equivalent mathematically it is a convoluted approach).

If a compiler can compile n lines per second, is that really all that useful as a direct measurement? Are you saying I have to go count the lines of code in my program? Do I count empty lines? commented lines? etc...  Why don't you just give me the most direct answer?

Time is really all that matters, everything else is just obfuscation.
July 30, 2013
26-Jul-2013 23:17, Walter Bright пишет:
> On 7/26/2013 5:11 AM, Dmitry Olshansky wrote:
>> 26-Jul-2013 14:47, Dmitry Olshansky пишет:
>>> 26-Jul-2013 01:25, Walter Bright пишет:
>>
>>>> The slowness was in the frackin' "convert the hash to an index in the
>>>> bucket", which is a modulus operation.
>>>
>>> Then it's past due to finally stop the madness of modulo prime table and
>>> use a power of 2 size. In essence what modulo prime does is simply
>>> enhancing the quality of your hash function w.r.t. collisions (it helps
>>> to distribute values more evenly).
>>>
>>> Any of new decent hashes are good enough to work with plain slice the
>>> lower bits approach.
>>>
>>
>> To be more concrete:
>> Spooky hash
>> http://burtleburtle.net/bob/hash/spooky.html (Public domain)
>> S-box hash
>> http://home.comcast.net/~bretm/hash/10.html (Published paper)
>>
>> Or even a good ol' FNV (Public domain)
>> http://isthe.com/chongo/tech/comp/fnv/#FNV-1a
>>
>
> How about a pull request so we can try it out?

Preliminary pull is here:
https://github.com/D-Programming-Language/dmd/pull/2436

So far it looses a bit. I'm still playing with it, looking at load factor, distribution of sizes, profiling.

So far observations are that SpookyHash is slower then the one that was there thus stealing a few percents of speed. That is then hardly regained by a faster lookup of a slot:
almost all of "large" tables were 31 in size and you have special case for that anyway.

What bothers me is that while I've been hacking at this I couldn't shake off the feeling that AA code assumes NO FULL HASH COLLISIONS at all?

Isn't that betting on luck (and a crude hash) a little too much (esp in 32 bit mode)?

That is e.g. in code pasted from aav.c Key is only a hash and there is no way whatsoever to discern a full hash collision.

Value _aaGetRvalue(AA* aa, Key key)
{
    //printf("_aaGetRvalue(key = %p)\n", key);
    if (aa)
    {
        size_t i;
        size_t len = aa->b_length;
        if (len == 4)
            i = (size_t)key & 3;
        else if (len == 31)
            i = (size_t)key % 31;
        else
            i = (size_t)key % len;
        aaA* e = aa->b[i];
        while (e)
        {
            if (key == e->key)
                return e->value;
            e = e->next;
        }
    }
    return NULL;    // not found
}

-- 
Dmitry Olshansky
July 30, 2013
On 7/30/13 2:59 AM, Leandro Lucarella wrote:
> I just want to point out that being so much people getting this wrong
> (and even fighting to convince other people that the wrong
> interpretation is right) might be an indication that the message you
> wanted to give in that blog is not extremely clear :)
>
> That was my whole point. If you used some easier measure to understand
> (like using time instead of speed) you could have avoided all this
> confusion :)

I tend to disagree on this, and I think the title of the article is good as is (I've reviewed it).

I'd say if a programmer doesn't have a clear notion of what speed is and how comparisons go etc., they better learn it pronto. There's nothing complicated here, and if something is not obvious to some readers this is even better because the article serves as an educational tool in more ways than one.

Speed involves time at the denominator, i.e. seconds come "to the power of -1". I don't think it's fair to negotiate whether this is expected. Every engineer must know this.

Percentage changes go as follows. If you "increase something by 100%" it means you doubled it. If you "decrease something by 50%" it means you halved it. Everything else can be easily derived from these. Again, I consider this non-negotiable background knowledge.


Andrei
July 30, 2013
On 7/30/13 4:10 AM, JS wrote:
> It depends on the audience, but to write an article that is informal
> then use a very formal definition of speed is just begging for problems.
>
> It's hard for me to really imagine that lines is a measurement of
> distance and speed is lines per second... but even if I allow this I
> don't understand why it is a better metric than just time alone(even
> though it is equivalent mathematically it is a convoluted approach).
>
> If a compiler can compile n lines per second, is that really all that
> useful as a direct measurement? Are you saying I have to go count the
> lines of code in my program? Do I count empty lines? commented lines?
> etc... Why don't you just give me the most direct answer?
>
> Time is really all that matters, everything else is just obfuscation.

This is wrong on many levels.

Andrei
July 30, 2013
On 7/30/2013 2:59 AM, Leandro Lucarella wrote:
> I just want to point out that being so much people getting this wrong
> (and even fighting to convince other people that the wrong
> interpretation is right) might be an indication that the message you
> wanted to give in that blog is not extremely clear :)

It never occurred to me that anyone would have any difficulty understanding the notion of "speed". After all, we deal with it every day when driving.

July 30, 2013
On 7/30/2013 11:02 AM, Dmitry Olshansky wrote:
> 26-Jul-2013 23:17, Walter Bright пишет:
>> How about a pull request so we can try it out?
>
> Preliminary pull is here:
> https://github.com/D-Programming-Language/dmd/pull/2436
>
> So far it looses a bit.

:-) That's often been my experience.


> I'm still playing with it, looking at load factor,
> distribution of sizes, profiling.
>
> So far observations are that SpookyHash is slower then the one that was there
> thus stealing a few percents of speed. That is then hardly regained by a faster
> lookup of a slot:
> almost all of "large" tables were 31 in size and you have special case for that
> anyway.
>
> What bothers me is that while I've been hacking at this I couldn't shake off the
> feeling that AA code assumes NO FULL HASH COLLISIONS at all?

I don't know what you mean, as it has a collision resolution system. See embedded code below.


> Isn't that betting on luck (and a crude hash) a little too much (esp in 32 bit
> mode)?
>
> That is e.g. in code pasted from aav.c Key is only a hash and there is no way
> whatsoever to discern a full hash collision.
>
> Value _aaGetRvalue(AA* aa, Key key)
> {
>      //printf("_aaGetRvalue(key = %p)\n", key);
>      if (aa)
>      {
>          size_t i;
>          size_t len = aa->b_length;
>          if (len == 4)
>              i = (size_t)key & 3;
>          else if (len == 31)
>              i = (size_t)key % 31;
>          else
>              i = (size_t)key % len;
>          aaA* e = aa->b[i];
>          while (e)
>          {
>              if (key == e->key)
>                  return e->value;
>              e = e->next;

**** ^^^ collision resolution code ^^^ *****

>          }
>      }
>      return NULL;    // not found
> }


July 30, 2013
On 7/30/13 11:13 AM, Walter Bright wrote:
> On 7/30/2013 2:59 AM, Leandro Lucarella wrote:
>> I just want to point out that being so much people getting this wrong
>> (and even fighting to convince other people that the wrong
>> interpretation is right) might be an indication that the message you
>> wanted to give in that blog is not extremely clear :)
>
> It never occurred to me that anyone would have any difficulty
> understanding the notion of "speed". After all, we deal with it every
> day when driving.

And when applying the engine brake. Today, I reduced my speed by engine braking by 50%!

Andrei


July 30, 2013
On Tue, Jul 30, 2013 at 12:05 PM, Andrei Alexandrescu < SeeWebsiteForEmail@erdani.org> wrote:

> On 7/30/13 11:13 AM, Walter Bright wrote:
>
>> On 7/30/2013 2:59 AM, Leandro Lucarella wrote:
>>
>>> I just want to point out that being so much people getting this wrong (and even fighting to convince other people that the wrong interpretation is right) might be an indication that the message you wanted to give in that blog is not extremely clear :)
>>>
>>
>> It never occurred to me that anyone would have any difficulty understanding the notion of "speed". After all, we deal with it every day when driving.
>>
>
Yeh sure.  Like "I made the trip to grandmother's house in 0.25
trips/hour!.  That's 25% faster than last week when I only drove at 0.2
trips/hour."
I say that all the time.  ;-)

--bb


July 31, 2013
30-Jul-2013 22:22, Walter Bright пишет:
> On 7/30/2013 11:02 AM, Dmitry Olshansky wrote:

>>
>> What bothers me is that while I've been hacking at this I couldn't
>> shake off the
>> feeling that AA code assumes NO FULL HASH COLLISIONS at all?
>
> I don't know what you mean, as it has a collision resolution system. See
> embedded code below.

Yes but it does so using full _hash_ alone.
Basically Key is size_t, if we store strings in this AA and they hash to exactly the same size_t "key" then you'll never find one of them.


>> Value _aaGetRvalue(AA* aa, Key key)
>> {
>>      //printf("_aaGetRvalue(key = %p)\n", key);
>>      if (aa)
>>      {
>>          size_t i;
>>          size_t len = aa->b_length;
>>          if (len == 4)
>>              i = (size_t)key & 3;
>>          else if (len == 31)
>>              i = (size_t)key % 31;
>>          else
>>              i = (size_t)key % len;
>>          aaA* e = aa->b[i];

***   ^^^ obviously key is only a hash value ***

>>          while (e)
>>          {
>>              if (key == e->key)
>>                  return e->value;
>>              e = e->next;
>
> **** ^^^ collision resolution code ^^^ *****


Here key is 32 bits. Surely 2 strings can hash to the exact same 32 bit value. This resolves only slot collision. It doesn't resolve full hash collision.

>
>>          }
>>      }
>>      return NULL;    // not found
>> }
>
>


-- 
Dmitry Olshansky
July 31, 2013
On 7/31/2013 1:49 AM, Dmitry Olshansky wrote:
> Here key is 32 bits. Surely 2 strings can hash to the exact same 32 bit value.

No, they cannot. The "hash value" is a pointer to the string. The strings are already inserted into another hash table, so all strings that are the same are combined. Therefore, all unique strings "hash" to unique values.

> This resolves only slot collision. It doesn't resolve full hash collision.

If it was broken the compiler wouldn't work at all :-)