Improve the OOP ABI (page 4)

FNV1A is used for hashOf. hashOf is only a lot faster than md5 when you're dealing with small strings. I ran a test online, and as you'll see, this won't make any real difference in compilation time:

    import std;

    // Runs dg once and prints the elapsed wall-clock time in milliseconds.
    auto test(T)(scope T delegate() dg)
    {
        import std.datetime.stopwatch;
        StopWatch sw = StopWatch(AutoStart.yes);
        scope(exit){writeln(sw.peek.total!"msecs");}
        return dg();
    }

    // Builds a string made of n copies of a.
    string repeat(string a, int n)
    {
        char[] ret = new char[](a.length*n);
        foreach(i; 0..n)
            ret[i*a.length..(i+1)*a.length] = a[];
        return cast(string)ret;
    }

    void main()
    {
        import std.digest.md;

        foreach(count; [1, 5, 10, 15, 25, 35, 50, 100, 200])
        {
            string testString = "a".repeat(count);

            writeln("\nResult for 1_000_000 times executing hashes for a string if size ", count);
            // First timing printed: hashOf
            test(()
                 {
                     foreach(i; 0..1_000_000)
                         hashOf(testString);
                 });
            // Second timing printed: md5Of
            test(()
                 {
                     foreach(i; 0..1_000_000)
                         md5Of(testString);
                 });
        }
    }

This is the result I get for this program (for each size, the first number is hashOf and the second is md5Of, both in milliseconds):

Result for 1_000_000 times executing hashes for a string if size 1
12
130

Result for 1_000_000 times executing hashes for a string if size 5
25
132

Result for 1_000_000 times executing hashes for a string if size 10
36
129

Result for 1_000_000 times executing hashes for a string if size 15
41
124

Result for 1_000_000 times executing hashes for a string if size 25
54
127

Result for 1_000_000 times executing hashes for a string if size 35
69
120

Result for 1_000_000 times executing hashes for a string if size 50
94
116

Result for 1_000_000 times executing hashes for a string if size 100
176
217

Result for 1_000_000 times executing hashes for a string if size 200
379
394

As you can see, these are the results for hashing a string 1 million times. I can't think of many situations where anyone would do 1 million hashes. A big program will probably have around 200 string literals, and if you test that, it still takes 0 milliseconds. So MD5 is a pretty good solution for this problem. Switching would be too much effort for no real gain, and it could even create duplicates without saving a single millisecond :)
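The "200 names" estimate above can be checked with a minimal sketch like the following. The helper names `makeNames` and `hashAll` and the generated class names are hypothetical, chosen only for illustration; the only library calls used are `md5Of` from `std.digest.md` and `StopWatch` from `std.datetime.stopwatch`. Timings will vary by machine and compiler flags, so no expected figure is given.

```d
import std.conv : to;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.digest.md : md5Of;
import std.stdio : writeln;

// Hypothetical stand-ins for class names; a real program would supply its own.
string[] makeNames(size_t n)
{
    string[] names;
    foreach (i; 0 .. n)
        names ~= "app.module.ClassName" ~ i.to!string;
    return names;
}

// Hashes every name once with MD5 and returns the 16-byte digests.
ubyte[16][] hashAll(const string[] names)
{
    ubyte[16][] digests;
    foreach (name; names)
        digests ~= md5Of(name);
    return digests;
}

void main()
{
    auto names = makeNames(200);
    auto sw = StopWatch(AutoStart.yes);
    auto digests = hashAll(names);
    sw.stop();
    // Microseconds are used because the total is expected to be well under 1 ms.
    writeln(digests.length, " names hashed in ", sw.peek.total!"usecs", " usecs");
}
```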

January 22

Re: Improve the OOP ABI

Posted by ryuukk_
in reply to Hipreme

On Monday, 22 January 2024 at 14:22:33 UTC, Hipreme wrote:
> On Monday, 22 January 2024 at 13:48:20 UTC, Andrea Fontana wrote:
>> On Monday, 22 January 2024 at 13:47:34 UTC, Andrea Fontana wrote:
>>> On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:
>>>> On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana wrote:
>>>>> On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
>>>>>> On 9/28/2023 5:42 AM, deadalnix wrote:
>>>>>>> 1/ Downcast to final classes.
>>>>>>
>>>>>> This has a PR for it now.
>>>>>>
>>>>>> https://github.com/dlang/dmd/pull/16032
>>>>>
>>>>> Why md5 and not a faster method?
>>>>>
>>>>> Andrea
>>>>
>>>> That is good opportunity to submit a PR, try it, it's not that hard and is rewarding
>>>
>>> I know: I've already submitted PR to dmd and phobos :)
>>>
>>> For example, this one. Easy, fast, insecure (who cares, in this case).
>>> http://www.isthe.com/chongo/tech/comp/fnv/
>>>
>>> Andrea
>>
>> The right one is FNV-1a:
>> http://www.isthe.com/chongo/src/fnv/hash_32a.c

It hashes class names, which will probably never be more than 30 characters, so md5 in your example would be about 2x slower.

I think it's worth doing more benchmarks. Even a few milliseconds gained is worthwhile: those are milliseconds you could spend doing more template work, for example. It's additive.
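For readers who want to benchmark the FNV-1a variant linked earlier in the thread, here is a minimal sketch of the 32-bit version. The function name `fnv1a32` and the sample class names are hypothetical; the offset basis and prime are the published 32-bit FNV constants. This is only an illustration of the algorithm, not what dmd or druntime uses, and a 32-bit hash has a far higher collision probability than a 128-bit MD5 digest.

```d
import std.stdio : writefln;

// 32-bit FNV-1a over the bytes of a string.
// Constants are the published FNV offset basis and prime for 32 bits.
uint fnv1a32(const(char)[] data) pure nothrow @nogc @safe
{
    uint hash = 2166136261u;      // FNV offset basis
    foreach (char c; data)
    {
        hash ^= c;                // xor in the next byte first (the "1a" order)
        hash *= 16777619u;        // then multiply by the FNV prime (wraps mod 2^32)
    }
    return hash;
}

void main()
{
    // Hypothetical class names, chosen only for illustration.
    foreach (name; ["object.Object", "app.Widget", "app.gui.Button"])
        writefln("%08x  %s", fnv1a32(name), name);
}
```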

January 22

Re: Improve the OOP ABI

Posted by Andrea Fontana
in reply to ryuukk_

On Monday, 22 January 2024 at 15:46:02 UTC, ryuukk_ wrote:

> It hashes class names, which will probably never be more than 30 characters, so md5 in your example would be about 2x slower.
>
> I think it's worth doing more benchmarks. Even a few milliseconds gained is worthwhile: those are milliseconds you could spend doing more template work, for example. It's additive.

Indeed. And probably xxhash would be even faster if well optimized, I guess.

Andrea

January 22

Re: Improve the OOP ABI

Posted by Walter Bright
in reply to Andrea Fontana

On 1/22/2024 2:08 AM, Andrea Fontana wrote:
> Why md5 and not a faster method?

Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans.

A replacement that has a perceptible collision rate will be unacceptable.

This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.
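A minimal sketch of what "the hash is a substitute for the string compare" could look like, assuming a hypothetical `ClassId` record that stores only the digest. The struct and method names are invented for illustration and do not reflect the actual layout used in the dmd PR.

```d
import std.digest.md : md5Of;

// Hypothetical stand-in for per-class type info: only the digest is stored,
// so equality of digests is the whole comparison.
struct ClassId
{
    ubyte[16] digest;

    this(string fullyQualifiedName)
    {
        digest = md5Of(fullyQualifiedName);
    }

    bool sameClass(const ClassId other) const
    {
        return digest == other.digest;   // no string compare as a fallback
    }
}

void main()
{
    auto a = ClassId("app.gui.Button");
    auto b = ClassId("app.gui.Button");
    auto c = ClassId("app.gui.Label");
    assert(a.sameClass(b));
    assert(!a.sameClass(c));
}
```

With this design a collision would make two different classes compare equal, which is why the collision rate of the chosen hash matters so much here.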

January 23

Re: Improve the OOP ABI

Posted by Richard (Rikki) Andrew Cattermole
in reply to Walter Bright

On 23/01/2024 9:46 AM, Walter Bright wrote:
> On 1/22/2024 2:08 AM, Andrea Fontana wrote:
>> Why md5 and not a faster method?
> 
> Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans.
> 
> A replacement that has a perceptible collision rate will be unacceptable.
> 
> This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.

Quite a few months ago I discussed this problem for class comparison and proposed the hash to deadalnix.

Not as a substitute for the string comparison, but as a limiter on how often it needs to be done.

The string comparison is the only way to do the comparison reliably. You can't use the pointers, nor can you do the string comparison all the time: it's too expensive.

So by comparing a hash the size of, say, an int, we can reliably determine that it is not a match and fail fast. Sadly there is no way to succeed fast without potential problems, and you seem to be hand-waving those problems away to get a fast success.
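The fail-fast scheme described above can be sketched as follows. The `ClassName` struct is hypothetical, and truncating druntime's `hashOf` to a `uint` is just one assumed way to get an int-sized hash. A hash mismatch rejects immediately; a hash match still falls through to the string comparison.

```d
// Hypothetical per-class record: a small hash plus the full name.
struct ClassName
{
    uint hash;
    string name;

    this(string fullyQualifiedName)
    {
        name = fullyQualifiedName;
        // hashOf comes from druntime's object module; truncated to 32 bits here.
        hash = cast(uint) hashOf(fullyQualifiedName);
    }

    bool sameClass(const ClassName other) const
    {
        if (hash != other.hash)
            return false;              // fail fast: different hashes, different names
        return name == other.name;     // equal hashes still need the string compare
    }
}

void main()
{
    auto a = ClassName("app.gui.Button");
    assert(a.sameClass(ClassName("app.gui.Button")));
    assert(!a.sameClass(ClassName("app.gui.Label")));
}
```

The trade-off against the digest-only approach is size and speed on success: the name has to be kept around, and every successful match pays for a full string compare.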

January 22

Re: Improve the OOP ABI

Posted by Bruce Carneal
in reply to Walter Bright

On Monday, 22 January 2024 at 20:46:26 UTC, Walter Bright wrote:
> On 1/22/2024 2:08 AM, Andrea Fontana wrote:
>> Why md5 and not a faster method?
>
> Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans.
>
> A replacement that has a perceptible collision rate will be unacceptable.
>
> This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.

Blake3 might be worth a look.  It's reportedly faster and stronger than md5.

https://github.com/BLAKE3-team/BLAKE3

January 22

Re: Improve the OOP ABI

Posted by Walter Bright
in reply to Richard (Rikki) Andrew Cattermole

On 1/22/2024 12:57 PM, Richard (Rikki) Andrew Cattermole wrote:
> The string comparison is the only way to do the comparison reliably.


https://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions

One collision in 6 billion hashes per second for 100 years.
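As a rough sanity check on that figure, and assuming it refers to the birthday bound for a 128-bit digest (about 2^64 random inputs before a collision starts to become likely), the arithmetic can be sketched like this. The function name `hashesComputed` and the 365.25-day year are assumptions made for illustration.

```d
import std.math : pow;
import std.stdio : writefln;

// Number of hashes computed at `ratePerSecond` over `years` years (365.25-day years).
double hashesComputed(double ratePerSecond, double years)
{
    enum double secondsPerYear = 365.25 * 24 * 60 * 60;
    return ratePerSecond * secondsPerYear * years;
}

void main()
{
    immutable total = hashesComputed(6e9, 100);
    immutable birthdayBound = pow(2.0, 64);   // square root of the 2^128 possible MD5 outputs
    writefln("hashes computed: %.3g", total);
    writefln("2^64:            %.3g", birthdayBound);
}
```

The two values come out within a few percent of each other, which is consistent with the quoted rate and duration being a restatement of the 2^64 bound. Note that this bound applies to random inputs, not to deliberately constructed collisions.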
