January 22

On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:
> On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana wrote:
>> On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
>>> On 9/28/2023 5:42 AM, deadalnix wrote:
>>>> 1/ Downcast to final classes.
>>>
>>> This has a PR for it now.
>>>
>>> https://github.com/dlang/dmd/pull/16032
>>
>> Why md5 and not a faster method?
>>
>> Andrea
>
> That is a good opportunity to submit a PR. Try it: it's not that hard and it's rewarding.

I know: I've already submitted PRs to dmd and phobos :)

For example, this one. Easy, fast, insecure (who cares, in this case).
http://www.isthe.com/chongo/tech/comp/fnv/

Andrea

January 22

On Monday, 22 January 2024 at 13:47:34 UTC, Andrea Fontana wrote:

> I know: I've already submitted PRs to dmd and phobos :)
>
> For example, this one. Easy, fast, insecure (who cares, in this case).
> http://www.isthe.com/chongo/tech/comp/fnv/
>
> Andrea

The right one is FNV-1a:
http://www.isthe.com/chongo/src/fnv/hash_32a.c
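
For reference, a minimal sketch of 32-bit FNV-1a in D, using the standard 32-bit offset basis and prime for FNV. The function name `fnv1a32` is made up for illustration; it is not part of druntime or Phobos, and it is not what the linked PR uses.

```d
import std.stdio : writefln;
import std.string : representation;

// Hypothetical helper (illustrative only, not a druntime/Phobos function).
uint fnv1a32(const(ubyte)[] data) pure nothrow @nogc @safe
{
    uint hash = 2166136261u;   // FNV 32-bit offset basis
    foreach (b; data)
    {
        hash ^= b;             // FNV-1a: xor the byte in first...
        hash *= 16777619u;     // ...then multiply by the FNV prime (wraps mod 2^32)
    }
    return hash;
}

void main()
{
    // Print the hash of a sample string as 8 hex digits.
    writefln("%08x", fnv1a32("foobar".representation));
}
```

The only difference from plain FNV-1 is the order of the two steps inside the loop: FNV-1 multiplies first and then xors.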

January 22

On Monday, 22 January 2024 at 13:48:20 UTC, Andrea Fontana wrote:

> The right one is FNV-1a:
> http://www.isthe.com/chongo/src/fnv/hash_32a.c

Probably xxhash is even faster, but there's no public domain implementation.

January 22

On Monday, 22 January 2024 at 13:48:20 UTC, Andrea Fontana wrote:

> The right one is FNV-1a:
> http://www.isthe.com/chongo/src/fnv/hash_32a.c

FNV-1a is used for hashOf. hashOf seems to be a lot faster than MD5 only when you're dealing with smaller strings. I ran a test online, and as you'll see, this won't make an actual difference in compilation time:

    import std;

    // Runs dg once and prints the elapsed wall-clock time in milliseconds.
    auto test(T)(scope T delegate() dg)
    {
        import std.datetime.stopwatch;
        StopWatch sw = StopWatch(AutoStart.yes);
        scope(exit) writeln(sw.peek.total!"msecs");
        return dg();
    }

    // Builds a string made of n copies of a.
    string repeat(string a, int n)
    {
        char[] ret = new char[](a.length * n);
        foreach (i; 0 .. n)
            ret[i * a.length .. (i + 1) * a.length] = a[];
        return cast(string) ret;
    }

    void main()
    {
        import std.digest.md;

        foreach (count; [1, 5, 10, 15, 25, 35, 50, 100, 200])
        {
            string testString = "a".repeat(count);

            writeln("\nResult for 1_000_000 times executing hashes for a string if size ", count);
            // First number printed: time for hashOf.
            test(()
                 {
                     foreach (i; 0 .. 1_000_000)
                         hashOf(testString);
                 });
            // Second number printed: time for md5Of.
            test(()
                 {
                     foreach (i; 0 .. 1_000_000)
                         md5Of(testString);
                 });
        }
    }

These are the results I got from this program (times in milliseconds; for each size, the first number is hashOf and the second is md5Of):

Result for 1_000_000 times executing hashes for a string if size 1
12
130

Result for 1_000_000 times executing hashes for a string if size 5
25
132

Result for 1_000_000 times executing hashes for a string if size 10
36
129

Result for 1_000_000 times executing hashes for a string if size 15
41
124

Result for 1_000_000 times executing hashes for a string if size 25
54
127

Result for 1_000_000 times executing hashes for a string if size 35
69
120

Result for 1_000_000 times executing hashes for a string if size 50
94
116

Result for 1_000_000 times executing hashes for a string if size 100
176
217

Result for 1_000_000 times executing hashes for a string if size 200
379
394

As you can see, these are the results for hashing strings 1 million times. I can't think of many situations where people would be doing 1 million hashes. A big program will probably have around 200 string literals, and if you test that, it is still 0 milliseconds. So MD5 is a pretty good solution for that problem, and replacing it would be too much effort for no actual gain; it could even create duplicate strings for no millisecond gain :)

January 22

On Monday, 22 January 2024 at 14:22:33 UTC, Hipreme wrote:

> As you can see, these are the results for hashing strings 1 million times. I can't think of many situations where people would be doing 1 million hashes. A big program will probably have around 200 string literals, and if you test that, it is still 0 milliseconds. So MD5 is a pretty good solution for that problem, and replacing it would be too much effort for no actual gain; it could even create duplicate strings for no millisecond gain :)

It hashes class names, which will probably never be more than 30 characters, so MD5 in your example would be about 2x slower.

I think it's worth doing more benchmarks. Even a few milliseconds gained is worthwhile: those are milliseconds you could spend doing more template work, for example. It's additive.
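
A benchmark focused on class-name-sized input could be sketched as follows, using `benchmark` from `std.datetime.stopwatch`. The sample name, the helper functions, and the iteration count are all assumptions chosen for illustration; timings will vary by machine and compiler flags, so none are shown here.

```d
import std.datetime.stopwatch : benchmark;
import std.digest.md : md5Of;
import std.stdio : writefln;

// Assumed sample input: a made-up fully qualified class name (28 characters).
string className = "mypackage.mymodule.MyWidget1";

// Results are stored in module-level variables so the calls have a visible effect.
size_t sinkHash;
ubyte[16] sinkMd5;

void runHashOf() { sinkHash = hashOf(className); }
void runMd5()    { sinkMd5 = md5Of(className); }

void main()
{
    // Runs each function 1_000_000 times and returns one Duration per function.
    auto results = benchmark!(runHashOf, runMd5)(1_000_000);
    writefln("hashOf: %s ms", results[0].total!"msecs");
    writefln("md5Of:  %s ms", results[1].total!"msecs");
}
```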

January 22

On Monday, 22 January 2024 at 15:46:02 UTC, ryuukk_ wrote:

>> As you can see, these are the results for hashing strings 1 million times. I can't think of many situations where people would be doing 1 million hashes. A big program will probably have around 200 string literals, and if you test that, it is still 0 milliseconds. So MD5 is a pretty good solution for that problem, and replacing it would be too much effort for no actual gain; it could even create duplicate strings for no millisecond gain :)
>
> It hashes class names, which will probably never be more than 30 characters, so MD5 in your example would be about 2x slower.
>
> I think it's worth doing more benchmarks. Even a few milliseconds gained is worthwhile: those are milliseconds you could spend doing more template work, for example. It's additive.

Indeed. And probably xxhash would be even faster if well optimized, I guess.

Andrea

January 22
On 1/22/2024 2:08 AM, Andrea Fontana wrote:
> Why md5 and not a faster method?

Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans.

A replacement that has a perceptible collision rate will be unacceptable.

This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.
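
To put "perceptible collision rate" in numbers, here is a small sketch using the standard birthday approximation, P ≈ 1 - exp(-n(n-1) / 2^(bits+1)). The figure of 10_000 class names is an assumed program size for illustration, and the approximation treats the hash as uniformly random, which is an idealization.

```d
import std.math : expm1, pow;
import std.stdio : writefln;

// Birthday approximation for the chance that at least two of n items
// share a hash value, for an idealized uniform hash of the given width.
double collisionProbability(double n, int bits)
{
    immutable double space = pow(2.0, bits);
    // -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -expm1(-n * (n - 1) / (2.0 * space));
}

void main()
{
    // 10_000 names is an assumption, not a measurement of any real program.
    foreach (bits; [32, 64, 128])
        writefln("%3d-bit hash, 10_000 names: %.3g", bits,
                 collisionProbability(10_000, bits));
}
```

Under these assumptions a 32-bit hash gives roughly a 1% chance of at least one collision among 10_000 names, while a 128-bit hash such as MD5 gives a probability far below anything observable.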

January 23
On 23/01/2024 9:46 AM, Walter Bright wrote:
> On 1/22/2024 2:08 AM, Andrea Fontana wrote:
>> Why md5 and not a faster method?
> 
> Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans.
> 
> A replacement that has a perceptible collision rate will be unacceptable.
> 
> This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.

Quite a few months ago I discussed this problem for class comparison and proposed the hash to deadalnix.

But not as a substitute for the string comparison, but as a limiter for how often it needs to be done.

The string comparison is the only way to do the comparison reliably. You can't use the pointers, and you can't do the string comparison all the time either; it is too expensive.

So by comparing a hash of, say, int size, we can reliably determine that it is not a match and fail fast. Sadly, there is no way to succeed fast without potential problems, and you seem to be hand-waving those problems away in order to get the fast success.
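
A rough sketch of that fail-fast scheme in D is shown below. `TypeTag` and `sameClass` are hypothetical names used only for illustration; this is not the druntime `TypeInfo` layout and not the approach taken in the linked PR.

```d
import std.stdio : writeln;

// Hypothetical stand-in for per-class type information.
struct TypeTag
{
    string name;   // fully qualified class name
    size_t hash;   // hash of name, computed once up front

    this(string name)
    {
        this.name = name;
        this.hash = hashOf(name);
    }
}

// The hash is only a filter: a mismatch proves the classes differ,
// while a match still has to be confirmed by comparing the names.
bool sameClass(in TypeTag a, in TypeTag b)
{
    if (a.hash != b.hash)
        return false;          // fail fast, no string comparison needed
    return a.name == b.name;   // hashes agree, so confirm with the string
}

void main()
{
    auto a = TypeTag("app.Widget");
    auto b = TypeTag("app.Widget");
    auto c = TypeTag("app.Gadget");
    writeln(sameClass(a, b));  // true
    writeln(sameClass(a, c));  // false
}
```

With this design a hash collision costs one extra string comparison instead of producing a wrong answer, at the price of never being able to skip the comparison on a match.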

January 22
On Monday, 22 January 2024 at 20:46:26 UTC, Walter Bright wrote:
> On 1/22/2024 2:08 AM, Andrea Fontana wrote:
>> Why md5 and not a faster method?
>
> Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans.
>
> A replacement that has a perceptible collision rate will be unacceptable.
>
> This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.

Blake3 might be worth a look.  It's reportedly faster and stronger than md5.

https://github.com/BLAKE3-team/BLAKE3

January 22
On 1/22/2024 12:57 PM, Richard (Rikki) Andrew Cattermole wrote:
> The string comparison is the only way to do the comparison reliably.


https://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions

That works out to about one expected collision if you computed 6 billion hashes per second for 100 years.
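
The arithmetic behind that figure can be checked with a few lines of D. The sketch assumes the usual birthday bound for a 128-bit hash, about 2^64 values before a collision becomes likely, and a 365.25-day year; `totalHashes` is a helper name made up for this example.

```d
import std.stdio : writefln;

// Total number of hashes produced at a steady rate over a number of years.
double totalHashes(double perSecond, double years)
{
    enum double secondsPerYear = 365.25 * 24 * 3600;
    return perSecond * secondsPerYear * years;
}

void main()
{
    immutable double produced = totalHashes(6e9, 100);
    immutable double birthdayBound = 2.0 * (1UL << 63); // 2^64 as a double
    writefln("hashes produced: %.3g", produced);
    writefln("2^64:            %.3g", birthdayBound);
    writefln("ratio:           %.2f", produced / birthdayBound);
}
```

Both quantities come out at roughly 1.9e19 and 1.8e19, so the quoted rate and duration line up with the 2^64 birthday bound.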