January 23

On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:

>

On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:

>

On 1/22/2024 1:11 PM, Bruce Carneal wrote:

>

Blake3 might be worth a look.  It's reportedly faster and stronger than md5.

https://github.com/BLAKE3-team/BLAKE3

Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.

Blake is > 128 bits, so I don't think there is anything really interesting there anyways.

It's probably possible to take just the first (or last) 128 bits of BLAKE3 and the result might have better properties than MD5 (considering that MD5 is broken). There are some answers related to "truncated hash" on the Internet about SHA-256, but any cryptographically secure hash is likely to be similar in this aspect: https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-last-bits-from-a-sha-256-hash/163#163

January 22
On Monday, January 22, 2024 7:33:33 PM MST Siarhei Siamashka via Digitalmars-d wrote:
> On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:
> > On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:
> >> On 1/22/2024 1:11 PM, Bruce Carneal wrote:
> >>> Blake3 might be worth a look.  It's reportedly faster and stronger than md5.
> >>>
> >>> https://github.com/BLAKE3-team/BLAKE3
> >>
> >> Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
> >
> > Blake is > 128 bits, so I don't think there is anything really interesting there anyways.
>
> It's probably possible to take just the first (or last) 128 bits
> of BLAKE3 and the result might have better properties than MD5
> (considering that MD5 is broken). There are some answers related
> to "truncated hash" on the Internet about SHA-256, but any
> cryptographically secure hash is likely to be similar in this
> aspect:
> https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-las
> t-bits-from-a-sha-256-hash/163#163

Why on earth would we care if the hash is secure in this context? We obviously care about the likelihood of collisions, and if that's high enough (which does not seem to be the case for md5), then the hash is unsuitable for comparing class names, but why would crytographic security matter when comparing class names in the compiler?

- Jonathan M Davis



January 23

On Tuesday, 23 January 2024 at 03:34:23 UTC, Jonathan M Davis wrote:

>

On Monday, January 22, 2024 7:33:33 PM MST Siarhei Siamashka via Digitalmars-d wrote:

>

On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:

>

On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:

>

On 1/22/2024 1:11 PM, Bruce Carneal wrote:

>

Blake3 might be worth a look. It's reportedly faster and stronger than md5.

https://github.com/BLAKE3-team/BLAKE3

Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.

Blake is > 128 bits, so I don't think there is anything really interesting there anyways.

It's probably possible to take just the first (or last) 128 bits
of BLAKE3 and the result might have better properties than MD5
(considering that MD5 is broken). There are some answers related
to "truncated hash" on the Internet about SHA-256, but any
cryptographically secure hash is likely to be similar in this
aspect:
https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-las
t-bits-from-a-sha-256-hash/163#163

Why on earth would we care if the hash is secure in this context? We obviously care about the likelihood of collisions, and if that's high enough (which does not seem to be the case for md5), then the hash is unsuitable for comparing class names, but why would crytographic security matter when comparing class names in the compiler?

BLAKE3 was mentioned because it is allegedly faster than MD5 (this still needs to be confirmed). You can take as many bits of its output as you need and being too large is not a problem. Having high likelihood of collisions would have made it a bad hash from the security standpoint, so you don't need to worry about collisions for your use case either.

January 23
On Tuesday, 23 January 2024 at 03:34:23 UTC, Jonathan M Davis wrote:
> On Monday, January 22, 2024 7:33:33 PM MST Siarhei Siamashka via Digitalmars-d wrote:
>> On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:
>> > On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:
>> >> On 1/22/2024 1:11 PM, Bruce Carneal wrote:
>> >>> Blake3 might be worth a look.  It's reportedly faster and stronger than md5.
>>
>>
>> It's probably possible to take just the first (or last) 128 bits
>> of BLAKE3 and the result might have better properties than MD5
>> (considering that MD5 is broken). There are some answers related
>> to "truncated hash" on the Internet about SHA-256, but any
>> cryptographically secure hash is likely to be similar in this
>> aspect:
>> https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-las
>> t-bits-from-a-sha-256-hash/163#163
>
> Why on earth would we care if the hash is secure in this context? We obviously care about the likelihood of collisions, and if that's high enough (which does not seem to be the case for md5), then the hash is unsuitable for comparing class names, but why would crytographic security matter when comparing class names in the compiler?
>
> - Jonathan M Davis

I don't think it matters at all except for what it implies about collisions.  And, again, I don't think any modern hash is bad wrt collisions but I'm not a hash researcher.

If it is true that any modern hash is in the "more likely to be smacked by a meteor than experience a collision" category *and* the current (md5) time burn is more than a blip then it's probably worth taking a look at the alternatives.

OTOH, it's not like there is a shortage of things to work on... :-)

January 23
On Monday, 22 January 2024 at 20:46:26 UTC, Walter Bright wrote:
> On 1/22/2024 2:08 AM, Andrea Fontana wrote:
>> Why md5 and not a faster method?
>
> Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans.
>
> A replacement that has a perceptible collision rate will be unacceptable.
>
> This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.

So probably you like xxhash128

https://xxhash.com/

Andrea
January 23
On 1/23/2024 3:49 AM, Andrea Fontana wrote:
> So probably you like xxhash128
> 
> https://xxhash.com/

Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
January 23
On 1/23/24 2:37 PM, Walter Bright wrote:
> On 1/23/2024 3:49 AM, Andrea Fontana wrote:
>> So probably you like xxhash128
>>
>> https://xxhash.com/
> 
> Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?

xxhash has a wiki page about it [1]. Given that it's their page, it's probably unsurprising that xxhash does well.

[1]: https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comparison#collision-study

January 23

On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:

>

On 1/23/2024 3:49 AM, Andrea Fontana wrote:

>

So probably you like xxhash128

https://xxhash.com/

Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?

If you really need a 64-bit version, then you can simply take the first 64 bits of MD5. Assuming no defects in a 64-bit hash implementation, the collision probability is 1.0 / 2^^64 for a pair of hashed strings. And it becomes a https://en.wikipedia.org/wiki/Birthday_problem if the number of hashed strings is larger. There's no magic algorithmic sauce that can make a small hash collision resistant. A quote from the Wikipedia article:

>

As an example, if a 64-bit hash is used, there are approximately 1.8×10^^19 different outputs. If these are all equally probable (the best case), then it would take 'only' approximately 5 billion attempts (5.38×10^^9) to generate a collision using brute force.[6] This value is called birthday bound[7] and for n-bit codes it could be approximated as 2^^(n/2).

I would say that any 64-bit hash is not sufficiently collision resistant for the problem that you are trying to solve.

January 23
On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:
> On 1/23/2024 3:49 AM, Andrea Fontana wrote:
>> So probably you like xxhash128
>> 
>> https://xxhash.com/
>
> Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?

Check this:
https://fossies.org/linux/xxHash/tests/collisions/README.md

All good hash algorithms should be uniform...

Andrea
January 24

On Tuesday, 23 January 2024 at 22:15:11 UTC, Siarhei Siamashka wrote:

>

On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:

>

On 1/23/2024 3:49 AM, Andrea Fontana wrote:

>

So probably you like xxhash128

https://xxhash.com/

Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?

If you really need a 64-bit version, then you can simply take the first 64 bits of MD5. Assuming no defects in a 64-bit hash implementation, the collision probability is 1.0 / 2^^64 for a pair of hashed strings. And it becomes a https://en.wikipedia.org/wiki/Birthday_problem if the number of hashed strings is larger. There's no magic algorithmic sauce that can make a small hash collision resistant. A quote from the Wikipedia article:

While it is not possible to do better than this, it is certainly possible - and in fact way easier than you'd expect - to do worse.

I see no reason to move away from md5 here. It works, its speed is nowhere close to the bottleneck, and its pre-image resistance is of no concern for this use case.

This thread has gone in the typical bullshit D goes into, arguing about the intricacies of some details that do not matter at all for the value delivered in the end.