January 22
On 1/22/2024 1:11 PM, Bruce Carneal wrote:
> Blake3 might be worth a look.  It's reportedly faster and stronger than md5.
> 
> https://github.com/BLAKE3-team/BLAKE3

Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.

January 22
On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
> On 9/28/2023 5:42 AM, deadalnix wrote:
>> 1/ Downcast to final classes.
>
> This has a PR for it now.
>
> https://github.com/dlang/dmd/pull/16032

Just wondering, could a trie be used instead of hashing algorithm?

It would guarantee that no collisions would happen at all, with better performance than simple comparison.

Best regards,
Alexandru.
January 23
On 23/01/2024 11:41 AM, Walter Bright wrote:
> On 1/22/2024 12:57 PM, Richard (Rikki) Andrew Cattermole wrote:
>> The string comparison is the only way to do the comparison reliably.
> 
> 
> https://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions
> 
> One collision in 6 billion hashes per second for 100 years.

If we were gambling on something not happening, those are great odds.

However if this does occur, silent memory corruption is possible.

I don't want to be gambling on whether there will be silent memory corruption.

This is my perspective on the topic anyway. I love guarantees of program security.
January 23
On 23/01/2024 11:48 AM, Alexandru Ermicioi wrote:
> On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
>> On 9/28/2023 5:42 AM, deadalnix wrote:
>>> 1/ Downcast to final classes.
>>
>> This has a PR for it now.
>>
>> https://github.com/dlang/dmd/pull/16032
> 
> Just wondering, could a trie be used instead of hashing algorithm?
> 
> It would guarantee that no collisions would happen at all, with better performance than simple comparison.
> 
> Best regards,
> Alexandru.

That sounds an lot like an assumption that TypeInfo is unique in a process. Which may not be true.

If it was true, we could compare pointers and don't need any of this logic.
January 22
On 1/22/2024 2:49 PM, Richard (Rikki) Andrew Cattermole wrote:
>> One collision in 6 billion hashes per second for 100 years.
> 
> If we were gambling on something not happening, those are great odds.
> 
> However if this does occur, silent memory corruption is possible.
> 
> I don't want to be gambling on whether there will be silent memory corruption.
> 
> This is my perspective on the topic anyway. I love guarantees of program security.

I do understand the desire for mathematical perfection, and feel it myself. But it has a significant and always present performance penalty. Engineering is always about tradeoffs. There's no getting away from it.

I'm not concerned about a meteor hitting my house, either, which is far far more likely.

BTW, D uses md5 hashes to enable the linker to merge redundant string literals.

Other compromises we make for performance include things like floating point rounding.
January 22
On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:
> On 1/22/2024 1:11 PM, Bruce Carneal wrote:
>> Blake3 might be worth a look.  It's reportedly faster and stronger than md5.
>> 
>> https://github.com/BLAKE3-team/BLAKE3
>
> Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.

Blake is > 128 bits, so I don't think there is anything really interesting there anyways.

md5 isn't cryptographically secure, mostly because 128 bits isn't cryptographically secure. But let's be real here, a collision would 100% be purposeful, there are zero chances of one happening by mistake.

I don't see what using a different hash function would buy us. I doubt the hash of class name would be anywhere close to the bottleneck in term of speed.
January 22
On 1/22/2024 3:31 PM, deadalnix wrote:
> I don't see what using a different hash function would buy us. I doubt the hash of class name would be anywhere close to the bottleneck in term of speed.

The hash also happens at compile time, not runtime. The only thing I would prefer is a hash that's 64 bits instead of 128, so it's one less compare instruction.
January 22
On 1/22/2024 2:50 PM, Richard (Rikki) Andrew Cattermole wrote:
> That sounds an lot like an assumption that TypeInfo is unique in a process. Which may not be true.
> If it was true, we could compare pointers and don't need any of this logic.

It isn't true because of shared libraries.
January 23
On 23/01/2024 12:52 PM, Walter Bright wrote:
> On 1/22/2024 2:50 PM, Richard (Rikki) Andrew Cattermole wrote:
>> That sounds an lot like an assumption that TypeInfo is unique in a process. Which may not be true.
>> If it was true, we could compare pointers and don't need any of this logic.
> 
> It isn't true because of shared libraries.

Or incremental compilation, the symbol could at least in theory be hidden in the object file and not be available for merging in the binary.

Regardless it can't be assumed if you like your sanity!
January 23
On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:
> On 1/22/2024 1:11 PM, Bruce Carneal wrote:
>> Blake3 might be worth a look.  It's reportedly faster and stronger than md5.
>> 
>> https://github.com/BLAKE3-team/BLAKE3
>
> Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.

I've not looked into that aspect either.  I'd just assumed that all modern hashes were quite good and was just looking at the time to compute the hash.  I'm not at all concerned about accidental collisions apart from a broken implementation.  As you say, more likely that we get smacked by a meteor.

I've read hash comparisons claiming blake3 is anywhere from 3X to 9X faster than MD5 on a single core wrt throughput.  No idea on how things would work out for this particular use case though, nor even if a 9X throughput upgrade would be noticeable.  I'd suspect not, just wanted to put forward an alternative.