January 15, 2021
On Friday, 15 January 2021 at 18:11:04 UTC, Bruce Carneal wrote:
> Type erasure can be tricky, even when it is restricted to basic value types of the same size.  This shows up when implementing radix sort where one solution is to map to/from whole numbers (NaN semantics being ignored).

Yes, I think maybe it can work if one cannot retrieve the key. Then one can store it in a coded fashion that sorts correctly as byte sequences?

January 16, 2021
On Friday, 15 January 2021 at 18:14:22 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 15 January 2021 at 18:11:04 UTC, Bruce Carneal wrote:
>> Type erasure can be tricky, even when it is restricted to basic value types of the same size.  This shows up when implementing radix sort where one solution is to map to/from whole numbers (NaN semantics being ignored).
>
> Yes, I think maybe it can work if one cannot retrieve the key. Then one can store it in a coded fashion that sorts correctly as byte sequences?

You can transform to a canonical relop form wherever you'd like, of course, but mapping on entry and unmapping on retrieval or exit from a collection library would, as you might imagine, lower conversion overhead for the second and subsequent relops, reduce bloat, and enable kernel fusion.

All of my use cases for this relop type erasure to date have involved fused operations where the overhead of an additional store/load out of an intermediate collection would have been very difficult to swallow.  But if you're looking at something less predictable I'm guessing that type erased collections could be a win.

If you decide to pursue it, a report on the good, bad and ugly would be appreciated.

Good luck.


January 16, 2021
On Saturday, 16 January 2021 at 00:03:00 UTC, Bruce Carneal wrote:
> You can transform to a canonical relop form wherever you'd like, of course, but mapping on entry and unmapping on retrieval or exit from a collection library would, as you might imagine, lower conversion overhead for the second and subsequent relops, reduce bloat, and enable kernel fusion.

The problem is little endian. Ascii strings will have to be reversed. The order of two 32 bit ints have to be reversed. These issues are not present for big endian...
January 16, 2021
On Saturday, 16 January 2021 at 05:39:02 UTC, Ola Fosheim Grostad wrote:
> On Saturday, 16 January 2021 at 00:03:00 UTC, Bruce Carneal wrote:
>> You can transform to a canonical relop form wherever you'd like, of course, but mapping on entry and unmapping on retrieval or exit from a collection library would, as you might imagine, lower conversion overhead for the second and subsequent relops, reduce bloat, and enable kernel fusion.
>
> The problem is little endian. Ascii strings will have to be reversed. The order of two 32 bit ints have to be reversed. These issues are not present for big endian...

It sounds like mappings to arbitrary precision big endian unsigned is where you're headed currently.

Implementing relop-unifying xforms to whatever form you actually choose should not be difficult.  You have the types in hand so you're looking at a specialized binary serialization problem.  Coming back out again is trickier I think.


January 16, 2021
On Saturday, 16 January 2021 at 06:33:13 UTC, Bruce Carneal wrote:
> It sounds like mappings to arbitrary precision big endian unsigned is where you're headed currently.

More like converting to ulong snd ucent. But CPUs have instructions for endian conversion so that is ok. Converting from signed to unsigned is just one add.

> Implementing relop-unifying xforms to whatever form you actually choose should not be difficult.  You have the types in hand so you're looking at a specialized binary serialization problem.  Coming back out again is trickier I think.

I think most conversion will only take 1-4 cycles. The advantage is that once everything is uniform you can use handcoded SIMD in the container algorithms, which would be too much work otherwise... Dunno if the trade off is worth it,but it might?

Btw, one challenge for getting compiler level type erasure is that function addresses cannot be used for type identity, maybe a language spec issue?


January 16, 2021
On Saturday, 16 January 2021 at 06:57:53 UTC, Ola Fosheim Grostad wrote:
> On Saturday, 16 January 2021 at 06:33:13 UTC, Bruce Carneal wrote:
>> It sounds like mappings to arbitrary precision big endian unsigned is where you're headed currently.
>
> More like converting to ulong snd ucent. But CPUs have instructions for endian conversion so that is ok. Converting from signed to unsigned is just one add.

Or, more precisely, make it architecture dependent. I guess that is the crux.

The library has to provide inline-able conversion operations.
January 16, 2021
On Friday, 15 January 2021 at 18:11:04 UTC, Bruce Carneal wrote:
> value types of the same size.  This shows up when implementing radix sort where one solution is to map to/from whole numbers (NaN semantics being ignored).

If we accept the following encoding for floating point then it should work out ok?

INPUT               OUTPUT
s  exp  sig         s  exp  sig

1  1100 1111110     0  0011 0000001
1  0011 0000001     0  1100 1111110
1  0000 0000000     0  1111 1111111

0  0000 0000000     1  0000 1111111
0  0011 0000011     1  0011 0000001
0  1100 1111110     1  1100 1111110


Algorithm:
1. flip the signbit
2. if the flipped sign is 0 then negate the exponent and signficand/mantissa.

And it can be reversed just as easily?

Not too bad!?

January 16, 2021
On Saturday, 16 January 2021 at 08:41:18 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 15 January 2021 at 18:11:04 UTC, Bruce Carneal wrote:
>> value types of the same size.  This shows up when implementing radix sort where one solution is to map to/from whole numbers (NaN semantics being ignored).
>
> If we accept the following encoding for floating point then it should work out ok?
>
> INPUT               OUTPUT
> s  exp  sig         s  exp  sig
>
> 1  1100 1111110     0  0011 0000001
> 1  0011 0000001     0  1100 1111110
> 1  0000 0000000     0  1111 1111111
>
> 0  0000 0000000     1  0000 1111111

Typo:
  0  0000 0000000     1  0000 0000000

Maybe I am overlooking something... This is like 1-2 cycles.

January 16, 2021
On Saturday, 16 January 2021 at 08:45:23 UTC, Ola Fosheim Grøstad wrote:
> Maybe I am overlooking something... This is like 1-2 cycles.

I haven't tested, so I could be in error, but it seems like these two might work?

ulong x = floatingpoint double value;

1: x ^ ((0ULL - (x>>63)) | (1ULL<63))

or

2: x & (1ULL<63) ? ~x : x | (1ULL<63)


So basically in generic assembly:

1:  shift right, sub, or, xor  => 4 very fast ops

2:  tst, cmp, (neg / or) => 2 fast ops + 1 branch

If this works, then there is really no reason not to use ulong/uint for floating point keys, except -0.0 and 0.0 will be sorted, but that is desirable sometimes, so could be a plus as well. If people don't want that then they should normalize to 0.0 before using the key.


January 16, 2021
On Saturday, 16 January 2021 at 09:28:22 UTC, Ola Fosheim Grøstad wrote:
> On Saturday, 16 January 2021 at 08:45:23 UTC, Ola Fosheim Grøstad wrote:
>> Maybe I am overlooking something... This is like 1-2 cycles.
>
> I haven't tested, so I could be in error, but it seems like these two might work?
>
> ulong x = floatingpoint double value;
>
> 1: x ^ ((0ULL - (x>>63)) | (1ULL<63))
>
> or
>
> 2: x & (1ULL<63) ? ~x : x | (1ULL<63)
>
>
> So basically in generic assembly:
>
> 1:  shift right, sub, or, xor  => 4 very fast ops
>
> 2:  tst, cmp, (neg / or) => 2 fast ops + 1 branch
>
> If this works, then there is really no reason not to use ulong/uint for floating point keys, except -0.0 and 0.0 will be sorted, but that is desirable sometimes, so could be a plus as well. If people don't want that then they should normalize to 0.0 before using the key.

Here's a link on the topic: http://stereopsis.com/radix.html.  Excerpts from that for the 32 bit float case follow:

Converting over:
	uint32 mask = -int32(f >> 31) | 0x80000000;
	return f ^ mask;
and back:
	uint32 mask = ((f >> 31) - 1) | 0x80000000;
	return f ^ mask;

This form is branchless and is easily (SIMD) vectorized.

As you've noted, there are some oddities that arise when using unsigned relops against the mapped values.  Negative and positive zeroes are distinct and NaNs take on a definite, and split, ordering.  For my current work I prefer this to traditional float behavior and find it a useful bijection overall.  YMMV.