July 09, 2014
On Wednesday, 9 July 2014 at 17:13:21 UTC, H. S. Teoh via Digitalmars-d-learn wrote:
> The branched version would look something like this:
>
> 	mov	eax, [<address of u>]
> 	mov	ebx, [<address of s>]
> 	cmp	ebx, $#0
> 	jge	label1		; first branch
> 	mov	eax, $#FFFFFFFF
> 	jmp	label2		; 2nd branch
> label1:
> 	sub	eax, ebx
> label2:
> 	(ret)
Why?

I would say:
        mov     eax, [<adress of s>] ; mov directly compares to zero
        jl      lable                ; less -> jump to return
        sub     eax, [<adress of u>]
        neg     eax                  ; because we subtracted in the wrong order
lable:  ret
July 09, 2014
On Wed, Jul 09, 2014 at 05:43:15PM +0000, Dominikus Dittes Scherkl via Digitalmars-d-learn wrote:
> On Wednesday, 9 July 2014 at 17:13:21 UTC, H. S. Teoh via Digitalmars-d-learn wrote:
[..]
> >Yeah, I don't see what's the problem with comparing signed and unsigned values, as long as the result is as expected. Currently, however, this code asserts, which is wrong:
> >
> >	uint x = uint.max;
> >	int y = -1;
> >	assert(x > y);
>
> Yes, this is really bad.
> But last time I got the response that this is so to be compatible with
> C.  That is what I really thought was the reason why D throw away
> balast from C, to fix bugs.

I think the slogan was that if something in D looks like C, then it should either have C semantics or not compile. According to this logic, the only recourse here is to prohibit comparison of signed with unsigned, which I don't think is workable because there are many valid use cases for it (plus, it will break a ton of code and people will be very unhappy).

I don't like the current behaviour, though. It just reeks of wrong in so many different ways. If you *really* want semantives like the above, you really should be casting y to unsigned so that it's clear what exactly you're trying to achieve.


[...]
> >Hmm. I wonder if there's a more efficient way to do this.
> I'm sure. But I think it should be done at the compiler, not in a library.

Obviously, yes. But I wasn't thinking about implementing opCmp in the library -- that would be strange since ints, of all things, need to have native compiler support. I was thinking more about how the compiler would implement "safe" signed/unsigned comparisons.


[...]
> >So I submit that the unbranched version is better. ;-)
> I don't think so, because the branch will only be taken if the signed type is >= 0 (in fact unsigned). So if the signed/unsigned comparison is by accident, you pay the extra runtime. But if it is intentional the signed value is likely to be negative, so you get a correct result with no extra cost.

Good point. Moreover, I have discovered multiple bugs in my proposed implementation; the correct implementation should be as follows:

	int compare(int x, uint y)
	{
		enum signbitMask = 1u << (int.sizeof*8 - 1);
		static assert(signbitMask == 0x80000000);

		// The (x|y) & signbitMask basically means that if either x is negative
		// or y > int.max, then the result will always be negative (sign bit
		// set).
		return (cast(uint)x - y) | ((x | y) & signbitMask);
	}

	unittest
	{
		// Basic cases
		assert(compare(5, 10u) < 0);
		assert(compare(5, 5u) == 0);
		assert(compare(10, 5u) > 0);

		// Large cases
		assert(compare(0, uint.max) < 0);
		assert(compare(50, uint.max) < 0);

		// Sign-dependent cases
		assert(compare(-1, 0u) < 0);
		assert(compare(-1, 10u) < 0);
		assert(compare(-1, uint.max) < 0);
	}

	int compare(uint x, int y)
	{
		enum signbitMask = 1u << (int.sizeof*8 - 1);
		static assert(signbitMask == 0x80000000);
		return ((x - y) & ~(x & signbitMask)) | ((cast(uint)y & signbitMask) >> 1);
	}

	unittest
	{
		// Basic cases
		assert(compare(0u, 10) < 0);
		assert(compare(10u, 10) == 0);
		assert(compare(10u, 5) > 0);

		// Large cases
		assert(compare(uint.max, 10) > 0);
		assert(compare(uint.max, -10) > 0);

		// Sign-dependent cases
		assert(compare(0u, -1) > 0);
		assert(compare(10u, -1) > 0);
		assert(compare(uint.max, -1) > 0);
	}


Using gdc -O3, I managed to get a very good result for
compare(int,uint), only 5 instructions long.

However, for compare(uint,int), there is the annoying special case of
compare(uint.max, -1), which can only be fixed by the hack
... | ((y & signbitMask) >> 1).  Unfortunately, this makes it 11
instructions long, which is unacceptable. So it looks like a simple
compare and branch would be far better in the compare(uint,int) case --
it's far more costly to avoid the branch than to live with it.


> Even better for constants, where the compiler can not only evaluate expressions like (uint.max > -1) correct, but it should optimize them completely away!

Actually, with gdc -O3, I found that the body of the above unittests got completely optimized away at compile-time, so that the unittest body is empty in the executable! So even with a library implementation the compiler was able to maximize performance. DMD left the assert calls in, but then it's not exactly known for generating optimal code anyway.


> >(So much for premature optimization... now lemme go and actually benchmark this stuff and see how well it actually performs in practice.
> Yes, we should do this.
>
> >Often, such kinds of hacks often perform more poorly than expected due to unforeseen complications with today's complex CPU's. So for all I know, I could've just been spouting nonsense above. :P)
> I don't see such a compiler change as a hack. It is a strong improvement IMHO.

I was talking about using | and & to get rid of the branch in signed/unsigned comparison. As it turns out, the compare(uint,int) case seems far more costly than a simple compare-and-branch as you had it at the beginning. So at least that part of what I wrote is probably nonsense.  :P

But I can't say for sure until I actually run some benchmarks on it.


T

-- 
"The number you have dialed is imaginary. Please rotate your phone 90 degrees and try again."
July 09, 2014
On Wed, Jul 09, 2014 at 11:29:06AM -0700, H. S. Teoh via Digitalmars-d-learn wrote:
> On Wed, Jul 09, 2014 at 05:43:15PM +0000, Dominikus Dittes Scherkl via Digitalmars-d-learn wrote:
> > On Wednesday, 9 July 2014 at 17:13:21 UTC, H. S. Teoh via Digitalmars-d-learn wrote:
[...]
> > >Often, such kinds of hacks often perform more poorly than expected due to unforeseen complications with today's complex CPU's. So for all I know, I could've just been spouting nonsense above. :P)
> > I don't see such a compiler change as a hack. It is a strong improvement IMHO.
> 
> I was talking about using | and & to get rid of the branch in signed/unsigned comparison. As it turns out, the compare(uint,int) case seems far more costly than a simple compare-and-branch as you had it at the beginning. So at least that part of what I wrote is probably nonsense.  :P
> 
> But I can't say for sure until I actually run some benchmarks on it.
[...]

Hmph. I'm having trouble coming up with a fair benchmark, because I realized that D doesn't actually have a way of expressing opCmp for unsigned int's in a minimal way! The problem is that the function needs to return int, but given two uints, their difference may be greater than int.max, so simply subtracting them will not work. So the best I can come up with is:

	int compare2(int x, uint y)
	{
		return (x < 0) ? -1 :
			(y > int.max) ? -1 :
			(x - y);
	}

which requires 2 comparisons.

Similarly, for the uint-int case:

	int compare2(uint x, int y)
	{
		return (y < 0) ? 1 :
			(x > int.max) ? 1 :
			(x - y);
	}

If you have a better implementation in mind, I'm all ears.

In any case, I went ahead and benchmarked the above two functions along with my branchless implementations, and here are the results:

(with dmd -O -unittest:)

	non-branching compare(signed,unsigned): 5513 msecs
	branching compare(signed,unsigned): 5442 msecs
	non-branching compare(unsigned,signed): 5441 msecs
	branching compare(unsigned,signed): 5744 msecs
	Optimizer-thwarting value: 0

(with gdc -O3 -funittest:)

	non-branching compare(signed,unsigned): 516 msecs
	branching compare(signed,unsigned): 1209 msecs
	non-branching compare(unsigned,signed): 453 msecs
	branching compare(unsigned,signed): 756 msecs
	Optimizer-thwarting value: 0

(Ignore the last lines of each output; that's just a way to prevent gdc -O3 from being over-eager and optimizing out the entire test so that everything returns 0 msecs.)

Interestingly, with dmd, the branching compare for the signed-unsigned case is faster than my non-branching one, but the order is reversed for the unsigned-signed case. They're pretty close, though, and on some runs the order of the latter case is reversed.

With gdc, however, it seem the non-branching versions are clearly better, even in the unsigned-signed case, which I thought would be inferior. So clearly, these results are very optimizer-dependent.

Keep in mind, though, that this may not necessarily reflect actual performance when the compiler generates the equivalent code for the built-in integer comparison operators, because in codegen the compiler can take advantage of the CPU's carry and overflow bits, and can elide the actual return values of opCmp. This may skew the results enough to reverse the order of some of these cases.

Anyway, here's the code (for independent verification):

	int compare(int x, uint y)
	{
		enum signbitMask = 1u << (int.sizeof*8 - 1);
		static assert(signbitMask == 0x80000000);

		// The (x|y) & signbitMask basically means that if either x is negative
		// or y > int.max, then the result will always be negative (sign bit
		// set).
		return (cast(uint)x - y) | ((x | y) & signbitMask);
	}

	unittest
	{
		// Basic cases
		assert(compare(5, 10u) < 0);
		assert(compare(5, 5u) == 0);
		assert(compare(10, 5u) > 0);

		// Large cases
		assert(compare(0, uint.max) < 0);
		assert(compare(50, uint.max) < 0);

		// Sign-dependent cases
		assert(compare(-1, 0u) < 0);
		assert(compare(-1, 10u) < 0);
		assert(compare(-1, uint.max) < 0);
	}

	int compare2(int x, uint y)
	{
		return (x < 0) ? -1 :
			(y > int.max) ? -1 :
			x - y;
	}

	unittest
	{
		// Basic cases
		assert(compare2(5, 10u) < 0);
		assert(compare2(5, 5u) == 0);
		assert(compare2(10, 5u) > 0);

		// Large cases
		assert(compare2(0, uint.max) < 0);
		assert(compare2(50, uint.max) < 0);

		// Sign-dependent cases
		assert(compare2(-1, 0u) < 0);
		assert(compare2(-1, 10u) < 0);
		assert(compare2(-1, uint.max) < 0);
	}

	int compare(uint x, int y)
	{
		enum signbitMask = 1u << (int.sizeof*8 - 1);
		static assert(signbitMask == 0x80000000);
		return ((x - y) & ~(x & signbitMask)) | ((cast(uint)y & signbitMask) >> 1);
	}

	unittest
	{
		// Basic cases
		assert(compare(0u, 10) < 0);
		assert(compare(10u, 10) == 0);
		assert(compare(10u, 5) > 0);

		// Large cases
		assert(compare(uint.max, 10) > 0);
		assert(compare(uint.max, -10) > 0);

		// Sign-dependent cases
		assert(compare(0u, -1) > 0);
		assert(compare(10u, -1) > 0);
		assert(compare(uint.max, -1) > 0);
	}

	int compare2(uint x, int y)
	{
		return (y < 0) ? 1 :
			(x > int.max) ? 1 :
			x - y;
	}

	unittest
	{
		// Basic cases
		assert(compare2(0u, 10) < 0);
		assert(compare2(10u, 10) == 0);
		assert(compare2(10u, 5) > 0);

		// Large cases
		assert(compare2(uint.max, 10) > 0);
		assert(compare2(uint.max, -10) > 0);

		// Sign-dependent cases
		assert(compare2(0u, -1) > 0);
		assert(compare2(10u, -1) > 0);
		assert(compare2(uint.max, -1) > 0);
	}

	void main()
	{
		import std.datetime;
		import std.stdio : writefln;

		enum numRepeat = 5;

		int dummy; // deoptimizer

		auto nobranch_su = numRepeat.benchmark!(()
		{
			foreach (s; -10_000 .. 10_000)
			{
				foreach (uint u; 0 .. 20_000)
					dummy ^= compare(s, u);
			}
		});
		writefln("non-branching compare(signed,unsigned): %d msecs",
			nobranch_su[0].msecs);

		auto branch_su = numRepeat.benchmark!(()
		{
			foreach (s; -10_000 .. 10_000)
			{
				foreach (uint u; 0 .. 20_000)
					dummy ^= compare2(s, u);
			}
		});
		writefln("branching compare(signed,unsigned): %d msecs",
			branch_su[0].msecs);

		auto nobranch_us = numRepeat.benchmark!(()
		{
			foreach (s; -10_000 .. 10_000)
			{
				foreach (uint u; 0 .. 20_000)
					dummy ^= compare(u, s);
			}
		});
		writefln("non-branching compare(unsigned,signed): %d msecs",
			nobranch_us[0].msecs);

		auto branch_us = numRepeat.benchmark!(()
		{
			foreach (s; -10_000 .. 10_000)
			{
				foreach (uint u; 0 .. 20_000)
					dummy ^= compare2(u, s);
			}
		});
		writefln("branching compare(unsigned,signed): %d msecs",
			branch_us[0].msecs);

		writefln("Optimizer-thwarting value: %d", dummy);
	}


T

-- 
The best way to destroy a cause is to defend it poorly.
July 09, 2014
On Wed, Jul 09, 2014 at 12:53:08PM -0700, H. S. Teoh via Digitalmars-d-learn wrote: [...]
> (with gdc -O3 -funittest:)
> 
> 	non-branching compare(signed,unsigned): 516 msecs
> 	branching compare(signed,unsigned): 1209 msecs
> 	non-branching compare(unsigned,signed): 453 msecs
> 	branching compare(unsigned,signed): 756 msecs
> 	Optimizer-thwarting value: 0
> 
> (Ignore the last lines of each output; that's just a way to prevent gdc -O3 from being over-eager and optimizing out the entire test so that everything returns 0 msecs.)
[...]

Argh. I just looked at the disassembly, and unfortunately, we have to discard the test results for gdc, because gdc -O3 has apparently turned on auto-*vectorising* optimizations, so the reason the non-branching implementation runs so fast, is because multiple calls are being run in parallel in the xmm* registers!

While this is certainly an impressive feat for gdc's optimizer, it unfortunately also means the above benchmark doesn't reflect the actual performance of standalone int/uint comparisons. :-(


T

-- 
I see that you JS got Bach.
July 10, 2014
On Wednesday, 9 July 2014 at 19:54:47 UTC, H. S. Teoh via Digitalmars-d-learn wrote:
> [...]
> The problem is that the function needs to return
> int, but given two uints, their difference may be
> greater than int.max, so simply subtracting them
> will not work. So the best I can come up with is:
>
> 	int compare2(int x, uint y)
> 	{
> 		return (x < 0) ? -1 :
> 			(y > int.max) ? -1 :
> 			(x - y);
> 	}
>
> which requires 2 comparisons.

Hmm. Diff works for compare(int,int).
So how about this:

int compare2(int x, uint y)
{
   return (y > int.max) ? -1 : (x - cast(int)y);
}

int compare2(uint x, int y)
{
   return (x > int.max) ? -1 : (cast(int)x - y);
}
July 10, 2014
Should of course be:

int compare2(uint x, int y)
{
   return (x > int.max) ? 1 : (cast(int)x - y);
}
July 10, 2014
So at all the implementation will look something like this:

int opCmp(T, U)(const(T) a, const(U) b) @primitive if(isIntegral!T && isIntegral!U)
{
   alias CommonType!(Signed!T, Signed!U) C;

   static if(isSigned!T && isUnsigned!U)
   {
      return (b > cast(Unsigned!C)C.max) ? -1 : cast(C)a - cast(C)b;
   }
   else static if(isUnsigned!T && isSigned!U)
   {
      return (a > cast(Unsigned!C)C.max) ? 1 : cast(C)a - cast(C)b;
   }
   else // both signed or both unsigned
   {
      return cast(C)a - cast(C)b;
   }
}

And it will be just as fast as ever, except if you compare apples with peaches where it take a tick longer but give the correct result anyway
July 10, 2014
On Monday, 7 July 2014 at 23:47:26 UTC, Aerolite wrote:
> So, if you would be so kind, give me a bullet list of the aspects of D you believe to be good, awesome, bad, and/or ugly. If you have the time, some code examples wouldn't go amiss either! Try not to go in-depth to weird edge cases - remain general, yet informative. E.g. I consider D's string mixins to be in the 'awesome' category, but its reliance on the GC for large segments of the standard library to be in the 'ugly' category.

I'm a big fan of (templated) UFCS. Along with parameterless function calls it goes a long way toward improving readability with its pipe-like flow. And like with all templates, allowing for breaking out common functionality *without* resorting to inheritance. Templates overall are sexy.


void main() {
    import std.stdio;
    import std.conv;
    import std.string;
    import std.range;
    import core.thread;

    // values known at compile-time --> CTFE kicks in~

    static assert(12345.to!string == "12345");

    immutable period = 1.seconds + 100.msecs + 100.usecs;
    writeln(period);
    Thread.sleep(period);

    immutable line = "fedcba"
        .retro
        .to!string
        .toUpper;

    // wish this worked: typeof(line).is!string;
    static assert(is(typeof(line) : string));
    static assert(line == "ABCDEF");
}
1 2
Next ›   Last »