Major performance problem with std.array.front() (page 2)

On 3/6/2014 7:31 PM, H. S. Teoh wrote:
> Whoa. You're not serious about changing this now, are you? Because even
> though I would support such a change, you have to realize the magnitude
> of code breakage that will happen. A lot of code that iterates over
> narrow strings will break, and worse yet, they will break *silently*.
> Calling count() on a narrow string will not return the expected value,
> for example. And existing code that iterates over narrow strings
> expecting dchars to come out of it will suddenly silently convert to
> char, and may pass by unnoticed until somebody runs the program with a
> multibyte character in the input.

I understand this all too well. (Note that we currently have a different silent problem: unnoticed large performance problems.)

> This is very high risk change IMO.
>
> You're welcome to create a (temporary) Phobos fork that reverts narrow
> string auto-decoding, of course, and people can try it out to see how
> much actual breakage is happening. If you really want to push for this,
> that might be the safest way to test the waters before committing to
> such a major change. Silent breakage is not easy to test for,
> unfortunately. :(

I posted a plan in another message in this thread. It'll be a long process, but I think it's doable.
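To make the silent difference concrete, here is a minimal sketch of what auto-decoding does to element counts on a narrow string. It uses walkLength as a stand-in for the count() mentioned above, and std.utf.byCodeUnit, which is part of current Phobos but is not mentioned in this thread. The helper names codePoints and codeUnits are made up for the illustration.

```d
import std.range : ElementType, walkLength;
import std.utf : byCodeUnit;

// Ranges over a narrow string currently yield decoded dchars, not chars.
static assert(is(ElementType!string == dchar));

// Hypothetical helper names, used only for this illustration.
size_t codePoints(string s) { return s.walkLength; }            // decodes each element
size_t codeUnits(string s)  { return s.byCodeUnit.walkLength; } // no decoding

void main()
{
    string s = "h\u00E9llo"; // U+00E9 occupies two UTF-8 code units
    assert(s.length == 6);      // .length always counts code units
    assert(codePoints(s) == 5); // range iteration counts code points
    assert(codeUnits(s) == 6);  // explicit opt-out of decoding
}
```

If decoding were removed from front(), code written against the second assert would start seeing the third result without any compile error, which is the silent breakage being described.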

Walter Bright:
> I understand this all too well. (Note that we currently have a different silent problem: unnoticed large performance problems.)

On the other hand your change could introduce Unicode-related bugs in future code (that the current Phobos avoids) (and here I am not talking about code breakage).

Bye,
bearophile

March 07, 2014

Re: Major performance problem with std.array.front()

Posted by Adam D. Ruppe
in reply to Walter Bright

On Friday, 7 March 2014 at 02:57:38 UTC, Walter Bright wrote:
> Yes, so that the user selects it, rather than having it wired in everywhere and the user has to figure out how to defeat it.

BTW you know what would help this? A pragma we can attach to a struct which makes it a very thin value type.

import core.stdc.stdio : printf;

pragma(thin_struct) // proposed pragma; it does not exist in any compiler
struct A {
   int a;
   int foo() { return a; }
   static A get() { return A(10); }
}

void test() {
    A a = A.get();
    printf("%d", a.foo());
}

With the pragma, A would be completely indistinguishable from int in all ways.

What do I mean?
$ dmd -release -O -inline test56 -c

Let's look at A.foo:

A.foo:
   0:   55                      push   ebp
   1:   8b ec                   mov    ebp,esp
   3:   50                      push   eax
   4:   8b 00                   mov    eax,DWORD PTR [eax] ; waste!
   6:   8b e5                   mov    esp,ebp
   8:   5d                      pop    ebp
   9:   c3                      ret


It is the instruction at offset 4 that bugs me: the struct is passed as a *pointer*, but its only contents are an int, which could just as well be passed as a value. Let's compare it to a function that is identical in operation:

int identity(int a) { return a; }

00000000 <_D6test568identityFiZi>:
   0:   55                      push   ebp
   1:   8b ec                   mov    ebp,esp
   3:   83 ec 04                sub    esp,0x4
   6:   c9                      leave
   7:   c3                      ret

lol it *still* wastes time, setting up a stack frame for nothing. But we could just as well write asm { naked; ret; } and it would work as expected: the argument is passed in EAX and the return value is expected in EAX. The function doesn't actually have to do anything.


Anywho, the struct could work the same way. Now, I understand that we can't just change this unilaterally since it would break interaction with the C ABI, but we could opt in to some thinner stuff with a pragma.


Ideally, the thin struct would generate this code:

A A.get() { // pseudocode: the returned A is just an int in EAX
   naked { // no need for stack frame here
       mov EAX, 10;
       ret;
   }
}

When A is thin, "return A(10);" should be equal to "return 10;". No need for NRVO, the object is super thin.

int A.foo() { // pseudocode: returns the int already sitting in EAX
   naked { // no locals, no stack frame
       ret; // the last argument (this) is passed in EAX
            // and the return value goes in EAX
            // so we don't have to do anything
   }
}

Without the thin_struct thing, this would minimally look like

mov EAX, [EAX];
ret;

Having to load the value from the this pointer. But since it is thin, it is generated identically to an int, like the identity function above, so the value is already in the register!

Then, test:

void test() {
    naked { // don't need a stack frame here either!
        call A.get;
        // a is now in EAX, the value loaded right up
        call A.foo; // the this is an int and already
                    // where it needs to be, so just go
        // and finally, go ahead and call printf
        push EAX;
        push "%d".ptr;
        call printf;
        ret;
    }
}


Then, naturally, inlining A.get and A.foo might be possible (though I'd love to write them in assembly myself* and the compiler prolly can't inline them) but call/ret is fairly cheap, especially when compared to push/pop, so just keeping all the relevant stuff right in registers with no need to reference can really help us.

pragma(thin_struct)
struct RangedInt {
  int a;
  RangedInt opBinary(string op : "+")(int rhs) {
   asm {
     naked;
     add EAX, [rhs]; // or RDI on 64 bit! Don't even need to touch the stack! **
     jo throw_exception;
     ret;
   }
  }
}


Might still not be as perfect as intrinsics like bearophile is thinking of... but we'd be getting pretty close. And this kind of thing would be good for other thin wrappers too, we could magically make smart pointers too! (This can't be done now since returning a struct is done via hidden pointer argument instead of by register like a naked pointer).
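For comparison, here is a minimal portable sketch of the same overflow-checked addition without inline assembly. It assumes core.checkedint.adds from druntime, which is not part of this thread, and it throws a plain Exception where the assembly version jumps to the unspecified throw_exception label. Whether the resulting struct travels in a register is still up to the compiler's ABI; this only shows the checking logic.

```d
import core.checkedint : adds;
import std.exception : assertThrown;

struct RangedInt
{
    int a;

    // Checked addition: adds() sets `overflow` to true instead of wrapping silently.
    RangedInt opBinary(string op : "+")(int rhs) const
    {
        bool overflow = false;
        immutable sum = adds(a, rhs, overflow);
        if (overflow)
            throw new Exception("RangedInt overflow"); // stand-in for throw_exception
        return RangedInt(sum);
    }
}

void main()
{
    assert((RangedInt(1) + 2).a == 3);
    assertThrown(RangedInt(int.max) + 1); // signed overflow is detected
}
```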

** i'd kinda love it if we had an all-register calling convention on 32 bit too.... but eh oh well

What about this?:

Anywhere we currently have a front() that decodes, such as your example:

> @property dchar front(T)(T[] a) @safe pure if (isNarrowString!(T[]))
> {
>     assert(a.length, "Attempting to fetch the front of an empty array of " ~
>            T.stringof);
>     size_t i = 0;
>     return decode(a, i);
> }

We rip out that front() entirely. The result is *not* technically a range...yet! We could call it a protorange.

Then we provide two functions:

auto decode(someStringProtoRange) {...}
auto raw(someStringProtoRange) {...}

These convert the protoranges into actual ranges by adding the missing front() function. The 'decode' adds a front() which decodes into dchar, while the 'raw' adds a front() which simply returns the raw underlying type.

I imagine the decode/raw would probably also handle any "length" property (if it exists in the protorange) accordingly.

This way, the user is forced to specify "myStringRange.decode" or "myStringRange.raw" as appropriate, otherwise myStringRange can't be used since it isn't technically a range, only a protorange.

(Naturally, ranges of dchar would always have front, since no decoding is ever needed for them anyway. For these ranges, the decode/raw funcs above would simply be no-ops.)
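A rough sketch of how the two adapters could look from the caller's side, written on top of std.utf.byCodeUnit and std.utf.byDchar from current Phobos (neither is mentioned in this thread). The names raw and decoded are hypothetical; decoded is used instead of decode only to avoid clashing with std.utf.decode. This does not implement the protorange idea itself, since string still has its own front() here.

```d
import std.range : ElementType, walkLength;
import std.utf : byCodeUnit, byDchar;

// Hypothetical adapters named after the proposal above.
auto raw(S)(S s)     { return s.byCodeUnit; } // front() yields code units
auto decoded(S)(S s) { return s.byDchar; }    // front() yields decoded dchars

void main()
{
    string s = "h\u00E9llo"; // U+00E9 occupies two UTF-8 code units

    static assert(is(ElementType!(typeof(s.decoded)) == dchar));

    assert(s.raw.walkLength == 6);     // code units
    assert(s.decoded.walkLength == 5); // code points
}
```

The point of the design is that the caller writes s.raw or s.decoded at every use, so the cost of decoding is always visible in the source.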

On 3/6/2014 7:59 PM, bearophile wrote:
> Walter Bright:
>
>> I understand this all too well. (Note that we currently have a different
>> silent problem: unnoticed large performance problems.)
>
> On the other hand your change could introduce Unicode-related bugs in future
> code (that the current Phobos avoids) (and here I am not talking about code
> breakage).

This comes up repeatedly as justification for D trying to hide the UTF-8 nature of strings that I discussed upthread. To my mind it's like trying to pretend that floating point doesn't have roundoff issues, integers have infinite range, memory is infinite, etc. That has a place in other languages, but not in a systems/native language.

On 3/6/2014 8:01 PM, Adam D. Ruppe wrote:
> BTW you know what would help this? A pragma we can attach to a struct which
> makes it a very thin value type.

I'd rather fix the compiler's codegen than add a pragma.

On 3/6/2014 11:11 PM, Nick Sabalausky wrote:
> What about this?:
>
> Anywhere we currently have a front() that decodes, such as your example:
>
>> @property dchar front(T)(T[] a) @safe pure if (isNarrowString!(T[]))
>> {
>>     assert(a.length, "Attempting to fetch the front of an empty array of " ~
>>            T.stringof);
>>     size_t i = 0;
>>     return decode(a, i);
>> }
>
> We rip out that front() entirely. The result is *not* technically a
> range...yet! We could call it a protorange.
>
> Then we provide two functions:
>
> auto decode(someStringProtoRange) {...}
> auto raw(someStringProtoRange) {...}
>
> These convert the protoranges into actual ranges by adding the missing
> front() function. The 'decode' adds a front() which decodes into dchar,
> while the 'raw' adds a front() which simply returns the raw underlying
> type.
>
> I imagine the decode/raw would probably also handle any "length"
> property (if it exists in the protorange) accordingly.
>
> This way, the user is forced to specify "myStringRange.decode" or
> "myStringRange.raw" as appropriate, otherwise myStringRange can't be
> used since it isn't technically a range, only a protorange.
>
> (Naturally, ranges of dchar would always have front, since no decoding
> is ever needed for them anyway. For these ranges, the decode/raw funcs
> above would simply be no-ops.)

Of course, I just realized that these front()s can't be added unless there's already a front to be called in the first place... So instead of ripping out the current front() functions entirely, we replace "front" with some sort of "rawFront" which the raw/decode versions of front() can query in order to provide actual decoding/non-decoding ranges.

On Thu, Mar 06, 2014 at 08:19:18PM -0800, Walter Bright wrote:
> On 3/6/2014 8:01 PM, Adam D. Ruppe wrote:
> >BTW you know what would help this? A pragma we can attach to a struct which makes it a very thin value type.
>
> I'd rather fix the compiler's codegen than add a pragma.
[...]

From what I understand, structs are *supposed* to be thin value types. I would say that if a struct is under a certain size (determined by the compiler), and doesn't have complicated semantics like dtors and stuff like that, then it should be treated like a POD (passed in registers, etc).

T

-- 
Ruby is essentially Perl minus Wall.

On 3/6/2014 10:12 PM, H. S. Teoh wrote:
> From what I understand, structs are *supposed* to be thin value types. I
> would say that if a struct is under a certain size (determined by the
> compiler), and doesn't have complicated semantics like dtors and stuff
> like that, then it should be treated like a POD (passed in registers,
> etc).

Yes, that's right.
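The "small and no dtors" condition can at least be checked at compile time. The following is only a sketch using __traits(isPOD, ...); the WithDtor type is made up for contrast, and nothing here says how a given compiler or ABI actually passes such a struct.

```d
struct A
{
    int a;
    int foo() { return a; }
    static A get() { return A(10); }
}

// Hypothetical counterexample: a destructor rules out POD treatment.
struct WithDtor
{
    int a;
    ~this() {}
}

static assert(A.sizeof == int.sizeof);     // no hidden fields, same size as int
static assert(__traits(isPOD, A));         // plain data, no dtor or postblit
static assert(!__traits(isPOD, WithDtor)); // the destructor makes it non-POD

void main()
{
    assert(A.get().foo() == 10);
}
```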
