November 30, 2017
On 11/29/17 11:50 PM, Nicholas Wilson wrote:
> On Thursday, 30 November 2017 at 04:21:20 UTC, Steven Schveighoffer wrote:
>> On 11/29/17 10:29 PM, Walter Bright wrote:
>>> Another issue (I should check this again) was doing null checks on member function calls, which is not necessary since if they're null it'll seg fault.
>>
>> Just happened last release.
>>
>> https://dlang.org/changelog/2.077.0.html#removePreludeAssert
> 
> That was specifically for constructors and destructors (i.e. (cast(Foo)null).__ctor() was the only way to trigger that assert) not member functions (of classes), which I believe is still part of the compiler.

Then either the changelog entry is wrong, or the fix was more encompassing than the author/reviewers thought:

Stevens-MacBook-Pro:testd steves$ cat testprelude.d
struct S
{
    int foo() { return 1; }
}

void main()
{
    S s;
    auto x = s.foo;
}
Stevens-MacBook-Pro:testd steves$ dvm use 2.076.1
Stevens-MacBook-Pro:testd steves$ dmd -vcg-ast testprelude.d
Stevens-MacBook-Pro:testd steves$ cat testprelude.d.cg
import object;
struct S
{
	int foo()
	{
		assert(&this, "null this");
		return 1;
	}
}
void main()
{
	S s = 0;
	int x = s.foo();
	return 0;
}
RTInfo!(S)
{
	enum typeof(null) RTInfo = null;

}
Stevens-MacBook-Pro:testd steves$ dvm use 2.077.0
Stevens-MacBook-Pro:testd steves$ dmd -vcg-ast testprelude.d
Stevens-MacBook-Pro:testd steves$ cat testprelude.d.cg
import object;
struct S
{
	int foo()
	{
		return 1;
	}
}
void main()
{
	S s = 0;
	int x = s.foo();
	return 0;
}
RTInfo!(S)
{
	enum typeof(null) RTInfo = null;

}

-Steve
November 30, 2017
On Wed, Nov 29, 2017 at 07:29:56PM -0800, Walter Bright via Digitalmars-d wrote:
> On 11/29/2017 7:15 PM, Jonathan M Davis wrote:
> > I wouldn't have expected assertions to cost much more than however much it costs to evaluate the expression being asserted unless the assertion fails.  Now, even that can slow down a program a fair bit, depending on what's being asserted and how many assertions there are, but it's not something that I would have expected to vary particular between C and D. It doesn't surprise me that the generated code would be larger than you'd get for the same assertions in C because how assertions are handled when they fail is quite different, but I would expect the assertions themselves to cost about the same in terms of performance as long as they don't fail. What's going on that's making them so much worse?
> 
> The code *size* causes problems because it pushes the executing code out of the cache. Another issue (I should check this again) was doing null checks on member function calls, which is not necessary since if they're null it'll seg fault.

Can you elaborate?  Because in my current understanding, assert(expr) is implemented by evaluating expr, which is unavoidable, and if it fails, calls a function in druntime to handle the failure. So as far as the user's code is concerned, there shouldn't be any performance issue -- presumably, druntime's assert() implementation shouldn't even be in the cache because it's not being used (up to that point).  It's just a single function call in the user's code, which is at most just a few bytes.

What happens inside the assert() implementation seems to be irrelevant because at that point your program is going to terminate anyway. So a cache miss for the assert() implementation isn't going to be a big deal (unless your program is asserting at a high frequency, in which case you have bigger problems than performance!).

Unless you're talking about applications where the entire program must fit in cache or flash or SRAM or whatever. In that case, perhaps the solution is to have a different druntime that has a leaner implementation of assert().

Which brings us to the implementation of assert() itself. What about it makes it so big? I suspect most of the bloat comes from throwing AssertError, which pulls in the stack-unwinding code, which, if my memory is still up to date, suffers from performance issues where it tries to construct the stacktrace regardless of whether or not the catch block actually wants the stacktrace.  I vaguely remember suggesting that this should be done lazily, so that the actual construction of the stacktrace (including symbol lookups, etc.) isn't done until somebody actually asks for it. You'd still have to save the addresses of the call somewhere, since otherwise it might get overwritten by the time the stack unwinding is done, but it should be a lot cheaper than doing symbol lookups eagerly. But perhaps this issue has since been fixed?

But of course, this assumes that we even need to throw AssertError in the first place.  If this can be made optional, we can skip the stack unwinding code altogether. (But I can see that this will only work for specific applications, since you may not be able to avoid the need for the unwinding code to call dtors and stuff to free up allocated resources, etc., which, if it's necessary, means you can't avoid linking in the stack unwinding code. But it *can*, at least in theory, be something separate from the stacktrace construction code, so you can still save a bit of code there. Make the stacktrace construction code a zero-argument template, then it won't get linked into the executable unless it's actually used.)


T

-- 
Let's not fight disease by killing the patient. -- Sean 'Shaleh' Perry
November 30, 2017
On Thursday, 30 November 2017 at 16:12:04 UTC, H. S. Teoh wrote:
> Can you elaborate?

As I understand it, DWARF exception handling even on the non-exceptional case, is a bit more expensive than our old way. Not hugely expensive but for dmd where we count milliseconds it might add up.

> Which brings us to the implementation of assert() itself. What about it makes it so big? I suspect most of the bloat comes from throwing AssertError, which pulls in the stack-unwinding code, which, if my memory is still up to date, suffers from performance issues where it tries to construct the stacktrace regardless of whether or not the catch block actually wants the stacktrace.

That's false, I changed that many years ago myself, unless the DWARF change involved that too, but I don't think so.

What happens is the exception constructor walks the stack and copies the addresses to a local, static buffer. This is very fast - just walking a linked list and copying some void* into a void*[64] or whatever - and little code.

The expensive part of formatting it to a string, actually looking up the debug info, parsing out the addresses, etc., is done lazily when it is printed, which occurs only on demand or right before the program terminates.


November 30, 2017
On Thursday, November 30, 2017 10:39:07 Ola Fosheim Grøstad via Digitalmars- d wrote:
> On Thursday, 30 November 2017 at 09:01:20 UTC, Jonathan M Davis
>
> wrote:
> > It's close enough. Instead of segfaulting when the member function is called, it'll segfault when it tries to access one of the member variables or non-final member functions inside the member function. So, there isn't any more need to add null checks for final member functions than there is for non-final member functions.
>
> Err... wait. What if you have a conditional:
>
>      if(input == 0) { do something bad }
>      access field
>
> Seems like you would be better off by injecting:
>
>     assert this not null
>
> at the beginning of all final methods and remove the assertion if all paths will lead to a field access before something bad can happen.
>
> Adding checks and then only remove them if they provably have no consequence tend to be the safer approach.

All we need to do is insert null checks before calling any member function on an object that is large enough where segfaulting won't happen if the reference or pointer is null. Whether the function is virtual or not really doesn't matter, and there's no need to add checks if segfaults are going to happen on null. That would mean that for large objects that have a path in a non-virtual function that would not access any members, you'd end up with an Error being thrown or a HLT being triggered (or whatever the compiler inserted check did), whereas it would have squeaked by in a smaller object, but it's really a bug to be calling a member function on a null object anyway.

The key thing here is that we properly guaranteed @safe, and normally that doesn't require null checks thanks to the CPU doing that for you and segfaulting. It's just the overly large objects where that's not going to work, and we can add null checks then - which hopefully the compiler can optimize out in at least some cases, but even if it can't, that's just the price of having that code be @safe; most code wouldn't be affected anyway, since it only applies to particularly large objects.

- Jonathan M Davis


November 30, 2017
On Thursday, November 30, 2017 16:48:10 Adam D. Ruppe via Digitalmars-d wrote:
> On Thursday, 30 November 2017 at 16:12:04 UTC, H. S. Teoh wrote:
> > Which brings us to the implementation of assert() itself. What about it makes it so big? I suspect most of the bloat comes from throwing AssertError, which pulls in the stack-unwinding code, which, if my memory is still up to date, suffers from performance issues where it tries to construct the stacktrace regardless of whether or not the catch block actually wants the stacktrace.
>
> That's false, I changed that many years ago myself, unless the DWARF change involved that too, but I don't think so.
>
> What happens is the exception constructor walks the stack and copies the addresses to a local, static buffer. This is very fast - just walking a linked list and copying some void* into a void*[64] or whatever - and little code.
>
> The expensive part of formatting it to a string, actually looking up the debug info, parsing out the addresses, etc., is done lazily when it is printed, which occurs only on demand or right before the program terminates.

Yeah, that change was a _huge_ speedup. It had a significant impact on std.datetime's unit tests.

- Jonathan M Davis

November 30, 2017
On Thursday, November 30, 2017 08:12:04 H. S. Teoh via Digitalmars-d wrote:
> But of course, this assumes that we even need to throw AssertError in the first place.  If this can be made optional, we can skip the stack unwinding code altogether. (But I can see that this will only work for specific applications, since you may not be able to avoid the need for the unwinding code to call dtors and stuff to free up allocated resources, etc., which, if it's necessary, means you can't avoid linking in the stack unwinding code. But it *can*, at least in theory, be something separate from the stacktrace construction code, so you can still save a bit of code there. Make the stacktrace construction code a zero-argument template, then it won't get linked into the executable unless it's actually used.)

If we're not talking about unit tests, then we could pretty much just print everything out on the failed assertion and kill the program right then and there rather throwing an Error - and really for most Errors, we could do exactly that. It would be really annoying for unit tests though, since being able to do scope(error) to print out extra information on the failure can be really useful. But otherwise, printing out the stack trace and killing the application right then and there rather than throwing an Error shouldn't be expensive at all, since it's only going to happen once.

But I have a hard time believing that the cost of assertions relates to constructing an AssertError unless the compiler is inlining a bunch of stuff at the assertion site. If that's what's happening, then it would increase the code size around assertions and potentially affect performance.

- Jonathan M Davis

November 30, 2017
On Thursday, 30 November 2017 at 18:10:01 UTC, Jonathan M Davis wrote:
> whereas it would have squeaked by in a smaller object, but it's really a bug to be calling a member function on a null object anyway.

Well, it is a bug, but the member-function may have been written with an invariant in mind, so it would then go undetected on a small object and continue running with broken invariants (state outside the object).

So without such a check there would be reduced value in builds with contracts. E.g. there could be a global involved that now has a broken invariant. Maybe contracts aren't really a major feature anyway, but such gotchas should be listed in the spec at least.

> doing that for you and segfaulting. It's just the overly large objects where that's not going to work, and we can add null checks then

I think the objection is that small objects with non-virtual member-functions and a path that does not dereference the this-pointer will pass incorrectly if this is null.

Assume you  add a non-nullable pointer type.

Then you probably would want to assume that the this pointer is never null so that you don't have to test it manually before assigning it to a non-nullable pointer variable. Or you risk getting null into non-nullable pointers…

But it really depends on how strong you want the type system to be.

November 30, 2017
On Thursday, November 30, 2017 19:14:50 Ola Fosheim Grøstad via Digitalmars- d wrote:
> On Thursday, 30 November 2017 at 18:10:01 UTC, Jonathan M Davis
>
> wrote:
> > whereas it would have squeaked by in a smaller object, but it's really a bug to be calling a member function on a null object anyway.
>
> Well, it is a bug, but the member-function may have been written with an invariant in mind, so it would then go undetected on a small object and continue running with broken invariants (state outside the object).
>
> So without such a check there would be reduced value in builds with contracts. E.g. there could be a global involved that now has a broken invariant. Maybe contracts aren't really a major feature anyway, but such gotchas should be listed in the spec at least.

If there's an invariant, it's going to segfault as soon as it accesses any member variables, and actually, it wouldn't surprise me if the invariant were virtual given that I would expect a base class invariant to be run in a derived class. And if the invariant is virtual, then you'll get a segfault when it's called on a null reference even if the function itself isn't virtual. In the case of pointers to structs, the invariant definitely wouldn't be virtual, and the invariant would be executed, but it would segfault as soon as it accessed a member.

Ultimately, I think that the main concern here is ensuring that @safe code is actually @safe. As long it segfaults (or HLTs or whatever if an explicit check is added) when it tries to use a null pointer, I don't think that it really matters much whether it fails at the call point of the member function or when accessing a member inside, and in my experience, having a member function that doesn't use member variables is so rare as to be pretty much a non-issue. Sure, having a code path that shortcuts things under some set of circumstances and returns rather than accessing any members does increase how often it happens, but arguably in that case, it also didn't matter that the pointer/reference was null, since the object itself wasn't actually needed.

- Jonathan M Davis


November 30, 2017
On Thursday, 30 November 2017 at 18:18:41 UTC, Jonathan M Davis wrote:
> But I have a hard time believing that the cost of assertions relates to constructing an AssertError unless the compiler is inlining a bunch of stuff at the assertion site. If that's what's happening, then it would increase the code size around assertions and potentially affect performance.
>
> - Jonathan M Davis

Indeed, if DMD is not marking the conditional call to _d_assert (or whatever it is) 'cold' and the call itself `pragma(inline, false)` then it needs to be changed to do so.
November 30, 2017
On Thu, Nov 30, 2017 at 11:12:58AM -0700, Jonathan M Davis via Digitalmars-d wrote:
> On Thursday, November 30, 2017 16:48:10 Adam D. Ruppe via Digitalmars-d wrote:
> > On Thursday, 30 November 2017 at 16:12:04 UTC, H. S. Teoh wrote:
> > > Which brings us to the implementation of assert() itself. What about it makes it so big? I suspect most of the bloat comes from throwing AssertError, which pulls in the stack-unwinding code, which, if my memory is still up to date, suffers from performance issues where it tries to construct the stacktrace regardless of whether or not the catch block actually wants the stacktrace.
> >
> > That's false, I changed that many years ago myself, unless the DWARF change involved that too, but I don't think so.
> >
> > What happens is the exception constructor walks the stack and copies the addresses to a local, static buffer. This is very fast - just walking a linked list and copying some void* into a void*[64] or whatever - and little code.
> >
> > The expensive part of formatting it to a string, actually looking up the debug info, parsing out the addresses, etc., is done lazily when it is printed, which occurs only on demand or right before the program terminates.
> 
> Yeah, that change was a _huge_ speedup. It had a significant impact on std.datetime's unit tests.
[...]

Ah yes, now I vaguely remember that somebody, I guess it was you Adam, fixed that stacktrace thing.

Still, I think Walter's complaint wasn't the *performance* of assert() per se, since the cost of evaluating the expression should be pretty small, and the only real bottleneck is the stack unwinding, and I doubt anybody actually cares about the *performance* of that. But the complaint was about the code bloat of linking druntime into the executable.  Before Adam's fix, assert() would create the entire the stacktrace upon constructing the AssertError, which means it has to pull in a whole bunch of code for decoding stack addresses and cross-referencing them with symbols, etc., but all that code would be useless if the user didn't care about the stacktrace to begin with.

With Adam's fix, there's now the possibility of templatizing the stacktrace code so that the code won't even be compiled into the executable until you actually use it.

I just took a quick glance at druntime, and it seems that the stacktrace symbol lookup code currently is referenced by module static ctor, so it will be included by default whether or not you use it.  I haven't looked further but perhaps it's possible to refactor some of the code around this to make it lazy / lazier, so that the code isn't actually instantiated until your program actually calls it.

In general, I think druntime / Phobos should adopt the policy of zero cost until first use / first reference, where possible.  Things like the GC are probably too tightly integrated for this to be possible, but I think there's still plenty of room for improvement with other parts of druntime and Phobos.


T

-- 
Three out of two people have difficulties with fractions. -- Dirk Eddelbuettel