November 01, 2014
On 11/1/2014 10:04 AM, David Nadlinger wrote:
> Agreed. Its API also has severe safety problems.

Way overblown. And it's nothing that druntime developers cannot handle easily. druntime is full of systems programming.

November 01, 2014
On 11/1/2014 6:35 AM, Manu via Digitalmars-d wrote:
> I'd say that I'd be nervous to see druntime chockers full of templates...?

What's a chocker?

Why would templates make you nervous? They're not C++ templates!

November 02, 2014
On Saturday, 1 November 2014 at 17:50:33 UTC, Walter Bright wrote:
>> It is not the same thing as ref/out buffer argument.
>
> Don't understand your comment.

Steven's comment mentioned two things about the Tango approach: using a stack buffer as the initial buffer, and extensive use of ref parameters for such arguments. std.internal.scopebuffer on its own only addresses the former.

>> We have been trading
>> ping-pong comments about it several times now. All
>> std.internal.scopebuffer does is reduce the heap allocation count at the cost
>> of stack consumption (and switching to raw malloc for the heap) - it does not
>> change the big-O estimate of heap allocations unless it is used as a buffer
>> argument - at which point it is no better than a plain array.
>
> 1. stack allocation is not much of an issue here because these routines in druntime are not recursive, and there's plenty of stack available for what druntime is using toString for.

Fibers.

Ironically, one of the more problematic parts of Tango for us is the Layout module, which uses a somewhat big stack buffer for formatting the arguments into a string. It is a very common cause of fiber stack overflows (and thus segfaults) unless the default size is changed.

With 64-bit systems and lazy stack page allocation it is not a critical problem anymore, but it is still an important one, because any growth of the effective fiber stack prevents one from using fibers as a truly cheap context abstraction.

A persistent thread-local heap buffer that gets reused by the many fibers bound to that thread can be much better in that regard.
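Roughly, the idea looks like this (a sketch only, not a concrete proposal; names are illustrative):

```d
// Sketch: one growable buffer per thread, shared by all fibers bound
// to that thread. In D, a module-level variable is thread-local by
// default, so each thread gets its own copy.
char[] tlsFormatBuffer;

// A fiber borrows the buffer for the duration of a formatting call
// and must release it before yielding; it consumes no fiber stack.
char[] borrowFormatBuffer(size_t minSize)
{
    if (tlsFormatBuffer.length < minSize)
        tlsFormatBuffer.length = minSize; // grows once, then persists
    return tlsFormatBuffer;
}
```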

> 2. "All it does" is an inadequate summary. Most of the time it completely eliminates the need for allocations. This is a BIG deal, and the source of much of the speed of Warp. The cases where it does fall back to heap allocation do not leave GC memory around.

We have very different applications in mind. Warp is a "normal" single-user application, and I am speaking about servers. Those have very different performance profiles and requirements; successful experience in one domain simply does not transfer to the other.

> 3. There is big-O in worst case, which is very unlikely, and big-O in 99% of the cases, which for scopebuffer is O(1).

And with servers you always need to treat the worst case as the default, or you invite a DoS attack. I am not even sure where the 99% comes from, because input length is usually defined by the application domain, while the stack usage of scopebuffer is set inside library code.

> 4. You're discounting the API of scopebuffer which is usable as an output range, fitting it right in with Phobos' pipeline design.

Any array is an output range :)
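For instance (illustrative only):

```d
import std.range.primitives : put;

void demo()
{
    char[16] stack;
    char[] buf = stack[];  // a plain slice is already an output range
    put(buf, "hello");     // put() writes and advances the slice
    // ...but once the slice is exhausted, put() has nowhere to grow.
}
```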

November 03, 2014
On 11/2/2014 3:57 PM, Dicebot wrote:
> On Saturday, 1 November 2014 at 17:50:33 UTC, Walter Bright wrote:
>>> It is not the same thing as ref/out buffer argument.
>>
>> Don't understand your comment.
>
> Steven's comment mentioned two things about the Tango approach: using a stack
> buffer as the initial buffer, and extensive use of ref parameters for such
> arguments. std.internal.scopebuffer on its own only addresses the former.

I still have no idea how this would apply here.


>>> We have been trading
>>> ping-pong comments about it several times now. All
>>> std.internal.scopebuffer does is reduce the heap allocation count at the cost
>>> of stack consumption (and switching to raw malloc for the heap) - it does not
>>> change the big-O estimate of heap allocations unless it is used as a buffer
>>> argument - at which point it is no better than a plain array.
>>
>> 1. stack allocation is not much of an issue here because these routines in
>> druntime are not recursive, and there's plenty of stack available for what
>> druntime is using toString for.
>
> Fibers.
>
> Ironically, one of the more problematic parts of Tango for us is the Layout
> module, which uses a somewhat big stack buffer for formatting the arguments
> into a string. It is a very common cause of fiber stack overflows (and thus
> segfaults) unless the default size is changed.
>
> With 64-bit systems and lazy stack page allocation it is not a critical
> problem anymore, but it is still an important one, because any growth of the
> effective fiber stack prevents one from using fibers as a truly cheap context
> abstraction.
>
> A persistent thread-local heap buffer that gets reused by the many fibers
> bound to that thread can be much better in that regard.

There is no problem with having the max stack allocation for scopebuffer use set smaller for 32-bit code than for 64-bit code.


>> 2. "All it does" is an inadequate summary. Most of the time it completely
>> eliminates the need for allocations. This is a BIG deal, and the source of
>> much of the speed of Warp. The cases where it does fall back to heap
>> allocation do not leave GC memory around.
> We have very different applications in mind. Warp is a "normal" single-user
> application, and I am speaking about servers. Those have very different
> performance profiles and requirements; successful experience in one domain
> simply does not transfer to the other.

What part of druntime would be a special case for servers?


>> 3. There is big-O in worst case, which is very unlikely, and big-O in 99% of
>> the cases, which for scopebuffer is O(1).
> And with servers you always need to treat the worst case as the default, or
> you invite a DoS attack.

Minimizing allocations is about dealing with the most common cases.


> I am not even sure where the 99% comes from, because input length is
> usually defined by the application domain, while the stack usage of
> scopebuffer is set inside library code.

It's not that hard.


>> 4. You're discounting the API of scopebuffer which is usable as an output
>> range, fitting it right in with Phobos' pipeline design.
>
> Any array is an output range :)

The point is what to do when the array gets full.
November 03, 2014
On 2 November 2014 04:15, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 11/1/2014 6:35 AM, Manu via Digitalmars-d wrote:
>>
>> I'd say that I'd be nervous to see druntime chockers full of templates...?
>
>
> What's a chocker?

It's Australian for 'lots'.


> Why would templates make you nervous? They're not C++ templates!

What do you mean? How are D templates any different than C++ templates in a practical sense?

I want a binary lib to be a binary lib. I don't think it's good form
for the lowest level library in the language ecosystem to depend on
templates (ie, client-side code generation).
This is the fundamental lib that will be present in every D
application there is. If it is not a binary lib, then it can't be
updated.

Suppose performance improvements are made to druntime, which every
application should enjoy. If the code is templates, then the old
version, from the time of compiling, is embedded into existing client
software, and the update will have no effect unless the client software is
rebuilt.
More importantly, what about security fixes in druntime... imagine a
critical security problem in druntime (I bet there's lots!); if we
can't update druntime, then *every* D application is exploitable. A very
shaky foundation for an ecosystem...

druntime is a fundamental ecosystem library. It should be properly semantically versioned, and particularly for security reasons, I think this should be taken very, very seriously.

This argument could equally apply to phobos, and I've always been nervous about it too, for the same reasons... but I'll draw the line there: phobos is not critical for an application to build and link, and so much of its API is already templates that it would be impossible to change that now.
November 03, 2014
On 11/3/14 8:45 AM, Manu via Digitalmars-d wrote:
> On 2 November 2014 04:15, Walter Bright via Digitalmars-d
> <digitalmars-d@puremagic.com> wrote:
>> Why would templates make you nervous? They're not C++ templates!
>
> What do you mean? How are D templates any different than C++ templates
> in a practical sense?

Probably about three times easier to read and write.

> I want a binary lib to be a binary lib. I don't think it's good form
> for the lowest level library in the language ecosystem to depend on
> templates (ie, client-side code generation).
> This is the fundamental lib that will be present in every D
> application there is. If it is not a binary lib, then it can't be
> updated.
>
> Suppose performance improvements are made to druntime, which every
> application should enjoy. If the code is templates, then the old
> version, from the time of compiling, is embedded into existing client
> software, and the update will have no effect unless the client software
> is rebuilt.
> More importantly, what about security fixes in druntime... imagine a
> critical security problem in druntime (I bet there's lots!); if we
> can't update druntime, then *every* D application is exploitable. A very
> shaky foundation for an ecosystem...

The same argument goes for all statically linked libraries.

> druntime is a fundamental ecosystem library. It should be properly
> semantically versioned, and particularly for security reasons, I
> think this should be taken very, very seriously.
>
> This argument could equally apply to phobos, and I've always been
> nervous about it too, for the same reasons... but I'll draw the line
> there: phobos is not critical for an application to build and link,
> and so much of its API is already templates that it would be
> impossible to change that now.

Within reason, most of the runtime and standard library ought to be generic so as to adapt best to application needs. Generics are a very powerful mechanism for libraries.


Andrei

November 03, 2014
On 11/2/2014 11:45 PM, Manu via Digitalmars-d wrote:
> On 2 November 2014 04:15, Walter Bright via Digitalmars-d
> <digitalmars-d@puremagic.com> wrote:
>> Why would templates make you nervous? They're not C++ templates!
> What do you mean? How are D templates any different than C++ templates
> in a practical sense?

They're much more straightforward to use, syntactically and semantically.


> I want a binary lib to be a binary lib. I don't think it's good form
> for the lowest level library in the language ecosystem to depend on
> templates (ie, client-side code generation).

What's the problem with that?


> This is the fundamental lib that will be present in every D
> application there is. If it is not a binary lib, then it can't be
> updated.

Have you ever looked at the C openssl.lib? The .h files with it are loaded with metaprogramming done with C macros. Yet I've never heard anyone complain about it. C .h files for DLLs are typically stuffed with C macros.


> Suppose performance improvements are made to druntime, which every
> application should enjoy. If the code is templates, then the old
> version, from the time of compiling, is embedded into existing client
> software, and the update will have no effect unless the client software
> is rebuilt.
> More importantly, what about security fixes in druntime... imagine a
> critical security problem in druntime (I bet there's lots!); if we
> can't update druntime, then *every* D application is exploitable. A very
> shaky foundation for an ecosystem...

The defense presents openssl as Exhibit A!

(The templates really only present the interface to the dll, not the guts of it.)


> druntime is a fundamental ecosystem library. It should be properly
> semantically versioned, and particularly for security reasons, I
> think this should be taken very, very seriously.

openssl!!!

BTW, you should know that if a template is instantiated by the library itself, the compiler won't re-instantiate it and insert that in the calling code. It'll just call the existing instantiation in the binary.

November 03, 2014
On 10/31/14 4:50 PM, Jonathan Marler wrote:

> I wrote a Windows CE app to run on our printers here at HP to test what
> the Microsoft ARM compiler does with virtual function calls.  I had to
> do an operation with a global volatile variable to prevent the compiler
> from inlining the non-virtual function call but I finally got it to work.
>
> Calling the function 100 million times yielded the following times:
>
> Windows Compiler on ARM (Release)
> -------------------------------------------
> NonVirtual: 0.537000 seconds
> Virtual   : 1.281000 seconds
>
> Windows Compiler on x86 (Release)
> -------------------------------------------
> NonVirtual: 0.226000 seconds
> Virtual   : 0.226000 seconds
>
> Windows Compiler on x86 (Debug)
> -------------------------------------------
> NonVirtual: 2.940000 seconds
> Virtual   : 3.204000 seconds
>
>
> Here's the link to the code:
>
> http://marler.info/virtualtest.c
>
>

Thanks, this is helpful.

-Steve
November 03, 2014
On 11/1/14 9:30 AM, Manu via Digitalmars-d wrote:
> On 31 October 2014 01:30, Steven Schveighoffer via Digitalmars-d

>>
>> Sorry, I meant future *D supported* platforms, not future not-yet-existing
>> platforms.
>
> I'm not sure what you mean. I've used D on current and existing games
> consoles. I personally think it's one of D's most promising markets...
> if not for just a couple of remaining details.

I don't think D officially supports these platforms. I could be wrong.

> Also, my suggestion will certainly perform better on all platforms.
> There is no platform where the existing proposal's indirect function
> call per write can beat an approach that avoids one.

Performance isn't the only consideration. In your case, it has a higher priority than ease of implementation, flexibility, or usability. But that's not the case everywhere.

Consider the flip-side: on x86, your mechanism may be a hair faster than just having a delegate. Is it worth all the extra trouble for those folks to have to save some state or deal with reallocating buffers in their toString functions?

>> Before we start ripping apart our existing APIs, can we show that the
>> performance is really going to be so bad? I know virtual calls have a bad
>> reputation, but I hate to make these choices absent real data.
>
> My career for a decade always seems to find its way back to fighting
> virtual calls. (in proprietary codebases so I can't easily present
> case studies)
> But it's too late now I guess. I should have gotten in when someone
> came up with the idea... I thought it was new.

At the moment, you are stuck with most toString calls allocating on the GC every time they are called. I think the virtual call thing should be a pleasant improvement :)

But in all seriousness, I am not opposed to an alternative API, but the delegate one seems to find the right balance of flexibility and ease of implementation.

I think we can use any number of toString APIs, and in fact, we should be able to build on top of the delegate version a mechanism to reduce (but not eliminate obviously) virtual calls.

>> For instance, D's underlying i/o system uses FILE *, which is about as
>> virtual as you can get. So are you avoiding a virtual call to use a buffer
>> to then pass to a virtual call later?
>
> I do a lot of string processing, but it never finds its way to a
> FILE*. I don't write console based software.

Just an example. Point taken.

>> A reentrant function has to track the state of what has been output, which
>> is horrific in my opinion.
>
> How so? It doesn't seem that bad to me. We're talking about druntime
> here, the single most used library in the whole ecosystem... that shit
> should be tuned to the max. It doesn't matter how pretty the code is.

Keep in mind that any API addition is something that all users have to deal with. If we are talking about a specialized, tuned API that druntime and phobos can use, I don't think it would be impossible to include this.

But to say we support only a horrible allocate-every-toString-call mechanism and a please-keep-your-own-state-machine mechanism is not good. The main benefit of the delegate approach is that it's easy to understand, easy to use, and reasonably efficient. It's a good middle ground. It's also easy to implement a sink. Both sides are easy, which makes the whole thing more approachable.
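For reference, the delegate-based signature being discussed looks like this (a sketch; the Point type is made up for illustration):

```d
import std.format : formattedWrite;

struct Point
{
    int x, y;

    // Sink-based toString: the caller decides where the characters
    // go, so the callee forces no GC allocation of its own.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        formattedWrite(sink, "Point(%s, %s)", x, y);
    }
}

// A trivial sink on the caller's side:
void use(Point p)
{
    char[] result;
    p.toString((const(char)[] s) { result ~= s; });
}
```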

>> The largest problem I see is, you may not know before you start generating
>> strings whether it will fit in the buffer, and therefore, you may still end
>> up eventually calling the sink.
>
> Right. The api should be structured to make a virtual call _only_ in
> the rare instance the buffer overflows. That is my suggestion.
> You can be certain to supply a buffer that will not overflow in many/most cases.

I, and I'm sure most of the developers, are open to new ideas to make something like this as painless as possible. I still think we should keep the delegate mechanism.

>> Note, you can always allocate a stack buffer, use an inner function as a
>> delegate, and get the inliner to remove the indirect calls. Or use an
>> alternative private mechanism to build the data.
>
> We're talking about druntime specifically. It is a binary lib. The
> inliner won't save you.

Let's define the situation here -- there is a boundary in druntime across which no inlining can occur. Before the boundary or after the boundary, inlining is fair game.

So for instance, if a druntime object has 3 members it needs to toString in order to satisfy its own toString, those members will probably all be druntime objects as well, in which case it can optimize those sub-calls.

And let's also not forget that druntime has template objects in it as well, which are ripe for inlining.

This is what I meant.
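Concretely, the inner-function trick mentioned above might look like this (a sketch; the Msg type and its sink-based toString are assumed for illustration):

```d
struct Msg
{
    // assumed sink-based toString, as discussed in this thread
    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink("hello");
    }
}

void formatLocally()
{
    char[256] buf;
    size_t pos;

    // Inner function used as the sink. Within one compilation unit
    // the inliner can turn this indirect call into a direct one.
    void sink(const(char)[] s)
    {
        buf[pos .. pos + s.length] = s[];
        pos += s.length;
    }

    Msg().toString(&sink);
    // buf[0 .. pos] now holds the text; no heap allocation occurred
}
```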

>> Would you say that *one* delegate call per object output is OK?
>
> I would say that an uncontrollable virtual call is NEVER okay,
> especially in otherwise trivial and such core functions like toString
> in druntime. But one is certainly better than many.

I'm trying to get a feel for how painful this has to be :) If we can have one virtual call, it means you can build a mechanism that works with delegates + a more efficient one, by just sinking the result of the efficient one. This means you can work with the existing APIs right now.

>> This is a typical mechanism that Tango used -- pass in a ref to a dynamic
>> array referencing a stack buffer. If it needed to grow, just update the
>> length, and it moves to the heap. In most cases, the stack buffer is enough.
>> But the idea is to try and minimize the GC allocations, which are
>> performance killers on the current platforms.
>
> I wouldn't hard-code to overflow to the GC heap specifically. It
> should be an API that the user may overflow to wherever they like.

Just keep in mind the clients of this API are on 3 sides:

1. Those who implement the toString call.
2. Those who implement a place for those toString calls to go.
3. Those who wish to put the 2 together.

We want to reduce the burden as much as possible on all of them. We also don't want to require implementing ALL these different toString mechanisms -- I should be able to implement one of them, and all can use it.

>
>> I think adding the option of using a delegate is not limiting -- you can
>> always, on a platform that needs it, implement a alternative protocol that
>> is internal to druntime. We are not preventing such protocols by adding the
>> delegate version.
>
> You're saying that some platform may need to implement a further
> completely different API? Then no existing code will compile for that
> platform. This is madness. We already have more than enough API's.

You just said you don't use FILE *. Why do we have to ensure all pieces of Phobos implement everything you desire when you aren't going to use it? I don't think it's madness to *provide* a mechanism for more efficient (on some platforms) code, and then ask those who are interested to use that mechanism, while not forcing it on all those who aren't. You can always submit a pull request to add it where you need it! But having an agreed upon API is an important first step. So let's get that done.

>> But on our currently supported platforms, the delegate vs. GC call is soo
>> much better. I can't see any reason to avoid the latter.
>
> The latter? (the GC?) .. Sorry, I'm confused.
>

My bad, I meant the former. No wonder you were confused ;)

-Steve
November 03, 2014
On 10/31/14 3:04 PM, Walter Bright wrote:
> On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
>> I'm planning on doing a pull request for druntime which rewrites every
>> toString
>> function within druntime to use the new sink signature. That way
>> druntime would
> cause a lot fewer allocations which end up being garbage right away.
>> Are there
>> any objections against doing so? Any reasons why such a pull request
>> would not
>> get accepted?
>
> Why a sink version instead of an Output Range?
>

A sink is an output range. Supporting all output ranges isn't necessary.
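That is, a delegate taking const(char)[] already satisfies the output range interface (illustrative):

```d
import std.range.primitives : isOutputRange, put;

// put() knows how to call a delegate directly, so a sink delegate is
// a valid output range for character slices.
static assert(isOutputRange!(void delegate(const(char)[]), const(char)[]));

void demo()
{
    char[] collected;
    void delegate(const(char)[]) sink = (s) { collected ~= s; };
    put(sink, "hello"); // dispatches through the delegate
}
```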

-Steve