toString refactor in druntime (page 8) - D Programming Language Discussion Forum


November 04, 2014

Re: toString refactor in druntime

Posted by Walter Bright
in reply to Steven Schveighoffer

On 11/3/2014 2:33 PM, Steven Schveighoffer wrote:
> On 11/3/14 4:40 PM, Walter Bright wrote:
>> On 11/3/2014 8:09 AM, Steven Schveighoffer wrote:
>>> It is a huge difference to say EVERYONE who implements toString will
>>> take any
>>> templated type that purports to be an output range, vs giving one case
>>> to handle.
>>
>> All an output range is is a type with a 'put' method. That's it. You're
>> making it out to be far more complex than it is.
>>
>
> Directly from the docs: (http://dlang.org/phobos/std_range.html#isOutputRange)
>
> void myprint(in char[] s) { }
> static assert(isOutputRange!(typeof(&myprint), char));
>
> No 'put' in sight, except as a substring of isOutputRange.
>
> I don't think you realize what a beast supporting all output ranges is, or using
> them (hint: calling r.put for a generic output range is an ERROR).

The documentation says, more specifically, that the requirement is that it support put(r,e). The array operands are output ranges NOT because the output ranges need to know about arrays, but because arrays themselves have a put() operation (as defined in std.array).

All the algorithm (such as a toString()) needs to do is call put(r,e). It doesn't need to do anything else. It isn't any more complicated than the sink interface.
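
A minimal sketch of the put(r,e) point, assuming Phobos is importable (which, as the replies note, is not the case inside druntime). The `Point` type and its fields are hypothetical and only serve as an illustration. The same templated toString is fed an Appender, which has a put method, and a plain delegate, which has none; the free function put(r,e) from std.range handles both, whereas writing sink.put(...) would fail to compile for the delegate.

```d
import std.array : appender;
import std.conv : to;
import std.range : put;

// Hypothetical type used only for illustration.
struct Point
{
    int x, y;

    // The only operation performed on the sink is the free function put(sink, e).
    void toString(Sink)(ref Sink sink) const
    {
        put(sink, "Point(");
        put(sink, x.to!string);
        put(sink, ", ");
        put(sink, y.to!string);
        put(sink, ")");
    }
}

void main()
{
    auto p = Point(1, 2);

    // Sink 1: a Phobos Appender, which has a put member.
    auto app = appender!string();
    p.toString(app);
    assert(app.data == "Point(1, 2)");

    // Sink 2: a delegate taking const(char)[], which has no put member.
    char[] collected;
    void delegate(const(char)[]) dg = (const(char)[] s) { collected ~= s; };
    p.toString(dg);
    assert(collected == "Point(1, 2)");
}
```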

November 04, 2014

Re: toString refactor in druntime

Posted by Steven Schveighoffer
in reply to Walter Bright

On 11/3/14 8:09 PM, Walter Bright wrote:
> On 11/3/2014 2:28 PM, Steven Schveighoffer wrote:
>> I had a very nasty experience with using a template-based API. I vowed
>> to avoid
>> it wherever possible.
>>
>> The culprit was std::string -- it changed something internally from
>> one version
>> of libc++ to the next on Linux. So I had to recompile everything, but
>> the whole
>> system I was using was with .so objects.
>>
>> templates do NOT make good API types IMO.
>
> It seems this is blaming templates for a different problem.
>
> If I have:
>
>    struct S { int x; };
>
> in my C .h file, and I change it to:
>
>    struct S { int x,y; };
>
> Then all my API functions that take S as a value argument will require
> recompilation of any code that uses it.
>
> Would you conclude that C sux for making APIs? Of course not. You'd say
> that a stable API should use reference types, not value types.

A string is a reference type; the data is on the heap.

But that is not the issue. The issue is that it's IMPOSSIBLE for me to ensure std::string remains stable, because it's necessarily completely exposed. There is no encapsulation.

Even if I used a pointer to a std::string, the implementation change is going to cause issues.

By contrast, having C strings as a parameter type has never broken for me. In later projects, I have added a simple "immutable string" type modeled after D's, which works so much better :)

-Steve

November 04, 2014

Re: toString refactor in druntime

Posted by Steven Schveighoffer
in reply to Walter Bright

On 11/3/14 8:16 PM, Walter Bright wrote:
> On 11/3/2014 2:33 PM, Steven Schveighoffer wrote:
>> On 11/3/14 4:40 PM, Walter Bright wrote:
>>> On 11/3/2014 8:09 AM, Steven Schveighoffer wrote:
>>>> It is a huge difference to say EVERYONE who implements toString will
>>>> take any
>>>> templated type that purports to be an output range, vs giving one case
>>>> to handle.
>>>
>>> All an output range is is a type with a 'put' method. That's it. You're
>>> making it out to be far more complex than it is.
>>>
>>
>> Directly from the docs:
>> (http://dlang.org/phobos/std_range.html#isOutputRange)
>>
>> void myprint(in char[] s) { }
>> static assert(isOutputRange!(typeof(&myprint), char));
>>
>> No 'put' in sight, except as a substring of isOutputRange.
>>
>> I don't think you realize what a beast supporting all output ranges
>> is, or using
>> them (hint: calling r.put for a generic output range is an ERROR).
>
> The documentation says, more specifically, that the requirement is that
> it support put(r,e). The array operands are output ranges NOT because
> the output ranges need to know about arrays, but because arrays
> themselves have a put() operation (as defined in std.array).

You can't import std.array from druntime.

>
> All the algorithm (such as a toString()) needs to do is call put(r,e).
> It doesn't need to do anything else. It isn't any more complicated than
> the sink interface.

Again, std.range.put isn't defined in druntime.

And neither is isOutputRange. Are you planning on moving these things to druntime?

-Steve

November 04, 2014

Re: toString refactor in druntime

Posted by Walter Bright
in reply to Steven Schveighoffer

On 11/3/2014 5:47 PM, Steven Schveighoffer wrote:
> Again, std.range.put isn't defined in druntime.

True.

> And neither is isOutputRange.

Wouldn't really be needed.

> Are you planning on moving these things to druntime?

This illustrates a long-running issue about what goes in druntime and what goes in Phobos.

It's not much of an argument for having a different API that does the same thing, only incompatible.

November 04, 2014

Re: toString refactor in druntime

Posted by H. S. Teoh
in reply to Walter Bright

On Mon, Nov 03, 2014 at 05:56:56PM -0800, Walter Bright via Digitalmars-d wrote:
> On 11/3/2014 5:47 PM, Steven Schveighoffer wrote:
> >Again, std.range.put isn't defined in druntime.
> 
> True.
> 
> >And neither is isOutputRange.
> 
> Wouldn't really be needed.
> 
> >Are you planning on moving these things to druntime?
> 
> This illustrates a long running issue about what goes in druntime and what in phobos.
[...]

Another prime example is std.typecons.Tuple, which blocked the implementation of byPair in AA's.


T

-- 
Political correctness: socially-sanctioned hypocrisy.

November 04, 2014

Re: toString refactor in druntime

Posted by Steven Schveighoffer
in reply to Jonathan Marler

On 11/3/14 6:05 PM, Jonathan Marler wrote:
> On Monday, 3 November 2014 at 22:33:25 UTC, Steven Schveighoffer wrote:
>> On 11/3/14 4:40 PM, Walter Bright wrote:
>>> On 11/3/2014 8:09 AM, Steven Schveighoffer wrote:
>>>> It is a huge difference to say EVERYONE who implements toString will
>>>> take any
>>>> templated type that purports to be an output range, vs giving one case
>>>> to handle.
>>>
>>> All an output range is is a type with a 'put' method. That's it. You're
>>> making it out to be far more complex than it is.
>>>
>>
>> Directly from the docs:
>> (http://dlang.org/phobos/std_range.html#isOutputRange)
>>
>> void myprint(in char[] s) { }
>> static assert(isOutputRange!(typeof(&myprint), char));
>>
>> No 'put' in sight, except as a substring of isOutputRange.
>>
>> I don't think you realize what a beast supporting all output ranges
>> is, or using them (hint: calling r.put for a generic output range is
>> an ERROR).
>>
>> -Steve
>
> In many cases templates are good because they provide a way for the
> programmer to use a library optimized for their particular application.
> This is the case for the toString function.  An argument can be made
> that using templates is dangerous because if they are used incorrectly,
> the number of template instantiations can blow up.  But this can always be
> solved by the programmer by changing all their template calls to use the
> same template parameters.  This allows the template solution to
> simultaneously support a sink that represents a real function, or a
> delegate, or whatever the application needs.

If we make toString a template, we preclude it from being a virtual function, and we force the object to expose its inner workings.

I think the template solution has advantages, one being the possibility for optimization. But I don't think the gains are significant enough. It's also more complex than necessary.

> I understand that people like having a binary library that instantiates
> its own functions that have a static interface and I think there's
> value to that.  But most of the value is in dynamic libraries that the
> compiler cannot optimize.  When the compiler can optimize, let it :)
>
> I updated my test code to use a templated sink; here's the link:
>
> http://marler.info/dtostring.d
>
>
>     Method 1: ReturnString
>               string toString();
>     Method 2: SinkDelegate
>               void toString(void delegate(const(char)[]) sink);
>     Method 3: SinkTemplate
>               void toString(T)(T sink) if(isOutputRange!(T,const(char)[]));
>     Method 4: SinkDelegateWithStaticHelperBuffer
>               struct SinkStatic { char[64] buffer; void delegate(const(char)[]) sink; }
>               void toString(ref SinkStatic sink);
>     Method 5: SinkDelegateWithDynamicHelperBuffer
>               struct SinkDynamic { char[] buffer; void delegate(const(char)[]) sink; }
>               void toString(ref SinkDynamic sink);
>               void toString(SinkDynamic sink);
>
>
> (DMD Compiler on x86) "dmd dtostring.d"
> RuntimeString run 1 (loopcount 10000000)
>    Method 1     : 76 ms
>    Method 2     : 153 ms
>    Method 3     : 146 ms
>    Method 4     : 157 ms
>    Method 5ref  : 165 ms
>    Method 5noref: 172 ms
> StringWithPrefix run 1 (loopcount 1000000)
>    Method 1     : 149 ms
>    Method 2     : 22 ms
>    Method 3     : 21 ms
>    Method 4     : 80 ms
>    Method 5ref  : 81 ms
>    Method 5noref: 82 ms
> ArrayOfStrings run 1 (loopcount 1000000)
>    Method 1     : 1 sec
>    Method 2     : 81 ms
>    Method 3     : 77 ms
>    Method 4     : 233 ms
>    Method 5ref  : 232 ms
>    Method 5noref: 223 ms
>
>
> (DMD Compiler on x86 with Optimization) "dmd -O dtostring.d"
> RuntimeString run 1 (loopcount 10000000)
>    Method 1     : 30 ms
>    Method 2     : 65 ms
>    Method 3     : 55 ms
>    Method 4     : 68 ms
>    Method 5ref  : 68 ms
>    Method 5noref: 67 ms
> StringWithPrefix run 1 (loopcount 1000000)
>    Method 1     : 158 ms
>    Method 2     : 9 ms
>    Method 3     : 8 ms
>    Method 4     : 63 ms
>    Method 5ref  : 64 ms
>    Method 5noref: 66 ms
> ArrayOfStrings run 1 (loopcount 1000000)
>    Method 1     : 1 sec, 292 ms
>    Method 2     : 35 ms
>    Method 3     : 34 ms
>    Method 4     : 193 ms
>    Method 5ref  : 198 ms
>    Method 5noref: 200 ms
>
> The results aren't surprising.  The template outperforms the delegate
> sink.  In a very big project one might try to limit the number of
> instantiations of toString by using a specific toString instance that
> accepts some common OutputRange wrapper type, which would make the
> template version perform the same as the sink delegate version, but for
> projects that don't need to worry about that, you will get better
> performance from more compiler optimization.

I think the performance gains are minimal. The only one that is significant is StringWithPrefix, which has an 11% gain (9 ms versus 8 ms with -O). But that's still only 1 ms, and 1 ms on a PC can be attributed to external forces. I would increase the loop count on that one.

Note, if you really want to see gains, use -inline.

-Steve

November 04, 2014

Re: toString refactor in druntime

Posted by Jonathan Marler
in reply to Steven Schveighoffer

On Tuesday, 4 November 2014 at 02:49:55 UTC, Steven Schveighoffer wrote:
> On 11/3/14 6:05 PM, Jonathan Marler wrote:
>> In many cases templates are good because they provide a way for the
>> programmer to use a library optimized for their particular application.
>> This is the case for the toString function.  An argument can be made
>> that using templates is dangerous because if they are used incorrectly,
>> the number of template instantiations can blow up.  But this can always be
>> solved by the programmer by changing all their template calls to use the
>> same template parameters.  This allows the template solution to
>> simultaneously support a sink that represents a real function, or a
>> delegate, or whatever the application needs.
>
> If we make toString a template, we preclude it from being a virtual function, and we force the object to expose its inner workings.
>
> I think the template solution has advantages, one being the possibility for optimization. But I don't think the gains are significant enough. It's also more complex than necessary.
>

I was thinking you could have the best of both worlds with templates.  For example, you could define the toString template like this:

  void toStringTemplate(T)(T sink) if(isOutputRange!(T,const(char)[]))

Then you could declare an alias like this:

  alias toString = toStringTemplate!(void delegate(const(char)[]));

Which (correct me if I'm wrong) I believe is equivalent to the original sink delegate function.  This allows programmers to write the logic for toString once and allow a developer using the library to choose whether they want to use the delegate version or the generic output range version.

This gives the user of the library the ability to choose the best version for their own application.
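
A minimal sketch of the template-plus-alias idea, assuming Phobos is available. The `Name` struct and the `Collector` sink class are hypothetical names introduced only for illustration; the two declarations inside `Name` follow the signatures given above. Note that the alias selects one instantiation of a member template; it does not by itself make the function virtual.

```d
import std.range : isOutputRange, put;

// Hypothetical reference-type sink with a put method (not from the thread).
final class Collector
{
    char[] data;
    void put(const(char)[] s) { data ~= s; }
}

// Hypothetical type whose formatting logic is written once.
struct Name
{
    string first, last;

    // Generic version: accepts anything that is an output range of const(char)[].
    void toStringTemplate(T)(T sink) if (isOutputRange!(T, const(char)[]))
    {
        put(sink, first);
        put(sink, " ");
        put(sink, last);
    }

    // One fixed instantiation, exposed under the conventional name.
    alias toString = toStringTemplate!(void delegate(const(char)[]));
}

void main()
{
    auto n = Name("Ada", "Lovelace");

    // Delegate version, reached through the alias.
    char[] viaDelegate;
    n.toString((const(char)[] s) { viaDelegate ~= s; });
    assert(viaDelegate == "Ada Lovelace");

    // Generic version, instantiated by the caller for its own sink type.
    auto c = new Collector;
    n.toStringTemplate(c);
    assert(c.data == "Ada Lovelace");
}
```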

Note: I added this "alias" method to my dtostring.d test code and it wasn't as fast as the delegate version.  I'm not sure why as I thought the generated code would be identical.  If anyone has any insight as to why this happened let me know.

code is at http://marler.info/dtostring.d

November 04, 2014

Re: toString refactor in druntime

Posted by John Colvin
in reply to Jonathan Marler

On Tuesday, 4 November 2014 at 04:34:09 UTC, Jonathan Marler wrote:
> On Tuesday, 4 November 2014 at 02:49:55 UTC, Steven Schveighoffer wrote:
>> On 11/3/14 6:05 PM, Jonathan Marler wrote:
>>> In many cases templates are good because they provide a way for the
>>> programmer to use a library optimized for their particular application.
>>> This is the case for the toString function.  An argument can be made
>>> that using templates is dangerous because if they are used incorrectly,
>>> the number of template instantiations can blow up.  But this can always be
>>> solved by the programmer by changing all their template calls to use the
>>> same template parameters.  This allows the template solution to
>>> simultaneously support a sink that represents a real function, or a
>>> delegate, or whatever the application needs.
>>
>> If we make toString a template, we preclude it from being a virtual function, and we force the object to expose its inner workings.
>>
>> I think the template solution has advantages, one being the possibility for optimization. But I don't think the gains are significant enough. It's also more complex than necessary.
>>
>
> I was thinking you could have the best of both worlds with templates.  For example, you could define the toString template like this:
>
>   void toStringTemplate(T)(T sink) if(isOutputRange!(T,const(char)[]))
>
> Then you could declare an alias like this:
>
>   alias toString = toStringTemplate!(void delegate(const(char)[]));
>
> Which (correct me if I'm wrong) I believe is equivalent to the original sink delegate function.  This allows programmers to write the logic for toString once and allow a developer using the library to choose whether they want to use the delegate version or the generic output range version.
>
> This gives the user of the library the ability to choose the best version for their own application.
>
> Note: I added this "alias" method to my dtostring.d test code and it wasn't as fast as the delegate version.  I'm not sure why as I thought the generated code would be identical.  If anyone has any insight as to why this happened let me know.
>
> code is at http://marler.info/dtostring.d

I'm sure it's been mentioned before, but you should try ldc/gdc as they have much more capable optimisers.

November 08, 2014

Re: toString refactor in druntime

Posted by Manu
in reply to Walter Bright

On 3 November 2014 19:55, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 11/2/2014 11:45 PM, Manu via Digitalmars-d wrote:
>>
>> On 2 November 2014 04:15, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>
>>> Why would templates make you nervous? They're not C++ templates!
>>
>> What do you mean? How are D templates any different than C++ templates in a practical sense?
>
>
> They're much more straightforward to use, syntactically and semantically.

This isn't anything to do with what I'm talking about. I'm not nervous because I don't like the D template syntax, it's because I don't feel it's a good idea for druntime (specifically) to have non-static interfaces which may expose, or create dependencies on druntime internals.

>> I want a binary lib to be a binary lib. I don't think it's good form for the lowest level library in the language ecosystem to depend on templates (ie, client-side code generation).
>
>
> What's the problem with that?

Templates which operate on library internals within client code create more dependencies on the library. It obscures the clarity of the API.

>> This is the fundamental lib that will be present in every D application there is. If it is not a binary lib, then it can't be updated.
>
>
> Have you ever looked at the C openssl.lib? The .h files with it are loaded with metaprogramming done with C macros. Yet I've never heard anyone complain about it. C .h files for DLLs are typically stuffed with C macros.

I'm not familiar with openssl, but regardless, I wouldn't consider
openssl to lie at the same level in the ecosystem as druntime.
The only library I would use as comparison is the CRT, which is firmly
self-contained, with a well defined API.

I understand your point in principle; if it is effectively 'helper' code which may be embedded in the client, *but still operates exclusively via the API*, then I have no issue. It doesn't appear to me that this is the case here...?

>> Consider performance improvements are made to druntime, which every
>> application should enjoy. If the code is templates, then the old
>> version at time of compiling is embedded into existing client
>> software, the update will have no effect unless the client software is
>> rebuilt.
>> More important, what about security fixes in druntime... imagine a
>> critical security problem in druntime (I bet there's lots!); if we
>> can't update druntime, then *every* D application is an exploit. Very
>> shaky foundation for an ecosystem...
>
>
> The defense presents openssl as Exhibit A!
>
> (The templates really only present the interface to the dll, not the guts of
> it.)

That's fine. But how is that the case here? Performing a string conversion implies direct access to the data being converted. If the function is a template, then my app is directly accessing internal druntime data.

I don't like being classified as 'the offense', in fact, I'm really,
really tired of unfairly being assigned this position whenever I try
and make a stand for things that matter to me and my kind.
I'm just highlighting an issue I recognise in the API, where there is
no way to get a string out of druntime without GC or indirect function
calls.
It seems to be the appropriate time for me to raise this opinion,
since we seem to be in a full-throttle state of supporting @nogc,
which is excellent, but replacing it with indirect function calls
isn't awesome. This takes one problem and replaces it with a different
problem with different characteristics.

I would like to see an overload like the C lib; API receives memory,
writes to it. This may not be the API that most people use, but I
think it should exist.
I then suggested that this API may be crafted in such a way that the
higher-level goals can also be expressed through it. It could be
wrapped in a little thing that may request a memory expansion if the
buffer supplied wasn't big enough:

struct OutputBuffer
{
  char[] buffer;
  // User-supplied (indirect) function that may expand the buffer in some
  // app-specific way.
  bool function(size_t size) extendBuffer;
}

toString writes to 'buffer'; if it's not big enough, ONLY THEN does it make an indirect function call to get more memory.  The API is similar to the sink callback, but the callback is only used in the rare case of overflow.

Perhaps my little example could be re-jigged to support the sink delegate approach somehow, or perhaps it's just a different overload.
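
A rough sketch of how such a buffer might behave, under stated assumptions: the `used` field, the `write` and `data` members, the `grow` callback and the `growCalls` counter are all hypothetical, and the callback here takes the buffer by reference (unlike the `size_t`-only signature above) so that it can swap in larger storage without relying on global state. The indirect call happens only on overflow.

```d
// Hypothetical counter so the example can show when the callback fires.
int growCalls;

struct OutputBuffer
{
    char[] buffer;   // caller-supplied storage, possibly on the stack
    size_t used;     // bytes written so far (assumed field)
    // Assumed signature: called only when 'buffer' is too small.
    bool function(ref char[] buffer, size_t needed) extendBuffer;

    bool write(const(char)[] s)
    {
        if (buffer.length - used < s.length)
        {
            // Rare path: one indirect call to obtain more memory.
            if (extendBuffer is null || !extendBuffer(buffer, used + s.length))
                return false;
            if (buffer.length - used < s.length)
                return false;
        }
        // Common path: a plain copy, no indirect call.
        buffer[used .. used + s.length] = s[];
        used += s.length;
        return true;
    }

    const(char)[] data() const { return buffer[0 .. used]; }
}

// Hypothetical app-specific growth policy: move to a larger GC allocation.
bool grow(ref char[] buf, size_t needed)
{
    ++growCalls;
    auto bigger = new char[](needed * 2);
    bigger[0 .. buf.length] = buf[];
    buf = bigger;
    return true;
}

void main()
{
    char[16] stack;
    auto ob = OutputBuffer(stack[], 0, &grow);

    assert(ob.write("hello"));          // fits in the stack buffer
    assert(growCalls == 0);

    assert(ob.write(", a string longer than sixteen bytes"));
    assert(growCalls == 1);             // overflow triggered the callback once
    assert(ob.data == "hello, a string longer than sixteen bytes");
}
```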

I don't know exactly, other people have much more elaborate needs than
myself. I'm just saying I don't like where this API is headed. It
doesn't satisfy my needs, and I'm super-tired of being vilified for
having that opinion!
I'd be happy with 'toString(char[] outputBuffer)', but I think a
design may exist where all requirements are satisfied, rather than
just dismissing my perspective as niche and annoying, and rolling with
what's easy.

I would propose these criteria:
  * function is not a template exposing druntime internals to the host application
  * toString should be capable of receiving a pre-allocated(/stack) buffer and just write to it
  * indirect function calls should happen only in the rare case of output buffer overflow, NOT in all cases

>> druntime is a fundamental ecosystem library. It should be properly semantically versioned, and particularly for security reasons, I think this should be taken very very seriously.
>
>
> openssl!!!
>
> BTW, you should know that if a template is instantiated by the library itself, the compiler won't re-instantiate it and insert that in the calling code. It'll just call the instantiated binary.

I don't think that's what's on offer here. If toString is a template
(ie, receives an OutputRange, which is a user-defined type), how can
it be that the user's instantiation would already be present in
druntime?
It seems highly unlikely to me. At best, it's unreliable.

November 08, 2014

Re: toString refactor in druntime

Posted by Walter Bright
in reply to Manu

On 11/7/2014 5:41 PM, Manu via Digitalmars-d wrote:
> On 3 November 2014 19:55, Walter Bright via Digitalmars-d
> This isn't anything to do with what I'm talking about. I'm not nervous
> because I don't like the D template syntax, it's because I don't feel
> it's a good idea for druntime (specifically) to have non-static
> interfaces which may expose, or create dependencies on druntime
> internals.

My point with the C macro interfaces is that it's not the "template" that makes an API design non-static, it's the way the API is designed. Such a determination can only be made on a case-by-case basis, not with a blanket templates-are-bad rule.

> The only library I would use as comparison is the CRT, which is firmly
> self-contained, with a well defined API.

Hardly. It still uses macros, and it's still quite sensitive to various struct declarations which are in the corresponding .h files. If you don't agree, take a look in druntime at all the conditional compilation for the various CRTs which presumably have the same API.

Check out fileno(), for example. Or errno. Or heck, just grep for #define in the C .h files, or grep for "struct{".

> That's fine. But how is that the case here? Performing a string
> conversion implies direct access to the data being converted. If the
> function is a template, then my app is directly accessing internal
> druntime data.

It all depends on how you design it. Recall that what defines an Output Range is the existence of put(r,e). There's no dependency on internals unless you deliberately expose it.

> I don't like being classified as 'the offense', in fact, I'm really,
> really tired of unfairly being assigned this position whenever I try
> and make a stand for things that matter to me and my kind.

Sorry, I thought you'd find the metaphor amusing. I didn't mean it to be offensive (!). Oops, there I go again!

> which is excellent, but replacing it with indirect function calls
> isn't awesome. This takes one problem and replaces it with a different
> problem with different characteristics.

I think this is a misunderstanding of output ranges.

> I would like to see an overload like the C lib; API receives memory,
> writes to it. This may not be the API that most people use, but I
> think it should exist.
> I then suggested that this API may be crafted in such a way that the
> higher-level goals can also be expressed through it. It could be
> wrapped in a little thing that may request a memory expansion if the
> buffer supplied wasn't big enough:
>
> struct OutputBuffer
> {
>    char[] buffer;
>    // User-supplied (indirect) function that may expand the buffer in some
>    // app-specific way.
>    bool function(size_t size) extendBuffer;
> }
>
> toString writes to 'buffer', if it's not big enough, ONLY THEN make an
> indirect function call to get more memory.  The API is similar to the
> sink callback, but only used in the rare case of overflow.

We're reinventing Output Ranges again.

> I don't know exactly, other people have much more elaborate needs than
> myself. I'm just saying I don't like where this API is headed. It
> doesn't satisfy my needs, and I'm super-tired of being vilified for
> having that opinion!

Well, my opinion that sink should be replaced with output range isn't very popular here either, join the club.

>    * toString should be capable of receiving a pre-allocated(/stack)
> buffer and just write to it
>    * indirect function calls should happen only in the rare case
> of output buffer overflow, NOT in all cases

Again, this is reinventing output ranges. It's exactly the niche they serve.

> I don't think that's what's on offer here. If toString is a template
> (ie, receives an OutputRange, which is a user-defined type), how can
> it be that the user's instantiation would already be present in
> druntime?

If one uses an output range that is imported from druntime, which is quite likely, then it is also quite likely that the instantiation with that type will already be present in druntime.
