February 03, 2012
On 03-02-2012 11:08, Artur Skawina wrote:
> On 02/03/12 00:20, Jonathan M Davis wrote:
>> in is pointless on value types. All it does is make the function parameter
>> const, which really doesn't do much for you, and in some instances, is really
>> annoying. Personally, I see no point in using in unless the parameter is a
>> reference type, and even then, it's often a bad idea with reference types,
>> because in is really const scope, and the scope is problematic if you want to
>> return anything from that variable. It's particularly problematic with arrays,
>> since it's frequently desirable to return slices of them, and scope (and
>> therefore in) would prevent that. It's useful in some instances (particularly
>> with delegates), but I'd use in _very_ sparingly. It's almost always more
>> trouble than it's worth IMHO.
>
> BTW, scope should have been the default for *all* reference type function
> arguments, with an explicit modifier, say "esc", required to let the thing
> escape. It's an all-or-nothing thing, just like immutable strings - not using
> it everywhere is painful, but once you switch everything over you get the
> benefits.
>
> If it isn't obvious why - GC. The compiler can optimize the cases where it
> knows a newly allocated object can't escape and reduce or omit the GC overhead.
> And yes, it can also do this automatically - but that requires analyzing the
> whole call chain, which is a) not always possible and b) much more expensive.
>
> artur

It is not that simple.

If the class's constructor passes 'this' off to some arbitrary code, this optimization breaks completely. You would need whole-program analysis to have the slightest hope of doing this optimization correctly.

--
- Alex
February 03, 2012
On 02/03/12 11:21, Alex Rønne Petersen wrote:
> On 03-02-2012 11:08, Artur Skawina wrote:
>> On 02/03/12 00:20, Jonathan M Davis wrote:
>>> in is pointless on value types. All it does is make the function parameter const, which really doesn't do much for you, and in some instances, is really annoying. Personally, I see no point in using in unless the parameter is a reference type, and even then, it's often a bad idea with reference types, because in is really const scope, and the scope is problematic if you want to return anything from that variable. It's particularly problematic with arrays, since it's frequently desirable to return slices of them, and scope (and therefore in) would prevent that. It's useful in some instances (particularly with delegates), but I'd use in _very_ sparingly. It's almost always more trouble than it's worth IMHO.
>>
>> BTW, scope should have been the default for *all* reference type function arguments, with an explicit modifier, say "esc", required to let the thing escape. It's an all-or-nothing thing, just like immutable strings - not using it everywhere is painful, but once you switch everything over you get the benefits.
>>
>> If it isn't obvious why - GC. The compiler can optimize the cases where it knows a newly allocated object can't escape and reduce or omit the GC overhead. And yes, it can also do this automatically - but that requires analyzing the whole call chain, which is a) not always possible and b) much more expensive.
>>
>> artur
> 
> It is not that simple.
> 
> If the class's constructor passes 'this' off to some arbitrary code, this optimization breaks completely. You would need whole-program analysis to have the slightest hope of doing this optimization correctly.

It's about enabling the optimization for as much code as possible. And probably the most interesting cases are strings/arrays - the GC overhead can be huge if you do a lot of concatenation etc.

Would marking the ctor as "scope" (similarly to "const" or "pure") work for your case? (it is reasonable to expect that the compiler checks this by itself; it's per-type, so not nearly as expensive as analyzing the flow)

artur
February 03, 2012
On 03-02-2012 11:41, Artur Skawina wrote:
> On 02/03/12 11:21, Alex Rønne Petersen wrote:
>> On 03-02-2012 11:08, Artur Skawina wrote:
>>> On 02/03/12 00:20, Jonathan M Davis wrote:
>>>> in is pointless on value types. All it does is make the function parameter
>>>> const, which really doesn't do much for you, and in some instances, is really
>>>> annoying. Personally, I see no point in using in unless the parameter is a
>>>> reference type, and even then, it's often a bad idea with reference types,
>>>> because in is really const scope, and the scope is problematic if you want to
>>>> return anything from that variable. It's particularly problematic with arrays,
>>>> since it's frequently desirable to return slices of them, and scope (and
>>>> therefore in) would prevent that. It's useful in some instances (particularly
>>>> with delegates), but I'd use in _very_ sparingly. It's almost always more
>>>> trouble than it's worth IMHO.
>>>
>>> BTW, scope should have been the default for *all* reference type function
>>> arguments, with an explicit modifier, say "esc", required to let the thing
>>> escape. It's an all-or-nothing thing, just like immutable strings - not using
>>> it everywhere is painful, but once you switch everything over you get the
>>> benefits.
>>>
>>> If it isn't obvious why - GC. The compiler can optimize the cases where it
>>> knows a newly allocated object can't escape and reduce or omit the GC overhead.
>>> And yes, it can also do this automatically - but that requires analyzing the
>>> whole call chain, which is a) not always possible and b) much more expensive.
>>>
>>> artur
>>
>> It is not that simple.
>>
>> If the class's constructor passes 'this' off to some arbitrary code, this optimization breaks completely. You would need whole-program analysis to have the slightest hope of doing this optimization correctly.
>
> It's about enabling the optimization for as much code as possible. And probably
> the most interesting cases are strings/arrays - the GC overhead can be huge if
> you do a lot of concatenation etc.
>
> Would marking the ctor as "scope" (similarly to "const" or "pure") work for your
> case? (it is reasonable to expect that the compiler checks this by itself; it's
> per-type, so not nearly as expensive as analyzing the flow)
>
> artur

Well, you would have to mark methods as scope too, as they could be passing off 'this' as well.

It's probably doable that way, but explicit annotations kind of suck. :(

--
- Alex
February 03, 2012
On 02/03/12 11:41, Artur Skawina wrote:
> On 02/03/12 11:21, Alex Rønne Petersen wrote:
>> On 03-02-2012 11:08, Artur Skawina wrote:
>>> On 02/03/12 00:20, Jonathan M Davis wrote:
>>>> in is pointless on value types. All it does is make the function parameter const, which really doesn't do much for you, and in some instances, is really annoying. Personally, I see no point in using in unless the parameter is a reference type, and even then, it's often a bad idea with reference types, because in is really const scope, and the scope is problematic if you want to return anything from that variable. It's particularly problematic with arrays, since it's frequently desirable to return slices of them, and scope (and therefore in) would prevent that. It's useful in some instances (particularly with delegates), but I'd use in _very_ sparingly. It's almost always more trouble than it's worth IMHO.
>>>
>>> BTW, scope should have been the default for *all* reference type function arguments, with an explicit modifier, say "esc", required to let the thing escape. It's an all-or-nothing thing, just like immutable strings - not using it everywhere is painful, but once you switch everything over you get the benefits.
>>>
>>> If it isn't obvious why - GC. The compiler can optimize the cases where it knows a newly allocated object can't escape and reduce or omit the GC overhead. And yes, it can also do this automatically - but that requires analyzing the whole call chain, which is a) not always possible and b) much more expensive.
>>>
>>> artur
>>
>> It is not that simple.
>>
>> If the class's constructor passes 'this' off to some arbitrary code, this optimization breaks completely. You would need whole-program analysis to have the slightest hope of doing this optimization correctly.
> 
> It's about enabling the optimization for as much code as possible. And probably the most interesting cases are strings/arrays - the GC overhead can be huge if you do a lot of concatenation etc.
> 
> Would marking the ctor as "scope" (similarly to "const" or "pure") work for your case? (it is reasonable to expect that the compiler checks this by itself; it's per-type, so not nearly as expensive as analyzing the flow)

Actually, passing 'this' to "some arbitrary code" isn't a problem, unless
the code in question has the "esc" annotation, in which case you need to mark
the ctor (or any other method) as "esc" too; that will turn off the optimization
for this struct/class, obviously.
That's why "scope" needs to be the default - mixing it with code that does not
guarantee that the object does not escape does not really work - you cannot call
anything not marked with "scope" with an already scoped object. Which means you
need to mark practically every function argument as scope - this doesn't scale well.

artur
February 03, 2012
On Friday, February 03, 2012 11:08:54 Artur Skawina wrote:
> BTW, scope should have been the default for *all* reference type function arguments, with an explicit modifier, say "esc", required to let the thing escape. It's an all-or-nothing thing, just like immutable strings - not using it everywhere is painful, but once you switch everything over you get the benefits.

That would destroy slicing. I'm firmly of the opinion that scope should be used sparingly.

- Jonathan M Davis
February 03, 2012
On 02/03/12 13:06, Jonathan M Davis wrote:
> On Friday, February 03, 2012 11:08:54 Artur Skawina wrote:
>> BTW, scope should have been the default for *all* reference type function arguments, with an explicit modifier, say "esc", required to let the thing escape. It's an all-or-nothing thing, just like immutable strings - not using it everywhere is painful, but once you switch everything over you get the benefits.
> 
> That would destroy slicing. I'm firmly of the opinion that scope should be used sparingly.

Well, not doing it destroys performance. [1] It's a trade-off. Also, i don't know if "destroy slicing" is accurate.

Things like 'string f(string s) { return s[1..$]; }' need to continue to work;
the object does not really "escape" from the POV of f(), but the caller has to
assume it's not dead after returning from the function. Doing this by default for
any functions returning refs that could potentially hold on to the passed object
would make things work. For the cases where the called function knows that
it will always return unique objects the signature could look like
'new string f(string s);', but that's only an optimization.
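
To make the aliasing concrete, here is a minimal sketch in present-day D of why the caller has to treat the argument as still alive: the returned slice points into the same memory as the input. The 'new string f(...)' signature above is only a proposal and is not shown; the pointer check is added purely for illustration.

```d
import std.stdio : writeln;

// The function from the text: returns a slice of its argument.
string f(string s) { return s[1 .. $]; }

void main()
{
    string a = "hello";
    string b = f(a);

    // The result is not a copy: it starts one char into a's storage,
    // so a's memory must stay alive for as long as b is in use.
    assert(b.ptr == a.ptr + 1);
    writeln(b);
}
```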

Any other problematic slicing use, that i'm not thinking of right now?

artur


[1] I had a case, where turning on logging in some code made the program unusable, because instead of IIRC ~40s it took 40+ minutes, at which point i gave up and killed it... The profile looked like this:

37.62%  uint gc.gcx.Gcx.fullcollect(void*)
20.47%  uint gc.gcbits.GCBits.test(uint)
13.80%  uint gc.gcbits.GCBits.testSet(uint)
10.15%  pure nothrow @safe bool std.uni.isGraphical(dchar)
 3.33%  _D3std5array17__T8AppenderTAyaZ8Appender10__T3putTwZ3putMF
 2.78%  0x11025a
 2.13%  _D3std6format65__T13formatElementTS3std5array17__T8Appende
 1.64%  _D3std6format56__T10formatCharTS3std5array17__T8AppenderTA
 1.37%  pure @safe uint std.utf.encode(ref char[4], dchar)
 0.88%  void* gc.gcx.GC.malloc(uint, uint, uint*)
 0.50%  pure nothrow @safe bool std.uni.binarySearch2(dchar, immutable(dchar[2][]))
 0.45%  void gc.gcbits.GCBits.set(uint)
 0.37%  void gc.gcbits.GCBits.clear(uint)
 0.34%  __divdi3

That shows several problems, but even after fixing the obvious ones (inlining
GCBits, making std.uni.isGraphical sane (this, btw, reduced its cost to ~1%))
GC still takes up most of the time (not remembering the details, but certainly >50%,
it could have been >80%).
Some slowdown from the IO and formatting is expected, but spending most cycles
on GC is not reasonable, when most objects never leave the scope (in this case
it was just strings passed to writeln etc IIRC).
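
A rough sketch of the "string never leaves the scope" case, assuming current D and Phobos: the parameter is marked 'scope' by hand and the message is formatted into a stack buffer with std.format.sformat, so no GC allocation is needed for it. The name logLine is made up for illustration, and how strictly the compiler enforces 'scope' depends on the compiler version and flags; this is not the default-scope scheme proposed in this thread.

```d
import std.format : sformat;
import std.stdio : writeln;

// 'scope' states that msg is not stored anywhere once the call returns.
size_t logLine(scope const(char)[] msg)
{
    writeln(msg);
    return msg.length;
}

void main()
{
    char[64] buf;                                // stack storage
    auto line = sformat(buf[], "value=%s", 42);  // slice of buf, no GC string
    assert(logLine(line) == 8);                  // "value=42" has 8 chars
}
```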
February 03, 2012
Jonathan M Davis:

> in is pointless on value types. All it does is make the function parameter const, which really doesn't do much for you, and in some instances, is really annoying.

Having const value types is useful because you can't change them later inside the method. This helps you avoid bugs like:


void foo(int n) {
  // uses n here
  // modifies n here by mistake
  // uses n here again, assuming it's the 'real' n argument
}


When you program you think of arguments as the inputs of your algorithm, so if you mutate them by mistake, this sometimes causes bugs if later you think they are the real inputs of your algorithm still.
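
A minimal sketch of the same idea with a const value parameter (the function name scaled is made up for illustration). With the commented-out line enabled, the compiler rejects the accidental write instead of letting the later code silently see a changed n:

```d
import std.stdio : writeln;

int scaled(const int n, int factor)
{
    // n *= factor;   // does not compile: n is const
    immutable product = n * factor;
    return product + n;   // n is still the caller's value here
}

void main()
{
    writeln(scaled(3, 4));   // 3*4 + 3
}
```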

Generally in D code all variables that can be const/immutable should be const/immutable, unless this causes problems, is impossible, or causes significant performance trouble. This avoids some bugs, helps DMD optimize better (I have seen this), and helps the person who reads the code to understand it better (because they are free to focus on just the mutable variables).

It's better to have const function arguments, unless this is not possible, or in the uncommon situations where a mutable input helps you optimize your algorithm better (especially if the profiler has told you so).


> Personally, I see no point in using in unless the parameter is a
> reference type, and even then, it's often a bad idea with reference types,
> because in is really const scope, and the scope is problematic if you want to
> return anything from that variable.

Think of returning a part of a mutable input argument as an optimization, to be used when you know you need the extra speed. Otherwise where performance is not a problem it's often safer to return a const value or to return something new created inside the function/method. This programming style avoids many mistakes (it's useful in Java coding too).
From what I've seen, in my D code only a small percentage of the program lines need to be optimized and use C-style coding. For most of the lines of code a more functional D style is enough, and safer. The idea is "mutability where needed, and a bit more functional-style everywhere else" :-)

Bye,
bearophile
February 03, 2012
Artur Skawina:

> Would marking the ctor as "scope" (similarly to "const" or "pure") work for your case? (it is reasonable to expect that the compiler checks this by itself; it's per-type, so not nearly as expensive as analyzing the flow)

Maybe this is a topic worth discussing in the main D newsgroup (and maybe later worth an enhancement request).

Bye,
bearophile
February 03, 2012
On 02/02/12 20:11, Ali Çehreli wrote:
> On 02/02/2012 11:00 AM, xancorreu wrote:
> > On 02/02/12 19:18, bearophile wrote:
>
> > Can I say "serialize the first, second and third arguments as Class
> > Person"?
> >
> > I mean, if you define a class Person like:
> >
> > class Person {
> >     string name;
> >     uint age;
> >     bool dead;
> > }
> >
> > could you serialize the input from console, like
> > Std.in.serialize(Person, args(0), args(1), args(2))?
>
> I haven't used it but there is Orange:
>
>   https://github.com/jacob-carlborg/orange
>
> I think it will be included in Phobos.
>
> > You could do that "manually", checking each param, but it's a tedious task.
>
> If the input is exactly in the format that a library like Orange expects, then it's easy.
>
> To me, constructing an object from user input is conceptually outside of OO, because there is no object at that point yet. It makes sense to me to read the input and then make an object from the input.

For me, it could be put in an outside library, not in the class of the object. And if it's well designed, it could report all exceptions...
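
A minimal sketch of that idea, assuming plain Phobos rather than Orange: a free function outside the class (personFromArgs is a made-up name) builds a Person from three strings, and std.conv.to reports bad input by throwing ConvException.

```d
import std.conv : to, ConvException;
import std.exception : enforce;

class Person
{
    string name;
    uint age;
    bool dead;

    this(string name, uint age, bool dead)
    {
        this.name = name;
        this.age = age;
        this.dead = dead;
    }
}

// Hypothetical helper living outside the class: converts raw strings
// and lets conversion errors propagate to the caller as exceptions.
Person personFromArgs(string[] args)
{
    enforce(args.length == 3, "expected: name age dead");
    return new Person(args[0], args[1].to!uint, args[2].to!bool);
}

void main()
{
    auto p = personFromArgs(["Xan", "30", "false"]);
    assert(p.name == "Xan" && p.age == 30 && !p.dead);

    bool rejected = false;
    try
    {
        personFromArgs(["Xan", "thirty", "false"]);
    }
    catch (ConvException e)
    {
        rejected = true;   // "thirty" is not a valid uint
    }
    assert(rejected);
}
```

The error messages shown to the user would still have to be written by the caller.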

>
> Depending on the design, the input may be rejected by the function that reads the input, by the constructor of the type, or by both.
>
> > Thanks,
> > Xan.
>
> Ali
>

February 03, 2012
On 02/02/12 20:40, Jonathan M Davis wrote:
> And whether that's the best way to handle it depends on what you're trying to do in terms of user input and error messages. How on earth is all of that going to be handled generically? It all depends on what the programmer is trying to do. Switching to use command-line switches and getopt would help some, but you still have to deal with the error messages yourself. Creating the Person is the easy part.
>
> - Jonathan M Davis

I think of it as a tool, not a solve-everything thing; it's just a tool to ease your job. Yeah, you could do that manually (thanks for the code), but really you may want to do it the _fastest_ and _easiest_ way.

Thanks,
Xan.