October 08, 2014
On Tuesday, 7 October 2014 at 21:59:08 UTC, Peter Alexander wrote:
> On Tuesday, 7 October 2014 at 20:13:32 UTC, Jacob Carlborg wrote:
>> I didn't look at any source code to see what "new" is actually allocating, for example.
>
> I did some random sampling, and it's 90% exceptions, with the occasional array allocation.
>

That's interesting. I suspected around 50%. Well that's even better since if we do ref-counted exceptions we solve 90% of problem ;)

> I noticed that a lot of the ~ and ~= complaints are in code that only ever runs at compile time (generating strings for mixin). I wonder if there's any way we can silence these false positives.

I'm going to use blacklist for these as compiler can't in general know if it is going to be used exclusively at CTFE or not.

Okay,  I think I should go a bit futher with the second version of the tool.

Things on todo list:
 - make tool general enough to work for any GitHub based project (and hackable for other hostings)
 - use Brian's D parser to accurately find artifacts
 - detect "throw new SomeStuff" pattern and automatically populate potential fix line
 - list all source links in one coulmn for the same function (this needs proper parser)
 - use blacklist of <module-name>:<artifact name> to filter out CTFE
 - use current data from wiki for "potential fix" column if present

Holy grail is:
 - plot DOT call-graph of GC-users, with leafs being the ones reported by -vgc. So I start with this list then add functions them, then functions that use these functions and so on.

October 08, 2014
On Tuesday, 7 October 2014 at 16:23:19 UTC, grm wrote:
> 1.) It may be helpful to reduce the noise in that every match after a new is ignored (and probaly multiple 'operator ~' alarms within the same statement).

The tool currently is quick line-based hack, hence no notion of statement.
It's indeed a good idea to merge all messages for one statement and de-duplicate on per statement basis.

> 2.) There seems to be a problem with repeated alarms:
> When viewing the page source, this link shows up numerous times. See
> https://github.com/D-Programming-Language//phobos/blob/d4d98124ab6cbef7097025a7cfd1161d1963c87e/std/conv.d#L688

There are lots of toImpl overloads, deduplication is done on module:LOC basis so the all show up. Going to fix in v2 to merge all of them in one row.

>
> /Gerhard

October 08, 2014
Am Tue, 07 Oct 2014 21:59:05 +0000
schrieb "Peter Alexander" <peter.alexander.au@gmail.com>:

> On Tuesday, 7 October 2014 at 20:13:32 UTC, Jacob Carlborg wrote:
> > I didn't look at any source code to see what "new" is actually allocating, for example.
> 
> I did some random sampling, and it's 90% exceptions, with the occasional array allocation.
> 
> I noticed that a lot of the ~ and ~= complaints are in code that only ever runs at compile time (generating strings for mixin). I wonder if there's any way we can silence these false positives.

Code in if(__ctfe) blocks could be (and should be) allowed:
https://github.com/D-Programming-Language/dmd/pull/3572

But if you have got a normal function (string generateMixin()) the compiler can't really know that it's only used at compile time. And if it's not a template the code using the GC will be compiled, even if it's never called. This might be enough to get undefined symbol errors if you don't have an GC, so the error messages are kinda valid.
October 08, 2014
Am Tue, 07 Oct 2014 15:57:58 +0000
schrieb "Dmitry Olshansky" <dmitry.olsh@gmail.com>:

> 
> Instead we need to observe patterns and label it automatically until the non-trivial subset remains. So everybody, please take time and identify simple patterns and post back your ideas on solution(s).
> 

I just had a look at all closure allocations and identified these patterns:


1) Fixable by manually stack-allocating closure
   A delegate is passed to some function which stores this delegate and
   therefore correctly doesn't mark the parameter as scope. However,
   the lifetime of the stored delegate is still limited to the current
   function (e.g. it's stored in a struct instance, but on the stack).

   Can be fixed by creating a static struct{T... members; void
   doSomething(){access members}} instance on stack and passing
   &stackvar.doSomething as delegate.

2) Using delegates to add state to ranges
   ----
   return iota(dim).
     filter!(i => ptr[i])().
     map!(i => BitsSet!size_t(ptr[i], i * bitsPerSizeT))().
     joiner();
   ----
   This code adds state to ranges without declaring a new type: the ptr
   variable is not accessible and needs to be move into a closure.
   Declaring a custom range type is a solution, but not
   straightforward: If the ptr field is moved into the range a closure
   is not necessary. But if the range is copied, it's address changes
   and the delegate passed to map is now invalid.

3) Functions taking delegates as generic parameters
   receiveTimeout,receive,formattedWrite accept different types,
   including delegates. The delegates can all be scope to avoid the
   allocation but is void foo(T)(scope T) a good idea? The alternative
   is probably making an overload for delegates with scope attribute.

   (The result is that all functions calling receiveTimeout,... with a
   delegate allocate a closure)

4) Solvable with manual memory management
   Some specific functions can't be easily fixed, but the delegates
   they create have a well defined lifetime (for example spawn creates
   a delegate which is only needed at the startup of a new thread, it's
   never used again). These could be malloc+freed.

5) Design issue
   These functions generally create a delegate using variables passed
   in as parameters. There's no way to avoid closures here. Although
   manual allocation is an possible, the lifetime is undefined and can
   only be managed by the GC.

6) Other
   Two cases can be fixed by moving a buffer into a struct or moving a
   function out of a member function into it's surrounding class.


Also notable: 17 out of 35 cases are in std.net.curl. This is because curl heavily uses delegates and wrapper delegates.
October 08, 2014
On Wednesday, 8 October 2014 at 11:25:25 UTC, Johannes Pfau wrote:
> Am Tue, 07 Oct 2014 15:57:58 +0000
> schrieb "Dmitry Olshansky" <dmitry.olsh@gmail.com>:
>
>> 
>> Instead we need to observe patterns and label it automatically until the non-trivial subset remains. So everybody, please take time and identify simple patterns and post back your ideas on solution(s).
>> 
>
> I just had a look at all closure allocations and identified these
> patterns:

Awesome! This is exactly the kind of help I wanted.

>
>
> 1) Fixable by manually stack-allocating closure
>    A delegate is passed to some function which stores this delegate and
>    therefore correctly doesn't mark the parameter as scope. However,
>    the lifetime of the stored delegate is still limited to the current
>    function (e.g. it's stored in a struct instance, but on the stack).
>
>    Can be fixed by creating a static struct{T... members; void
>    doSomething(){access members}} instance on stack and passing
>    &stackvar.doSomething as delegate.

Hm... Probably we can create a template for this.

>
> 2) Using delegates to add state to ranges
>    ----
>    return iota(dim).
>      filter!(i => ptr[i])().
>      map!(i => BitsSet!size_t(ptr[i], i * bitsPerSizeT))().
>      joiner();
>    ----
>    This code adds state to ranges without declaring a new type: the ptr
>    variable is not accessible and needs to be move into a closure.
>    Declaring a custom range type is a solution, but not
>    straightforward: If the ptr field is moved into the range a closure
>    is not necessary. But if the range is copied, it's address changes
>    and the delegate passed to map is now invalid.
>

Indeed, such code is fine in "user-space" but have no place in the library.

> 3) Functions taking delegates as generic parameters
>    receiveTimeout,receive,formattedWrite accept different types,
>    including delegates. The delegates can all be scope to avoid the
>    allocation but is void foo(T)(scope T) a good idea? The alternative
>    is probably making an overload for delegates with scope attribute.
>
>    (The result is that all functions calling receiveTimeout,... with a
>    delegate allocate a closure)
>
> 4) Solvable with manual memory management
>    Some specific functions can't be easily fixed, but the delegates
>    they create have a well defined lifetime (for example spawn creates
>    a delegate which is only needed at the startup of a new thread, it's
>    never used again). These could be malloc+freed.
>

I think this and (2) can be solved if we come up with solid support for RC-closures.

> 5) Design issue
>    These functions generally create a delegate using variables passed
>    in as parameters. There's no way to avoid closures here. Although
>    manual allocation is an possible, the lifetime is undefined and can
>    only be managed by the GC.
>
> 6) Other
>    Two cases can be fixed by moving a buffer into a struct or moving a
>    function out of a member function into it's surrounding class.
>

Yeah, there are always outliers ;)

> Also notable: 17 out of 35 cases are in std.net.curl. This is because
> curl heavily uses delegates and wrapper delegates.

Interesting... it must be due to cURL callback-based API.
All in all, std.net.curl is a constant source of complaints, it may need some work to fix other issues anyway.

October 08, 2014
On Wednesday, 8 October 2014 at 12:09:00 UTC, Dmitry Olshansky wrote:
> I think this and (2) can be solved if we come up with solid support for RC-closures.

Delegates don't obey data sharing type checks though. A long standing language issue.
October 08, 2014
Was in a slight hurry and forgot to mention that I (quite sure: we all) very much appreciate the hands-on mentality your approach shows.

looking forward to v2

/Gerhard
October 08, 2014
On 10/8/14, 1:13 AM, Johannes Pfau wrote:
> Am Tue, 07 Oct 2014 21:59:05 +0000
> schrieb "Peter Alexander" <peter.alexander.au@gmail.com>:
>
>> On Tuesday, 7 October 2014 at 20:13:32 UTC, Jacob Carlborg wrote:
>>> I didn't look at any source code to see what "new" is actually
>>> allocating, for example.
>>
>> I did some random sampling, and it's 90% exceptions, with the
>> occasional array allocation.
>>
>> I noticed that a lot of the ~ and ~= complaints are in code that
>> only ever runs at compile time (generating strings for mixin). I
>> wonder if there's any way we can silence these false positives.
>
> Code in if(__ctfe) blocks could be (and should be) allowed:
> https://github.com/D-Programming-Language/dmd/pull/3572
>
> But if you have got a normal function (string generateMixin()) the
> compiler can't really know that it's only used at compile time. And if
> it's not a template the code using the GC will be compiled, even if
> it's never called. This might be enough to get undefined symbol errors
> if you don't have an GC, so the error messages are kinda valid.

That's a bummer. Can we get the compiler to remove the "if (__ctfe)" code after semantic checking?

Andrei
October 08, 2014
On 10/8/14, 1:01 PM, Andrei Alexandrescu wrote:
> On 10/8/14, 1:13 AM, Johannes Pfau wrote:
>> Am Tue, 07 Oct 2014 21:59:05 +0000
>> schrieb "Peter Alexander" <peter.alexander.au@gmail.com>:
>>
>>> On Tuesday, 7 October 2014 at 20:13:32 UTC, Jacob Carlborg wrote:
>>>> I didn't look at any source code to see what "new" is actually
>>>> allocating, for example.
>>>
>>> I did some random sampling, and it's 90% exceptions, with the
>>> occasional array allocation.
>>>
>>> I noticed that a lot of the ~ and ~= complaints are in code that
>>> only ever runs at compile time (generating strings for mixin). I
>>> wonder if there's any way we can silence these false positives.
>>
>> Code in if(__ctfe) blocks could be (and should be) allowed:
>> https://github.com/D-Programming-Language/dmd/pull/3572
>>
>> But if you have got a normal function (string generateMixin()) the
>> compiler can't really know that it's only used at compile time. And if
>> it's not a template the code using the GC will be compiled, even if
>> it's never called. This might be enough to get undefined symbol errors
>> if you don't have an GC, so the error messages are kinda valid.
>
> That's a bummer. Can we get the compiler to remove the "if (__ctfe)"
> code after semantic checking?

Or would "static if (__ctfe)" work? -- Andrei

October 08, 2014
On 10/8/14 4:10 PM, Andrei Alexandrescu wrote:
> On 10/8/14, 1:01 PM, Andrei Alexandrescu wrote:

>>
>> That's a bummer. Can we get the compiler to remove the "if (__ctfe)"
>> code after semantic checking?
>
> Or would "static if (__ctfe)" work? -- Andrei

Please don't ask me to explain why, because I still don't know. But _ctfe is a normal runtime variable :) It has been explained to me before, why it has to be a runtime variable. I think Don knows the answer.

-Steve