April 03, 2017
https://issues.dlang.org/show_bug.cgi?id=17294

          Issue ID: 17294
           Summary: Incorrect -profile=gc data
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P1
         Component: druntime
          Assignee: nobody@puremagic.com
          Reporter: mihails.strasuns.contractor@sociomantic.com

The existing implementation of `-profile=gc` is somewhat naive in the sense that it assumes every relevant runtime call results in a single, immediate allocation of exactly the amount of data requested. Its reports can therefore differ a lot from the real GC statistics. A simple example:

====
void main ( )
{
    void[] buffer;
    buffer.length = 20;
    buffer.length = 60;
    buffer.length = 10;
    buffer ~= "abcd".dup;
}
====

The currently reported trace looks like this:

             60                  1    void[] D main ./sample.d:7
             20                  1    void[] D main ./sample.d:6
             10                  1    void[] D main ./sample.d:8
              4                  1    void[] D main ./sample.d:9

This is wrong for a variety of reasons:

1) the runtime allocates more memory than was requested (32 and 64 bytes for the first two length assignments)
2) the third length assignment shrinks the array and thus does not result in any allocation, despite being reported in the log
3) the last append has to re-allocate the array (the shrunk slice no longer ends where the block's used data ends, so it cannot be extended in place) and will thus allocate more than just the 4 bytes of "abcd"
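
The runtime behaviour behind points 1) to 3) can be observed without `-profile=gc` by inspecting the array's `capacity` property and its `ptr`. The sketch below assumes the stock druntime array implementation; `Observation` and `observe` are hypothetical names used only for illustration, and because exact capacities depend on the GC's size classes, the program prints them instead of assuming particular values.

```d
import std.stdio : writeln;

/// Hypothetical record of what the runtime actually does for the sample.
struct Observation
{
    size_t capacityAfter20;     // usable bytes after `length = 20`
    bool shrinkKeptPointer;     // `length = 10` stayed in the same block
    size_t capacityAfterShrink; // 0 means the slice cannot grow in place
    bool appendMoved;           // `~=` had to copy into a new block
}

/// Hypothetical helper replaying the statements of the sample program.
Observation observe()
{
    Observation o;
    void[] buffer;

    buffer.length = 20;
    // 1) the block is rounded up to a GC size class, so the usable
    //    capacity is at least the 20 bytes that were requested
    o.capacityAfter20 = buffer.capacity;

    buffer.length = 60;
    auto before = buffer.ptr;

    buffer.length = 10;
    // 2) shrinking only re-slices the existing block; nothing is allocated
    o.shrinkKeptPointer = buffer.ptr == before;
    o.capacityAfterShrink = buffer.capacity;

    buffer ~= "abcd".dup;
    // 3) growing a slice that does not end at the block's used length
    //    could overwrite data, so the runtime reallocates and copies
    o.appendMoved = buffer.ptr != before;
    return o;
}

void main()
{
    auto o = observe();
    writeln("capacity after length = 20: ", o.capacityAfter20);
    writeln("shrink kept pointer:        ", o.shrinkKeptPointer);
    writeln("capacity after shrink:      ", o.capacityAfterShrink);
    writeln("append moved the array:     ", o.appendMoved);
}
```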

There are other, similar issues, all stemming from the fact that `-profile=gc` does not actually track real GC allocations. One way to fix this without major changes to the runtime API is to rely on `synchronized` and `GC.stats`, attributing to each call site the growth in the GC's used size observed around the actual runtime call, for example:

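```d
extern (C) void[] _d_arraysetlengthTTrace(string file, int line,
    string funcname, const TypeInfo ti, size_t newlength, void[]* p)
{
    import core.memory : GC;

    // `global_rt_lock` stands for one lock shared by all *Trace hooks, so
    // that no other traced allocation can change the GC statistics
    // between the two GC.stats() calls below.
    synchronized (global_rt_lock)
    {
        auto oldstats = GC.stats();
        auto result = _d_arraysetlengthT(ti, newlength, p);
        auto newstats = GC.stats();

        // Only growth of the used heap is attributed to this call site; a
        // drop (e.g. a collection run during the call) is ignored.
        if (newstats.usedSize > oldstats.usedSize)
        {
            accumulate(file, line, funcname, ti.toString(),
                newstats.usedSize - oldstats.usedSize);
        }
        return result;
    }
}
```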

This reports allocations with perfect precision, but this simple solution comes at the cost of considerably changing the scheduling of multi-threaded programs built with `-profile=gc`, since every traced allocation is serialized through one global lock. I would be interested to hear any other ideas for fixing the problem.
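
The before/after comparison at the heart of the proposal can also be tried from user code, around each statement of the original sample. This is a single-threaded sketch under stated assumptions: `usedDelta` and `measureSample` are hypothetical helpers, automatic collections are disabled with `GC.disable` so that a collection cannot lower `usedSize` mid-measurement, and no byte counts are hard-coded because they depend on the GC's size classes.

```d
import core.memory : GC;
import std.stdio : writefln;

/// Hypothetical helper: how much the GC's used heap grew while `dg` ran,
/// using the same before/after comparison as the proposed hook.
size_t usedDelta(scope void delegate() dg)
{
    immutable before = GC.stats().usedSize;
    dg();
    immutable after = GC.stats().usedSize;
    // Same guard as the proposal: a shrinking used size counts as 0.
    return after > before ? after - before : 0;
}

/// Hypothetical helper: replays the sample program, one delta per statement.
size_t[4] measureSample()
{
    GC.disable();             // avoid automatic collections while measuring
    scope (exit) GC.enable();

    void[] buffer;
    size_t[4] deltas;
    deltas[0] = usedDelta({ buffer.length = 20; });
    deltas[1] = usedDelta({ buffer.length = 60; });
    deltas[2] = usedDelta({ buffer.length = 10; });
    deltas[3] = usedDelta({ buffer ~= "abcd".dup; });
    return deltas;
}

void main()
{
    static immutable string[4] labels =
        ["length = 20", "length = 60", "length = 10", "~= \"abcd\""];
    auto deltas = measureSample();
    foreach (i, delta; deltas)
        writefln("%-12s %s bytes", labels[i], delta);
}
```

Unlike the current trace, the shrink to 10 elements should show up as 0 bytes here, and the other statements should reflect whole GC blocks rather than the requested lengths.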
