May 24, 2013
On Thursday, 23 May 2013 at 23:42:22 UTC, Manu wrote:
> I've always steered away from things like this because it creates a
> double-indirection.
> I have thought of making a similar RefCounted template, but where the
> refCount is stored in a hash table, and the pointer is used to index the
> table.
> This means the refCount doesn't pollute the class/structure being
> ref-counted, and avoids a double-indirection on general access.
> It will be slightly slower to inc/decrement, but that's a controlled
> operation.
> I would use a system like this for probably 80% of resources.
>

Reference counting also tends to make objects die en masse (objects tend to die in clusters) and freeze the program for a while. I'm not sure it's that much better (better than D's current GC for sure, but I'm not sure it's better than a good GC). It probably depends on the usage pattern.
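A minimal sketch (in C++, with illustrative names) of the side-table scheme Manu describes above, where the refcount lives in a hash table keyed by the object's address rather than in the object itself:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Sketch only: the refcount is stored in a side table keyed by object
// address, so the object's layout stays untouched and ordinary pointer
// access pays no extra indirection. The cost moves to inc/decrement,
// which now does a hash lookup.
class RefTable {
public:
    void retain(void* p) { ++counts_[p]; }

    // Returns true when the count hits zero; the caller then frees the object.
    bool release(void* p) {
        auto it = counts_.find(p);
        assert(it != counts_.end());
        if (--it->second == 0) {
            counts_.erase(it);
            return true;
        }
        return false;
    }

    std::size_t count(void* p) const {
        auto it = counts_.find(p);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    std::unordered_map<void*, std::size_t> counts_;
};
```

This trades a hash lookup on every retain/release for a clean object layout, which matches the "slightly slower to inc/decrement, but that's a controlled operation" trade-off described above.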
May 24, 2013
On Friday, 24 May 2013 at 00:44:14 UTC, Andrei Alexandrescu wrote:
>> Custom allocators will probably be very useful, but if there's one thing
>> STL has taught me, it's hard to use them effectively, and in practise,
>> nobody ever uses them.
>
> Agreed.
>

To benefit from a custom allocator, you need to be in a very specific use case. Generic allocators are pretty good in most cases.
May 24, 2013
On 24 May 2013 15:21, deadalnix <deadalnix@gmail.com> wrote:

> On Thursday, 23 May 2013 at 23:42:22 UTC, Manu wrote:
>
>> I've always steered away from things like this because it creates a
>> double-indirection.
>> I have thought of making a similar RefCounted template, but where the
>> refCount is stored in a hash table, and the pointer is used to index the
>> table.
>> This means the refCount doesn't pollute the class/structure being
>> ref-counted, and avoids a double-indirection on general access.
>> It will be slightly slower to inc/decrement, but that's a controlled
>> operation.
>> I would use a system like this for probably 80% of resources.
>>
>>
> Reference counting also tends to make objects die en masse (objects tend to die in clusters) and freeze the program for a while. I'm not sure it's that much better (better than D's current GC for sure, but I'm not sure it's better than a good GC). It probably depends on the usage pattern.
>

In my experience that's fine.
In realtime code, you tend not to allocate/deallocate at runtime, except for
some short-lived temps, which tend not to cluster the way you describe.
When you eventually do free some big resources, causing a cluster free, you
will probably have done it at an appropriate time, where you intended such a
thing to happen.


May 24, 2013
On Friday, 24 May 2013 at 05:02:33 UTC, Manu wrote:
> On 24 May 2013 14:11, Marco Leise <Marco.Leise@gmx.de> wrote:
> I don't think it's hack-ish at all, that's precisely what the stack is
> there for. It would be awesome for people to use alloca in places that it
> makes sense.
> Especially in cases where the function is a leaf or leaf-stem (ie, if there
> is no possibility of recursion), then using the stack should be encouraged.
> For safety, obviously phobos should do something like:
>   void[] buffer = bytes < reasonable_anticipated_buffer_size ?
> alloca(bytes) : new void[bytes];
>

That is probably something that could be handled in the optimizer in many cases.
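The stack-or-heap pattern quoted above might look roughly like this in C++; the threshold and names are assumptions, and a fixed-size array stands in for alloca so the helper itself stays well-defined:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Illustrative sketch: small requests use a stack buffer, large ones fall
// back to the heap. A fixed array replaces alloca here because alloca'd
// memory would not survive a helper function's return in C++.
std::string copyThroughBuffer(const char* src, std::size_t bytes) {
    constexpr std::size_t kSmall = 256;  // "reasonable anticipated size" (assumed)
    char stackBuf[kSmall];
    std::vector<char> heapBuf;
    char* buf = stackBuf;
    if (bytes > kSmall) {                // too big for the stack: heap fallback
        heapBuf.resize(bytes);
        buf = heapBuf.data();
    }
    std::memcpy(buf, src, bytes);        // stand-in for real work on the buffer
    return std::string(buf, bytes);
}
```

The small-size fast path avoids any heap traffic, which is the whole point of the phobos suggestion above; only oversized requests pay for an allocation.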
May 24, 2013
On 24 May 2013 15:29, deadalnix <deadalnix@gmail.com> wrote:

> On Friday, 24 May 2013 at 05:02:33 UTC, Manu wrote:
>
>> On 24 May 2013 14:11, Marco Leise <Marco.Leise@gmx.de> wrote:
>> I don't think it's hack-ish at all, that's precisely what the stack is
>> there for. It would be awesome for people to use alloca in places that it
>> makes sense.
>> Especially in cases where the function is a leaf or leaf-stem (ie, if
>> there
>> is no possibility of recursion), then using the stack should be
>> encouraged.
>> For safety, obviously phobos should do something like:
>>   void[] buffer = bytes < reasonable_anticipated_buffer_size ?
>> alloca(bytes) : new void[bytes];
>>
>>
> That is probably something that could be handled in the optimizer in many cases.
>

The optimiser probably can't predict whether the function may recurse, and
as such, the amount of memory it's reasonable to take from the stack is
hard to predict...
It could possibly do so for leaf functions only, but most of the
opportunities aren't in leaf functions. I'd say the majority of phobos
allocations are created when passing strings through to library/system
calls.


May 24, 2013
On Friday, May 24, 2013 15:37:39 Manu wrote:
> I'd say a majority of phobos
> allocations are created when passing strings through to library/system
> calls.

That does sound probable, as toStringz will often (and unpredictably) result in allocations, and it does seem like a prime location for at least attempting to use a static array instead as you suggested. But if toStringz _wouldn't_ result in an allocation, then copying to a static array would be inadvisable, so we're probably going to need a function which does toStringz's test so that it can be used outside of toStringz.

- Jonathan M Davis
May 24, 2013
On 5/24/2013 12:25 AM, deadalnix wrote:
> On Friday, 24 May 2013 at 00:44:14 UTC, Andrei Alexandrescu wrote:
>>> Custom allocators will probably be very useful, but if there's one thing
>>> STL has taught me, it's hard to use them effectively, and in practise,
>>> nobody ever uses them.
>>
>> Agreed.
>>
>
> To benefit from a custom allocator, you need to be under a very specific
> use case. Generic allocator are pretty good in most cases.


Most general allocators choke on multi-threaded code, so a large part of customizing allocation is getting rid of lock contention.

While STL containers can have basic allocator templates assigned to them, if you really need performance you typically need to control all the different kinds of allocations a container does.

For example, a std::unordered_set allocates a ton of linked-list nodes to keep iterators stable across inserts and removes, but the actual data payload is a separate allocation, as is some kind of root data structure to hold the hash tables.  In STL land this is all allocated through a single allocator object, making it very difficult (nearly impossible in a clean way) to allocate the payload data with some kind of fixed-size block allocator and the metadata and linked-list nodes with a different allocator.  Some people would complain this exposes implementation details of the class, but the class is a template; it should be configurable to work the way you need it to.


class tHashMapNodeDefaultAllocator
{
public:
    static void* allocateMemory(size_t size, size_t alignment)
    {
        return mAlloc(size, alignment);
    }
    static void freeMemory(void* pointer) NOEXCEPT
    {
        mFree(pointer);
    }
};


template <typename DefaultKeyType, typename DefaultValueType>
class tHashMapConfiguration
{
public:
    typedef tHashClass<DefaultKeyType> HashClass;
    typedef tEqualsClass<DefaultKeyType> EqualClass;
    typedef tHashMapNodeDefaultAllocator NodeAllocator;
    typedef tDynamicArrayConfiguration<tHashMapNode<DefaultKeyType, DefaultValueType>> NodeArrayConfiguration;
};


template <typename KeyType, typename ValueType, typename HashMapConfiguration = tHashMapConfiguration<KeyType, ValueType>>
class tHashMap
{
};


// the tHashMap also has an array inside, so there is a way to configure that too:


class tDynamicArrayDefaultAllocator
{
public:
    static void* allocateMemory(size_t size, size_t alignment)
    {
        return mAlloc(size, alignment);
    }
    static void freeMemory(void* pointer) NOEXCEPT
    {
        mFree(pointer);
    }
};


class tDynamicArrayDefaultStrategy
{
public:
    static size_t nextAllocationSize(size_t currentSize, size_t objectSize, size_t numNewItemsRequested)
    {
        // return some size to grow the array by when the capacity is reached
        return currentSize + numNewItemsRequested * 2;
    }
};


template <typename DefaultObjectType>
class tDynamicArrayConfiguration
{
public:
    typedef tDynamicArrayDefaultStrategy DynamicArrayStrategy;
    typedef tDynamicArrayDefaultAllocator DynamicArrayAllocator;
};





May 24, 2013
On 24 May 2013 15:44, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> On Friday, May 24, 2013 15:37:39 Manu wrote:
> > I'd say a majority of phobos
> > allocations are created when passing strings through to library/system
> > calls.
>
> That does sound probable, as toStringz will often (and unpredictably)
> result
> in allocations, and it does seem like a prime location for at least
> attempting
> to use a static array instead as you suggested. But if toStringz _wouldn't_
> result in an allocation, then copying to a static array would be
> inadvisable,
> so we're probably going to need a function which does toStringz's test so
> that
> it can be used outside of toStringz.
>

Yeah, an alloca-based cstring helper which performs the zero-terminate
check; if the string's not terminated and short enough, alloca and copy,
else if it's too long, new.
I'm sure that would be a handy little template, and would improve phobos a
lot.


May 24, 2013
On Thursday, 23 May 2013 at 18:13:17 UTC, Brad Anderson wrote:
> Johannes Pfau's work in progress -vgc command line option [3] would be another great tool that would help people identify GC allocations.  This or something similar could also be used to document throughout phobos when GC allocations can happen (and help eliminate it where it makes sense to).
>

I have yet to look at any of these entries but I went ahead and built phobos with Johannes' -vgc and put the output into a spreadsheet.

http://goo.gl/HP78r (google spreadsheet)

I'm not exactly sure if this catches templates or not.  This wasn't a unittest build, just building phobos.  I did try to build the unittests with -vgc but it runs out of memory trying to build std/algorithm.d.  There is substantially more -vgc output when building the unit tests though.

Obviously a lot of these aren't going anywhere but there's probably some interesting things to be found wading through this.
May 24, 2013
On Friday, 24 May 2013 at 05:49:18 UTC, Sean Cavanaugh wrote:
> Most general allocators choke on multi-threaded code, so a large part of customizing allocation is getting rid of lock contention.
>

It is safe to assume that the future is multithreaded and that general allocators won't choke on it for long. They already exist; you probably don't need (and don't want, unless you are affected by NIH syndrome) to roll your own here.
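For reference, general allocators such as tcmalloc and jemalloc avoid that lock contention with per-thread caches; a toy sketch of the idea (names, sizes, and the single block size are all illustrative):

```cpp
#include <cstdlib>
#include <vector>

// Toy sketch of per-thread allocation caching: each thread keeps its own
// free list of fixed-size blocks and only falls back to the shared
// allocator (here plain malloc) on a cache miss, so the fast path takes
// no locks at all. Real allocators handle many size classes and return
// memory to a shared pool; this sketch deliberately does neither.
constexpr std::size_t kBlockSize = 64;

thread_local std::vector<void*> freeList;  // per-thread: no synchronization

void* cachedAlloc() {
    if (!freeList.empty()) {               // fast path: reuse a cached block
        void* p = freeList.back();
        freeList.pop_back();
        return p;
    }
    return std::malloc(kBlockSize);        // slow path: global allocator
}

void cachedFree(void* p) {
    freeList.push_back(p);                 // back to this thread's cache
}
```

Because the cache is `thread_local`, retain/release of hot blocks never contends with other threads, which is the contention-avoidance point made above.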