Using tasks without GC?

Jan 04, 2020

Chris Katko

Jan 04, 2020

H. S. Teoh

Jan 04, 2020

Elronnd

Jan 04, 2020

dwdv

Jan 06, 2020

Chris Katko

When I program, it's usually videogame ideas. That implies a soft, real-time requirement. In general, that requires the mantra "allocations are evil, use object pools whenever possible." [storing data in static arrays and 'deleting' is usually just marking an entry as is_deleted=true and re-using "dead" ones.] I'm looking through D's parallelism module and the docs state, up-front: >Creates a Task on the GC heap that calls an alias. The modern, scalable way to make a parallel game engine uses tasks. (as opposed to functional decomposition where 1 thread is networking 1 thread is physics, etc.) That requires making LOTS of tasks (_per frame_!) and dispatching them. And a 60 FPS frametime is... 16 ms or less. So my question is: Has anyone done any analysis over how "dangerous" it is to use GC'd tasks for _small_ tasks (in terms of milliseconds)? Is the GC going to fire off all the time and send jitter off the charts? Because while freeze-the-world for 20 milliseconds would probably be unnoticable for many business apps, it would completely break a videogame. I wonder how difficult it would be to either modify the existing parallel task codebase (or, write my own?) to use static pools instead. Allocate once an array of "MAX_NUM_TASKS" tasks (eating the minor memory hit) to prevent touching any allocation. [Even if it wasn't GC, allocating every frame in say, C++, is dangerous. malloc/new is slow and subject to fragmentation and permissions checks.] Any advice, thoughts? Thanks, --Chris Katko

On Sat, Jan 04, 2020 at 12:51:15AM +0000, Chris Katko via Digitalmars-d-learn wrote: [...] > I'm looking through D's parallelism module and the docs state, up-front: > > >Creates a Task on the GC heap that calls an alias. > > The modern, scalable way to make a parallel game engine uses tasks. (as opposed to functional decomposition where 1 thread is networking 1 thread is physics, etc.) That requires making LOTS of tasks (_per frame_!) and dispatching them. And a 60 FPS frametime is... 16 ms or less. [...] I don't know the inner workings of std.parallelism well enough to answer your question directly, but couldn't you just spawn a bunch of workers that can be assigned tasks? So you only allocate up-front when first spawning the workers, but during the main loop you have a fixed number of workers that can receive and execute tasks. So you don't have to allocate at all after the initial up-front allocation. Generally, doing a lot of small GC allocations is bad for performance. I would avoid doing that, and use a fixed number of workers that you create, say at the beginning of every level or something, and thereafter you never deallocate them, just leave them idle if there are no tasks, otherwise have some dispatcher to assign tasks to them. T -- Latin's a dead language, as dead as can be; it killed off all the Romans, and now it's killing me! -- Schoolboy

You can control when the gc runs. So if you know the allocations are small enough that they won't OOM, then you can say GC.disable, and it straight up won't run at all. But you can manually run a collection cycle (during a loading screen or whatever) with GC.collect. See http://dpldocs.info/experimental-docs/core.memory.GC.html

> Creates a Task on the GC heap that calls an alias. If possible, there's also scopedTask, which allocates on the stack: https://dlang.org/phobos/std_parallelism.html#.scopedTask > So my question is: Has anyone done any analysis over how "dangerous" it is to use GC'd tasks for _small_ tasks (in terms of milliseconds)? Nothing major, but https://github.com/mratsim/weave/tree/master/benchmarks/fibonacci puts quite a bit of pressure on various implementations. You might want to profile ./pfib 40: import std; ulong fib(uint n) { if (n < 2) return n; auto x = scopedTask!fib(n-1); // vs. Task!fib(n-1); taskPool.put(x); auto y = fib(n-2); return x.yieldForce + y; // {yield,spin,work}Force } void main(string[] args) { enforce(args.length == 2, "Usage: fib <n-th fibonacci number requested>"); auto n = args[1].to!uint; // defaultPoolThreads(totalCPUs); writefln!"fib(%d) = %d"(n, fib(n)); } At least D isn't locking up beyond 12 tasks; looking at you, stdlib-nim. :)