Thread overview
How do I limit the number of active threads (queuing spawn calls)
Mar 27, 2011
Andrej Mitovic
Mar 27, 2011
Jonathan M Davis
Mar 27, 2011
Andrej Mitrovic
Mar 27, 2011
Andrej Mitrovic
Mar 27, 2011
Brad Roberts
March 27, 2011
I'm testing out various compilation schemes with DMD. Right now I'm spawning multiple threads which simply do a `system` call with a string like "DMD -c somefile.d". I'd like to limit the number of active threads to something my CPU can handle (4 in this case, since I've got 4 cores).

How do I go about doing this?

Here's the function which I spawn:
void compileObjfile(string name)
{
    shell(r"dmd -od" ~ r".\cache\" ~ r" -c -version=Unicode -version=WindowsNTonly -version=Windows2000 -version=WindowsXP -I..\ " ~ name ~ " ");
}

So I just need to pass the module name to it. The trouble is, if I spawn this function inside a foreach loop, I'll inadvertently create a few dozen threads. This hogs the system for a while. :) (although this does seem to create some rather impressive compilation speeds, LOL!)

This is what the main function might look like:
void main()
{
    foreach (string name; dirEntries(curdir, SpanMode.shallow))
    {
        if (name.isfile && name.getExt == "d")
        {
            spawn(&compileObjfile, name);
        }
    }
}

Sidenotes: So I've tried compiling the win32 libraries via `DMD -lib`. DMD eats up over 300 Megs of memory, and it's quite scary how fast that number grows. It took over 25 seconds to compile a lib file.

On the other hand, compiling .obj files one by one by blocking a single thread on system calls (in other words, the single-threaded version) takes about 15 seconds to create a library file. In each instantiation DMD uses only about a dozen or so Mbytes, maybe less.

When I spawn an unlimited number of threads via a foreach loop, again compiling object-by-object, the lib file is generated in only 5(!) seconds. I'm running a quad-core on XP32, btw.

So I'm a little perplexed, because according to Tomasz (maker of xfBuild) and his various posts, compiling .obj by .obj file should apparently be really really slow and -lib makes the fastest builds. But I'm getting the exact opposite results.
March 27, 2011
On 2011-03-26 18:15, Andrej Mitovic wrote:
> I'm testing out various compilation schemes with DMD. Right now I'm spawning multiple threads which simply do a `system` call with a string like "DMD -c somefile.d". I'd like to limit the number of active threads to something my CPU can handle (4 in this case, since I've got 4 cores).
> 
> How do I go about doing this?

I don't believe that std.concurrency has any way to manage the number of threads that are running. It gives you the means to communicate between threads and gives you a nice way to spawn a thread, but it doesn't really do much with thread management. You could use core.thread.Thread.getAll to get an array of all of the Threads, and spin until the number is below whatever threshold you want, but that's not terribly efficient, since you then have a thread spinning, eating up CPU as it waits for the others to finish.

What I have done when I've wanted to do something like this is to have each spawned thread send a message back when it's done. Then, I increment a thread count when I spawn a thread and decrement it when I receive a message indicating that a thread has terminated. In the loop which is processing whatever list of things I want processed, it will only spawn a thread if the thread count is below the chosen threshold. Otherwise it sits there waiting to receive a message. So, it would do something like this:

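foreach(string name; dirEntries(curdir, SpanMode.shallow))
{
    if(name.isfile && name.getExt == "d")
    {
        // Below the limit: only poll briefly for threads that have finished.
        // At the limit: block until one of them reports that it is done.
        if(currThreads < maxThreads)
            receiveTimeout(1, recProc);
        else
            receive(recProc);

        spawn(&compileObjfile, name);
        ++currThreads;
    }
}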

recProc is then a function which handles receiving messages, and it decrements currThreads when it receives the message that a thread has been terminated.
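
A self-contained sketch of the scheme described above, written against std.concurrency as it exists in current Phobos. The names `worker` and `runThrottled` are made up for illustration, and the worker only reports back instead of actually running dmd, so treat it as an outline under those assumptions rather than a finished build script:

```d
import std.concurrency : ownerTid, receive, send, spawn;
import std.stdio : writeln;

// Hypothetical stand-in for the real job (e.g. the dmd call).
void worker(string name)
{
    // ... do the work for `name` here ...
    send(ownerTid, name); // tell the owner thread that this one is done
}

// Runs `worker` for every name, keeping at most `maxThreads` alive at once.
// Returns the names in the order in which their workers reported back.
string[] runThrottled(string[] names, size_t maxThreads)
{
    string[] finished;
    size_t running = 0;

    foreach (name; names)
    {
        // At the limit: block (no spinning) until some worker reports back.
        if (running >= maxThreads)
        {
            receive((string done) { finished ~= done; });
            --running;
        }
        spawn(&worker, name);
        ++running;
    }

    // Drain the completion messages of the threads still running.
    for (; running > 0; --running)
        receive((string done) { finished ~= done; });

    return finished;
}

void main()
{
    auto done = runThrottled(["a.d", "b.d", "c.d", "d.d", "e.d"], 2);
    writeln(done.length, " jobs finished");
}
```

Because the owner thread sleeps inside `receive` whenever the limit is reached, nothing burns CPU while waiting, and the final drain loop doubles as the "wait for everything" step before a later command such as the librarian call.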

std.concurrency does not manage threads. It only gives you tools for creating them and communicating between them, so any limit on how many are running is something you have to enforce yourself.

However, it should be noted that the task that you're looking to solve here may be better solved by std.parallelism, which David has been working on and which is currently under review on the main list.

- Jonathan M Davis
March 27, 2011
Well, I've worked around this by polling a variable which holds the number of active threads. It's not a pretty solution, and I'd probably be better off using std.parallelism or some upcoming module. My solution for now is:

import std.stdio;
import std.file;
import std.path;
import std.process;
import std.concurrency;
import core.atomic;  // needed for atomicOp
import core.thread;

shared int threadsCount;

void compileObjfile(string name)
{
    system(r"dmd -od" ~ r".\cache\" ~ r" -c -version=Unicode -version=WindowsNTonly -version=Windows2000 -version=WindowsXP -I..\ " ~ name ~ " ");
    atomicOp!"-="(threadsCount, 1);
}

int main()
{
    string libfileName = r".\cache\win32.lib ";
    string objFiles;
    foreach (string name; dirEntries(curdir, SpanMode.shallow))
    {
        if (name.isfile && name.basename.getName != "build" && (name.getExt == "d" || name.getExt == "di"))
        {
            string objfileName = r".\cache\" ~ name.basename.getName ~ ".obj";
            objFiles ~= objfileName ~ " ";

            atomicOp!"+="(threadsCount, 1);
            while (threadsCount > 3)
            {
                Thread.sleep(dur!("msecs")(1));
            }
            spawn(&compileObjfile, name);
        }
    }

    // wait for threads to finish before call to lib
    while (threadsCount)
    {
        Thread.sleep(dur!("msecs")(1));
    }
    system(r"lib -c -n -p64 " ~ objFiles);

    return 0;
}

The timing:

D:\dev\projects\win32\win32>timeit build
Digital Mars Librarian Version 8.02n
Copyright (C) Digital Mars 2000-2007 All Rights Reserved
http://www.digitalmars.com/ctg/lib.html
Digital Mars Librarian complete.

Version Number:   Windows NT 5.1 (Build 2600)
Exit Time:        3:49 am, Sunday, March 27 2011
Elapsed Time:     0:00:06.437
Process Time:     0:00:00.062
System Calls:     627101
Context Switches: 123883
Page Faults:      734997
Bytes Read:       93800813
Bytes Written:    7138927
Bytes Other:      1043652

So about 6.5 seconds. Now compare this to the following build script, which simply invokes DMD with -lib and all the modules:

import std.stdio;
import std.process;
import std.path;
import std.file;

void main()
{
    string files;
    foreach (string name; dirEntries(curdir, SpanMode.shallow))
    {
        if (name.isfile && name.basename.getName != "build" && name.getExt == "d")
            files ~= name ~ " ";
    }

    system(r"dmd -lib -I..\ -version=Unicode -version=WindowsNTonly -version=Windows2000 -version=WindowsXP " ~ files);
}

D:\dev\projects\win32\win32>timeit build.exe

Version Number:   Windows NT 5.1 (Build 2600)
Exit Time:        3:54 am, Sunday, March 27 2011
Elapsed Time:     0:00:25.750
Process Time:     0:00:00.015
System Calls:     139172
Context Switches: 44648
Page Faults:      87440
Bytes Read:       7427284
Bytes Written:    7413372
Bytes Other:      45798

Compiling object by object is almost exactly 4 times faster with threading than using -lib on all module files. And my multithreaded script is probably wasting some time by calling Thread.sleep(), but I'm new to threading and I don't know how else to limit the number of threads.
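
One way to limit the thread count without the sleep loop is a counting semaphore: each job takes a slot before it starts and gives it back when it ends, so the spawning loop simply blocks while all slots are in use. The sketch below assumes druntime's core.sync.semaphore and core.thread; `startJob`, `runLimited` and `fakeCompile` are hypothetical names, and `fakeCompile` only bumps a counter where the real script would call dmd:

```d
import core.atomic : atomicLoad, atomicOp;
import core.sync.semaphore : Semaphore;
import core.thread : Thread, ThreadGroup;
import std.stdio : writeln;

shared int completed; // demo only: counts finished jobs

// Hypothetical stand-in for the real job (the dmd call in the script above).
void fakeCompile(string name)
{
    atomicOp!"+="(completed, 1);
}

// Starts one thread for `name`. Using a separate function gives every
// thread its own copy of `name` instead of sharing the loop variable.
Thread startJob(Semaphore slots, string name, void function(string) job)
{
    auto t = new Thread({
        scope (exit) slots.notify(); // give the slot back, even on failure
        job(name);
    });
    t.start();
    return t;
}

// Runs `job` for every name with at most `maxJobs` threads active at once.
void runLimited(string[] names, uint maxJobs, void function(string) job)
{
    auto slots = new Semaphore(maxJobs);
    auto group = new ThreadGroup;

    foreach (name; names)
    {
        slots.wait(); // blocks without spinning until a slot is free
        group.add(startJob(slots, name, job));
    }
    group.joinAll(); // wait for the stragglers, e.g. before calling lib
}

void main()
{
    runLimited(["a.d", "b.d", "c.d", "d.d", "e.d"], 4, &fakeCompile);
    writeln(atomicLoad(completed), " jobs done");
}
```

With `maxJobs` set to 4 this allows four jobs to run concurrently, whereas the polling loop above waits until the counter (which already includes the job about to start) drops to 3.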
March 27, 2011
Edit: It looks like I did almost the same as Jonathan advised.

I'm looking forward to std.parallelism though. I'm thinking I'd probably use some kind of parallel foreach loop that iterates over 4 files at once, and letting it do its work by spawning 4 threads. Or something like that. We'll see.
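
For reference, a rough sketch of what such a parallel foreach can look like with std.parallelism as it exists in current Phobos (the module was still under review at the time of this thread, so details may have differed). `compileOne` and `compileAll` are hypothetical names, and `compileOne` only computes the object file name instead of invoking dmd:

```d
import std.parallelism : TaskPool;
import std.path : setExtension;
import std.stdio : writeln;

// Hypothetical stand-in for compiling one module; a real build script
// would invoke dmd here. This one only computes the object file name.
string compileOne(string source)
{
    return setExtension(source, "obj");
}

// Processes `files` on `nThreads` threads (assumed to be at least 2).
string[] compileAll(string[] files, size_t nThreads)
{
    auto objs = new string[files.length];

    // The thread running the foreach also takes work, so nThreads - 1
    // pool workers give nThreads busy threads in total.
    auto pool = new TaskPool(nThreads - 1);
    scope (exit) pool.finish();

    // Work unit size 1: every file is handed out as its own unit.
    // Each iteration writes only to its own slot, so no locking is needed.
    foreach (i, file; pool.parallel(files, 1))
        objs[i] = compileOne(file);

    return objs;
}

void main()
{
    auto objs = compileAll(["a.d", "b.d", "c.d", "d.d", "e.d"], 4);
    writeln(objs);
}
```

The foreach only returns once every file has been processed, so a follow-up step such as the lib call can go right after it without any extra waiting code.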
March 27, 2011
On 3/26/2011 7:00 PM, Andrej Mitrovic wrote:
> Edit: It looks like I did almost the same as Jonathan advised.
> 
> I'm looking forward to std.parallelism though. I'm thinking I'd probably use some kind of parallel foreach loop that iterates over 4 files at once, and letting it do its work by spawning 4 threads. Or something like that. We'll see.

The way I've typically done this sort of pattern is with a thread pool that gets its work from a queue. The main thread shoves work into the queue and then calls a .join or .waitForEmpty sort of API on the pool. So it'd look something like:

    void workerFunc(string str) { ... }

    auto tp = new ThreadPool(getNumCpus(), &workerFunc);

    foreach(...)
        tp.push(str);

    tp.join();

This can suffer from queue size problems if the amount of work is awfully large, but that's not a problem for the vast majority of the cases I've had, so I've never worried about making the push capable of blocking or otherwise throttling the producer side.
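
The ThreadPool above is pseudocode, but the same push-then-join shape can be sketched with std.parallelism's TaskPool from current Phobos, which postdates this thread. The mapping of `tp.push` to `put` and of `tp.join` to `finish(true)` is my assumption about what was meant; `runPool` is a hypothetical name and `workerFunc` only counts the items it sees:

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : TaskPool, task, totalCPUs;
import std.stdio : writeln;

shared int processed; // demo only: counts handled work items

// Stand-in worker; a real one would e.g. compile the file named by `str`.
void workerFunc(string str)
{
    atomicOp!"+="(processed, 1);
}

// Pushes every item onto the pool's queue, then waits for the queue to drain.
void runPool(string[] items, size_t nWorkers)
{
    auto pool = new TaskPool(nWorkers);

    foreach (item; items)
        pool.put(task!workerFunc(item)); // enqueue; does not block

    // Blocking finish: returns once all queued tasks have run and the
    // worker threads have shut down (the tp.join() of the pseudocode).
    pool.finish(true);
}

void main()
{
    runPool(["a.d", "b.d", "c.d", "d.d", "e.d"], totalCPUs);
    writeln(atomicLoad(processed), " items processed");
}
```

As in the pseudocode, `put` never blocks, so the queue is unbounded and the producer is not throttled.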