Death by concurrency
November 08, 2005
The well known shootout shows a negative mark for concurrency for D:

http://shootout.alioth.debian.org/benchmark.php?test=message&lang=all&sort=fullcpu

What is the reason?

-manfred
November 08, 2005
"Manfred Nowak" <svv1999@hotmail.com> wrote in message news:Xns97086B99494CDsvv1999hotmailcom@63.105.9.61...
> The well known shootout shows a negative mark for concurrency for D:
>
> http://shootout.alioth.debian.org/benchmark.php?test=message&lang=all&sort=fullcpu
>
> What is the reason?
>
> -manfred

It could be the busy-waiting. Instead of looping and yielding, a waiting thread should park itself until there is a message for it. The ReentrantLock and Condition classes from http://home.comcast.net/~benhinkle/locks/locks.html should help - but I don't know if user libraries are allowed in the shootout like that.
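
For anyone wondering what "parking" looks like in practice, here is a minimal sketch. It assumes a present-day D runtime (core.sync.mutex, core.sync.condition, core.thread), not the 2005 Phobos and not the locks library linked above. Mailbox, makeWorker and relay are hypothetical names made up for the illustration. The point is that take() sleeps on a condition variable until put() delivers, instead of returning early and yielding in a loop.

```d
import core.sync.condition : Condition;
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical one-slot mailbox: take() parks the caller until put() delivers.
final class Mailbox
{
    private Mutex mtx;
    private Condition ready;
    private int message;
    private bool full;

    this()
    {
        mtx = new Mutex;
        ready = new Condition(mtx);
    }

    void put(int m)
    {
        mtx.lock();
        scope (exit) mtx.unlock();
        message = m;
        full = true;
        ready.notify(); // wake one parked taker, if any
    }

    int take()
    {
        mtx.lock();
        scope (exit) mtx.unlock();
        while (!full)     // re-check after every wakeup (they can be spurious)
            ready.wait(); // releases mtx while parked, re-acquires on wakeup
        full = false;
        return message;
    }
}

// Built in a helper so each thread's delegate captures its own pair of boxes.
Thread makeWorker(Mailbox inbox, Mailbox outbox)
{
    return new Thread({ outbox.put(inbox.take() + 1); });
}

// Pass 0 through a chain of `links` threads, each adding 1 to the message.
int relay(int links)
{
    auto boxes = new Mailbox[links + 1];
    foreach (ref b; boxes)
        b = new Mailbox;

    auto workers = new Thread[links];
    foreach (i, ref w; workers)
    {
        w = makeWorker(boxes[i], boxes[i + 1]);
        w.start();
    }

    boxes[0].put(0);                  // inject the first message
    int result = boxes[links].take(); // park until it reaches the end
    foreach (w; workers)
        w.join();
    return result;
}
```

Calling relay(8) returns 8, since each of the eight links adds one. No thread spins while it waits, but every link is still a full OS thread, so this does nothing about the cost of creating them.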


November 08, 2005
Ben Hinkle wrote:
> "Manfred Nowak" <svv1999@hotmail.com> wrote in message news:Xns97086B99494CDsvv1999hotmailcom@63.105.9.61...
> 
>>The well known shootout shows a negative mark for concurrency for D:
>>
>>http://shootout.alioth.debian.org/benchmark.php?
>>test=message&lang=all&sort=fullcpu
>>
>>What is the reason?
 >
> It could be the busy-waiting. Instead of looping and yielding a waiting thread should park itself. The ReentrantLock and Condition classes from http://home.comcast.net/~benhinkle/locks/locks.html should help - but I don't know if user libraries are allowed in the shootout like that. 

I'm not sure what's wrong with their test.  I modified the shootout code to run on Ares with DMD 0.139 (since I'm too lazy to rebuild Phobos just for this test), and ptime reported it completing in 0.625 seconds on my laptop, even with quite a lot of stuff running in the background.  In case anyone is interested, here is the test code.  I simply renamed 'wait' to 'join' and did output via printf instead of streams:

import std.thread, std.c.stdio, std.c.stdlib;

int main(char[][] args)
{
    const int length = 500;
    int n = args.length > 1 ? atoi(args[1]) : 1;

    EndLink chainEnd = new EndLink(length * n);
    chainEnd.start();

    Link chain = chainEnd;
    while(n--)
    {
        for(int i = 1; i < length; i++)
        {
            Link link = new Link(chain);
            chain = link;
        }

        chain.put(0);
        while(chain.next)
        {
            chain.start();
            chain.join();
            chain = chain.next;
        }
    }

    chainEnd.join();
    printf("%i\n", chainEnd.count);

    return 0;
}

class Link: Thread
{
private:
    int message = -1;

public:
    Link next;

    this(Link t)
    {
        next = t;
    }

    void run()
    {
        next.put(this.take());
    }

    synchronized void put(int m)
    {
        message = m;
        yield();
    }

protected:
    synchronized int take()
    {
        if(message != -1)
        {
            int m = message;
            message = -1;
            return m + 1;
        }
        yield();
        return 0;
    }
}

class EndLink: Link
{
private:
    int finalCount;

public:
    int count = 0;

    this(int i)
    {
        super(null);
        finalCount = i;
    }

    void run()
    {
        while(count < finalCount)
        {
            count += this.take();
            yield();
        }
    }
}
November 08, 2005
Oh, the shootout says the code should print '5000' and mine printed '500'.  I haven't taken the time to figure out why the result was different, though it's likely a bug in the shootout code.


Sean
November 08, 2005
If the code was making 500 threads it could also be that they ran the
benchmark on linux and bumped into phobos's limitation on the number of
threads allowed at once:
    static Thread[/*_POSIX_THREAD_THREADS_MAX*/ 100] allThreads;



November 08, 2005
Oops.  I just noticed that N is a command-line parameter.  For an N of 10, ptime clocks this test at 5.210 seconds on my laptop, and '5000' is printed as expected.


Sean
November 08, 2005
Ben Hinkle wrote:
> If the code was making 500 threads it could also be that they ran the benchmark on linux and bumped into phobos's limitation on the number of threads allowed at once:
>     static Thread[/*_POSIX_THREAD_THREADS_MAX*/ 100] allThreads;

Ah, good point.  Ares doesn't have this limitation, as it uses an associative array (AA) rather than a fixed-size array for storing thread references.


Sean
November 08, 2005
In article <dkr5ac$me5$2@digitaldaemon.com>, Sean Kelly says...
>
>Ben Hinkle wrote:
>> If the code was making 500 threads it could also be that they ran the
>> benchmark on linux and bumped into phobos's limitation on the number of
>> threads allowed at once:
>>     static Thread[/*_POSIX_THREAD_THREADS_MAX*/ 100] allThreads;
>
>Ah, good point.  Ares doesn't have this limitation as it used an AA for storing thread references.
>

Sean,
Out of curiosity, have you tried using Ares' Atomic lib for this task?  I wonder
what the difference in time would be when compared to 'synchronized'?

- EricAnderton at yahoo
November 08, 2005
Manfred Nowak wrote:
> The well known shootout shows a negative mark for concurrency for D

I don't really like the way this test is structured, as what it is really testing is the efficiency of thread creation.  For any language with its roots in OS-level thread code, the performance should be pretty much equivalent.  I suspect the functional languages perform so well because they do user-level concurrency rather than kernel-level concurrency (and probably also because they don't allocate large chunks of memory for stack space and such in the process).  I'm quite surprised by the abysmal performance of the Scheme and OCaml tests, however.  Is it simply because their interpreters stink?
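
To make the user-level versus kernel-level distinction concrete, here is a rough sketch of the same kind of relay done with fibers, which are switched in user space on a single OS thread. It assumes the Fiber class from core.thread in a present-day D runtime, which is not what the 2005 test could use. fiberRelay is a hypothetical name, and this is an illustration rather than a shootout entry.

```d
import core.thread : Fiber;

// Pass a token through `links` fibers, each adding 1.
// Everything runs on the calling thread; no kernel threads are created.
int fiberRelay(int links)
{
    int token = 0;
    auto chain = new Fiber[links];

    foreach (ref f; chain)
    {
        f = new Fiber({
            Fiber.yield(); // park until someone resumes this fiber
            token += 1;    // forward the message, incremented
        });
    }

    foreach (f; chain)
        f.call(); // first call: run each fiber up to its yield point

    foreach (f; chain)
        f.call(); // second call: resume each in order, letting it finish

    return token;
}
```

fiberRelay(500) returns 500. Each fiber has a small stack of its own, but creating and switching between them never goes through the kernel scheduler, which is the kind of advantage I suspect the functional languages are getting.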


Sean
November 08, 2005
pragma wrote:
>
> Out of curiosity, have you tried using Ares' Atomic lib for this task?  I wonder
> what the difference in time would be when compared to 'synchronized'?

See my reply to the OP.  I tried simply removing the 'synchronized' attributes entirely and only saw a small performance increase (less than 0.1 seconds on average).  I suspect this is because most of the time in this case goes to thread creation.  I also tried disabling the GC, and the test ran slower on average than with it enabled.  It would probably be difficult to optimize this test to perform noticeably better, as the 500 threads need to be created no matter what.
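
If anyone wants to check the thread-creation theory on their own machine, something like the following sketch would isolate that cost. It assumes a present-day D toolchain (core.thread and std.datetime.stopwatch), and timeThreadChurn and noop are hypothetical names. It is not the code I timed above.

```d
import core.thread : Thread;
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Thread body that does nothing, so only creation and teardown are measured.
void noop() {}

// Create, start and join `count` threads one after another,
// returning the total wall-clock time spent doing so.
Duration timeThreadChurn(int count)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. count)
    {
        auto t = new Thread(&noop);
        t.start();
        t.join();
    }
    return sw.peek();
}
```

Comparing timeThreadChurn(500) against the run time of the full test should give an idea of how much is left for the locking and message passing.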


Sean