June 14, 2020 Re: Oh, my GoD! Goroutines on D
Posted in reply to Jin

On Monday, 25 May 2020 at 16:41:06 UTC, Jin wrote:
>> FYI: channels have also been part of vibe-core for a while:
>>
>> https://github.com/vibe-d/vibe-core/blob/master/source/vibe/core/channel.d
>
> Yes. But it uses a mutex. My implementation is wait-free (https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom): all threads can read and write quickly without any locking. So every queue is 1-provider-1-consumer, but the Input and Output channels are round-robin lists of queues. You can find some diagrams here: https://github.com/nin-jin/go.d/blob/master/readme.drawio.svg

Just saw this. For a 1-provider-1-consumer queue, I did some experiments with https://code.dlang.org/packages/lock-free, and the results are here:

https://github.com/mingwugmail/liblfdsd/tree/master/comparison

LDC is ~5x faster than DMD, I'm not sure why; and this package may need more stress testing:

```
$ make ldcrun
...
received 1000000000 messages in 9845 msec sum=499999999500000000 speed=101574 msg/msec

$ make dmdrun
...
received 1000000000 messages in 53607 msec sum=499999999500000000 speed=18654 msg/msec

$ make javamp
10000000 messages received in 1151.0 ms, sum=49999995000000
speed: 0.1151 microsec/message, 8688.097306689835 messages/msec

$ make dmp
received 100000000 messages in 27574 msec sum=4999999950000000 speed=3626 msg/msec
```
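For readers unfamiliar with the shape of these benchmarks, the same producer/consumer measurement can be sketched in Go, the language the later posts compare against. This is only an illustration of the pattern, not the code from any of the linked repositories; the function name `sumViaChannel` and the buffer size 1024 are arbitrary choices.

```go
package main

import (
	"fmt"
	"time"
)

// sumViaChannel re-creates the benchmark pattern above: one producer
// goroutine sends n integers through a buffered channel and the consumer
// sums them. Illustrative only; not from liblfdsd or go.d.
func sumViaChannel(n int) int64 {
	ch := make(chan int, 1024) // arbitrary buffer size
	go func() {
		for i := 0; i < n; i++ {
			ch <- i
		}
		close(ch)
	}()
	var sum int64
	for v := range ch {
		sum += int64(v)
	}
	return sum
}

func main() {
	const n = 1_000_000
	start := time.Now()
	sum := sumViaChannel(n)
	ms := time.Since(start).Milliseconds()
	if ms == 0 {
		ms = 1 // fast runs: avoid dividing by zero
	}
	fmt.Printf("received %d messages in %d msec sum=%d speed=%d msg/msec\n",
		n, ms, sum, int64(n)/ms)
}
```

The msg/msec figure is what all the console outputs in this thread report, so the sketches are directly comparable in shape, if not in speed.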
June 14, 2020 Re: Oh, my GoD! Goroutines on D
Posted in reply to mw

On Sunday, 14 June 2020 at 19:49:46 UTC, mw wrote:
> On Sunday, 14 June 2020 at 17:10:14 UTC, mw wrote:
>> Have you tried a lock-free queue?
>>
>> https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(unbounded,_many_producer,_many_consumer)
>>
>> Java uses the same algorithm for ConcurrentLinkedQueue (in a C implementation). I tried some small examples with liblfds and got slightly better performance than Java. Maybe we don't want to reinvent the wheel, especially a well-tested one.
>
> You can try it here:
>
> https://github.com/mingwugmail/liblfdsd
>
> Only https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(bounded,_many_producer,_many_consumer) for now.
>
> ```
> received 100000000 messages in 4632 msec sum=4999999950000000 speed=21588 msg/msec
> ```

My wheels are quite simple: https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d

I have rewritten your benchmark (https://github.com/mingwugmail/liblfdsd/blob/master/liblfds.dpp#L67) with go.d:

```d
const int n = 100_000_000;

void threadProducer(Output!int queue)
{
    foreach (int i; 0 .. n)
    {
        queue.put(i);
    }
}

void main()
{
    Input!int queue;
    go!threadProducer(queue.pair);

    StopWatch sw;
    sw.start();

    long sum = 0;
    foreach (p; queue)
    {
        sum += p;
    }

    sw.stop();

    writefln("received %d messages in %d msec sum=%d speed=%d msg/msec",
        n, sw.peek.total!"msecs", sum, n / sw.peek.total!"msecs");

    // cast to long: n * (n - 1) overflows 32-bit int arithmetic
    assert(sum == (cast(long) n * (n - 1) / 2));
}
```

The code is simpler. On my laptop it gives:

```
> dub --quiet --build=release
received 100000000 messages in 10011 msec sum=4999999950000000 speed=9989 msg/msec
```
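The wait-free 1-provider-1-consumer queue both posts talk about is, at its core, a fixed-capacity ring buffer where the producer owns the write index and the consumer owns the read index, so neither side ever blocks the other. A minimal sketch of that idea (in Go, for comparability with the rest of the thread; this is NOT the actual jin.go.queue or lock-free.rwqueue code, and all names here are invented):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// SPSCQueue sketches the single-producer single-consumer wait-free ring
// buffer discussed above. Capacity must be a power of two so the index
// wrap is a cheap bitmask instead of a division.
type SPSCQueue struct {
	buf  []int64
	mask uint64
	head atomic.Uint64 // next slot to read  (advanced only by the consumer)
	tail atomic.Uint64 // next slot to write (advanced only by the producer)
}

func NewSPSCQueue(capacity uint64) *SPSCQueue {
	if capacity == 0 || capacity&(capacity-1) != 0 {
		panic("capacity must be a power of two")
	}
	return &SPSCQueue{buf: make([]int64, capacity), mask: capacity - 1}
}

// Put returns false when the queue is full; only one goroutine may call it.
func (q *SPSCQueue) Put(v int64) bool {
	tail := q.tail.Load()
	if tail-q.head.Load() == uint64(len(q.buf)) {
		return false // full
	}
	q.buf[tail&q.mask] = v
	q.tail.Store(tail + 1) // publishes the element write above
	return true
}

// Get returns (0, false) when the queue is empty; only one goroutine may call it.
func (q *SPSCQueue) Get() (int64, bool) {
	head := q.head.Load()
	if head == q.tail.Load() {
		return 0, false // empty
	}
	v := q.buf[head&q.mask]
	q.head.Store(head + 1) // frees the slot for the producer
	return v, true
}

func main() {
	q := NewSPSCQueue(8)
	q.Put(42)
	v, ok := q.Get()
	fmt.Println(v, ok) // prints "42 true"
}
```

Because each index has exactly one writer, every operation completes in a bounded number of steps, which is what makes the structure wait-free rather than merely lock-free.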
June 14, 2020 Re: Oh, my GoD! Goroutines on D
Posted in reply to Jin

On Sunday, 14 June 2020 at 22:57:25 UTC, Jin wrote:
>> https://github.com/mingwugmail/liblfdsd
>> speed=21588 msg/msec
> ...
> My wheels are quite simple: https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d
> ...
> speed=9989 msg/msec

Can you make a new go.d release at https://github.com/nin-jin/go.d/releases, or create a PR using your jin.go.queue against https://github.com/mingwugmail/liblfdsd/tree/master/comparison, so that either of us can run the comparison on the same machine?
June 15, 2020 Re: Oh, my GoD! Goroutines on D
Posted in reply to Jin

On Sunday, 14 June 2020 at 22:57:25 UTC, Jin wrote:
> My wheels are quite simple: https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d
> ...
> received 100000000 messages in 10011 msec sum=4999999950000000 speed=9989 msg/msec

Since you are using a 1-provider-1-consumer queue, can you try this package as a drop-in replacement in go.d:

https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d

(You may need to add align() to that implementation.)

According to my test:

```
received 1000000000 messages in 9845 msec sum=499999999500000000 speed=101574 msg/msec
```

you may get a ~10x boost.
June 15, 2020 Re: Oh, my GoD! Goroutines on D
Posted in reply to mw

On Monday, 15 June 2020 at 01:55:27 UTC, mw wrote:
> Since you are using a 1-provider-1-consumer queue, can you try this package as a drop-in replacement in go.d:
>
> https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d
>
> (you may need to add align() to that implementation)
>
> ...
>
> you may get ~10x boost.

I have added a new release and a PR to your repo. But I don't think it's a good idea to replace jin.go.queue with lock_free.rwqueue, because:

1. My API is std.range compatible.
2. I use the same API for queues and channels.
3. My API supports finalization (by provider, consumer, or both).
4. Your queue is fixed size, but my channels operate with a set of queues that can have different sizes.

But I would steal some optimizations:

1. Power-of-2 capacity and a bit operation instead of division. It can add ~10% performance.
2. The cache line size should be the minimum size of a message, to prevent false sharing.

You are using atomicLoad!(MemoryOrder.acq) here: https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d#L41

Is this really required? The CPU can't reorder dependent statements.
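Optimization (1) above works because, when the capacity is a power of two, `i % capacity` equals `i & (capacity - 1)`, and a bitwise AND is much cheaper than an integer division on most CPUs. A tiny sketch of the equivalence (illustrative; the helper names are invented):

```go
package main

import "fmt"

// wrapMod and wrapMask compute the same ring-buffer slot index when
// capacity is a power of two; the mask form avoids a hardware divide.
func wrapMod(i, capacity uint64) uint64  { return i % capacity }
func wrapMask(i, capacity uint64) uint64 { return i & (capacity - 1) }

func main() {
	const capacity = 1024 // power of two
	for _, i := range []uint64{0, 1, 1023, 1024, 5000, 1 << 40} {
		fmt.Println(wrapMod(i, capacity) == wrapMask(i, capacity)) // true each time
	}
}
```

This is why ring buffers like rwqueue.d round their capacity up to a power of two rather than allowing arbitrary sizes.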
June 15, 2020 Re: Oh, my GoD! Goroutines on D
Posted in reply to Jin

On Monday, 15 June 2020 at 09:09:23 UTC, Jin wrote:
> I have added a new release and a PR to your repo.

On the same machine as before, go.d:

```
received 100000000 messages in 2906 msec sum=4999999950000000 speed=34411 msg/msec
```

So it's ~2.7x faster than Java: https://github.com/mingwugmail/liblfdsd/tree/master/comparison

And on my machine your https://github.com/nin-jin/go.d is 2~4 times slower than Go: 9.7638 ms (Go) vs. [19 ms ~ 40 ms] (go.d). Go's speed is consistent across multiple runs; for go.d there can be a 2x difference between runs, maybe because the scheduler is unstable?

> But I don't think it's a good idea to replace jin.go.queue with lock_free.rwqueue, because:

I just want to do a comparison.

> You are using atomicLoad!(MemoryOrder.acq) here: https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d#L41
> Is this really required? The CPU can't reorder dependent statements.

It's not mine, but @MartinNowak's. The implementation is based on:

https://www.codeproject.com/Articles/43510/Lock-Free-Single-Producer-Single-Consumer-Circular
June 16, 2020 Re: Oh, my GoD! Goroutines on D
Posted in reply to mw

On Monday, 15 June 2020 at 22:04:49 UTC, mw wrote:
> ...
> It's not mine, but @MartinNowak's. The implementation is based on:
> https://www.codeproject.com/Articles/43510/Lock-Free-Single-Producer-Single-Consumer-Circular

There's an SPMC/MPSC queue implementation here:

https://github.com/weka-io/mecca/blob/master/src/mecca/containers/otm_queue.d

that may also be interesting for you guys to check.
Copyright © 1999-2021 by the D Language Foundation