May 19, 2020
On Tue, 2020-05-19 at 09:15 +0000, Seb via Digitalmars-d wrote: […]
> FYI: channels are also part of vibe-core since a while:
> 
> https://github.com/vibe-d/vibe-core/blob/master/source/vibe/core/channel.d

I will have to investigate. Sounds like vibe.d can provide tasks with channels on a threadpool.

-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk



May 25, 2020
On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:
> On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:
>>
>> http://wiki.dlang.org/Go_to_D
>
> Any performance comparison with Go? esp. in real word scenario?
>
> Can it easily handle hundreds of (go)routines?

I have updated the code. But it isn't ready to use currently because:

1. I rewrote the code to use std.parallelism instead of vibe.d, and it's difficult to integrate fibers with tasks. Currently, every task spinlocks while waiting on a channel, and the main thread does no useful work.

2. There is a race condition. I'm going to review the algorithm closely.

Currently it's twice as slow as Go. On my machine:

>go run app.go --release
Workers Result          Time
4       499500000       27.9226ms

>dub --quiet --build=release
Workers Result          Time
3       499500000       64 ms

It would be cool if someone helped me with it. There are docstrings, tests, and diagrams. I'll explain more if someone joins.
May 25, 2020
On Tuesday, 19 May 2020 at 09:15:24 UTC, Seb wrote:
> On Sunday, 17 May 2020 at 15:17:44 UTC, Russel Winder wrote:
>> On Sat, 2020-05-16 at 20:06 +0000, mw via Digitalmars-d wrote:
>>> On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:
>>> > http://wiki.dlang.org/Go_to_D
>>> 
>>> Any performance comparison with Go? esp. in real word scenario?
>>> 
>>> Can it easily handle hundreds of (go)routines?
>>
>> Seems to have been created four years ago and then left fallow. Perhaps it should be resurrected  and integrated into Phobos? Or left as a package in the Dub repository?
>
> FYI: channels are also part of vibe-core since a while:
>
> https://github.com/vibe-d/vibe-core/blob/master/source/vibe/core/channel.d

Yes, but it uses a mutex. My implementation is wait-free (https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom): all threads can read and write quickly, without any locking. Every queue is single-producer, single-consumer, but the Input and Output channels are round-robin lists of queues. You can find some diagrams here: https://github.com/nin-jin/go.d/blob/master/readme.drawio.svg
May 26, 2020
On Monday, 25 May 2020 at 16:26:31 UTC, Jin wrote:
> On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:
>> On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:
>>>
>>> http://wiki.dlang.org/Go_to_D
>>
>> Any performance comparison with Go? esp. in real word scenario?
>>
>> Can it easily handle hundreds of (go)routines?
>
> I have updated the code. But it isn't ready to use currently because:
>
> 1. I rewrote code to use std.parallelism instead of vibe.d. So, it's difficult to integrate fibers with tasks. Now, every tasks spinlocks on waiting channel and main thread don't useful work.
>
> 2. Race condition. I'm going to closely review algorithm.
>
> [...]
>
> It would be cool if someone help me with it. There are docstrings, tests and diagrams. I'll explain more if someone joins.

This is a problem that's of interest to me as well, and I've been working on this for a few months (on and off).
I had to eventually ditch `std.concurrency` because of some design decisions that made things hard to work with.

`std.concurrency`'s MessageBox was originally designed to work only between threads. As such, it comes with all the locking you'd expect from a cross-thread message-passing data structure. Support for fibers was added as an afterthought. You can even see it in the documentation (https://dlang.org/phobos/std_concurrency.html), where "thread" is mentioned all over the place. The module doc kind of gets away with it because it calls fibers "logical threads", but that distinction is not always made. It also has some concepts that make a lot of sense for threads, but much less so for fibers (such as the "owner" concept, the owner being the task that `spawn`ed you). Finally, it forces messages to be `shared` or isolated (read: with only `immutable` indirections), which doesn't make sense when you're dealing only with fibers on the same thread.

We found some ridiculous issues when trying to use it. We upstreamed some fixes (https://github.com/dlang/phobos/pull/7096, https://github.com/dlang/phobos/pull/6738) and put a bounty on one of the issues, which led to someone finding the bug in `std.concurrency` (https://github.com/Geod24/localrest/pull/5#issuecomment-523707490). After some playing around with it, we just gave up, forked the whole module, and started changing it to make it behave more like channels. There are some other issues I found while refactoring which I might upstream in the future, but it needs so much work that I might as well PR a whole new module.

What we're trying to achieve is to move from a MessageBox approach, where there is a 1-to-1 relationship between a task (or logical thread) and a MessageBox, to a channel-like model, where there is a N-to-1 relationship (See Go's select).

In order to achieve Go-like performance, we need a few things though:
- Direct hand-off semantics for same-thread message passing: meaning that if fiber A sends a message to fiber B, and they are both in the same thread, there is an immediate context switch from A to B, without going through the scheduler;
- Thread-level multiplexing of receive: with the current `std.concurrency`, calling `receive` yields the fiber and might block the thread. The scheduler simply iterates over all fibers in linear order, which means that if you have 3 fibers and they all `receive` one after the other, you'll stay blocked until the *first* one receives a message before the other ones get woken up.
- Smaller fibers: goroutines can have very, very small stacks. They don't stack-overflow because the stacks are managed (whenever more stack needed to be allocated, there used to be a check for stack overflow, and stack "regions" were/are essentially a linked list and need not be contiguous in memory). On the other hand, we use plain fiber context switching, which is much more expensive. In that area, I think exploring the idea of a stackless-coroutine-based scheduler could be worthwhile.

This google doc has a lot of good informations, if you're interested: https://docs.google.com/document/d/1yIAYmbvL3JxOKOjuCyon7JhW4cSv1wy5hC0ApeGMV9s/pub

It's still a problem we're working on, as some issues are unique to D and we haven't found a good solution (e.g. requiring `shared` for same-thread Fiber communication is quite problematic). If we ever reach a satisfying solution I'll try upstreaming it.
May 26, 2020
On Tuesday, 26 May 2020 at 01:27:49 UTC, Mathias LANG wrote:
> - Direct hand-off semantic for same-thread message passing: Meaning that if Fiber A sends a message to Fiber B, and they are both in the same thread, there is an immediate context switch from A to B, without going through the scheduler;

I believe Weka did that with their own fiber implementation in Mecca. I think I remember Shachar mentioning this during his talk at DConf (2018?).
May 26, 2020
On Tue, 2020-05-26 at 01:27 +0000, Mathias LANG via Digitalmars-d wrote: […]
> 
> This is a problem that's of interest to me as well, and I've been
> working on this for a few months (on and off).
> I had to eventually ditch `std.concurrency` because of some
> design decisions that made things hard to work with.
[…]

I am fairly sure std.parallelism is a better place to get threadpools, tasks, scheduling, work stealing, etc. However, it is all packaged with a view to implementing SMP parallelism in D.

I haven't been following, but many others including Vibe.d have implemented either fibres and yield, or tasks/threadpools and channels – the bit missing from std.parallelism, since it isn't needed for SMP parallelism, but is needed if you take the tasks and threadpools out of that context.

What has happened in Rust, and to a great extent in the JVM arena, is that there has been one implementation of fibres/yield, futures and async, and/or tasks and threadpools that was centralised, and then everyone else has evolved to use it rather than having multiple implementations of all the ideas. In the JVM milieu there is still a lot of NIH replication, but then they have lots of money and resources.

Strategically, it would be good if there were one set of Dub packages doing this low-level stuff that people worked on and that everyone else then used. Then the question is whether to deprecate std.parallelism and rebuild it based on the new low-level code. Answer: yes. Perhaps the std.parallelism stuff could actually provide a basis for some of this low-level code, along with the vibe.core stuff and mayhap the Mecca stuff. My feeling is that the time for everyone implementing their own is long past; it is time for all to join in on a standard set of tools for D. This includes removing the Fibres stuff from std.concurrency.

So yes I am up for contributing.

-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk



June 14, 2020
On Monday, 25 May 2020 at 16:26:31 UTC, Jin wrote:
> On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:
>> On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:
>>>
>>> http://wiki.dlang.org/Go_to_D
>>
>> Any performance comparison with Go? esp. in real word scenario?
>>
>> Can it easily handle hundreds of (go)routines?
>
> I have updated the code. But it isn't ready to use currently because:
>
> 1. I rewrote code to use std.parallelism instead of vibe.d. So, it's difficult to integrate fibers with tasks. Now, every tasks spinlocks on waiting channel and main thread don't useful work.
>
> 2. Race condition. I'm going to closely review algorithm.
>
> Currently it's twice slower than Go. On y machine:
>
>>go run app.go --release
> Workers Result          Time
> 4       499500000       27.9226ms
>
>>dub --quiet --build=release
> Workers Result          Time
> 3       499500000       64 ms
>
> It would be cool if someone help me with it. There are docstrings, tests and diagrams. I'll explain more if someone joins.

I have fixed all the issues, and it's usable now. But I had to bring back the vibe-core dependency, and it has slowed down:

> .\compare.cmd
>go run app.go --release
Workers Result          Time
4       4999500000      25.9163ms
>dub --quiet --build=release
Workers Result          Time
4       4999500000      116 ms

And I had to reduce the number of "threads" to 100 because vibe-core fails with 1000.

And I have created a thread on dlang/projects with an explanation of my vision for concurrency in D: https://github.com/dlang/projects/issues/65
June 14, 2020
On Sunday, 14 June 2020 at 14:24:29 UTC, Jin wrote:
> On Monday, 25 May 2020 at 16:26:31 UTC, Jin wrote:
>> On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:
>>> On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:
>>>>
>>>> http://wiki.dlang.org/Go_to_D
>>>
>>> Any performance comparison with Go? esp. in real word scenario?

...

>> I have updated the code. But it isn't ready to use currently because:
>>
>> 1. I rewrote code to use std.parallelism instead of vibe.d. So, it's difficult to integrate fibers with tasks. Now, every tasks spinlocks on waiting channel and main thread don't useful work.
>>
>> 2. Race condition. I'm going to closely review algorithm.
>>
>> Currently it's twice slower than Go. On y machine:

...

> I have fixed all issues, and it's usable now. But I had to return vibe-core dependency. Now it's slow down:
>
>> .\compare.cmd
>>go run app.go --release
> Workers Result          Time
> 4       4999500000      25.9163ms
>>dub --quiet --build=release
> Workers Result          Time
> 4       4999500000      116 ms
>
> And I had to reduce the count of "threads" to 100 because vibe-core fails on 1000.
>
> And I have created thread on dlang/project with an explanation of my vision of concurrency in D: https://github.com/dlang/projects/issues/65

I haven’t checked your implementation, or vibe's, but I rediscovered that D's message passing is ~4 times slower than Java's:

https://forum.dlang.org/thread/mailman.148.1328778563.20196.digitalmars-d@puremagic.com?page=4

Is this the same problem in GoD?






June 14, 2020
On Tuesday, 26 May 2020 at 01:27:49 UTC, Mathias LANG wrote:
> On Monday, 25 May 2020 at 16:26:31 UTC, Jin wrote:
>> On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:
>>> On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:
>>>>
>>>> http://wiki.dlang.org/Go_to_D
>>>
>>> Any performance comparison with Go? esp. in real word scenario?
...
> This is a problem that's of interest to me as well, and I've been working on this for a few months (on and off).
> I had to eventually ditch `std.concurrency` because of some design decisions that made things hard to work with.
>
> `std.concurrency`'s MessageBox were originally designed to be only between threads. As such, they come with all the locking you'd expect from a cross-thread message-passing data structure.
...
> It's still a problem we're working on, as some issues are unique to D and we haven't found a good solution (e.g. requiring `shared` for same-thread Fiber communication is quite problematic). If we ever reach a satisfying solution I'll try upstreaming it.
...

Have you tried a lock-free queue?

https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(unbounded,_many_producer,_many_consumer)

Java uses the same algorithm for ConcurrentLinkedQueue (liblfds is a C implementation of it).

I tried some small examples with liblfds and got slightly better performance than Java. Maybe we don't want to reinvent wheels, especially well-tested ones.



June 14, 2020
On Sunday, 14 June 2020 at 17:10:14 UTC, mw wrote:
> Have you tried lock-free queue?
>
> https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(unbounded,_many_producer,_many_consumer)
>
> Java uses the same algorithm for ConcurrentLinkedQueue (in C implementation).
>
> I tried some small examples with liblfds, got slightly better performance than Java. Maybe we don’t want to reinvent the wheels, esp the well tested ones.

You can try it here:

https://github.com/mingwugmail/liblfdsd

Only https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(bounded,_many_producer,_many_consumer) is supported for now.

```
received 100000000 messages in 4632 msec sum=4999999950000000 speed=21588 msg/msec
```