July 28, 2021

On Wednesday, 21 July 2021 at 22:51:38 UTC, hanabi1224 wrote:

> Hi, I'm new to D lang and encounter some performance issues with fiber, not sure if there's something obviously wrong with my code.

I took a quick look, and the first problem I saw was that you were using spawnLinked but not replacing the scheduler.
std.concurrency uses a global scheduler variable to do its job.
Hence doing:

- auto scheduler = new FiberScheduler();
+ scheduler = new FiberScheduler();

Will ensure that spawnLinked works as expected.
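
For illustration, here is a minimal sketch of the same idea using plain `spawn`. The `doubler` worker and the `roundTrip` helper are made-up names, not from the original code; the point is only that assigning to the global `scheduler` (instead of declaring a local with `auto`) is what makes `spawn` create fibers rather than threads.

```d
import std.concurrency : FiberScheduler, ownerTid, receiveOnly, scheduler, send, spawn;
import std.stdio : writeln;

// Hypothetical worker for illustration: receives one int, replies with its double.
void doubler()
{
    immutable n = receiveOnly!int();
    ownerTid.send(n * 2);
}

// Hypothetical helper: sends `value` to a spawned worker and returns the reply.
int roundTrip(int value)
{
    int result;
    // Assign the module-level variable from std.concurrency.
    // Writing `auto scheduler = ...` would only create a local that spawn never sees.
    scheduler = new FiberScheduler();
    scheduler.start({
        auto tid = spawn(&doubler); // runs as a fiber because `scheduler` is set
        tid.send(value);
        result = receiveOnly!int();
    });
    return result;
}

void main()
{
    writeln(roundTrip(21));
}
```

Everything that spawns or receives has to happen inside the delegate passed to `start`, since that is where the scheduler's dispatch loop runs.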
There are a few other things to consider, w.r.t. fibers:

  • Our fibers get a 4-page stack (on Linux) by default;
  • We have an extra guard page because, being a native language, we can't use the same trick as Go to auto-grow the stack;
  • Spawning fibers is expensive, and other languages reuse fibers (yes, Go recycles them).

The FiberScheduler implementation is unfortunately pretty bad: it does not re-use fibers. I believe this is the core of the issue.
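
To make "re-use" concrete, here is a rough sketch of what recycling could look like with `core.thread.Fiber.reset`. The `FiberPool` class is hypothetical and not part of Phobos or druntime; it only handles tasks that run to completion without yielding, so it is not a drop-in replacement for what `FiberScheduler` does.

```d
import core.thread : Fiber;
import std.stdio : writeln;

// Hypothetical helper, not part of Phobos: keeps finished fibers around for reuse.
final class FiberPool
{
    private Fiber[] idle;
    size_t created; // how many fibers (and stacks) were actually allocated

    // Runs `task` on a recycled fiber if one is idle, otherwise on a new one.
    void run(void delegate() task)
    {
        Fiber f;
        if (idle.length)
        {
            f = idle[$ - 1];
            idle = idle[0 .. $ - 1];
            f.reset(task); // rebind the finished fiber; its stack is kept
        }
        else
        {
            f = new Fiber(task);
            ++created;
        }
        f.call();
        // Only fibers that ran to completion go back to the pool.
        // A task that yields is simply not recycled in this sketch.
        if (f.state == Fiber.State.TERM)
            idle ~= f;
    }
}

void main()
{
    auto pool = new FiberPool;
    int hits;
    foreach (i; 0 .. 1000)
        pool.run({ ++hits; });
    writeln(hits, " tasks ran on ", pool.created, " fiber(s)");
}
```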

July 28, 2021
28.07.2021 17:39, Mathias LANG wrote:
> On Wednesday, 21 July 2021 at 22:51:38 UTC, hanabi1224 wrote:
>> Hi, I'm new to D lang and encounter some performance issues with fiber, not sure if there's something obviously wrong with my code.
> 
> I took a quick look, and the first problem I saw was that you were using `spawnLinked` but not replacing the scheduler.
> `std.concurrency` uses a global `scheduler` variable to do its job.
> Hence doing:
> ```diff
> - auto scheduler = new FiberScheduler();
> + scheduler = new FiberScheduler();
> ```
> 
> Will ensure that `spawnLinked` works as expected.
> There are a few other things to consider, w.r.t. fibers:
> - Our Fibers are 4 pages (on Linux) by default;
> - We have an extra guard page, because we are a native language, so we can't do the same trick as Go to auto-grow the stack;
> - Spawning fibers *is* expensive and other languages reuse fibers (Yes, Go recycle them);
> 
> The `FiberScheduler` implementation is unfortunately pretty bad: it [does not re-use fibers](https://github.com/dlang/phobos/blob/b48cca57e8ad2dc56872499836bfa1e70e390abb/std/concurrency.d#L1578-L1599). I believe this is the core of the issue.

I profiled the provided example (not `FiberScheduler`) using perf. Both dmd and ldc2 gave the same result - `void filterInner(int, int)` took ~90% of the run time. The time was divided between:
	`int std.concurrency.receiveOnly!(int).receiveOnly()` - 58%
	`void std.concurrency.send!(int).send(std.concurrency.Tid, int)` - 31%

So most of the time is spent on message passing.

By comparison, creating the fibers took very little time. The perf output only contains information about `void std.concurrency.FiberScheduler.create(void delegate()).wrap()`, which took less than 0.5%. But I wouldn't say my profiling was ideal, so take it with a grain of salt.
July 28, 2021
On 7/28/21 1:15 AM, hanabi1224 wrote:

> On Wednesday, 28 July 2021 at 01:12:16 UTC, Denis Feklushkin wrote:
>> Spawning fiber is expensive
>
> Sorry but I cannot agree with the logic behind this statement, the whole
> point of using fiber is that, spawning system thread is expensive, thus
> ppl create lightweight thread 'fiber'

I assume the opposite because normally, the number of times a thread or fiber is spawned is nothing compared to the number of times they are context-switched. So, spawning can be expensive and nobody would realize as long as switching is cheap.

There are other reasons why fibers are faster than threads, all related to context switching:

- CPU cache efficiency

- Translation lookaside buffer (TLB) efficiency

- Holding on to the entirety of the time slice given by the OS
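
A quick way to see the spawn-versus-switch difference on your own machine is a sketch like the following. The helper names (`noop`, `spin`, `timeSpawns`, `timeSwitches`) are made up for illustration, it uses `core.thread.Fiber` directly rather than `std.concurrency`, and the numbers will vary by system and compiler flags, so treat it as a rough probe rather than a proper benchmark.

```d
import core.thread : Fiber;
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

size_t resumes; // thread-local counter, bumped each time `spin` is resumed

void noop() {}

void spin()
{
    while (true)
    {
        ++resumes;
        Fiber.yield();
    }
}

// Creates `count` fibers and runs each one to completion once.
Duration timeSpawns(size_t count)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. count)
    {
        auto f = new Fiber(&noop); // allocates a fresh stack every time
        f.call();
    }
    return sw.peek();
}

// Creates a single fiber and resumes it `count` times.
Duration timeSwitches(size_t count)
{
    auto f = new Fiber(&spin);
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. count)
        f.call(); // one switch in, one switch back out via Fiber.yield
    return sw.peek();
}

void main()
{
    enum n = 10_000;
    writefln("spawn + run %s fibers:    %s", n, timeSpawns(n));
    writefln("resume one fiber %s times: %s", n, timeSwitches(n));
}
```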

Ali

P.S. The little I know on these topics is included in this presentation:

  https://dconf.org/2016/talks/cehreli.html

July 28, 2021

On Wednesday, 28 July 2021 at 14:39:29 UTC, Mathias LANG wrote:

> Hence doing:
>
> - auto scheduler = new FiberScheduler();
> + scheduler = new FiberScheduler();

Thanks for pointing it out! Looks like I was benchmarking threads instead of fibers. I just made the change you suggested, but the result is very similar. That said, the fact that using system threads or fibers makes no obvious difference in this test case seems problematic in itself: fibers should be much faster than system threads here (as I have shown for many other languages with the same test case; I published the results here, but note that not all of them are implemented with stackful coroutines), unless there's some defect in D's current fiber implementation.

July 28, 2021

On Wednesday, 28 July 2021 at 16:31:49 UTC, Ali Çehreli wrote:

> I assume the opposite because normally, the number of times a thread or fiber is spawned is nothing compared to the number of times they are context-switched. So, spawning can be expensive and nobody would realize as long as switching is cheap.

You are right, but that's not my point. Whether fiber spawning is expensive should be judged relative to threads, and it should be much less expensive: people can expect to create many more fibers than system threads at the same time, even if they are stackful (that should mostly contribute to heap memory usage; fiber stack size should not be a perf bottleneck before running out of memory). And when analyzing perf issues with fibers, 'fibers are expensive' is not a valid reason to me, because fibers are themselves the solution to the expensiveness of threads, and none of the fiber implementations in other languages/runtimes has the same issue with the same test case.

July 28, 2021

On Wednesday, 28 July 2021 at 16:26:49 UTC, drug wrote:

> I profiled the provided example (not FiberScheduler) using perf. Both dmd and ldc2 gave the same result - void filterInner(int, int) took ~90% of the run time. The time was divided between:
> int std.concurrency.receiveOnly!(int).receiveOnly() - 58%
> void std.concurrency.send!(int).send(std.concurrency.Tid, int) - 31%
>
> So most of the time is messages passing.
>
> Between the fibers creating took very few time. Perf output contains information only of void std.concurrency.FiberScheduler.create(void delegate()).wrap() which took less than 0.5%. But I wouldn't say that I did the profiling ideally so take it with a grain of salt.

Very interesting findings! After making the fiber fix, I also profiled with valgrind. The result shows MessageBox-related stuff contributing ~13.7% of total cycles, while swapContext-related stuff adds up to a larger percentage (my rough estimate is >50%). I'd like to share the resulting SVG but did not figure out how to upload it here.

July 30, 2021
On Wed, Jul 28, 2021 at 11:41 PM hanabi1224 via Digitalmars-d-learn < digitalmars-d-learn@puremagic.com> wrote:

> On Wednesday, 28 July 2021 at 16:26:49 UTC, drug wrote:
> > I profiled the provided example (not `FiberScheduler`) using
> > perf. Both dmd and ldc2 gave the same result - `void
> > filterInner(int, int)` took ~90% of the run time. The time was
> > divided between:
> >       `int std.concurrency.receiveOnly!(int).receiveOnly()` - 58%
> >       `void std.concurrency.send!(int).send(std.concurrency.Tid,
> > int)` - 31%
> >
> > So most of the time is messages passing.
> >
> > Between the fibers creating took very few time. Perf output
> > contains information only of `void
> > std.concurrency.FiberScheduler.create(void delegate()).wrap()`
> > which took less than 0.5%. But I wouldn't say that I did the
> > profiling ideally so take it with a grain of salt.
>
> Very interesting findings! After making the Fiber fix, I also
> made profiling with valgrind, the result shows MessageBox related
> staff contributes to ~13.7% of total cycles, swapContex related
> staff add up to a larger percentage (My rough estimation is
>  >50%), I'd like to share the result svg but did not figure out
> how to upload here.
>

I have rewritten it to match the Dart version:

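import std;

void main(string[] args)
{
    // Number of primes to print; defaults to 5.
    auto n = args.length > 1 ? args[1].to!int() : 5;

    // Infinite source of candidates: 2, 3, 4, ...
    auto r = new Generator!int(
    {
        for (auto i = 2;; i++)
            yield(i);
    });

    for (auto i = 0; i < n; i++)
    {
        // The head of the current chain is always the next prime.
        auto prime = r.front;
        writeln(prime);
        // Wrap the chain in one more stage that drops multiples of `prime`.
        r = filter(r, prime);
    }
}

// Lazily passes through the values of `input` that `prime` does not divide.
Generator!int filter(Generator!int input, int prime)
{
    return new Generator!int(
    {
        while (!input.empty)
        {
            // Advance first: this also skips `prime` itself, the current front.
            input.popFront();
            auto i = input.front;
            if (i % prime != 0)
                yield(i);
        }
    });
}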


July 30, 2021

On Friday, 30 July 2021 at 14:41:06 UTC, Daniel Kozak wrote:

> I have rewrite it to be same as dart version

Thanks! There are both a generator version and a fiber version on the site (where possible), and the two are not really comparable to each other (generator solutions should be much faster). There's another Dart implementation with Isolate here; it's unlisted because of very bad performance. (Isolate is the closest thing in Dart to a thread or fiber, but it's much, much more expensive even to spawn.)

I'd like to list D's generator solution, but please note that it's only comparable to the Kotlin/C#/Python generator solutions, while the fiber one is still a separate issue.
