April 24, 2018
On Tuesday, 24 April 2018 at 13:36:48 UTC, Steven Schveighoffer wrote:
> On 4/24/18 5:11 AM, bauss wrote:
>> On Tuesday, 24 April 2018 at 07:58:01 UTC, Radu wrote:
>>> On Tuesday, 24 April 2018 at 00:46:39 UTC, Byron Heads wrote:
>>>>
>>>> Fibers on Win32 have a memory leak for sure:
>>>>
>>>> import core.thread : Fiber;
>>>>
>>>> void main() {
>>>>
>>>>     foreach(ulong i; 0..99_999) {
>>>>         auto foo = new Foo();
>>>>         foo.call();
>>>>         foo.call();
>>>>     }
>>>> }
>>>>
>>>>
>>>> class Foo : Fiber {
>>>>     this() {
>>>>         super(&run);
>>>>     }
>>>>
>>>>
>>>>     void run() {
>>>>         Fiber.yield();
>>>>     }
>>>> }
>>>>
>>>>
>>>> Running this with -m64 on Windows works without a problem, but with -m32 it fails with a "Memory allocation failed" error.
>>>
>>> This is not a fiber issue but more of a memory management issue. You run out of address space on win32; the GC will not always collect all those 99999 fibers that you allocate in that loop. As an exercise, replace `auto` with `scope`, as in `scope foo = new Foo();`, in that loop. You should see different results.
>
> This shouldn't be a requirement, the 32-bit GC is generally not this bad.
>
>>>
>>> The issue boils down to the call to VirtualAlloc that the fiber constructor makes, which fails because Windows will not allocate any more virtual pages for the process. If you really want that many fibers you should reconsider 32-bit, and definitely use a different allocation strategy.
>> 
>> And at the end of the day it makes no sense to have that many fibers, as they would probably perform terribly.
>
> Let's not forget though, that he's executing the fiber completely within the loop. It should be done and collected.
>
> This is not the case of executing 100,000 concurrent fibers, but executing 100,000 *sequential* fibers. It should work just fine.
>
> -Steve

Correct: in a normal run of my system there may be 10-20 fibers alive at most. I was using only threads before, but found the system did not execute jobs in a balanced way. Using a few threads to process Fibers kept in a queue, however, balances the work out evenly. It is also easier to track down bugs by using just 1 thread to process the fiber pool.
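For readers unfamiliar with the pattern, the single-threaded variant (one thread resuming Fibers taken from a queue) can be sketched as below. This is a minimal illustration under stated assumptions, not the actual system: `runQueue` is a hypothetical name, and the queue is a plain dynamic array rather than whatever structure the real code uses.

```d
import core.thread : Fiber;

// Hypothetical single-threaded scheduler: jobs are fibers kept in a FIFO
// queue and resumed round-robin until every one of them has terminated.
size_t runQueue(Fiber[] queue)
{
    size_t resumes;
    while (queue.length > 0)
    {
        auto job = queue[0];
        queue = queue[1 .. $];

        job.call();          // run until the job yields or returns
        ++resumes;

        // A job that only yielded goes to the back of the queue.
        if (job.state != Fiber.State.TERM)
            queue ~= job;
    }
    return resumes;
}

void main()
{
    import std.stdio : writeln;

    // Two toy jobs: one yields twice, the other once.
    auto jobs = [
        new Fiber({ Fiber.yield(); Fiber.yield(); }),
        new Fiber({ Fiber.yield(); }),
    ];
    writeln("total resumes: ", runQueue(jobs));
}
```

Spreading the same queue across several threads would need synchronization around the queue, which this sketch deliberately leaves out.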

I would love to use DIP1000, allocators, and shared, but none of that stuff really works beyond trivial examples. (Allocators probably work fine, but there are forum posts about them changing and I don't want to refactor twice.)

I will start ignoring win32 when win64 doesn't require dealing with visual studio installs.
Also I have a feeling a client will ask for it.

April 24, 2018
On 4/24/18 10:16 AM, Radu wrote:
> On Tuesday, 24 April 2018 at 13:36:48 UTC, Steven Schveighoffer wrote:
>> On 4/24/18 5:11 AM, bauss wrote:
>>> On Tuesday, 24 April 2018 at 07:58:01 UTC, Radu wrote:
>>>> On Tuesday, 24 April 2018 at 00:46:39 UTC, Byron Heads wrote:
>>>>> [...]
>>>>
>>>> This is not a fiber issue but more of a memory management issue. You run out of address space on win32; the GC will not always collect all those 99999 fibers that you allocate in that loop. As an exercise, replace `auto` with `scope`, as in `scope foo = new Foo();`, in that loop. You should see different results.
>>
>> This shouldn't be a requirement, the 32-bit GC is generally not this bad.
>>
> 
> Allocating so many fibers in a loop produces an OOM error on win32, that's a fact! Even though it doesn't always happen, you often get OOM errors with the program above.

I'm not saying it doesn't happen, just that it *shouldn't* happen. At least not for small sized chunks like this.

I want to emphasize that the program is allocating and releasing to the GC 1 fiber at a time -- loop or no loop, this should work (more or less) reliably, or Win32 has some more serious issues.

This isn't even a case of multi-threading, there are no extra threads here.

> Probably the cause is related to how often the collection kicks in in relation to the pages allocated via VirtualAlloc. But still, the issue OP raised is not a Fiber related issue but a memory management issue.

Collections usually happen more often than they should. The biggest issue with 32-bit arch is that random stack data often keeps large blocks of memory from being collected. As those build up, the situation gets worse until you have no more memory left.

But smaller chunks like a 16k Fiber stack should work without issues.

It should be simple to prove: instead of allocating a fiber, allocate a similarly sized chunk of bytes.
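A minimal version of that experiment might look like the following. It is a sketch: the helper name `allocateChunks` is made up, and 32 KiB is an assumed stand-in for the fiber stack size. If this loop completes under -m32 while the Fiber version runs out of memory, that would point away from the GC's handling of small blocks.

```d
import std.stdio : writeln;

// Hypothetical experiment: the same loop shape as the Fiber test, but each
// iteration allocates a plain GC-managed byte array of about a fiber
// stack's size instead of a Fiber.
ulong allocateChunks(size_t count, size_t chunkSize)
{
    ulong total;
    foreach (i; 0 .. count)
    {
        // The chunk is unreachable after this iteration, so the GC may
        // reclaim it during any later collection.
        auto chunk = new ubyte[chunkSize];
        chunk[$ - 1] = cast(ubyte) i; // touch the block so it is really used
        total += chunk.length;
    }
    return total;
}

void main()
{
    // Assumption: 32 KiB roughly matches the Win32 fiber stack size.
    // Build with -m32 to compare against the Fiber version.
    writeln("allocated ", allocateChunks(99_999, 32 * 1024), " bytes in total");
}
```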

-Steve
April 24, 2018
On 4/24/18 10:31 AM, Byron Heads wrote:
> On Tuesday, 24 April 2018 at 13:36:48 UTC, Steven Schveighoffer wrote:
>> This is not the case of executing 100,000 concurrent fibers, but executing 100,000 *sequential* fibers. It should work just fine.
>>
> 
> Correct: in a normal run of my system there may be 10-20 fibers alive at most. I was using only threads before, but found the system did not execute jobs in a balanced way. Using a few threads to process Fibers kept in a queue, however, balances the work out evenly. It is also easier to track down bugs by using just 1 thread to process the fiber pool.
> 
> I would love to use DIP1000, allocators, and shared, but none of that stuff really works beyond trivial examples. (Allocators probably work fine, but there are forum posts about them changing and I don't want to refactor twice.)

stdx.allocator (https://github.com/dlang-community/stdx-allocator and http://code.dlang.org/packages/stdx-allocator) is "stable", so you can use that instead of the one inside Phobos. This way, even if Phobos introduces breaking changes, you can depend on a specific version of the allocators. DIP1000 is very much a work in progress; I'm unsure if or when it will become usable. Shared likely isn't going to get any better until the main players start focusing on it. Right now, I think they are more interested in memory safety.

> I will start ignoring win32 when win64 doesn't require dealing with visual studio installs.
> Also I have a feeling a client will ask for it.

Unfortunately I don't think the VS license will ever allow us to avoid installing VS as well.

My recommendation is just to ignore Win32. I wouldn't trust it at all. There are serious threading issues there, and the GC is prone to run out of memory if you aren't careful.

But I can understand if you can't go that route.

Another thing to try is -m32mscoff, which creates 32-bit binaries but links against Microsoft's runtime instead of the Digital Mars one. I know you've said that requirement is a problem for you, but it may help to determine whether it's really the Digital Mars library or something more inherent in the way Fibers or the GC work.

-Steve
April 24, 2018
On Tuesday, 24 April 2018 at 16:22:04 UTC, Steven Schveighoffer wrote:
> On 4/24/18 10:31 AM, Byron Heads wrote:
>> I will start ignoring win32 when win64 doesn't require dealing with visual studio installs.
>> Also I have a feeling a client will ask for it.
>
> Unfortunately I don't think the VS license will ever allow us to avoid installing VS as well.

DMD doesn't require VS anymore since v2.079.
April 24, 2018
On 4/24/18 12:49 PM, kinke wrote:
> On Tuesday, 24 April 2018 at 16:22:04 UTC, Steven Schveighoffer wrote:
>> On 4/24/18 10:31 AM, Byron Heads wrote:
>>> I will start ignoring win32 when win64 doesn't require dealing with visual studio installs.
>>> Also I have a feeling a client will ask for it.
>>
>> Unfortunately I don't think the VS license will ever allow us to avoid installing VS as well.
> 
> DMD doesn't require VS anymore since v2.079.

Oh? That's good news. What do you need to install instead? Or do we include the SDK library directly?

I had thought there were licensing issues, but glad to be wrong!

-Steve
April 25, 2018
On 25/04/2018 5:13 AM, Steven Schveighoffer wrote:
> On 4/24/18 12:49 PM, kinke wrote:
>> On Tuesday, 24 April 2018 at 16:22:04 UTC, Steven Schveighoffer wrote:
>>> On 4/24/18 10:31 AM, Byron Heads wrote:
>>>> I will start ignoring win32 when win64 doesn't require dealing with visual studio installs.
>>>> Also I have a feeling a client will ask for it.
>>>
>>> Unfortunately I don't think the VS license will ever allow us to avoid installing VS as well.
>>
>> DMD doesn't require VS anymore since v2.079.
> 
> Oh? That's good news. What do you need to install instead? Or do we include the SDK library directly?
> 
> I had thought there were licensing issues, but glad to be wrong!
> 
> -Steve

We are not providing it; it's coming straight from MS, so there are no licensing issues.
April 24, 2018
On Tuesday, 24 April 2018 at 16:05:48 UTC, Steven Schveighoffer wrote:
> On 4/24/18 10:16 AM, Radu wrote:
>> On Tuesday, 24 April 2018 at 13:36:48 UTC, Steven Schveighoffer wrote:
>>> On 4/24/18 5:11 AM, bauss wrote:
>>>> On Tuesday, 24 April 2018 at 07:58:01 UTC, Radu wrote:
>>>>> On Tuesday, 24 April 2018 at 00:46:39 UTC, Byron Heads wrote:
>>>>>> [...]
>>>>>
>>>>> This is not a fiber issue but more of a memory management issue. You run out of address space on win32; the GC will not always collect all those 99999 fibers that you allocate in that loop. As an exercise, replace `auto` with `scope`, as in `scope foo = new Foo();`, in that loop. You should see different results.
>>>
>>> This shouldn't be a requirement, the 32-bit GC is generally not this bad.
>>>
>> 
>> Allocating so many fibers in a loop produces an OOM error on win32, that's a fact! Even though it doesn't always happen, you often get OOM errors with the program above.
>
> I'm not saying it doesn't happen, just that it *shouldn't* happen. At least not for small sized chunks like this.
>
> I want to emphasize that the program is allocating and releasing to the GC 1 fiber at a time -- loop or no loop, this should work (more or less) reliably, or Win32 has some more serious issues.
>

Changing main to
---
void main(string[] args)
{
    import core.memory;
    foreach(ulong i; 0..99_999) {
        auto foo = new Foo();
        foo.call();
        foo.call();
        if (i % 10000) // <-- this (true unless i is a multiple of 10000, i.e. on almost every iteration)
            GC.collect();
    }
}
---

makes the OOM error go away.

April 24, 2018
On 4/24/18 3:45 PM, Radu wrote:
> On Tuesday, 24 April 2018 at 16:05:48 UTC, Steven Schveighoffer wrote:
>> On 4/24/18 10:16 AM, Radu wrote:
>>> On Tuesday, 24 April 2018 at 13:36:48 UTC, Steven Schveighoffer wrote:
>>>> On 4/24/18 5:11 AM, bauss wrote:
>>>>> On Tuesday, 24 April 2018 at 07:58:01 UTC, Radu wrote:
>>>>>> On Tuesday, 24 April 2018 at 00:46:39 UTC, Byron Heads wrote:
>>>>>>> [...]
>>>>>>
>>>>>> This is not a fiber issue but more of a memory management issue. You run out of address space on win32; the GC will not always collect all those 99999 fibers that you allocate in that loop. As an exercise, replace `auto` with `scope`, as in `scope foo = new Foo();`, in that loop. You should see different results.
>>>>
>>>> This shouldn't be a requirement, the 32-bit GC is generally not this bad.
>>>>
>>>
>>> Allocating so many fibers in a loop produces an OOM error on win32, that's a fact! Even though it doesn't always happen, you often get OOM errors with the program above.
>>
>> I'm not saying it doesn't happen, just that it *shouldn't* happen. At least not for small sized chunks like this.
>>
>> I want to emphasize that the program is allocating and releasing to the GC 1 fiber at a time -- loop or no loop, this should work (more or less) reliably, or Win32 has some more serious issues.
>>
> 
> Changing main to
> ---
> void main(string[] args)
> {
>      import core.memory;
>      foreach(ulong i; 0..99_999) {
>          auto foo = new Foo();
>          foo.call();
>          foo.call();
>          if (i % 10000) // <-- this (true unless i is a multiple of 10000, i.e. on almost every iteration)
>              GC.collect();
>      }
> }
> ---
> 
> makes the OOM error go away.

This made it click for me -- VirtualAlloc does NOT allocate from the GC. I had mistakenly thought the GC was being used to allocate the stack. This means there is little to no pressure on the GC to run a collection even though all the memory is being consumed. In other words, the runtime has plenty of space to allocate the Fiber class (which is likely about 64 or 128 bytes per instance), and is consuming all the memory via VirtualAlloc.
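If that is the cause, one workaround is to stop relying on the GC to release the stacks at all. As far as I can tell from druntime's core.thread, the Fiber destructor frees the stack, so calling `destroy` on a fiber that has terminated releases its VirtualAlloc'd memory immediately. This is a sketch that has not been tested on Win32 here; `Foo` is the class from the original post.

```d
import core.thread : Fiber;

class Foo : Fiber
{
    this() { super(&run); }

    void run() { Fiber.yield(); }
}

void main()
{
    foreach (ulong i; 0 .. 99_999)
    {
        auto foo = new Foo();
        foo.call(); // runs up to the yield
        foo.call(); // resumes; run() returns and the fiber terminates

        // The stack came from VirtualAlloc (mmap elsewhere), not the GC,
        // so release it deterministically instead of waiting for a
        // collection. Only safe because the fiber has terminated.
        assert(foo.state == Fiber.State.TERM);
        destroy(foo);
    }
}
```

An alternative with a similar effect is to keep a single fiber and call `reset()` on it after it terminates, so only one stack is ever allocated.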

I also noticed that the default Fiber stack size on Windows is 32k, not 16k as it is on other systems, so 100,000 stacks in that case is 3.2GB. That's too much for sure.
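The arithmetic, as a quick compile-time check. It assumes 32 KiB per stack, assumes none of the stacks is ever freed, and ignores guard pages and VirtualAlloc's allocation granularity (both of which only increase the address space used). The comparison point is the 2 GiB of user address space a 32-bit Windows process gets by default when it is not linked as large-address-aware.

```d
import std.stdio : writefln;

enum ulong stackBytes = 32 * 1024;   // assumed Win32 fiber stack size
enum ulong fiberCount = 100_000;
enum ulong totalBytes = stackBytes * fiberCount;

// Default user-mode address space of a 32-bit Windows process that is
// not large-address-aware.
enum ulong userSpace = 2UL * 1024 * 1024 * 1024;

static assert(totalBytes == 3_276_800_000);
static assert(totalBytes > userSpace);

void main()
{
    writefln("%s bytes (%.2f GiB) of stacks vs %s bytes of address space",
             totalBytes, totalBytes / (1024.0 ^^ 3), userSpace);
}
```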

I'll file an issue. We may not be able to solve the problem, but it's something we should try and solve.

Thanks

-Steve
April 24, 2018
On 4/24/18 4:30 PM, Steven Schveighoffer wrote:

> I'll file an issue. We may not be able to solve the problem, but it's something we should try and solve.

Seems there's already a similar issue in there: https://issues.dlang.org/show_bug.cgi?id=3523

-Steve
April 25, 2018
On Friday, 20 April 2018 at 18:58:36 UTC, Byron Moxie wrote:
> [...]
> In WIN32 it looks like it's leaking memory

Unless there is something I'm misunderstanding, it seems that Fibers that were not run to completion won't unroll their stack, which would mean that some destructors wouldn't be called, and possibly, some memory wouldn't be freed:

https://github.com/dlang/druntime/blob/86cd40a036a67d9b1bff6c14e91cba1e5557b119/src/core/thread.d#L4142

Could this have something to do with the problem?
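One way to check the first part of that claim is a small experiment like the one below (a sketch; `Guard` and `guardDestroyed` are made-up names). A struct with a destructor lives on the fiber's stack across a `yield`. If the fiber is resumed to completion, the destructor runs; if it is abandoned while suspended and nothing unwinds its stack, the destructor should never run.

```d
import core.thread : Fiber;

// Hypothetical RAII guard that records whether its destructor ran.
struct Guard
{
    bool* ran;
    ~this() { if (ran !is null) *ran = true; }
}

// Starts a fiber that creates a Guard and then yields, optionally resuming
// it to completion. Returns whether the Guard's destructor has run.
bool guardDestroyed(bool finishFiber)
{
    bool ran = false;
    auto f = new Fiber({
        auto g = Guard(&ran);
        Fiber.yield();      // suspend with g still alive on the fiber stack
    });
    f.call();               // run up to the yield
    if (finishFiber)
        f.call();           // resume: the scope exits and ~this runs
    return ran;
}

void main()
{
    import std.stdio : writeln;

    writeln("finished fiber:  destructor ran = ", guardDestroyed(true));
    writeln("abandoned fiber: destructor ran = ", guardDestroyed(false));
}
```

Note that this only shows destructors being skipped; in the test program from this thread the fibers do run to completion, so it would not by itself explain that OOM.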