October 25, 2017
On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
> On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:
>> On 10/23/17 12:56 PM, Brian Schott wrote:
>> > Context: https://issues.dlang.org/show_bug.cgi?id=17914
>> >
> > I need to get this issue resolved as soon as possible so that the fix makes it into the next compiler release. Because it involves cleanup code in a class destructor, a design change may be necessary. Who should I contact to determine the best way to fix this bug?
>>
>> It appears that the limitation applies to mmap calls as well, and the mmap call to allocate the stack has been in Fiber since, as far as I can tell, the beginning. How has this not shown up before?
>
> Maybe there was a change in the OS(es) being used that affected the limit?
>
> - Jonathan M Davis

On Linux it's controllable via `sysctl vm.max_map_count`.
October 25, 2017
On Wednesday, 25 October 2017 at 13:26:26 UTC, Steven Schveighoffer wrote:
> On 10/23/17 12:56 PM, Brian Schott wrote:
>> Context: https://issues.dlang.org/show_bug.cgi?id=17914
>> 
>> I need to get this issue resolved as soon as possible so that the fix makes it into the next compiler release. Because it involves cleanup code in a class destructor, a design change may be necessary. Who should I contact to determine the best way to fix this bug?
>
> It appears that the limitation applies to mmap calls as well, and the mmap call to allocate the stack has been in Fiber since, as far as I can tell, the beginning. How has this not shown up before?

Although the number of mmap calls is the same after the stack overflow protection was added for fibers, _I think_ mprotect actually splits the single mmapped region into two (as the two parts have different protection bits). This effectively doubles the number of regions. As a workaround (if you have lots of fibers and don't care about stack overflow protection), you can just pass zero for the guardPageSize: https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4037
October 25, 2017
On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric wrote:
> On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
>> On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:
>>> [...]
>>
>> Maybe there was a change in the OS(es) being used that affected the limit?
>>
>> - Jonathan M Davis
>
> Yes, the stack is not immediately unmapped, because it's very common
> just to reset the fiber and reuse it for handling a new connection;
> creating new fibers (and doing unmap on termination) is a problem in
> real life (just as this one is).
>
> At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.

I'm not sure that would allow us to mprotect the guard pages, though.
October 25, 2017
On Wednesday, 25 October 2017 at 15:22:18 UTC, Nemanja Boric wrote:
> On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric wrote:
>> On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
>>> On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:
>>>> [...]
>>>
>>> Maybe there was a change in the OS(es) being used that affected the limit?
>>>
>>> - Jonathan M Davis
>>
>> Yes, the stack is not immediately unmapped, because it's very common
>> just to reset the fiber and reuse it for handling a new connection;
>> creating new fibers (and doing unmap on termination) is a problem in
>> real life (just as this one is).
>>
>> At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.
>
> I'm not sure that would allow us to mprotect the guard pages, though.

I think the easiest way to proceed from here is to default the stack protection page size to 0, so we avoid this regression. Once that's in, we can think about a different allocation strategy for the fibers' stacks, or about throwing an exception when there are too many fibers (too bad mmap doesn't return an error code; you get SIGABRT instead :( )

What do you think?
October 25, 2017
On 10/25/17 11:12 AM, Nemanja Boric wrote:
> On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
>> On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:
>>> On 10/23/17 12:56 PM, Brian Schott wrote:
>>> > Context: https://issues.dlang.org/show_bug.cgi?id=17914
>>> >
>>> > I need to get this issue resolved as soon as possible so that the fix makes it into the next compiler release. Because it involves cleanup code in a class destructor, a design change may be necessary. Who should I contact to determine the best way to fix this bug?
>>>
>>> It appears that the limitation applies to mmap calls as well, and the mmap call to allocate the stack has been in Fiber since, as far as I can tell, the beginning. How has this not shown up before?
>>
>> Maybe there was a change in the OS(es) being used that affected the limit?
>>
> 
> Yes, the stack is not immediately unmapped, because it's very common
> just to reset the fiber and reuse it for handling a new connection;
> creating new fibers (and doing unmap on termination) is a problem in
> real life (just as this one is).
> 
> At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.

Hm... the mprotect docs specifically state that calling mprotect on something that's not allocated via mmap is undefined. So if you use the GC to allocate Fiber stacks, you can't mprotect them.

I think what we need is a more configurable way to allocate stacks. There is a tradeoff between mprotect and simple allocation, and it's not obvious how to choose one over the other.

I'm still baffled as to why this is only showing up now. Perhaps if you use mmap as an allocator (as Fiber seems to be doing), it doesn't count towards the limit? Maybe it's just glommed into the standard allocator's space?

-Steve
October 25, 2017
On Wednesday, 25 October 2017 at 15:32:36 UTC, Steven Schveighoffer wrote:
> On 10/25/17 11:12 AM, Nemanja Boric wrote:
>> On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
>>> On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:
>>>> [...]
>>>
>>> Maybe there was a change in the OS(es) being used that affected the limit?
>>>
>> 
>> Yes, the stack is not immediately unmapped, because it's very common
>> just to reset the fiber and reuse it for handling a new connection;
>> creating new fibers (and doing unmap on termination) is a problem in
>> real life (just as this one is).
>> 
>> At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.
>
> Hm... the mprotect docs specifically state that calling mprotect on something that's not allocated via mmap is undefined. So if you use the GC to allocate Fiber stacks, you can't mprotect them.
>
> I think what we need is a more configurable way to allocate stacks. There is a tradeoff between mprotect and simple allocation, and it's not obvious how to choose one over the other.
>
> I'm still baffled as to why this is only showing up now. Perhaps if you use mmap as an allocator (as Fiber seems to be doing), it doesn't count towards the limit? Maybe it's just glommed into the standard allocator's space?
>
> -Steve

I'm sorry I wrote several messages in a row, as the thoughts were coming to me. I think the reason is that mprotect creates a new range, since it needs to have distinct protection attributes, hence doubling the number of mappings.

> Maybe it's just glommed into the standard allocator's space?


No, you can see each fiber's stack allocated separately when you cat /proc/<pid>/maps.
October 25, 2017
On 10/25/17 11:27 AM, Nemanja Boric wrote:
> On Wednesday, 25 October 2017 at 15:22:18 UTC, Nemanja Boric wrote:
>> On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric wrote:
>>> On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
>>>> On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:
>>>>> [...]
>>>>
>>>> Maybe there was a change in the OS(es) being used that affected the limit?
>>>>
>>>> - Jonathan M Davis
>>>
>>> Yes, the stack is not immediately unmapped, because it's very common
>>> just to reset the fiber and reuse it for handling a new connection;
>>> creating new fibers (and doing unmap on termination) is a problem in
>>> real life (just as this one is).
>>>
>>> At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.
>>
>> I'm not sure that would allow us to mprotect the guard pages, though.
> 
> I think the easiest way to proceed from here is to default the stack protection page size to 0, so we avoid this regression. Once that's in, we can think about a different allocation strategy for the fibers' stacks, or about throwing an exception when there are too many fibers (too bad mmap doesn't return an error code; you get SIGABRT instead :( )

mmap does return an error, and onOutOfMemoryError is called when it fails.

https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4518

Which should throw an error:

https://github.com/dlang/druntime/blob/144c9e6e9a3c00aba82b92da527a52190fe91c97/src/core/exception.d#L542

however, when mprotect fails, it calls abort():

https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4540

> What do you think?

I think we should reverse the mprotect default, and try to determine a better way to opt in to the limit.

Is this a Linux-specific problem? Are there issues with mprotect on other OSes? Or is Linux the only OS that supports mprotect?

-Steve
October 25, 2017
On Wednesday, 25 October 2017 at 15:43:12 UTC, Steven Schveighoffer wrote:
> On 10/25/17 11:27 AM, Nemanja Boric wrote:
>> On Wednesday, 25 October 2017 at 15:22:18 UTC, Nemanja Boric wrote:
>>> On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric wrote:
>>>> On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
>>>>> On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:
>>>>>> [...]
>>>>>
>>>>> Maybe there was a change in the OS(es) being used that affected the limit?
>>>>>
>>>>> - Jonathan M Davis
>>>>
>>>> Yes, the stack is not immediately unmapped, because it's very common
>>>> just to reset the fiber and reuse it for handling a new connection;
>>>> creating new fibers (and doing unmap on termination) is a problem in
>>>> real life (just as this one is).
>>>>
>>>> At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.
>>>
>>> I'm not sure that would allow us to mprotect the guard pages, though.
>> 
>> I think the easiest way to proceed from here is to default the stack protection page size to 0, so we avoid this regression. Once that's in, we can think about a different allocation strategy for the fibers' stacks, or about throwing an exception when there are too many fibers (too bad mmap doesn't return an error code; you get SIGABRT instead :( )
>
> mmap does return an error, and onOutOfMemoryError is called when it fails.
>
> https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4518
>
> Which should throw an error:
>
> https://github.com/dlang/druntime/blob/144c9e6e9a3c00aba82b92da527a52190fe91c97/src/core/exception.d#L542
>
> however, when mprotect fails, it calls abort():
>
> https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4540
>
>> What do you think?
>
> I think we should reverse the mprotect default, and try to determine a better way to opt in to the limit.
>
> Is this a Linux-specific problem? Are there issues with mprotect on other OSes? Or is Linux the only OS that supports mprotect?
>
> -Steve


Reading the FreeBSD man pages, it looks like at least FreeBSD has the same limitation for mmap, but its man page for mprotect doesn't mention it. However, I believe that's just an omission in the man pages, as it would really be the same situation as on Linux. Linux's man page for mprotect specifically says this:

> ENOMEM Changing the protection of a memory region would result in the
>        total number of mappings with distinct attributes (e.g., read
>        versus read/write protection) exceeding the allowed maximum.
>        (For example, making the protection of a range PROT_READ in
>        the middle of a region currently protected as
>        PROT_READ|PROT_WRITE would result in three mappings: two
>        read/write mappings at each end and a read-only mapping in the
>        middle.)

so I was right about doubling the mappings.

> I think we should reverse the mprotect default, and try to determine a better way to opt in to the limit.

I agree: https://github.com/dlang/druntime/pull/1956