Thread overview
StackThreads and Phobos
Jun 30, 2006
mclysenk
Jul 02, 2006
Bruno Medeiros
Jul 02, 2006
mclysenk
Jul 03, 2006
Bruno Medeiros
Jul 03, 2006
mclysenk
Jul 03, 2006
Bruno Medeiros
Jul 03, 2006
mclysenk
June 30, 2006
Hello fellow D users!  I've been working diligently on improving StackThreads (http://assertfalse.com), and so far D has worked wonderfully.  The same can not be said for Phobos...  (This may get a bit long winded.)

---

1.  removeRange is really slow.

In order to prevent the garbage collector from accidentally deleting valid objects, we need to mark the stack of each user-context.

One naive solution is to simply mark the entire stack region, however this isn't a very good idea.

Say I create a user context which calls a function and then yields.  If that function performed any allocations, then they would remain on the stack, and get scanned by the GC.  Until I overwrite those values, the GC will always see them, and therefore never collect their memory.

This results in memory leaks, and a dramatic loss of performance.  The solution is to dynamically resize the range used by the stack.  When we switch into a context, we remove the range, and before we leave, we add it back.  This nicely solves the problem, except for one thing - removeRange requires O(n) time.

This performance penalty applies to each Context, increasing the cost of switching dramatically.

A simple solution is to use a hash set or another efficient data structure. This would also be able to detect range address conflicts, such as adding a range twice.

As a short term fix, I propose changing line 987 of internal/gcx.d from:

memmove(ranges + i, ranges + i + 1, (nranges - i) * ranges[0].sizeof);

to:

memmove(ranges + i, ranges + nranges, ranges[0].sizeof);

While this does not remove the sluggish linear search, it does eliminate a very large memory copy.  The cost of this approach is that the ordering of ranges is not preserved, which may increase look up time in some circumstances.

Another possibility is to allow for 'dynamic' ranges, which the user would be able to resize.  In this situation, we could still use a linear search for removal, however changing the range's dimensions is a constant time operation. Thread safety could be insured via an atomic operation, ie CMPXCHG.


2.  It is impossible to safely handle a page fault.

Another problem with StackThreads right now is stack growth.  In a regular context, the operating system automatically grows the stack once the program hits the bottom.  It keeps on doing this until there is no more address space/memory left, and then it throws an overflow.

In StackThreads, I'd like to do the same thing.  However, DMD transforms any page faults into exceptions, which it then unwinds up the call stack.  This makes it impossible to trap, since the exception handler will unwind the stack until it hits a try block with a matching catch - and only then will it get handled.  From this state, there is no way to recover the program, since we cannot undo the unwind.  The only tenable option is a global fault and termination of the context.

In windows, you could try to install a custom exception handler, and trap any stack overflows yourself - except DMD automatically installs an SEH for each try..catch block.  It is pretty much inevitable that a hapless user will eat your page fault when he is trying to catch some other sort of exception.

Call backs are one solution, user programs can carefully handle page faults and stack overflows without disturbing the state of the program.


---

There are a few more issues, but these are the only two I have encountered that I couldn't hack around.

My personal view is that integrated user-context switching would fix most of those issues, and provide a great deal of flexibility.  Once the GC issues are resolved, it can be made very fast.  In fact, the only overhead (outside GC management) is equivalent to a single function call.

Standard user level contexts make all sorts of things possible, like coroutines or micro threads.  They can be used to iterate over complex data structures trivially.  They are much simpler and faster than threads for GUIs, and eliminate the need for many state machine objects.

In conjunction with standard preemptive threading, user-contexts provide an elegant solution to many practical programming problems.  Standard library integration will result in a much faster context switching and better future support.


-Mikola Lysenko


July 02, 2006
mclysenk@mtu.edu wrote:
> Hello fellow D users!  I've been working diligently on improving StackThreads
> (http://assertfalse.com), and so far D has worked wonderfully.  The same can not
> be said for Phobos...  (This may get a bit long winded.)
> 
> ---
> 
> 1.  removeRange is really slow.
> 
> In order to prevent the garbage collector from accidentally deleting valid
> objects, we need to mark the stack of each user-context.
> 
> One naive solution is to simply mark the entire stack region, however this isn't
> a very good idea.
> 
> Say I create a user context which calls a function and then yields.  If that
> function performed any allocations, then they would remain on the stack, and get
> scanned by the GC.  Until I overwrite those values, the GC will always see them,
> and therefore never collect their memory. 
> 
> This results in memory leaks, and a dramatic loss of performance.  The solution
> is to dynamically resize the range used by the stack.  When we switch into a
> context, we remove the range, and before we leave, we add it back.  This nicely
> solves the problem, except for one thing - removeRange requires O(n) time.
> 

But isn't that the correct behavior? If one can return to that function instance(context), shouldn't all allocations made previously by that function remain valid?
(note: I haven't used your library, I just read your post and this question came up)
Also: Where do you keep the data from each context? In the heap?


-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
July 02, 2006
In article <e88ltq$kvc$3@digitaldaemon.com>, Bruno Medeiros says...
>
>But isn't that the correct behavior? If one can return to that function
>instance(context), shouldn't all allocations made previously by that
>function remain valid?
>(note: I haven't used your library, I just read your post and this
>question came up)
>

It is not correct behavior.  Here's an example which may illustrate this better:

#void func1()
#{
#     func2();
#     yield();
#}
#
#void func2()
#{
#     int[] arr = new int[20];
#}

Here is a trace of the execution of func1 alongside it's stack

# Instruction              |      Stack
#--------------------------+--------------
#  func2()                 | SP> &func1 + offset
#                          |
#  int[] arr = new int[20] |     &func1 + offset
#                          | SP> &arr
#                          |
#  return from func2()     | SP> &func1 + offset
#                          |     &arr             <-- Old unused reference
#                          |
#  yield()                 |     &func1 + offset
#                          |     &arr             <-- Will NEVER get collected
#                          |

In this situation, it is doubtful that arr will ever get collected.  In order for the GC to remove it, it must not find ANY references to arr in memory. Period.  If the GC scans the entire stack, it will see a reference to arr, and therefore it will never collect arr.  This is a trivial example, but it is not difficult to see that these sorts of memory leaks can become very substantial and very difficult to track down.


The problem is how to tell the garbage collector not to scan past the stack pointer.  The only way at the moment is to use addRange and removeRange, or zero out all of the unused stack memory every time you yield.  Either option has a substantial overhead.

>
>Also: Where do you keep the data from each context? In the heap?
>
>

Heap.  The next version of stackthreads will replace malloc/free with system specific allocators, such as mmap and VirtualAlloc.

-Mikola Lysenko


July 03, 2006
mclysenk@mtu.edu wrote:
> In article <e88ltq$kvc$3@digitaldaemon.com>, Bruno Medeiros says...
>> But isn't that the correct behavior? If one can return to that function instance(context), shouldn't all allocations made previously by that function remain valid?
>> (note: I haven't used your library, I just read your post and this question came up)
>>
> 
> It is not correct behavior.  Here's an example which may illustrate this better:
> 
> #void func1()
> #{
> #     func2();
> #     yield();
> #}
> #
> #void func2()
> #{
> #     int[] arr = new int[20];
> #}
> 
> Here is a trace of the execution of func1 alongside it's stack
> 
> # Instruction              |      Stack
> #--------------------------+--------------
> #  func2()                 | SP> &func1 + offset
> #                          |
> #  int[] arr = new int[20] |     &func1 + offset
> #                          | SP> &arr
> #                          |
> #  return from func2()     | SP> &func1 + offset
> #                          |     &arr             <-- Old unused reference
> #                          |
> #  yield()                 |     &func1 + offset
> #                          |     &arr             <-- Will NEVER get collected
> #                          |
> 
> In this situation, it is doubtful that arr will ever get collected.  In order
> for the GC to remove it, it must not find ANY references to arr in memory.
> Period.  If the GC scans the entire stack, it will see a reference to arr, and
> therefore it will never collect arr.  This is a trivial example, but it is not
> difficult to see that these sorts of memory leaks can become very substantial
> and very difficult to track down.
> 
> 
> The problem is how to tell the garbage collector not to scan past the stack
> pointer.  The only way at the moment is to use addRange and removeRange, or zero
> out all of the unused stack memory every time you yield.  Either option has a
> substantial overhead.
> 
>> Also: Where do you keep the data from each context? In the heap?
>>
>>
> 
> Heap.  The next version of stackthreads will replace malloc/free with system
> specific allocators, such as mmap and VirtualAlloc.
> 
> -Mikola Lysenko
> 
> 

Unless my understanding of this is wrong, the use of the word "never" (gets collected) is too strong. If during normal program flow the stack is overwritten again, the reference becomes free. Or am I missing something?
Also, doesn't this happen with any regular function?


-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
July 03, 2006
In article <e8artd$16e0$1@digitaldaemon.com>, Bruno Medeiros says...
>
>Unless my understanding of this is wrong, the use of the word "never"
>(gets collected) is too strong. If during normal program flow the stack
>is overwritten again, the reference becomes free. Or am I missing something?
>Also, doesn't this happen with any regular function?
>

Well, you are correct, that it may still get collected in the future - regardless, its prospects are fairly dim.  You are also right in your statement that this is not abnormal behavior for functions, and the GC knows this.  It has a special case which handles the stack of a regular function.

When the GC does a full collect, it pauses all threads and marks the range from the stack pointer to the bottom of the stack.  This prevents it from scanning stuff which is no longer valid, since that range never gets marked.  This works very well.

The bad part happens when we create multiple stacks for one thread.  In order for the GC to properly scan them, we need to mark their stack when it is not in use.  If the GC didn't scan their stack, then we'd accidentally over-collect stuff they were using.  If we mark the entire stack range, then we'd under-collect and get a bunch of memory leaks.  To fix this, we use addRange to mark the active part of their stack before we switch out their context, and removeRange to delete their old - and potentially invalid - stack range when we switch into their context.

This part is tricky, and it burned me many times while I was working on stack threads before I finally caught on.  It is not a big deal to fix once you know what's happening.  The main disadvantage at the moment is removeRange - it's just too slow!  It can be fixed, but it requires changing phobos.

-Mikola Lysenko


July 03, 2006
mclysenk@mtu.edu wrote:
> In article <e8artd$16e0$1@digitaldaemon.com>, Bruno Medeiros says...
>> Unless my understanding of this is wrong, the use of the word "never" (gets collected) is too strong. If during normal program flow the stack is overwritten again, the reference becomes free. Or am I missing something?
>> Also, doesn't this happen with any regular function?
>>
> 
> Well, you are correct, that it may still get collected in the future -
> regardless, its prospects are fairly dim.  You are also right in your statement
> that this is not abnormal behavior for functions, and the GC knows this.  It has
> a special case which handles the stack of a regular function.
> 
> When the GC does a full collect, it pauses all threads and marks the range from
> the stack pointer to the bottom of the stack.  This prevents it from scanning
> stuff which is no longer valid, since that range never gets marked.  This works
> very well.
> 
> The bad part happens when we create multiple stacks for one thread.  In order
> for the GC to properly scan them, we need to mark their stack when it is not in
> use.  If the GC didn't scan their stack, then we'd accidentally over-collect
> stuff they were using.  If we mark the entire stack range, then we'd
> under-collect and get a bunch of memory leaks.  To fix this, we use addRange to
> mark the active part of their stack before we switch out their context, and
> removeRange to delete their old - and potentially invalid - stack range when we
> switch into their context.
> 
> This part is tricky, and it burned me many times while I was working on stack
> threads before I finally caught on.  It is not a big deal to fix once you know
> what's happening.  The main disadvantage at the moment is removeRange - it's
> just too slow!  It can be fixed, but it requires changing phobos.  
> 
> -Mikola Lysenko
> 
> 

That was clear now, but for what I understand, then your problem is simply that you must use removeRange(and it is slow).
That is, it doesn't really have to do with function contexts and stacks. Anyone who allocates manually managed memory for any purpose, and then uses addRange on it, will have the same problem, right? (because they too will have to use removeRange)

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
July 03, 2006
In article <e8baer$1ond$1@digitaldaemon.com>, Bruno Medeiros says...
>
>
>That was clear now, but for what I understand, then your problem is
>simply that you must use removeRange(and it is slow).
>That is, it doesn't really have to do with function contexts and stacks.
>Anyone who allocates manually managed memory for any purpose, and then
>uses addRange on it, will have the same problem, right? (because they
>too will have to use removeRange)
>


Correct.  This problem will affect everyone who uses addRange/removeRange.

-Mikola Lysenko