April 20, 2012
On Thursday, 19 April 2012 at 00:07:45 UTC, Sean Kelly wrote:
> On Apr 18, 2012, at 4:06 PM, Andrew Lauritzen wrote:
>
>> I'm still interested in if anyone has any suggested workarounds or experience using Win32 fibers in D2 as well.
>
> The 32-bit Windows code should be pretty well tested. If this is using the x64 code, though, that's all quite new. I'll give this a try when I find some time, but I can't suggest a workaround offhand. It almost sounds alignment-related, which could be tricky.

I've been following D for a while now, and having fibers right in the standard library is a huge draw for me. I'm not an expert on them, but on the topic of x64 fibers, I have some exposure from trying to contribute x64 Windows support to bsnes, which uses its own home-grown fiber/coroutine system.

Out of curiosity I took a look at the D fiber context code and noticed that the x64 Windows version doesn't seem to save registers XMM6-XMM15 (unless I missed it), which is something I forgot to do as well. MSDN documents them as nonvolatile (callee-saved) under the x64 calling convention, so a context switch that doesn't preserve them could corrupt floating-point/SIMD state in FP-heavy code on x64 Windows.

Not sure if I should file a bug for this, as I haven't tried an x64 Windows fiber in D yet to confirm that it's actually a problem.
April 20, 2012
"Jameson Ernst" <j.patrick.ernst@gmail.com> wrote in message news:qfsswbtsxlnxrsloxaco@forum.dlang.org...
>
> Out of curiosity I took a look at the D fiber context code, and noticed that the x64 windows version doesn't seem to save the XMM6-15 registers (unless I missed it), which is something I forgot to do also. MSDN indicates that they are nonvolatile, which could potentially cause problems for FP heavy code on x64 windows.
>
> Not sure if I should file a bug for this, as I haven't tried an x64 windows fiber in D yet to make sure it's actually a problem first.

May as well file it just so it doesn't get lost. If it turns out to be invalid, it can just be closed as "invalid".


April 20, 2012
On 4/19/2012 10:00 PM, Jameson Ernst wrote:
> On Thursday, 19 April 2012 at 00:07:45 UTC, Sean Kelly wrote:
>> On Apr 18, 2012, at 4:06 PM, Andrew Lauritzen wrote:
>>
>>> I'm still interested in if anyone has any suggested workarounds or
>>> experience using Win32 fibers in D2 as well.
>>
>> The 32-bit Windows code should be pretty well tested. If this is using
>> the x64 code though, that's all quite new. I'll give this a try when I
>> find some time, but can't suggest a workaround offhand. It almost
>> sounds alignment-related, which could be tricky.
>
> Been following D for a while now, and fibers right in the std lib are a
> huge draw for me. I'm not an expert on them, but on the topic of x64
> fibers, I have some exposure to them trying to contribute x64 windows
> support to bsnes, which uses its own home-grown fiber/coroutine system.
>
> Out of curiosity I took a look at the D fiber context code, and noticed
> that the x64 windows version doesn't seem to save the XMM6-15 registers
> (unless I missed it), which is something I forgot to do also. MSDN
> indicates that they are nonvolatile, which could potentially cause
> problems for FP heavy code on x64 windows.
>
> Not sure if I should file a bug for this, as I haven't tried an x64
> windows fiber in D yet to make sure it's actually a problem first.

Fibers seem like a last resort to me. They are fairly difficult to make bulletproof due to thread-local storage issues and a few other problems with context switches. Win7's scheduling and management of real threads is a lot better than in previous versions, and in x64 mode there is also the user-mode scheduling (UMS) system and the library built on top of it (ConcRT). Together these get you almost all of the benefits of fibers while still using 'real' threads, plus the added bonus of being able to switch to another thread when you stall on a page fault or block on a kernel object (something fibers can't do).
April 20, 2012
On Apr 19, 2012, at 6:50 PM, "Andrew Lauritzen" <andrew.lauritzen@gmail.com> wrote:

>> Gah, not this again. *sigh*
>> 
>> It is semi-well tested, in that I'm relying on it for async I/O in Thrift, and its test suite now passes on every Windows box I tested.
>> 
>> If you can spare the time, it would be great if you could download DMD 2.057 and check whether the test case also fails with it on your setup. The reason is that I changed the way the top of the SEH chain is set up for fibers in 2.058 to make it work on Windows Server 2008/2008 R2, and maybe I inadvertently screwed up the stack alignment/… (you have to reverse-engineer the gritty details of SEH as you go, and you never know whether you really »got it right« in terms of Microsoft's internal specs).
> 
> Yeah, I feel your pain on Windows SEH stuff and the lack of docs. I actually work with someone who used to work on Visual C++, so I'll see if I can get any info/links out of him on the subject that may be helpful :)
> 
> In the meantime, as per the discussion in the issue, the plot has thickened: putting breakpoints in or otherwise debugging the code itself apparently has side effects, which throws a big wrench into my analysis of what is going wrong. Suffice it to say, I'm not sure it's actually D's problem, although I'm still worried by the "Stack Overflow" message.
> 
> I'll grab 2.057 and give it a test though to see if the behavior is any different. Thanks!

Stack overflow?  Give the fiber a larger stack when you create it. The default is really rather small.
April 20, 2012
On Apr 19, 2012, at 9:37 PM, Sean Cavanaugh <WorksOnMyMachine@gmail.com> wrote:

> Fibers seem like a last resort to me. They are fairly difficult to make bulletproof due to the thread local storage issues and a few other problems with context switches. Win7 scheduling and management of real threads is a lot better than in previous versions, and in x64 mode there is also user mode scheduling (UMS) system and the library built on top of it (ConcRT), which gets you almost all of the benefits of fibers but you get to use 'real' threads, plus the added bonus of being able to switch to another thread when you stall on a page fault or block on a kernel object (something fibers can't do).

I've thought about giving fibers their own TLS so D could have "real" user space threads. It would allow us to make the thread count in apps substantially higher if everything were a fiber and std.concurrency receive, for example, performed a context switch instead of blocking.
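
To make the "receive performs a context switch" idea concrete, here is a minimal sketch written in Python rather than D. Generators stand in for fibers, and `Scheduler`, `Receive`, `spawn` and `send` are hypothetical names, not std.concurrency's API. Unlike D fibers, Python generators can only suspend from their top frame, and nothing here models per-fiber TLS.

```python
from collections import deque

class Receive:
    """Marker a task yields when it wants its next mailbox message."""

class Scheduler:
    """Hypothetical single-threaded scheduler; generators stand in for fibers."""

    def __init__(self):
        self.tasks = {}          # task id -> generator
        self.mailboxes = {}      # task id -> pending messages
        self.parked = set()      # ids of tasks suspended in a receive
        self.ready = deque()     # (task id, value to resume it with)
        self.next_id = 0

    def spawn(self, fn, *args):
        tid = self.next_id
        self.next_id += 1
        self.tasks[tid] = fn(self, tid, *args)
        self.mailboxes[tid] = deque()
        self.ready.append((tid, None))
        return tid

    def send(self, tid, msg):
        box = self.mailboxes[tid]
        box.append(msg)
        if tid in self.parked:           # wake a parked receiver by rescheduling it
            self.parked.discard(tid)
            self.ready.append((tid, box.popleft()))

    def run(self):
        while self.ready:
            tid, value = self.ready.popleft()
            try:
                request = self.tasks[tid].send(value)   # "context switch" into the task
            except StopIteration:
                del self.tasks[tid]
                continue
            box = self.mailboxes[tid]
            if isinstance(request, Receive) and not box:
                self.parked.add(tid)     # "block" without blocking the OS thread
            elif isinstance(request, Receive):
                self.ready.append((tid, box.popleft()))
            else:
                self.ready.append((tid, None))          # plain yield: round-robin

log = []

def ponger(sched, me):
    while True:
        msg = yield Receive()
        if msg is None:                  # shutdown request
            return
        sender, n = msg
        log.append(f"pong {n}")
        sched.send(sender, n + 1)

def pinger(sched, me, peer, rounds):
    n = 0
    for _ in range(rounds):
        log.append(f"ping {n}")
        sched.send(peer, (me, n))
        n = yield Receive()              # reads like a blocking receive
    sched.send(peer, None)

sched = Scheduler()
peer = sched.spawn(ponger)
sched.spawn(pinger, peer, 2)
sched.run()
print(log)  # ['ping 0', 'pong 0', 'ping 1', 'pong 1']
```

Because a waiting task is simply parked until `send` reschedules it, many such tasks can share one OS thread; the trade-off is that a task that never yields starves all the others.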
April 20, 2012
On Friday, 20 April 2012 at 04:37:32 UTC, Sean Cavanaugh wrote:
> Fibers seem like a last resort to me. They are fairly difficult to make bulletproof due to the thread local storage issues and a few other problems with context switches. Win7 scheduling and management of real threads is a lot better than in previous versions, and in x64 mode there is also user mode scheduling (UMS) system and the library built on top of it (ConcRT), which gets you almost all of the benefits of fibers but you get to use 'real' threads, plus the added bonus of being able to switch to another thread when you stall on a page fault or block on a kernel object (something fibers can't do).

To be precise, fibers in and of themselves aren't exactly what I want, but they are the best means of getting it that I've seen so far. They enable an efficient implementation of coroutines, which many languages either cannot express at all or can only express very poorly, by using the sledgehammer of a full-on kernel thread. A call stack is a very useful way to group logic, and being forced to go outside the language and ask the OS for another one is a shame.

Game logic is an area where this technique REALLY shines. Case in point: Unity3D. The entire engine is built around C#'s iterator methods, which it uses to implement coroutines that control entity logic. Hundreds or even thousands of them can be active with very little overhead compared to a full thread context switch. Unfortunately, they're hamstrung by not being true coroutines (you can only yield from the top frame).
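
The "top frame only" limitation is easy to demonstrate with Python generators, which are stackless in the same way C# iterators are. This is a Python sketch, not Unity's API; `wait_frames`, `enemy_ai_broken` and `enemy_ai` are hypothetical. A helper that wants to pause its caller cannot do so with an ordinary call: every frame between it and the scheduler has to delegate explicitly.

```python
def wait_frames(n):
    """Hypothetical helper that wants to pause its caller for n frames."""
    for _ in range(n):
        yield "waiting"

def enemy_ai_broken():
    yield "spotted player"
    wait_frames(2)               # only creates a generator object; nothing pauses
    yield "attack"

def enemy_ai():
    yield "spotted player"
    yield from wait_frames(2)    # the caller must delegate explicitly
    yield "attack"

print(list(enemy_ai_broken()))   # ['spotted player', 'attack']
print(list(enemy_ai()))          # ['spotted player', 'waiting', 'waiting', 'attack']
```

A stackful fiber such as D's `core.thread.Fiber` removes this restriction, since `Fiber.yield()` can be called from any depth of ordinary function calls running inside the fiber.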

This capability makes C# very compelling for game development, and we use it extensively at work on both the client and server side. D could really eat its lunch by making fibers first-class and portable.
April 20, 2012
On Friday, 20 April 2012 at 05:08:13 UTC, Sean Kelly wrote:
>
> I've thought about giving fibers their own TLS so D could have "real" user space threads. It would allow us to make the thread count in apps substantially higher if everything were a fiber and std.concurrency receive, for example, performed a context switch instead of blocking.

Would this cause a noticeable performance hit? One of the most important things about fibers is that they are incredibly cheap. For example, in my web server I'd like to use a fiber for each request that has to wait on an asynchronous operation (e.g., a database call or file read), so that another request can be processed during the wait. If fibers had a noticeable performance cost (such as having to run per-thread static constructors to initialize things like TLS data), this would not work, and I'd have to resort to essentially reimplementing fibers.
April 20, 2012
> To be precise, fibers in and of themselves aren't exactly what I want, but are the best means to getting it that I've seen so far. They enable efficient implementation of coroutines, which many languages either cannot express at all, or can only express very poorly by using the sledgehammer of a full-on kernel thread to get it. A call stack is a very useful way to group logic, and being forced to go outside the language and ask the OS for another one is a shame.
>
> Game logic is an area where this technique REALLY shines. Case in point: Unity3D. The entire engine is built around C#'s iterator method facility to implement coroutines to control entity logic. Hundreds or even thousands of them can be active with very little overhead compared to a full thread context switch. Unfortunately, they're hamstrung by not being a true coroutine (you can only yield from the top frame).
>
Right, exactly the above! Fibers are totally uninteresting as a "lighter thread" or unit of scheduling for the reasons that you note, but coroutines are just a better way to write a lot of code. This is basically the entire premise for the Go programming language, so it's worth taking a peek at that if you haven't already.

In my case, the fibers are effectively used to save and restore connection-specific state for different clients. In the equivalent C code, it's a gigantic mess of a state machine specified by big switch statements, and it's almost impossible to follow the intended flow. With coroutines it's quite simple: you write the code in the most natural way, as if your sockets were blocking, and where you would otherwise block, you yield the fiber instead and come back to it when more socket data is available. Go takes this a step further by embedding the yields (and more) into both the standard library and the language itself. It's quite a powerful programming model for certain types of work.
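
As a rough illustration of the difference, here is a parser for one connection written as straight-line code that "blocks" with a bare `yield` whenever it needs more bytes. It is a Python sketch under assumptions: the wire format (an ASCII length, a newline, then the payload) and the `read_messages`/`feed` helpers are hypothetical, and a real server would resume the parser from its socket readiness loop rather than from a list of chunks.

```python
def read_messages(out):
    """Parse length-prefixed messages: an ASCII length, a newline, then the payload.

    Written as if reads blocked: each bare `yield` suspends the parser
    until the caller feeds in the next chunk of socket data."""
    buf = b""
    while True:
        while b"\n" not in buf:          # wait for a complete header line
            chunk = yield
            buf += chunk
        header, buf = buf.split(b"\n", 1)
        length = int(header)
        while len(buf) < length:         # wait for the whole payload
            chunk = yield
            buf += chunk
        out.append(buf[:length])
        buf = buf[length:]

def feed(chunks):
    """Drive one connection's parser with chunks as they 'arrive'."""
    out = []
    parser = read_messages(out)
    next(parser)                         # run up to the first wait
    for chunk in chunks:
        parser.send(chunk)               # resume exactly where it left off
    return out

# Message boundaries deliberately fall in the middle of chunks.
print(feed([b"5\nhel", b"lo3", b"\nabc2\nok"]))  # [b'hello', b'abc', b'ok']
```

The equivalent hand-written state machine would have to record explicitly which of the two waits it was in and how much of the buffer it had already consumed; here that state lives in the suspended generator's locals and instruction pointer.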

When I started running into fiber issues I tried to use threads instead (just to avoid the issues; I have no need of the parallelism or other utilities that they provide in this application) but I ran into a bunch of problems.

First, D's insistence on message passing, while noble and respectable, is not a perfect fit for cases where I have no need or desire to synchronize large portions of the data structures. In general, this sort of coarse-grained opportunistic parallelism extracts very few additional useful cycles out of multicore hardware compared to a more targeted fine-grained approach. As you note, there's a reason that a lot of the scheduling is starting to move to user-space "tasks".

(As an aside, it would be awesome to see a Cilk-like work-stealing implementation in D. That's by far the easiest first step to really extracting parallelism out of programs, and you can often get most of the benefit with that alone. It's yet another elegant way to use the call stack for expressing and exploiting parallelism and dependencies.)

Second, and more importantly, there doesn't seem to be a clean way to wait on multiple things in D right now. For instance, I want to suspend a thread/fiber until there is either a socket state change *or* a thread message. This can be a problem with fibers too, but less so, since the fiber itself can basically just take over whatever work needs to be done rather than trying to wake other threads. There doesn't seem to be a clean way to do this in D currently other than waking threads and basically polling multiple things, which I'm sure you can agree is not ideal.
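
One common way to wait on a socket *or* a thread message with a single blocking call is the self-pipe trick: whoever posts a message also writes a byte to a socket that the waiter is watching, so one `select` covers both. A sketch in Python, assuming a `selectors`-based wait; `Mailbox` and `wait_any` are hypothetical helpers, not part of D or any library.

```python
import queue
import selectors
import socket
import threading

class Mailbox:
    """Hypothetical thread-safe mailbox that a selector can wait on.

    Each put() also writes a byte to one end of a socket pair, so the
    other end becomes readable (the 'self-pipe' trick)."""

    def __init__(self):
        self._q = queue.Queue()
        self._r, self._w = socket.socketpair()
        self._r.setblocking(False)

    def fileno(self):                    # lets selectors watch the mailbox
        return self._r.fileno()

    def put(self, msg):
        self._q.put(msg)
        self._w.send(b"\0")              # wake any selector watching us

    def drain(self):
        try:                             # clear wake-up bytes first...
            while self._r.recv(4096):
                pass
        except BlockingIOError:
            pass
        msgs = []                        # ...then take every queued message
        while True:
            try:
                msgs.append(self._q.get_nowait())
            except queue.Empty:
                return msgs

def wait_any(sock, mailbox, timeout):
    """Sleep until the socket is readable and/or a message arrives."""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ, "socket")
        sel.register(mailbox, selectors.EVENT_READ, "message")
        return sorted(key.data for key, _ in sel.select(timeout))

a, b = socket.socketpair()
box = Mailbox()
threading.Timer(0.05, box.put, args=("hello",)).start()
print(wait_any(a, box, timeout=2.0))     # ['message']
print(box.drain())                       # ['hello']
b.send(b"data")
print(wait_any(a, box, timeout=2.0))     # ['socket']
```

The wake-up bytes are drained before the queue so that a message posted concurrently is never left in the queue without a pending wake-up.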

> Stack overflow?  Give the fiber a larger stack when you create it. The default is really rather small.

I'm fairly certain that it's not a "real" stack overflow... the program continues to operate normally unless the debugger is stepping through, and it only happens when an exception is thrown. And it happens pretty much every time an exception is thrown; you just won't see it unless you have a debugger attached to see the output. So, like I said, it is somewhat worrisome, but the program seems to be running properly despite it, so it may be a red herring.
April 20, 2012
On Apr 19, 2012, at 10:55 PM, Kapps wrote:

> On Friday, 20 April 2012 at 05:08:13 UTC, Sean Kelly wrote:
>> 
>> I've thought about giving fibers their own TLS so D could have "real" user space threads. It would allow us to make the thread count in apps substantially higher if everything were a fiber and std.concurrency receive, for example, performed a context switch instead of blocking.
> 
> Would this cause a noticeable performance hit? One of the most important things is that fibers are incredibly cheap. For example, in my web server I'd like to implement being able to use a fiber for each request that has to wait on an asynchronous operation (aka, a database call or file read) to allow another request to be processed during the wait. If fibers had a noticeable performance hit (such as if they had to run per-thread static constructors to initialize things like TLS data), this would not work and I'd have to resort to essentially reimplementing fibers.

There wouldn't be much of a performance hit, mostly an additional allocation and a bitcopy when creating a Fiber. It's more that making this work on platforms with built-in TLS could be quite tricky.
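
For a feel of what per-fiber local storage captured at creation looks like, Python's `contextvars` module offers an analogous mechanism: a task snapshots the current context once when it is created and runs every resume inside that snapshot (asyncio tasks work this way). This is only an analogy, not how druntime would implement it: Python's `copy_context()` is a cheap persistent-map copy rather than a bitcopy of a TLS block, and `Task`, `worker` and `current_user` are hypothetical names.

```python
import contextvars

current_user = contextvars.ContextVar("current_user", default="nobody")

class Task:
    """Hypothetical fiber stand-in: a generator plus its own copy of the
    context-local variables, captured once when the task is created."""

    def __init__(self, gen):
        self.gen = gen
        self.ctx = contextvars.copy_context()   # snapshot taken at creation

    def step(self):
        # Every resume runs inside this task's own context, so set()/get()
        # see task-local values even though all tasks share one OS thread.
        return self.ctx.run(next, self.gen)

def worker(name):
    current_user.set(name)
    yield current_user.get()
    yield current_user.get()

alice_task, bob_task = Task(worker("alice")), Task(worker("bob"))
trace = [alice_task.step(), bob_task.step(), alice_task.step(), bob_task.step()]
print(trace)               # ['alice', 'bob', 'alice', 'bob']
print(current_user.get())  # nobody (the scheduler's own context is untouched)
```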
April 20, 2012
On Friday, 20 April 2012 at 17:49:52 UTC, Sean Kelly wrote:
> There wouldn't be much of a performance hit, mostly an additional allocation and a bitcopy when creating a Fiber. It's more that making this work on platforms with built-in TLS could be quite tricky.
Note that this would somewhat sabotage their usefulness as coroutines, depending on how it was implemented. That's not to say the idea isn't good (but I'd frame it more like "tasks"; see Thread Building Blocks or similar), but fibers/coroutines as they stand now are useful so I'd hate to see that capability lost.