concurrency call to arms
August 16, 2018
This is actually not about war; rather the peace and prosperity of people writing concurrent programs.

(Andrei, I hope you are reading and will check out
https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/ and
https://vorpus.org/blog/timeouts-and-cancellation-for-humans/)

Recently I've been working with Trio, which is a Python async concurrency library implementing the concepts described in the articles above.  A synopsis (Python):

    with open_task_container() as container:
        container.start_task(a)
        container.start_task(b)
        await sleep(1)
        container.start_task(c)
        # end of with block

    # program continues (tasks a, b, c must be completed)...

The point is that tasks started in the container's scope will not live past the scope.  Scope exit will block until all tasks are complete (normally or by cancellation).  If task b has an exception, all other tasks in the container are cancelled.
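
For readers who want to run something close to the synopsis, here is a minimal sketch built on Python's standard asyncio rather than Trio.  `TaskScope`, `worker`, `failing` and the two `demo_*` functions are hypothetical names made up for illustration, not Trio's API, and the sketch skips cases a real library handles (for example, tasks started while the scope is already exiting).

```python
import asyncio


class TaskScope:
    """Hypothetical stand-in for the task container above (not Trio's API)."""

    def __init__(self):
        self._tasks = []

    def start_task(self, coro_fn, *args):
        # Remember every child so that scope exit can wait for it.
        self._tasks.append(asyncio.create_task(coro_fn(*args)))

    async def _cancel_all(self):
        for task in self._tasks:
            task.cancel()
        # Wait until every child has really finished, collecting their errors.
        await asyncio.gather(*self._tasks, return_exceptions=True)

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # The body itself failed: cancel the children, let the error propagate.
            await self._cancel_all()
            return False
        try:
            # Block until all children are done; the first failure surfaces here.
            await asyncio.gather(*self._tasks)
        except BaseException:
            await self._cancel_all()
            raise
        return False


async def worker(log, name, delay):
    try:
        await asyncio.sleep(delay)
        log.append(name + " done")
    except asyncio.CancelledError:
        log.append(name + " cancelled")
        raise


async def failing(delay):
    await asyncio.sleep(delay)
    raise ValueError("task b failed")


async def demo_ok():
    log = []
    async with TaskScope() as scope:
        scope.start_task(worker, log, "a", 0.02)
        scope.start_task(worker, log, "b", 0.01)
        await asyncio.sleep(0.01)
        scope.start_task(worker, log, "c", 0.01)
    # Past this point a, b and c are guaranteed to be finished.
    return log


async def demo_failure():
    log = []
    try:
        async with TaskScope() as scope:
            scope.start_task(worker, log, "a", 1.0)
            scope.start_task(failing, 0.01)
    except ValueError as err:
        log.append("scope raised: " + str(err))
    return log
```

Running `asyncio.run(demo_failure())` shows the cancellation rule: the long-running task a is cancelled as soon as the other task fails, and the failure is re-raised from the `async with` block instead of being dropped.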

What this means is that task lifetimes can be readily understood by looking at the structure of a program.  They are tied to scoped blocks, honor nesting, etc.

Similar for control of timeouts and cancellation:

    with fail_after(10):  # raise exception if scope not completed in 10s
        reply = await request(a)
        do_something(reply)
        reply2 = await request(b)
        ...
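
A rough standard-library approximation of the same idea, assuming asyncio instead of Trio: `asyncio.wait_for` puts one deadline around a whole coroutine, so the scope is a function here rather than a `with` block.  `request`, `exchange` and `exchange_with_deadline` are hypothetical names, and the "requests" are only sleeps.

```python
import asyncio


async def request(name, delay):
    # Hypothetical request: just sleeps, then returns a canned reply.
    await asyncio.sleep(delay)
    return "reply from " + name


async def exchange(delay_a, delay_b):
    # Both requests share the single deadline imposed by the caller.
    reply = await request("a", delay_a)
    reply2 = await request("b", delay_b)
    return [reply, reply2]


async def exchange_with_deadline(delay_a, delay_b, seconds):
    try:
        # wait_for cancels exchange() at whichever await it is suspended in.
        return await asyncio.wait_for(exchange(delay_a, delay_b), timeout=seconds)
    except asyncio.TimeoutError:
        return "timed out"
```

Neither `request` nor `exchange` mentions the timeout; the caller imposes it from outside, which is the "no manual plumbing" property of `fail_after`.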

These are novel control structures for managing concurrency.  Combining this with cooperative multitasking and explicit, plainly-visible context switching (i.e. async/await-- sorry Olshansky) yields something truly at the forefront of concurrent programming.  I mean no callbacks, almost no locking, no explicitly maintained context and associated state machines, no task lifetime obscurity, no manual plumbing of cancellations, no errors dropped on the floor, no shutdown hiccups.  I'm able to write correct, robust, maintainable concurrent programs with almost no mental overhead beyond a non-concurrent program.

Some specimens (not written by me):
    #1:  the I/O portion of a robust HTTP 1.1 server implementation in about 200 lines of code.  https://github.com/python-hyper/h11/blob/33c5282340b61ddea0dc00a16b6582170d822d81/examples/trio-server.py
    #2: an implementation of the notoriously difficult "happy eyeballs" networking connection algorithm in about 150 lines of code.  https://github.com/python-trio/trio/blob/7d2e2603b972dc0adeaa3ded35cd6590527b5e66/trio/_highlevel_open_tcp_stream.py

I'd like to see a D library supporting these control structures (with possibly some async/await syntax for the coroutine case).  And of course for vibe.d and other I/O libraries to unify around this.

I'll go out on a limb and say if this could happen in addition to D addressing its GC dirty laundry, the language would actually be an unstoppable force.

Regards,
--John

August 16, 2018
On Thu, Aug 16, 2018 at 08:30:26PM +0000, John Belmonte via Digitalmars-d wrote: [...]
> (Andrei, I hope you are reading and will check out https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/ and https://vorpus.org/blog/timeouts-and-cancellation-for-humans/)

I read both articles, and am quite impressed by the revolutionary way of looking at concurrency.  It provides a clean(er) abstraction that can be reasoned about much more easily than currently prevalent models of concurrency.  Seems it would fit right in with D's message-based concurrency communication model.


[...]
> These are novel control structures for managing concurrency. Combining this with cooperative multitasking and explicit, plainly-visible context switching (i.e. async/await-- sorry Olshansky) yields something truly at the forefront of concurrent programming.  I mean no callbacks, almost no locking, no explicitly maintained context and associated state machines, no task lifetime obscurity, no manual plumbing of cancellations, no errors dropped on the floor, no shutdown hiccups.  I'm able to write correct, robust, maintainable concurrent programs with almost no mental overhead beyond a non-concurrent program.

Indeed.  It certainly seems like a promising step toward addressing the nasty minefield that is today's concurrent programming models.

However, it would seem to require language support, no?  It's going to be a tough sell to Walter & Andrei if it requires language support. (Though IMO it's worth it.)

One potential problem point is C/C++ interoperability.  Once you can call into the wild world of arbitrary C/C++ code that may spawn threads and do other low-level concurrent things, all bets are off and you can no longer provide the above guarantees.

But we might be able to work around this with a mechanism similar to @safe/@system/@trusted, to isolate potentially encapsulation-breaking code from code that's been vetted to not have unwanted concurrent side-effects.  Then within the realm of "well-behaved" code, we can reap the benefits of the new concurrency model without being crippled by the potential of interacting with "badly-behaved" code.


[...]
> I'd like to see a D library supporting these control structures (with possibly some async/await syntax for the coroutine case).  And of course for vibe.d and other I/O libraries to unify around this.

If this is possible to implement without requiring language support, that would be a major win, and would be much more likely to be acceptable to W & A.


> I'll go out on a limb and say if this could happen in addition to D addressing its GC dirty laundry, the language would actually be an unstoppable force.
[...]

Not sure what you're referring to by "GC dirty laundry", but IMO, the GC gets a lot of undeserved hate from GC-phobic folk coming from C/C++. While there's certainly room for improvement, I think the GC ought to be regarded as one of those modern features that are essential to D's success, not the illegitimate step-child that everyone tries to avoid. It liberates the programmer from being constantly bogged down with the nitty-gritties of low-level memory management issues, and frees the mind to focus on actually solving the problem domain the program is intended to address. Just like this new concurrency model liberates the programmer from constantly having to worry about data races, deadlocks, and all the other nice things traditional concurrency models entail. ;-)


T

-- 
For every argument for something, there is always an equal and opposite argument against it. Debates don't give answers, only wounded or inflated egos.
August 17, 2018
On Thursday, 16 August 2018 at 23:33:04 UTC, H. S. Teoh wrote:
> However, it would seem to require language support, no?  It's going to be a tough sell to Walter & Andrei if it requires language support. (Though IMO it's worth it.)

To implement scoped nursery and cancellation?  I hope it could be done with libraries given D's flexibility.  At the very least they could be prototyped with scope exits.

async/await might need syntax.  Part of async/await is just knowing what functions and call sites can context switch, so you can get that with decorators and clear library API.  But the other part is compiler help-- e.g. any function with await must be declared async, enforced by the compiler.  But I suspect a D library could do some compile time magic here too.

One point is that the new control structures are valid regardless of how threads are implemented:   OS threads, coroutines with implicit context switch, coroutines with explicit context switch, etc.  What seems by far the most promising is the last one since it further simplifies reasoning about concurrent programs.  And that's exactly what Trio + Python async/await provide.
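
As a small illustration of the OS-thread case, Python's standard `ThreadPoolExecutor` already behaves like a scoped join when used as a context manager: leaving the `with` block waits for everything submitted to the pool.  This is only a sketch of the lifetime rule, with hypothetical `work` and `run_scoped` functions; unlike a Trio nursery, the executor does not cancel sibling tasks or re-raise their exceptions at scope exit.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def work(name, delay, log):
    # Runs on an OS thread; list.append is safe across threads in CPython.
    time.sleep(delay)
    log.append(name)


def run_scoped():
    log = []
    with ThreadPoolExecutor(max_workers=3) as pool:
        pool.submit(work, "a", 0.02, log)
        pool.submit(work, "b", 0.01, log)
        pool.submit(work, "c", 0.03, log)
    # Leaving the with block calls shutdown(wait=True), so all three are done.
    return log
```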

> But we might be able to work around this with a mechanism similar to @safe/@system/@trusted, to isolate potentially encapsulation-breaking code from code that's been vetted to not have unwanted concurrent side-effects.  Then within the realm of "well-behaved" code, we can reap the benefits of the new concurrency model without being crippled by the potential of interacting with "badly-behaved" code.

It's an important point.  As argued in the first article, as soon as parts of the program are spawning tasks ad-hoc, the benefits break down.  Hence GOTO has been eliminated or neutered in most languages, using the author's analogy.  Similarly D can use the approach of @safe etc. so that we know what parts of the program will behave correctly on cancellation or exception.

> Not sure what you're referring to by "GC dirty laundry", but IMO, the GC gets a lot of undeserved hate from GC-phobic folk coming from C/C++.

I totally agree with the importance of a GC.  I'm referring to GC stop-the-world latency.  E.g. the Go language now has a concurrent GC with pauses of around 500 usec per GC, due to drop significantly further (https://blog.golang.org/ismmkeynote).

August 17, 2018
After reading the article I can say, it isn't any better than async and await for dependencies. You still need an event loop.

The problem is that joining that happens at the end of that block needs to run the event loop for iterations until it completes. Which is wonderful if you're not doing real time like game development.

In essence you want a stack of state per thread, which uses the central event loop:

func():
	with x:
		spawn(foo)
		join(foo)
		endScope()
funcParent():
	with x:
		spawn(func)
		join(func)
		endScope()

If you don't have this, you will miss timers, window events and all sorts of things that could be very time sensitive which would be very very bad.

Because we have an event loop, we don't need a nursery! It comes free of charge. It also means we don't need that with statement... hang on that now becomes await and async! Just without the await (auto added in scope(exit), and compiler can merge them into a single function call ;) ).
August 17, 2018
On Fri, Aug 17, 2018 at 06:36:36PM +1200, rikki cattermole via Digitalmars-d wrote:
> After reading the article I can say, it isn't any better than async and await for dependencies. You still need an event loop.
> 
> The problem is that joining that happens at the end of that block needs to run the event loop for iterations until it completes. Which is wonderful if you're not doing real time like game development.

I don't see the problem.

The event loop can spawn tasks into one main nursery, and each task can spawn subtasks into its own nurseries.  The nursery in the task blocks until all the subtasks have completed, *but* that does not preclude other tasks in the event loop's main nursery from running simultaneously, e.g., to handle events and timers.
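
A small asyncio sketch of that claim, with `asyncio.gather` standing in for the nursery's join and hypothetical `child`, `parent` and `ticker` coroutines: while the parent is blocked waiting for its children, the ticker (a stand-in for timers or window events) keeps being serviced by the same event loop.

```python
import asyncio


async def child(delay):
    await asyncio.sleep(delay)


async def parent(ticks, seen):
    # Blocks this task until both children finish, but not the event loop.
    await asyncio.gather(child(0.05), child(0.03))
    seen.append(len(ticks))  # how many timer ticks have happened so far


async def ticker(ticks, stop):
    # Stand-in for timers or window events handled elsewhere in the program.
    while not stop.is_set():
        ticks.append("tick")
        await asyncio.sleep(0.01)


async def main():
    ticks, seen = [], []
    stop = asyncio.Event()
    timer_task = asyncio.create_task(ticker(ticks, stop))
    await parent(ticks, seen)
    stop.set()
    await timer_task
    return seen[0]
```

`asyncio.run(main())` returns the number of ticks recorded by the time the parent's join completed, which is more than one because the ticker ran during the wait.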


> In essence you want a stack of state per thread, which uses the central event loop:
> 
> func():
> 	with x:
> 		spawn(foo)
> 		join(foo)
> 		endScope()
> funcParent():
> 	with x:
> 		spawn(func)
> 		join(func)
> 		endScope()
> 
> If you don't have this, you will miss timers, window events and all sorts of things that could be very time sensitive which would be very very bad.

Why will you miss timers and window events?  The event loop will spawn tasks into one nursery, while tasks spawn subtasks into their own nurseries. While the task nurseries are blocking to join subtasks, the event loop nursery continues running simultaneously.  It doesn't join until the event loop exits.

Please elaborate on why you think there's a problem here. I'm not seeing it.


> Because we have an event loop, we don't need a nursery! It comes free of charge. It also means we don't need that with statement... hang on that now becomes await and async! Just without the await (auto added in scope(exit), and compiler can merge them into a single function call ;) ).

I think you're missing the point here.  The point is to create an abstraction of concurrent processes that's easier to reason about. Saying "we have an event loop, we don't need a nursery" is akin to saying "we have unrestricted goto, we don't need structured blocks like functions, loops and else-blocks".  The point is not whether you can express the same things, but whether it's easier to reason about the resulting code.  Two abstractions may be equivalent in expressive power, but one may be extremely hard to reason about (unrestricted gotos, await / async, etc.), while the other may be much easier to reason about (structured code blocks, nurseries).

Abstractions that are hard to reason about tend to lead to buggy code, because the programmer has a hard time understanding the full implications of what he's writing.  That's why we prefer abstractions that are easier to reason about.  That's the point.


T

-- 
Perhaps the most widespread illusion is that if we were in power we would behave very differently from those who now hold it---when, in truth, in order to get power we would have to become very much like them. -- Unknown
August 18, 2018
On Friday, 17 August 2018 at 06:36:36 UTC, rikki cattermole wrote:
> Because we have an event loop, we don't need a nursery! It comes free of charge. It also means we don't need that with statement... hang on that now becomes await and async! Just without the await (auto added in scope(exit), and compiler can merge them into a single function call ;) ).

H. S. Teoh already made some fair points in response.  I'll just add that having task lifetimes defined by nested blocks mixed right into the normal program flow is powerful, and trying to emulate this with function scopes is not sufficient (return doesn't work, exception handling is a mess, etc.).

It's a subtle thing that you actually must try to appreciate.  If it were obvious we would have had this decades ago.

Regarding async / await, "auto" anything defeats the purpose:  they are primarily markers to help programmers reason about the code.

   value = await some_container.async_get(key)
   modified_value = transform(some_std_lib_func(value))
   # ... a bunch of non-await code that does more stuff with modified_value ...
   some_container.set(key, modified_value)
   await foo()

Essentially all code between await keywords can be considered atomic.  This greatly reduces the need for locking, as well as the cognitive load of reasoning about race conditions and access to shared state.  Contrast this with, say, Go's goroutines, where context switches can happen on any function call (https://golang.org/doc/go1.2#preemption).
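
A minimal asyncio sketch of the difference, using hypothetical `safe_increment`, `racy_increment` and `run_many` functions: a read-modify-write with no await in the middle needs no lock, while the same update with an await between the read and the write loses increments.

```python
import asyncio


async def safe_increment(state):
    # No await between the read and the write, so no other task can interleave.
    value = state["count"]
    state["count"] = value + 1
    await asyncio.sleep(0)  # explicit switch point, after the update is complete


async def racy_increment(state):
    value = state["count"]
    await asyncio.sleep(0)  # another task may run here and read the stale count
    state["count"] = value + 1


async def run_many(increment, n):
    state = {"count": 0}
    await asyncio.gather(*(increment(state) for _ in range(n)))
    return state["count"]
```

With `n = 100`, `run_many(safe_increment, 100)` ends at 100, whereas `run_many(racy_increment, 100)` ends well short of it.  The only thing that changed is the position of the visible `await`.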

Regards,
--John

August 22, 2018
On Thu, 2018-08-16 at 20:30 +0000, John Belmonte via Digitalmars-d wrote:
> This is actually not about war; rather the peace and prosperity of people writing concurrent programs.
> 
> (Andrei, I hope you are reading and will check out
> https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/

On skimming this, I get the feeling the author doesn't really understand goroutines and channels. Actually I am not entirely sure the person understands concurrency and parallelism.

>  and
> https://vorpus.org/blog/timeouts-and-cancellation-for-humans/)
> 
> Recently I've been working with Trio, which is a Python async concurrency library implementing the concepts described in the articles above.  A synopsis (Python):

Have you tried asyncio in the Python standard library? Is Trio better?

>      with open_task_container() as container:
>          container.start_task(a)
>          container.start_task(b)
>          await sleep(1)
>          container.start_task(c)
>          # end of with block
> 
>      # program continues (tasks a, b, c must be completed)...

Assuming a, b, and c run in parallel and this is just a nice Pythonic
way of ensuring join, this is fairly standard fork/join thread pool
task management – except Python is single threaded so the above is time
division multiplexing of tasks.

std.parallelism can already handle this sort of stuff in D as far as I know.

> The point is that tasks started in the container's scope will not live past the scope.  Scope exit will block until all tasks are complete (normally or by cancellation).  If task b has an exception, all other tasks in the container are cancelled.

Use of scope like this is a good thing, and something GPars, Quasar, and others support. Using a context manager in Python is clearly a very Pythonic way of doing it.

> What this means is that task lifetimes can be readily understood by looking at the structure of a program.  They are tied to scoped blocks, honor nesting, etc.
> 
> Similar for control of timeouts and cancellation:
> 
>      with fail_after(10):  # raise exception if scope not completed in 10s
>          reply = await request(a)
>          do_something(reply)
>          reply2 = await request(b)
>          ...
> 
> These are novel control structures for managing concurrency. Combining this with cooperative multitasking and explicit, plainly-visible context switching (i.e. async/await-- sorry Olshansky) yields something truly at the forefront of concurrent programming.  I mean no callbacks, almost no locking, no explicitly maintained context and associated state machines, no task lifetime obscurity, no manual plumbing of cancellations, no errors dropped on the floor, no shutdown hiccups.  I'm able to write correct, robust, maintainable concurrent programs with almost no mental overhead beyond a non-concurrent program.

I'd disagree with them being novel control structures. The concepts have been around for a couple of decades. They have different expressions in different languages. Python's context manager just makes it all very neat.

Clearly getting rid of the nitty-gritty management detail of concurrency and parallelism is a good thing.  Processes and channels have been doing all this for decades, but have only recently become fashionable – one up to Rob Pike and team. I've not followed async/await in C#, but in Python it is a tool for concurrency and clearly not for parallelism. Sadly, async/await has become a fashion, which means it is being forced into programming languages that really do not need it. Still, there we see the power of fashion-driven programming language development.

> Some specimens (not written by me):
>      #1:  the I/O portion of a robust HTTP 1.1 server implementation in about 200 lines of code.  https://github.com/python-hyper/h11/blob/33c5282340b61ddea0dc00a16b6582170d822d81/examples/trio-server.py
>      #2: an implementation of the notoriously difficult "happy eyeballs" networking connection algorithm in about 150 lines of code.  https://github.com/python-trio/trio/blob/7d2e2603b972dc0adeaa3ded35cd6590527b5e66/trio/_highlevel_open_tcp_stream.py
> 
> I'd like to see a D library supporting these control structures (with possibly some async/await syntax for the coroutine case). And of course for vibe.d and other I/O libraries to unify around this.

Kotlin, Java, etc. are all jumping on the coroutines bandwagon, but why? There is no actual need for these given you can have blocking tasks in a threadpool with channels already.

> I'll go out on a limb and say if this could happen in addition to D addressing its GC dirty laundry, the language would actually be an unstoppable force.

Why?

Are coroutines with language syntax support really needed?

And whilst Go is obsessively improving its GC so as to make it a non-issue to any performance arguments, it seems this is an insoluble problem in D.

-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk



August 22, 2018
On Thu, 2018-08-16 at 16:33 -0700, H. S. Teoh via Digitalmars-d wrote:
> 
[…]
> I read both articles, and am quite impressed by the revolutionary way of looking at concurrency.  It provides a clean(er) abstraction that can be reasoned about much more easily than currently prevalent models of concurrency.  Seems it would fit right in with D's message-based concurrency communication model.

I found the assumptions about what goroutines were to be wrong. Yes
there is an interesting structure built using Python context managers
to manage tasks executed by time division multiplexing, but is that
really needed since the current systems work just fine if you have
threadpools and multiple executing threads – as Java, Go, etc. have but
Python does not.

[…]
> 
> Indeed.  It certainly seems like a promising step toward addressing the nasty minefield that is today's concurrent programming models.

I'd say processes and channels work just fine. What is this really providing outside the Python sphere? (Also Javascript?)

> 
[…]



August 22, 2018
On Fri, 2018-08-17 at 18:36 +1200, rikki cattermole via Digitalmars-d wrote:
> After reading the article I can say, it isn't any better than async and await for dependencies. You still need an event loop.
> 
> […]

Or a work stealing threadpool.

Event loops are only really needed in contexts that must be single threaded.




August 22, 2018
On Wed, Aug 22, 2018 at 05:56:09PM +0100, Russel Winder via Digitalmars-d wrote:
> On Thu, 2018-08-16 at 16:33 -0700, H. S. Teoh via Digitalmars-d wrote: […]
> > I read both articles, and am quite impressed by the revolutionary way of looking at concurrency.  It provides a clean(er) abstraction that can be reasoned about much more easily than currently prevalent models of concurrency.  Seems it would fit right in with D's message-based concurrency communication model.
> 
> I found the assumptions about what goroutines were to be wrong. Yes there is an interesting structure built using Python context managers to manage tasks executed by time division multiplexing, but is that really needed since the current systems work just fine if you have threadpools and multiple executing threads – as Java, Go, etc. have but Python does not.

I approached the article from a language-independent viewpoint. While I know a little bit of Python, I wasn't really very interested in the Python-specific aspects of the article, nor in the specific implementation the author had written.  What caught my interest was the concept behind it -- the abstraction for concurrent/parallel computation that is easy to reason about, compared to other models.

The main innovative idea, IMO, is the restriction of parallel/concurrent processing to the lifetime of an explicit object, in this case, a "nursery". (TBH a better term could have been chosen, but that doesn't change the underlying concept.)  More specifically, the lifetime of this object can in turn be tied to a lexical scope, which gives you an explicit, powerful way to manage the lifetime of child processes (threads, coroutines, whatever), as opposed to the open-endedness of, say, spawning a thread that may run arbitrarily long relative to the parent thread.

This restriction does not limit the expressive power of the abstraction -- it "gracefully degrades" to current open-ended models if, for example, you allocate a nursery on the heap and spawn child processes / threads / etc. into it.

However, by restricting the open-endedness of child (process, thread, ...) lifetime, it gives you the ability to reason about control flow in a much more intuitive way.  It restores the linearity of control flow in a given block of code (with the well-defined exception if a nursery was explicitly passed in), making it much easier to reason about.  Unless you're explicitly passing nurseries around, you no longer have to worry about whether some function you call in the block might spawn new processes that continue running after the block exits. You no longer need to explicitly manage shared resources and worry about whether resource X could be released at the end of the block. And so on.

Even in the more complex case where nurseries are being passed around, you can still reason about the code with relative ease by examining the lifetime of the nursery objects.  You no longer have to worry about the case where background processes continue running past the lifetime of the main program (function, block, etc.), or manually keeping track of child processes so that you can sync with them.
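
A rough sketch of the "passing it around" case, using Python's standard `ThreadPoolExecutor` as a stand-in for a nursery (an assumption for illustration; it lacks the nursery's cancellation and error propagation).  The hypothetical `helper` can start work that outlives its own call, but only inside the pool it was handed, so the owner's `with` block still bounds the lifetime.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def background_job(log, name):
    time.sleep(0.02)
    log.append(name + " finished")


def helper(pool, log):
    # Spawns work that outlives this call, but only into the pool it was handed.
    pool.submit(background_job, log, "helper job")
    log.append("helper returned")


def owner():
    log = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        helper(pool, log)
    # The job cannot outlive the with block that owns the pool.
    log.append("scope exited")
    return log
```

A reader of `owner` only needs to look at where `pool` is created and where its block ends to know when every job started through it is finished.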

Once you have this new way of thinking about concurrent processing, other possibilities open up, like returning values from child processes, propagating exceptions, cancellation, etc.  (Cancellation may require further consideration in non-Python implementations, but still, the model provides the basis for a cleaner approach to this than open-ended models allow.)


[…]
> > Indeed.  It certainly seems like a promising step toward addressing the nasty minefield that is today's concurrent programming models.
> 
> I'd say processes and channels work just fine. What is this really providing outside the Python sphere? (Also Javascript?)
[...]

Functionally, not very much.

Readability and understandability-wise, a lot.

And that is the point. I personally couldn't care less what it contributes to Python, since I don't use Python very much outside of SCons, and within SCons concurrent processing is already taken care of for you and isn't an issue the user needs to worry about. So in that sense, Trio isn't really relevant to me.  But what I do care about is the possibility of a model of concurrency that is much more easily understood and reasoned about, regardless of whether the underlying implementation uses explicit context-switching, fibres, threads, or full-blown processes.

Basically, what we're talking about is the difference between a control flow graph that's an arbitrarily-branching tree (open-ended concurrency model with unrestricted child lifetimes: one entry point, arbitrary number of exits), vs. a single-entry single-exit graph where every branch eventually rejoins the parent (nursery model). Having an arbitrarily branching control flow means many concepts don't work, like return values, propagating exceptions back to the parent, managing child lifetimes, etc.  Having well-defined joining points for all children means that it's possible to have well-defined return values, exception propagation, managed child lifetimes, etc.
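
To make the join-point argument concrete, here is a small asyncio sketch with hypothetical `square`, `broken`, `join_values` and `join_failure` coroutines.  `asyncio.gather` plays the role of the join: the children's return values arrive there, and a child's exception resurfaces there in the parent.  (Plain `gather` does not cancel the remaining children on failure the way a nursery would.)

```python
import asyncio


async def square(x):
    await asyncio.sleep(0.01)
    return x * x


async def broken():
    await asyncio.sleep(0.01)
    raise RuntimeError("child failed")


async def join_values():
    # The join point is also where the children's return values arrive.
    return await asyncio.gather(square(2), square(3))


async def join_failure():
    try:
        await asyncio.gather(square(2), broken())
    except RuntimeError as err:
        # The child's exception resurfaces in the parent, at the join.
        return "parent saw: " + str(err)
```

With a fire-and-forget spawn there is no such point in the parent's code, so there is nowhere for a return value or an exception to go.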

I don't claim this solves all the difficulties of comprehension in concurrent programming, but it does reduce the mental load by quite a bit. And that to me is a plus, because reduced mental load means the programmer is more likely to get it right, and can spend more effort actually focusing on the problem domain instead of wrestling with the nitty-gritty of concurrency.  More productivity, fewer bugs.  Like using a GC instead of manual memory management.  Or writing in D instead of assembly language. :-D


T

-- 
Almost all proofs have bugs, but almost all theorems are true. -- Paul Pedersen