August 23, 2018
On Wed, 2018-08-22 at 12:00 -0700, H. S. Teoh via Digitalmars-d wrote: […]
> 
> I approached the article from a language-independent viewpoint. While I
> know a little bit of Python, I wasn't really very interested in the
> Python-specific aspects of the article, nor in the specific
> implementation the author had written.  What caught my interest was the
> concept behind it -- the abstraction for concurrent/parallel computation
> that is easy to reason about, compared to other models.

But I see nothing new in the concept (unless I am missing something): scatter/gather parallelism has been a staple part of parallelism for more than 35 years, and that is what this model is about.

> The main innovative idea, IMO, is the restriction of parallel/concurrent
> processing to the lifetime of an explicit object, in this case, a
> "nursery". (TBH a better term could have been chosen, but that doesn't
> change the underlying concept.)  More specifically, the lifetime of this
> object can in turn be tied to a lexical scope, which gives you an
> explicit, powerful way to manage the lifetime of child processes
> (threads, coroutines, whatever), as opposed to the open-endedness of,
> say, spawning a thread that may run arbitrarily long relative to the
> parent thread.

Another name for scatter/gather for the last 35+ years is farmer/worker, which is just another way of describing this nursery. Unless I am missing something, this is just new terminology for the same abstraction.
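For reference, the classic scatter/gather (farmer/worker) shape can be sketched with Python's stock ThreadPoolExecutor; nothing Trio-specific is assumed here:

```python
from concurrent.futures import ThreadPoolExecutor

# Classic scatter/gather (farmer/worker): the farmer scatters work items
# to a pool of workers, then gathers the results.  Exiting the
# with-block joins every worker, so nothing outlives the scope.
with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(lambda n: n * n, range(5)))

print(squares)  # -> [0, 1, 4, 9, 16]
```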

> This restriction does not limit the expressive power of the abstraction
> -- it "gracefully degrades" to current open-ended models if, for
> example, you allocate a nursery on the heap and spawn child processes /
> threads / etc. into it.
> 
> However, by restricting the open-endedness of child (process, thread,
> ...) lifetime, it gives you the ability to reason about control flow in
> a much more intuitive way.  It restores the linearity of control flow in
> a given block of code (with the well-defined exception if a nursery was
> explicitly passed in), making it much easier to reason about.  Unless
> you're explicitly passing nurseries around, you no longer have to worry
> about whether some function you call in the block might spawn new
> processes that continue running after the block exits. You no longer
> need to explicitly manage shared resources and worry about whether
> resource X could be released at the end of the block. And so on.
> 
> Even in the more complex case where nurseries are being passed around,
> you can still reason about the code with relative ease by examining the
> lifetime of the nursery objects.  You no longer have to worry about the
> case where background processes continue running past the lifetime of
> the main program (function, block, etc.), or manually keeping track of
> child processes so that you can sync with them.
> 
> Once you have this new way of thinking about concurrent processing,
> other possibilities open up, like returning values from child processes,
> propagating exceptions, cancellation, etc..  (Cancellation may require
> further consideration in non-Python implementations, but still, the
> model provides the basis for a cleaner approach to this than open-ended
> models allow.)

I am not sure I see a difference between nursery and threadpool or executor. Everything that is being said about this nursery can be said of threadpools and executors so it seems to be just a relabelling of a system already available.

I am not trying to attack the idea of nice management of concurrency and parallelism; I have been railing against crap parallelism for 35+ years. What worries me is that there is a claim of something new here when there appears to be nothing new: it is a relabelling of a known concept.

> 
> […]
> > > Indeed.  It certainly seems like a promising step toward
> > > addressing
> > > the nasty minefield that is today's concurrent programming
> > > models.
> > 
> > I'd say processes and channels works just fine. What is this really providing outside the Python sphere? (Also Javascript?)
> 
> [...]
> 
> Functionally, not very much.
> 
> Readability- and understandability-wise, a lot.

Really? I am not convinced.

There is a danger of mixing the management of processes/tasks/threads with the management of data. Programmers should never have to manage processes/tasks/threads; the framework or language should handle that. Programmers should worry only about their data, and its flow and readiness.

> And that is the point. I personally couldn't care less what it
> contributes to Python, since I don't use Python very much outside of
> SCons, and within SCons concurrent processing is already taken care of
> for you and isn't an issue the user needs to worry about. So in that
> sense, Trio isn't really relevant to me.  But what I do care about is
> the possibility of a model of concurrency that is much more easily
> understood and reasoned about, regardless of whether the underlying
> implementation uses explicit context-switching, fibres, threads, or
> full-blown processes.

We are on the same page on that wish, that is certain.

> Basically, what we're talking about is the difference between a control
> flow graph that's an arbitrarily-branching tree (open-ended concurrency
> model with unrestricted child lifetimes: one entry point, arbitrary
> number of exits), vs. a single-entry single-exit graph where every
> branch eventually rejoins the parent (nursery model). Having an
> arbitrarily branching control flow means many concepts don't work, like
> return values, propagating exceptions back to the parent, managing child
> lifetimes, etc..  Having well-defined joining points for all children
> means that it's possible to have well-defined return values, exception
> propagation, manage child lifetimes, etc..

But should the programmer care about the details of the tasks/fibres/processes? Shouldn't a task/fibre/process terminate when it has finished providing its data? Lifetime should be tied to the data, not to some arbitrary notion of code. This is the core of why the analysis of Go is wrong here: goroutines without channels are meaningless. The paper undermines goroutines without understanding how they are used in Go.

Dataflow, actor, CSP, fork/join, etc., implementing various scatter/gather strategies, already cover the abstraction of "nurseries", and thus cover the problems you highlight. The abstractions already exist, and "nurseries" do not seem to add anything new.

> I don't claim this solves all the difficulties of comprehension in
> concurrent programming, but it does reduce the mental load by quite a
> bit. And that to me is a plus, because reduced mental load means the
> programmer is more likely to get it right, and can spend more effort
> actually focusing on the problem domain instead of wrestling with the
> nitty-gritty of concurrency.  More productivity, less bugs.  Like
> using
> a GC instead of manual memory management.  Or writing in D instead of
> assembly language. :-D

Concurrent programming already has ways of providing a good UX.  At least there are lots of frameworks for this for most programming languages in various guises.

-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk



August 28, 2018
On Wednesday, 22 August 2018 at 16:49:01 UTC, Russel Winder wrote:
> Have you tried asyncio in the Python standard library? Is Trio better?

The library that Guido admits is a disaster?  https://twitter.com/gvanrossum/status/938445451908472832

Trio and libraries like it have evolved out of frustration with asyncio.

>>      with open_task_container() as container:
>>          container.start_task(a)
>>          container.start_task(b)
>>          await sleep(1)
>>          container.start_task(c)
>>          # end of with block
>> 
>>      # program continues (tasks a, b, c must be completed)...
>
> Assuming a, b, and c run in parallel and this is just a nice Pythonic
> way of ensuring join, this is fairly standard fork/join thread pool
> task management

It's far more than "ensuring join".  It ensures that tasks created by children are similarly constrained.  It allows the normal try-catch of exceptions coming from any level of child tasks.  It ensures that when a task has an error, all sibling tasks are cancelled and the resulting multi-exception is propagated to the parent.

> – except Python is single threaded so the above is time
> division multiplexing of tasks.

No, tasks can optionally be real threads (obviously constrained by GIL).  Soon they can be tasks from the "multiprocessing" library as well (separate process, either local or remote).

These are details of Trio's implementation and Python.  As mentioned, the control structures apply to any concurrency model (implicit or explicit context switching, OS or light threads, etc.)

> I've not followed async/await in C# but in Python it is a tool for concurrency but clearly not for parallelism. Sadly async/await has become a fashion that means it is being forced into programming languages that really do not need it.

async/await is a means to explicitly mark points of context switching in API's and code.  It applies to threads, coroutines, remote processing, or anything else which may allow something else to modify system state while you're waiting.

Local to a single thread, async/await with coroutines happens to be wonderful because it eliminates a ton of explicit locking, fretting about race conditions, etc. required of the programmer.  It's a building block-- e.g. you can then combine such threads in careful ways with message passing within and among CPU cores to get the best of all worlds.
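A small illustration of why explicit switch points remove the need for locks, assuming plain single-threaded asyncio:

```python
import asyncio

counter = 0

async def bump(times):
    global counter
    for _ in range(times):
        # A read-modify-write with no lock.  With preemptive threads this
        # could lose updates; with coroutines it cannot, because a context
        # switch can only happen at an explicit await -- and there is none
        # inside the increment.
        counter += 1
        await asyncio.sleep(0)   # the only switch point, plainly visible

async def main():
    await asyncio.gather(bump(1000), bump(1000))

asyncio.run(main())
print(counter)  # -> 2000, deterministically, with no lock taken
```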

While Go can certainly implement the nursery and cancellation control structures, there is no turning back on Go's implicit context switching (i.e. any old function call can cause a context switch).  The human is back to fretting about locks and race conditions, and unable to prove that even the smallest of programs is correct.

August 28, 2018
On Tue, 2018-08-28 at 03:36 +0000, John Belmonte via Digitalmars-d wrote:
> On Wednesday, 22 August 2018 at 16:49:01 UTC, Russel Winder wrote:
> > Have you tried asyncio in the Python standard library? Is Trio better?
> 
> The library that Guido admits is a disaster? https://twitter.com/gvanrossum/status/938445451908472832
> 
> Trio and libraries like it have evolved out of frustration with asyncio.

When I originally wrote the comment I was thinking of asyncore, which is even worse than asyncio.

[…]
> 
> It's far more than "ensuring join".  It ensures that tasks created by children are similarly constrained.  It allows the normal try-catch of exceptions coming from any level of child tasks.  It ensures that when a task has an error, all sibling tasks are cancelled and the resulting multi-exception is propagated to the parent.

Any fork/join framework will provide such guarantees.
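For comparison, the join-and-propagate part is indeed available in a stock fork/join pool, sketched here with concurrent.futures (though the automatic sibling cancellation is not):

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    if n == 3:
        raise RuntimeError("task 3 failed")
    return n * n

# Fork inside the with-block; exiting the block joins all workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(work, n) for n in range(5)]

# Gather: each child's exception is re-raised at the parent on .result().
results, errors = [], []
for f in futures:
    try:
        results.append(f.result())
    except RuntimeError as exc:
        errors.append(str(exc))

print(results, errors)  # -> [0, 1, 4, 16] ['task 3 failed']
```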

> > – except Python is single threaded so the above is time division multiplexing of tasks.
> 
> No, tasks can optionally be real threads (obviously constrained by GIL).  Soon they can be tasks from the "multiprocessing" library as well (separate process, either local or remote).
> 
> These are details of Trio's implementation and Python.  As mentioned, the control structures apply to any concurrency model (implicit or explicit context switching, OS or light threads, etc.)

But that is the point, this is Python specific, and yet the motivating example is a misunderstanding of how Go is used. This inconsistency seriously undermines the general argument.

I have no problem with using the with statement and context managers to enforce fork/join abstractions in Python. But to say this is a new general abstraction is a false claim.

[…]
> async/await is a means to explicitly mark points of context switching in API's and code.  It applies to threads, coroutines, remote processing, or anything else which may allow something else to modify system state while you're waiting.

Certainly it is a language structure to support yielding to the appropriate scheduler; the question is whether a language structure is required, or whether it can be handled with library features. I suspect a language feature makes for ease of implementation.

> Local to a single thread, async/await with coroutines happens to be wonderful because it eliminates a ton of explicit locking, fretting about race conditions, etc. required of the programmer. It's a building block-- e.g. you can then combine such threads in careful ways with message passing within and among CPU cores to get the best of all worlds.

I've never really worried about single-threaded concurrency, nor used C#, so I have no actual data to apply to this argument. In a multi-threaded context, you can do all that is needed using processes and channels. Having said this, executors in single- or multi-threaded contexts work fine at the library level, without language constructs.

> While Go can certainly implement the nursery and cancellation control structures, there is no turning back on Go's implicit context switching (i.e. any old function call can cause a context switch).  The human is back to fretting about locks and race conditions, and unable to prove that even the smallest of programs is correct.

Not if the user sticks to using processes and channels, as is the idiom in Go. Go has all the locks stuff, but if you use channels as the means of communication between processes, the programmer never needs explicit locks, since everything is handled by blocking on channels.
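The channel idiom described here, sketched in Python with a blocking queue standing in for a Go channel (goroutine -> thread, channel -> queue.Queue):

```python
import threading
import queue

ch = queue.Queue()           # stands in for a Go channel

def producer():
    # The producer owns its data until it sends it; there is no shared
    # mutable state, so user code never takes an explicit lock.
    for n in range(3):
        ch.put(n * 10)
    ch.put(None)             # sentinel: "channel closed"

t = threading.Thread(target=producer)
t.start()

received = []
while True:
    item = ch.get()          # blocks until data is ready
    if item is None:
        break
    received.append(item)
t.join()

print(received)  # -> [0, 10, 20]
```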

-- 
Russel.



September 15, 2018
On Tuesday, 28 August 2018 at 20:05:32 UTC, Russel Winder wrote:
> But that is the point, this is Python specific, and yet the motivating example is a misunderstanding of how Go is used. This inconsistency seriously undermines the general argument.

I don't believe I misunderstand how Go is used.  Trying to solve every concurrency issue with processes and channels -- where you need some data silo to manage chunks of shared state -- is not exactly friendly to casual or novice programmers, which is an important segment of the population.

>> async/await is a means to explicitly mark points of context switching in API's and code.  It applies to threads, coroutines, remote processing, or anything else which may allow something else to modify system state while you're waiting.
>
> Certainly it is a language structure to support yield to the appropriate scheduler, the question is whether a language structure is required or it can be handled with library features. I suspect language feature makes things easier of implementation.

I'm in total agreement about using a library solution if possible, and
stated that in the original post.

Important news relevant to this thread:  Kotlin has announced support of structured concurrency in its standard coroutine library, citing the "Go statement considered harmful" article as inspiration.

    https://medium.com/@elizarov/structured-concurrency-722d765aa952

Kotlin appears to be a good example of supporting many concurrency mechanisms via a library.  From the manual:

> Many asynchronous mechanisms available in other languages can be
> implemented as libraries using Kotlin coroutines. This includes async/await
> from C# and ECMAScript, channels and select from Go, and generators/yield
> from C# and Python.

Furthermore it generalizes across different ways of managing shared state.  A concurrency scope can span threads, so shared state must be accessed through locks or channels; or a scope can be pinned to a single thread, allowing the programming simplicity of Trio's single-threaded model.

September 26, 2018
On Thursday, 16 August 2018 at 20:30:26 UTC, John Belmonte wrote:
> These are novel control structures for managing concurrency.  Combining this with cooperative multitasking and explicit, plainly-visible context switching (i.e. async/await-- sorry Olshansky) yields something truly at the forefront of concurrent programming.  I mean no callbacks, almost no locking, no explicitly maintained context and associated state machines, no task lifetime obscurity, no manual plumbing of cancellations, no errors dropped on the floor, no shutdown hiccups.  I'm able to write correct, robust, maintainable concurrent programs with almost no mental overhead beyond a non-concurrent program.

I've written an article which attempts to expand on the ingredients making Trio + async/await effective, in the hope this paradigm can be carried elsewhere.

    https://medium.com/@belm0/concurrency-made-easy-d3fdb0382c58
