Wait-free MPSC and MPMC implement in D (page 2)

May 09, 2018

Re: Wait-free MPSC and MPMC implement in D

Posted by Jonathan M Davis
in reply to Shachar Shemesh

Permalink

Jonathan M Davis

Posted in reply to Shachar Shemesh

Permalink

On Wednesday, May 09, 2018 07:13:08 Shachar Shemesh via Digitalmars-d wrote:
> I have a motto in life - if you assume you're great, you most likely aren't. I have grown out of the "my code has no bugs" phase of my life quite a while ago.

The scary truth is that no matter what you do, you're likely going to have bugs in your software that you've simply never found, and they could be minor issues that will never matter or major issues that could completely blow up in your face later. There are plenty of steps that we can and should take in order to try to write software without bugs, but even if we somehow manage to write a piece of software that has no bugs (which is _highly_ unlikely in any software of any real size), we have no way of knowing that we've reached that point. A lack of bug reports just means that no one has reported a bug, not that we don't have any.

> The fact that code is used, to good effect and for a long period of time does not mean that it doesn't have problems, some of them might be quite fundamental.

A great example of this would be openssl. It's used by huge amounts of software and yet it's terrible software.

Stuff like operating systems have bugs reported on them all the time, and those often have thousands of developers and millions of users.

> fundamental. "This code is highly tested" is not a good answer to "I think I found a problem".

Yeah. Thorough testing is a great sign (and a lack of testing is a very bad sign), but it's not a silver bullet. It's easy to do what looks like very thorough testing and find that you've missed stuff when you actually look at the code coverage. And 100% code coverage doesn't mean that that you caught everything, just that you're actually running tests that touch all of the parts of the code.

std.datetime had very thorough test coverage when it was first released, but at least one bug still crept through (and was fortunately found later). I have no clue whether there are any bugs left. I hope not, but there's no way to know, and any time I touch the code, I risk adding more bugs, even if I'm fixing other problems.

dxml had 100% test coverage (or as close as is possible anyway), and yet when Johan passed it through ldc's fuzzer, it found three cases where I'd failed to call empty before calling front which had not been caught by the tests. Maybe it has more such problems, maybe it doesn't. And it may or may not have other bugs. I have no way to know, and on some level, that's a bit unnerving, but it's just how life goes as a software developer.

If someone finds a bug in my software, I want to know about it. It's quite possible that what someone is pointing out is invalid (e.g. when Johan first fuzz-tested dxml, he'd used the API incorrectly, violating the pre-condition for a function he was calling, so what he initially found was not a bug), but if we have any hope of even approaching perfect software, we need to know about the problems that are found so that they can be fixed. And to an extent, a lack of bug reports indicates a lack of use, not a lack of bugs.

> So, my plea is this. If you find an area worth improving in Mecca, please don't hesitate to post. If you find you do hesitate, please please feel free to send me a private message.

I second this for any libraries that I make available and would hope that this would be the attitude of anyone releasing software. There are plenty of cases where folks disagree on where things should go and what is or isn't a bug, but we need to know where the problems are if we have any hope of fixing them.

- Jonathan M Davis

On Wednesday, 9 May 2018 at 04:17:17 UTC, Shachar Shemesh wrote:
> On 09/05/18 01:09, David Nadlinger wrote:
>> The algorithm isn't wait-free (haven't thought too carefully about this, though)
>
> This mirrors a discussion I had with Maor (who originally wrote it). Let's see if I bring you around the way I was brought around.

Ah, and I've been taking Liran to be the originator of this fine piece of work up to now… ;)

> At the API level, there are two areas where the algorithm *cannot* be wait free. If a consumer tries to consume from an empty queue, or a producer tries to produce to a full one.

True, but some applications might favour APIs which can provide these guarantees by blocking instead of aggressively early-returning. For example, one could use a helping scheme to implement a MPSC queue that doesn't starve any producers even if the queue is close to full, or a SPMC queue that doesn't starve any consumers even if the queue is close to empty.

> Those two aside […] the algorithm actually is wait free.

Is it, though?

If we assume the usual definition of wait-freedom (bounded number of steps for each thread to make progress), then MCSPQueue.pop() is problematic, as any one thread could get persistently unlucky with the cas(). This could also be the case in the steady state with a non-empty queue; for example, if the producer pushes elements at a rate matching that at which (n - 1) of n consumer threads pop them.

I'd need to think more carefully about the non-sequentially-consistent isFull() before making any statements about SCMPQueue.

 — David

Forums