February 20, 2012
On Sun, Feb 19, 2012 at 11:58:19PM +0100, deadalnix wrote: [...]
> I would add that, thinking about your proposal of exceptions that may succeed if you retry the same thing, Phobos should provide a retry function that takes a closure and a limit as parameters and retries the operation until it succeeds or the limit is reached.
> 
> The more I think about it, the more it makes sense to have a property on Exceptions that makes explicit whether a retry may help.
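
For concreteness, the retry function described above might look roughly like the sketch below. The name `retry` and its signature are assumptions made for illustration; no such function exists in Phobos. It simply re-invokes the callable on any `Exception` and gives up once the limit is reached.

```d
import std.traits : isCallable, ReturnType;

/// Hypothetical helper, not an existing Phobos function: calls `op` until it
/// returns normally or `limit` attempts have been made. The last exception
/// is rethrown when the limit is reached.
ReturnType!F retry(F)(F op, size_t limit)
if (isCallable!F)
{
    for (size_t attempt = 1; ; ++attempt)
    {
        try
        {
            return op();
        }
        catch (Exception e)
        {
            if (attempt >= limit)
                throw e; // out of attempts: let the caller see the failure
        }
    }
}
```

Note that this helper retries blindly: it has no idea whether the failure was one that a retry could fix, which is exactly the question the rest of this thread argues about.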

This is starting more and more to sound like what's described in the link that bearophile posted elsewhere in this discussion:

http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html

I stand by my objection that if something might succeed if it can be retried, then it needs to be retried in the called function, not the caller.

I've already posted a summary of the above link, but I'd like to repeat some of the salient points:

- With the current try/catch mechanism, we are limited to two modes of
  recovery: (1) don't bother, just abort; (2) restart the operation from
  scratch, and hope it won't fail this time.

- To have more intelligent error recovery requires that such recovery
  take place *in the scope of the function that throws the exception*.

- However, the code that throws the exception is usually low-level, and
  as such does not have the global, high-level context to decide which
  course of action to take.

- By the time a thrown exception gets to said high-level code, the stack
  is already unwound, the original execution context is long gone, and
  the only recovery left is to restart the entire operation from
  scratch.

Perhaps the "ideal exception handling facility" that Andrei is looking for is a Lispian model, where:

- The low-level function that throws the exception also indicates a list
  of possible recovery strategies;

- The high-level code that eventually calls the low-level function
  registers exception recovery policies with the runtime (in the form of
  delegates that choose between recovery strategies or prompt the
  runtime to unwind the stack);

- When an exception is thrown, the runtime matches the thrown exception
  with the recovery policy, and corrects the problem based on the
  decision of said policy *in the execution context of the low-level
  function where the problem occurred*. The stack is only unwound if no
  recovery policy is available to correct the problem, or if the policy
  says to abort the operation.

Note that the registration of recovery delegates needs to be out-of-band (not passed as function parameters) because there can potentially be a very long chain of calls before the low-level code is reached. Manually propagating lists of recovery delegates through the entire call chain is not practical. It also adds lots of noise to the code (clutters normal code with exception-related code) and adds unnecessary CPU overhead for the usual case when no problems happen.

That's why this needs to be done by the runtime system.  You also want language support for this mechanism, otherwise you end up with tons of boilerplate code.
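
As a rough illustration of the model outlined above, here is a minimal library-only sketch in D. Everything in it is an assumption made for the example (`Recovery`, `Policy`, `withPolicy`, `signal`, `parseNumbers`); it is not an existing Phobos or druntime API, and a real design would want language support to cut the boilerplate. The policy stack is a module-level variable, which in D is thread-local, so registration stays out-of-band.

```d
import std.algorithm.searching : canFind;
import std.conv : to, ConvException;

/// Recovery strategies that low-level code may offer (hypothetical names).
enum Recovery { abort, skipEntry, useDefault }

/// High-level policy: given the problem and the strategies on offer, pick one.
alias Policy = Recovery delegate(string problem, const Recovery[] offered);

/// Thread-local policy stack; intermediate functions never see it.
private Policy[] policies;

/// Runs `work` with `policy` registered, then unregisters it.
void withPolicy(Policy policy, void delegate() work)
{
    policies ~= policy;
    scope (exit) policies = policies[0 .. $ - 1];
    work();
}

/// Called by low-level code before any unwinding; asks the innermost policy.
Recovery signal(string problem, const Recovery[] offered)
{
    if (policies.length == 0)
        return Recovery.abort;
    immutable choice = policies[$ - 1](problem, offered);
    // A policy may only pick a strategy that was actually offered.
    return offered.canFind(choice) ? choice : Recovery.abort;
}

/// Low-level code: offers two strategies and recovers in its own context.
int[] parseNumbers(string[] fields)
{
    int[] result;
    foreach (f; fields)
    {
        try
        {
            result ~= f.to!int;
        }
        catch (ConvException e)
        {
            final switch (signal("bad number: " ~ f,
                                 [Recovery.skipEntry, Recovery.useDefault]))
            {
                case Recovery.skipEntry:  break;               // drop this field
                case Recovery.useDefault: result ~= 0; break;  // substitute 0
                case Recovery.abort:      throw e;             // unwind as usual
            }
        }
    }
    return result;
}
```

High-level code would wrap a call in `withPolicy` and choose, say, `Recovery.skipEntry`; the loop in `parseNumbers` then carries on from the failing field instead of being restarted from scratch.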


T

-- 
It only takes one twig to burn down a forest. It only takes one twit to burn down a project.
February 20, 2012
On Sun, Feb 19, 2012 at 05:38:23PM -0600, Andrei Alexandrescu wrote:
> On 2/19/12 5:28 PM, Jonathan M Davis wrote:
> >On Sunday, February 19, 2012 18:48:02 address_is@invalid.invalid wrote:
> >>I guess "transient" is more descriptive.
> >
> >Actually, thinking on it some more, I don't think that transient will work at all, and the reason is simple. _Which_ operation should you retry?
> 
> The application decides.
> 
> >You don't even necessarily know which function the exception came from out of the functions that you called within the try block - let alone which function actually threw the exception. Maybe it was thrown 3 functions deep from the function that you called, and while retrying that specific call 3 functions down might have made sense, retrying the function 3 functions up doesn't necessarily make sense at all.
> >
> >Whether or not you can retry or retrying makes any sense at all is _highly_ dependent on who actually catches the exception. In many cases, it may be a function which could retry it, but in many it won't be, and so having the exception tell the caller that it could retry would just be misleading.
> 
> No dependence on context. The bit simply tells you "operation has failed, but due to a transitory matter". That is information local to the thrower.
[...]

But *which* transitory matter? A temporary outage on the network? A timeout due to excessive CPU load? A full disk (which is transitory because some other process might remove a large file in the interim)?

Without knowing the context, this information is of little use.

I'm really starting to like the Lisp system more, the more I think about this.  Let the low-level code provide a list of recovery strategies, and let the high-level code register recovery policies that select between these recovery strategies *in the context of the low-level code*. The runtime matches policy to strategy, and the stack is only unwound when no recovery is possible.


T

-- 
Frank disagreement binds closer than feigned agreement.
February 20, 2012
On Sun, Feb 19, 2012 at 02:53:04PM -0800, Jim Hewes wrote: [...]
> Two, the exception hierarchy is orthogonal to the library.

Yes, yes, and yes! You have identified one of the causes of the current problems with the exception hierarchy.


[...]
> So changing the library functions (adding or removing them) doesn't require changes in the exception hierarchy. Perhaps ParseException could then have a field that highlights the text that could not be parsed. This could be used generally for all parsing-type applications.

For a well-defined library like Phobos, we really shouldn't be defining new exceptions in each module. Rather, (almost) all exceptions should be collected in a common place, organized as a clean hierarchy without being unnecessarily bound to any particular module. Modifications to the exception hierarchy need to be properly evaluated before being committed to the codebase.

Of course, there are some exceptions that are specific to a module; those obviously belong in their respective module. But generally speaking, exceptions should fit into a clean classification hierarchy independently of which module first introduced them.

Just because exception X was first introduced by module M doesn't mean that it's specific to module M; it may encapsulate a larger class of problems that M just happens to touch.

By separating the exception hierarchy from the module hierarchy, we can clean up a lot of the mess that resulted from conflating the two.
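
A small sketch of what that separation could look like, assuming a hypothetical central module that owns the hierarchy. The class names other than `Exception`, the `offending` field, and the `report` helper are all invented for illustration.

```d
/// Would live in one central module rather than in each parsing module.
class ParseException : Exception
{
    string offending; // the text that could not be parsed

    this(string msg, string offending,
         string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
        this.offending = offending;
    }
}

/// A more specific category; any module that parses dates may throw it.
class DateParseException : ParseException
{
    this(string msg, string offending,
         string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, offending, file, line);
    }
}

/// Callers catch by category, not by the module that raised the error.
string report(void delegate() op)
{
    try
    {
        op();
        return "ok";
    }
    catch (ParseException e)
    {
        return "could not parse: " ~ e.offending;
    }
}
```

Adding or removing a parsing function in some module then leaves the hierarchy, and every `catch (ParseException)` written against it, untouched.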


T

-- 
Chance favours the prepared mind. -- Louis Pasteur
February 20, 2012
On 2/19/12 7:53 PM, H. S. Teoh wrote:
> I stand by my objection that if something might succeed if it can be
> retried, then it needs to be retried in the called function, not the
> caller.

If read fails from a socket, it's of no use to try it again. One must close the socket, reconnect, and attempt the whole operation once again.

Andrei
February 20, 2012
On 2/19/12 7:58 PM, H. S. Teoh wrote:
> On Sun, Feb 19, 2012 at 05:38:23PM -0600, Andrei Alexandrescu wrote:
>> On 2/19/12 5:28 PM, Jonathan M Davis wrote:
>>> On Sunday, February 19, 2012 18:48:02 address_is@invalid.invalid wrote:
>>>> I guess "transient" is more descriptive.
>>>
>>> Actually, thinking on it some more, I don't think that transient will work at
>>> all, and the reason is simple. _Which_ operation should you retry?
>>
>> The application decides.
>>
>>> You don't even necessarily know which function the exception came
>>> from out of the functions that you called within the try block - let
>>> alone which function actually threw the exception. Maybe it was
>>> thrown 3 functions deep from the function that you called, and while
>>> retrying that specific call 3 functions down might have made sense,
>>> retrying the function 3 functions up doesn't necessarily make sense
>>> at all.
>>>
>>> Whether or not you can retry or retrying makes any sense at all is
>>> _highly_ dependent on who actually catches the exception. In many
>>> cases, it may be a function which could retry it, but in many it
>>> won't be, and so having the exception tell the caller that it could
>>> retry would just be misleading.
>>
>> No dependence on context. The bit simply tells you "operation has
>> failed, but due to a transitory matter". That is information local to
>> the thrower.
> [...]
>
> But *which* transitory matter? A temporary outage on the network? A
> timeout due to excessive CPU load? A full disk (which is transitory
> because some other process might remove a large file in the interim)?

Doesn't matter.

> Without knowing the context, this information is of little use.

It is. User code may decide to retry the high-level operation (of which the low-level failure has no knowledge).


Andrei
February 20, 2012
On Sun, Feb 19, 2012 at 08:33:04PM -0600, Andrei Alexandrescu wrote:
> On 2/19/12 7:53 PM, H. S. Teoh wrote:
> >I stand by my objection that if something might succeed if it can be retried, then it needs to be retried in the called function, not the caller.
> 
> If read fails from a socket, it's of no use to try it again. One must close the socket, reconnect, and attempt the whole operation once again.
[...]

Correct, so that would be a recovery strategy at the operation level, say at sendHttpRequest or something like that. There is not enough information available to sendHttpRequest to know whether or not the caller wants the request to be retried if it fails.

But if the higher-level code could indicate this by way of a recovery policy delegate, then this retry can be done at the sendHttpRequest level, instead of percolating up the call stack all the way to submitHttpForm, which then has to reparse user data, convert into JSON, say, and then retry the entire operation all over again.

I'm really liking the Lisp approach. It nicely combines stack unwinding with in-context recovery by using a high-level delegate to make sound decisions based on factors outside the scope of the low-level function. You can unwind up to the level where a high-level delegate can step in and select a recovery strategy, and then continue on your way, rather than unwinding all the way up to the top and having to restart from square 1.

The delegate that makes decisions doesn't have to be at the absolute top level either (that wouldn't make sense, since it would break encapsulation: top-level code needs to know about inner workings of low-level code so that it can decide how to recover from low-level operations). You can have a chain of delegates that make decisions at various levels, and the one nearest to where the exception is generated is invoked. It can then decide to defer to the next handler if it doesn't have enough information to proceed.
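
The chain of decision-making delegates might be sketched as follows. This is only an illustration under assumptions: `Decision`, `Handler`, `withHandler`, `decide`, and the `transport` parameter are hypothetical, and `sendHttpRequest` here is a stand-in for the function named above, not a real API.

```d
enum Decision { defer, retry, abort }

/// A handler sees the problem and the attempt number, and may defer outwards.
alias Handler = Decision delegate(string problem, int attempt);

/// Thread-local chain; the innermost (nearest) handler is last.
private Handler[] handlers;

/// Runs `work` with `h` registered as the nearest handler.
void withHandler(Handler h, void delegate() work)
{
    handlers ~= h;
    scope (exit) handlers = handlers[0 .. $ - 1];
    work();
}

/// Walks the chain from the nearest handler outwards until one decides.
Decision decide(string problem, int attempt)
{
    foreach_reverse (h; handlers)
    {
        immutable d = h(problem, attempt);
        if (d != Decision.defer)
            return d;
    }
    return Decision.abort; // nobody had an opinion: unwind
}

/// Stand-in for sendHttpRequest; `transport` throws on failure.
string sendHttpRequest(string delegate() transport)
{
    for (int attempt = 1; ; ++attempt)
    {
        try
        {
            return transport();
        }
        catch (Exception e)
        {
            // Closing the socket and reconnecting would happen here, at the
            // operation level, without unwinding to the form-submission code.
            if (decide(e.msg, attempt) != Decision.retry)
                throw e;
        }
    }
}
```

With an outer handler that allows a few retries and an inner one that defers, the request is retried inside `sendHttpRequest`, and the caller that built the request never has to redo its own work.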


T

-- 
A tree is held up by its roots, and a person by friends.
February 20, 2012
I agree that the "Lispian" model works well, though I had issues trying to get my head around it when I encountered it.

I don't know how you'd make a simpler version for D (D lacking Lisp's ridiculous macros), but maybe something that essentially "returns" a list of recovery codes (which would unfortunately have to be documented) that can be called depending on the context of the error.

But error handling is hard, programmers are naturally lazy, and checking errors is not exactly exciting. Exceptions are always going to be a source of contention. I know people (mostly C programmers) who hate them, and others who swear by them. I agree that incorrect parameters should not be Exceptions; they are contract-level issues. Check your parameters before passing them if there are conditions on them!

It's a difficult topic, hence the ridiculously long thread we have going here. Error codes are not really the way to go; I prefer more of an Objective-C/Smalltalk null-pattern style, where null can be a valid argument almost anywhere the type system allows and code handles it properly, normally by returning null.

Exceptions are a source of contention due to long-range dependencies, but you can apply that to many things, and if you treat exceptions as part of the API, then changing an exception can be considered an API change and therefore something to be done with care.

James Miller
February 20, 2012
On Sun, Feb 19, 2012 at 08:34:30PM -0600, Andrei Alexandrescu wrote:
> On 2/19/12 7:58 PM, H. S. Teoh wrote:
> >On Sun, Feb 19, 2012 at 05:38:23PM -0600, Andrei Alexandrescu wrote:
[...]
> >>No dependence on context. The bit simply tells you "operation has failed, but due to a transitory matter". That is information local to the thrower.
> >[...]
> >
> >But *which* transitory matter? A temporary outage on the network? A timeout due to excessive CPU load? A full disk (which is transitory because some other process might remove a large file in the interim)?
> 
> Doesn't matter.
> 
> >Without knowing the context, this information is of little use.
> 
> It is. User code may decide to retry the high-level operation (of which the low-level failure has no knowledge).
[...]

But on what basis will it make this decision? All it knows is that something went wrong somewhere deep in the call stack, and it's presented with a binary choice: retry or abort. It doesn't know what that problem was. So how would it know if retrying would help? Saying that it "might" help doesn't seem useful to me.

It's only useful if you present this choice to the *user* along with the error message encapsulated in the exception, and let the user make the decision on the basis of the error message.

I think we're all agreed that parsing the error message is not a viable solution, so basically the catch block is presented with a blind binary choice about which it knows nothing.  Such a choice is meaningless to a computer program.

Do you have a concrete scenario in mind where such a decision would actually be useful? Otherwise we'll just end up with boilerplate code copy-n-pasted everywhere of the form:

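	auto retries = SomeArbitraryNumber;
	while (true) {
		try {
			...
			break;	// success: leave the retry loop
		} catch(Exception e) {
			// (in a do { } while(false) loop, "continue" would jump
			// to the false condition and exit without retrying)
			if (e.is_transient && retries-- > 0)
				continue;
			throw e;
		}
	}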

But since this block is completely independent of what's inside the try block, why not just put it where the exception is generated in the first place?


T

-- 
If the comments and the code disagree, it's likely that *both* are wrong. -- Christopher
February 20, 2012
On 2/19/12 8:52 PM, H. S. Teoh wrote:
> On Sun, Feb 19, 2012 at 08:33:04PM -0600, Andrei Alexandrescu wrote:
>> On 2/19/12 7:53 PM, H. S. Teoh wrote:
>>> I stand by my objection that if something might succeed if it can be
>>> retried, then it needs to be retried in the called function, not the
>>> caller.
>>
>> If read fails from a socket, it's of no use to try it again. One must
>> close the socket, reconnect, and attempt the whole operation once
>> again.
> [...]
>
> Correct, so that would be a recovery strategy at the operation level,
> say at sendHttpRequest or something like that. There is not enough
> information available to sendHttpRequest to know whether or not the
> caller wants the request to be retried if it fails.
>
> But if the higher-level code could indicate this by way of a recovery
> policy delegate, then this retry can be done at the sendHttpRequest
> level, instead of percolating up the call stack all the way to
> submitHttpForm, which then has to reparse user data, convert into JSON,
> say, and then retry the entire operation all over again.
>
> I'm really liking the Lisp approach.

Now we're talking. Ideas. Outside the box.

Andrei


February 20, 2012
On 2/19/12 9:06 PM, H. S. Teoh wrote:
> On Sun, Feb 19, 2012 at 08:34:30PM -0600, Andrei Alexandrescu wrote:
>> On 2/19/12 7:58 PM, H. S. Teoh wrote:
>>> On Sun, Feb 19, 2012 at 05:38:23PM -0600, Andrei Alexandrescu wrote:
> [...]
>>>> No dependence on context. The bit simply tells you "operation has
>>>> failed, but due to a transitory matter". That is information local
>>>> to the thrower.
>>> [...]
>>>
>>> But *which* transitory matter? A temporary outage on the network? A
>>> timeout due to excessive CPU load? A full disk (which is transitory
>>> because some other process might remove a large file in the interim)?
>>
>> Doesn't matter.
>>
>>> Without knowing the context, this information is of little use.
>>
>> It is. User code may decide to retry the high-level operation (of
>> which the low-level failure has no knowledge).
> [...]
>
> But on what basis will it make this decision? All it knows is that
> something went wrong somewhere deep in the call stack, and it's
> presented with a binary choice: retry or abort.

No. The information is: "There's been a failure, but due to a transitory cause." In a high-level transaction, it doesn't matter which particular step failed. If the failure, whatever it was, was transitory, the transaction can be attempted again.

> It doesn't know what
> that problem was.

Doesn't have to.

> So how would it know if retrying would help? Saying
> that it "might" help doesn't seem useful to me.

Retrying helps because the error had a temporary cause. That information is known at the raise site and is passed up to the high-level command.

> It's only useful if you present this choice to the *user* along with the
> error message encapsulated in the exception, and let the user make the
> decision on the basis of the error message.

That's up to the application. A server application, for example, can't ask the user. We log, and we have a configurable number of retries.
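
A sketch of that server-side pattern, under stated assumptions: `FlaggedException` and `runTransaction` are hypothetical names, and the `is_transient` field stands in for the proposed bit, which does not exist on D's `Exception` today. The whole high-level transaction is retried, with each transient failure logged to stderr.

```d
import std.stdio : stderr;

/// Hypothetical exception type carrying the proposed "transient" bit.
class FlaggedException : Exception
{
    bool is_transient;

    this(string msg, bool is_transient,
         string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
        this.is_transient = is_transient;
    }
}

/// Runs the whole transaction, retrying up to `maxRetries` times when the
/// failure is flagged transient. Non-transient failures propagate at once.
void runTransaction(void delegate() transaction, uint maxRetries)
{
    for (uint attempt = 1; ; ++attempt)
    {
        try
        {
            transaction();
            return;
        }
        catch (FlaggedException e)
        {
            if (!e.is_transient || attempt > maxRetries)
                throw e;
            stderr.writefln("attempt %s failed (%s); retrying", attempt, e.msg);
        }
    }
}
```

Here `maxRetries` would come from server configuration; the retry loop never inspects which step failed, only the flag.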

> I think we're all agreed that parsing the error message is not a viable
> solution, so basically the catch block is presented with a blind binary
> choice of which it knows nothing about.  Such a choice is meaningless to
> a computer program.
>
> Do you have a concrete scenario in mind where such a decision would
> actually be useful? Otherwise we'll just end up with boilerplate code
> copy-n-pasted everywhere of the form:
>
> 	auto retries = SomeArbitraryNumber;
> 	do {
> 		try {
> 			...
> 		} catch(Exception e) {
> 			if (e.is_transient && retries-- > 0)
> 				continue;
> 			throw e;
> 		}
> 	} while(false);
>
> But since this block is completely independent of what's inside the try
> block, why not just put it where the exception is generated in the first
> place?

I explained this. The raise locus does not have access to the high-level context.


Andrei