February 07, 2014
On Friday, February 07, 2014 21:27:04 bearophile wrote:
> Jonathan M Davis:
> > 3. Code which should succeed most of the time but where doing validation essentially requires doing what you're validating for anyway. Again, parsers are a good example of this. For instance, to validate that "2013-12-22T01:22:27z" is in the valid ISO extended string format for a timestamp, you have to do pretty much exactly the same work that you have to do to parse out all of the values to convert it to something other than a string (e.g. SysTime). So, if you validated it first, you'd be doing the work twice. As such, why validate first? Just have it throw an exception when the parsing fails. And if for some reason, you expect that there's a high chance that the parsing would fail, then you can have a function which returns an error code and passes out the result as an out parameter instead, but that makes the code much uglier and more error-prone. So, in most cases, you'd want it to throw an exception on failure.
> 
> Languages with a good type system solve this with Maybe / Nullable / Optional and similar things. It's both safe (and efficient if the result is equivalent to just a wrapping struct).

That can be a good solution, but it also then requires checking the result. One of the big advantages of exceptions is that your code can not care except for the relatively few points that catch exceptions and handle them. Where you run into problems is when the failure case is likely. And if that's the case, then something like Maybe or Nullable is definitely better.
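
To make the trade-off concrete, here is a minimal D sketch of the two call-site styles for the timestamp example. `SysTime.fromISOExtString` is the existing Phobos parser, which throws on malformed input; `tryParseTimestamp` is a hypothetical wrapper invented for illustration, not an existing API.

```d
import std.datetime : SysTime;
import std.typecons : Nullable;

// Hypothetical helper: turns a parse failure into an empty Nullable
// instead of letting the exception propagate.
Nullable!SysTime tryParseTimestamp(string s)
{
    try
    {
        return Nullable!SysTime(SysTime.fromISOExtString(s));
    }
    catch (Exception e)
    {
        return Nullable!SysTime.init; // isNull is true for the init value
    }
}

void main()
{
    // Exception style: nothing to check here; a bad string would throw.
    auto t = SysTime.fromISOExtString("2013-12-22T01:22:27Z");
    assert(t.year == 2013);

    // Nullable style: every caller has to test the result before using it.
    auto good = tryParseTimestamp("2013-12-22T01:22:27Z");
    auto bad = tryParseTimestamp("not a timestamp");
    assert(!good.isNull && good.get == t);
    assert(bad.isNull);
}
```

With the first style only the code that wants to handle the failure needs a try-catch; with the second, the check moves into every caller.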

- Jonathan M Davis
February 07, 2014
On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis wrote:
> On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
>> 07-Feb-2014 20:29, Andrej Mitrovic wrote:
>> > On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
>> >> Add a bugzilla and let's define isValid that returns bool!
>> > 
>> > Add std.utf.decode() to that as well. IOW, it should have an overload
>> > which returns a status code
>> 
>> Much simpler - it returns a special dchar to designate bad encoding. And
>> there is one defined by Unicode spec.
>
> Isn't that actually worse? Unless you're suggesting that we stop throwing on
> decode errors, then functions like std.array.front will have to check the
> result on every call to see whether it was valid or not and thus whether they
> should throw, which would mean extra overhead over simply having decode throw
> on decode errors. validate has no business throwing, and we definitely should
> add isValidUnicode (or isValid or whatever you want to call it) for validation
> purposes. Code can then call that to validate that a string is valid and not
> worry about any UTFExceptions being thrown as long as it doesn't manipulate
> the string in a way that could result in its Unicode becoming invalid.
> However, I would argue that assuming that everyone is going to validate their
> strings and that pretty much all string-related functions shouldn't ever have
> to worry about invalid Unicode is just begging for subtle bugs all over the
> place IMHO. You're essentially dealing with error codes at that point, and I
> think that experience has shown quite clearly that error codes are generally a
> bad way to go. Almost no one checks them unless they have to. I think that
> having decode throw on invalid Unicode is exactly what it should be doing. The
> problem is that validate shouldn't.
>
> - Jonathan M Davis

You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.
February 07, 2014
On Friday, February 07, 2014 23:01:46 Meta wrote:
> On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis wrote:
> > On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
> >> 07-Feb-2014 20:29, Andrej Mitrovic wrote:
> >> > On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
> >> >> Add a bugzilla and let's define isValid that returns bool!
> >> > 
> >> > Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
> >> 
> >> Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
> > 
> > Isn't that actually worse? Unless you're suggesting that we stop throwing on decode errors, then functions like std.array.front will have to check the result on every call to see whether it was valid or not and thus whether they should throw, which would mean extra overhead over simply having decode throw on decode errors. validate has no business throwing, and we definitely should add isValidUnicode (or isValid or whatever you want to call it) for validation purposes. Code can then call that to validate that a string is valid and not worry about any UTFExceptions being thrown as long as it doesn't manipulate the string in a way that could result in its Unicode becoming invalid. However, I would argue that assuming that everyone is going to validate their strings and that pretty much all string-related functions shouldn't ever have to worry about invalid Unicode is just begging for subtle bugs all over the place IMHO. You're essentially dealing with error codes at that point, and I think that experience has shown quite clearly that error codes are generally a bad way to go. Almost no one checks them unless they have to. I think that having decode throw on invalid Unicode is exactly what it should be doing. The problem is that validate shouldn't.
> > 
> > - Jonathan M Davis
> 
> You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.

How is that any better than returning an invalid dchar with a specific value? In either case, you have to check the value. With the exception, code doesn't have to care. If the string is invalid, it'll get a UTFException, and it can handle it appropriately, but having to check the return value just adds overhead (albeit minimal) and is error-prone, because it generally won't be checked (and if it is checked, it complicates the calling code, because it has to do the check).

Code that doesn't want to risk a UTFException being thrown can validate up front - and that validator function should return bool and _not_ throw. But having decode not throw is going to be error-prone. It also doesn't help performance-wise, because it still has to do all of the same validity checks as it decodes. It's just that instead of throwing, it returns an error value. I really think that having decode throw on invalid Unicode is the right decision, and I don't see what we gain by making it not throw.
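
A rough sketch of that division of labour, assuming a bool-returning validator named `isValidUnicode` (the name is hypothetical). It is written here as a wrapper around today's throwing `std.utf.validate` purely to show the API shape; a real implementation would presumably do the checks directly without throwing.

```d
import std.utf : decode, validate, UTFException;

// Hypothetical non-throwing validator. Wrapping the throwing validate is
// only a stand-in for a real, exception-free implementation.
bool isValidUnicode(string s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

// Counts code points. No error handling here: decode throws a
// UTFException if it runs into malformed input.
size_t countCodePoints(string s)
{
    size_t n = 0;
    for (size_t i = 0; i < s.length; ++n)
        decode(s, i); // advances i past one code point
    return n;
}

void main()
{
    string good = "h\u00E9llo";
    immutable(ubyte)[] raw = [0x68, 0xFF, 0x69]; // 0xFF never occurs in UTF-8
    string bad = cast(string) raw;

    assert(isValidUnicode(good));
    assert(!isValidUnicode(bad));

    // Validate once at the boundary, then decode without further checks.
    if (isValidUnicode(good))
        assert(countCodePoints(good) == 5);
}
```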

- Jonathan M Davis
February 08, 2014
On 2/7/14, 8:29 AM, Andrej Mitrovic wrote:
> On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
>> Add a bugzilla and let's define isValid that returns bool!
>
> Add std.utf.decode() to that as well. IOW, it should have an overload
> which returns a status code but assigns the return value through another
> parameter.

.toBugzilla()

Andrei
February 08, 2014
On 2/7/14, 8:40 AM, Dmitry Olshansky wrote:
> 07-Feb-2014 06:44, Walter Bright wrote:
>> On 2/6/2014 2:15 PM, Brad Anderson wrote:
>>> Personally I don't think bad user input qualifies as an exceptional
>>> case because
>>> it's expected to happen and the program is expected to handle it (and
>>> let the
>>> user know) when it does. That's just a matter of taste though.
>>
>> It's not a matter of taste. If your input is subject to a DoS attack,
>> don't put exceptions in the control flow.
>
> Meh. If exceptions are such a liability we'd better make them (much)
> faster.

One simple idea is to statically allocate the same exception and rethrow it over and over. After all there's no guarantee a distinct exception is thrown every time, and the approach is still memory safe (though it might surprise the programmer who saves a reference to an old exception).
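
A minimal sketch of the idea in D, under the assumption that one instance per thread is acceptable (function-level `static` variables are thread-local by default). `DecodeFailure`, `reusedFailure`, and `fail` are made-up names for illustration.

```d
// Hypothetical exception type used only for this illustration.
class DecodeFailure : Exception
{
    this(string msg) { super(msg); }
}

// Returns the single per-thread instance, allocating it on first use
// and overwriting its message on every later use.
DecodeFailure reusedFailure(string msg)
{
    static DecodeFailure instance; // thread-local
    if (instance is null)
        instance = new DecodeFailure(msg);
    instance.msg = msg;
    return instance;
}

void fail(string msg)
{
    throw reusedFailure(msg);
}

void main()
{
    Exception saved;

    try { fail("first"); }
    catch (DecodeFailure e) { saved = e; }
    assert(saved.msg == "first");

    try { fail("second"); }
    catch (DecodeFailure e) { assert(e is saved); } // same object again

    // The surprise mentioned above: the saved reference now describes
    // the later failure, not the one that was originally caught.
    assert(saved.msg == "second");
}
```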

Andrei


February 08, 2014
On Friday, February 07, 2014 16:49:45 Andrei Alexandrescu wrote:
> On 2/7/14, 8:40 AM, Dmitry Olshansky wrote:
> > 07-Feb-2014 06:44, Walter Bright wrote:
> >> On 2/6/2014 2:15 PM, Brad Anderson wrote:
> >>> Personally I don't think bad user input qualifies as an exceptional
> >>> case because
> >>> it's expected to happen and the program is expected to handle it (and
> >>> let the
> >>> user know) when it does. That's just a matter of taste though.
> >> 
> >> It's not a matter of taste. If your input is subject to a DoS attack, don't put exceptions in the control flow.
> > 
> > Meh. If exceptions are such a liability we'd better make them (much)
> > faster.
> 
> One simple idea is to statically allocate the same exception and rethrow it over and over. After all there's no guarantee a distinct exception is thrown every time, and the approach is still memory safe (though it might surprise the programmer who saves a reference to an old exception).

As long as exceptions are cloneable, and people are aware of the fact that they tend to be non-unique, then it can be common practice to clone/dup an exception when you need to keep it around. However, the two potential problems with this overall approach are

1. Do we just always allocate one of each exception type per thread (probably in a static constructor for that exception type)? That would result in a fair number of exceptions being allocated up front. The obvious alternative would be to allocate it the first time that it's thrown so that you only end up with exceptions that get used being allocated, but regardless, we need to take a close look at the allocation scheme.

2. This sort of thing has a definite impact on enforce and any idioms related to it. We'd need to either adjust enforce, enforceEx, etc. to avoid the allocation, or we'd need to introduce alternatives to them that expect something like a static opCall on the exception type which returns the common exception for that type or some other standard means of getting at the reusable exception.

Regardless, we need to agree upon a standard way to define exception types along with some set of standard idioms for handling them such that we can deal with exceptions generically (particularly with regards to stuff like enforce) rather than it being an ad-hoc per-exception type thing that you can't reasonably rely on.
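
One possible shape for such an idiom, as a sketch only: a generic accessor that lazily allocates one instance per exception type and per thread, plus an enforce-like helper that throws it. `reusable` and `enforceReused` are hypothetical names, and the sketch assumes every participating exception type has a constructor taking a single `string`.

```d
// Hypothetical accessor: one lazily allocated instance per exception
// type E and per thread (each template instantiation gets its own static).
E reusable(E : Exception)()
{
    static E instance;
    if (instance is null)
        instance = new E("reusable exception");
    return instance;
}

// Hypothetical enforce-like helper that throws the reusable instance
// instead of allocating a new exception on every failure.
void enforceReused(E : Exception = Exception)(bool condition, string msg)
{
    if (!condition)
    {
        auto e = reusable!E();
        e.msg = msg;
        throw e;
    }
}

// Example exception type following the assumed constructor convention.
class ParseError : Exception
{
    this(string msg) { super(msg); }
}

void main()
{
    enforceReused!ParseError(true, "not thrown");

    Exception first, second;
    try { enforceReused!ParseError(false, "a"); }
    catch (ParseError e) { first = e; }
    try { enforceReused!ParseError(false, "b"); }
    catch (ParseError e) { second = e; }

    assert(first is second);   // no second allocation
    assert(second.msg == "b"); // message reflects the latest failure
}
```

Code that needs to keep an exception around would have to copy it before the next throw of the same type, which is where a standard clone/dup convention would come in.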

- Jonathan M Davis
February 08, 2014
On Friday, 7 February 2014 at 23:45:06 UTC, Jonathan M Davis wrote:
> On Friday, February 07, 2014 23:01:46 Meta wrote:
>> On Friday, 7 February 2014 at 22:57:26 UTC, Jonathan M Davis wrote:
>> > On Friday, February 07, 2014 20:43:38 Dmitry Olshansky wrote:
>> >> 07-Feb-2014 20:29, Andrej Mitrovic wrote:
>> >> > On Friday, 7 February 2014 at 16:27:35 UTC, Andrei Alexandrescu wrote:
>> >> >> Add a bugzilla and let's define isValid that returns bool!
>> >> > 
>> >> > Add std.utf.decode() to that as well. IOW, it should have an overload which returns a status code
>> >> 
>> >> Much simpler - it returns a special dchar to designate bad encoding. And there is one defined by Unicode spec.
>> > 
>> > Isn't that actually worse? Unless you're suggesting that we stop throwing on decode errors, then functions like std.array.front will have to check the result on every call to see whether it was valid or not and thus whether they should throw, which would mean extra overhead over simply having decode throw on decode errors. validate has no business throwing, and we definitely should add isValidUnicode (or isValid or whatever you want to call it) for validation purposes. Code can then call that to validate that a string is valid and not worry about any UTFExceptions being thrown as long as it doesn't manipulate the string in a way that could result in its Unicode becoming invalid. However, I would argue that assuming that everyone is going to validate their strings and that pretty much all string-related functions shouldn't ever have to worry about invalid Unicode is just begging for subtle bugs all over the place IMHO. You're essentially dealing with error codes at that point, and I think that experience has shown quite clearly that error codes are generally a bad way to go. Almost no one checks them unless they have to. I think that having decode throw on invalid Unicode is exactly what it should be doing. The problem is that validate shouldn't.
>> > 
>> > - Jonathan M Davis
>> 
>> You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.
>
> How is that any better than returning an invalid dchar with a specific value?
> In either case, you have to check the value. With the exception, code doesn't
> have to care. If the string is invalid, it'll get a UTFException, and it can
> handle it appropriately, but having to check the return value just adds
> overhead (albeit minimal) and is error-prone, because it generally won't be
> checked (and if it is checked, it complicates the calling code, because it has
> to do the check).

We have had this discussion at least once before. A hypothetical Option type will not let you do anything with the wrapped value UNTIL you check it, as opposed to returning null, -1, some special Unicode value, etc. Trying to use it before this check is necessarily a compile-time error. This is both faster than exceptions and safer than special "error values" that are only special by convention. I recall that you've worked with Haskell before, so you must know how useful this pattern is.
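
For readers who haven't seen the pattern, here is a rough D sketch of what such a type could look like. `Option`, `match`, and `decodeAscii` are all hypothetical names; the sketch also assumes `Option` lives in its own module, since D's `private` is module-scoped and only then keeps callers away from the payload.

```d
// Hypothetical Option type. The payload is private, so code in other
// modules cannot read it without going through match.
struct Option(T)
{
    private T value;
    private bool hasValue;

    static Option some(T v) { return Option(v, true); }
    static Option none() { return Option(T.init, false); }

    // The only way to reach the payload: the caller must supply a
    // handler for both the "some" and the "none" case.
    R match(R)(R delegate(T) onSome, R delegate() onNone)
    {
        return hasValue ? onSome(value) : onNone();
    }
}

// Hypothetical decoder that only handles the single-byte (ASCII) case,
// to keep the example short.
Option!dchar decodeAscii(char c)
{
    return c < 0x80 ? Option!dchar.some(c) : Option!dchar.none();
}

void main()
{
    auto ok = decodeAscii('A')
        .match!int((dchar c) => cast(int) c, () => -1);
    auto bad = decodeAscii(cast(char) 0xC3)
        .match!int((dchar c) => cast(int) c, () => -1);

    assert(ok == 65);
    assert(bad == -1);
}
```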

> Code that doesn't want to risk a UTFException being thrown can validate up
> front - and that validator function should return bool and _not_ throw. But
> having decode not throw is going to be error-prone. It also doesn't help
> performance-wise, because it still has to do all of the same validity checks
> as it decodes. It's just that instead of throwing, it returns an error value.
> I really think that having decode throw on invalid Unicode is the right
> decision, and I don't see what we gain by making it not throw.
>
> - Jonathan M Davis

February 08, 2014
On Saturday, February 08, 2014 01:26:10 Meta wrote:
> >> You could always return an Option!char. Nullable won't work because it lets you access the naked underlying value.
> > 
> > How is that any better than returning an invalid dchar with a specific value? In either case, you have to check the value. With the exception, code doesn't have to care. If the string is invalid, it'll get a UTFException, and it can handle it appropriately, but having to check the return value just adds overhead (albeit minimal) and is error-prone, because it generally won't be checked (and if it is checked, it complicates the calling code, because it has to do the check).
> 
> We have had this discussion at least once before. A hypothetical Option type will not let you do anything with the wrapped value UNTIL you check it, as opposed to returning null, -1, some special Unicode value, etc. Trying to use it before this check is necessarily a compile-time error. This is both faster than exceptions and safer than special "error values" that are only special by convention. I recall that you've worked with Haskell before, so you must know how useful this pattern is.

The problem is that you need to check it. This is _slower_ than exceptions in the normal case, as invalid Unicode should be the rare case. The great thing with exceptions is that you can write your code as if it will always work and don't need to put checks in it everywhere. Instead, you just put try-catch blocks in the (relatively) few places that you want to handle exceptions. Most of your code doesn't care. And if you validate the string before you start doing a bunch of operations on it, then you don't have to worry about a UTFException being thrown. Also, if code fails to validate a string for one reason or another, the error gets reported rather than an invalid return value being ignored.

As for returning Optional/Nullable dchar vs an invalid dchar, I don't see much difference. In both cases, you have to check the return value, which is precisely what you don't want to have to do in most cases. And decode has to do the same work to check for valid Unicode whether it throws an exception or returns a value indicating decode-failure, so why have the extra overhead of having to check the result for decode-failure? Just let it throw an exception in that case and handle it in the appropriate part of your code. Returning a Nullable result or a specific bad value that you have to check rather than throwing an exception only makes sense when it's expected that failures are going to be frequent. If failures are infrequent, it's generally far better to use exceptions, because it will lead to much cleaner, less error-prone code.
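
As an illustration of "most of your code doesn't care", here is a small D sketch in which the inner functions are written as if decoding always succeeds and a single outer function handles the failure. `firstCodePoint`, `startsWithUpper`, and `classify` are made-up names; `decode`, `UTFException`, and `isUpper` are the existing Phobos symbols.

```d
import std.uni : isUpper;
import std.utf : decode, UTFException;

// Inner layers: no error handling at all.
dchar firstCodePoint(string s)
{
    size_t i = 0;
    return decode(s, i); // throws UTFException on malformed input
}

bool startsWithUpper(string s)
{
    return s.length != 0 && isUpper(firstCodePoint(s));
}

// Outer layer: the one place that decides what bad input means.
string classify(string s)
{
    try
    {
        return startsWithUpper(s) ? "upper" : "other";
    }
    catch (UTFException e)
    {
        return "invalid";
    }
}

void main()
{
    immutable(ubyte)[] raw = [0xFF, 0x41];

    assert(classify("\u00C9cole") == "upper");
    assert(classify("abc") == "other");
    assert(classify(cast(string) raw) == "invalid");
}
```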

- Jonathan M Davis
February 08, 2014
Jonathan M Davis:

> The problem is that you need to check it. This is _slower_ than exceptions in the normal case,

Right, but verifying the correctness of the Unicode encoding of a string probably on average requires much more time than testing a single conditional. So I think this tiny added time is acceptable.

Bye,
bearophile
February 08, 2014
On Saturday, February 08, 2014 02:41:54 bearophile wrote:
> Jonathan M Davis:
> > The problem is that you need to check it. This is _slower_ than exceptions in the normal case,
> 
> Right, but verifying the correctness of the Unicode encoding of a string probably on average requires much more time than testing a single conditional. So I think this tiny added time is acceptable.

But why even do it in the first place then? The code is cleaner and less error-prone if it uses exceptions. The only argument I can see being made for not using exceptions with decode is efficiency, because it's more cumbersome to use if it's returning error values of some kind rather than just throwing in the rare case that there's a Unicode decoding error. It's also more error-prone than using exceptions, because most code will just skip checking the result. That's one of the big reasons that error codes are generally a bad idea.

But since decode has to do the same validity checks whether it returns an invalid dchar or a Nullable!dchar or if it throws, I don't see why not having the exception buys us anything. It just makes the API worse.
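
To show what the error-value variant looks like from the caller's side, here is a hedged D sketch. `decodeOrReplacement` and `countBad` are hypothetical names, and the wrapper is built on the throwing `decode` only for brevity; the point is the shape of the API, not how it would really be implemented.

```d
import std.utf : decode, UTFException;

enum dchar replacement = '\uFFFD'; // U+FFFD REPLACEMENT CHARACTER

// Hypothetical non-throwing variant: the same validity checks happen
// inside decode, only the way the failure is reported differs.
dchar decodeOrReplacement(string s, ref size_t i)
{
    try
    {
        return decode(s, i);
    }
    catch (UTFException e)
    {
        ++i; // skip the offending code unit so the caller makes progress
        return replacement;
    }
}

// Every caller that cares about validity now has to compare the result.
size_t countBad(string s)
{
    size_t bad = 0;
    for (size_t i = 0; i < s.length;)
    {
        if (decodeOrReplacement(s, i) == replacement)
            ++bad; // also counts a genuine U+FFFD present in the input
    }
    return bad;
}

void main()
{
    immutable(ubyte)[] raw = [0x61, 0xFF, 0x62];

    assert(countBad("abc") == 0);
    assert(countBad(cast(string) raw) == 1);
}
```

A caller that forgets the comparison silently carries on with U+FFFD in its data, which is the failure mode described above.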

- Jonathan M Davis