Checking function parameters in Phobos
November 20, 2013
There's been recent discussion herein about what parameter validation method would be best for Phobos to adhere to.

Currently we are using a mix of approaches:

1. Some functions enforce()

2. Some functions just assert()

3. Some (fewer I think) functions assert(0)

4. Some functions don't do explicit checking, relying instead on lower-level enforcement such as null dereference and bounds checking to ensure safety.

Each method has its place. The question is what guidelines we put forward for Phobos code to follow; we're a bit loose about that right now.

A second, just as interesting topic, is how to design abstractions for speed and safety. There are cases in which spurious checking is prohibitively expensive, or at least unnecessary, so it should be avoided where possible. Examples:

(a) FracSecs(long x) validates x to be within range. The cost of the validation itself is about as high as the payload itself (which is one assignment).

(b) sort() offers a SortedRange with its goodies. We also have assumeSorted that also offers a SortedRange, but relies on the user to validate that assumption.

(c) A variety of text functions currently suffer because we don't make the difference between validated UTF strings and potentially invalid ones.

Walter and I are thinking of fostering the idiom in which types (or attributes?) are used as information about validation, similar to how assumeSorted works. Building on that, we'd have a function like "static FracSecs assumeValid(long)" inside FracSecs (no need for a different type here). Then, we'd have a CleanUTF type or something that would guarantee the string stored within has been validated.
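
As an illustration of the idiom described above (not code from the thread), here is a minimal sketch. The type `Fraction`, its range limit, and the helper `isValid` are assumptions made up for this example and are not the Phobos API; only the name `assumeValid` comes from the post.

```d
import std.exception : enforce;

/// Hypothetical stand-in for a fractional-seconds type; not the Phobos API.
struct Fraction
{
    private long hnsecs_; // assumed unit: hecto-nanoseconds, strictly less than one second

    /// True if x is strictly between -1 second and +1 second.
    static bool isValid(long x)
    {
        return x > -10_000_000 && x < 10_000_000;
    }

    /// Checked construction: validates in every build and throws on bad input.
    this(long x)
    {
        enforce(isValid(x), "Fraction: value out of range");
        hnsecs_ = x;
    }

    /// Unchecked construction: the caller takes responsibility for validity.
    /// The assert documents the assumption and is compiled out by -release.
    static Fraction assumeValid(long x)
    {
        assert(isValid(x), "assumeValid called with an invalid value");
        Fraction f;      // default-initialise, bypassing the checked constructor
        f.hnsecs_ = x;
        return f;
    }

    /// Read-only access to the stored value.
    long hnsecs() const { return hnsecs_; }
}
```

A caller that has already range-checked its value would write `Fraction.assumeValid(x)`, while everyone else keeps using `Fraction(x)` and gets the exception on bad input.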


Please chime in with ideas!

Andrei
November 20, 2013
On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei Alexandrescu wrote:
> There's been recent discussion herein about what parameter validation method would be best for Phobos to adhere to.
>
> Currently we are using a mix of approaches:
>
> 1. Some functions enforce()
>
> 2. Some functions just assert()
>
> 3. Some (fewer I think) functions assert(0)
>
> 4. Some functions don't do explicit checking, relying instead on lower-level enforcement such as null dereference and bounds checking to ensure safety.
>
> Each method has its place. The question is what guidelines we put forward for Phobos code to follow; we're a bit loose about that right now.
>
> A second, just as interesting topic, is how to design abstractions for speed and safety. There are cases in which spurious checking is prohibitively expensive, or at least unnecessary, so it should be avoided where possible. Examples:
>
> (a) FracSecs(long x) validates x to be within range. The cost of the validation itself is about as high as the payload itself (which is one assignment).
>
> (b) sort() offers a SortedRange with its goodies. We also have assumeSorted that also offers a SortedRange, but relies on the user to validate that assumption.
>
> (c) A variety of text functions currently suffer because we don't make the difference between validated UTF strings and potentially invalid ones.
>
> Walter and I are thinking of fostering the idiom in which types (or attributes?) are used as information about validation, similar to how assumeSorted works. Building on that, we'd have a function like "static FracSecs assumeValid(long)" inside FracSecs (no need for a different type here). Then, we'd have a CleanUTF type or something that would guarantee the string stored within has been validated.
>
>
> Please chime in with ideas!
>
> Andrei

I'm not a Phobos dev, but as a user of Phobos coming from C/C++ I'd like to see...

Less enforce and more debug-only contracts in the std lib, with opt-in run-time checks for release builds.

That way I can decide on a function-by-function basis or globally at compile time whether the run-time checks occur in release builds.

For example, given:

1. FracSecs(long x)
2. FracSecs!Args.verify(long x)

In debug 1. would always have full run-time checking enabled. In release builds 1. would only have essential run-time checks, preferably none. I can then opt-in for run-time checks in release builds using 2.

There would also be a version(ArgsVerify) so I can turn on run-time checks globally at compile time in release builds (maybe the --debug flag allows this already, not sure).
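
A rough sketch of how such a global switch could look, assuming the version identifier `ArgsVerify` suggested above. The function `checkedPercent` is hypothetical and exists only for illustration.

```d
import std.exception : enforce;

/// Hypothetical function used only to illustrate the opt-in switch.
int checkedPercent(int p)
{
    version (ArgsVerify)
    {
        // Opt-in: compile with -version=ArgsVerify; this check survives -release.
        enforce(p >= 0 && p <= 100, "percent out of range");
    }
    else
    {
        // Default: a debug-only check that -release removes.
        assert(p >= 0 && p <= 100, "percent out of range");
    }
    return p;
}
```

With this layout the choice is made once per build rather than per call, which matches the "globally at compile time" half of the proposal.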

Of course this unfortunately requires even more work from Phobos devs and I'm not a D expert so I don't know how viable it would be.

Whatever is decided I'm looking forward to seeing what you guys come up with because I'm currently using Phobos as my "Idiomatic D" reference guide.

Thanks G.
November 20, 2013
Andrei Alexandrescu:

> There's been recent discussion herein about what parameter validation method would be best for Phobos to adhere to.
>
> Currently we are using a mix of approaches:
>
> 1. Some functions enforce()
>
> 2. Some functions just assert()
>
> 3. Some (fewer I think) functions assert(0)
>
> 4. Some functions don't do explicit checking, relying instead on lower-level enforcement such as null dereference and bounds checking to ensure safety.
>
> Each method has its place. The question is what guidelines we put forward for Phobos code to follow; we're a bit loose about that right now.

I think Phobos should rely much more on Contract Programming based on asserts. This could mean DMD automatically using a Phobos compiled with asserts when you compile your D code normally, and automatically using an assert-stripped version of the Phobos libs when you compile with -release and similar.

In other situations enforce and exceptions are still useful.


> (b) sort() offers a SortedRange with its goodies. We also have assumeSorted that also offers a SortedRange, but relies on the user to validate that assumption.

I'd like another function, that could be named validateSorted() that returns a SortedRange and always fully verifies its range argument is actually sorted, and throws an exception otherwise. So it doesn't assume its input is sorted. It's like a isSorted + assumeSorted.
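
The "isSorted + assumeSorted" combination described above could be sketched roughly as follows. `validateSorted` is the hypothetical name from the post, not an existing Phobos function; `isSorted`, `assumeSorted` and `enforce` are the existing Phobos calls.

```d
import std.algorithm : isSorted;
import std.exception : enforce;
import std.range : assumeSorted;

/// Hypothetical helper (not part of Phobos): verifies the whole range, O(n),
/// then wraps it in a SortedRange. Throws if the range is not sorted.
/// Intended for arrays and other value-like ranges in this sketch.
auto validateSorted(alias less = "a < b", R)(R r)
{
    enforce(isSorted!less(r), "validateSorted: input range is not sorted");
    return assumeSorted!less(r);
}
```

For example, `validateSorted([1, 2, 3, 5]).contains(3)` performs one linear check up front and then a binary search, whereas `assumeSorted` would skip the linear check entirely.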


> (c) A variety of text functions currently suffer because we don't make the difference between validated UTF strings and potentially invalid ones.

Often I have genomic data or other text data that is surely ASCII (and I can accept a run-time exception at loading time if it's not ASCII). Once such text is in memory I'd like to not pay for UTF on it. Sometimes you can do this with std.string.representation, but there is no opposite function (http://d.puremagic.com/issues/show_bug.cgi?id=10162 ). Also in Phobos there are several string/char functions that could be made faster if the input is assumed to be ASCII. To solve this problem in languages such as Haskell they usually introduce a new type like AsciiString. In the past I have suggested introducing such a string wrapper in Phobos.


> Then, we'd have a CleanUTF type or something that would guarantee the string stored within has been validated.

In recent talks Bjarne Stroustrup has been advocating such usage of types for safety in C++11/C++14 a lot, and functional programmers have used it often for a long time. OCaml programmers use this style of coding to write "safer" code all the time.

Too many types make the code harder (also because D doesn't have de-structuring syntax in function signatures and so on), but a few strategically designed structs can help.

Bye,
bearophile
November 20, 2013
On Wednesday, 20 November 2013 at 00:48:40 UTC, bearophile wrote:
> [snip]
> Often I have genomic data or other text data that is surely ASCII (and I can accept a run-time exception at loading time if it's not ASCII). Once such text is in memory I'd like to not pay for UTF on it. Sometimes you can do this with std.string.representation, but there is no opposite function (http://d.puremagic.com/issues/show_bug.cgi?id=10162 ). Also in Phobos there are several string/char functions that could be made faster if the input is assumed to be ASCII. To solve this problem in languages such as Haskell they usually introduce a new type like AsciiString. In the past I have suggested introducing such a string wrapper in Phobos.
>

Is that not what Phobos's AsciiString is? http://dlang.org/phobos/std_encoding.html#.AsciiString

November 20, 2013
On 2013-11-20 01:01, Andrei Alexandrescu wrote:
> There's been recent discussion herein about what parameter validation
> method would be best for Phobos to adhere to.
>
> Currently we are using a mix of approaches:
>
> 1. Some functions enforce()
>
> 2. Some functions just assert()
>
> 3. Some (fewer I think) functions assert(0)
>
> 4. Some functions don't do explicit checking, relying instead on
> lower-level enforcement such as null dereference and bounds checking to
> ensure safety.
>
> Each method has its place. The question is what guidelines we put
> forward for Phobos code to follow; we're a bit loose about that right now.
>
> A second, just as interesting topic, is how to design abstractions for
> speed and safety. There are cases in which spurious checking is
> prohibitively expensive, or at least unnecessary, so it should be
> avoided where possible. Examples:
>
> (a) FracSecs(long x) validates x to be within range. The cost of the
> validation itself is about as high as the payload itself (which is one
> assignment).
>
> (b) sort() offers a SortedRange with its goodies. We also have
> assumeSorted that also offers a SortedRange, but relies on the user to
> validate that assumption.
>
> (c) A variety of text functions currently suffer because we don't make
> the difference between validated UTF strings and potentially invalid ones.
>
> Walter and I are thinking of fostering the idiom in which types (or
> attributes?) are used as information about validation, similar to how
> assumeSorted works. Building on that, we'd have a function like "static
> FracSecs assumeValid(long)" inside FracSecs (no need for a different
> type here). Then, we'd have a CleanUTF type or something that would
> guarantee the string stored within has been validated.

Would we accompany the assumeSorted with an assert in the function assuming something is sorted? We probably don't want to rely on convention.

What about distributing a version of druntime and Phobos with asserts enabled that is used by default (or with the -debug flag)? A version with asserts disabled would then be used when the -release flag is given.

We probably also want it to be possible to use Phobos with asserts enabled even in release mode.

-- 
/Jacob Carlborg
November 20, 2013
On 20/11/13 01:01, Andrei Alexandrescu wrote:
> There's been recent discussion herein about what parameter validation method
> would be best for Phobos to adhere to.
>
> Currently we are using a mix of approaches:
>
> 1. Some functions enforce()
>
> 2. Some functions just assert()
>
> 3. Some (fewer I think) functions assert(0)
>
> 4. Some functions don't do explicit checking, relying instead on lower-level
> enforcement such as null dereference and bounds checking to ensure safety.
>
> Each method has its place. The question is what guidelines we put forward for
> Phobos code to follow; we're a bit loose about that right now.

Regarding enforce() vs. assert(), a good rule that I remember having suggested to me was that enforce() should be used for actual runtime checking (e.g. checking that the input to a public API function has correct properties), assert() should be used to test logical failures (i.e. checking that cases which should never arise, really don't arise).

I've always followed that as a rule of thumb ever since.
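
To make that rule of thumb concrete, here is a small sketch. Both functions are hypothetical examples written for this illustration, not Phobos code.

```d
import std.exception : enforce;

/// Public entry point: the argument comes from outside, so it is checked
/// with enforce, which throws in every build.
double average(const(double)[] samples)
{
    enforce(samples.length > 0, "average: need at least one sample");
    return sumOf(samples) / samples.length;
}

/// Internal helper: an empty slice here would be a logic error in this
/// module, so it is checked with assert, which -release removes.
private double sumOf(const(double)[] samples)
{
    assert(samples.length > 0, "sumOf: caller must reject empty input");
    double total = 0;
    foreach (x; samples)
        total += x;
    return total;
}
```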
November 20, 2013
On 11/19/2013 4:01 PM, Andrei Alexandrescu wrote:
> There's been recent discussion herein about what parameter validation method
> would be best for Phobos to adhere to.

What is important is deciding what the notions of "validated data" and "untrusted data" are.

1. Validated data should get asserts if it is found to be invalid.

2. Untrusted data should get exceptions thrown if it is found to be invalid (or return errors).

For example, consider a utf string. If it has passed a validation check, then it becomes trusted data. Further processing on it should assert if it turns out to be invalid (because then you've got a programming bug).

File open failures should always throw, and never assert, because the file is not part of the program and so is inherently not trusted.

One way to distinguish validated from untrusted data is by using different types (or a naming convention, see Joel Spolsky's http://www.joelonsoftware.com/articles/Wrong.html).

It is of major importance in a program to think about what APIs get validated arguments and what APIs get untrusted arguments.
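
One possible reading of the "different types" approach, sketched under assumptions: the wrapper `ValidatedText` and the function `countCodePoints` are hypothetical names invented for this example; only `std.utf.validate` and `UTFException` are existing Phobos symbols.

```d
import std.utf : validate, UTFException;

/// Hypothetical wrapper: holding one means the text has passed validation.
struct ValidatedText
{
    private string text_;

    /// Boundary between untrusted and validated data.
    /// Throws UTFException if the bytes are not well-formed UTF-8.
    static ValidatedText fromUntrusted(const(ubyte)[] raw)
    {
        auto s = cast(const(char)[]) raw;
        validate(s);
        return ValidatedText(s.idup);
    }

    /// The validated text.
    string text() const { return text_; }
}

/// Takes validated data only, so it does not re-validate. If the assert
/// fires, that is a programming bug rather than bad input.
size_t countCodePoints(ValidatedText t)
{
    assert(t.text.length == 0 || (t.text[0] & 0xC0) != 0x80,
           "validated text cannot start with a continuation byte");
    size_t n;
    foreach (char c; t.text)        // iterate code units, no decoding
        if ((c & 0xC0) != 0x80)     // count every byte that starts a code point
            ++n;
    return n;
}
```

The exception lives at the single point where untrusted bytes enter; everything downstream accepts `ValidatedText` and can skip the check.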
November 20, 2013
On 11/19/2013 4:48 PM, bearophile wrote:
> Also in Phobos there are
> several string/char functions that could be made faster if the input is assumed
> to be ASCII.

Which ones? The ones I coded up originally were designed so they weren't degraded by utf.

November 20, 2013
On Tuesday, November 19, 2013 16:01:00 Andrei Alexandrescu wrote:
> Please chime in with ideas!

In general, I favor using defensive programming in library APIs and using enforce to validate the input to functions. Doing so makes it much harder to misuse the library and makes it much less likely that programs will run into weird and/or undefined behavior or other types of bugs. I then favor using DbC within a library or application for its own code and asserting that input is valid in those cases, because in that case, the caller is essentially part of the same code that's doing the asserting and is maintained by the same people.

The problem with that is of course that there are cases where performance degrades when you use defensive programming and always check input - especially when the caller can know that the data is valid without having to check it first. So, having a way to use an API that doesn't involve it always defensively checking its input can be useful for the sake of efficiency.

Unfortunately, I don't think that it scales at all to take the approach that Walter has suggested of having the API normally assert on input and provide helper functions which the caller can use to validate input when they deem appropriate. That has the advantage of giving the caller control over what is and isn't checked and avoiding unnecessary checks, but it also makes it much easier to misuse the API, and I would expect the average programmer to skip the checks in most cases. It very quickly becomes like using error codes instead of exceptions, except that in this case, instead of an error code being ignored, the data's validity wouldn't have even been checked in the first place, resulting in the function being called doing who-knows-what. And the resulting bugs could be very obvious, or they could be insidiously hard to detect.

So, if we can find a way to default to checking validity and throwing on bad input but still provide a way for the caller to avoid the checks when appropriate, I think that that would be ideal. That way, we default to correctness and user-friendliness (in that the API is harder to silently use incorrectly), but we still provide a more performant route for those who know how to use the API correctly and are willing to take responsibility for ensuring that they don't feed it bad input.

Now, how we do that, I don't know. In some cases, creating a wrapper type would solve the problem (e.g. some kind of wrapper for strings which guaranteed UTF-correctness). But I don't think that it scales to use wrapper types for all such situations. One alternative is to essentially duplicate a lot of functions with one function validating the input for you and throwing on failure, and the other asserting that the input is valid. But that could result in a lot of code duplication, which isn't terribly desirable either.

The assumeSorted or FracSec.assumeValid solutions seem to go either with a wrapper type or with essentially being a second function which does the same thing but without the validation depending on the types involved and what the function is doing.

Another alternative would be to provide an argument (probably a template argument, though it could be a function argument if that makes more sense) which told the function whether it should assert or enforce on its input. That would at least localize the code duplication, but again, that could get a bit verbose, and I do like how assumeXYZ makes it abundantly clear that the caller is taking responsibility for the correctness in that case.
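
A minimal sketch of the template-argument alternative, with enforce as the default. The enum `Validation` and the function `percentOf` are hypothetical names chosen for this illustration.

```d
import std.exception : enforce;

/// Hypothetical policy flag selecting how input is checked.
enum Validation { checked, assumed }

/// Validates with enforce by default. Callers who guarantee valid input
/// can opt out with percentOf!(Validation.assumed), leaving only an assert.
int percentOf(Validation v = Validation.checked)(int value, int percent)
{
    static if (v == Validation.checked)
        enforce(percent >= 0 && percent <= 100, "percent must be in 0 .. 100");
    else
        assert(percent >= 0 && percent <= 100, "caller promised a valid percent");
    return value * percent / 100;
}
```

`percentOf(200, 50)` takes the checked path, while `percentOf!(Validation.assumed)(200, 50)` is the explicit opt-out. The function body is written once, at the cost of a more verbose call site.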

And in some situations, I think that it would clearly be the case that it wouldn't make any sense to do anything other than enforce on the input (e.g. string parsing functions have a tendency to have to do almost the same work in the validation function as in the actual parsing function, making it almost pointless to have a separate validation function).

So, I think that what we end up doing is definitely going to depend on what the code in question is for and what it's doing, but I agree that it would be valuable to come up with some common idioms for handling validation and error checking, and assumeXYZ would be one such idiom and one which documents things nicely when it can be used.

Still, the most important point that I'd like to make is that I think we should lean towards validating input with enforce by default and then provide alternative means to avoid that validation rather than using assertions and DbC by default, because leaving the validation up to the caller in release and asserting in debug is going to lead to _far_ more bugs in code using Phobos, particularly when the result isn't immediately and obviously wrong when bad input is given. And the fact that by default, the assertions in Phobos won't be hit in calling code unless the Phobos function is templatized (because Phobos will have been compiled in release) makes using assertions that much worse.

But I'll definitely have to think about idioms that we could use to do separate validation where appropriate and yet validate arguments via enforce by default.

- Jonathan M Davis
November 20, 2013
On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei Alexandrescu wrote:
> (c) A variety of text functions currently suffer because we don't make the difference between validated UTF strings and potentially invalid ones.

I think it is fair to always assume that a char[] is a valid UTF-8 string, and instead perform the validation when creating/filling the string from a non-validated source.

Take std.file.read() as an example; it returns void[], but has a validating counterpart in std.file.readText().

I think we should use ubyte[] to a greater extent for data which is potentially *not* valid UTF.  Examples include interfacing with C functions, where I think there is a tendency towards always translating C char to D char, when they are in fact not equivalent.  Another example is, again, std.file.read(), which currently returns void[].  I guess it is a matter of taste, but I think ubyte[] would be more appropriate here, since you can actually use it for something without casting it first.

The transition from string to ubyte[] is already made simple by std.string.representation.  We should offer an equally simple and convenient way to do the opposite transformation.  In one of my current projects, I am using this function:

  inout(char)[] asString(inout(ubyte)[] data) @safe pure
  {
    auto s = cast(typeof(return)) data;
    import std.utf: validate;
    validate(s);
    return s;
  }

This could easily be written as a template, to accept wider encodings as well, and I think it would be a nice addition to Phobos.
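
One way the templated variant might look, as a sketch only. The names `CharFor` and `asText` are hypothetical, and the mapping from integer width to character type is an assumption about what "wider encodings" would mean here.

```d
import std.utf : validate;

/// Hypothetical mapping from a raw code-unit type to its character type.
template CharFor(U)
{
    static if (is(U == ubyte))       alias CharFor = char;
    else static if (is(U == ushort)) alias CharFor = wchar;
    else static if (is(U == uint))   alias CharFor = dchar;
    else static assert(0, "no character type for " ~ U.stringof);
}

/// Reinterprets raw code units as UTF-8/16/32 text and validates them.
/// Throws UTFException on malformed input. No copy is made.
inout(CharFor!U)[] asText(U)(inout(U)[] data)
{
    auto s = cast(inout(CharFor!U)[]) data;
    validate(s);
    return s;
}
```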

Lars