Jump to page: 1 2
Thread overview
Please do not use 'auto' return types without thoroughly describing the interface
Dec 26
Mark
Dec 28
Mark
December 25
If you have a function with a return type listed as `auto`, please thoroughly describe what interface the return value provides.

"Returns a forward range of Elem", where Elem is a template parameter, is fine, for instance.

But look at std.regex.RegexMatch.front:

"Functionality for processing subsequent matches of global regexes via range interface"

It has an example that mentions a `hit` property. Other parts of the RegexMatch interface mention `pre` and `post` properties. What are they? It is a mystery!

Turns out that the return type is a Captures struct, and that could be made explicit with little effort. (Such as https://github.com/dlang/phobos/pull/5963.)

This *is* redundant. Looking at the source code lets you find the return type in less than thirty seconds. Or you can put into your code: `pragma(msg, ReturnType!(func).stringof);` However, I imagine we'd lose a lot of people in the "see missing documentation" → "look at source code" step.
December 25
I would like to fix this using signatures[0].
Signatures have a number of benefits here, including removing of TypeInfo (which prevents code-bloat).

There is a few other examples based upon allocators here[1].

But this isn't a short term goal even if it does get approved ;)

[0] https://github.com/rikkimax/DIPs/blob/master/DIPs/DIP1xxx-RC.md
[1] https://github.com/rikkimax/stdc-signatures/blob/master/stdc/memory/allocators.d
December 25
On Monday, 25 December 2017 at 03:23:33 UTC, Neia Neutuladh wrote:
> If you have a function with a return type listed as `auto`, please thoroughly describe what interface the return value provides.
>

I would just like to say that I strongly agree. Lack of documentation of template parameters and other unspecified types in interfaces, like auto return types is one of the primary reasons for people to turn away from template libraries and regard templates as obscure.

> "Returns a forward range of Elem", where Elem is a template parameter, is fine, for instance.
>
> But look at std.regex.RegexMatch.front:
>
> "Functionality for processing subsequent matches of global regexes via range interface"
>
> It has an example that mentions a `hit` property. Other parts of the RegexMatch interface mention `pre` and `post` properties. What are they? It is a mystery!
>
> Turns out that the return type is a Captures struct, and that could be made explicit with little effort. (Such as https://github.com/dlang/phobos/pull/5963.)
>
> This *is* redundant. Looking at the source code lets you find the return type in less than thirty seconds. Or you can put into your code: `pragma(msg, ReturnType!(func).stringof);` However, I imagine we'd lose a lot of people in the "see missing documentation" → "look at source code" step.

This is not redundant. The documentation doesn't give enough information to use the interface, so its simply incomplete.
December 25
On Mon, Dec 25, 2017 at 04:26:52PM +0000, Piotr Klos via Digitalmars-d wrote:
> On Monday, 25 December 2017 at 03:23:33 UTC, Neia Neutuladh wrote:
> > If you have a function with a return type listed as `auto`, please thoroughly describe what interface the return value provides.
> > 
> 
> I would just like to say that I strongly agree. Lack of documentation of template parameters and other unspecified types in interfaces, like auto return types is one of the primary reasons for people to turn away from template libraries and regard templates as obscure.

While I agree that all template parameters ought to be documented and all auto return types thoroughly described, I disagree with explicit naming of auto return types. The whole point of auto return types is to return an *opaque* type that user code should not depend on, apart from what the documentation says you can do with the type.  It's a matter of encapsulation, i.e., "you can do X, Y, Z with the return value of this function, everything else is none of your business". Or, in other words, if your code can't possibly work without knowing the explicit type of the return value, then you're doing something wrong (using the function wrongly).

Ideally, you shouldn't even be able to create an instance of the type yourself, hence the name Voldemort type (the type that can't be named) though there is a valid use case for storing instances of such types in aggregates like structs and classes. For the latter use case, typeof(...) is your friend.

The idea behind this kind of coding style is to make your code as generic as possible, i.e., it will work knowing only the bare minimum about the data it's given, so it can handle a wider variety of data than when types are hard-coded.  Naming explicit library types creates an unnecessary dependency between user code and library implementation details, and reduces encapsulation.


[...]
> > But look at std.regex.RegexMatch.front:
> > 
> > "Functionality for processing subsequent matches of global regexes via range interface"
> > 
> > It has an example that mentions a `hit` property. Other parts of the RegexMatch interface mention `pre` and `post` properties. What are they?  It is a mystery!
> > 
> > Turns out that the return type is a Captures struct, and that could be made explicit with little effort. (Such as https://github.com/dlang/phobos/pull/5963.)
[...]

I strongly disagree with this kind of change.  The exact type does not and should not need to be known by user code. It's an implementation detail.  The whole idea is that user code shouldn't need to know what the type is in order to be able to work with it.  This is part of the encapsulation of the library type.  Making the type name visible to user code means that future Phobos releases can no longer rename the type or otherwise substitute it with something else without breaking user code. This breaks encapsulation.  Making the type non-explicit in user code means Phobos can transparently change the concrete type and user code will still work as before after recompilation.

Even if you need to store the result in a struct member, you can use typeof(...) to implicitly get the type needed to declare the member. In this case, the typeof(...) will automatically adjust itself to whatever the return type is when a future release of Phobos changes the underlying implementation, so it keeps user code independent of Phobos implementation details and preserves encapsulation.

Now granted, if the documentation is unclear about what exactly you can do with the type, then the docs are at fault and need to be improved. If what exactly can be done with .pre and .post isn't made clear, then the docs need to be fixed.  But the declaration of such types ought to remain internal to Phobos and opaque to user code.


T

-- 
This is a tpyo.
December 26
On Monday, 25 December 2017 at 22:48:39 UTC, H. S. Teoh wrote:
> The exact type does not and should not need to be known by user code.

It's a public type reported by some of the documentation but not other parts. Its capabilities are given vaguely in the docs for the functions that return it.

This is a documentation problem, but concrete return types will put a hard limit on how bad the documentation situation can be.

> It's an implementation detail.

The fact that there is a named type involved is not shocking. If there is not a named type, we can use something like:

  public alias Regex(Char) = typeof(regex([Char.init]));

I can add it in my own code too. Pretty much no chance of breakage.

The fact that two functions return values of the same type is perhaps an implementation detail, but there's no way to ensure you don't break user code that way besides using a unique type for each (hello, std.typecons.Typedef). Like I might read the docs and, quite reasonably, write:

  auto capture = needFirst ? matchFirst(str, regex) : matchAll(str, regex).skip(3).front;
  auto process = asShell ? spawnShell("echo hi") : spawnProcess(["/bin/echo", "hi"]);

And that's suddenly going to stop compiling some day. Maybe. Except that would be bonkers because nobody's going to write two different `Captures` structs to be used in nearly identical circumstances. It would be pointless work to have two different types with the same interface for processes.

The fact that it's somehow absolutely essential to maintain this option in all current cases, but not so essential as to motivate anyone to deprecate public types for anything or use Typedef everywhere, is kind of worrying.
December 26
On Monday, 25 December 2017 at 22:48:39 UTC, H. S. Teoh wrote:
> While I agree that all template parameters ought to be documented and all auto return types thoroughly described, I disagree with explicit naming of auto return types. The whole point of auto return types is to return an *opaque* type that user code should not depend on, apart from what the documentation says you can do with the type.  It's a matter of encapsulation, i.e., "you can do X, Y, Z with the return value of this function, everything else is none of your business". Or, in other words, if your code can't possibly work without knowing the explicit type of the return value, then you're doing something wrong (using the function wrongly).
>
> T

Maybe we can document the interface of the return type using signature constraints? For instance, for the function:

auto map(Range)(Range r) if (isInputRange!(Unqual!Range));

we'd add the following to its constraints:

isInputRange!(ReturnType!(map!R))

However, this does not compile at the moment. I'm not sure why.
December 26
On Monday, 25 December 2017 at 22:48:39 UTC, H. S. Teoh wrote:
> On Mon, Dec 25, 2017 at 04:26:52PM +0000, Piotr Klos via Digitalmars-d wrote:
>> On Monday, 25 December 2017 at 03:23:33 UTC, Neia Neutuladh wrote:
>> > If you have a function with a return type listed as `auto`, please thoroughly describe what interface the return value provides.
>> > 
>> 
>> I would just like to say that I strongly agree. Lack of documentation of template parameters and other unspecified types in interfaces, like auto return types is one of the primary reasons for people to turn away from template libraries and regard templates as obscure.
>
> While I agree that all template parameters ought to be documented and all auto return types thoroughly described, I disagree with explicit naming of auto return types. The whole point of auto return types is to return an *opaque* type that user code should not depend on, apart from what the documentation says you can do with the type.  It's a matter of encapsulation, i.e., "you can do X, Y, Z with the return value of this function, everything else is none of your business". Or, in other words, if your code can't possibly work without (...)

While I agree with the sentiment that encapsulation is good and all, I think the emphasis is on the wrong thing here. It seems that you are happy to take forever thinking about the right thing to do, even at the price of eternal unusability. The desing goals of standard library should be the exact opposite, i.e. make things 100% usable even at the cost of making less features, and then maybe later make more features if a need appears.
If something is not usable yet, it shouldn't be in the stdlib!

So in my humble opinion the process for getting something into a stdlib should be create + document + assess usability -> repeat -> publish or reject, not create -> publish -> document, especially for template libs!
Otherwise you risk that the lib won't be very usable after all despite the best intentions (if users are even patient enouh to wait for the proper docs). For example look at all the template parameters in boost c++ libraries, most of which are never used.
December 27
On Tue, Dec 26, 2017 at 11:28:56AM +0000, Mark via Digitalmars-d wrote:
> On Monday, 25 December 2017 at 22:48:39 UTC, H. S. Teoh wrote:
> > While I agree that all template parameters ought to be documented and all auto return types thoroughly described, I disagree with explicit naming of auto return types. The whole point of auto return types is to return an *opaque* type that user code should not depend on, apart from what the documentation says you can do with the type. It's a matter of encapsulation, i.e., "you can do X, Y, Z with the return value of this function, everything else is none of your business". Or, in other words, if your code can't possibly work without knowing the explicit type of the return value, then you're doing something wrong (using the function wrongly).
> > 
> > T
> 
> Maybe we can document the interface of the return type using signature constraints? For instance, for the function:
> 
> auto map(Range)(Range r) if (isInputRange!(Unqual!Range));
> 
> we'd add the following to its constraints:
> 
> isInputRange!(ReturnType!(map!R))
> 
> However, this does not compile at the moment. I'm not sure why.

I'm not sure why either, but signature constraints are intended to be used for constraining the range of incoming template arguments that the template will accept. It's not intended to establish a contract on what the template is going to produce when instantiated with those arguments, though this could well be an enhancement request. Perhaps this should be filed in bugzilla so that it won't get forgotten.

The best we can do currently, which unfortunately won't show up in the docs, is to use a static assert to force compilation failure when the return type doesn't match expectations, e.g.:

	auto myFunc(Args...)(Args args) {
		struct Result {
			...
		}
		static assert(isInputRange!Result);
		return Result(args);
	}

Or similarly:

	auto myFunc(Args...)(Args args) {
		return ...;
		assert(is(typeof(return) == ExpectedType));
	}


T

-- 
The easy way is the wrong way, and the hard way is the stupid way. Pick one.
December 27
On Wednesday, 27 December 2017 at 16:36:59 UTC, H. S. Teoh wrote:
> The best we can do currently, which unfortunately won't show up in the docs, is to use a static assert to force compilation failure when the return type doesn't match expectations, e.g.:
[...]
> 		static assert(isInputRange!Result);

I'm more concerned about types that are specific to a purpose with an interface that is not standard and widely used across a lot of D code. Which is why I'm talking about std.regex.Captures and not std.algorithm.iteration.MapResult: MapResult is just a range implementation, and ranges are a core concept in D today.

And this is about documentation, so a static assert inside the implementation doesn't really help.
December 27
On Tue, Dec 26, 2017 at 01:50:06AM +0000, Neia Neutuladh via Digitalmars-d wrote:
> On Monday, 25 December 2017 at 22:48:39 UTC, H. S. Teoh wrote:
> > The exact type does not and should not need to be known by user code.
> 
> It's a public type reported by some of the documentation but not other parts. Its capabilities are given vaguely in the docs for the functions that return it.
> 
> This is a documentation problem, but concrete return types will put a hard limit on how bad the documentation situation can be.

Then the documentation needs to be fixed.  Resorting to concrete return types seems to me to be a pessimistic approach, when we should rather be improving the code / docs.


> > It's an implementation detail.
> 
> The fact that there is a named type involved is not shocking. If there is not a named type, we can use something like:
> 
>   public alias Regex(Char) = typeof(regex([Char.init]));
> 
> I can add it in my own code too. Pretty much no chance of breakage.
> 
> The fact that two functions return values of the same type is perhaps an implementation detail, but there's no way to ensure you don't break user code that way besides using a unique type for each (hello, std.typecons.Typedef).

If more than one function is returning the same type, then that's reasonable grounds to ask for a named return type.

OTOH, there's the case where the API may reasonably return distinct types, but just so happens that the present implementation reuses the same type, in which case user code should not rely on the return types being compatible with each other.  Your example below of matchFirst vs. matchAll could, arguably, be an example of this category, since conceivably, matchFirst could return a type that encapsulates a single match, whereas matchAll could return a range of such matches.  Then potentially a future version of the library could use different return types, even if the current implementation does reuse the same type. Assuming that the return types are compatible is an iffy proposition at best.

Keeping the return types opaque helps to discourage this sort of misplaced assumption in user code.


> Like I might read the docs and, quite reasonably, write:
> 
>   auto capture = needFirst ? matchFirst(str, regex) : matchAll(str,
> regex).skip(3).front;
>   auto process = asShell ? spawnShell("echo hi") :
> spawnProcess(["/bin/echo", "hi"]);

I think you're missing some context there, I'm not sure what the second line of code has to do with `capture` in the first line. Care to clarify?


> And that's suddenly going to stop compiling some day. Maybe. Except that would be bonkers because nobody's going to write two different `Captures` structs to be used in nearly identical circumstances. It would be pointless work to have two different types with the same interface for processes.

Not necessarily, if matchFirst returns a single Match type, say, and matchAll returns a range of Match elements. Then you could very well have different, incompatible return types here.


> The fact that it's somehow absolutely essential to maintain this option in all current cases, but not so essential as to motivate anyone to deprecate public types for anything or use Typedef everywhere, is kind of worrying.

I wouldn't say it's *absolutely essential*, just that imposing the least amount of constraints on the code (including possible future changes) is a desirable thing to have.


T

-- 
If you're not part of the solution, you're part of the precipitate.
« First   ‹ Prev
1 2