Algorithms should be free from rich types

July 05, 2023
Re: Algorithms should be free from rich types
Posted by H. S. Teoh
in reply to Steven Schveighoffer
On Mon, Jul 03, 2023 at 10:14:38PM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> On 7/3/23 3:27 PM, H. S. Teoh wrote:
[...]
> > As I said, the *ideal* is that you wouldn't have private state, or that the private state would be minimal.  In practice, of course, certain things *should* be private, and that's not a problem. The problems the OP described arise when either private is used carelessly, causing things to be private that really need not be, or the API is poorly designed, so that parts of the library that ought to be reusable aren't just because of some arbitrary decision made by the author.
> 
> 
> If you carelessly label your fields as public, then realizing later they should have been private is costly, maybe impossible.

Depends.  D is flexible enough that public fields can be replaced with access functions, and almost no downstream code has to change to adapt.  I've done it a lot in my own code, where some field, say mydata, was previously public but now needs to be private.  No problem: just rename it to _mydata, and create access functions mydata() and mydata(typeof(_mydata)) to maintain compatibility with old code.  Unless downstream code does something like take the address of the old field, this change will be transparent: a recompile will make it all work as before without requiring further changes.
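A minimal sketch of that rename-and-wrap migration. The struct name Widget and the int field type are made up for illustration; only mydata and _mydata come from the description above.

```d
struct Widget
{
    // Previously: `int mydata;` as a public field.
    private int _mydata;

    // Getter: old code that reads `w.mydata` keeps compiling.
    @property int mydata() const { return _mydata; }

    // Setter: old code that writes `w.mydata = x` keeps compiling.
    @property void mydata(typeof(_mydata) value) { _mydata = value; }
}

void main()
{
    Widget w;
    w.mydata = 42;            // now calls the setter
    assert(w.mydata == 42);   // now calls the getter

    // The caveat: `&w.mydata` no longer yields a pointer to an int,
    // so code that took the field's address needs a manual fix.
}
```

The setter is also the natural place to add validation later, which is usually why the field had to become private in the first place.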


> If you carelessly label your fields as private, while it might upset some people, making them public later is easy.

The point is that it then bottlenecks on the author. If the author is not responsive for whatever reason (busy, abandoned the project, etc.) downstream users are stuck up the creek without a paddle.


> So if you are going to "not care" about public/private, technically the less risky choice is to make everything private, and worry about it later if it becomes an issue. So in that sense I disagree with the OP point.

OK, I guess we differ on this point.  Given the choice between having to wait for a potentially MIA author to fix an issue and having the ability to go under the hood to manually work around the issue, I choose the latter.


> That being said, I've done a lot of libs where I just don't care and leave everything public. It's mostly because I don't expect widespread usage, and I also don't mind breaking people's code (I don't think any of my projects that I started are past 1.0 yet). But something like Phobos shouldn't be so careless. We really should continue to make careless things private unless there is a good reason to make them public.

I guess this has to be judged on a case-by-case basis.


> > I've never heard people complaining about how the array length data field is private, for example.  That's because it being private does not hinder the user from doing whatever he wants to do with the array (short of breaking the implementation and doing something involving UB, of course).  That's an example of proper usage of private.
> 
> It's an obvious example that we all can agree on. If we agree there are clearly cases where private is important, then we start working our way back to where the line should be drawn.

My personal criterion is, if something can be designed without private (and without opening up holes that may allow user code to break stuff), prefer that design.  Barring that, prefer the design that has the least amount of private possible for it to work without opening up loopholes for breakage.

In general, I don't quite agree with e.g. Java's approach of making everything private by default and having only member functions mediate access to private state.  My approach is to prefer POD types that hold public data that anybody can safely mutate, and public functions that operate on said POD types, rather than the closed-box approach advocated by OO.
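To make the contrast concrete, here is a rough sketch of the POD-plus-free-functions style. Vec2, dot, norm and scaled are names invented for this example, not anything from Phobos.

```d
import std.math : sqrt;

// Plain data: no invariants to protect, so every field is public
// and any caller may mutate it freely.
struct Vec2
{
    double x = 0.0;
    double y = 0.0;
}

// Public free functions operate on the POD type from the outside.
double dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }
double norm(Vec2 v) { return sqrt(dot(v, v)); }
Vec2 scaled(Vec2 v, double k) { return Vec2(v.x * k, v.y * k); }

void main()
{
    auto v = Vec2(3.0, 4.0);
    assert(v.norm == 5.0);      // UFCS: reads like a method call

    v.x = 0.0;                  // direct mutation, nothing can break
    assert(v.norm == 4.0);

    auto w = v.scaled(2.0);
    assert(w.y == 8.0);
}
```

Thanks to UFCS, user code can add its own functions over Vec2 with the same call syntax, without waiting on the type's author.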

There's a time and place for the closed-box approach, of course. But in my book, that's the less preferred option that you'd fall back on only if you couldn't do it another way.  And even when you can't avoid the closed-box approach, my preference is to minimize the degree of closedness as much as possible.


> > An example of where private hinders what a user might wish to do is an algorithm used internally by the library, that for whatever reason is private and unusable outside of the library code, even though the algorithm itself is general and can be applied outside of the scope of the library.  Often in such cases there are immediate pragmatic reasons for it -- the implementation of the algorithm is bound to internal implementation details of other library code, for example. So you can't actually make it public without also making lots of things public that probably shouldn't be.  But at a higher level, one asks the question, why is that algorithm implemented in that way in the first place?  It could have been implemented generically, and the library could have used just a specialized instance of it to solve whatever it is it needs to solve, but the algorithm itself should be available for user code to use.  *That's* the proper design.
> 
> I agree that some things shouldn't be private. But what's the answer? When it should be public, just change it to public!

It's not always so simple, though.  The algorithm might have been implemented in a way that depends on private types and internal assumptions that may break in unforeseen ways if you use it without realizing what the assumptions are.  Forcibly changing it to public may require you to make other stuff public that shouldn't be.  Or it may be written in a way that's tightly coupled to other internal library code, such that you can't call it separately.

This gets particularly frustrating when the core of the algorithm itself does *not* depend on these things, but the upstream author wrote it that way because "it's private, so nobody cares if this code is dirty and badly designed". Being able to hide bad code behind private encourages this kind of one-off hack, which avoids having to think about proper code decomposition.
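As a rough illustration of the decomposition I mean, here is a sketch where the algorithm's core is a public generic function and the library keeps only a thin private specialization. Everything in it (countRuns, Token, countTokenGroups) is hypothetical and not taken from any real library.

```d
import std.range.primitives : isInputRange, empty, front, popFront;

// Generic core: counts maximal runs of adjacent "same" elements.
// It knows nothing about the library's internals, so it can be public.
size_t countRuns(alias same, R)(R r)
    if (isInputRange!R)
{
    if (r.empty) return 0;
    size_t runs = 1;
    auto prev = r.front;
    r.popFront();
    for (; !r.empty; r.popFront())
    {
        if (!same(prev, r.front)) ++runs;
        prev = r.front;
    }
    return runs;
}

// Hypothetical internal type: this part may well stay private.
private struct Token { int kind; string text; }

// The library's own use is just a specialized instance of the core.
private size_t countTokenGroups(Token[] toks)
{
    return countRuns!((a, b) => a.kind == b.kind)(toks);
}

void main()
{
    // User code reuses the core on its own data.
    assert(countRuns!((a, b) => a == b)([1, 1, 2, 2, 2, 3]) == 3);

    auto toks = [Token(0, "a"), Token(0, "b"), Token(1, "+"), Token(0, "c")];
    assert(countTokenGroups(toks) == 3);
}
```

Only the wrapper touches the internal type, so making the core public exposes nothing that shouldn't be.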


> An actual example of this in Phobos is the absence of a binary search algorithm. It's there, in SortedRange. But that implementation is private basically for no good reason (it can be trivially extracted into its own function). And SortedRange in itself is a schizophrenic meld of overbearing restrictions and puzzling allowances.

Yeah, that binary search function really ought to be public.

I think by now, experience has more than proven that SortedRange was a mistake.  It was an attempt to encode the sortedness of a range in the type system such that Phobos would be able to take advantage of this to provide performance improvements, but D's type system simply isn't powerful enough to express what's needed for this without unnecessary limitations and the weird quirks you see in the current implementation of SortedRange.
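For reference, this is roughly how the binary search is reached through the wrapper today. sort, assumeSorted, contains and lowerBound are existing Phobos functions; the data is made up.

```d
import std.algorithm.sorting : sort;
import std.range : assumeSorted;

void main()
{
    int[] xs = [9, 1, 7, 3, 3];

    // sort() sorts in place and returns a SortedRange wrapping xs.
    auto sorted = sort(xs);
    assert(sorted.contains(7));                 // binary search via the wrapper

    // lowerBound returns the elements strictly less than 4 (1, 3, 3),
    // so its length doubles as the insertion index.
    assert(sorted.lowerBound(4).length == 3);

    // If the data is already sorted, you must first promise that it is.
    int[] ys = [1, 3, 3, 7, 9];
    auto trusted = assumeSorted(ys);
    assert(!trusted.contains(4));
}
```

The search itself is a handful of lines, but the only way to call it is through the SortedRange type.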

It was an interesting and ambitious experiment, but I think it has run its course and the conclusion is that it doesn't work in the current language. Or at least isn't pulling its own weight given its current limitations.  Perhaps it's time to send it to the scrap yard.


> The only reason I haven't made a PR for it is I just made a copy in my own code and have moved on. But it would probably be pretty trivial to expose.
[...]

IMO, we should just get rid of SortedRange and make the binary search algo a public function.

Or even if we don't get rid of SortedRange (breakage of existing code and all that), I don't see why the binary search function shouldn't be publicly available. This is exactly the kind of abuse of `private` I was talking about: the function is clearly there and ready to use, but the author for various reasons decided that no, you're not allowed to just call the function, you have to jump through this here set of hoops to prove your worthiness first.
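Something along these lines is what I'd expect to be able to call directly. This is a sketch only, not the actual Phobos implementation: lowerBoundIndex and containsSorted are hypothetical names, and the caller is assumed to pass a range already sorted in ascending order.

```d
import std.range.primitives : isRandomAccessRange, hasLength;

// Index of the first element that is not less than `value`,
// or r.length if there is no such element.
// Precondition (unchecked): r is sorted ascending.
size_t lowerBoundIndex(R, T)(R r, T value)
    if (isRandomAccessRange!R && hasLength!R)
{
    size_t lo = 0, hi = r.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
        if (r[mid] < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

// Membership test built on top of the search.
bool containsSorted(R, T)(R r, T value)
    if (isRandomAccessRange!R && hasLength!R)
{
    immutable i = lowerBoundIndex(r, value);
    return i < r.length && r[i] == value;
}

void main()
{
    int[] xs = [1, 3, 3, 7, 9];
    assert(lowerBoundIndex(xs, 3) == 1);
    assert(lowerBoundIndex(xs, 4) == 3);
    assert(lowerBoundIndex(xs, 10) == xs.length);
    assert(containsSorted(xs, 7));
    assert(!containsSorted(xs, 4));
}
```

No wrapper type involved: sortedness is a documented precondition, and the function works on any random-access range with a length.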


T

-- 
My father told me I wasn't at all afraid of hard work. I could lie down right next to it and go to sleep. -- Walter Bright