Algorithms should be free from rich types (page 4) - D Programming Language Discussion Forum

July 03, 2023

Re: Algorithms should be free from rich types

Posted by H. S. Teoh
in reply to Steven Schveighoffer

On Mon, Jul 03, 2023 at 12:32:43PM -0400, Steven Schveighoffer via Digitalmars-d wrote: [...]
> The definition of `private` shouldn't change at all. The ability to circumvent it still should remain for those wanting to muck with internal data, and I don't think there's any way to get around that (there's always reinterpret casting). The thing is, it's important to identify the *consequences* of changing private data -- it can *never* be within spec for a library to allow private data access.
> 
> So one can muck around with private data, and pay the cost of zero support (and rightfully so). Or one can petition the library author to provide access to that private data.
[...]

I think we all agree that the mechanics of this won't (and shouldn't) change. But I think the OP was arguing at a higher level of abstraction. It isn't so much about whether private should be overridable or not, or even whether some piece of data in an object should be private or not; the question IMO is whether the library could have been designed in such a way that there's no *need* for private data in the first place. Or at least, the need for such is minimized.

A library with tons of private state and only a rudimentary public API is generally more likely to have situations where the user will be left wishing that there were a couple more knobs to turn that can be used to customize the library's behaviour.

A library with less private state, or just as much private state but with a sophisticated API that lets you tweak more things, would be less likely to leave the user out in the cold with unusual use cases. However, it does risk having too many knobs to turn, making the API far more complex than it ought to be. That complexity has its own cost: the combinatorial explosion of configurations makes it hard for the author to test every combination, so there may be lots of bugs hidden behind uncommon corner cases.

The ideal library is one where there's almost no private state because there's no need for it: the code Just Works(tm) for any combination of values one may assign to the public state. The API is simple and concise, yet easily composable, and naturally extends to all kinds of use cases, including unusual ones and ones the author himself never envisioned -- yet it all just works together naturally. This ideal may or may not be attainable, but the closer a library gets to it, the better.

T

-- 
It always amuses me that Windows has a Safe Mode during bootup. Does that mean that Windows is normally unsafe?

July 03, 2023

Re: Algorithms should be free from rich types

Posted by Steven Schveighoffer
in reply to H. S. Teoh

On 7/3/23 2:05 PM, H. S. Teoh wrote:
> On Mon, Jul 03, 2023 at 12:32:43PM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> [...]
>> The definition of `private` shouldn't change at all. The ability to
>> circumvent it still should remain for those wanting to muck with
>> internal data, and I don't think there's any way to get around that
>> (there's always reinterpret casting). The thing is, it's important to
>> identify the *consequences* of changing private data -- it can *never*
>> be within spec for a library to allow private data access.
>>
>> So one can muck around with private data, and pay the cost of zero
>> support (and rightfully so). Or one can petition the library author to
>> provide access to that private data.
> [...]
> 
> I think we all agree that the mechanics of this won't (and shouldn't)
> change. But I think the OP was arguing at a higher level of abstraction.
> It isn't so much about whether private should be overridable or not, or
> even whether some piece of data in an object should be private or not;
> the question IMO is whether the library could have been designed in such
> a way that there's no *need* for private data in the first place. Or at
> least, the need for such is minimized.
> 
> A library with tons of private state and only a rudimentary public API
> is generally more likely to have situations where the user will be left
> wishing that there were a couple more knobs to turn that can be used to
> customize the library's behaviour.

But that's the thing, there are parts that *simply must be private*. No matter how you cut it, it has to have some level of privacy, because otherwise, you can't enforce semantic invariants with the type.

Should array length (not the property, but the actual data field) be public? What about the pointer? Of course not. Yet, you still might want to access those things for some reason. That doesn't mean it's worth a change to public just for that one reason.
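
To make the invariant argument concrete, here is a minimal D sketch. `IntSlice` is a hypothetical stand-in, not druntime's actual slice representation: because the pointer and length are private, the only way to build one is from an array whose length already matches its memory, so indexing can rely on that pairing. (D's `private` is module-scoped, so this only guards against code in other modules.) The `unittest` can be run with `dmd -unittest -main -run file.d`.

```d
/// Hypothetical slice-like type: pointer and length are private so that
/// outside code cannot make them disagree with each other.
struct IntSlice
{
    private int* ptr;   // start of the viewed memory
    private size_t len; // number of valid ints starting at ptr

    /// The only constructor: take both fields from an existing array,
    /// whose length already matches its memory.
    this(int[] arr) @trusted
    {
        ptr = arr.ptr;
        len = arr.length;
    }

    /// Read-only access to the length; callers cannot desynchronize it.
    size_t length() const { return len; }

    /// Element access that relies on len being trustworthy.
    /// The bounds check is an assert, so it is removed in -release builds.
    ref int opIndex(size_t i) @trusted
    {
        assert(i < len, "IntSlice index out of bounds");
        return ptr[i];
    }
}

unittest
{
    auto s = IntSlice([10, 20, 30]);
    assert(s.length == 3);
    s[1] = 25;           // writes through the ref returned by opIndex
    assert(s[1] == 25);
}
```

A public length that could be reassigned independently of the pointer would make every `opIndex` call unsafe, which is the whole reason the fields stay private.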

> 
> A library with less private state, or just as much private state but
> with a sophisticated API that lets you tweak more things, would be less
> likely to leave the user out in the cold with unusual use cases.
> However, it does risk having too many knobs to turn, causing the API to
> be far more complex than it ought to be. Which in turn can lead to
> unnecessary complexity: the combinatorial explosion of configurations
> makes it hard for the author to test every combination, so there may be
> lots of bugs hidden behind uncommon corner cases.

It's easy to talk about this in general terms, like "let you tweak more things", but when you start talking about non-abstract real cases, usually the reason for private data becomes obvious.

The thing is, if it does make sense that something should just be public, making it public is easy: just make a PR to do it, and the benefits and drawbacks can be discussed, planned for, and agreed upon. Going the other way is much, much worse.

If you provide public access, it then becomes a supported API. I remember one case in the past where some type in Phobos had undocumented members that were public due to laziness or carelessness.

When the code had to change to a different implementation, we had to deprecate that access for years before actually changing. It was horrid. There is a real cost to careless publicity.
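
For readers unfamiliar with how such a transition is usually managed in D: the `deprecated` attribute keeps the old access compiling (with a deprecation message) while callers migrate, and only then can the member be removed. A minimal sketch; `Registry`, `names`, and `count` are hypothetical and are not the Phobos type referred to above, which isn't named.

```d
/// Hypothetical type whose storage was once exposed publicly by accident.
struct Registry
{
    private string[] _names;

    /// Supported API.
    void add(string name) { _names ~= name; }
    size_t count() const { return _names.length; }

    /// Former accidental public access, kept only for the deprecation period:
    /// existing callers still compile but get a deprecation message.
    deprecated("Registry.names is an implementation detail; use count() instead")
    const(string)[] names() const { return _names; }
}

unittest
{
    Registry r;
    r.add("alpha");
    r.add("beta");
    assert(r.count == 2);
}
```

Until `names` is finally deleted, `_names` has to stay a `string[]` so the deprecated accessor keeps working, which is exactly the drag on changing the implementation described above.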

-Steve

July 03, 2023

Re: Algorithms should be free from rich types

Posted by H. S. Teoh
in reply to Steven Schveighoffer

On Mon, Jul 03, 2023 at 02:30:14PM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> On 7/3/23 2:05 PM, H. S. Teoh wrote:
[...]
> > I think we all agree that the mechanics of this won't (and shouldn't) change. But I think the OP was arguing at a higher level of abstraction.  It isn't so much about whether private should be overridable or not, or even whether some piece of data in an object should be private or not; the question IMO is whether the library could have been designed in such a way that there's no *need* for private data in the first place. Or at least, the need for such is minimized.
> > 
> > A library with tons of private state and only a rudimentary public API is generally more likely to have situations where the user will be left wishing that there were a couple more knobs to turn that can be used to customize the library's behaviour.
> 
> But that's the thing, there are parts that *simply must be private*. No matter how you cut it, it has to have some level of privacy, because otherwise, you can't enforce semantic invariants with the type.
> 
> Should array length (not the property, but the actual data field) be public?  What about the pointer? Of course not. Yet, you still might want to access those things for some reason. That doesn't mean it's worth a change to public just for that one reason.

We're actually agreeing with each other, y'know. :-D

As I said, the *ideal* is that you wouldn't have private state, or that the private state would be minimal. In practice, of course, certain things *should* be private, and that's not a problem. The problems the OP described arise when either private is used carelessly, causing things to be private that really need not be, or the API is poorly designed, so that parts of the library that ought to be reusable aren't, just because of some arbitrary decision made by the author.

I've never heard people complaining about how the array length data field is private, for example.  That's because it being private does not hinder the user from doing whatever he wants to do with the array (short of breaking the implementation and doing something involving UB, of course).  That's an example of proper usage of private.

An example of where private hinders what a user might wish to do is an algorithm used internally by the library, that for whatever reason is private and unusable outside of the library code, even though the algorithm itself is general and can be applied outside of the scope of the library.  Often in such cases there are immediate pragmatic reasons for it -- the implementation of the algorithm is bound to internal implementation details of other library code, for example. So you can't actually make it public without also making lots of things public that probably shouldn't be.  But at a higher level, one asks the question, why is that algorithm implemented in that way in the first place?  It could have been implemented generically, and the library could have used just a specialized instance of it to solve whatever it is it needs to solve, but the algorithm itself should be available for user code to use.  *That's* the proper design.
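
A minimal D sketch of that design, under assumed names: `mergeSorted` and `PostingList` are hypothetical, not taken from any real library. The generic algorithm is public and knows nothing about the library's types; the library type keeps its storage private (it must stay sorted) but uses a specialized instance of the public algorithm instead of a private copy.

```d
import std.functional : binaryFun;
import std.range.primitives; // ElementType, isInputRange, and array range primitives

/// Generic, public algorithm: merges two sorted input ranges into a new
/// array sorted by `less`. It depends on no library-internal type.
auto mergeSorted(alias less = "a < b", R1, R2)(R1 a, R2 b)
    if (isInputRange!R1 && isInputRange!R2
        && is(ElementType!R1 == ElementType!R2))
{
    alias lt = binaryFun!less;
    alias E = ElementType!R1;
    E[] result;
    while (!a.empty && !b.empty)
    {
        // Take from b only when strictly smaller, so ties come from a first.
        if (lt(b.front, a.front)) { result ~= b.front; b.popFront(); }
        else { result ~= a.front; a.popFront(); }
    }
    for (; !a.empty; a.popFront()) result ~= a.front; // drain leftovers
    for (; !b.empty; b.popFront()) result ~= b.front;
    return result;
}

/// Hypothetical library type: storage is private to protect the sorted
/// invariant, but the merging logic is the public algorithm above.
struct PostingList
{
    private int[] ids; // invariant: sorted ascending

    this(int[] sortedIds) { ids = sortedIds; }

    PostingList unionWith(PostingList other)
    {
        return PostingList(mergeSorted(ids, other.ids));
    }

    const(int)[] items() const { return ids; }
}

unittest
{
    // User code can call the algorithm directly...
    assert(mergeSorted([1, 4, 9], [2, 3, 10]) == [1, 2, 3, 4, 9, 10]);
    // ...while the library uses it internally on its private data.
    auto u = PostingList([1, 5]).unionWith(PostingList([2, 5, 7]));
    assert(u.items == [1, 2, 5, 5, 7]);
}
```

Phobos already offers a lazy merge of sorted ranges in `std.algorithm.sorting`; the point here is only the split between a reusable algorithm and the private state of the type that uses it.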

But alas, all too often this is not done, and you end up with 5 different implementations of the same algorithm, each with different quirks (and often, different subsets of bugs), and all of them are locked up behind `private`, or require some tangential private structure as argument that isn't constructible except via a long-winded circuitous route that probably doesn't do what the user actually wants it to do, even though the algorithm itself doesn't actually depend on this.

Ultimately these details are just the incidental symptoms. The underlying root cause is a poor design that doesn't correctly decouple orthogonal functionality into reusable pieces.

--T

July 03, 2023

Re: Algorithms should be free from rich types

Posted by claptrap
in reply to H. S. Teoh

On Monday, 3 July 2023 at 19:27:45 UTC, H. S. Teoh wrote:
> On Mon, Jul 03, 2023 at 02:30:14PM -0400, Steven Schveighoffer via Digitalmars-d wrote:
>> On 7/3/23 2:05 PM, H. S. Teoh wrote:
> [...]
>>
>
> We're actually agreeing with each other, y'know. :-D
>
> As I said, the *ideal* is that you wouldn't have private state, or that the private state would be minimal.

the correct usage of "ideal" is..

"Ideally we would do X but we don't because the world is full of idiots"

;)

July 04, 2023

Re: Algorithms should be free from rich types

Posted by Timon Gehr
in reply to monkyyy

On 6/30/23 17:57, monkyyy wrote:
> On Friday, 30 June 2023 at 14:41:00 UTC, bachmeier wrote:
>> On Friday, 30 June 2023 at 11:07:33 UTC, Atila Neves wrote:
>>> I didn't think about the consequences of leaving the door to my flat open all the time
>>
>> Private is more like locking everyone else's doors for their own safety.
> 
> Why do people make arguments about data ownership at all? Functions aren't people.
> 

That's why functions are not making the arguments. API design is a social activity between programmers. Programmers are people. Simple.

Anyway, it's not like private actually prevents you from deliberately accessing things, it just makes clear that that's outside the supported API.

July 03, 2023

Re: Algorithms should be free from rich types

Posted by Steven Schveighoffer
in reply to H. S. Teoh

On 7/3/23 3:27 PM, H. S. Teoh wrote:
> On Mon, Jul 03, 2023 at 02:30:14PM -0400, Steven Schveighoffer via Digitalmars-d wrote:
>> On 7/3/23 2:05 PM, H. S. Teoh wrote:
> [...]
>>> I think we all agree that the mechanics of this won't (and
>>> shouldn't) change. But I think the OP was arguing at a higher level
>>> of abstraction.  It isn't so much about whether private should be
>>> overridable or not, or even whether some piece of data in an object
>>> should be private or not; the question IMO is whether the library
>>> could have been designed in such a way that there's no *need* for
>>> private data in the first place. Or at least, the need for such is
>>> minimized.
>>>
>>> A library with tons of private state and only a rudimentary public
>>> API is generally more likely to have situations where the user will
>>> be left wishing that there were a couple more knobs to turn that can
>>> be used to customize the library's behaviour.
>>
>> But that's the thing, there are parts that *simply must be private*.
>> No matter how you cut it, it has to have some level of privacy,
>> because otherwise, you can't enforce semantic invariants with the
>> type.
>>
>> Should array length (not the property, but the actual data field) be
>> public?  What about the pointer? Of course not. Yet, you still might
>> want to access those things for some reason. That doesn't mean it's
>> worth a change to public just for that one reason.
> 
> We're actually agreeing with each other, y'know. :-D
> 

Yeah, kind of. It's just that there are two kinds of privacy labeling: careless and designed.

> As I said, the *ideal* is that you wouldn't have private state, or that
> the private state would be minimal.  In practice, of course, certain
> things *should* be private, and that's not a problem. The problems the
> OP described arise when either private is used carelessly, causing
> things to be private that really need not be, or the API is poorly
> designed, so that parts of the library that ought to be reusable aren't
> just because of some arbitrary decision made by the author.

If you carelessly label your fields as public, then realizing later they should have been private is costly, maybe impossible.

If you carelessly label your fields as private, while it might upset some people, making them public later is easy.

So if you are going to "not care" about public/private, technically the less risky choice is to make everything private and worry about it later if it becomes an issue. In that sense, I disagree with the OP's point.

That being said, I've done a lot of libs where I just don't care and leave everything public. That's mostly because I don't expect widespread usage, and I also don't mind breaking people's code (I don't think any of the projects I started are past 1.0 yet). But something like Phobos shouldn't be so careless. We really should continue to make things private by default unless there is a good reason to make them public.

> 
> I've never heard people complaining about how the array length data
> field is private, for example.  That's because it being private does not
> hinder the user from doing whatever he wants to do with the array (short
> of breaking the implementation and doing something involving UB, of
> course).  That's an example of proper usage of private.

It's an obvious example that we can all agree on. If we agree there are clearly cases where private is important, then we can start working our way back to where the line should be drawn.

> An example of where private hinders what a user might wish to do is an
> algorithm used internally by the library, that for whatever reason is
> private and unusable outside of the library code, even though the
> algorithm itself is general and can be applied outside of the scope of
> the library.  Often in such cases there are immediate pragmatic reasons
> for it -- the implementation of the algorithm is bound to internal
> implementation details of other library code, for example. So you can't
> actually make it public without also making lots of things public that
> probably shouldn't be.  But at a higher level, one asks the question,
> why is that algorithm implemented in that way in the first place?  It
> could have been implemented generically, and the library could have used
> just a specialized instance of it to solve whatever it is it needs to
> solve, but the algorithm itself should be available for user code to
> use.  *That's* the proper design.

I agree that some things shouldn't be private. But what's the answer? When it should be public, just change it to public!

An actual example of this in Phobos is the absence of a standalone binary search algorithm. It's there, in SortedRange, but that implementation is private for basically no good reason (it could be trivially extracted into its own function). And SortedRange itself is a schizophrenic meld of overbearing restrictions and puzzling allowances.

The only reason I haven't made a PR for it is I just made a copy in my own code and have moved on. But it would probably be pretty trivial to expose.
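
A minimal sketch of what such an extracted, standalone binary search could look like. The name `lowerBoundIndex` is hypothetical, not a Phobos function; the existing Phobos route is wrapping the data with `std.range.assumeSorted` and calling `lowerBound` on the resulting `SortedRange`. It assumes `r` is a random-access range with `length`, already sorted by `less`.

```d
import std.functional : binaryFun;
import std.range.primitives : hasLength, isRandomAccessRange;

/// Hypothetical free function: index of the first element of the sorted
/// range `r` that is not less than `value` (r.length if there is none).
size_t lowerBoundIndex(alias less = "a < b", R, V)(R r, V value)
    if (isRandomAccessRange!R && hasLength!R)
{
    alias lt = binaryFun!less;
    size_t lo = 0, hi = r.length; // the answer always lies in [lo, hi]
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2; // avoids overflow of lo + hi
        if (lt(r[mid], value))
            lo = mid + 1; // r[mid] < value: answer is strictly right of mid
        else
            hi = mid;     // r[mid] >= value: mid is still a candidate
    }
    return lo;
}

unittest
{
    auto a = [1, 3, 3, 5, 8];
    assert(lowerBoundIndex(a, 3) == 1);
    assert(lowerBoundIndex(a, 4) == 3);
    assert(lowerBoundIndex(a, 0) == 0);
    assert(lowerBoundIndex(a, 9) == 5);
    // Custom ordering: a descending array searched with "a > b".
    assert(lowerBoundIndex!"a > b"([9, 7, 4, 1], 5) == 2);
}
```

Because it takes a plain range plus a predicate, nothing about it requires wrapping the data in a `SortedRange` first.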

-Steve

Copyright © 1999-2021 by the D Language Foundation