Algorithms should be free from rich types (page 3)

June 30, 2023

Re: Algorithms should be free from rich types

Posted by bachmeier
in reply to Atila Neves

On Friday, 30 June 2023 at 11:07:33 UTC, Atila Neves wrote:

> API design is indeed hard. Which makes it all the more imperative to not accidentally design one with implementation details that users downstream start depending on. That is: API design needs to be a conscious opt-in decision and not "I guess I didn't think about the consequences of leaving the door to my flat open all the time and now there are people camping in my living room".

Private is more like locking everyone else's doors for their own safety. In the cases that it keeps an intruder out, it was helpful to them. When grandma had to sleep on the sidewalk, not so much. Many times library authors have prevented me from doing my work because of arbitrarily preventing access to implementation details. I should have the option to override those decisions. If something blows up, or if my code gets broken in the future, it's my fault, because I was the one that made that decision.

June 30, 2023

Re: Algorithms should be free from rich types

Posted by monkyyy
in reply to bachmeier

On Friday, 30 June 2023 at 14:41:00 UTC, bachmeier wrote:
> On Friday, 30 June 2023 at 11:07:33 UTC, Atila Neves wrote:
>> I didn't think about the consequences of leaving the door to my flat open all the time
>
> Private is more like locking everyone else's doors for their own safety.

Why do people make arguments about data ownership at all? Functions aren't people.

June 30, 2023

Re: Algorithms should be free from rich types

Posted by H. S. Teoh
in reply to bachmeier

On Fri, Jun 30, 2023 at 02:41:00PM +0000, bachmeier via Digitalmars-d wrote:
> On Friday, 30 June 2023 at 11:07:33 UTC, Atila Neves wrote:
> 
> > API design is indeed hard. Which makes it all the more imperative to not accidentally design one with implementation details that users downstream start depending on. That is: API design needs to be a conscious opt-in decision and not "I guess I didn't think about the consequences of leaving the door to my flat open all the time and now there are people camping in my living room".
> 
> Private is more like locking everyone else's doors for their own safety. In the cases that it keeps an intruder out, it was helpful to them. When grandma had to sleep on the sidewalk, not so much. Many times library authors have prevented me from doing my work because of arbitrarily preventing access to implementation details. I should have the option to override those decisions. If something blows up, or if my code gets broken in the future, it's my fault, because I was the one that made that decision.

The thing is, both of the above are true.

Private does have its uses: to hide implementation details from unrelated parts of the code so that, especially in a large project with many contributors, you don't end up with accidental dependencies between parts of the code that really shouldn't depend on each other. Hairball dependencies among unrelated modules are a major factor of unmaintainability in large projects, and preventing them goes a long way to reduce long-term maintenance costs.

The other side to this, however, is that deciding what should be private and what shouldn't is a hard problem, and most people either can't figure it out, or can't be bothered to put in the effort to get it right, so they slap private on everything, making it hard to reuse their code outside of the narrow confines of how they initially envisioned it. So you end up with an API that covers the most common use cases but not others, which causes a lot of frustration when downstream code wants to do something but can't via the API, so they have to resort to copy-pasta or breaking private. (See: API design is hard.)

Most people design APIs around how they envision the module would be (or ought to be) used, at a relatively high level of abstraction, without regard to the core algorithms that would be used to implement this: what we may call a "use-centric API". Contrary to popular belief, this is actually a mistake. It frequently leads to the situation where a useful algorithm that might benefit other parts of the code gets locked behind the private implementation of the module, because it doesn't directly map to the external API. This in turn promotes code duplication: if my module also needs some variant of the same algorithm, I have to copy-n-paste it or re-implement it from scratch in my own module -- usually also behind `private`, so the next person that comes along will need to do it again. It actually *reduces* code reuse. It also fosters the desire to break private: I realize that the algorithm is already implemented, so I wish I could break private in order to avoid rewriting it myself.

A better approach is an algorithm-centric API design: in the course of implementing a module (or library), identify the core algorithms that solve the main problems that the module/library is trying to solve, and design the API around exposing this algorithm to user code.  Then on top of that, add some syntactic sugar that maps this to the high-level usage of the algorithm (the use-centric API). There may still be private parts (internal details of the algorithms that the user really doesn't need to know), but these are confined to things that outside code truly doesn't need to know, not a blanket default that may unintentionally exclude certain unusual, but valid, use cases.

There is an important philosophical difference between these two approaches. The first approach tends towards the philosophy of "you have problem X, no problem, hand it over to us (the library), we'll perform the magic to solve it, and we'll give you back the result Y". The method of solution is opaque and hidden from user code. IOW, the hood is welded shut; your only recourse in case of problems is to take it back to the dealer (the library author). The second approach has the philosophy "you have problem X, we (the library) will give you tools A, B, C, that you can use to solve problem X. In addition, we provide you special combo D (syntactic sugar functions) that will solve X the usual way without you having to figure out how to combine A, B, and C in the right way." The hood is open and you may fiddle with the things inside if you know what you're doing. But most of the time you won't need to -- the syntactic sugar functions handle the most common use cases for you.

The first approach empowers the library writer, the second approach empowers the user. My argument is that the second approach is superior. No abstraction is perfect (otherwise it wouldn't be an abstraction!); there will always be cases where the user needs to go under the hood and do something the library author didn't envision initially. Give the user the tools to do so without breaking encapsulation, instead of forcing them to come back to you for help.
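As a rough illustration of the algorithm-centric layout described above, here is a minimal D sketch. The names `countLines` and `countLinesInFile` are hypothetical and not taken from any real library; the point is only that the core algorithm accepts a plain slice, while the convenience wrapper covers the common "start from a file" case.

```d
import std.file : read;
import std.string : representation;

/// Core algorithm (one of the "tools"): works on any byte slice
/// the caller already owns, with no library-specific wrapper type.
size_t countLines(const(ubyte)[] data) pure nothrow @nogc @safe
{
    size_t n = 0;
    foreach (b; data)
        if (b == '\n')
            ++n;
    return n;
}

/// Syntactic sugar (the "combo"): the usual way in, starting from a path.
/// It only loads the bytes and forwards them to the core algorithm.
size_t countLinesInFile(string path)
{
    return countLines(cast(const(ubyte)[]) read(path));
}

void main()
{
    // Data that is already in memory can use the algorithm directly.
    assert(countLines("a\nb\nc\n".representation) == 3);
}
```

Because the wrapper contains no logic of its own, a caller with an unusual data source loses nothing by skipping it.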

T

-- 
Claiming that your operating system is the best in the world because more people use it is like saying McDonalds makes the best food in the world. -- Carl B. Constantine

June 30, 2023

Re: Algorithms should be free from rich types

Posted by Meta
in reply to bachmeier

On Friday, 30 June 2023 at 14:41:00 UTC, bachmeier wrote:
> Private is more like locking everyone else's doors for their own safety. In the cases that it keeps an intruder out, it was helpful to them. When grandma had to sleep on the sidewalk, not so much. Many times library authors have prevented me from doing my work because of arbitrarily preventing access to implementation details. I should have the option to override those decisions. If something blows up, or if my code gets broken in the future, it's my fault, because I was the one that made that decision.

IMO private is extremely important for maintaining the internal invariants of a unit of encapsulation.

June 30, 2023

Re: Algorithms should be free from rich types

Posted by bachmeier
in reply to H. S. Teoh

On Friday, 30 June 2023 at 16:33:31 UTC, H. S. Teoh wrote:

> Private does have its uses: to hide implementation details from unrelated parts of the code so that, especially in a large project with many contributors, you don't end up with accidental dependencies between parts of the code that really shouldn't depend on each other.

That can never happen if you have to explicitly override something that's been marked private - it's an intentional dependency.

> The other side to this, however, is that deciding what should be private and what shouldn't is a hard problem, and most people either can't figure it out, or can't be bothered to put in the effort to get it right, so they slap private on everything, making it hard to reuse their code outside of the narrow confines of how they initially envisioned it.

It's worse than that. Saying something is private is used as a substitute for documenting or even commenting the code.

> So you end up with an API that covers the most common use cases but not others, which causes a lot of frustration when downstream code wants to do something but can't via the API, so they have to resort to copy-pasta or breaking private. (See: API design is hard.)

It's hard not because you don't know what others need, but because you're marking stuff private and there's no way for anyone else to override that decision.

One of the many examples related to the project I just released is the R shared library. The developers have not exported most of the functionality of the library. So when other developers created the Matrix package (now installed by default) for greatly expanded matrix types and operations, they had to resort to copying and pasting large amounts of C code for no obvious reason. Now there are two copies of all that code floating around, but they're probably out of sync. And as I noted above, private means the code is not documented or commented, so who knows if that hasn't resulted in bugs in hard-to-catch edge cases.

I agree with the existence of private. In some cases, strictly enforcing privacy is a good thing (though you can't prevent copy and paste). It's difficult to justify the absence of a simple override mechanism.

Where it gets really frustrating is when you've invested time getting to 95% of what you need. You're at the point where it almost works, but arbitrary decisions about private mean you'll never be able to achieve 100% of what you need.

July 01, 2023

Re: Algorithms should be free from rich types

Posted by Dom DiSc
in reply to Meta

On Friday, 30 June 2023 at 16:48:39 UTC, Meta wrote:
> IMO private is extremely important for maintaining the internal invariants of a unit of encapsulation.

Yes. And this is pretty much the only reason to use private.
You have functions that don't keep the invariants for performance reasons, so you create public functions that call them in the correct order and with the correct parameters to keep the invariants.
So private is there to hide unsafe interfaces, to prevent the user of a library from messing things up.
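A minimal D sketch of that pattern, using a hypothetical `SortedInts` type invented for illustration: the private helpers each leave the invariant ("items is sorted") temporarily broken or rely on being called in sequence, and the public function calls them in the right order.

```d
import std.algorithm.sorting : sort;
import std.range : assumeSorted;

struct SortedInts
{
    private int[] items; // invariant: always sorted ascending

    // Fast path that does NOT keep the invariant on its own.
    private void appendUnchecked(int x) { items ~= x; }

    // Restores the invariant after one or more unchecked appends.
    private void restoreOrder() { sort(items); }

    /// Public entry point: calls the private pieces in the correct order.
    void insertAll(const(int)[] xs)
    {
        foreach (x; xs)
            appendUnchecked(x);
        restoreOrder();
    }

    /// Relies on the invariant: binary search is only valid on sorted data.
    bool contains(int x) const
    {
        return assumeSorted(items).contains(x);
    }
}

void main()
{
    SortedInts s;
    s.insertAll([3, 1, 2]);
    assert(s.contains(2));
    assert(!s.contains(5));
}
```

Note that D enforces `private` at the module boundary, so code in this same file could still call `appendUnchecked`; only code in other modules is kept out.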

If you want to be able to mess things up, no API will ever be good enough for you - you simply need the source code and to modify it.
And then private won't hinder you - simply remove it.

July 02, 2023

Re: Algorithms should be free from rich types

Posted by Dukc
in reply to bachmeier

On Friday, 30 June 2023 at 14:41:00 UTC, bachmeier wrote:

> I should have the option to override those decisions. If something blows up, or if my code gets broken in the future, it's my fault, because I was the one that made that decision.

You do have it. __traits(getMember, /+...+/) as others have mentioned, or some ugly casting trickery. Or just patching the library yourself to make the member you want public.

July 02, 2023

Re: Algorithms should be free from rich types

Posted by Dukc
in reply to Ali Çehreli

On Tuesday, 27 June 2023 at 21:53:59 UTC, Ali Çehreli wrote:

> The main topic here is about the harm caused by rich types surrounding algorithms. Let's say I am interested in using an open source algorithm that works with a memory area. (Not related to D.) We all know that a memory area can be described by a fat pointer like D's slices. So, that is what the algorithm should take.
>
> Unfortunately, the poor little algorithm is not free to be used: It is written to work with a custom type of that library; let's call it MySlice, which is produced by MyMemoryMappedFile, which is produced by MyFile, which is initialized only by types like MyFilePath. (I may have gotten the relationships wrong there.)
>
> But my data is already in a memory area that I own! How can I call that algorithm? Should I write it to a file first and then use those rich types to access the algorithm? That should not be necessary...

The language-agnostic answer is to patch the library yourself to do what you want.

Since D is a systems programming language, you also have another choice: bypass the type system, create MySlice by pointer casting it from the data representing a D slice.

Now, neither of these solutions is exactly inviting. But they cannot be: to create MySlice in a way the library doesn't support, you have to know its private implementation details. Even if the language didn't give the library author a way to protect those details, you'd be relying on undocumented version-specific details.

Not having private would be better in the sense that you'd be more likely to get compiler errors instead of memory corruption if the private details change. Maybe __traits(getMember, /+...+/), or declaring a private function as an external extern(C) function, CTFE-mangling the D name, would be safer than the pointer cast I proposed.
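To make the pointer-cast route concrete, here is a small sketch. `MySlice` and `byteSum` are hypothetical stand-ins for the library's type and algorithm, and the layout of `MySlice` (length first, then pointer, matching D's slice ABI) is purely an assumption; a real library could lay it out differently or change it between versions, which is exactly the risk described above.

```d
// Hypothetical library type. In a real library these fields would be
// private and declared in another module.
struct MySlice
{
    size_t length;
    const(ubyte)* ptr;
}

// Hypothetical algorithm written against the rich type.
uint byteSum(MySlice s)
{
    uint total = 0;
    foreach (i; 0 .. s.length)
        total += s.ptr[i];
    return total;
}

void main()
{
    ubyte[] mine = [1, 2, 3, 4];

    // Assumption: MySlice has the same layout as a D slice.
    // The size check catches some mismatches, but not reordered fields.
    static assert(MySlice.sizeof == (ubyte[]).sizeof);

    // Reinterpret the slice's own representation as a MySlice.
    MySlice forged = *cast(MySlice*) &mine;

    assert(byteSum(forged) == 10);
}
```

If the library later reorders or adds fields of the same total size, this still compiles and silently reads garbage, which is why the alternatives that fail at compile time may be preferable.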

July 03, 2023

Re: Algorithms should be free from rich types

Posted by Atila Neves
in reply to bachmeier

On Friday, 30 June 2023 at 14:41:00 UTC, bachmeier wrote:
> On Friday, 30 June 2023 at 11:07:33 UTC, Atila Neves wrote:
>
>> API design is indeed hard. Which makes it all the more imperative to not accidentally design one with implementation details that users downstream start depending on. That is: API design needs to be a conscious opt-in decision and not "I guess I didn't think about the consequences of leaving the door to my flat open all the time and now there are people camping in my living room".
>
> Private is more like locking everyone else's doors for their own safety.

I don't see how - it only applies to your own code; adding private doesn't make someone else's code no longer accessible.

> In the cases that it keeps an intruder out, it was helpful to them. When grandma had to sleep on the sidewalk, not so much.

This is where the analogy breaks down. The whole point of private is to make a conscious choice over what is an implementation detail and what is part of the API. If it's the default, the programmer is nudged towards thinking of a good API instead of it being ad-hoc.

> I should have the option to override those decisions.

As a library author, I don't think you should. It's on me to support usage of private functions that I'm nominally allowed to delete, but not really if someone is going to complain.

> If something blows up, or if my code gets broken in the future, it's my fault, because I was the one that made that decision.

In theory, yes. In practice, yelling. We told people that `in` was in flux and because of that, to not use it. People (including me!) did it anyway. Some of them later complained when we decided what to do with it.

July 03, 2023

Re: Algorithms should be free from rich types

Posted by Steven Schveighoffer
in reply to Atila Neves

On 7/3/23 3:57 AM, Atila Neves wrote:

> On Friday, 30 June 2023 at 14:41:00 UTC, bachmeier wrote:
>
>> I should have the option to override those decisions.
>
> As a library author, I don't think you should. It's on me to support usage of private functions that I'm nominally allowed to delete, but not really if someone is going to complain.

That is the issue. For instance, if you do:

libFunction(cast(int *)0xdeadbeef);

And then complain that libFunction's author didn't handle that case, you can rightfully be told to RTFM.

Same thing with circumventing private. It should be possible, but absolutely unsupported.

>> If something blows up, or if my code gets broken in the future, it's my fault, because I was the one that made that decision.
>
> In theory, yes. In practice, yelling. We told people that `in` was in flux and because of that, to not use it. People (including me!) did it anyway. Some of them later complained when we decided what to do with it.

The definition of private shouldn't change at all. The ability to circumvent it still should remain for those wanting to muck with internal data, and I don't think there's any way to get around that (there's always reinterpret casting). The thing is, it's important to identify the consequences of changing private data -- it can never be within spec for a library to allow private data access.

So one can muck around with private data, and pay the cost of zero support (and rightfully so). Or one can petition the library author to provide access to that private data.

-Steve
