H. S. Teoh
Posted in reply to Paul Backus
| On Fri, Oct 29, 2021 at 04:14:35PM +0000, Paul Backus via Digitalmars-d wrote:
> On Friday, 29 October 2021 at 16:08:10 UTC, H. S. Teoh wrote:
> > > This is not the case with allocation/free, which are, by definition, dependent on a global state (even if only thread local).
> >
> > Yeah, pureFree makes no sense at all. It's a disaster waiting to happen.
>
> I think the original sin here is allowing GC allocation (`new`, `~=`, closures) to be `pure`, for "pragmatic" reasons.
>
> Once you've done that, it's not hard to justify adding `pureMalloc` too. And once you have that, why not `pureFree`? It's just a little white lie; surely nobody will get hurt.
>
> Of course the end result is that `pure` ends up being basically useless for anything beyond linting, and can't be fixed without breaking lots of existing code.
I think the real root problem is mixing incompatible levels of abstraction.
At some level of abstraction, one could argue that GC allocation (or memory allocation in general) is an intrinsic feature of the layer of abstraction you're working with: a bunch of functions that do computations with arrays can be considered pure if the implementation of those arrays is abstracted away by the language, and the functions use only the array primitives the abstraction gives them, i.e., they don't allocate or free memory directly, so they never directly observe the external effects of allocation. Think of a program in a functional language, for example. The implementation is definitely changing global state -- doing I/O, allocating/freeing memory, etc. -- but at the abstraction level of the functional language itself, these implementation details are hidden away, and one can meaningfully speak of the purity of functions written in that language. One may legally optimize code based on the abstracted semantics, because the semantics at the higher level are preserved in spite of the low-level implementation details being changed.
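To make that concrete, here's a minimal sketch in D (the function name `doubled` is mine, not from the thread): the compiler accepts this as `pure` even though `~=` allocates via the GC under the hood, because the allocation is hidden behind the array abstraction.

```d
// Accepted as `pure` today: the function only uses the language's
// array primitives and never touches the allocator directly.
int[] doubled(const(int)[] xs) pure
{
    int[] result;
    foreach (x; xs)
        result ~= x * 2;  // GC allocation, hidden by the abstraction
    return result;
}

void main()
{
    assert(doubled([1, 2, 3]) == [2, 4, 6]);
}
```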
The problems come, however, when you have code that operates *both* at the abstract level *and* deals with the low-level implementation at the same time. Suddenly there is no longer a clear separation between code in the higher-level abstraction and the lower-level implementation, where you have to deal with dirty details like allocating and freeing memory. So the assumptions the higher-level abstraction provides may no longer hold, and that's where you begin to run into trouble: optimizations based on guarantees provided by the higher-level abstraction become invalidated by lower-level code that breaks those assumptions (because it operates outside the confines of the higher-level abstraction).
This is why array manipulation in a D pure function is in some sense permissible, under certain assumptions, but things like pureFree do not make sense: they mix incompatible levels of abstraction in a way that will inevitably lead to problems. If we permit array allocations in pure code, then we must also commit to staying within the confines of that level of abstraction -- i.e., we are not allowed to use the memory allocation primitives those array operations are built on. As soon as this is violated, the whole thing comes crashing down, because your program now performs operations outside the abstraction assumed by the optimizations based on `pure`, meaning those optimizations may be invalid.
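Here's a sketch of the kind of breakage I mean (the function `allocAddress` is a contrived illustration, not real code from anywhere): `core.memory.pureMalloc`/`pureFree` let this compile as `pure`, yet the result leaks observable allocator state.

```d
import core.memory : pureMalloc, pureFree;

// The compiler is told this is `pure`, yet it observably depends on
// global allocator state: two calls with the same argument may return
// different addresses.
size_t allocAddress(size_t n) pure nothrow @nogc
{
    void* p = pureMalloc(n);
    scope(exit) pureFree(p);
    return cast(size_t) p;
}

void main()
{
    auto a = allocAddress(16);
    auto b = allocAddress(16);
    // A `pure`-based optimization is free to fold these two calls into
    // one and assume a == b -- but nothing actually guarantees that.
}
```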
The situation is similar to `immutable`. If you're operating at the GC level, there is strictly speaking no such thing as immutable, because the GC code casts untyped memory into immutable and vice versa, so that the same block of memory may be immutable at one point in time but become mutable when it's later collected and reallocated to mutable data. But this does not mean we're not allowed to optimize based on immutable; by the same line of argument we might as well throw const and immutable to the winds. Instead, we declare GC code as @system, with the GC interface @trusted, i.e., the GC operates outside of the confines of immutability, but we trust it to do its job properly so that when we return to the higher-level abstraction, all our previous assumptions about immutable continue to hold.
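The immutable analogy can be shown in a few lines (a deliberately bad sketch, not something to imitate): @system code *can* step outside the immutability abstraction with a cast, but actually writing through the result is undefined behavior, because the optimizer is still entitled to the higher-level guarantee.

```d
void main() @system
{
    immutable int x = 42;
    int* p = cast(int*) &x;  // @system escape hatch: steps outside the
                             // immutability abstraction, like GC code does
    // *p = 1;               // would be undefined behavior: the optimizer
                             // may still assume x == 42 everywhere
    assert(x == 42);
}
```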
So for pure, it's the same thing. For something to be pure you must have a well-defined set of abstractions based on which the optimizer is allowed to make certain transformations to your code. You must adhere to the restrictions imposed by this abstraction -- which is what the `pure` qualifier is ostensibly for -- otherwise you end up in UB territory, just like the situation with casting away immutable. The only sane way to maintain D's purity system is that code marked pure cannot contain anything that violates the assumptions we have imposed on pure. Otherwise we're in de facto UB territory even if the spec's definition of UB doesn't specifically state this case.
Long story short, pureFree makes no sense because its very intent is to make a visible change to the global state of memory -- clearly at a much lower level of abstraction than `pure` is intended to operate at, and clearly outside the `pure` abstraction. In fact, I'd say that *anything* that explicitly allocates/deallocates memory ought to be prohibited from being marked `pure`. Array operations are OK if we view them as intrinsic, opaque operations that the pure abstraction grants us. But anything that explicitly deals with memory allocation is clearly an operation outside the `pure` abstraction, so allowing it to be marked `pure` will inevitably break assumptions and land us in trouble.
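For contrast, here is where the line sits today (a sketch; `roundTrip` is my name for it): plain `malloc` is already rejected in `pure` code, while the `pureMalloc`/`pureFree` wrappers are waved through -- the "white lie" in a nutshell.

```d
import core.memory : pureMalloc, pureFree;
// import core.stdc.stdlib : malloc;  // calling plain malloc here would
                                      // be a compile error: it's not pure

// Compiles today, but under the view above any function like this --
// one that explicitly manipulates allocator state -- would be barred
// from carrying `pure` at all.
void roundTrip(size_t n) pure nothrow @nogc
{
    auto p = pureMalloc(n);
    pureFree(p);
}

void main()
{
    roundTrip(32);
}
```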
T
--
Meat: euphemism for dead animal. -- Flora