Thank you for your input.
On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Beside the combinatorial explosion in the required logic to check for, what happens if we copy/moving data between different memory annotated variables, e.g. nogc to gc, newcpp to gc.
> Did we auto copy, cast or throw an error. If we do not throw an error, an annotation might not only restrict access but also change semantics by introducing new references.
> So annotations become implied actions, that can be ok but is eventually hard to accept for the current uses of annotations.
There is no combinatorial explosion, that would be a bad idea ;-).
Annotated references behave like a super class of non-annotated references or, put differently, a subset of attributes is a super class of a superset of attributes. In the dynamic case, the best description of the effect is to view memory attributes as a precondition which requires the address value to lie in certain interval(s). You said that attributes currently only have compile-time semantics, so a static check would fit, right?
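The precondition view of the dynamic case could be sketched roughly like this. `Region`, `contains` and `readFrom` are hypothetical names made up for illustration; nothing like them exists in D or Phobos, and a real check would need the actual address ranges of the allocation areas.

```d
// Hypothetical: one address interval standing in for an allocation area.
struct Region
{
    size_t lo, hi; // half-open interval [lo, hi)

    bool contains(const(void)* p) const pure nothrow @nogc @trusted
    {
        immutable addr = cast(size_t) p;
        return lo <= addr && addr < hi;
    }
}

// The memory attribute acts as a precondition on the address value.
int readFrom(const(int)* p, Region allowed)
in (allowed.contains(p), "address outside the annotated region")
{
    return *p;
}

void main()
{
    int[4] buf = [1, 2, 3, 4];
    auto region = Region(cast(size_t) buf.ptr,
                         cast(size_t) (buf.ptr + buf.length));

    assert(readFrom(&buf[2], region) == 3); // inside the interval

    int other = 9;
    assert(!region.contains(&other));       // a different variable lies outside
}
```

With a static check, the `in` contract would be replaced by a compile-time comparison of the attributes, and no address would be inspected at all.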
On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> There is no such thing as memory transparency, strictly speaking, even if you want to allocate things on the stack, what is if your backend doesn't have a stack at all? Or we just rename the heap to stack?
Okay, "memory transparency" is a bad name. It could seem that it reveals actual memory addresses. I mean "allocation" or "scope" transparency.
Concerning the call stack: languages which don't provide the abstraction of scoped variables (which is implemented by a (call) stack) basically only have global variables and registers. I'm currently not aware of any high-level language or processor that doesn't support that abstraction, because it's the most basic abstraction of any high-level language. If you have "functions", then you also have a call stack, or let's call it "automatic scope". It doesn't matter whether automatic scope is allocated in the heap area (which can happen with closures and continuations), in the static memory area (for non-recursive functions) or in its own area at the end of the memory layout; it only matters that it's automatically managed by the function. I also think that CPUs which don't support a call stack cannot be programmed with D at all.
If attributes are used with static checks, the check does not care about the actual memory address value, only about the location in the source code where a value was allocated or about the attributes it gets from the user. The automatic lifetime is the criterion that distinguishes it from heap, GC or static memory.
For dynamic checks, I did indeed assume that in real programs the actual lifetime/scope can be inferred from memory addresses, because allocation regions of related scope usually put variables in common memory areas (at least in common memory segments). This would turn pointer types into value ranges instead of unconstrained 32-bit integers. Ultimately, information from a linker script could be needed for authentic dynamic checks (using the relocated address for checking). I can imagine that this adds further difficulty.
Data from stack frames in the heap would be treated as dynamically allocated, and data from static stack frames would be treated as stack. This could lead to unexpected results (false errors) unless more information is passed with the pointer. Dynamic checks would therefore require a separate implementation (a separate type) which records in some bits in which allocation scope a value was created. This dynamic solution is less lightweight in memory, but it makes the value check easier.
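Such a separate type could look roughly like the following sketch. `AllocScope`, `TaggedRef` and `mayEscape` are hypothetical names for illustration only, and the tag is filled in by hand here, whereas in the idea above the allocation site would set it.

```d
// Hypothetical allocation scopes; the names are illustrative only.
enum AllocScope : ubyte { automatic, gc, cHeap, static_ }

// A "fat" reference that remembers where its target was allocated,
// so a check does not have to guess from the address value.
struct TaggedRef(T)
{
    T* ptr;
    AllocScope origin;

    // A reference that should outlive the current function
    // must not point into automatic storage.
    bool mayEscape() const pure nothrow @nogc @safe
    {
        return origin != AllocScope.automatic;
    }
}

void main()
{
    int local = 1;
    auto onStack = TaggedRef!int(&local, AllocScope.automatic);
    auto onHeap = TaggedRef!int(new int(2), AllocScope.gc);

    assert(!onStack.mayEscape);
    assert(onHeap.mayEscape);
    assert(*onHeap.ptr == 2);
}
```

The extra `origin` field is the memory cost mentioned above; in exchange, the check is a single comparison.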
On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Well, I think having both is problematic/complex. But C has only one of those and C++ has both.
> It's not quite correct what arrays belong, so that's a mistake.
You mean references and pointers, right? References (from C++) are immutable pointers (in theory). C++ has pointers for backwards compatibility (and probably because the designer originally didn't understand the problem), but their use as "raw pointers" is now discouraged (when I write "pointer" I mean "raw pointer").
(Raw) pointers, in contrast, are modifiable "reference variables" (like the variables in Java) which additionally provide access to the pointer address and allow modifying it. Reference variables, however, don't allow casting to non-pointer types.
Arrays in C and C++ are actually more like C++ references, i.e. (locally) immutable pointers.
On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Annotations seam to be neat, but they parametrize your code:
> @allocator("X"), @lifetime("param1", "greather", "param2") void f(Type1 param1, Type2 param2)
> becomes
> void f(Allocator X,Lifetime lifetime(param1), Lifetime lifetime(param2))(Type1 param1, Type2 param2) if currentAllocator=X && lifetime(param1)>=lifetime(param2) {...}
> which literally turns every function allocating something into a template increasing "templatism" unless we get runtime generics as Swift.
I agree that templatism is bad.
Are attributes really lowered to template arguments by the compiler? I also didn't mean to introduce new syntax with a comma between attributes. With memory attributes I really mean attributes like `scope`, `ref`, `private`, `pure`, `@nogc` ... which are used with reference/pointer types, not functions. You would be right that any assignment operation to an annotated reference needs a templated overload. I can't think of another way to implement it. In the worst case, it would become something like
Ref!(nogc, Flower) tulip; // anything but not allocated by garbage collector
Ref!(static, new, Bird) raven; // no automatic allocation
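A rough sketch of how such a templated reference could be written in today's D follows. Everything in it is an assumption for illustration: `Mem`, `anyMem` and this `Ref` are hypothetical, and since `static` and `new` are keywords, the allowed allocation kinds are modelled as bit flags instead of attribute names.

```d
// Hypothetical allocation kinds, modelled as bit flags.
enum Mem : uint { gc = 1, automatic = 2, static_ = 4, cHeap = 8 }
enum uint anyMem = Mem.gc | Mem.automatic | Mem.static_ | Mem.cHeap;

// Hypothetical annotated reference: `allowed` is the set of
// allocation kinds this reference accepts.
struct Ref(T, uint allowed = anyMem)
{
    T* ptr;

    // Templated assignment: only compiles if every kind the source
    // may refer to is also accepted by the target.
    void opAssign(uint other)(Ref!(T, other) rhs)
        if ((other & ~allowed) == 0)
    {
        ptr = rhs.ptr;
    }
}

struct Flower { int petals; }

void main()
{
    static Flower rose = Flower(5);

    Ref!(Flower, anyMem & ~Mem.gc) tulip;   // anything but GC memory
    Ref!(Flower, Mem.static_) fixedFlower;  // static memory only
    fixedFlower.ptr = &rose;

    tulip = fixedFlower;                    // fine: static is "not GC"
    assert(tulip.ptr.petals == 5);

    Ref!(Flower, Mem.gc) gcFlower;
    // Rejected at compile time: GC memory is not accepted by `tulip`.
    static assert(!__traits(compiles, tulip = gcFlower));
}
```

The check is purely static here; the price is exactly the templated assignment overload mentioned above.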
I would already be happy with the most important attributes.
- "Oh, I see, it returns me GC-allocated memory"
- "Oh, the passed argument is allocated automatically, so I can't put the address into a static reference."
- "Oh, a slice over a fixed-size array will not work with that function."
Of course, the amount of safety to get from these attributes depends on the programmer. For example, they don't prevent use-after-free with `@newc` and `@newcpp` in every case, because a referenced value could suddenly be deleted by code which interrupts the function execution. The true scope depends not only on the location of allocation but also on the location of the associated deallocation. If the deallocation can happen in a code block which interrupts normal function execution, then I would treat it either like `shared` or `@memory`. The compiler can't know everything by itself. Memory safety will thus only work if programmers use the proper attributes.
But the fact that D already implements a very small, weaker subset of memory (or reference) attributes like `scope` and `return ref` shows that this idea fits D's design.
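For comparison, here is a small example of those two existing attributes. The functions `larger` and `sum` are made up for illustration; as far as I know, the `scope` checks are only enforced in `@safe` code and only when compiling with `-preview=dip1000`.

```d
// `return ref`: the result may refer to one of the parameters, so the
// compiler ties the lifetime of the result to the arguments.
ref int larger(return ref int a, return ref int b) @safe
{
    return a > b ? a : b;
}

// `scope`: the slice must not escape the function, so the caller may
// safely pass memory with automatic lifetime.
int sum(scope const(int)[] values) @safe pure nothrow @nogc
{
    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}

void main()
{
    int x = 3, y = 7;
    larger(x, y) = 10;       // writes through the returned reference
    assert(x == 3 && y == 10);

    int[3] local = [1, 2, 3];
    assert(sum(local[]) == 6);
}
```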
PS:
On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Why not removing the distinction between values and references/pointers at all? But I think it drifts to hard in a logically high level language and isn't the right way to go in a system level language although very interesting.
I'm getting off-topic, but I totally agree with you! I have an idea for a systems programming language which treats every variable as a reference variable, to get rid of the annoying value categories and the value concept. It would use a unified variable access interface which allows different reference implementations for different optimization scenarios (like using registers to store and modify the referenced value).
On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> On Thursday, 3 June 2021 at 17:26:03 UTC, Elmar wrote:
> @stack arr2 = dynamicArr.dup; // create a copy on stack, the stack is "scope"d
> [...]
> What if dup is creating things on the heap (I don't know by the way). You need to make the allocator dynamically scoped.
This was supposed to be a side idea unrelated to the main idea. `dup` does allocate memory with the GC, so you'd be right when we talk about annotating references, but the snippet here is supposed to inject the annotated allocation of a value definition into the RHS, i.e. into `dup`'s internal implementation.
If you like, I can elaborate on that idea:
Idea: an annotation on a value declaration specifies where the return value or expression value of the RHS is allocated and thus where the variable will be located in memory. This theoretical idea would give more control over the variable's allocation.
This idea is not odd, because a small subset of such attributes for value variables IS already implemented in D, like static/global variables, automatic local variables (of course) and member scope variables in structs and classes. C also features `@memory` (`volatile`) and to some extent `@register` (C99's `restrict` only comes close to it, since it lets the compiler keep values read through a pointer in registers for further dereferencing; some language extensions also map variables to specific registers).
I thought this would not be popular, because it seems like D doesn't want to be too much of an alternative to C++ for systems programming, and generalizing this concept seems like a bigger change. That's why this idea was only a side note.
@gc short opal = 3;
// eqv. to ref short opal = cast(ref short)GC.make!short(); opal = 3;
@newc int emerald = 5;
// eqv. to ref int emerald = cast(ref int)malloc(int.sizeof); emerald = 5;
@new float ruby = 8.;
// @new uses the "new" operator, which is not always dynamic allocation
@rc int amethyst = 13;
// reference counted, basically an abstraction over an underlying shared pointer
...
free(&emerald); // needed because @newc is not automatically managed
A benefit is that these variables are still used like values, i.e. they are passed by value or by reference depending on the function parameter type, although physically they are references, of course (everything that is not stored in a register is actually a reference; variables on the call stack, for example, are referenced via the stack pointer).
Goal: The responsibility for allocation is shifted from the service, the callee (which doesn't know about any concrete client's allocation needs), to the client, the caller (which knows about its own allocation needs and actually should know what it gets). GC has been introduced to remove the symptoms of this problem (memory management problems) without solving it (consequence: it gets used far more often than needed and is inefficient). The only reasonable way to solve it is to let the caller side (LHS of the assignment) decide what it needs, not the callee side (RHS of the assignment), because the caller side has to handle the result afterwards. A generic solution would be some kind of Dependency Injector which handles the allocation, uses the callee to initialize the value and passes it to the caller. It would turn those attributes into a powerful abstraction. A very easy implementation of the Dependency Injector is overriding `theAllocator` while the RHS is computed.
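A minimal sketch of that last variant, using `theAllocator` from `std.experimental.allocator`, could look like this. The helper `allocateWith` and the callee `makeAnswer` are hypothetical names for illustration; a real attribute such as `@newc` would presumably be lowered to something similar by the compiler. It only works if the callee actually allocates through `theAllocator`.

```d
import std.experimental.allocator : RCIAllocator, allocatorObject,
    dispose, make, theAllocator;
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical injector: evaluates the lazy RHS while `theAllocator`
// is temporarily replaced by the allocator the caller asked for.
T allocateWith(T)(RCIAllocator injected, lazy T rhs)
{
    auto saved = theAllocator;          // remember the current allocator
    scope (exit) theAllocator = saved;  // restore it, even if rhs throws
    theAllocator = injected;
    return rhs;                         // the RHS is computed here
}

// The callee does not know where its result will live.
int* makeAnswer()
{
    return theAllocator.make!int(42);
}

void main()
{
    auto mallocBacked = allocatorObject(Mallocator.instance);

    // Roughly what `@newc int* p = makeAnswer();` could mean.
    int* p = allocateWith(mallocBacked, makeAnswer());
    assert(*p == 42);

    mallocBacked.dispose(p); // the caller chose malloc, so the caller frees
}
```

The caller decides on the allocator and is therefore also the one who knows how to release the memory, which is the shift of responsibility described above.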