Memory/Allocation attributes for variables

May 30, 2021

Posted by Elmar

Hello dear D community,

Personally, I'm interested in security, memory safety and such, although I'm not an expert.
I would like to know whether D has memory/allocation attributes (I will use both terms interchangeably), or whether someone knows a library that statically asserts that variables of an attributed type only contain values allocated in compatible allocation regions. Attributes for memory safety can be seen as an extension to the type system: they constrain the address domain of values instead of the value domain itself, and so they become part of the value domain of a pointer or reference.

Now there are A LOT of different allocation regions and scopes for variables for whatever purpose:

static allocation
stack frame
dynamic allocation w/o GC
dynamic allocation with GC
fast allocators optimized for specific objects or even function-specific allocators
peripheral addresses
yes, even registers as an allocation region (which keeps a value out of RAM so that it is not easily overwritten; this is useful for security purposes such as storing pointer encryption keys or stack canaries, or for assigning common registers to variables)
or memory-only allocation (which requires that a value is never stored/held in registers)

Memory safety problems often boil down to the program accidentally storing a pointer value into a variable where the pointer is semantically out of bounds or points to a memory area that is too small for the variable's purpose or scope. (In languages with better type safety, aliasing of variables is probably impossible; the typical cases are unbounded data structures like C's variadic arguments or attacker-controlled variable-length arrays.)

I don't know of any other language yet that has allocation attributes for pointer/reference variables, or allocation attributes for value-typed variables that restrict the allocation region for the variable's data (some kind of contract at variable level). However, value-typed variables are another story: they are allocated at the point where they are defined, so for them such attributes would only serve as an expressive generalization, extended to value types and even control structures.

Looking at the attributes D provides, I definitely see that memory safety concerns are addressed by existing attributes. But I personally find them rather unintuitive to use, difficult for beginners to learn, and not flexible enough (for example, there are two different versions of return ref scope). As currently defined, those attributes don't annotate the destinations of data flow but its sources (like function arguments).

What I imagine: more specific scopes (more specifically attributed types) or allocation regions correspond to more generalized types. Some scopes/allocation regions are contained within others (the smaller contained regions are effectively base types of the bigger regions) and some regions are disjoint (but they should not partially intersect, which would go against the structured-programming paradigm and the inheritance analogy in OOP). This results in scope polymorphism, which statically/dynamically checks the address type of RHS expressions during assignments, and memory safety becomes a special case of type safety.

I could annotate return values with attributes to make clear that a function returns GC-allocated memory, e.g. using a @gc attribute.

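@gc string[] stringifyArray(T : U[], U)(T arr) {
    import std.algorithm.iteration : map;
    import std.array : array;
    import std.conv : to;
    // map is lazy; .array materializes the result as a GC-allocated string[]
    return arr.map!(value => value.to!string).array;
}
@nogc auto stringtable = stringifyArray([1, 2, 3]);    // error!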

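// a useless factory example
@new auto makeBoat(double length, double height, Color c) {
    import std.experimental.allocator : allocatorObject, make, theAllocator;
    import std.experimental.allocator.mallocator : Mallocator;

    auto previous = theAllocator;
    theAllocator = allocatorObject(Mallocator.instance);
    scope (exit) theAllocator = previous;    // restore the earlier allocator

    // plain "new" always uses the GC heap, so allocate through theAllocator
    return theAllocator.make!Boat(length, height, c);
}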

// combining multiple attributes gives a union of both which is valid for reference variables
@new @newcpp @gc Boat boat = makeBoat( ... );
// technically, a union of attributes for value types is possible but would
// require inferring the most appropriate attribute from context which is difficult

Variables with no attributes allow any pointer for assignment and will infer the proper attribute from the assignment.

Some of these use cases are already covered by existing attributes:

scope makes sure that a reference variable is not written to an allocation region outside the current function block (which corresponds to using "@scope(function)" on the argument, see below); it would be type-unsafe to assign it to a variable type with a larger scope. "Scope" basically means that the argument belongs to a stack frame in the caller chain. (It corresponds to arguments annotated with "@caller", see below.) It tells the function that the referenced value has a limited lifetime in a caller's stack frame despite being accessed through a reference variable, and that the reference could become invalid after the function returns, so the function must not write it to variables outside the function. For arguments this is very useful, and I would rather prefer the complementary case to be explicit. That's where in is really useful as a short form.
ref specifies that the actual allocation region of a variable's value is outside of the function scope in which the variable is visible (or used). (out is similar.)
return ref specifies that the value (referenced by the returned reference) is in the same allocation scope as the argument annotated with return ref (corresponds to the annotation of the return type with "@scope(argName)", see below).
return ref scope is a combination of the two above. The return value is treated as having the same allocation region as the annotated argument.
__gshared, shared: variables with these attributes are stored in a scope accessible across threads. This is the default in C, so that __gshared corresponds to C's volatile values, which are accepted by @memory references.
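
For readers less familiar with the existing attributes listed above, here is a minimal sketch of how return ref and scope are written in today's D. The function names first and total are made up for this illustration, and the remark about the rejected return reflects my understanding of recent compilers in @safe code.

```d
// `return ref`: the returned reference may alias `a`, but never `b`.
@safe ref int first(return ref int a, ref int b)
{
    return a;   // returning `b` instead is rejected in @safe code
}

// `scope`: the slice must not outlive the call, so a caller may pass
// a slice of its own stack memory.
@safe int total(scope const(int)[] values)
{
    int sum = 0;
    foreach (v; values)
        sum += v;
    return sum;
}

void main()
{
    int x = 1, y = 2;
    first(x, y) = 10;          // writes through the returned reference into x
    assert(x == 10 && y == 2);

    int[3] onStack = [1, 2, 3];
    assert(total(onStack[]) == 6);
}
```

Note that both annotations sit on the parameters (the sources of the data flow), which is exactly the point criticized above.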

Here is a (really long) collection of many possible memory attributes I am looking for. They define which addresses of values are accepted for the pointer/reference:

@auto: allocation in any stack frame, which includes fixed-size value-type variables passed as arguments or return value
@stack: dynamic allocation in any stack frame (alloca)
@loop: allocation in the region which lives as long as the current loop lives
@scope(recursion): allocation-scope not yet available in D I believe, scope which lives as long as the entire recursion lives, equivalent to loop in the functional sense. Locals in this scope are accessible to all recursive calls of the same function.
@scope(function): allocation in the current stack frame (scoped arguments are a special case of this)
@scope(label): allocation in the scope of the labeled control structure
@scope(identifier): allocation in the same scope as the specified variable name, return ref can be seen as special case for return types.
@static: allocation/scope in static memory segment (lifetime over entire program runtime), static variables and control structures are a special case of this attribute
@caller: allocation in the caller's stack frame (usable for convenient optimizations like shown below), an "implicit argument" when used for value types; corresponds to ref scope for reference-type variables. Something in between "static" and "auto".
@gc: allocation region managed by D's garbage collector
@nogc: disallows pointer/reference to GC-allocated data
@new: allocation region managed by Mallocator
@newcpp: allocation region managed by the C++ standard library allocator (stdcpp), eases C++ compatibility
@peripheral: target- or even linkerscript-specific memory region for peripherals
@register: only stored in a register (with compile-time error if not possible)
@shared: allocation region for values which are synchronized between threads
@memory: never stored in a register (the use case can overlap with "@peripheral"). It is meant for variables whose content can change non-deterministically and must be reloaded from memory each time, for example variables modified by interrupt handlers; it also prevents unwanted optimization.
@make(allocator): allocated by the given allocator (dynamic type check required, if "allocator" is a dynamic object)

In the basic version for reference variables these attributes statically/dynamically assert that a given pointer value is in bounds of that allocation region. Of course, this is a long list of personal ideas and some of them could be unpopular in the community. But I think, all of them would be a tribute to Systems programming.

Why are such attributes useful? First, because type-safe design means restricting value domains as much as possible, so that each is only as large as required. The attributes restrict the address (pointer value) at which a value bound to a variable can be located, and they provide additional static type checks as well as allocation transparency (something I have missed in every language I have used so far). The good thing is that if no attribute is provided, it can be inferred from the location where a value-typed variable is defined, or from the assigned pointer value for reference types.
Maybe also useful: with additional memory safety attributes, it could become legitimate to assign to scoped reference variables.

For reference-type variables, these attributes are simple value domain checks of the pointer variable. A disadvantage of memory attributes is (as with polymorphism) that runtime checks might be needed in some cases when static analysis isn't sufficient (if attributes are cast).

An interesting extension is a generalization to value-type variables. It can generalize the scope and return attributes to value types. While probably not uncontroversial, it could allow fine control over variable allocation and force exactly where a value-typed variable is allocated (allocation guarantees). You could indirectly define a variable in a nested code block which is allocated for another scope. The only main disadvantage I can think of is that it cannot be created as a mere library add-on.

outer: {
    // ...
    @scope(inner) uint key = generateKey(seed);  // precomputes the RHS
    // and initializes the LHS with the memorized value when entering the "inner" block
    seed.destroy();    // do something with seed, modify/destroy it, whatever
    // key is not available/accessible here
    // Message cipher;   // <-- implicit but uninitialized
    inner: if (useSecurity) {
        // if not entered, the init-value of the variable is used
        @scope(outer) Message cipher = encrypt(key);
        // Implicitly defines "cipher" uninitialized in "outer" scope.
        // Generates default init in all other control flow paths without @scope(outer) definition
    }
    //else cipher = Message.init;   // <-- implicit, actual initialization
    decrypt(cipher, key);    // error, key is only available in the "inner" scope
}

Some would criticize the unconventional visibility of cipher which doesn't follow common visibility rules. For example if static variables are defined in functions, they are still only visible in the function itself and not in the entire scope in which they live. So a likely improvement would be that the visibility is not impacted by the attribute, only the point of actual creation/destruction. Just looking at the previous example, it would seem useless at first, but it's not if loops are considered (and variables which have @loop scope, that means are created on loop entry and only destructed on loop exit).

Also interesting cases can emerge for additional user optimization in order to avoid costly recomputation by using a larger scope as allocation region:

double permeability(Vec2f direction) {
    @caller Vec2f grad = calculateTextureDerivative();
    // "grad" is a common setup shared by all calls to "permeability" from the same caller instance
    // It is hidden from the caller because it's an implementation detail of this function.
    // All calls to "permeability" by the same caller will use the same "grad".
    // It would be implemented as invisible argument whose initialization
    // happens in the caller. The variable is stored on the caller's site as
    // invisible variable and is passed with every call.
    return scalprod(direction, grad);
}

A main benefit of this feature is readability and, in some cases, optimization: the setup computation is not repeated for every call, only when a repetition is actually needed, which can be decided in the callee.
For closures the @caller scope is clear, but it also works for non-closure functions as an invisible argument. Modifications to a @caller ref variable are remembered across consecutive calls from the same caller stack frame, whereas @caller without ref may only modify a local copy.

Another extension would be the ability to create arrays easily on the stack:
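
To make the "invisible argument" concrete, here is a rough sketch of how the same effect can be written by hand in today's D. PermeabilityState, derivativeCalls and the body of calculateTextureDerivative are assumptions of this sketch, not part of the proposal; unlike the proposal, the initialization here happens lazily in the callee rather than in the caller.

```d
struct Vec2f { float x, y; }

int derivativeCalls;   // only here to show how often the setup runs

// Stand-in for the expensive setup computation.
Vec2f calculateTextureDerivative()
{
    ++derivativeCalls;
    return Vec2f(0.5f, 2.0f);
}

// Hypothetical helper: the storage a `@caller` variable would occupy
// in the caller's stack frame.
struct PermeabilityState
{
    bool ready;
    Vec2f grad;
}

double permeability(ref PermeabilityState state, Vec2f direction)
{
    if (!state.ready)
    {
        state.grad = calculateTextureDerivative();
        state.ready = true;
    }
    return direction.x * state.grad.x + direction.y * state.grad.y;
}

void main()
{
    PermeabilityState state;            // lives in the caller's frame
    auto a = permeability(state, Vec2f(1, 0));
    auto b = permeability(state, Vec2f(0, 1));
    assert(a == 0.5 && b == 2.0);
    assert(derivativeCalls == 1);       // setup ran once for both calls
}
```

The explicit state parameter is what the attribute would hide: the caller would no longer need to know that permeability keeps anything between calls.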

@auto arr1 = [0, 1, 2, 3];  // asserts fixed-size, okay, but variable size would fail
@stack arr2 = dynamicArr.dup;   // create a copy on stack, the stack is "scope"d

An easy but probably limited implementation would set theAllocator before the initialization of such an attributed value-type variable and afterwards reset theAllocator to the previous allocator.

Finally, one could annotate control structures with attributes even more generally, to define on entry to which scope the control structure's arguments are evaluated (e.g. static if is a special case which represents @static if in terms of attributes), but this is yet another story and unrelated to allocation.
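
For comparison, this is roughly what current D already offers for the fixed-size case. It is only a sketch: sumFixed is a made-up name, and the GC.addrOf checks are there just to show where the data ends up. The variable-length stack copy from the @stack example has no direct counterpart here.

```d
import core.memory : GC;
import std.array : staticArray;

// Fixed-size arrays are value types: initializing one from a literal
// does not allocate, so this function can be marked @nogc.
@nogc nothrow int sumFixed()
{
    int[4] arr1 = [0, 1, 2, 3];
    int sum = 0;
    foreach (v; arr1)
        sum += v;
    return sum;
}

void main()
{
    assert(sumFixed() == 6);

    auto arr2 = [10, 20, 30].staticArray;   // length inferred: int[3]
    static assert(is(typeof(arr2) == int[3]));

    int[] onHeap = [10, 20, 30];            // a slice initialized from a literal uses the GC
    assert(GC.addrOf(arr2.ptr) is null);    // stack memory, unknown to the GC
    assert(GC.addrOf(onHeap.ptr) !is null); // GC-managed block
}
```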

This is it, I'm sorry for the long post. It took me a while to write it down and reread.
Regards!

May 30, 2021

Re: Memory/Allocation attributes for variables

Posted by Ola Fosheim Grostad
in reply to Elmar

On Sunday, 30 May 2021 at 02:18:38 UTC, Elmar wrote:

I agree that D has jumped down the rabbit hole in terms of usability, and function signatures are becoming weirder. The reuse of the term "return" is particularly bad.

To a large extent this is the aftermath of changing course from D1 to D2, where simplicity was sacrificed and the door was opened to more and more complexity. Once a language becomes complex, it seems difficult to prevent people from adding just-one-more-feature that adds to the complexity. Also, since experienced users influence the process most... there is nobody to stop it.

The main issue, however, is not specifying where it allocates, but keeping track of it when pointers are put into complex data structures.

May 31, 2021

Re: Memory/Allocation attributes for variables

Posted by Elmar
in reply to Ola Fosheim Grostad

On Sunday, 30 May 2021 at 05:13:45 UTC, Ola Fosheim Grostad wrote:

On Sunday, 30 May 2021 at 02:18:38 UTC, Elmar wrote:

> I agree that D has jumped down the rabbit hole in terms of usability, and function signatures are becoming weirder. The reuse of the term "return" is particularly bad.

> The main issue, however, is not specifying where it allocates, but keeping track of it when pointers are put into complex data structures.

Thank you for your reply. Also, sorry for the wordiness; I'm just awkwardly detailed sometimes.

In your case it's not what I was thinking. I would count myself among the sophisticated programmers (but not the productive ones, unfortunately). I can cope with all those reused keywords, even though I think their design is unintuitive here. Annotating the return type would be more intuitive because the aliasing is a property of the return type, not the argument. At least I feel like I understood the sparsely explained intention behind the current scope-related attributes, but my main point is that they could be made more expressive. That would give programmers a hint about what kind of allocated argument is acceptable for a parameter. And no, this is not trivial. It's the reason I decided to start this thread:

Functions in Phobos accept ranges over fixed-size arrays as range arguments, but even when this fails miserably, the code compiles happily and accesses illegal memory without any warning, producing fully non-deterministic results with different compilers. I noticed this when I tried to use "map" with fixed-size arrays. The language simply lacks any tool to check and signal that fixed-size arrays are illegal as a range argument for "map". And sometimes mapping over fixed-size arrays even works.
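
One way this kind of bug can arise is sketched below. The exact code that triggered it is not shown in this post, so doubledDangling and doubledCopied are hypothetical reconstructions. As far as I can tell, the first function is accepted in @system code with default compiler settings, while -preview=dip1000 in @safe code is designed to reject it.

```d
import std.algorithm.iteration : map;
import std.array : array;

// Unsafe on purpose: the lazy range returned by map still refers to
// `local`, which is gone once the function returns.
auto doubledDangling() @system
{
    int[3] local = [1, 2, 3];
    return local[].map!(x => x * 2);
}

// One way out: evaluate eagerly into a GC-allocated array before returning.
int[] doubledCopied()
{
    int[3] local = [1, 2, 3];
    return local[].map!(x => x * 2).array;
}

void main()
{
    assert(doubledCopied() == [2, 4, 6]);
    // Reading from doubledDangling() would be undefined behaviour,
    // so it is deliberately not evaluated here.
}
```

Nothing in the signature of map tells the caller that the source must outlive the returned range, which is the missing hint discussed above.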

Without better memory safety tools, I'd discourage more memory efficient programming techniques in D although I'd really like to see D for embedded and resource constrained systems to replace C.

I wonder why programming languages don't see the obvious: treating memory safety as a part of type safety (address/allocation properties being type properties), so that memory-unsafe code only indicates an incomplete type system. I also don't know whether conventional "type safety" in programming languages suffices to eliminate the possibility of deadly programming bugs (aliased reference variables, for example). But of course, security and safety are complex, and there is no way around complexity to make safe code flexible.

The important part is the first one (without generalization to allocation and control structures which I only mentioned as an interesting thought) because I think it's an easy but effective addition. D already has features in that direction which is good, the awareness exists, but it's still weak at some points. My post should be seen as a collection of ideas and a request for comment (because maybe my ideas are totally bad or don't fit D) rather than a request to implement all this. The main point is to consider references/pointers as values with critical type safety which means a way to specify stricter constraints. Memory safety is violated by storing a pointer value in a reference which is out of the intended/reasonable value domain of the pointer (not matching its lifetime).

If someone has already had the same thought as me, there could be a safepointer-like user library which supports additional attributes (representing restricted pointer domains) by implementing a custom pointer/reference type. (It's not a smart pointer, because smart pointers try to fix the problem at the other end and require dynamic allocation, which is not that nice.) Due to D's nature, it would support safe pointers and safe references (reference variables) and provide static and dynamic type checks with overloaded operators and memory attributes. Attributes couldn't be inferred automatically, I guess, but annotating variables could allow entirely static memory safety checks (which don't need to explicitly test whether a pointer value is contained in a set of allowed values) and might prevent bugs or unwanted side effects.

One important aspect which I forgot: aliasing of variables. I know D allows aliased references as arguments by default. Many memory safety problems derive from aliased variables which were not assumed to be aliased. Aliased variables complicate formal verification of code and confuse people. I would add @alias(symbol) to my collection, which indicates that a reference explicitly aliases (overlaps) another reference in memory, along with a @noalias(symbol).
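
A very small sketch of what such a library type could look like follows. Everything in it (RegionPtr, tagged, retag, the region constants) is invented for illustration. It only covers the static check on conversions; the tag itself is taken on trust.

```d
// Region tags as bit flags (names are assumptions of this sketch).
enum uint stackRegion = 1, gcRegion = 2, mallocRegion = 4;

// A raw pointer plus a compile-time set of regions it may point into.
struct RegionPtr(T, uint allowed)
{
    T* ptr;
}

// Tags a raw pointer. The caller vouches for the region; nothing is verified.
auto tagged(uint region, T)(T* raw)
{
    return RegionPtr!(T, region)(raw);
}

// Converts between tagged pointers. Compiles only if every region the
// source may point into is also accepted by the destination.
auto retag(uint dest, T, uint source)(RegionPtr!(T, source) rhs)
{
    static assert((source & ~dest) == 0,
        "source region is not accepted by the destination");
    return RegionPtr!(T, dest)(rhs.ptr);
}

void main()
{
    auto g = tagged!gcRegion(new int(7));

    // Widening to "GC or malloc" is accepted.
    auto wide = retag!(gcRegion | mallocRegion)(g);
    assert(*wide.ptr == 7);

    // Narrowing to "malloc only" is rejected at compile time.
    static assert(!__traits(compiles, retag!mallocRegion(g)));
}
```

Because the region set is a template parameter, every distinct set yields a distinct type, so functions taking such pointers would have to be templates or fix one set.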

If someone thinks, I heavily missed something, please let me know.

May 31, 2021

Re: Memory/Allocation attributes for variables

Posted by Ola Fosheim Grøstad
in reply to Elmar

On Monday, 31 May 2021 at 18:21:26 UTC, Elmar wrote:

All high-level programming languages do. Only the low-level ones don't, and that is one of the things that makes their type systems unsound.

> constraints. Memory safety is violated by storing a pointer value in a reference which is out of the intended/reasonable value domain of the pointer (not matching its lifetime).

But how do you keep track of it without requiring that all graphs are acyclic? No back pointers is too constraining.

And no, Rust does not solve this. Reference counting does not solve this. How do you prove that a graph remains fully connected when you change one pointer?

So, how do you know that you don't have aliasing when you provide pointers to two graphs? How do you prove that none of the nodes in the graph are shared?

June 01, 2021

Re: Memory/Allocation attributes for variables

Posted by Elmar
in reply to Ola Fosheim Grøstad

Good questions :-) .

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:

On Monday, 31 May 2021 at 18:21:26 UTC, Elmar wrote:

> All high-level programming languages do. Only the low-level ones don't, and that is one of the things that makes their type systems unsound.

I suppose you mean the "higher" level languages (because C is by original definition also a high-level language). I don't know of any "higher" level language either which provides the flexibility of constraining the value domain of a pointer/reference, except for restricting null (non-nullable pointers are probably the simplest domain constraint for pointers/references). I think not even Ada or VHDL has it.

What I'd like to gain with those attributes is a guarantee that the referenced value wasn't allocated in a certain address region/scope and lives in a lifetime-compatible scope, which can be detected by checking the pointer value against an interval or a set of intervals. For example, a returned reference to an integer could have been created with "malloc" or even a C++ allocator, or interfacing functions could annotate parameters with such attributes.

With guarantees about the scope of arguments function implementations can avoid buggy reference assignments to outside variables. The function could expect compatible references allocated with GC but the caller doesn't know it. Whether any reference variable assignment is legitimate can be checked by comparing the source attributes (the reference value which says where the value is allocated) with the destination attributes (where the reference is stored in memory). Even better are runtime checks of pointer values for a better degree of memory safety but only if the programmers want to use it. A reference assignment is legitimate if the destination scope is compatible with the source's scope, not in any other case. I would suggest a lifetime rating for value addresses as follows:

peripheral > system/kernel > global shared > private global (TLS) > extern global (TLS) > shared GC allocated > shared dynamically allocated > GC allocated (TLS) > dynamically allocated (TLS) <=> RAII/scoped/stack <=> RAII/scoped/stack > register

Heap regions are not always comparable to stack or RAII. So it is probably best to continue the current practice of not allowing assignment to RAII references (using the scope attribute). Everything other than stack addresses is seen as one single lifetime region with equal lifetime. The comparison between stack addresses assumes that an address deeper in the stack has a higher or equal lifetime. The caller could also provide its stack frame bounds, which allows this interval to be considered as one single lifetime.

It should constrain the possible value domain of pointers absolutely, so that no attack with counterfeit pointers to certain memory addresses is possible. If I used custom allocators for different types, I could anticipate or delimit what the pointer value can be.

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:

> > constraints. Memory safety is violated by storing a pointer value in a reference which is out of the intended/reasonable value domain of the pointer (not matching its lifetime).

> But how do you keep track of it without requiring that all graphs are acyclic? No back pointers is too constraining.

> And no, Rust does not solve this. Reference counting does not solve this. How do you prove that a graph remains fully connected when you change one pointer?

I think this is GC-related memory management, not type checking. The memory attributes don't solve memory management problems. The problem with reference counting is usually solved by inserting weak pointers into cycles (which also resolves the apparent contradiction of a cycle of references). Weak references are used by those objects which are deeper in the graph of data links. Otherwise it's a code smell, and one could refactor the links into a joint object in which deleted objects deregister themselves. I have already thought about other allocation schemes for detecting cycles that could be combined with reference counting, for example tagging structs/classes with the ID of the connected graph in which they are linked if they aren't leaves. But this ID is difficult to change. One could also analyze at compile time which pointers can only be part of a cycle, but further explanation would lead too far here.

Instead, the problems my idea is intended to solve are:

giving hints to programmers (to know which kind of allocated memory works with the implementation; stack addresses apparently won't generally work with map, for example)
having static or dynamic (simple) value domain checks (which check whether a pointer value is in the allowed interval(s) of the allocation address spaces belonging to the attributes) which ensure that only allowed types of allocation are used. These checks can be used to statically or dynamically dispatch functions. Of course such a check could also be performed manually, but it's tedious and requires me to put all the different function bodies in one static if else.

It's more of a lightweight solution and works like an ordinary type check (value-in-range check).

Where the feature shines most is function signatures, because they separate code and create opacity, which can be countered by memory attributes on the return type and argument types.
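
A manual version of such a dynamic check is possible today for the GC region at least. The sketch below uses core.memory's GC.addrOf, which returns null for addresses the collector does not manage; isGcAllocated and describe are hypothetical helper names.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

// True if `p` points into a block managed by D's garbage collector.
bool isGcAllocated(const(void)* p) nothrow @nogc
{
    return GC.addrOf(p) !is null;
}

// Hypothetical dynamic dispatch on the allocation region of the argument.
string describe(const(int)* p)
{
    return isGcAllocated(p) ? "gc" : "not gc";
}

void main()
{
    int* fromGc = new int(1);
    int onStack = 2;
    int* fromMalloc = cast(int*) malloc(int.sizeof);
    scope (exit) free(fromMalloc);

    assert(describe(fromGc) == "gc");
    assert(describe(&onStack) == "not gc");
    assert(describe(fromMalloc) == "not gc");
}
```

An attribute would move this test from the function body into the signature, where the caller can see it.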

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:

> So, how do you know that you don't have aliasing when you provide pointers to two graphs? How do you prove that none of the nodes in the graph are shared?

Okay, I didn't define aliasing. By "aliasing" I mean that "aliasing references" (or pointers) either point to the exact same address or that the classes/structs they immediately point to overlap in memory. I would consider anything else more complicated than necessary. The definition doesn't care about further indirections. I often consider only the directly pointed-to, contiguous chunk of memory of a struct or class as "the type". When I code a function, I'm usually only interested in the top level of the type (the "root node" of the type), and further indirections are handled by nested function calls. For example, it suffices if two argument slices do not overlap. For that I only need to check aliasing as just defined. If you really want two arguments (graphs) to not share any single pointer value, I would suggest using a more appropriate type than a memory attribute, a type which is recursively "unique" (in terms of only using "unique pointers").

Do you think it sounds like a nice idea to have a data structure attribute unique next to abstract and final which recursively guarantees that any reference or pointer is a unique pointer?

If you are interested in an algorithmic answer to your questions, the best approach I can quickly think of is to create an appropriate hash table from all pointers in one graph and test all pointers in the other graph against it (if I cannot use any properties of the pointers' values, e.g. that certain types and all indirections are allocated in specific pools). But that only works with exactly equal pointer values.
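
For slices, the shallow aliasing test described here fits in a few lines. This is a sketch with a made-up helper name, overlaps; Phobos also has std.array.overlap, which returns the shared portion of two arrays.

```d
// True if the two slices share at least one element's memory.
// Only the directly referenced memory is compared; indirections are ignored.
bool overlaps(T)(const(T)[] a, const(T)[] b) @trusted
{
    if (a.length == 0 || b.length == 0)
        return false;
    return a.ptr < b.ptr + b.length && b.ptr < a.ptr + a.length;
}

void main()
{
    int[] buffer = [1, 2, 3, 4, 5, 6];
    assert(overlaps(buffer[0 .. 4], buffer[2 .. 6]));   // indices 2 and 3 are shared
    assert(!overlaps(buffer[0 .. 3], buffer[3 .. 6]));  // adjacent, no overlap
    assert(overlaps(buffer, buffer));                   // exact same address
}
```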
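
The hash-table approach could look roughly like this, using D's built-in associative arrays keyed by node address. Node, collect and sharesNode are names invented for the sketch. As said above, it only detects exactly equal pointer values.

```d
struct Node
{
    int value;
    Node*[] links;
}

// Records every node reachable from `n`. Cycles are handled by the
// early return on nodes that were already visited.
void collect(Node* n, ref bool[Node*] seen)
{
    if (n is null || (n in seen) !is null)
        return;
    seen[n] = true;
    foreach (child; n.links)
        collect(child, seen);
}

// True if some node is reachable from both roots.
bool sharesNode(Node* a, Node* b)
{
    bool[Node*] inA, inB;
    collect(a, inA);
    collect(b, inB);
    foreach (p; inB.byKey)
        if ((p in inA) !is null)
            return true;
    return false;
}

void main()
{
    auto leaf = new Node(1);
    auto left = new Node(2, [leaf]);
    auto right = new Node(3, [leaf]);   // shares `leaf` with `left`
    auto other = new Node(4, [new Node(5)]);
    left.links ~= left;                 // a cycle does not break the traversal

    assert(sharesNode(left, right));
    assert(!sharesNode(left, other));
}
```

The cost is linear in the size of both graphs, which is why this is a runtime check and not something a type system gives for free.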

June 01, 2021

Re: Memory/Allocation attributes for variables

Posted by Ola Fosheim Grostad
in reply to Elmar

On Tuesday, 1 June 2021 at 00:36:17 UTC, Elmar wrote:

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:
> I suppose you mean the "higher" level languages (because C is by original definition also a high-level language).

Yes, I mean system-level languages vs. proper high-level languages that abstract away the hardware. "Low level" is not correct, but using the term "system level" tends to lead to debates in these fora, as there is a rift between low-level and high-level programmers...

Well, I guess you are new, but Walter will refuse having many pointer types. Even the simple distinction between gc and raw pointers will be refused. The reason being that it would lead to a combinatorial explosion of function instances and prevent separate compilation.

So for D, this is not a probable solution.

That means you cannot do it through the regular type system, so you will have to do shape analysis of data structures.

I personally have in the past argued that it would be an interesting experiment to make all functions templates and to template the pointer parameter types.

That you can do with library pointer types, as a proof of concept, yourself. Then you will see what the effect is.

> lifetime region with equal lifetime. The comparison between stack addresses assumes that an address deeper in the stack has a higher or equal lifetime. The caller could also provide its stack frame bounds, which allows this interval to be considered as one single lifetime.

How about coroutines? Now you have multiple stacks.

No, depth does not work, you could define an acyclic graph of owning pointers and then use weak pointers elsewhere. This restricts modelling and algorithms. So compiler verified restriction of non-weak references might be too restrictive?

So basically whenever changing a non-weak reference the compiler has to prove that the graph still is non-weak acyclic. Maybe possible, but does not sound trivial.

> having static or dynamic (simple) value domain checks (which check whether a pointer value is in the allowed interval(s) of the allocation address spaces belonging to the attributes) which ensure that only allowed types of allocation are used. These checks can be used to statically or dynamically dispatch functions. Of course such a check could also be performed manually, but it's tedious and requires me to put all the different function bodies in one static if else.

Dynamic checks are unlikely to be accepted, I suggest you do this as a library.

> Where the feature shines most is function signatures, because they separate code and create opacity, which can be countered by memory attributes on the return type and argument types.

Unfortunately, this is also why it will be rejected.

Insufficient for D with library container types and library smart pointers.

> Do you think it sounds like a nice idea to have a data structure attribute unique next to abstract and final which recursively guarantees that any reference or pointer is a unique pointer?

Yes, some want isolated pointers, but you have to do all this stuff as library smart pointers in D.

June 01, 2021

Re: Memory/Allocation attributes for variables

Posted by Ola Fosheim Grostad
in reply to Ola Fosheim Grostad

On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad wrote:

> No, depth does not work, you could define an acyclic graph of owning pointers and then use weak pointers elsewhere.

What I meant here is that depth would be too restrictive, as it would prevent reasonable insertions of nodes.

June 03, 2021

Re: Memory/Allocation attributes for variables

Posted by Elmar
in reply to Ola Fosheim Grostad


Thank you for answering.

On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad wrote:
...

Separate compilation is a good point. Binary compatibility is a common property considered for security safeguards. But at least static checking with attributes would need no memory addresses at all (especially if the compiler can infer the attribute for every value-typed variable automatically from where it is defined). Dynamic checks of pointers across binary interfaces are difficult. They would work flawlessly with library-internal memory regions, but for pointer values from outside they can only rely on runtime information (memory regions used by allocators) or cannot be performed at all (because the address ranges to check against are unknown). They would work better if binaries supported relocations for application-related memory addresses which are filled in at link time. Static checks strike the balance here.

I personally have in the past argued that it would be an interesting experiment to make all functions templates and template pointer parameter types.

That you can do with library pointer types, as a proof of concept, yourself. Then you will see what the effect is.

Okay, that's fine. Pointers in D are not debatable, I would not try. I think, any new language should remove the concept of pointers entirely rather than introducing new pointers. Pointers from C should be treated as reference variables, pointers to C as either an unbounded slice (if bounded, there should be another size_t argument to the function) or it passes addresses obtained from variables. As a C programmer I'd say that C's pointer concept was never needed as it stands, it just was created to be an unsafe reference variable + a reference + an iterator all-in-one-solution as the simplest generic thing which beats it all (without knowing the use case by looking at the pointer type).

Attributes would only check the validity of pointer value assignments, without duplicating the function's code the way auto ref does. (One can still interpret them as part of the type.)

On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad wrote:

How about coroutines? Now you have multiple stacks.

Thanks, I missed that; at least true coroutines do have them. Other things can also dissect stack-frame memory (function-specific allocators in the stack region). But in our case, it's already a question whether such special stack frames should still be allocated in the stack region, statically (as I implemented it once for C) or in a heap region (like stack frames of continuations). You could at least place coroutine stack frames in some allocator region in static memory.

A probably less fragile but more costly solution (when checking stack addresses) for stack address scope would be to store the stack depth of an address in the upper k bits of a wide pointer value (for a simple check), but this is only a further, unrelated idea.
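A minimal sketch of that tagged-pointer idea, under loudly stated assumptions: the name DepthPtr, the tag width k = 8 and the rule "depth 0 is the outermost frame" are all made up for illustration, and the "wide pointer" is squeezed into a single 64-bit word, which only works if real addresses leave the upper 8 bits unused.

```d
/// Hypothetical tagged pointer: the upper k bits hold the stack depth of the
/// target, the lower bits hold the address. Assumes a 64-bit target whose
/// addresses never use the upper k bits.
struct DepthPtr
{
    enum uint k = 8;                          // assumed tag width
    enum uint shift = 64 - k;
    enum ulong addrMask = (1UL << shift) - 1;

    ulong bits;

    static DepthPtr make(const(void)* p, uint depth)
    {
        assert(depth < (1u << k), "depth does not fit into the tag");
        assert((cast(ulong) p & ~addrMask) == 0, "address uses the tag bits");
        return DepthPtr((cast(ulong) depth << shift) | cast(ulong) p);
    }

    uint depth() const { return cast(uint)(bits >> shift); }

    void* address() const { return cast(void*)(bits & addrMask); }

    /// The simple check: a variable living at ownerDepth may only hold
    /// pointers to targets at the same or a shallower depth, because
    /// deeper frames are destroyed first.
    bool mayBeStoredAt(uint ownerDepth) const { return depth <= ownerDepth; }
}

void main()
{
    int x;
    auto p = DepthPtr.make(&x, 3);            // pretend x lives at depth 3

    assert(p.address == cast(void*) &x);
    assert(p.depth == 3);
    assert(p.mayBeStoredAt(5));               // deeper owner: fine
    assert(!p.mayBeStoredAt(1));              // shallower owner would dangle
}
```

The check itself is a shift and a compare; the cost mentioned above comes from masking the tag off again before every dereference.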

Dynamic checks are unlikely to be accepted, I suggest you do this as a library.

Right, if nobody has tried it so far, I'd like to try it myself. Then I can firm up my D experience with further practice. I'd compare the nature of static and dynamic attribute checks to that of C++ static_cast and dynamic_cast on class pointers. I was thinking such a user library could use __traits with templated operator overloads.
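To make the static_cast/dynamic_cast comparison concrete, here is a rough sketch of what such a library pointer could look like. Region, RegionPtr, regionCast and tryGcHeap are hypothetical names invented for this example; only GC.addrOf from core.memory is an existing druntime call. The static check is a template constraint, the dynamic check asks the GC whether the address belongs to one of its blocks.

```d
import core.memory : GC;

/// Hypothetical region tags for this sketch; not part of D or Phobos.
enum Region { any, gcHeap, cHeap }

/// Pointer wrapper whose type records the region its target is assumed to live in.
struct RegionPtr(Region region, T)
{
    T* ptr;

    /// Static check, in the spirit of static_cast: keeping the region or
    /// widening to Region.any compiles; anything else is a compile error.
    RegionPtr!(target, T) regionCast(Region target)()
        if (target == region || target == Region.any)
    {
        return RegionPtr!(target, T)(ptr);
    }
}

/// Dynamic check, in the spirit of dynamic_cast: the result wraps null
/// unless the address lies inside a block managed by the GC.
RegionPtr!(Region.gcHeap, T) tryGcHeap(T)(T* p)
{
    auto checked = GC.addrOf(cast(void*) p) is null ? null : p;
    return RegionPtr!(Region.gcHeap, T)(checked);
}

void main()
{
    auto g = tryGcHeap(new int(7));           // GC allocation: check passes
    assert(g.ptr !is null && *g.ptr == 7);

    int local = 1;
    assert(tryGcHeap(&local).ptr is null);    // stack address: check fails

    RegionPtr!(Region.any, int) a = g.regionCast!(Region.any)();
    assert(a.ptr is g.ptr);                   // widening keeps the address

    // Narrowing to an unrelated region is rejected at compile time.
    static assert(!__traits(compiles, g.regionCast!(Region.cHeap)()));
}
```

The sketch uses a named conversion instead of operator overloads to stay short; a fuller library could route the same constraint through opAssign.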

Where the feature shines most is function signatures, because they separate code and create opacity, which memory attributes on the return type and argument types can counter.

Unfortunately, this is also why it will be rejected.

So, is it D's tenor that function signatures are thought to create opacity and should continue to do so? Does the community think allocation and memory transparency is a bad thing, or just not needed? IMO, allocation and memory transparency is relevant to being a serious systems programming language (even though C doesn't have it, C++ doesn't have it and C# is no systems programming language :-D ). Isn't the missing memory transparency from outside of functions the reason why global variables are frowned upon by many? Related to referential transparency (side effects), less transparency makes programs harder to debug and decouple, and APIs harder to use right. (Just the single map issue with fixed-size arrays...)

Insufficient for D with library container types and library smart pointers.

Yeah. It makes no sense if we consider the pointer layers between the exposed pointer and the actual data (I assume smart pointers in D are implemented with such a middle layer in between). But if it only means the first payload data layer that represents the actual root node of any graph-like data structure, is it still flawed? At least, if I can annotate all pointer variables in my data structures and if checks are done for every single reference/pointer assignment on every access, so that no pointer value range in the entire structure is ever violated, isn't that closer to memory safety than without? Of course, I could still pass references to those pointers to a binary which writes into them without knowing any type information, but that's a deliberate risk which static type checking cannot mitigate; only dynamic value checking of the pointed-to data after the function returns can. (Probably another useful safety feature for my idea.)

Of course attributes are optional, nobody has to annotate anything with the risk of obtaining falsely scoped pointer values.

But would you agree it would be better than not having it? Of course, it doesn't make everything safe, particularly if one can omit it, but annotating variables with attributes could help with ownership (I think in a better design than Walter's proposal of yet another function attribute @live instead of a variable attribute). With ownership I mean preventing leakage of (sensitive) data out of a function (not just reference values as with scope). It could also provide some sanity checks and even more transparency for API use (because then I can see what kind of allocated memory I can expect for parameters and return value). I think it could improve interfacing with C++ as well.
In the end, I only want certainty about the references and pointers when I look at a function signature.

I probably should (try to) implement it myself as a proof of concept.

Regards, Elmar

June 03, 2021

Re: Memory/Allocation attributes for variables

Posted by sighoya
in reply to Elmar


On Thursday, 3 June 2021 at 17:26:03 UTC, Elmar wrote:

@stack arr2 = dynamicArr.dup;   // create a copy on stack, the stack is "scope"d

An easy but probably limited implementation would set theAllocator before the initialization of such an attributed value-type variable and resets theAllocator afterwards to the allocator from before.

What if dup is creating things on the heap (I don't know by the way). You need to make the allocator dynamically scoped.
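For reference, the save/set/restore pattern from the quoted suggestion can be sketched with Phobos' std.experimental.allocator, where theAllocator is a thread-local property. Two caveats, stated as assumptions: the choice of Mallocator is arbitrary, and as far as I know the built-in .dup always allocates from the GC heap and never consults theAllocator, so the sketch has to call makeArray explicitly.

```d
import core.memory : GC;
import std.experimental.allocator : allocatorObject, dispose, makeArray, theAllocator;
import std.experimental.allocator.mallocator : Mallocator;

void main()
{
    auto before = theAllocator;               // allocator active outside the block
    {
        auto saved = theAllocator;            // remember the previous allocator
        scope (exit) theAllocator = saved;    // restore it when the block ends
        theAllocator = allocatorObject(Mallocator.instance);

        // Allocation has to go through theAllocator explicitly.
        int[] arr = theAllocator.makeArray!int(4);
        scope (exit) theAllocator.dispose(arr); // runs before the restore above

        assert(arr.length == 4);
        assert(GC.addrOf(arr.ptr) is null);   // malloc memory, not GC memory
    }
    // Freshly made arrays come from the restored allocator again.
    int[] later = theAllocator.makeArray!int(2);
    scope (exit) theAllocator.dispose(later);
    assert(later.length == 2);
}
```

Because theAllocator is only swapped for the current thread and restored by scope (exit), this gives a form of dynamic scoping, but nothing stops code inside the block from bypassing it with new or .dup.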

@register: only stored in a register (with compile-time error if not possible)

Only if you have complete control over the backend.

Besides the combinatorial explosion in the logic required for checking, what happens if we copy or move data between variables with different memory annotations, e.g. nogc to gc, or newcpp to gc?
Do we auto-copy, cast or throw an error? If we do not throw an error, an annotation might not only restrict access but also change semantics by introducing new references.
So annotations become implied actions. That can be OK, but it may be hard to accept given how annotations are currently used.

Does the community think, allocation and memory transparency is a bad thing or just not needed? IMO, allocation and memory transparency is relevant to being a serious Systems programming language (even though C doesn't have it, C++ doesn't have it and C# is no Systems Programming :-D ).

There is no such thing as memory transparency, strictly speaking. Even if you want to allocate things on the stack, what if your backend doesn't have a stack at all? Or do we just rename the heap to stack?

In the end, we aren't doing much better than C or high-level languages; we just have some heuristic that our structures map better to the underlying hardware.

As a C programmer I'd say that C's pointer concept was never needed as it stands, it just was created to be an unsafe reference variable + a reference + an iterator all-in-one-solution as the simplest generic thing which beats it all (without knowing the use case by looking at the pointer type).

Well, I think having both is problematic/complex. But C has only one of those and C++ has both.
It's not quite clear where arrays belong, so that's a mistake.

I think, any new language should remove the concept of pointers entirely rather than introducing new pointers.

Why not remove the distinction between values and references/pointers altogether? But I think that drifts too far toward a logical, high-level language and isn't the right way to go in a system-level language, although it is very interesting.

Annotations seem to be neat, but they parametrize your code:

@allocator("X") @lifetime("param1", "greater", "param2") void f(Type1 param1, Type2 param2)

becomes

void f(Allocator X, Lifetime lifetime(param1), Lifetime lifetime(param2))(Type1 param1, Type2 param2) if (currentAllocator == X && lifetime(param1) >= lifetime(param2)) {...}

which literally turns every function that allocates something into a template, increasing "templatism", unless we get runtime generics as in Swift.
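The allocator half of that rewrite can already be written in today's D; the lifetime half has no direct equivalent that I know of and is left out. In this sketch the function name filled is made up, while Mallocator, GCAllocator, makeArray and dispose are existing std.experimental.allocator names. Each distinct allocator type passed in produces its own instantiation, which is the "templatism" in question.

```d
import std.experimental.allocator : dispose, makeArray;
import std.experimental.allocator.gc_allocator : GCAllocator;
import std.experimental.allocator.mallocator : Mallocator;

/// The allocator is a compile-time parameter of the function, so the
/// annotation has effectively become a template argument.
T[] filled(Alloc, T)(auto ref Alloc alloc, size_t n, T value)
{
    return alloc.makeArray!T(n, value);
}

void main()
{
    auto a = filled(Mallocator.instance, 3, 1);   // one instantiation
    scope (exit) Mallocator.instance.dispose(a);

    auto b = filled(GCAllocator.instance, 3, 1);  // a second instantiation

    assert(a == [1, 1, 1]);
    assert(b == [1, 1, 1]);

    // Different allocator types, hence two separate copies of filled.
    static assert(!is(typeof(Mallocator.instance) == typeof(GCAllocator.instance)));
}
```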

To summarize, I find these improvements interesting, but

they don't feel system-level
are they possible at all in a 20-year-old language?

Some info: Rust puts me off in that it is a mix of high level and low level. It has values and references for its ownership/borrowing system, but then also custom pointer types which don't interact well with the former.

June 04, 2021

Re: Memory/Allocation attributes for variables

Posted by Ola Fosheim Grøstad
in reply to Elmar


On Thursday, 3 June 2021 at 17:26:03 UTC, Elmar wrote:

On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad wrote:
The separate compilation is a good point. Binary compatibility is a common property considered for security safeguards. But at least static checking with attributes would need no memory addresses at all (also if the compiler can infer the attribute for every value-typed variable automatically from where it is defined).

I don't think separate compilation is a good point. I think a modern language should mix object-file and IR-file linking with caching for speed improvements.

Nevertheless, it is being used as an argument for not making D a better language. So, that is what you are up against.

D isn't really "modern". It is very much in the C-mold, like C++. It has taken on too many of C++'s flaws. For instance it kept underperforming exceptions instead of making them fast.

passes addresses obtained from variables. As a C programmer I'd say that C's pointer concept was never needed as it stands, it just was created to be an unsafe reference variable + a reference + an iterator all-in-one-solution as the simplest generic thing which beats it all (without knowing the use case by looking at the pointer type).

C is mostly an abstraction over common machine language instructions, which lets a non-optimizing C backend perform reasonably well on handcrafted C code.

C pointers do have a counterpart in C++ STL iterators though. So, one could argue that C-pointers are memory-iterators.

Sounds like a fun project.

(D, as the language stands, encourages the equivalent of reinterpret_cast, so there is that.)

Let us not confuse community with creators. :) Also, let us not assume that there is a homogeneous community.

So, you have the scripty-camp who are not bothered by the current GC and don't really deal with memory allocations much. Then there is the other camp.

As one of those in the other camp, I think that the compiler should do the memory management and be free to optimize. So I am not fond of things like "scope". I think they are crutches. I think the language is becoming arcane by patching it up here and there instead of providing a generic solution.

I probably should (try to) implement it myself as a proof of concept.

The best option is to just introduce a custom pointer-library, like in C++, that tracks what you want it to track.

Don't bother with separate compilation issues. Just template all functions. I think LDC will remove duplicates if the bodies of two functions turn into the same machine code?

Then you get a feeling for what it would be like.