Hello dear D community,
Personally, I'm interested in security, memory safety and such, although I'm not an expert.
I would like to know whether D has memory/allocation attributes (I will use both terms interchangeably), or whether someone knows a library that statically asserts that variables of an attributed type only contain values allocated in compatible allocation regions. Attributes for memory safety can be seen as an extension to the type system: they constrain the address domain of values instead of the value domain itself, and so they become part of the value domain of a pointer or reference.
Now there are A LOT of different allocation regions and scopes for variables for whatever purpose:
- static allocation
- stack frame
- dynamic allocation w/o GC
- dynamic allocation with GC
- fast allocators optimized for specific objects or even function-specific allocators
- peripheral addresses
- yes, even a register as allocation region (which allows a value to not be stored in RAM and thus not be easily overwritten; this is useful for security reasons, like storing pointer encryption keys or stack canaries, or to assign common registers to variables)
- or memory-only allocation (which requires a value to not be stored/held in registers)
Memory safety problems often boil down to the program accidentally storing a pointer value into a variable when that pointer is semantically out of bounds or points to a memory area too small for the variable's purpose or scope. (In languages with better type safety, aliasing of variables is probably impossible; the typical cases are unbounded data structures like C's variadic arguments or attacker-controlled variable-length arrays.)
I don't know of any other language yet that has allocation attributes for pointer/reference variables, or allocation attributes for value-typed variables that restrict the allocation region for the variable's data (some kind of contract at variable level). However, value-type variables are another story: they are allocated at the point where they are defined, so for them such attributes would only serve as an expressive generalization, extended to value types and even control structures.
Looking at the attributes D provides, I definitely see that memory-safety concerns are addressed by existing attributes. But I personally find them rather unintuitive to use, difficult for beginners to learn, and not flexible enough (for example, there are two different versions of "return ref scope"). As currently defined, those attributes don't annotate the destinations of data flow but its sources (like function arguments).
What I imagine: more specific scopes (more specifically attributed types) or allocation regions correspond to more generalized types. Some scopes/allocation regions are contained within others (the smaller contained regions are virtually base types of the bigger regions) and some regions are disjoint (but regions should not partially overlap, which would go against the structured-programming paradigm and the inheritance analogy in OOP). This results in scope polymorphism, which statically/dynamically checks the address type of RHS expressions during assignments, so that memory safety becomes a special case of type safety.
I could annotate return values with attributes to make clear that a function returns GC-allocated memory, e.g. using a @gc attribute.
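@gc string[] stringifyArray(T : U[], U)(T arr) {
    import std.algorithm.iteration : map;
    import std.array : array;
    import std.conv : to;
    // map returns a lazy range; .array materializes it as a GC-allocated string[]
    return arr.map!(value => value.to!string).array;
}
@nogc stringtable = stringifyArray([1, 2, 3]); // error: @gc result assigned to a @nogc variable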
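// a useless factory example
@new auto makeBoat(double length, double height, Color c) {
    import std.experimental.allocator : allocatorObject, make, theAllocator;
    import std.experimental.allocator.mallocator : Mallocator;
    auto previous = theAllocator;
    theAllocator = allocatorObject(Mallocator.instance);
    scope (exit) theAllocator = previous; // restore the allocator from before
    // "new" always allocates on the GC heap, so go through theAllocator instead
    return theAllocator.make!Boat(length, height, c);
}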
// combining multiple attributes gives a union of both which is valid for reference variables
@new @newcpp @gc Boat boat = makeBoat( ... );
// technically, a union of attributes for value types is possible but would
// require inferring the most appropriate attribute from context which is difficult
Variables with no attributes allow any pointer for assignment and will infer the proper attribute from the assignment.
Some of these use cases are already covered by existing attributes:
- scope: makes sure that a reference variable is not written to an allocation region outside the current function block (which corresponds to using "@scope(function)" with the argument, see below); it would be type-unsafe to assign it to a variable type with larger scope. "Scope" basically means that the argument belongs to a stack frame in the caller chain. (It corresponds to arguments annotated with "@caller", see below.) It's used to tell the function that the referenced value has a limited lifetime in a caller stack frame despite being a reference variable, and that the reference could become invalid after the function returns, so the function must not write the value to variables outside itself. For arguments this is very useful, and I would rather prefer the complementary case to be explicit. That's where "in" is really useful as a short form.
- ref: specifies that the actual allocation region of a variable's value is outside of the function scope in which the variable is visible (or used). ("out" is similar.)
- return ref: specifies that the value (referenced by the returned reference) is in the same allocation scope as the argument annotated with "return ref" (corresponds to the annotation of the return type with "@scope(argName)", see below).
- return ref scope: a combination of the two above. The return type is seen to have the same allocation region as the one used by the annotated argument.
- __gshared, shared: variables with these attributes are stored in a scope accessible across threads. This is the default for globals in C, so __gshared variables correspond to plain C globals; like C's volatile values, they would be accepted by @memory references.
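To make the list above concrete, here is a minimal sketch of what "scope" and "return ref" look like in today's D. The names readThrough, larger and leaked are made up for illustration. As far as I know, the escape check in the commented-out line is only enforced in @safe code when compiling with -preview=dip1000.

```d
// Hypothetical names; only "scope" and "return ref" themselves are D features.
const(int)* leaked; // module-level variable that outlives any call

// "scope": the callee promises that p does not escape the call
int readThrough(scope const(int)* p) @safe
{
    // leaked = p; // rejected in @safe code under -preview=dip1000
    return *p;
}

// "return ref": the result may refer to a or b, so it must not outlive them
ref int larger(return ref int a, return ref int b) @safe
{
    if (a > b)
        return a;
    return b;
}

unittest
{
    int x = 3, y = 7;
    assert(readThrough(&x) == 3);
    larger(x, y) = 10; // writes through the returned reference to y
    assert(x == 3 && y == 10);
}
```

This can be tried with "dmd -preview=dip1000 -unittest -main -run" on a file containing the block. Note how both attributes sit on the parameters (the sources of data flow), which is exactly the point made above.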
Here is a (really long) collection of many possible memory attributes I am looking for. They define which addresses of values are accepted for the pointer/reference:
- @auto: allocation in any stack frame, which includes fixed-size value-type variables passed as arguments or return value
- @stack: dynamic allocation in any stack frame (alloca)
- @loop: allocation in the region which lives as long as the current loop lives
- @scope(recursion): an allocation scope not yet available in D, I believe: a scope which lives as long as the entire recursion lives, equivalent to @loop in the functional sense. Locals in this scope are accessible to all recursive calls of the same function.
- @scope(function): allocation in the current stack frame (scoped arguments are a special case of this)
- @scope(label): allocation in the scope of the labeled control structure
- @scope(identifier): allocation in the same scope as the specified variable name; "return ref" can be seen as a special case for return types.
- @static: allocation/scope in the static memory segment (lifetime over the entire program runtime); static variables and control structures are a special case of this attribute
- @caller: allocation in the caller's stack frame (usable for convenient optimizations as shown below), an "implicit argument" when used for value types; corresponds to "ref scope" for reference-type variables. Something in between "static" and "auto".
- @gc: allocation region managed by D's garbage collector
- @nogc: disallows pointer/reference to GC-allocated data
- @new: allocation region managed by Mallocator
- @newcpp: allocation region managed by the C++ standard allocator (the stdcpp allocator), which eases C++ compatibility
- @peripheral: target- or even linkerscript-specific memory region for peripherals
- @register: only stored in a register (with compile-time error if not possible)
- @shared: allocation region for values which are synchronized between threads
- @memory: never stored in a register (the use case can overlap with "@peripheral"; it's used for variables whose content can change non-deterministically and must be reloaded from memory each time, for example variables modified by interrupt handlers; it also prevents optimization where unwanted)
- @make(allocator): allocated by the given allocator (a dynamic type check is required if "allocator" is a dynamic object)
In the basic version for reference variables these attributes statically/dynamically assert that a given pointer value is in bounds of that allocation region. Of course, this is a long list of personal ideas and some of them could be unpopular in the community. But I think, all of them would be a tribute to Systems programming.
Why are such attributes useful? First, because type-safe design means restricting value domains as much as possible, so that they are only as large as required. The attributes restrict the address (pointer value) at which a value bound to a variable can be located, and they provide additional static type checks as well as allocation transparency (something which I miss in every language I have used so far). The good thing is that if no attribute is provided, it can be inferred from the location where the value-typed variable is defined, or from the assigned pointer value for reference types.
Maybe also useful: with additional memory safety attributes, it could become legitimate to assign to scoped reference variables.
For reference-type variables, these attributes are simple value domain checks of the pointer variable. A disadvantage of memory attributes is (as with polymorphism) that runtime checks might be needed in some cases where static analysis isn't sufficient (for example if attributes are cast).
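As a rough idea of what such a runtime check could look like as a library today, here is a sketch for the "@gc" case only. It relies on core.memory.GC.addrOf, which returns null for pointers that do not point into GC-managed memory. The names isGcAllocated and GcRef are hypothetical, and the check is a plain assert, so it disappears in release builds.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

/// Hypothetical helper: true if p points into a block managed by D's GC.
bool isGcAllocated(const(void)* p) nothrow
{
    return p !is null && GC.addrOf(cast(void*) p) !is null;
}

/// Hypothetical wrapper emulating a "@gc" reference with a dynamic check.
struct GcRef(T)
{
    private T* ptr;

    this(T* p)
    {
        assert(isGcAllocated(p), "pointer does not point into GC memory");
        ptr = p;
    }

    ref T get() { return *ptr; }
}

unittest
{
    auto onHeap = new int;
    *onHeap = 42;
    auto r = GcRef!int(onHeap); // accepted: allocated with "new"
    assert(r.get == 42);

    int onStack;
    assert(!isGcAllocated(&onStack)); // a stack address is not GC memory

    void* fromMalloc = malloc(int.sizeof);
    scope (exit) free(fromMalloc);
    assert(!isGcAllocated(fromMalloc)); // neither is malloc'ed memory
}
```

The other regions in the list (stack frames, peripherals, registers) have no comparable query function that I know of, which is one reason why this probably cannot be done completely as a library.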
An interesting extension is a generalization to value-type variables. It can generalize the scope and return attributes to value types. While probably not uncontroversial, it could allow fine control over variable allocation and force where exactly a value-typed variable is allocated (allocation guarantees). You could indirectly define a variable in a nested code block which is allocated for another scope. The only main disadvantage I can think of is that it cannot simply be created as a library add-on.
outer: {
    // ...
    @scope(inner) uint key = generateKey(seed); // precomputes the RHS
    // and initializes the LHS with the memorized value when entering the "inner" block
    seed.destroy(); // do something with seed, modify/destroy it, whatever
    // key is not available/accessible here
    // Message cipher; // <-- implicit but uninitialized
    inner: if (useSecurity) {
        // if not entered, the init-value of the variable is used
        @scope(outer) Message cipher = encrypt(key);
        // Implicitly defines "cipher" uninitialized in "outer" scope.
        // Generates default init in all other control flow paths without @scope(outer) definition
    }
    //else cipher = Message.init; // <-- implicit, actual initialization
    decrypt(cipher, key); // error, key is only available in the "inner" scope
}
Some would criticize the unconventional visibility of "cipher", which doesn't follow common visibility rules. For example, if static variables are defined in functions, they are still only visible in the function itself and not in the entire scope in which they live. So a likely improvement would be that the visibility is not affected by the attribute, only the point of actual creation/destruction. Looking only at the previous example, that would seem useless at first, but it isn't once loops are considered (and variables which have @loop scope, which means they are created on loop entry and only destructed on loop exit).
Also interesting cases can emerge for additional user optimization in order to avoid costly recomputation by using a larger scope as allocation region:
double permeability(Vec2f direction) {
    @caller Vec2f grad = calculateTextureDerivative();
    // "grad" is a common setup shared by all calls to "permeability" from the same caller instance
    // It is hidden from the caller because it's an implementation detail of this function.
    // All calls to "permeability" by the same caller will use the same variable.
    // It would be implemented as invisible argument whose initialization
    // happens in the caller. The variable is stored on the caller's site as
    // invisible variable and is passed with every call.
    return scalprod(direction, grad);
}
A main benefit of this feature is readability and, in some cases, optimization: the setup computation is not repeated for every call, only when a recomputation is actually needed, and whether it is needed can be decided in the callee.
For closures the @caller scope is clear, but it also works for non-closure functions as an invisible argument. Modifications to a "@caller ref" variable are remembered for consecutive calls from the same caller stack frame, whereas @caller without ref maybe only modifies a local copy.
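For comparison, this is roughly how I would emulate the @caller idea in current D: the "invisible argument" becomes a visible ref parameter that the caller has to declare itself. This is only a sketch; Vec2f, the bodies of calculateTextureDerivative and scalprod, and the derivativeCalls counter are stand-ins I made up so that the example is self-contained.

```d
import std.typecons : Nullable;

struct Vec2f { float x, y; }

int derivativeCalls; // only here to show how often the setup really runs

// Stand-in for the expensive setup computation (made-up values)
Vec2f calculateTextureDerivative()
{
    ++derivativeCalls;
    return Vec2f(0.5f, 2.0f);
}

double scalprod(Vec2f a, Vec2f b) { return a.x * b.x + a.y * b.y; }

// "grad" lives in the caller's stack frame and is passed with every call
double permeability(Vec2f direction, ref Nullable!Vec2f grad)
{
    if (grad.isNull)
        grad = calculateTextureDerivative(); // initialized on first use only
    return scalprod(direction, grad.get);
}

unittest
{
    Nullable!Vec2f grad; // what @caller would declare invisibly
    assert(permeability(Vec2f(1, 0), grad) == 0.5);
    assert(permeability(Vec2f(0, 1), grad) == 2.0);
    assert(derivativeCalls == 1); // the setup ran once for both calls
}
```

The drawback compared to the proposal is obvious: the implementation detail leaks into the signature and into every caller.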
Or being able to create arrays easily on the stack, which is yet a further extension:
@auto arr1 = [0, 1, 2, 3]; // asserts fixed-size, okay, but variable size would fail
@stack arr2 = dynamicArr.dup; // create a copy on stack, the stack is "scope"d
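What comes closest to this in current D, as far as I can tell, is std.array.staticArray for the fixed-size case and a manually sliced fixed-size buffer for the variable-size case. The following is a sketch under that assumption; sumOnStack, sumViaStackCopy and maxLen are made-up names, and the upper bound maxLen is an arbitrary choice.

```d
import std.array : staticArray;

// Fixed size: the array literal becomes an int[4] in the current stack frame
int sumOnStack() @nogc nothrow @safe
{
    auto arr1 = [0, 1, 2, 3].staticArray;
    static assert(is(typeof(arr1) == int[4]));
    int total;
    foreach (v; arr1)
        total += v;
    return total;
}

enum maxLen = 8; // arbitrary upper bound for the stack buffer

// Variable size: copy into a slice of a fixed-size stack buffer instead of .dup
int sumViaStackCopy(const(int)[] dynamicArr)
{
    assert(dynamicArr.length <= maxLen, "input too long for the stack buffer");
    int[maxLen] buffer;                          // lives in this stack frame
    int[] arr2 = buffer[0 .. dynamicArr.length]; // slice of stack memory
    arr2[] = dynamicArr[];                       // element-wise copy, no GC allocation
    int total;
    foreach (v; arr2)
        total += v;
    return total;
}

unittest
{
    assert(sumOnStack() == 6);
    assert(sumViaStackCopy([5, 6, 7]) == 18);
}
```

The slice arr2 must not leave the function, which is the kind of constraint a "@stack" attribute would state directly in the type.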
An easy but probably limited implementation would set theAllocator before the initialization of such an attributed value-type variable and reset theAllocator afterwards to the allocator from before.
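A sketch of that set-and-reset pattern with std.experimental.allocator could look like this. The helper withAllocator and the Boat struct are hypothetical; theAllocator, allocatorObject, make, dispose and Mallocator are existing Phobos names. It only affects code that allocates through theAllocator, not "new", and theAllocator is thread-local, so this is limited indeed.

```d
import core.memory : GC;
import std.experimental.allocator : RCIAllocator, allocatorObject, dispose, make, theAllocator;
import std.experimental.allocator.mallocator : Mallocator;

/// Hypothetical helper: calls fun with theAllocator temporarily replaced.
auto withAllocator(alias fun)(RCIAllocator temp)
{
    auto previous = theAllocator;         // remember the allocator from before
    theAllocator = temp;
    scope (exit) theAllocator = previous; // reset it, even if fun throws
    return fun();
}

struct Boat { double length, height; }

unittest
{
    auto mallocBased = allocatorObject(Mallocator.instance);

    // Everything allocated via theAllocator inside the lambda uses malloc
    Boat* b = withAllocator!(() => theAllocator.make!Boat(4.0, 1.5))(mallocBased);
    assert(b.length == 4.0 && b.height == 1.5);
    assert(GC.addrOf(b) is null); // not GC memory

    mallocBased.dispose(b); // must be freed with the same allocator
}
```

A compiler-supported attribute would do the same around the initializer of the attributed variable and could additionally remember which allocator has to free the value.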
Finally, one could even more generally annotate control structures with attributes to define on entry to which scope the control structure's arguments are evaluated (e.g. "static if" is a special case which represents "@static if" in terms of attributes), but this is yet another story and unrelated to allocation.
This is it, I'm sorry for the long post. It took me a while to write it down and reread.
Regards!