Literal types

June 25


Posted by Quirin Schroll


D’s array literals (seem to) have an unofficial type and an official type. What I mean is that [1, 2] has the official type int[] (typeof([1, 2]) says int[]), but unofficially, it can do a bunch of things an int[] generally doesn’t support. You can assign the literal to immutable int[] because it’s unique. You can assign it to ubyte[] because value range propagation (VRP) can prove each entry is within the bounds of ubyte. And you can assign it to int[2] because the length is right, and it won’t heap allocate. The last point is important because it’s not a mere optional optimization: it works in @nogc, which means it is specified behavior.
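These behaviors can be checked against today's compiler. A minimal sketch (the function and variable names are made up for illustration):

```d
// Illustrative names only; none of this comes from a library.
// Run with: dmd -unittest -main -run example.d

@nogc nothrow @safe int caughtAsStatic()
{
    // The literal initializes a static array directly: no GC allocation,
    // which is why this is accepted inside a @nogc function.
    int[2] fixed = [1, 2];
    return fixed[0] + fixed[1];
}

unittest
{
    // The "official" type of the literal.
    static assert(is(typeof([1, 2]) == int[]));

    immutable int[] frozen = [1, 2]; // accepted: the literal is unique
    ubyte[] small = [1, 2];          // accepted: every element fits in ubyte

    // Once the literal has decayed into an int[] variable,
    // both conversions are rejected.
    int[] plain = [1, 2];
    static assert(!__traits(compiles, { immutable int[] bad = plain; }));
    static assert(!__traits(compiles, { ubyte[] bad = plain; }));

    assert(frozen == plain);
    assert(small.length == 2);
    assert(caughtAsStatic() == 3);
}
```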

I don’t know if I’m getting this 100% correct, but saying [1, 2] is an int[] isn’t the full story: it’s more like an int[2] that “decays” to an int[] (which usually ends up on the heap) unless you “catch” it early enough. Catching such a literal isn’t too difficult: the staticArray function does it.
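As a small illustration of “catching” the literal, here is a sketch using std.array.staticArray from Phobos (the wrapper function name is hypothetical):

```d
import std.array : staticArray;

// Hypothetical wrapper, only here to show that no GC allocation happens.
@nogc nothrow @safe int sumOfCaught()
{
    // Length and element type are inferred from the literal.
    auto caught = [1, 2, 3].staticArray;
    static assert(is(typeof(caught) == int[3]));

    int total = 0;
    foreach (v; caught)
        total += v;
    return total;
}
```

The inferred type is int[3], so the length survives, but the individual values 1, 2, 3 are still lost from the type.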

In a similar fashion, numeric literals decay: typeof(1) is int, but knowing something is 1 is so much more concrete than knowing it’s some int.
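To make the numeric case concrete, a short sketch of what the compiler currently accepts and rejects (the function name is made up):

```d
// Hypothetical function name; the checks use only existing language rules.
ubyte narrowExamples(int runtime)
{
    static assert(is(typeof(1) == int));

    ubyte fits = 255; // accepted: the literal's value is known to fit
    static assert(!__traits(compiles, { ubyte tooBig = 256; }));
    // A run-time int carries no value information, so it is rejected.
    static assert(!__traits(compiles, { ubyte unknown = runtime; }));

    // Value range propagation proves the result is within 0 .. 255.
    ubyte masked = runtime & 0xFF;
    return masked;
}
```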

Proposal: add __typeof, which returns a non-decayed type, one that carries all the information the compiler officially retains, and add the corresponding result types for this operator. That means, with x some run-time int variable, __typeof([1, 2, x]) isn’t int[]. It’s something like __array!(__integer!(int, 1), __integer!(int, 2), int). Lastly, add a parameter storage class __nodecay or @nodecay that only matters when the type of the parameter is a template type parameter and is inferred; then, inference does not decay the type before binding the type parameter.

This would enable low-level functionality where one can special-case certain values, e.g. giving a function a specialized overload with a parameter type of __typeof(0) that’s only matched by the constant 0 or int(0) or an enum of type int with value 0, but not 0u, 0L, or an enum of type ubyte with value 0, and of course not by anything that might have a value distinct from 0 such as 1 or a run-time value.

Because those types would be templates (or behave as such), their arguments could be matched:

void f0(T)(__integer!(T, 0) x) { }
void fi(int x) { }
alias f = f0;
alias f = fi;
f(0); // calls f0!(int)
f(1); // calls fi

(No need to change overload resolution: Partial ordering determines that fi can be called with f0’s (synthesized) parameter type __integer!(int, 0), but f0 can’t be called with fi’s parameter type int, so it’s more specific.)
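For contrast, the closest approximation available today moves the constant into a template argument, so the call site has to change from f(0) to f!0(). A rough sketch with hypothetical names (a workaround, not the proposed feature):

```d
// Hypothetical names; a workaround sketch, not the proposed feature.

// Chosen only when the value is passed as a template argument equal to 0.
string describe(int value)() if (value == 0)
{
    return "compile-time zero";
}

// Ordinary run-time overload.
string describe(int x)
{
    return "some int";
}

unittest
{
    assert(describe!0() == "compile-time zero");
    // A plain literal argument has already decayed to int.
    assert(describe(0) == "some int");
    assert(describe(1) == "some int");
    // The constraint rejects every other template argument.
    static assert(!__traits(compiles, describe!1()));
}
```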

This is akin to how staticArray can infer type and size from an array literal, just much finer grained. The staticArray function makes the argument decay insofar as the types of the entries must be unified. With this addition, that becomes optional: __array!(int, string) would be totally valid; it just can’t decay into a static or dynamic array type, so if it has to because it’s not caught early enough, that’s an error. A tuple type of int and string could support an opAssign that takes an __array!(int, string) parameter, which allows t = [1, "hi"] even though typeof([1, "hi"]) fails because it requires an array literal to decay into some T[], which this one can’t. Maybe a future edition could make typeof([1, "hi"]) not be an error, but as of now, this can be used to test whether two values have compatible types, and we can’t just take that use case away.
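The compatibility test mentioned above might look like this in current D; the helper name unifies is hypothetical:

```d
// Hypothetical helper: true if an array literal holding an A and a B
// has some type T[], i.e. the element types unify.
enum bool unifies(A, B) = is(typeof([A.init, B.init]));

static assert(unifies!(int, double));             // unifies to double[]
static assert(is(typeof([1, 2.5]) == double[]));
static assert(!unifies!(int, string));            // no common element type
static assert(!is(typeof([1, "hi"])));            // typeof of the literal is an error
```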

This subsumes enum parameters and tuples to some degree.

Recap: In another DIP idea, I suggested an enum parameter would be a compile-time constant that’s passed the same way a run-time parameter is passed to a function call. Essentially what Zig calls comptime parameters.

I also suggested in the past that static arrays are basically homogeneous tuples and that they could be generalized to tuples. That would mean the syntax would be brackets, not parentheses, but that neatly solves the 1-tuple case, since (x) must stay x, but [x] isn’t the same as x. The idea there rests on the same decay observation: there’s no inherent need to decay array literals to static arrays immediately and to dynamic arrays further.

Of course, this idea doesn’t solve auto enum from the enum parameter idea as neatly. It also doesn’t add any tuple decomposition support or the syntactic sugar one might want to have. In that context, __array is a bad name and it should be __tuple instead.

It does solve e.g. compile-time format strings and indexing into a tuple:

int format(Fmt, Ts...)(__nodecay Fmt fmt, in Ts args)
if (__traits(compiles, { enum string s = fmt; }))
{
    // If fmt is a string literal,
    // its type Fmt is a unit type,
    // i.e. enough to recreate the value
    // without even considering fmt.
    enum string s = fmt;
    // s can be analyzed like any compile-time constant
}

// string literals are zero-terminated
// and must be distinct from other array literals in undecayed form
// (hex strings are even more special)
format("%d", 10); // okay: format!(__string!(char, "%d"))

// actual array literal
format(['%', 'd'], 10); // okay: format!(__array!(__integer!(char, '%'), __integer!(char, 'd')))

string fmt = "%s";
// calls some other format function
// that must do run-time checking
format(fmt, 10);
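For comparison, Phobos currently gets compile-time format checking by taking the format string as a template argument, which changes the call syntax. A minimal sketch (render is a made-up name; format!"..." is the existing std.format overload):

```d
import std.format : format;

// Hypothetical wrapper around the existing template-argument overload.
string render(int n)
{
    // The format string is checked against the argument types at compile time.
    return format!"%d items"(n);
}

// A mismatch between specifier and argument is a compile-time error:
// 3.14 cannot be formatted with %d.
static assert(!__traits(compiles, format!"%s is %d"("Pi", 3.14)));
```

With the proposed __nodecay parameter, the same check could stay behind the ordinary format("%d items", n) call syntax.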

As for static indexing:

struct Tuple(Ts...)
{
    Ts expand;

    static foreach (i; 0 .. Ts.length)
        ref Ts[i] opIndex(T)(__integer!(T, i)) return => expand[i];

    // or

    static foreach (i; 0 .. Ts.length)
        ref Ts[i] opIndex(__integer!(size_t, i)) return => expand[i];
}

Tuple!(int, string) t;

auto x = t[0]; // calls t.opIndex!int(__integer!(int, 0));
auto y = t[1u]; // calls t.opIndex!uint(__integer!(uint, 1));
auto z = t[2L]; // error, none of the overloads match

// or

auto x = t[0]; // calls t.opIndex(__integer!(size_t, 0));
auto y = t[1u]; // calls t.opIndex(__integer!(size_t, 1));
auto z = t[2L]; // error, none of the overloads match

The second alternative only works if the types __integer!(T, x) convert implicitly to each other whenever their values are equal.
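For reference, std.typecons.Tuple already allows t[0] today, but only because indexing goes through the expanded field sequence, which requires a compile-time index. A short sketch (the function name is hypothetical):

```d
import std.typecons : tuple;

// Hypothetical function name; uses only the existing Phobos Tuple.
string secondField()
{
    auto t = tuple(42, "hi");

    // The element type depends on the index, so the index must be a constant.
    static assert(is(typeof(t[0]) == int));
    static assert(is(typeof(t[1]) == string));

    // A run-time index is rejected.
    size_t i = 0;
    static assert(!__traits(compiles, t[i]));

    return t[1];
}
```

The opIndex overloads sketched above would let a user-defined type express the same restriction directly in its signature.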

On Wednesday, 25 June 2025 at 17:56:09 UTC, Quirin Schroll wrote:

There’s a really clever idea in here somewhere but I think you didn’t quite hit the nail on the head. The one-way implicit casting of literals often gets in my way…

auto x = 1;
ushort y = x; //ERR: `1` is an `int`… ugh
ushort z = 1; //OK: `1` is actually `ushort` now
auto a = [1,2];
ubyte[] b = a.dup(); //ERR: `[1,2]` is an `int[]`… ugh
ubyte[] c = [1,2]; //OK: on second thought, `[1,2]` can totally be a `ubyte[]`…
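For reference, the usual way past these errors today is an explicit conversion. A small sketch using std.conv.to, which checks the range at run time (the function names are made up):

```d
import std.conv : to;

// Hypothetical helper names; std.conv.to is the existing Phobos function.
ushort narrowChecked(int x)
{
    // Throws ConvOverflowException if x does not fit in ushort.
    return x.to!ushort;
}

ubyte[] narrowAll(int[] values)
{
    // Array-to-array conversion converts each element with the same check.
    return values.to!(ubyte[]);
}
```

This moves to run time a check that the compiler could, in principle, have done from the literal alone.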

One thing that could mitigate this is having explicit suffixes for char, byte, and short. But what would be really nice is if the language could keep track of when a variable’s type was inferred from nothing but a literal, and then allow the usual type conversion restrictions to be bent a little based on how the literal could’ve been interpreted as a different type.
It’s a similar idea to value range propagation (https://dlang.org/spec/type.html#vrp), which never accounts for variables that have just been unconditionally assigned a literal value.
Oddly, the idea that I described already exists for enums:

enum int[] x = [1,2];
ushort[] y = x; //no error?!
ushort[2] z = x; //still no error!!?!

Hopefully that makes sense.

TL;DR: Let variables that are initialised/unconditionally assigned literals follow the same implicit cast rules as the literal, meaning that int x=1; byte y=x; compiles since 1 can be inferred as a byte.
