On Friday, 3 May 2024 at 09:15:02 UTC, rkompass wrote:
> Could DIP 1039 be restarted?
I don’t know why it couldn’t.
One thing where DIP1039 would shine over staticArray: String literals (typed immutable(Char)[]) are zero terminated. The zero isn’t part of the slice, but it’s there, so one can pass "blah".ptr to a C API and be good, because "blah" isn’t ['b', 'l', 'a', 'h'], but rather ['b', 'l', 'a', 'h', '\0'][0..4].
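A minimal sketch of that, runnable in today's D. It uses only strlen from druntime's C bindings (core.stdc.string); the variable name s is mine. Note that the guarantee is about literals, not about arbitrary slices.

```d
import core.stdc.string : strlen;

void main()
{
    string s = "blah";              // slice into the literal's data
    assert(s.length == 4);          // the terminator is not counted
    assert(s.ptr[4] == '\0');       // but it sits right behind the 4 characters
    assert(strlen(s.ptr) == 4);     // so C code can read up to the terminator
}
```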
If you write:
auto blah = "blah".staticArray;
As far as I can tell, blah is an array of 4 chars. It has no zero terminator, and I don’t see how it could have one.
However,
immutable(char)[$] blah = "blah";
could absolutely be implemented as making space for the 4 characters plus a zero terminator. blah.length would still be 4, and it would otherwise be a normal static array, but contrary to the staticArray solution, you could pass blah.ptr to a C API. However, you can’t pass a copy of blah to a C API, as copying only copies the 4 bytes, not the terminator.
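To illustrate the copy problem with what exists today: a sketch with the length written out by hand, since T[$] is not in the language. The names blah and copy are just for illustration.

```d
void main()
{
    // Explicit length, standing in for the proposed immutable(char)[$].
    immutable(char)[4] blah = "blah";
    static assert(typeof(blah).sizeof == 4);   // 4 bytes of storage, no slot for '\0'

    auto copy = blah;                          // copies exactly those 4 bytes
    static assert(typeof(copy).sizeof == 4);
    assert(copy == "blah");

    // Whatever follows copy in memory is unspecified, so handing copy.ptr
    // to a C function that expects a terminator would be a bug.
}
```

Under the rule proposed here, only the variable initialized directly from the literal would get the hidden terminator; the copy would stay a plain 4-byte array like the one above.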
To be precise: Let Char denote any character type; if arr is a static array declared with Char[$] and initialized by a string literal, arr.ptr[arr.length] is Char(0). That rule does not apply to non-character element types (e.g. int), and it does not apply to an array object spelled out as a list, e.g. ['x'], or gained from a function call. This is already the behavior of Char[]s.
One might add that the zero terminator not being part of the array might be incorrect, as in C, it is part of the array. My solution would be to add another construct:
immutable(char)[$+1] str = "Hello";
The $+1 is core syntax. There is no $+2 or anything.
It expresses that the array is (at least) 1 character longer than the initializer appears to be.
This is exactly what C does. It’s the maximally faithful translation of:
const char str[] = "Hello";
It only exists for character arrays and they must be initialized by a string literal or by a compile-time known const(char)[] ending with the zero character.
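For comparison, here is roughly what one has to write today to get the same array as the C declaration: count by hand and put the terminator into the literal. This is only a sketch of the intended result, not of how $+1 would be implemented.

```d
import core.stdc.string : strlen;

void main()
{
    // Hand-written equivalent of the proposed immutable(char)[$+1] str = "Hello";
    immutable(char)[6] str = "Hello\0";
    assert(str.length == 6);            // terminator is part of the array, as in C
    assert(str[5] == '\0');
    assert(strlen(str.ptr) == 5);       // C sees a 5-character string

    auto copy = str;                    // a copy carries the terminator along
    assert(strlen(copy.ptr) == 5);
}
```

Because the terminator is inside the array here, copies stay usable from C, at the price of length being one more than the visible text.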
Whoever revives DIP1039 should take care to answer questions posed in the reviews. I have some:
- The DIP should provide a rationale as to why the feature is not allowed in function declarations.
- The DIP does not provide enough examples; it should clearly demonstrate behavior for every situation in which a type can be used. The DIP author agrees.
- The DIP should explain what the problem was with the first attempt and how this proposal addresses that problem. The DIP author disagrees.
- The DIP should specify if arrays of character types include a terminating \0 and, if so, if it is part of the length.
- The DIP fails to provide a rationale as to why std.array.staticArray is insufficient.
- Better examples are needed. The DIP author agreed.
- The DIP should use ease of reading/writing code as an argument and provide examples to that effect. The DIP author agreed.
- The benefit gained is very minor, so the DIP should address this in relation to the difficulty of the implementation and maintenance. The DIP author agreed.
My takes:
- Conceptually, it makes no sense, as the length is inferred from the initializer, but function parameters have none. One could argue that it does make sense for defaulted parameters, though. For template value parameters, static arrays and slices are almost equivalent anyway. It should be allowed for function return types, though. If entire types can be inferred, that should be possible. Essentially, int[$] f() is auto f() where the compiler errors if the return type isn’t convertible to a static array.
- (Grunt work.)
- I have no idea what the complaint even means. There is no “problem,” just a nuisance. Something that’s trivial in C is hard in D, which makes no sense.
- I stated that, precisely, the zero terminator should be present, but not part of the static array proper.
- It’s insufficient to interface with C APIs. Something that’s trivial (and safe) in C requires a library in D.
- (Grunt work.)
- (Grunt work.)
- It is minor, no doubt about that. For quite some features of D, the benefit is small or even none technically, but the implementation isn’t gigantic either. The prime example is => function definitions: Those only have ease of writing on their side and add nothing that couldn’t easily be done without them. The T[$] declarations at least have the additional argument that C code can be translated to D in the core language and that literals could be stack allocated and passed to C APIs. Of course Phobos could add staticCharArrayZ to add the zero terminator. Oh wait, it can’t, because either the zero is lost on copying or it is part of the length.
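On the first take, comparing int[$] f() to auto f(): a small sketch of what auto gives you today, and why it isn’t quite the same thing. The function names f and g are made up for the example.

```d
// With auto, an array literal is inferred as a dynamic array.
auto g()
{
    return [1, 2, 3];
}
static assert(is(typeof(g()) == int[]));

// To get a static array back, the length has to be spelled out somewhere.
auto f()
{
    int[3] result = [1, 2, 3];
    return result;
}
static assert(is(typeof(f()) == int[3]));

void main()
{
    assert(g() == [1, 2, 3]);
    assert(f() == [1, 2, 3]);
}
```

An int[$] return type would, as I read it, give f’s result type without anyone having to write the 3.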
Other thoughts:
The DIP mentions T[$] only in declarations and casts. It neither states that T[$] is a type in its own right nor denies it.
If T[$] were a type, it would be a weird type, on par with void, probably even worse. (Hint: An int[$] has no values of its own, and no size. It must decay into an int[n] wherever it’s used. Probably many more issues.) My sense is, adding T[$] as a type requires a lot of work, both specifying it and implementing it. That sets the DIP up for failure as it becomes convoluted and full of corner cases. The DIP mentioned it as a type suffix, which would, IMO, boil down to making int[$] a fully formed type. I can only advise anyone who thinks of rebooting the DIP: Don’t do that.
The only advantage I could think of for making T[$] a type is that object.d could then provide sstring as an alias to immutable(char)[$]. My best guess is that people who’d otherwise write immutable(char)[$] over and over in their code will just define ichar = immutable(char) and go with ichar[$].
That doesn’t keep us from allowing [$] suffixes on function return types (roughly equivalent to auto return types), as well as on parameter and variable declarations. For that, bake it into the syntax there:
VarDeclarations:
- StorageClasses? BasicType TypeSuffixes? IdentifierInitializers
+ StorageClasses? BasicType TypeSuffixes? StaticArraySuffixes? IdentifierInitializers
Declarator:
- TypeSuffixes? Identifier
+ TypeSuffixes? StaticArraySuffixes? Identifier
FuncDeclarator:
- TypeSuffixes? Identifier FuncDeclaratorSuffix
+ TypeSuffixes? StaticArraySuffixes? Identifier FuncDeclaratorSuffix
+
+ StaticArraySuffixes:
+ [ $ + 1 ] ArraySuffixes?
+ [ $ ] ArraySuffixes?
+
+ ArraySuffixes:
+ [ $ ] ArraySuffixes?
+ [ AssignExpression ] ArraySuffixes?
+ [] ArraySuffixes?
That allows, essentially, char[$+1][][$][4], but not char[$]*. It does not bake into the grammar that int[$+1] isn’t possible, but it does acknowledge that [$+1] cannot possibly appear after another ArraySuffix.
The reason to allow nested arrays of various kinds, but not e.g. pointers to them, is that those can be expressed in one literal:
char[$+1][] strings = [ "abc", "cd" ];
// as if:
char[4][] strings = [ "abc", "cd" ];
// 4 == [ "abc", "cd" ].map!(x => x.length).reduce!max + 1
int[$][$] matrix = [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ];
// as if
int[2][3] matrix = [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ];
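The “as if” forms can be checked in today’s D. A sketch, using only std.algorithm (map, max, reduce); it verifies the dimension order of the matrix and the length computation from the comment above.

```d
import std.algorithm : map, max, reduce;

void main()
{
    // What int[$][$] matrix = ... is meant to infer:
    int[2][3] matrix = [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ];
    assert(matrix.length == 3);                      // outer [$]: three rows
    static assert(is(typeof(matrix[0]) == int[2]));  // inner [$]: two per row
    assert(matrix[2][1] == 6);

    // The length char[$+1][] would pick for [ "abc", "cd" ]:
    auto longest = [ "abc", "cd" ].map!(x => x.length).reduce!max;
    assert(longest + 1 == 4);
}
```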
The DIP went into detail how it’s supposed to work with variable declarations.
For function parameters, they need default values. Then, they’re basically identical to variable declarations and infer the size from the default argument.
For function return types, let’s say the function return type is specified as T[$]. To infer the size, any return expression; is treated as if it were return cast(T[(expression).length])expression;. For a programmer, that would be annoying to write, but the compiler can do it. Note that the type inside a cast is unevaluated, so two things follow: the expression is evaluated exactly once, and its length must be known at compile time.