June 06, 2020
On Saturday, 6 June 2020 at 15:16:06 UTC, ag0aep6g wrote:
> But Walter agrees with you: Using a void value shouldn't actually have undefined behavior; it should just be an arbitrary value. Which is why he has an open pull request to change the spec:
> https://github.com/dlang/dlang.org/pull/2260

Reading that PR, one comment stood out:

> The fact that the compilers don't define the behavior doesn't make it undefined according to the language spec

Which makes me ask: what _does_ "undefined behaviour" mean according to the D language spec?

I'm not sure that the spec actually gives a definition: it certainly (in the Introduction) starts talking about things that _result_ in undefined behaviour, without ever first saying what undefined behaviour is.
June 06, 2020
On 06.06.20 21:30, Joseph Rushton Wakeling wrote:
> On Saturday, 6 June 2020 at 15:16:06 UTC, ag0aep6g wrote:
>> But Walter agrees with you: Using a void value shouldn't actually have undefined behavior; it should just be an arbitrary value. Which is why he has an open pull request to change the spec:
>> https://github.com/dlang/dlang.org/pull/2260
> 
> Reading that PR, one comment stood out:
> 
>> The fact that the compilers don't define the behavior doesn't make it undefined according to the language spec
> ...

You have to read it in context. The PR attempts to change accessing void-initialized memory from undefined behavior to implementation-defined behavior. (I don't see how that helps in any way, but that's what it does.)
I.e., compiler implementations would have to specify the behavior. Walter is saying that if they fail to do that, that would then be an error in the documentation of those compilers.

> Which makes me ask: what _does_ "undefined behaviour" mean according to the D language spec?
> ...

Unfortunately it does not mean the behavior is not defined. It means the behavior is explicitly defined to be arbitrary. It's not very well-designed terminology, but D inherits it from C.

> I'm not sure that the spec actually gives a definition: it certainly (in the Introduction) starts talking about things that _result_ in undefined behaviour, without ever first saying what undefined behaviour is.

In practice, it means what the backend developers take it to mean: they will develop code generation procedures that are correct under the assumption that the source program never triggers UB. If that assumption is violated, demons may fly out of your nose, or more likely, bad actors will be able to take control of your process by carefully crafting inputs that exploit memory corruption.

https://en.wikipedia.org/wiki/Undefined_behavior
June 06, 2020
On Saturday, 6 June 2020 at 18:43:43 UTC, Paul Backus wrote:
> That is, it would be perfectly consistent to say that void initialization of int in @safe code is fine, because every 4-byte bit pattern represents a valid int, but void-initialization of bool in @safe code is no good, because many 1-byte bit patterns do not represent valid bools.

That would be fine, yes. I just don't expect that Walter will resolve the issue that way. If he does, updating the spec will be no big deal compared to updating the compiler.
June 06, 2020
On Saturday, 6 June 2020 at 18:30:15 UTC, Avrina wrote:
> On Saturday, 6 June 2020 at 15:16:06 UTC, ag0aep6g wrote:
[...]
> So you're saying void initializers don't have a defined function then? So if LDC wants it could initialise the memory of all void-initialized variables to zero and that would be considered within defined behaviour? If void initializers are undefined behaviour then the feature is useless.

It's not void initializers themselves that have undefined behavior. Using an uninitialized value has undefined behavior. In code:

----
int f() { int x = void; x = 42; return x + 1; } /* This is fine. */
int g() { int x = void; return x + 1; } /* This is not. */
----

If LDC defines behavior for `g`, it creates a dialect of D which has less undefined behavior than (current) standard D.

>> But the spec doesn't do that. Instead it says: "If a void initialized variable's value is used before it is set, the behavior is undefined." That is, the spec explicitly says that doing so is not allowed, and that compilers may assume that it doesn't happen. Because that's what "undefined behavior" means.
>
> The spec is poorly written. The language doesn't do what's written in it for a lot of cases on top of that.

True. Which is why a lot of fixing needs to be done. Sometimes the spec needs to be changed, sometimes the implementation, sometimes both.

[...]
>> But Walter agrees with you: Using a void value shouldn't actually have undefined behavior; it should just be an arbitrary value. Which is why he has an open pull request to change the spec:
>> https://github.com/dlang/dlang.org/pull/2260
>
> Yea and that pull request is years old. If that change does go into effect then the spec will reflect the actual reality of what is happening.

Sure. But as long as the PR is in limbo, Patrick Schluter has a point when he says that "undefined behaviour is undefined behaviour".
June 06, 2020
On 6/6/20 12:54 PM, Joseph Rushton Wakeling wrote:
> On Saturday, 6 June 2020 at 16:48:56 UTC, Steven Schveighoffer wrote:
>> int x = void;
>>
>> Does not and cannot corrupt memory (assuming everything that uses it is @safe).
> 
> Yes, which is presumably why the other example above -- using the wrong accessor of a union -- is also allowed in @safe code ... ?

There are two things here:
1. Using the value of a union that was not set in @safe code can lead to corruption *easily*. Consider a union with perfectly overlapping fields that are not pointers. The individual fields could have (possibly UFCS) trusted semantics that are invalid when you arbitrarily set the data.
2. The compiler cannot be made to detect when this happens (halting problem).
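
A rough sketch of point 1, with made-up names (`table`, `SmallIndex`, `lookup`, `Overlay` are all hypothetical). It assumes the current behavior where @safe code may read any non-pointer field of a union, and that reading `idx` after writing `raw` simply reinterprets the same bits:

```d
immutable int[4] table = [10, 20, 30, 40];

struct SmallIndex
{
    private uint i; // intended invariant: i < table.length

    this(uint v) @safe
    {
        assert(v < table.length, "index out of range");
        i = v;
    }
}

// Relies on the invariant above and skips the bounds check.
int lookup(SmallIndex s) @trusted
{
    return table.ptr[s.i];
}

union Overlay
{
    uint raw;
    SmallIndex idx; // overlaps `raw` exactly; neither field is a pointer
}

// Returns true if @safe code managed to break SmallIndex's invariant.
bool invariantBroken() @safe
{
    Overlay o;
    o.raw = 1000; // plain integer write, no constructor involved
    // Reading the other field yields a SmallIndex that no constructor produced.
    // Passing o.idx to lookup() at this point would read far past `table`.
    return o.idx.i >= table.length;
}
```

Nothing in `invariantBroken` is rejected as unsafe, yet it hands out exactly the kind of value `lookup` was written to never see.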

So we have two options I can see:

1. Declare that unions are not accessible in @safe, period. This would be a huge breaking change, but would be a correct distinction. You would have to mark all such code @trusted.
2. Somehow mark that unions with further semantic @safe guarantees are not allowed in @safe code. This could be a possible compromise that allows using unions still in @safe code.

The talk proposal I was going to submit this year was going to be very much about this problem.

> That said, the definition of a safe interface is, according to spec, that it exhibits no undefined behaviour:
> https://dlang.org/spec/function.html#safe-interfaces
> 
> Don't using void-initialized variables and using the wrong accessors of a union both count as undefined behaviour, even if they are memory-safe?

Yes, either the spec or the behavior definition needs to be adjusted.

-Steve
June 06, 2020
On Saturday, 6 June 2020 at 19:52:11 UTC, Timon Gehr wrote:
> Unfortunately it does not mean the behavior is not defined. It means the behavior is explicitly defined to be arbitrary. It's not very well-designed terminology, but D inherits it from C.

The C99 standard has better terminology, though: it describes it as _indeterminate_, which can be either unspecified, or a trap value (the latter of which IIUC means any arbitrary bit pattern that fits in the block of memory the variable is stored in?).

If I've understood the spec correctly, "unspecified" allows for the possibility that an individual implementation _could_ specify the behaviour, it just doesn't require it -- right? -- whereas as you say, "implementation defined" in C explicitly means that the implementation _must_ specify the behaviour.

It looks like the D spec might benefit from adapting that kind of more precisely defined terminology.
June 06, 2020
On Saturday, 6 June 2020 at 21:01:05 UTC, Steven Schveighoffer wrote:
> The individual fields could have (possibly UFCS) trusted semantics that are invalid when you arbitrarily set the data.

Doesn't that also apply to void-initialized values in the case that the implementation allows arbitrary bit-patterns (what IIUC the C99 standard calls trap values)?
June 06, 2020
On 6/6/20 5:38 PM, Joseph Rushton Wakeling wrote:
> On Saturday, 6 June 2020 at 21:01:05 UTC, Steven Schveighoffer wrote:
>> The individual fields could have (possibly UFCS) trusted semantics that are invalid when you arbitrarily set the data.
> 
> Doesn't that also apply to void-initialized values in the case that the implementation allows arbitrary bit-patterns (what IIUC the C99 standard calls trap values)?

Yes, it's the same thing. This is why I specifically said that the case of:

int x = void;

won't corrupt memory *as long as everything that uses it is @safe*. This is due to the fact that all indexing operations in @safe code are bounds-checked.
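
A minimal sketch of what that means in practice. The names `pick` and `throwsRangeError` are hypothetical, the parameter `i` stands in for whatever arbitrary value the int ends up holding, and it assumes bounds checks have not been switched off (e.g. with -boundscheck=off):

```d
import core.exception : RangeError;

// @safe indexing: an out-of-range index throws instead of touching memory.
int pick(const(int)[] data, int i) @safe
{
    return data[i];
}

// Helper for demonstration only; left @system (the default).
// Returns true if indexing `data` with `i` was stopped by the bounds check.
bool throwsRangeError(const(int)[] data, int i)
{
    try
    {
        cast(void) pick(data, i);
        return false;
    }
    catch (RangeError)
    {
        return true;
    }
}
```

With a three-element array, an index of -1 or int.max is caught by the check, while 0, 1 and 2 go through normally.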

As soon as you start using @trusted, then the semantic meaning of what x actually represents comes into play.

The thing we *should* do is just disallow all these corner cases in @safe code. It's much easier to relax it in certain cases later than it is to add on band-aids for all the bad cases.

I don't think the code breakage would be tolerable for many people. Then again, maybe void initialization isn't common enough to cause a lot of breakage, I don't know. But I'm sure union usage is higher.

-Steve
June 06, 2020
On Saturday, 6 June 2020 at 22:01:51 UTC, Steven Schveighoffer wrote:
> Yes, it's the same thing. This is why I specifically said that the case of:
>
> int x = void;
>
> won't corrupt memory *as long as everything that uses it is @safe*. This is due to the fact that all indexing operations in @safe code are bounds-checked.

Ahh, gotcha.  I didn't quite follow the nuances there before.  Thanks for clarifying, and for all the previous explanation.

> The thing we *should* do is just disallow all these corner cases in @safe code. It's much easier to relax it in certain cases later than it is to add on band-aids for all the bad cases.
>
> I don't think the code breakage would be tolerable for many people. Then again, maybe void initialization isn't common enough to cause a lot of breakage, I don't know. But I'm sure union usage is higher.

Could be interesting to put together compiler options to ban one or the other, and test them on a few major codebases (starting with druntime and phobos) to see how much breaks.  If nothing else it'd be useful information to consider for that D3 proposal ... :-)
June 07, 2020
On Saturday, 6 June 2020 at 22:01:51 UTC, Steven Schveighoffer wrote:
> On 6/6/20 5:38 PM, Joseph Rushton Wakeling wrote:
>> 
>> Doesn't that also apply to void-initialized values in the case that the implementation allows arbitrary bit-patterns (what IIUC the C99 standard calls trap values)?

Here's the definition of "trap representation" from the C99 standard:

> Certain object representations need not represent a value of the object type. [...] Such a representation is called a trap representation.

So, for example, 0x42 is a trap representation for the type `bool`, because there's no value of type `bool` that corresponds to that pattern of bits.
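
To put a number on it, here is a small sketch (hypothetical helper names, and it assumes the usual representation where `false` is stored as 0 and `true` as 1) that counts how many one-byte patterns correspond to a value of type `bool`:

```d
static assert(bool.sizeof == 1);

// True if `pattern` is the representation of some value of type bool,
// assuming false is stored as 0 and true as 1.
bool isValidBoolPattern(ubyte pattern) @safe pure nothrow @nogc
{
    return pattern == cast(ubyte) false || pattern == cast(ubyte) true;
}

// Counts the valid patterns among all 256 possible byte values.
size_t validBoolPatterns() @safe pure nothrow @nogc
{
    size_t count = 0;
    foreach (pattern; 0 .. 256)
    {
        if (isValidBoolPattern(cast(ubyte) pattern))
            ++count;
    }
    return count;
}
```

Under that assumption only 2 of the 256 patterns are valid, so the other 254 (0x42 among them) are trap representations.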

> Yes, it's the same thing. This is why I specifically said that the case of:
>
> int x = void;
>
> won't corrupt memory *as long as everything that uses it is @safe*. This is due to the fact that all indexing operations in @safe code are bounds-checked.

It's also due to the fact that the type `int` has no trap representations. Every bit pattern corresponds to a valid integer.

>
> As soon as you start using @trusted, then the semantic meaning of what x actually represents comes into play.

As long as the @trusted code is written correctly, it's safe regardless. The @trusted code has to be prepared to receive every possible safe value [1] of type `int`. Since every value of `int` is safe, it must be prepared for all possible values of type `int`. And since `int` has no trap representations, this is the same as being prepared for all possible bit patterns that could be stored in an `int`.
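
As an illustration, a sketch of @trusted code written that way. The names `table` and `lookupOr` are hypothetical; the point is only that the unchecked access is guarded for every possible `int`, so it does not matter where the argument came from:

```d
immutable int[4] table = [10, 20, 30, 40];

// Returns table[i], or `fallback` when `i` is not a valid index.
int lookupOr(int i, int fallback) @trusted pure nothrow @nogc
{
    // Handles every int, including negative values and int.max.
    if (i < 0 || i >= table.length)
        return fallback;
    return table.ptr[i]; // unchecked access, justified by the test above
}
```

Calling `lookupOr` with int.min, -1, 4 or int.max just returns the fallback, which is why an arbitrary bit pattern in the argument cannot make this function corrupt memory.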

> The thing we *should* do is just disallow all these corner cases in @safe code. It's much easier to relax it in certain cases later than it is to add on band-aids for all the bad cases.

It is sufficient to forbid void-initialization of any type with unsafe values [1]--including trap representations.

[1] https://dlang.org/spec/function.html#safe-values