January 17, 2019
On Thu, 17 Jan 2019 22:56:00 +0100, Johannes Loher wrote:
> There are quite a few languages which allow declaration of variables of
> their bottom type in the manner I described. These include:
> - Rust
> - Kotlin
> - Flow
> - Haskell
> - Swift
> 
> On the other hand, I don't know of any language which prevents this. Do you know any?

C and C++ don't care if you use something before initializing it. Most languages explicitly forbid it, and Haskell particularly doesn't let you declare something before initializing it. So in order to use a variable of the bottom type, you must first assign it somehow. That means calling a function that returns the bottom type.
January 17, 2019
Am 17.01.19 um 23:13 schrieb H. S. Teoh:
> Even better yet, this means we don't even need to make it an error to declare and initialize a variable of type Bottom: the initialization can just be defined to be `assert(0);`, i.e.:
> 
> 	Bottom b;
> 
> becomes equivalent to (in pseudo-code):
> 
> 	Bottom b = void;
> 	b = assert(0);
> 
> which simplifies to just:
> 
> 	assert(0);

I didn't actually think this far, but you are right! We'd still somehow need to deal with void initialization though. In particular, what does this mean:

```
Tbottom fun()
{
    Tbottom b = void;
    return b;
}
```
?

With regular types, void initialization is no problem from a type system perspective because the memory the variable uses always holds some data which can be interpreted as data of that type. This is not true for Tbottom because there can not be any data at all. So I think we would still need to make (at least explicit) void initialization an error.
January 17, 2019
On Thu, Jan 17, 2019 at 09:43:52PM +0000, Jonathan Marler via Digitalmars-d wrote:
> On Thursday, 17 January 2019 at 20:15:20 UTC, H. S. Teoh wrote:
> > On Thu, Jan 17, 2019 at 06:51:13PM +0000, Jonathan Marler via Digitalmars-d wrote: [...]
> > > To summarize, the bottom type allows you to declare a special pointer with no storage!
> > > 
> > >     TBottom*.sizeof == 0
> > > 
> > > This is also something that should be in the DIP and Walter should chime in with whether or not he thinks these semantics can actually be implemented.
> > 
> > This would introduce an asymmetry with the rest of the pointer types in the language which all have equal sizes, and this asymmetry will percolate down to other language constructs, causing special cases and inconsistencies everywhere.  I don't recommend doing this.
> 
> Yes that's kind of the point :)  It's a new pointer type that has a unique size from all other pointer types.  This means you can no-longer assume all pointers types are the same size, you have a special case where it's size can be zero. This might warrant some code changes to handle this case, but does enable some new semantics which can be useful for library writers.  Case in point was you can now have a template parameter T* that the user can instantiate in such a way as to eliminate the pointer alltogether.

No, that's backwards.  (1) Introducing asymmetry, i.e., special cases, is bad, because inevitably people will forget to check for it, leading to endless bugs. It will also percolate all over the language, creating exceptions and new corner cases, which are what we're trying to *reduce* here. (2) Eliminating a field is achieved by passing in a Unit type, not a Bottom type. E.g., if you have a struct template:

	struct S(T, U) {
		T t;
		U u;
	}

then you could instantiate a single-field struct by doing:

	S!(Unit, int) s;
	// Equivalent to: struct { int u; }

or:

	S!(string, Unit) t;
	// Equivalent to: struct { string t; }

Instantiating it with Bottom means forcing a compile error (or runtime
abort).


> > A better approach might be to make TBottom* always equal to null -- i.e., it's always illegal to dereference it because no instances of TBottom can exist.
> > 
> 
> That's conceptually the same thing.  Saying that TBottom* is "always equal to null" is the same as saying it's a Unit Type, which is the same as saying that it contains no information so the storage needed is 0 (same as void).

I'm not so sure about that.  A true Unit type conveys no information, yet TBottom* conveys at least this information: that it's a pointer, and that the pointer cannot be dereferenced.  A true Unit type would be the return type of a void function; would you equate TBottom* with the return type of a void function?  That would be very strange indeed.


T

-- 
Freedom of speech: the whole world has no right *not* to hear my spouting off!
January 17, 2019
Am 17.01.19 um 23:33 schrieb H. S. Teoh:
> I'm not so sure about that.  A true Unit type conveys no information, yet TBottom* conveys at least this information: that it's a pointer, and that the pointer cannot be dereferenced.  A true Unit type would be the return type of a void function; would you equate TBottom* with the return type of a void function?  That would be very strange indeed.

I suggested this earlier in this thread, i.e.

```
alias void = Tbottom*;
```

This would make `null` the single value `void`.

If we were to follow type theory rigorously, we would not even have another choice than doing this because all types which have exactly one value are naturally isomorphic and this means they are equal from a type theory perspective.

This is not the case in D (in particular while struct A {} and struct B {} are of the same type from a rigorous type theory perspective, they are not in D). So we don't actually have to do it like this, but it would be an option (if we fixed void to be a proper unit type).

January 17, 2019
On Thu, Jan 17, 2019 at 11:32:42PM +0100, Johannes Loher via Digitalmars-d wrote: [...]
> [...] In particular, what does this mean:
> 
> ```
> Tbottom fun()
> {
>     Tbottom b = void;
>     return b;
> }
> ```
> ?
> 
> With regular types, void initialization is no problem from a type system perspective because the memory the variable uses always holds some data which can be interpreted as data of that type. This is not true for Tbottom because there can not be any data at all. So I think we would still need to make (at least explicit) void initialization an error.

Hmm that's a good question.  I was going to suggest that the very act of declaring a variable of type Bottom should be defined to be equivalent to writing `assert(0)`, but then it wouldn't have the correct semantics when you write:

	Bottom b = func(); // func returns Bottom

since it would abort before calling func(), which is wrong.

So it looks like we'll have no choice but to special-case void-initialization of Bottom to be illegal, much as I don't like special cases.

It does lead to other corner cases like structs that contain Bottom fields -- instantiating such a struct should abort at runtime, and void initialization should be illegal -- so this special case is infectious and may lead to increased compiler complexity. OTOH, maybe this complexity can be nipped in the bud by stipulating that any aggregate that contains Bottom reduces to Bottom, i.e., any struct or class that contains a Bottom field will be defined to be Bottom itself (since it would be impossible to create an instance of such a struct or class without also creating an instance of Bottom).

A particularly interesting (in a nasty way) case is a union that contains Bottom fields. How would you define its semantics?  Since technically, if you only assign to its non-Bottom components, you aren't actually creating an instance of Bottom.  But since the field is there, you can access it anytime.  I'm tempted to say that such a union should also reduce to Bottom -- since one could argue that the Bottom field already exists in the union, even if it's uninitialized.  This could fall under the auspices of the no-void-initialization rule.

But there might be a type-theoretic issue here, though: unions being (approximately) a sum type, one would expect that it's legal to take a sum of Bottom with other types: since Bottom is uninhabited, the sum of Bottom with any other type should reduce to the other type (rather than Bottom itself!).  Then again, D unions aren't really the same thing as a sum type in type theory AFAICT, so this may not be saying very much. But it's a tricky case that needs to be addressed before we start adding Bottom to the language.


T

-- 
"I suspect the best way to deal with procrastination is to put off the procrastination itself until later. I've been meaning to try this, but haven't gotten around to it yet. " -- swr
January 17, 2019
On Thu, Jan 17, 2019 at 11:52:02PM +0100, Johannes Loher via Digitalmars-d wrote:
> Am 17.01.19 um 23:33 schrieb H. S. Teoh:
> > I'm not so sure about that.  A true Unit type conveys no information, yet TBottom* conveys at least this information: that it's a pointer, and that the pointer cannot be dereferenced.  A true Unit type would be the return type of a void function; would you equate TBottom* with the return type of a void function?  That would be very strange indeed.
> 
> I suggested this earlier in this thread, i.e.
> 
> ```
> alias void = Tbottom*;
> ```
> 
> This would make `null` the single value `void`.
> 
> If we were to follow type theory rigorously, we would not even have another choice than doing this because all types which have exactly one value are naturally isomorphic and this means they are equal from a type theory perspective.

This is where I wish someone who actually knows type theory would step up and clear up the situation, because in my mind, the type system we have in D contains an inherent, additional layer of complexity beyond what you're describing, and this is related to what you say below:


> This is not the case in D (in particular while struct A {} and struct B {} are of the same type from a rigorous type theory perspective, they are not in D). So we don't actually have to do it like this, but it would be an option (if we fixed void to be a proper unit type).

I don't think we can realistically force empty structs `struct A {}` and `struct B {}` to be the same type, because even though they are arguably both unit types, they can still be distinguished from each other by name. This distinction is also more than just mere convention; it plays a crucial role in UDAs, for example:

	struct DontSerialize {}
	struct SerializeAsBytes {}

	struct Data {
		@DontSerialize uint runtimeId;
		@SerializeAsBytes int payload;
	}

If we collapsed both structs into a single Unit type, the UDA mechanism would completely fall apart.

Furthermore, types in D cannot be identified merely by their constituent parts.  For example:

	struct CartesianCoors {
		float x;
		float y;
	}

	struct PolarCoors {
		float theta;
		float radius;
	}

Technically, both structs are products of two floats, but that does not make them the same thing, because the values of their constituent components are interpreted differently and should not be confused one for another.

At the very least, it would seem that the *name* of the type plays an essential role in its identification.  I.e., it's almost as if a struct declaration is actually defining a type that, in addition to the types of its fields, contains also an implicit string type identifying the name of the struct. Or alternatively, we're dealing with a type system where types are additionally decorated with string identifiers that distinguish otherwise-identical types from each other.  (Or it could be that I've no idea what I'm talking about, and this is the consequence of this community having very few people who actually know type theory thoroughly enough to be able to work out a sane solution to all of these issues. :-P)


In any case, coming back to TBottom*, another issue that makes me wary of defining TBottom* == void is the top pointer `void*`.  Since any pointer implicitly converts to `void*`, this means TBottom* implicitly converts to `void*` too, which in turn means `void` should also implicitly convert to `void*`:

	void procedure(...) { }

	void* ptr = procedure(...); // valid if TBottom* == Unit == void

I think the problems that such a construct would cause would far outweigh whatever benefits that we may have reaped from introducing Unit and Bottom types!


T

-- 
Life is complex. It consists of real and imaginary parts. -- YHL
January 18, 2019
Am 18.01.19 um 00:05 schrieb H. S. Teoh:
> 
> Hmm that's a good question.  I was going to suggest that the very act of declaring a variable of type Bottom should be defined to be equivalent to writing `assert(0)`, but then it wouldn't have the correct semantics when you write:
> 
> 	Bottom b = func(); // func returns Bottom
> 
> since it would abort before calling func(), which is wrong.

What we can do is simply replace this by
```
func();
assert(0);
```

> So it looks like we'll have no choice but to special-case void-initialization of Bottom to be illegal, much as I don't like special cases.
> 
> It does lead to other corner cases like structs that contain Bottom fields -- instantiating such a struct should abort at runtime, and void initialization should be illegal -- so this special case is infectious and may lead to increased compiler complexity. OTOH, maybe this complexity can be nipped in the bud by stipulating that any aggregate that contains Bottom reduces to Bottom, i.e., any struct or class that contains a Bottom field will be defined to be Bottom itself (since it would be impossible to create an instance of such a struct or class without also creating an instance of Bottom).

Structs with a bottom type member lowering to the bottom type member is the natural thing to do. Structs are product types and the product of any type with the bottom type is the bottom type.

> A particularly interesting (in a nasty way) case is a union that
> contains Bottom fields. How would you define its semantics?  Since
> technically, if you only assign to its non-Bottom components, you aren't
> actually creating an instance of Bottom.  But since the field is there,
> you can access it anytime.  I'm tempted to say that such a union should
> also reduce to Bottom -- since one could argue that the Bottom field
> already exists in the union, even if it's uninitialized.  This could
> fall under the auspices of the no-void-initialization rule.
> But there might be a type-theoretic issue here, though: unions being
> (approximately) a sum type, one would expect that it's legal to take a
> sum of Bottom with other types: since Bottom is uninhabited, the sum of
> Bottom with any other type should reduce to the other type (rather than
> Bottom itself!).  Then again, D unions aren't really the same thing as a
> sum type in type theory AFAICT, so this may not be saying very much.
> But it's a tricky case that needs to be addressed before we start adding
> Bottom to the language.

As far as I am concerned, unions are sum types, you just have no option to actually to find out which type they actually hold at runtime. While this is a feature that is very useful and often implied when talking about sum types (e.g. you can do it with std.variant.Algebraic), I donÄt think it is stricly necessary.

This means that as you mentioned, any union type with a bottom type
member should "lower" to the union type without the bottom type member.
Actually trying to access it could either be a compile error or lower to
`assert(0)`. I am not completely sure what is preferrable, it probably
also depends on what would happen in other cases of usage of the bottom
type (compile error or assert(0);). It should be as consistent as possible.

There is a slightly weird situation with the default initialization of
unions. Usually, the first member of the union is initialized in that
case. however, in the case of
```
union A
{
    Tbottom i;
    long j;
}
```
this would mean that doing `auto A = A();` basically lowers to
`assert(0);` while for
```
union A
{
    long j;
    Tbottom i;
}
```
it would work fine and `j` would be initialized to `0`. This sounds
weird at first, but it is consistent with how unions of other types work:

```
import std.stdio;

union A
{
    float i;
    long j;
}

union B
{
    long j;
    float i;
}

void main()
{
    writeln(A().j); // 2145386496
    writeln(B().j); // 0
}
```
January 18, 2019
Am 18.01.19 um 00:34 schrieb H. S. Teoh:
> I don't think we can realistically force empty structs `struct A {}` and `struct B {}` to be the same type, because even though they are arguably both unit types, they can still be distinguished from each other by name. This distinction is also more than just mere convention; it plays a crucial role in UDAs, for example:
> 
> 	struct DontSerialize {}
> 	struct SerializeAsBytes {}
> 
> 	struct Data {
> 		@DontSerialize uint runtimeId;
> 		@SerializeAsBytes int payload;
> 	}
> 
> If we collapsed both structs into a single Unit type, the UDA mechanism would completely fall apart.
> 
> Furthermore, types in D cannot be identified merely by their constituent parts.  For example:
> 
> 	struct CartesianCoors {
> 		float x;
> 		float y;
> 	}
> 
> 	struct PolarCoors {
> 		float theta;
> 		float radius;
> 	}
> 
> Technically, both structs are products of two floats, but that does not make them the same thing, because the values of their constituent components are interpreted differently and should not be confused one for another.

I completely agree. D's types are not structural types.

> At the very least, it would seem that the *name* of the type plays an essential role in its identification.  I.e., it's almost as if a struct declaration is actually defining a type that, in addition to the types of its fields, contains also an implicit string type identifying the name of the struct. Or alternatively, we're dealing with a type system where types are additionally decorated with string identifiers that distinguish otherwise-identical types from each other.  (Or it could be that I've no idea what I'm talking about, and this is the consequence of this community having very few people who actually know type theory thoroughly enough to be able to work out a sane solution to all of these issues. :-P)

This actually sounds like reasonable way to view it. You can even get that additional field at runtime (typeid).


> In any case, coming back to TBottom*, another issue that makes me wary of defining TBottom* == void is the top pointer `void*`.  Since any pointer implicitly converts to `void*`, this means TBottom* implicitly converts to `void*` too, which in turn means `void` should also implicitly convert to `void*`:
> 
> 	void procedure(...) { }
> 
> 	void* ptr = procedure(...); // valid if TBottom* == Unit == void
> 
> I think the problems that such a construct would cause would far outweigh whatever benefits that we may have reaped from introducing Unit and Bottom types!

You are absolutely correct. I don't think it makes any sense to define `void` to be `Tbottom*` if we don't fix the problem with `void*`. I am even tend to think that in this case it might also not be reasonable to give `void` a value at all because it would make the difference in meaning of void in `void fun();` and `void*` even more apparent. This could also be a good thing though because at the moment almost nobody realizes that there is a difference.

January 17, 2019
On Fri, Jan 18, 2019 at 12:34:39AM +0100, Johannes Loher via Digitalmars-d wrote:
> Am 18.01.19 um 00:05 schrieb H. S. Teoh:
> > 
> > Hmm that's a good question.  I was going to suggest that the very act of declaring a variable of type Bottom should be defined to be equivalent to writing `assert(0)`, but then it wouldn't have the correct semantics when you write:
> > 
> > 	Bottom b = func(); // func returns Bottom
> > 
> > since it would abort before calling func(), which is wrong.
> 
> What we can do is simply replace this by
> ```
> func();
> assert(0);
> ```

Hmm you're right, in this case, we're not actually declaring b as a
separate step from calling func(); we're actually *initializing* b with
the return value of func(). Meaning that, in implementation, we're
calling func() then initializing b with the return value.  So that
initialization would fall under the same case as declaring `Bottom b;`,
defined to lower to `assert(0);`.


[...]
> Structs with a bottom type member lowering to the bottom type member is the natural thing to do. Structs are product types and the product of any type with the bottom type is the bottom type.

You're right, the product of anything with the bottom type is the bottom type.  No special casing required here.


> > A particularly interesting (in a nasty way) case is a union that contains Bottom fields. How would you define its semantics?  Since technically, if you only assign to its non-Bottom components, you aren't actually creating an instance of Bottom.  But since the field is there, you can access it anytime.  I'm tempted to say that such a union should also reduce to Bottom -- since one could argue that the Bottom field already exists in the union, even if it's uninitialized.  This could fall under the auspices of the no-void-initialization rule.  But there might be a type-theoretic issue here, though: unions being (approximately) a sum type, one would expect that it's legal to take a sum of Bottom with other types: since Bottom is uninhabited, the sum of Bottom with any other type should reduce to the other type (rather than Bottom itself!). Then again, D unions aren't really the same thing as a sum type in type theory AFAICT, so this may not be saying very much.  But it's a tricky case that needs to be addressed before we start adding Bottom to the language.
> 
> As far as I am concerned, unions are sum types, you just have no option to actually to find out which type they actually hold at runtime. While this is a feature that is very useful and often implied when talking about sum types (e.g. you can do it with std.variant.Algebraic), I donÄt think it is stricly necessary.
> 
> This means that as you mentioned, any union type with a bottom type member should "lower" to the union type without the bottom type member.

Makes sense.  But worthy of note, since it may not be immediately obvious to those of us less well-versed with type theory. :-D  And definitely this needs to be stated clearly in the DIP, otherwise it will be overlooked and who knows what buggy behaviour would result.


> Actually trying to access it could either be a compile error or lower
> to `assert(0)`. I am not completely sure what is preferrable, it
> probably also depends on what would happen in other cases of usage of
> the bottom type (compile error or assert(0);). It should be as
> consistent as possible.

Yeah I'm not sure what should happen either.  But I think lowering to assert(0) makes sense.  It's how I'd implement a function declared to return Bottom:

	Bottom func() {
		for(;;) {
			doStuff();
		}

		return Bottom.init; // equivalent to assert(0);
	}

In other words, in order for func to implement its declared return type of Bottom, it ought to return an instance of Bottom, and the construction of this impossible instance should abort the program. Which would be consistent with current practice of unreachable code being self-documented by inserting `assert(0);`.


> There is a slightly weird situation with the default initialization of
> unions. Usually, the first member of the union is initialized in that
> case. however, in the case of
> ```
> union A
> {
>     Tbottom i;
>     long j;
> }
> ```
> this would mean that doing `auto A = A();` basically lowers to
> `assert(0);` while for
> ```
> union A
> {
>     long j;
>     Tbottom i;
> }
> ```
> it would work fine and `j` would be initialized to `0`. This sounds
> weird at first, but it is consistent with how unions of other types
> work:
> 
> ```
> import std.stdio;
> 
> union A
> {
>     float i;
>     long j;
> }
> 
> union B
> {
>     long j;
>     float i;
> }
> 
> void main()
> {
>     writeln(A().j); // 2145386496
>     writeln(B().j); // 0
> }
> ```

Ick.  This is one dark corner of D that I really dislike.  But it makes sense, in its own way, given D's guarantee that .init must always exist and be well-defined.  And the choice of initializing a union by its first member's .init is as good as any other choice, since we have to initialize it with *something*.  Just that when TBottom is involved, the order of declaration suddenly can mean the difference between aborting and initializing with an innocuous value.


Another case to consider: what should .typeid of Tbottom return?


T

-- 
Lottery: tax on the stupid. -- Slashdotter
January 18, 2019
Am 18.01.19 um 00:53 schrieb H. S. Teoh:
> Another case to consider: what should .typeid of Tbottom return?
> 
Indeed an interesting question. Since `Tbottom` is a type like any other, it needs to return an instance of TypeInfo.

The even more interesting question is what the methods should return.
Take for example `TypeInfo.equals`. Because there are no values of type
`Tbottom`, any two instances of type `Tbottom` are equal and the method
should return true. On the other hand, by the same argument, any two
instances of `Tbottom` are not equal and the method should return false.
From a more pragmatic point of view: It does not really matter what this
method returns. It does take pointers to void which internally will be
cast to `Tbottom*` (becasue we are actually comparing `Tbottom`s) and
then dereferencing it will simply abort the program (`null` dereferencing).

I think it is similar for the other methods. Some simply return some information about the type, e.g. `talign` should return 0.