December 20, 2010
On Monday 20 December 2010 01:19:31 spir wrote:
> On Sun, 19 Dec 2010 21:33:56 -0500
> 
> bearophile <bearophileHUGS@lycos.com> wrote:
> > >So, putting classes on the stack kind of negates the whole point of having both structs and classes in the first place.<
> > 
> > This is false, the definition of D class instance doesn't specify where the instance memory is allocated.
> 
> For me, the important difference is that classes are referenced, while structs are plain values. This is a semantic distinction of highest importance. I would like structs to be subtype-able and to implement (runtime-type-based) polymorphism.

Except that contradicts the fact that they're value types. You can't have a type which has polymorphism and is a value type. By its very nature, polymorphism requires you to deal with a reference.

C++ allows you to put classes on the stack. It even allows you to assign a derived type to a base type where the variable being assigned to is on the stack. The result is shearing (what C++ programmers usually call object slicing): the only part assigned is the base type portion, and the data which is part of the derived type is lost. That's because the variable _is_ the base type. A value type _is_ a particular type _exactly_ and _cannot_ be any other type. This is distinctly different from a reference of a base type which points to an object of a derived type. In that case, the variable is a reference of the base type, but the object referenced is in fact the derived type. The indirection allows you to use the derived type as if it were the base type. It allows you to use polymorphism. Without that indirection, you can't do that.

So, you _could_ make structs have inheritance, but doing so would introduce shearing, which causes a number of problems. One of the main reasons that structs in D do _not_ have inheritance is to avoid shearing.
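
For contrast, here is a minimal D sketch (the class names are invented for the example) of why reference semantics sidestep the problem: assigning a derived instance to a base-typed class variable only rebinds the reference, so nothing is cut off.

class Base { int x; }
class Derived : Base { int y; }

void main()
{
    auto d = new Derived();
    d.y = 42;
    Base b = d;                        // b refers to the same Derived object
    assert((cast(Derived) b).y == 42); // no data was lost in the assignment
}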

- Jonathan M Davis
December 20, 2010
On Mon, 20 Dec 2010 01:29:13 -0800
Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> > For me, the important difference is that classes are referenced, while structs are plain values. This is a semantic distinction of highest importance. I would like structs to be subtype-able and to implement (runtime-type-based) polymorphism.
> 
> > Except that contradicts the fact that they're value types. You can't have a type which has polymorphism and is a value type. By its very nature, polymorphism requires you to deal with a reference.

Can you expand on this?

At least Oberon has value structs ("records") with inheritance and polymorphism; I guess the Turbo Pascal OO model was of that kind, too (unsure) -- at least the version implemented in Free Pascal seems to work fine that way. And probably loads of lesser-known PLs provide such a feature.
D structs could as well IIUC: I do not see the relation with instances being implicitly referenced. (Except that they must be passed by ref to "member functions" they are the receiver of, but this is true for any kind of OO, including present D structs.)

(I guess we have very different notions of "reference", as shown by previous threads.)


Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

December 20, 2010
Nick Voronin:

> Here is where we diverge. Choosing struct vs class on criteria of their placement makes no sense to me.

In D you use a class when you want inheritance or when you (often) need reference semantics, and you use a struct when you need a small value passed around by value, when you want a simple form of RAII, when you want to implement something manually (like using PIMPL), or when you want maximum performance (managing structs by pointer; you may even put a tag inside the struct or the pointer and manually implement some kind of inheritance). With structs you have literal syntax, postblits, in-place allocation, and you are free to use align() too.
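
A small, hypothetical sketch of some of those struct features (literal syntax, a postblit, and align(); the Point type is invented):

align(4) struct Point
{
    int x, y;
    this(this) { /* postblit: runs after each copy of the value */ }
}

void main()
{
    auto p = Point(3, 4); // struct literal syntax
    auto q = p;           // copied by value; the postblit runs here
    q.x = 99;
    assert(p.x == 3);     // p is unaffected: plain value semantics
}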

Bye,
bearophile
December 20, 2010
On Monday 20 December 2010 01:52:58 spir wrote:
> On Mon, 20 Dec 2010 01:29:13 -0800
> 
> Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> > > For me, the important difference is that classes are referenced, while structs are plain values. This is a semantic distinction of highest importance. I would like structs to be subtype-able and to implement (runtime-type-based) polymorphism.
> > 
> > Except that contradicts the fact that they're value types. You can't have a type which has polymorphism and is a value type. By its very nature, polymorphism requires you to deal with a reference.
> 
> Can you expand on this?
> 
> At least Oberon has value structs ("records") with inheritance and polymorphism; I guess the Turbo Pascal OO model was of that kind, too (unsure) -- at least the version implemented in Free Pascal seems to work fine that way. And probably loads of lesser-known PLs provide such a feature. D structs could as well IIUC: I do not see the relation with instances being implicitly referenced. (Except that they must be passed by ref to "member functions" they are the receiver of, but this is true for any kind of OO, including present D structs.)
> 
> (I guess we have very different notions of "reference", as shown by
> previous threads.)

Okay. This can get pretty complicated, so I'm likely to screw up on some of the details, but this should give you a basic idea of what's going on.

In essentially any C-based language, when you declare an integer on the stack like so:

int a = 2;

you set aside a portion of the stack which is the exact size of an int (typically 32 bits, but that will depend on the language). If you declare a pointer,

int* a;

then you're setting aside a portion of the stack the size of a pointer (32 bits on a 32-bit machine and 64 bits on a 64-bit machine). That variable then holds an address - typically to somewhere on the heap, though it could be to an address on the stack somewhere. In the case of int*, the address pointed to will refer to a 32-bit block of memory which holds an int.

If you have a struct or a class that you put on the stack, say,

class A
{
    int a;
    float b;
}

then you're setting aside exactly as much space as that type requires to hold itself. At minimum, that will be the total size of its member variables (in this case an int and a float, so probably a total of 64 bits), but it will often include extra padding to align the variables along appropriate boundaries for the sake of efficiency, and, depending on the language, it could have extra type information. If the class has a virtual table (which it will if it has virtual functions, which in most any language other than C++ would mean that it definitely has a virtual table), then that would be part of the space required for the class as well. (Virtual functions are polymorphic: when you call a virtual function, it calls the version of the function for the actual type that the object is rather than the type of the pointer or reference that you're using to refer to it; when a non-virtual function is called, the version of the function for the type of the pointer or reference is used. All class functions are virtual in D unless the compiler determines that they don't have to be and optimizes it out (typically because they're final); struct functions and stand-alone functions are never virtual.) The exact memory layout of a type _must_ be known at compile time. The exact amount of space required is then known, so that the stack layout can be done appropriately.
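
A short D sketch of that virtual dispatch (the names are made up): a call through a base-typed reference lands on the object's actual type, while a final method never goes through the virtual table.

import std.stdio;

class Animal
{
    void speak() { writeln("..."); }             // virtual by default in D
    final void breathe() { writeln("in, out"); } // final: never virtual
}

class Dog : Animal
{
    override void speak() { writeln("woof"); }
}

void main()
{
    Animal a = new Dog();
    a.speak();   // prints "woof": dispatched on the object's actual type
    a.breathe(); // resolved at compile time; no virtual table lookup
}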

If you're dealing with a pointer, then the exact memory layout of the memory being pointed to needs to be known when that memory is initialized, but the pointer doesn't necessarily need to know it. This means that you can have a pointer of one type point to a variable of another type. Now, assuming that you're not subverting the type system (e.g. by casting int* to float*), you're dealing with inheritance. For instance, you have

class B : A
{
    bool c;
}

and a variable of type A*. That pointer could point to an object which is exactly of type A, or it could point to any subtype of A. B is derived from A, so the object could be a B. As long as the functions are virtual, you can have polymorphic functions by having the virtual table used to call the version of the function for the type that the object actually is rather than the type that the pointer is.

References are essentially the same as pointers (though they may have some extra information with them, making them a bit bigger than a pointer would be in terms of the amount of space required on the stack). However, in the case of D, pointers are _not_ treated as polymorphic (regardless of whether a function is virtual or not), whereas references _are_ treated as polymorphic (why, I don't know - probably to simplify pointers). In C++ though, pointers are polymorphic.

Now, if you have a variable of type A*, you could do something like this:

B* b = new B();
A* a = b;

A* takes up 32 or 64 bits in memory and holds the memory location on the heap where the B object is. Both pointers have the same value and point to the same object. The only difference is how the compiler treats each type (e.g. you can't call a B function on the a variable). Calling A functions on the a variable will call the B version if it has its own version and the function is virtual. However, what about this:

B b;
A a = b;

The memory layout of b and a must be known at compile time. They're laid out precisely on the stack. b has the size of a B object. a has the size of an A object. a is _exactly_ an A. It cannot be a B. So, what you get is called shearing. The A portions of the variable are assigned (in this case, the int and the float), whereas the B portions aren't. a now ends up exactly as it would have if you had created it directly and set its member variables to the values from b's A portion. This is almost certainly _not_ what you wanted.

Now, because a is exactly an A, and b is exactly a B, when you go to call functions on them, it doesn't matter whether they're virtual or not. The type of the variable _is_ the type of the object. There is no polymorphism. You _need_ that level of indirection to get it.

Now, you could conceivably have a language where all of its objects were actually pointers, but they were treated as value types. So,

B b;
A a = b;

would actually be declaring

B* b;
A* a = b;

underneath the hood, except that the assignment would do a deep copy and allocate the appropriate memory rather than just copying the pointer as would happen in a language like C++ or D. Perhaps that's what Oberon does. I have no idea. I have never heard of the language before, let alone used it. However, that's _not_ how C++, D, C#, or Java works. If you declare

B b;
A a = b;

then you are literally putting a B and an A on the stack, and assignments from a B to an A will cause shearing. D chose to avoid the shearing issue by making structs not have inheritance. This also means that they don't have a virtual table, which makes them more efficient. Classes have inheritance and a virtual table, but because they're on the heap, you don't get shearing, and polymorphism works just fine.

So, what it comes down to is that you can't have polymorphism for a stack object, because you know _exactly_ what its type is, and you can't have inheritance for a stack object without risking shearing when assignments are made (unless you disallow assignments from one type of object to another unless they're the exact same type).

So, you're never going to see inheritance for structs in D. It doesn't fit its memory model at all. What you get instead are templates, which can be used to generate the same code for different types. And that's as close as you're going to get to polymorphism for structs.
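
As a hypothetical sketch of that template route (the struct types are invented), the "polymorphism" below is resolved entirely at compile time, with one instantiation generated per type:

struct Circle { double r; double area() { return 3.14159 * r * r; } }
struct Square { double s; double area() { return s * s; } }

// One copy of this code is generated for each type that provides area().
double totalArea(T)(T[] shapes)
{
    double sum = 0;
    foreach (shape; shapes)
        sum += shape.area();
    return sum;
}

void main()
{
    assert(totalArea([Square(2.0), Square(3.0)]) == 13.0);
    assert(totalArea([Circle(1.0)]) > 3.0);
}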

- Jonathan M Davis
December 20, 2010
Jonathan M Davis:

> So, putting classes on the stack kind of negates the whole point of having both structs and classes in the first place.

Where you put the instance is mostly a matter of implementation. This is why a smart JavaVM is able to perform escape analysis and choose where to allocate the class instance.

Keep in mind that if you allocate a class on the stack or in-place inside another class, you don't turn it into a value, because besides the class instance you reserve space for its reference too (this reference may even be immutable, if you want).


> scoped classes are definitely not in SafeD.

Well-implemented scoped classes are safe enough (compared to the other things). The compiler may perform escape analysis of all the aliases of a scoped object and statically raise an error if a reference escapes. This isn't 100% safe in a language that has all kinds of casts and low-level features, but it's often safe enough, compared to other things. And those casts and low-level features that can fool the escape analysis can be disabled statically (with something like @safe), which makes scoped classes 100% safe, probably safer than heap allocations.
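
A hedged sketch of the scenario, using the scope storage class as it existed in D at the time (the names are invented). The second function shows exactly the kind of escape that such analysis would have to reject:

class Resource
{
    ~this() { /* released deterministically when the scope ends */ }
}

Resource leaked; // module-level variable, used for the escaping example

void fine()
{
    scope Resource r = new Resource(); // placed on the stack, destroyed on return
    // use r only within this function: no escape
}

void notFine()
{
    scope Resource r = new Resource();
    leaked = r; // the reference escapes; the compiler does not stop this,
                // but the escape analysis described above would reject it
}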


>The whole point of "safe" when talking about safe in D is memory safety.

I know, but some people (including me) think that "safe D" is a misleading name because it just means "memory safe D".


>If the compiler can determine that a particular class object can be put on the stack and optimize it that way, fine. But it's pretty rare that it can do that - essentially only in cases where you don't pass it to _anything_ except for pure functions (including calls to member functions).

I don't agree that it's rare. If a function that allocates an object calls a function (or member function) that's present in the same compilation unit (this more or less means the same module), then the compiler is able to continue the escape analysis and determine if the called function escapes the reference. If this doesn't happen, then the class instance is free to be scoped. This situation is common enough.
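
A sketch of that situation (names invented): both functions live in the same module, so a compiler doing this kind of escape analysis could in principle see that the callee never stores its argument and stack-allocate the instance.

class Widget { int id; }

// Defined in the same compilation unit; it reads w but never stores it.
int describe(Widget w) { return w.id * 2; }

void caller()
{
    auto w = new Widget(); // a candidate for scoped/stack allocation
    auto n = describe(w);  // the callee's body is visible: no escape
    assert(n == 0);        // id defaults to 0
}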


>And if the compiler can do that, then there's no need for the programmer to use scope explicitly.<

I don't agree. An annotation like "@scope" is a contract between the programmer and the compiler. It means that if the compiler sees a reference escape, then it stops the compilation.


>And no, a compiler _can't_ do pure optimizations on its own, generally-speaking, because that would require looking not only at the body of the function that's being called but at the function bodies of any functions that it calls. D is not designed in a way that the compiler even necessarily has _access_ to a function's body when compiling, and you can't generally look at a function's body when doing optimizations when calling that function. So, _some_ pure optimizations could be done, but most couldn't. This is not the case with scoped classes, because purity already gives you the information that you need.<

Quite often a function calls another function in the same compilation unit; in this case the analysis is possible. So you limit the optimizations to this common but limited case.

And the LDC compiler, and in the future GDC too, has link-time optimization; this means the compiler packs or sees the program code in a single compilation unit. In this case it's able to perform a more complete analysis (including de-virtualization of some virtual functions).


>Safety by convention means that the language and the compiler do not enforce it in any way.<

This is not fully true. If the syntax of the unsafe thing is ugly and long, the programmer is discouraged from using it. This makes the unsafe thing more visible to the programmer's eyes. Statistically, this may reduce the bug count.


>There's nothing contradictory about Walter's stance. He's for having safety built into the language as much as reasonably possible and against having it thrust upon the programmer to program in a particular way to avoid unsafe stuff.<

I think you have missed part of the context of my comments to Nick Voronin; he was trying to say something here:

>Yet we won't have a library solution for pointers instead of language support (hopefully)? :) I think it all goes against "being practical" as an objective of the language. Safety is important, but you don't achieve safety by means of making unsafe things inconvenient and inefficient. If there is emplace() then there is no reason not to have the scope storage class. At least looking from the user's POV. I don't know how hard it is on the compiler.<
>In the _general_ case there is no safety in D. With all the low-level capabilities one can always defeat the compiler. Removing intermediate-level, safer (yet unsafe) capabilities arguably gains nothing but frustration. I'm all for encouraging good practices, but this is different.<

In D the convention is not to use certain low-level means to do something (and @safe statically forbids them, so it's not just a convention, I agree with you). But I have been programming in D for some years, and I have seen that sometimes (for performance or other reasons) you need to use the unsafe low-level features. In this case D leaves you on your own. You don't have anything intermediate between the safe high-level features and the unsafe C-style features. Nick and I feel the lack of this intermediate level of safety.

Then I explained to Nick about the failure of Cyclone. It shows that implementing that intermediate level is hard, and it may lead to a not-so-good language. This is probably why Walter has chosen to avoid doing it. Despite the failure of Cyclone I am not sure Walter is right, and there may be a way to implement this intermediate level. I think the ATS language shows ways to do it, but it's a very hard-to-use language, "fifteen" times harder than D. So it's still an open problem.

Bye,
bearophile
December 20, 2010
On Mon, 20 Dec 2010 03:11:49 -0800
Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> Now, you could conceivably have a language where all of its objects were actually pointers, but they were treated as value types. So,
> 
> B b;
> A a = b;
> 
> would actually be declaring
> 
> B* b;
> A* a = b;
> 
> underneath the hood, except that the assignment would do a deep copy and allocate the appropriate meemory rather than just copying the pointer like would happen in a language like C++ or D. Perhaps that's what Oberon does. I have no idea. I have never heard of the language before, let alone used it.

I don't know how Oberon works. But I'm sure that its records are plain values, _not_ "pointed" under the hood. And their methods are all virtual (they have a virtual method table). I have no more details, sorry.

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

December 20, 2010
On Monday, December 20, 2010 03:19:48 bearophile wrote:
> Jonathan M Davis:
> > So, putting classes on the stack kind of negates the whole point of having both structs and classes in the first place.
> 
> Where you put the instance is mostly a matter of implementation. This is why a smart JavaVM is able to perform escape analysis and choose where to allocate the class instance.
> 
> Keep in mind that if you allocate a class on the stack or in-place inside another class, you don't turn it into a value, because besides the class instance you reserve space for its reference too (this reference may even be immutable, if you want).
> 
> > scoped classes are definitely not in SafeD.
> 
> Well-implemented scoped classes are safe enough (compared to the other things). The compiler may perform escape analysis of all the aliases of a scoped object and statically raise an error if a reference escapes. This isn't 100% safe in a language that has all kinds of casts and low-level features, but it's often safe enough, compared to other things. And those casts and low-level features that can fool the escape analysis can be disabled statically (with something like @safe), which makes scoped classes 100% safe, probably safer than heap allocations.
> 
> >The whole point of "safe" when talking about safe in D is memory safety.
> 
> I know, but some people (including me) think that "safe D" is a misleading name because it just means "memory safe D".

Talking about SafeD meaning memory safety makes the meaning of safety clear. If you try to make the term safety encompass more than that, it takes very little for "safety" to become subjective. Regardless of whether it would be nice if SafeD gave types of safety other than memory safety, when the D documentation or any of the main D devs talk about safety, it is memory safety that is being referred to. Trying to expand the meaning beyond that will just cause confusion, regardless of whether the non-memory safety being discussed is desirable or not.

> >If the compiler can determine that a particular class object can be put on the stack and optimize it that way, fine. But it's pretty rare that it can do that - essentially only in cases where you don't pass it to _anything_ except for pure functions (including calls to member functions).
> 
> I don't agree that it's rare. If a function that allocates an object calls a function (or member function) that's present in the same compilation unit (this more or less means the same module), then the compiler is able to continue the escape analysis and determine if the called function escapes the reference. If this doesn't happen, then the class instance is free to be scoped. This situation is common enough.
> 
> >And if the compiler can do that, then there's no need for the programmer to use scope explicitly.<
> 
> I don't agree. An annotation like "@scope" is a contract between the programmer and the compiler. It means that if the compiler sees a reference escape, then it stops the compilation.
> 
> >And no, a compiler _can't_ do pure optimizations on its own, generally-speaking, because that would require looking not only at the body of the function that's being called but at the function bodies of any functions that it calls. D is not designed in a way that the compiler even necessarily has _access_ to a function's body when compiling, and you can't generally look at a function's body when doing optimizations when calling that function. So, _some_ pure optimizations could be done, but most couldn't. This is not the case with scoped classes, because purity already gives you the information that you need.<
> 
> Quite often a function calls another function in the same compilation unit; in this case the analysis is possible. So you limit the optimizations to this common but limited case.
> 
> And the LDC compiler, and in the future GDC too, has link-time optimization; this means the compiler packs or sees the program code in a single compilation unit. In this case it's able to perform a more complete analysis (including de-virtualization of some virtual functions).

It's trivial to get a reference or pointer to escape and make it undetectable to the compiler. Some escape analysis can be and is done, but all it takes is passing a pointer or a reference to another function, and the compiler can't determine it anymore unless it has access to the called function's body, and perhaps the bodies of the functions that that function calls. And if the compiler can't be 100% correct with escape analysis, then any feature that requires it is not safe.
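
A sketch of how little it takes (names invented); in the real scenario the body of stash would live in another module the compiler cannot see into, so only its signature would be known at the call site:

int* gStash; // a hypothetical global sitting in "another module"

void stash(int* p) { gStash = p; } // imagine only the declaration is visible

void caller()
{
    int local = 42;
    stash(&local); // &local escapes; once caller() returns, gStash dangles
}

void main()
{
    caller();
    // Dereferencing gStash here would read a dead stack slot.
}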

And as great as fancier optimizations such as link-time optimizations may be, the existence of dynamic libraries eliminates any and all guarantees that such optimizations would be able to make if they had all of the source to look at. So, you can't rely on them. They help, and they're great, but no feature can require them. They're optimizations only.

- Jonathan M Davis
December 20, 2010
On Monday, December 20, 2010 06:24:56 spir wrote:
> On Mon, 20 Dec 2010 03:11:49 -0800
> 
> Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> > Now, you could conceivably have a language where all of its objects were actually pointers, but they were treated as value types. So,
> > 
> > B b;
> > A a = b;
> > 
> > would actually be declaring
> > 
> > B* b;
> > A* a = b;
> > 
> > underneath the hood, except that the assignment would do a deep copy and allocate the appropriate memory rather than just copying the pointer as would happen in a language like C++ or D. Perhaps that's what Oberon does. I have no idea. I have never heard of the language before, let alone used it.
> 
> I don't know how Oberon works. But I'm sure that its records are plain values, _not_ "pointed" under the hood. And their methods are all virtual (they have a virtual method table). I have no more details, sorry.

Well, given C's memory model - which D uses - you can't do that. Oberon could use a different memory model and have some other way of doing it, but it won't work for D, so you'll never see structs with polymorphic behavior in D.

- Jonathan M Davis
December 20, 2010
On Sun, 19 Dec 2010 17:38:17 -0500, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> On Sunday 19 December 2010 14:26:19 bearophile wrote:
>> Jonathan M Davis:
>> > There will be a library solution to do it, but again, it's unsafe.
>>
>> It can be safer if the compiler gives some help. For me it's one of the
>> important unfinished parts of D.
>
> Whereas, I would argue that it's completely unnecessary. structs and classes
> serve different purposes. There is no need for scoped classes. They may
> periodically be useful, but on the whole, they're completely unnecessary.
>
> The compiler can help, but it can't fix the problem any more than it can
> guarantee that a pointer to a local variable doesn't escape once you've passed
> it to another function. In _some_ circumstances, it can catch escaping pointers
> and references, but in the general case, it can't.
>
> If we have library solutions for people who want to play with fire, that's fine.
> But scoped classes is just not one of those things that the language really
> needs. They complicate things unnecessarily for minimal benefit.

I don't mind having a solution as long as there is a solution.

The main need I see for scoped classes is for when you *know* as the programmer that the lifetime of a class or struct will not exceed the lifetime of a function, but you don't want to incur the penalty of allocating on the heap.  Mostly this is because the functions you want to call take classes or interfaces.

It's difficult to find an example with Phobos since there are not many classes.  But with Tango, scoped classes are used everywhere.
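
For what it's worth, the library route that Phobos ended up offering looks roughly like the sketch below, using std.typecons.scoped (the Connection class is invented; this only shows the shape of the solution, not Tango's approach):

import std.typecons : scoped;

class Connection
{
    void send(string msg) { /* ... */ }
    ~this() { /* closed deterministically when the function returns */ }
}

void handleRequest()
{
    // The instance lives in a fixed-size buffer on the stack: no heap
    // allocation, and the destructor runs at the end of the scope.
    auto conn = scoped!Connection();
    conn.send("hello");
}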

-Steve
December 21, 2010
On Mon, 20 Dec 2010 05:43:08 -0500
bearophile <bearophileHUGS@lycos.com> wrote:

> Nick Voronin:
> 
> > Here is where we diverge. Choosing struct vs class on criteria of their placement makes no sense to me.
> 
> In D you use a class when you want inheritance or when you (often) need reference semantics, and you use a struct when you need a small value passed around by value, when you want a simple form of RAII, when you want to implement something manually (like using PIMPL), or when you want maximum performance (managing structs by pointer; you may even put a tag inside the struct or the pointer and manually implement some kind of inheritance). With structs you have literal syntax, postblits, in-place allocation, and you are free to use align() too.

Well said. Plenty of differences there more important than stack/heap allocation.

-- 
Nick Voronin <elfy.nv@gmail.com>