June 22, 2007
Walter Bright wrote:
> Sean Kelly wrote:
>> I personally find the use of three keywords to represent three overlapping facets of const behavior to be very confusing, and am concerned about trying to explain it to novice programmers.  With three keywords, there are six possible combinations:
>>
>> final
>> const
>> final const
>> final invariant
>> const invariant
>> final const invariant
> 
> Probably the thing to do is simply outlaw using more than one.
> 

How does that fit with "in" meaning "final const scope"? Will it change?

-- 
Carlos Santander Bernal
June 22, 2007
Walter Bright wrote:
> Sean Kelly wrote:
>> Walter Bright wrote:
>>> In C++, sometimes const means invariant, and sometimes it means readonly view. I've found even C++ experts who don't know how it works.
>> Odd.  The C++ system always seemed extremely simple to me.
> 
> It isn't. I run into people all the time who are amazed to discover that const references can change. Few understand when const is invariant and when it isn't. I've never even seen anyone mention the problem where the non-transitive const destroys any hope of having FP like capabilities in C++.

Matter of opinion, I suppose.  The C++ design was immediately clear to me, though it obviously wasn't for others.  I grant that the aliasing problem can be confusing, but I feel that it is a peripheral issue.

>> I personally find the use of three keywords to represent three overlapping facets of const behavior to be very confusing, and am concerned about trying to explain it to novice programmers.  With three keywords, there are six possible combinations:
>>
>> final
>> const
>> final const
>> final invariant
>> const invariant
>> final const invariant
> 
> Probably the thing to do is simply outlaw using more than one.

If nothing else, I imagine "final const" will be a necessary combination.

>> That some of these may be redundant just serves to further confuse the issue in my opinion.  So I wondered whether one of the keywords could be done away with.  Previously, you said 'invariant' may only apply to data whose value can be determined at compile-time, thus I imagine it can only apply to concrete/data types (ie. not classes).  Assuming this is true, I wonder whether there is truly a point in having 'invariant' at all.  Assuming it were done away with, the system becomes much simpler to me:
>>
>> final
>> const
>> final const
>>
>> And that's it.  'final' means a reference cannot be rebound, 'const' means the data cannot be altered (through the reference), and 'final const' means that both the reference is frozen and the data cannot be changed.  And that's it.
> 
> It's missing the transitive nature of invariant.

How so?  Given the above, I would consider 'const' to apply to a declaration left-to-right and 'final' to apply to a declaration right-to-left.  If there are parentheses, the rightmost (ie. closing) paren would effectively be the barrier between const and final.  Thus, from your example:

const (int**)* x;

Represents a mutable pointer to a const int**, and:

final const (int**)* x;

Represents an immutable (ie. final) pointer to a const int**.  By default, both qualifiers would be fully transitive, so:

final int** x;

Would be an immutable pointer to an immutable pointer to a mutable int.

Or am I missing something?


Sean
June 22, 2007
Walter Bright wrote:
> http://www.digitalmars.com/d/const.html

Very nice read.

But I have a few nitpicks too.  :-)

"""
Which brings up another aspect of const in D - it's transitive. Const in C++ is not transitive, which means one can have a pointer to const pointer to mutable int. To declare a variable that is const at each level, one must write:

int const *const *const *p;   // C++

The const is left associative, so the declaration is a pointer to const pointer to const pointer to const int. Const being transitive in D means that every reference reachable through the const is also const. An entire logical region of an application can be protected by placing only one qualifier. To reflect that, the syntax is different, using constructor-like notation:

const(int **)* p;	// D

Here the const applies to the part of the type that is in parentheses.
"""

The way these two examples come one right after the other leads one to expect that the D example is going to show the D equivalent of the C++ code.  But I don't think that's what it's doing.  I would suggest first showing the D version of the C++ snippet (which is just const int***p -- right?).  That gives people a chance to say "ooh ah D is so much simpler".  Then follow up by saying "what if we instead want part of that to be modifiable?" and give the const(int**)* p example (and maybe a C++ version of that too).  Also, as it stands it doesn't sufficiently explain what "applies to the part in parentheses" means.  Being explicit about what's modifiable there would be helpful.  Maybe remind people to read decls from right to left and spell it out: "this is a mutable pointer to const pointer to const pointer".
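To make the right-to-left reading concrete for the C++ declaration quoted from the doc, here is a small sketch (C++ only; it says nothing about how the D syntax resolves). The alias names `AllConst`, `Level1`, `Level2`, `Level3` and the function `reseat_outer_pointer` are made up for this illustration. It checks which levels the compiler treats as const.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical alias for the type in `int const *const *const *p;`.
using AllConst = int const* const* const*;

// Peel off one pointer level at a time, reading the declaration right to left.
using Level1 = std::remove_pointer<AllConst>::type;  // int const* const* const
using Level2 = std::remove_pointer<Level1>::type;    // int const* const
using Level3 = std::remove_pointer<Level2>::type;    // int const

// The outermost pointer is NOT const: p itself can be reseated.
static_assert(!std::is_const<AllConst>::value, "outer pointer is mutable");
// Every level reachable through it is const.
static_assert(std::is_const<Level1>::value, "first pointee is a const pointer");
static_assert(std::is_const<Level2>::value, "second pointee is a const pointer");
static_assert(std::is_const<Level3>::value, "the int at the end is const");

// Reseats the outer pointer at run time to show that only p is mutable.
bool reseat_outer_pointer() {
    int const value = 1;
    int const* const inner = &value;
    int const* const* const middle_a = &inner;
    int const* const* const middle_b = &inner;

    AllConst p = &middle_a;
    p = &middle_b;        // allowed: p itself is not const
    // *p = nullptr;      // would not compile: *p is a const pointer
    assert(***p == 1);
    return p == &middle_b;
}
```

Read this way, the C++ declaration is "mutable pointer to const pointer to const pointer to const int".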

"""
But there is a need for a constant declaration referencing a mutable type.
"""
This transition could be clearer.  This line comes shortly after introducing 'invariant', so it sounds like you're still talking about a nuance of invariant here rather than moving on to the last of the big three const-related keywords.  Maybe say something like "Finally, there is also a need ..."

Also in the discussion of final, the word that always comes to my mind is 'binding'.  It's the binding of the variable name to its value that's final.  You say, "A final declaration, once it is set by its initializer or constructor, cannot ever change its value for its lifetime," which to me is kind of vague.  I'm not sure what it means that "a declaration cannot change its value".  Does a declaration even have a value?  "I'm setting this declaration to 5" -- seems like unusual terminology to me.


This is more of a question than a suggestion, but here:
"""
Since a final declaration cannot change its value, it is by nature invariant, and the address of it will become a pointer to an invariant:
"""

That means this is ok:
  int x,y;
  final int* p = &x;
  p = &y;		// error, p is final
  *p = 3;		// ok, *p is mutable

But if I take &p then I lose the ability to indirectly set x?
  **(&p) = 5;           // error *(&p) not same type as p!
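
For comparison only, the closest C++ spelling of that snippet is a const pointer to a mutable int. The sketch below shows what C++ does with it; it is not a statement about how D's final will behave, which is exactly the open question. The function name `write_through_const_pointer` is made up for the illustration.

```cpp
#include <cassert>

// C++ analogue of the snippet above: the pointer is fixed, the int is not.
int write_through_const_pointer() {
    int x = 0;
    int* const p = &x;     // p cannot be reseated; *p stays mutable
    // p = &y;             // would not compile: p is const
    *p = 3;                // fine: writes x through p
    assert(x == 3);

    int* const* pp = &p;   // &p is a pointer to a const pointer to mutable int
    **pp = 5;              // still fine in C++: only the middle level is const
    return x;
}
```

In C++, taking the address of p adds const to one level only, so the indirect write to x still compiles.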


Finally:
"""
Like C++, D allows the casting away of constness and invariantness. Unlike C++, if the programmer then subverts the const or invariant guarantee and changes the underlying data, then undefined behavior results.
"""

This seems like kind of a weak ending.
Lack of being able to trust const due to casts is listed as one of the big problems with C++'s const.  Yet in the end it reads like D does *less* than C++ here.  Not only can you cast at will, you can't even be sure it will work.

Maybe it's just the weak statement "undefined behavior results" that bothers me.  It makes it sound like it just happens -- because it was too difficult to implement or something --  rather than it being a proactive component of D's design.  Something like this might read better:

"""
Unlike C++, D specifies that subversion of the const or invariant guarantee will result in undefined behavior.  Casting away const is safe only if the data referenced is truly not modified.  Of course even in C++, casting away const in order to modify data is usually not /safe/, merely well defined.  D allows the compiler to make optimizations which assume const data will not change, whereas C++ requires compilers to assume that it always changes, despite being const.
"""
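
For reference, here is a minimal sketch of the C++ side of that comparison. It covers only the case C++ defines: casting const away from a view of an object that is not itself const. The function names `set_to_seven` and `cast_away_const_demo` are made up for the illustration.

```cpp
#include <cassert>

// Writes through a pointer-to-const by casting the const away.
// This is well defined in C++ only when the pointed-to object is not const.
void set_to_seven(const int* view) {
    int* writable = const_cast<int*>(view);
    *writable = 7;
}

int cast_away_const_demo() {
    int value = 1;             // a mutable object
    set_to_seven(&value);      // OK: the underlying object is mutable
    assert(value == 7);

    // const int fixed = 1;
    // set_to_seven(&fixed);   // undefined behavior in C++: fixed is a const object
    return value;
}
```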


--bb
June 22, 2007
Sean Kelly wrote:
> Walter Bright wrote:
>> Sean Kelly wrote:
>>> Walter Bright wrote:
>>>> In C++, sometimes const means invariant, and sometimes it means readonly view. I've found even C++ experts who don't know how it works.
>>> Odd.  The C++ system always seemed extremely simple to me.
>>
>> It isn't. I run into people all the time who are amazed to discover that const references can change. 

I was surprised to find that p->q[0]=5 is always ok for a
const C*p.  Learned that from reading the const doc just now.
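
A small C++ sketch of that case, assuming a struct `C` with an `int*` member `q` as the expression implies (the member `n` and the function name `write_through_const_c` are made up for the illustration):

```cpp
#include <cassert>

// Hypothetical struct matching the `const C*p` expression above.
struct C {
    int  n;
    int* q;
};

int write_through_const_c() {
    int storage[2] = {1, 2};
    C obj{0, storage};
    const C* p = &obj;

    // p->n = 1;        // would not compile: n is const when seen through p
    // p->q = nullptr;  // would not compile: q itself is const through p
    p->q[0] = 5;        // compiles: q becomes `int* const`, the ints stay mutable
    assert(obj.q[0] == 5);
    return storage[0];
}
```

The const stops at the first level of indirection, which is the non-transitivity the const doc describes.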


> Matter of opinion, I suppose.  The C++ design was immediately clear to me, though it obviously wasn't for others.  I grant that the aliasing problem can be confusing, but I feel that it is a peripheral issue.

I vaguely remember C++'s const taking some time for me to internalize. But it seems natural now.  So I'm not too worried about it.  I also have a feeling that invariant won't show up very often except at the module-level scope.  So I'm not so worried about how to grok the odd combinations of const/invariant and what they mean.  I have trouble grokking "int * const * const **foo" in C++ but that's never caused me any problem in an actual programming task.  In the end there are just a handful of common usages.  It might be nice to have these collected and documented somewhere.

--bb
June 22, 2007
I think the most important thing regarding const/final/invariant is simply to remove redundancy. If every combination/keyword/construct means only one thing when applied to a particular declaration, it drastically reduces complexity for the user. For example, having "final int x = 5", "const int x = 5" and "invariant int x = 5" all mean the same thing is difficult, because users don't expect that. Similarly, having:

struct Foo
{
int x;
int* y;
}

const Foo bar;

in which bar.x can freely change but *bar.y can't, makes structs seem far different from classes - you can no longer replace one with the other (besides having to put in/remove a "new") if it's const.

I think having three keywords is probably a good thing, as long as whenever they're applied they mean different things. Overlapping meaning and unexpected behavior is one of the big problems with C++ - what's "static" mean? About 5 things?

On that note, I think "invariant" should be changed to "immutable". At the cost of a new keyword and a new token, we get both backwards compatibility with class invariants and disambiguation.

As long as I'm throwing my cent out there, dare I mention in-parameter-passing-by-default again?

All that said, I think this is a far cry better than the C++ version (which I never even bothered to understand), and I'm starting to see the advantages of using const (why was it left out in C#? The Java devs mentioned at some point that if they had the chance to redo it without breaking backwards compatibility, they would have). So, thanks, Walter, for the great update! D is definitely the language to beat for anything bigger than a Perl script.

All the best,
Fraser
June 22, 2007
Sean Kelly wrote:
> Matter of opinion, I suppose.  The C++ design was immediately clear to me, though it obviously wasn't for others.  I grant that the aliasing problem can be confusing, but I feel that it is a peripheral issue.

I don't think it is a peripheral issue. It completely screws up optimization, is useless for threading support, and has spawned endless angst about why C++ code is slower than Fortran code.


> Or am I missing something?

Yes, you've missed the distinction between a readonly view and a truly immutable value. That's not surprising, since my experience is that very few who are used to C++ const see the difference, and it took me a while to figure it out.
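
A minimal C++ sketch of the two meanings, for readers who have not run into the difference. The names `observe_change_through_view` and `fixed_value` are made up for the illustration.

```cpp
#include <cassert>

// A truly immutable value: nothing in a valid program can change it.
constexpr int fixed_value = 1;

// A readonly view: the reference cannot be used to write,
// but the object behind it can still change through another name.
int observe_change_through_view() {
    int counter = 1;
    const int& view = counter;  // readonly view of a mutable object
    assert(view == fixed_value);

    counter = 2;                // legal: counter itself is not const
    return view;                // the "const" view now reads 2
}
```

Both declarations use the same const keyword in C++, yet only the first is a promise that the value never changes.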
June 22, 2007
Walter Bright wrote:
> Sean Kelly wrote:
>> Matter of opinion, I suppose.  The C++ design was immediately clear to me, though it obviously wasn't for others.  I grant that the aliasing problem can be confusing, but I feel that it is a peripheral issue.
> 
> I don't think it is a peripheral issue. It completely screws up optimization, is useless for threading support, and has spawned endless angst about why C++ code is slower than Fortran code.

As a programmer, I consider the optimization problem to be a non-issue.  Optimization is just magic that happens to make my program run faster.  As for multithreading... I'll address that below.

>> Or am I missing something?
> 
> Yes, you've missed the distinction between a readonly view and a truly immutable value. That's not surprising, since my experience is that very few who are used to C++ const see the difference, and it took me a while to figure it out.

I didn't miss it, but I'm afraid I don't see how D fares much better than C++ in this respect.  In a typical program, I see two basic situations for manipulating external data: first, it may be a reference parameter passed into a function call, or second, it may be a global variable of some sort.

In the first case I assert that it is impossible to make any assumptions about the inherent immutability of the data.  D may offer 'invariant' for this purpose, but practicality dictates that I would never apply 'invariant' to function parameters.  'invariant' by itself is too restrictive for general use, so overloads must be provided, and maintaining 2-3 instances of the same routine with different parameter qualifiers invites more trouble than it prevents.  C++ has no equivalent, so there is no direct comparison there--mutability must always be assumed.

In the second case, it is generally inherently obvious whether the data is mutable.  Either it is a simple const-qualified global declaration, a reference is returned from a global routine of some sort that has documented guarantees about its mutability, etc.  These are the cases most likely to eschew mutexes in a multi-threaded program.

I will grant that 'invariant' may well be useful for writing self-documenting code at the highest level of an application, but I continue to wonder whether this offers a benefit sufficient to outweigh the complexity it adds to the design.  This is the real crux of my argument, and what I am hoping will come clear through discussion.  My initial reaction to the const design for D was "what the heck is this? well, I guess I'll figure it out when it's explained better," and though I feel I am gaining a better understanding now, my initial reaction greatly tempers my enthusiasm for the design.  In my opinion, if something is not inherently obvious then it is probably over-complicated, and I remain hopeful that some simplification can be found that does not sacrifice much of the expressiveness of the current design.


Sean
June 22, 2007
Sean Kelly wrote:
> Walter Bright wrote:
>> Sean Kelly wrote:
>>> Matter of opinion, I suppose.  The C++ design was immediately clear to me, though it obviously wasn't for others.  I grant that the aliasing problem can be confusing, but I feel that it is a peripheral issue.
>>
>> I don't think it is a peripheral issue. It completely screws up optimization, is useless for threading support, and has spawned endless angst about why C++ code is slower than Fortran code.
> 
> As a programmer, I consider the optimization problem to be a non-issue.  Optimization is just magic that happens to make my program run faster.

Optimization often makes the difference between a successful project and a failure. C++ has failed to supplant FORTRAN, because although in every respect but one C++ is better, that one - optimization of arrays - matters a whole lot. It drives people using C++ to use inline assembler. They spend a lot of time on the issue. Various proposals to fix it, like 'noalias' and 'restrict', consume vast amounts of programmer time. And time is money.

Optimization issues often drive the choice of language. That really makes it an issue!
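
A small C++ sketch of the array aliasing problem, under the assumption that a pointer-to-const parameter may point into the output array. The function names `scale`, `scale_without_alias` and `scale_with_alias` are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Multiplies every element of out by *factor. Because factor may point
// into out, the compiler cannot simply assume *factor stays the same
// across iterations, even though it is declared const.
void scale(int* out, const int* factor, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] *= *factor;
    }
}

int scale_without_alias() {
    int data[3] = {2, 3, 4};
    const int two = 2;
    scale(data, &two, 3);      // factor is a separate object
    assert(data[0] == 4);
    return data[2];            // 4 * 2
}

int scale_with_alias() {
    int data[3] = {2, 3, 4};
    scale(data, &data[0], 3);  // factor aliases data[0], which the loop overwrites
    assert(data[0] == 4);
    return data[2];            // 4 * 4, because *factor changed after the first step
}
```

The two calls give 8 and 16 for the last element, so hoisting the load of *factor out of the loop would change the result in the aliased case.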


>> Yes, you've missed the distinction between a readonly view and a truly immutable value. That's not surprising, since my experience is that very few who are used to C++ const see the difference, and it took me a while to figure it out.
> 
> I didn't miss it, but I'm afraid I don't see how D fares much better than C++ in this respect.  In a typical program, I see two basic situations for manipulating external data: first, it may be a reference parameter passed into a function call, or second, it may be a global variable of some sort.
> 
> In the first case I assert that it is impossible to make any assumptions about the inherent immutability of the data.

Right, hence the need for invariant.

> D may offer 'invariant' for this purpose, but practicality dictates that I would never apply 'invariant' to function parameters.

I'm less sure about that. I think we're all so used to C++ and its mushy concept of const that we don't know yet what will emerge from the use of invariant. I do know, however, that those who want to do advanced array optimizations are going to want to be using invariant function parameters.

> 'invariant' by itself is too restrictive for general use, so overloads must be provided, and maintaining 2-3 instances of the same routine with different parameter qualifiers invites more trouble than it prevents.

For a function that wants to accept invariant and mutable parameters, declare them to be 'const'.
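
A loose C++ analogy, not the D mechanism itself: one function taking a reference-to-const accepts both a const object and a mutable one, so no overloads are needed. The function names `sum` and `sum_both` are made up for the illustration, and a C++ const object is only a rough stand-in for D's invariant.

```cpp
#include <cassert>
#include <vector>

// One signature serves both callers: it only promises not to write.
int sum(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}

int sum_both() {
    std::vector<int> mutable_data{1, 2, 3};     // caller may still modify this
    const std::vector<int> fixed_data{10, 20};  // never modified after construction
    assert(sum(mutable_data) == 6);
    return sum(mutable_data) + sum(fixed_data);
}
```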

> C++ has no equivalent, so there is no direct comparison there--mutability must always be assumed.
> 
> In the second case, it is generally inherently obvious whether the data is mutable.  Either it is a simple const-qualified global declaration, a reference is returned from a global routine of some sort that has documented guarantees about its mutability, etc.  These are the cases most likely to eschew mutexes in a multi-threaded program.

The problems with documentation are legion. It's inevitably wrong, out of date, incomplete, or missing. Furthermore, the compiler cannot make any use of documented characteristics, nor can it check them.

> I will grant that 'invariant' may well be useful for writing self-documenting code at the highest level of an application, but I continue to wonder whether this offers a benefit sufficient to outweigh the complexity it adds to the design.  This is the real crux of my argument, and what I am hoping will come clear through discussion.  My initial reaction to the const design for D was "what the heck is this? well, I guess I'll figure it out when it's explained better," and though I feel I am gaining a better understanding now, my initial reaction greatly tempers my enthusiasm for the design.  In my opinion, if something is not inherently obvious then it is probably over-complicated, and I remain hopeful that some simplification can be found that does not sacrifice much of the expressiveness of the current design.

I share your concern, there, but we tried about everything for months, and nothing else worked.
June 23, 2007
Walter Bright wrote:
> Sean Kelly wrote:
>> Walter Bright wrote:
>>> Sean Kelly wrote:
>>>> Matter of opinion, I suppose.  The C++ design was immediately clear to me, though it obviously wasn't for others.  I grant that the aliasing problem can be confusing, but I feel that it is a peripheral issue.
>>>
>>> I don't think it is a peripheral issue. It completely screws up optimization, is useless for threading support, and has spawned endless angst about why C++ code is slower than Fortran code.
>>
>> As a programmer, I consider the optimization problem to be a non-issue.  Optimization is just magic that happens to make my program run faster.
> 
> Optimization often makes the difference between a successful project and a failure. C++ has failed to supplant FORTRAN, because although in every respect but one C++ is better, that one - optimization of arrays - matters a whole lot. It drives people using C++ to use inline assembler. They spend a lot of time on the issue. Various proposals to fix it, like 'noalias' and 'restrict', consume vast amounts of programmer time. And time is money.
> 
> Optimization issues often drive the choice of language. That really makes it an issue!

You're right of course.  However, my point was that to a programmer, optimization should be invisible.  I can appreciate that 'invariant' may be of tremendous use to the compiler, but I balk at the notion of adding language features that seem largely intended as compiler "hints." Rather, it would be preferable if the language were structured in such a way as to make such hints unnecessary.  To that end, and speaking as someone who isn't primarily involved in numerics programming, my impression of FORTRAN is that the language is syntactically suited to numerics programming, while C++ is not.  Even if C++ performed on par with FORTRAN for similar work (and Bjarne suggested last year that it could), I would likely still choose FORTRAN over C++ because the syntax seems so much more appealing for that kind of work.

>>> Yes, you've missed the distinction between a readonly view and a truly immutable value. That's not surprising, since my experience is that very few who are used to C++ const see the difference, and it took me a while to figure it out.
>>
>> I didn't miss it, but I'm afraid I don't see how D fares much better than C++ in this respect.  In a typical program, I see two basic situations for manipulating external data: first, it may be a reference parameter passed into a function call, or second, it may be a global variable of some sort.
>>
>> In the first case I assert that it is impossible to make any assumptions about the inherent immutability of the data.
> 
> Right, hence the need for invariant.
> 
>> D may offer 'invariant' for this purpose, but practicality dictates that I would never apply 'invariant' to function parameters.
> 
> I'm less sure about that. I think we're all so used to C++ and its mushy concept of const that we don't know yet what will emerge from the use of invariant. I do know, however, that those who want to do advanced array optimizations are going to want to be using invariant function parameters.

You may be right, and I'm certainly willing to give it a try.  This is simply my initial reaction to the new design, and I wanted to voice it before becoming placated by experience.  My gut feeling is that a better design is possible, and I'm not yet ready to close the door on alternatives.

>> 'invariant' by itself is too restrictive for general use, so overloads must be provided, and maintaining 2-3 instances of the same routine with different parameter qualifiers invites more trouble than it prevents.
> 
> For a function that wants to accept invariant and mutable parameters, declare them to be 'const'.

Well sure.  But that merely supports my point that 'invariant' may be too specialized to justify the conceptual complexity it adds.

>> C++ has no equivalent, so there is no direct comparison there--mutability must always be assumed.
>>
>> In the second case, it is generally inherently obvious whether the data is mutable.  Either it is a simple const-qualified global declaration, a reference is returned from a global routine of some sort that has documented guarantees about its mutability, etc.  These are the cases most likely to eschew mutexes in a multi-threaded program.
> 
> The problems with documentation are legion. It's inevitably wrong, out of date, incomplete, or missing. Furthermore, the compiler cannot make any use of documented characteristics, nor can it check them.

The compiler can inspect the code however, and a global const is as good as an invariant for optimization (as far as I know).  As for the rest, I think the majority of remaining cases aren't ones where 'invariant' would apply anyway: dynamic buffers whose contents are guaranteed not to change either in word or by design, etc.

>> I will grant that 'invariant' may well be useful for writing self-documenting code at the highest level of an application, but I continue to wonder whether this offers a benefit sufficient to outweigh the complexity it adds to the design.  This is the real crux of my argument, and what I am hoping will come clear through discussion.  My initial reaction to the const design for D was "what the heck is this? well, I guess I'll figure it out when it's explained better," and though I feel I am gaining a better understanding now, my initial reaction greatly tempers my enthusiasm for the design.  In my opinion, if something is not inherently obvious then it is probably over-complicated, and I remain hopeful that some simplification can be found that does not sacrifice much of the expressiveness of the current design.
> 
> I share your concern, there, but we tried about everything for months, and nothing else worked.

Perhaps I'm just a few months' worth of discussion behind then.  I'll admit that what I'm mostly doing here is poking the haystack to see if anyone tumbles out.


Sean
June 23, 2007

Sean Kelly wrote:
> Daniel Keep wrote:
>> final int x;        typeof(x) == int;
>> const int y;        typeof(y) == const int;
>> final const int z;    typeof(z) == final const int;
> 
> Hm.  Just to clarify, we both agree that the value of a final integer (ie. case 1 above) is effectively constant, correct?

Indeed.

>> "Wait; why is final part of the last type, but not the first?"  And what does this mean if you want a class member with final-style binding, but (final const) type semantics?
>>
>> final (final const(int*)) foo;
>>
>> As opposed to
>>
>> final invariant(int*) foo;
> 
> Perhaps I'm missing something, but I would rewrite this as:
> 
> final const int* foo;
> 
> Thus foo cannot be reassigned once set and the data foo refers to may not be changed through foo.  This is a slightly weaker guarantee than:
> 
> final invariant int* foo;
> 
> Which says that the data foo refers to is immutable, but I am skeptical that this guarantee actually matters much to users.
> 
> Or am I completely misunderstanding?  And why the parentheses in the second declaration?

(When I wrote the below, I missed precisely what you were saying: the problem with writing it as "final const int*" is that you've got both final and const as a storage class; I assumed you were using them as a type constructor.)

Ok, let's try this instead: in the current system, we have

class Foo
{
    final invariant(FunkyType) funky;

    this(bool superFunky)
    {
        if( superFunky )
            funky = cast(invariant) new FunkyType;
        else
            funky = some_global_invariant_funky;
    }
}

In this case, we have a final storage (so we can assign the value during
the ctor), and an invariant(FunkyType) value type.
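
For readers who think in C++, a rough analogue of that class is a const pointer member to const data, bound once in the constructor. This is only a comparison sketch: C++ has no storage class versus type constructor split, and const data in C++ is weaker than invariant. `FunkyType`'s `level` field and the globals `plain_funky` and `super_funky` are made up for the illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for FunkyType.
struct FunkyType {
    int level;
};

// Stand-ins for immutable globals such as some_global_invariant_funky.
const FunkyType plain_funky{1};
const FunkyType super_funky{2};

class Foo {
public:
    // The binding is chosen at run time, but only once, in the constructor.
    explicit Foo(bool superFunky)
        : funky(superFunky ? &super_funky : &plain_funky) {
        assert(funky != nullptr);
    }

    // Cannot be rebound after construction, and cannot be written through.
    const FunkyType* const funky;
};
```

Here the "assign once in the constructor" part lives in the trailing const on the pointer, and the "cannot change the data" part lives in the const on FunkyType.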

Under your proposal, invariant is replaced with (final const); if we
want the above, it'd become:

    final final const(FunkyType) funky;

But how does the compiler tell whether that second final is a storage class or part of the type constructor?  Remember, we could be dealing with a template that's just shoving "final" out the front of things, so we can't assume two finals ==> one is a type constructor.  So we'd probably have to do this:

    final (final const(FunkyType)) funky;

We can't use *just* "final const FunkyType funky" because then we'd have an "invariant" storage class: which means the initialiser would have to be a compile-time constant.  Incidentally, the above is also probably equivalent to:

    final(final const(FunkyType)) funky;

Which really doesn't make any sense anyway...

>> I think the thing here is that you're shifting the complexity from invariant into final; instead of invariant meaning two different things with two very different appearances, you've got final meaning two slightly different things with almost identical looks.
> 
> I only see final meaning one thing: that the associated value may not be reassigned.  For concrete types like integers this is effectively the same as const, but as Walter said, the integer would be addressable when final but not when const.  Perhaps this is the source of confusion?
> 
> Sean

Yes, but it *also* means "can assign to in a ctor".

Like Lars said, I do think that if there's a way to simplify or consolidate this, then we should take it.  That said, I can see a use for each of the various cases the new system allows for, and I don't want to see any of them cut off in the name of "well *I'm* not going to use it, and I don't like that keyword, so it has to go!"[1]. :)

	-- Daniel

[1] I'm not saying that's what you're doing, but I've heard that sort of argument from a lot of people, so I've become somewhat defensive of the new stuff: don't take away my new shinies!