April 03, 2008
Walter Bright wrote:
> Janice Caron wrote:
>> However, I suspect that the following will be OK.
>>
>>     enum g = 42;
>>
>>     pure int f()
>>     {
>>         return g;
>>     }
> 
> Yes, because the value of g can never change.
> 
> The way to think about pure functions is that if the arguments are the same, the result will be the same. This precludes reading any global state that could change, and precludes writing to any global state.
> 
> But global invariants and manifest constants cannot ever change state, and cannot be written to, so using them in a pure function is ok.

So does that mean

  pure int f(const Class c)
  { ...
  }

will *not* be allowed?  Because some part of c *could* change, even if it's not f() doing the changing.  I.e. another thread could be changing c concurrently.

If so, that seems unnecessarily restrictive to me.  Any side effect or indeterminacy of f() in that case is not because of f()'s doing.  So f()'s behavior *is* pure even if it takes a const argument.  It's not f's fault if the environment it's living in is unstable.

Maybe there need to be two levels of pure?  One says this function by itself has no side effects, the other says additionally that the function is invariant with respect to the environment.  The former category of functions can all be run in parallel safely provided there are no other threads modifying the environment simultaneously.  The latter category can be run absolutely any time, any order, and any way you want, no matter what else is going on with the environment.

--bb
April 03, 2008
== Quote from Walter Bright (newshound1@digitalmars.com)'s article
> Sean Kelly wrote:
> > My traditional argument in support of logical const is this:
> >
> >     class C
> >     {
> >         mutable mutex monitor;
> >         std::string name_;
> >     public:
> >         std::string name() const
> >         {
> >             scoped_lock sl( monitor );
> >             return name_;
> >         }
> >
> >         void name( std::string const& n )
> >         {
> >             scoped_lock sl( monitor );
> >             name_ = n;
> >         }
> >     };
> >
> > Here, the mutex has nothing to do with the state of the object and must be modified even during logically non-modifying operations. Similar behavior might be necessary for a logging system in some instances, etc.  However, I'm not certain whether this is sufficient to justify logical const in general.  As you say--it has some serious problems as well.
> If C.name were invariant, there would be no need for any locks. I don't think this is a good example, because with D's invariant strings there is no need for such locking.

I'm not sure that's true.  Mutexes do two things: first, they guarantee atomicity for an arbitrary sequence of instructions and second, they provide a memory barrier so the result of those instructions is made visible to any other thread which later acquires the same mutex.  The invariant label provides a slightly different guarantee: that the data underlying an invariant reference will not ever change.  Here's an off-the-cuff example:

    string getInput()
    {
        auto buf = new char[64]; // A
        readln( buf );
        return cast(string) buf;
    }

    class Person
    {
        string name() { return m_name; }
        void name( string n ) { m_name = n; }
        private string m_name;
    }

    auto p = new Person;
    // start Thread B

    // Thread A
    writef( "enter your name: " );
    p.name = getInput();

    // Thread B
    while( !p.name )
        Thread.yield();
    writefln( "Your name is %s", p.name );

In the above code, the only memory barrier exists at point A, where the GC acquires a mutex to handle the allocation (let's assume that no IO synchronization occurs for the sake of this example).  After this memory barrier, Thread A does this:

    * transfers input from the keyboard into buf
    * declares buf to be invariant because it will never change
    * assigns buf to the name member in p

Meanwhile, Thread B is waiting for name to be non-empty, but specifically, what I believe it's doing is this:

    while p.name.ptr is null
        wait

Then it prints p.name to the screen using the length attribute to determine how many bytes to print.  In this example, then, we are relying on the following set of operations to occur sequentially, from a global perspective:

    1. the input to be written into buf
    2. the length of p.name to be set to the length of buf
    3. the ptr of p.name to be set to the ptr of buf

(note that operations 2 and 3 are actually expected to be a single atomic operation, but I've ordered them this way based on how the code in Thread B is written)

However, as you're no doubt aware, the underlying hardware and even
the generated code dictate the order in which things actually occur.  So
it's theoretically possible for the above operations to happen in any order.
What the mutex does in my original example is provide for two things:

    1. that the assignment of p.name will be logically atomic for users of p
    2. that any operation that happens before p.name is assigned in Thread
         A will be completed before the mutex is released

Thus, with the simple addition of a mutex within the p.name get/set operations above, the code is actually safe and correct (though obviously slower, since Thread B is spinning on a mutex lock).

I'll admit that in most real code, however, issue 1 will not exist because the
buffer returned from getInput will be created via an idup operation, which
again involves a mutex-protected GC operation (with the current GC
anyway).  So the example was a bit contrived, but I feel it does illustrate
the point that invariance as an optional attribute isn't a panacea for
multiprogramming.

At the very least with an imperative language, there must be some means
of guaranteeing expected behavior for shared data.  One route would be to
have a fairly strict memory model (like Java) and another would be to provide
constructs whereby the programmer can indicate the order of operations
('volatile' in D).  C++0x will be somewhere in the middle by providing specific
atomic types and operations without dictating too much about the relative order
of other operations (last time I checked anyway--I haven't read the final proposal).

And please forgive me if I'm belaboring the obvious.  It was easier to just cover the topic from end to end than to make assumptions about what was mutually understood.


Sean
April 03, 2008
Jason House wrote:
> Walter Bright wrote:
> 
>> If you do away with transitive const, you cannot have invariant either.
>> Const allows you to reuse code that works on invariants to work with
>> mutables, too.
>>
>> Logical const just utterly breaks the whole system. Every scheme we've
>> looked at for logical const winds up in some way breaking invariants. If
>> invariants are broken, the advantages of them all just go away. I
>> suspect that logical const is like perpetual motion - if you think
>> you've solved it, you've just made a mistake somewhere <g>. I also
>> suspect that the failure of logical const validates the D scheme of
>> invariants as being correct - there shouldn't be any holes in it.
>>
>> You're right that invariant by itself is not enough to specify a pure
>> function, a pure function also cannot read or write global state. But
>> invariant is still a necessary condition, it's just not sufficient.
> 
> Is it possible to describe what tricks the compiler can do given:
>  A. transitive const      w/ mutable access to globals/statics
>  B. transitive invariance w/ mutable access to globals/statics
>  C. transitive const      w/ const access to globals/statics
>  D. transitive invariance w/ invariant access to globals/statics
> 
> It'd be extremely helpful to have that kind of discussion in the const FAQ. Without it, I feel like we're all discussing theoretical stuff with little
> basis for our arguments.

Sure, but static class members are *not* part of the transitive closure state of the object. They're just global variables.

> PS: I went to digitalmars and couldn't find the const FAQ.  It's not linked
> to from the normal FAQ and is not included in the left-side bar on the D
> 2.0 page(s).

That'll happen with the next update.
April 03, 2008
Bill Baxter wrote:
> On the other hand, const preventing you from calling some method you shouldn't be calling (and thereby preventing a bug) doesn't leave much of an impression.  You just go "duh -- of course I can't call that" and move on.  Also bugs caused by data changing when you didn't expect it to don't usually jump out as due to lack of const.  Instead you go "doh! I shouldn't change that there" or "doh! I shouldn't call that function here".  So unfortunately I think the value of const is very difficult to judge using subjective measures.

Misuse of an API is a perennial problem (it has caused Microsoft endless grief trying to maintain backwards compatibility). Transitive const can help with that, as it makes it clear what is "no toucha my mustache!" and what is not.
April 03, 2008
Sean Kelly wrote:
> I know you like to talk about the unreliability of const in C++ because of
> const_cast and the like, but in my opinion those are theoretical objections
> because I've never encountered even a single use of const_cast in actual
> code.

It's not even const_cast. It's simply going *one* level of indirection, and all the const checking is gone, gone, gone. You can have const objects in C++, but not const data structures.

> So while I do agree that C++ const may not be useful for static
> analysis, I also believe that C++ const works just fine for achieving more
> demonstrably correct code.  For example, if I create this class:
> 
>     class BoxedInt
>     {
>         int val;
>     public:
>         BoxedInt( int v ) : val( v ) {}
>         int get() const { return val; }
>         void set( int v ) { val = v; }
>     };
> 
> I am confident that if someone tries to call set() on a const instance of
> BoxedInt a compile-time error will occur.  To me, this means that C++
> const helps to write demonstrably correct code because the compiler
> will trap at least some logic errors related to intended program behavior.
> Sure, the user could theoretically cast away const-ness of his BoxedInt
> object and thus break everything, but as far as I'm concerned that's his
> problem.

You've got no references in BoxedInt. How many data structures do you use that contain no references? Essentially, const in C++ only works for trivial objects. (I'm not saying trivial objects are not useful, I'm just saying they are trivial.)


>> This will become more obvious as C++ tries
>> to compete in the multiprogramming arena. The C++ compiler *cannot* help
>> with multiprogramming issues; the C++ programmer is entirely dependent
>> on visual inspection of the code, making C++ a tedious, labor intensive
>> language for multiprogramming.
> 
> I agree with this.  C++ const is intended to catch logical mistakes and is not
> useful for multiprogramming.
> 
>> How many times have you looked at the documentation for a function API,
>> and wondered which of its parameters were input and which were output?
> 
> Never.

Ok, but I do it all the time.

> Though it perhaps helps a bit that C++ classes are value types.

They are value types, true, but I don't think that is relevant to the issue of transitive const.


> I know that counter-examples could be provided along the lines of:
> 
>     struct C
>     {
>         char* buf;
>     };
> 
>     void fn( const C val )
>     {
>         val.buf[0] = 0;
>     }
> 
> But I've never seen anyone actually make this kind of mistake.  Perhaps I've
> just been lucky.

So what you're saying is you follow the *convention* that const in C++ is transitive. I suggest that this convention is the norm, and that normal idioms are ripe fruit for codification into the language rules.

I further suggest that since the C++ compiler cannot help you find any of those mistakes, your projects may unknowingly suffer from them. I do know that when I've tried to do COW in C++ (which relies on transitive const), there have been many bugs and it's been hard to get them flushed out.


>> When you talk about the "misapplication" of const, does that mean
>> failure to understand what the specific API of a function is?
> 
> Most often, the problem I've seen is for const to not be consistently applied
> to class member functions or within the program itself.  It's been a while so
> I'm actually fuzzy on what the exact problems were, but I remember modifying
> a bunch of unrelated code when I created a new member function in some class.
> If I had to guess I'd say this had something to do with the non-transitive nature
> of const in C++ and that the class contained pointers which were being used in
> questionable ways.

Doesn't something used in a questionable way sound like a bug that const helped you find, even if it was just pointing out something that should be clarified and better documented?
April 03, 2008
Bill Baxter wrote:
> So does that mean
> 
>   pure int f(const Class c)
>   { ...
>   }
> 
> will *not* be allowed?  Because some part of c *could* change, even if it's not f() doing the changing.  I.e. another thread could be changing c concurrently.

Right. That declaration of f is erroneous.

> If so, that seems unnecessarily restrictive to me.  Any side effect or indeterminacy of f() in that case is not because of f()'s doing.  So f()'s behavior *is* pure even if it takes a const argument.  It's not f's fault if the environment it's living in is unstable.

A function is pure or it isn't; there really isn't any room for "unstable" purity, or the point of purity is lost.

Look at it another way. You can take the value of the bits pushed on the stack as arguments to a pure function, and compare that with previous bits passed to the same function, and if there's a binary bit match, you can skip calling the function and return a previously cached result.

If that cannot be done, then the function is NOT pure.

> Maybe there need to be two levels of pure?  One says this function by itself has no side effects, the other says additionally that the function is invariant with respect to the environment.  The former category of functions can all be run in parallel safely provided there are no other threads modifying the environment simultaneously.

No, you cannot run them simultaneously. The problem with const (instead of invariant) is that the transitive state of the passed object is NOT guaranteed to be the same:

pure int foo(const C c) { return c.p[0]; }
class C { int* p; }
int* q = new int;
C c = new C;
c.p = q;
int i = foo(c);
q[0] = 3;
int j = foo(c);

Note that these foo's cannot be run in parallel, despite c never having been changed. I think that notion of purity is not useful.
April 03, 2008
Walter Bright wrote:

> Jason House wrote:
>> Walter Bright wrote:
>> 
>>> If you do away with transitive const, you cannot have invariant either. Const allows you to reuse code that works on invariants to work with mutables, too.
>>>
>>> Logical const just utterly breaks the whole system. Every scheme we've looked at for logical const winds up in some way breaking invariants. If invariants are broken, the advantages of them all just go away. I suspect that logical const is like perpetual motion - if you think you've solved it, you've just made a mistake somewhere <g>. I also suspect that the failure of logical const validates the D scheme of invariants as being correct - there shouldn't be any holes in it.
>>>
>>> You're right that invariant by itself is not enough to specify a pure function, a pure function also cannot read or write global state. But invariant is still a necessary condition, it's just not sufficient.
>> 
>> Is it possible to describe what tricks the compiler can do given:
>>  A. transitive const      w/ mutable access to globals/statics
>>  B. transitive invariance w/ mutable access to globals/statics
>>  C. transitive const      w/ const access to globals/statics
>>  D. transitive invariance w/ invariant access to globals/statics
>> 
>> It'd be extremely helpful to have that kind of discussion in the const FAQ. Without it, I feel like we're all discussing theoretical stuff with little basis for our arguments.
> 
> Sure, but static class members are *not* part of the transitive closure state of the object. They're just global variables.

That's why I lumped the two together :)

Does saying sure mean that we can look forward to seeing some info along those lines?


>> PS: I went to digitalmars and couldn't find the const FAQ.  It's not linked to from the normal FAQ and is not included in the left-side bar on the D 2.0 page(s).
> 
> That'll happen with the next update.

Thanks.
April 03, 2008
Invariants won't remove the need for mutexes and the like, but invariants themselves won't need to be wrapped inside mutexes. Furthermore, accessing invariants won't need to consider memory fences and order of evaluation issues.
April 03, 2008
Walter Bright wrote:
> Bill Baxter wrote:

>> If so, that seems unnecessarily restrictive to me.  Any side effect or indeterminacy of f() in that case is not because of f()'s doing.  So f()'s behavior *is* pure even if it takes a const argument.  It's not f's fault if the environment it's living in is unstable.
> 
> A function is pure or it isn't; there really isn't any room for "unstable" purity, or the point of purity is lost.
> 
> Look at it another way. You can take the value of the bits pushed on the stack as arguments to a pure function, and compare that with previous bits passed to the same function, and if there's a binary bit match, you can skip calling the function and return a previously cached result.
> 
> If that cannot be done, then the function is NOT pure.

If you want to define it that way then ok.  So let's say the kind of function I'm talking about would be labeled "nosfx".

>> Maybe there need to be two levels of pure?  One says this function by itself has no side effects, the other says additionally that the function is invariant with respect to the environment.  The former category of functions can all be run in parallel safely provided there are no other threads modifying the environment simultaneously.
> 
> No, you cannot run them simultaneously. The problem with const (instead of invariant) is that the transitive state of the passed object is NOT guaranteed to be the same:
> 
> pure int foo(const C c) { return c.p[0]; }
> class C { int* p; }
> int* q = new int;
> C c = new C;
> c.p = q;
> int i = foo(c);
> q[0] = 3;
> int j = foo(c);
> 
> Note that these foo's cannot be run in parallel, despite c never having been changed. I think that notion of purity is not useful.

They can be run in parallel, as long as you don't stick non-pure mutating operations in between them.

That's quite useful for automatically parallelizing tight loops like this:

nosfx int foo(const C c) { return c.p[0]; }
...

C[] array;
...// put bunch of C's in array
int sum = 0;
foreach(c; array) {
   sum += foo(c);
}

Those foreach bodies could be run in parallel (with appropriate reduction logic for 'sum' added by the compiler), since we know each call to foo() in that loop has no external side effects.

This is the kind of thing OpenMP lets you do with directives like "#pragma omp parallel for reduction(+:sum)".  Except I don't believe OpenMP has any way to verify that foo() is really side-effect free.

--bb
April 03, 2008
Craig Black wrote:
> This is due to the fact that it has a completely different design goal from C++'s const, based on a hypothetical compiler optimization benefit that no one seems to fully understand.

See http://www.digitalmars.com/d/2.0/const-faq.html#const for what the goals are.