June 03, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #10 from art.08.09@gmail.com 2012-06-03 06:28:09 PDT ---
(In reply to comment #7)

> argument value is all the data reachable via the parameters.  Argument result is all the data reachable via the result.
[...]
> the optimizer.  My gut feeling is the compiler/optimizer should trust the code "knows what it's doing." and so should expect that the code implicitly knows how much data it can access after the pointer.

Having "pure" as an user provided attribute, the compiler completely trusting the programmer and only checking/enforcing certain assumptions when it is easy to do, is a reasonable solution. Anybody that understands the purity concept will have no problem determining if some function is "pure" or not, this is how it is in C, in dialects supporting pure.

Unfortunately, D has purity inference.

   uint f()(immutable ubyte* p) {
      uint r;
      foreach (i; 0..size_t.max)
         r += p[i];
      return r;
   }

Can this still be considered pure?
What about "uint f2()(Struct* p) {/*same body*/}"?
Or

   uint f3()(ubyte* p) {
      uint r;
      foreach (i; 0..size_t.max)
         r += p[i]++;
      return r;
   }

?

All three functions are tagged as pure by the compiler...

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 03, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185


timon.gehr@gmx.ch changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |timon.gehr@gmx.ch


--- Comment #11 from timon.gehr@gmx.ch 2012-06-03 12:18:33 PDT ---
(In reply to comment #0)
> The Question: What exactly does these pure functions consider as `argument value` and as `returned value`? Looks like this is neither documented nor obvious.
> 

Pointers may only access their own memory blocks, therefore exactly those
blocks participate in argument value and return value.
But why does it even matter? Isn't this discussion mostly philosophical?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 03, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #12 from art.08.09@gmail.com 2012-06-03 12:46:28 PDT ---
(In reply to comment #11)
> Pointers may only access their own memory blocks, therefore exactly those blocks participate in argument value and return value.

What does 'their own memory block' mean? The problem is a pointer is basically an unbounded array, and, if the access isn't restricted somehow, makes the function dependent on global memory state.

> But why does it even matter? Isn't this discussion mostly philosophical?

The compiler will happily assume that template functions are pure even when
they clearly are not, and there isn't even a way to mark such functions as
"impure" (w/o using hacks like calling dummy functions etc).
Example - a function that is designed to operate on arrays, will always be
called with a pointer to inside an array, and can assume that the previous and
next element is always valid:

  f4(T)(T* p) {
      p[-1] += p[0];
   }

The compiler thinks f4() is pure, when it clearly is not; optimizations based on that assumption are likely to result in corrupted data.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 03, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #13 from timon.gehr@gmx.ch 2012-06-03 12:53:36 PDT ---
(In reply to comment #12)
> (In reply to comment #11)
> > Pointers may only access their own memory blocks, therefore exactly those blocks participate in argument value and return value.
> 
> What does 'their own memory block' mean?

The allocated memory block it points into.

> The problem is a pointer is basically an unbounded array,

That is wrong. The pointer is bounded, but it is generally impossible to devise the exact bounds from the pointer alone. This is why D has dynamic arrays.

> and, if the access isn't restricted somehow, makes the function dependent on global memory state.

? A function independent of memory state is useless.

> 
> > But why does it even matter? Isn't this discussion mostly philosophical?
> 
> The compiler will happily assume that template functions are pure even when
> they clearly are not, and there isn't even a way to mark such functions as
> "impure" (w/o using hacks like calling dummy functions etc).
> Example - a function that is designed to operate on arrays, will always be
> called with a pointer to inside an array, and can assume that the previous and
> next element is always valid:
> 
>   f4(T)(T* p) {
>       p[-1] += p[0];
>    }
> 
> The compiler thinks f4() is pure, when it clearly is not; optimizations based on that assumption are likely to result in corrupted data.

f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler is not allowed to perform optimizations that change defined program behavior.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 03, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #14 from art.08.09@gmail.com 2012-06-03 13:52:53 PDT ---
(In reply to comment #13)
> (In reply to comment #12)
> > (In reply to comment #11)
> > > Pointers may only access their own memory blocks, therefore exactly those blocks participate in argument value and return value.
> > 
> > What does 'their own memory block' mean?
> 
> The allocated memory block it points into.

But, as the bounds are unknown to the compiler, it does not have the this information, it has to assume everything is reachable via the pointer. This is why i suggested above that only dereferencing a pointer should be allowed in pure functions.

> > The problem is a pointer is basically an unbounded array,
> 
> That is wrong. The pointer is bounded, but it is generally impossible to devise the exact bounds from the pointer alone. This is why D has dynamic arrays.

And one way to make it work is to forbid dereferencing pointers and require fat ones. Then the bounds would be known. But i don't think anybody would want to write "f(pointer_to_some_struct[0..1])"...

> > and, if the access isn't restricted somehow, makes the function dependent on global memory state.
> 
> ? A function independent of memory state is useless.

int n(int i) {return i+42;}

> > 
> > > But why does it even matter? Isn't this discussion mostly philosophical?
> > 
> > The compiler will happily assume that template functions are pure even when
> > they clearly are not, and there isn't even a way to mark such functions as
> > "impure" (w/o using hacks like calling dummy functions etc).
> > Example - a function that is designed to operate on arrays, will always be
> > called with a pointer to inside an array, and can assume that the previous and
> > next element is always valid:
> > 
> >   f4(T)(T* p) {
> >       p[-1] += p[0];
> >    }
> > 
> > The compiler thinks f4() is pure, when it clearly is not; optimizations based on that assumption are likely to result in corrupted data.
> 
> f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler is not allowed to perform optimizations that change defined program behavior.

f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable. The compiler can assume that a pure function does not access any mutable state other than what can be directly or indirectly reached via the arguments -- that is what function purity is all about. If the compiler has to assume that a pure function that takes a pointer argument can read or modify everything, the "pure" tag becomes worthless. And what's worse, it allows other "truly" pure function to call our immoral one.

Hmm, another way out of this could be to require all pointers args in a pure function to target 'immutable' - but that, again, seems to limiting; "bool f(in Struct* s)" could not be pure.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 03, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #15 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-06-03 14:40:12 PDT ---
The _only_ thing that the pure attribute means by itself is that that function cannot directly access any mutable global or static variables. That is _all_. It means _nothing_ else. It can mess with pointers. It can mess with in, ref, out, and lazy parameters. It can mess with the elements in a slice (thereby alterining external state). It can mess with mutable global or static variables _indirectly_ via the arguments that it's passed (e.g if a pointer or ref is passed to a global variable). It just cannot _directly_ access any mutable global or static variables.

pure by itself indicates a weakly pure function. That function enables _zero_ optimizations. It is _not_ pure in the sense that the functional or mathematical community would consider pure. It is not even _trying_ to be pure in that sense. What weak purity does is enable _strong_ purity to actually be useful.

When the compiler can guarantee that all of a pure function's arguments _cannot_ be altered by that function, _then_ it is strongly pure. Currently, that gurantee is in effect only when all of the parameters of the function are immutable or implicitly convertible to immutable. It could be extended to const parameters in the case when they're passed immutable arguments, but that isn't currently done.

A strongly pure function cannot alter its arguments at all, but it _can_ allocate memory, and it _can_ mutate any of its local state. _weakly_ pure functions can therefore be called from within a strongly pure function, because the only state that they can alter is the state of what's passed to them (because the fact that they're marked with pure means that they cannot access mutable global or mutable static state except via their arguments), and the only state that the strongly pure function _can_ pass to them is local to it, because it can't access global or static mutable state any more than they can, and it can't even access it via its arguments, because it's strongly pure.

This is all very clear and well-defined.

Having pointers sent off into la-la land doing unsafe @system stuff is a _completely_ separate issue. You can break pretty much _anything_ with @system code. You could even cast a function which called writeln so that that the signature was pure and then call it from a pure function. All bets are off when you're in @system land. It's _your_ job to make sure that your code isn't doing something completely screwy at that point. Any function or operation which the compiler doesn't consider pure would still make a templated function be considered impure in such cases, but because it's @system, you can trick it if you want to (e.g. by casting a function's signature). But it's @system code - unsafe code - so it's your fault at that point, not the compiler's.

I really don't know how the documentation could be much clearer. ref and pointer arguments are't "returned." Only the return value is returned. And arguments are clearly the arguments to the function. And as long as the compiler can determine that nothing has been done to an argument to alter it, it's going to consider to be the same value (and it's going to be _extremely_ conservative about that - even altering a reference or pointer of the same type would make its value be considered different, because they both might point to the same thing).

As for stuff like strlen, in that case, you're doing the @system thing of saying that yes, I know what I'm doing. I know that this function isn't marked as pure, because it's a C function, but I also know that it _is_ actually pure. I know that it won't access global mutable state. So, I will mark it as pure so that it can be used in pure code. I'm telling the compiler that I know better than it does. And in this caes, I do. If I didn't, then you'd have a bug, and it would be the my fault, because they I the compiler what was best, and I was wrong. At that point, it's up to me to make sure that that the compiler's guarantees aren't being violated. That's @system for you. D is a systems programming language. You can do that sort of thing.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 03, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #16 from art.08.09@gmail.com 2012-06-03 15:50:29 PDT ---
(In reply to comment #15)
> pure by itself indicates a weakly pure function. That function enables _zero_

Inventing terminology doesn't help, especially when the result is so confusing.


> optimizations. It is _not_ pure in the sense that the functional or mathematical community would consider pure. It is not even _trying_ to be pure in that sense. What weak purity does is enable _strong_ purity to actually be useful.
> 
> When the compiler can guarantee that all of a pure function's arguments _cannot_ be altered by that function, _then_ it is strongly pure. Currently, that gurantee is in effect only when all of the parameters of the function are immutable or implicitly convertible to immutable. It could be extended to const parameters in the case when they're passed immutable arguments, but that isn't currently done.
[...]

tl;dr.

The bugtracker is probably not the right place for this discussion; we could move it to the ML, but talking about it only makes sense if D can be fixed; otherwise we would be wasting our time...

Limiting "pure" to just immutable data would work indeed, but it's much too limiting.

   struct S {int a,b; int[64] c; bool f() const pure {return a||b;}}

   int g(S* p) {
      int r;
      foreach (i; 0..64)
         if (p.f())
            r |= p.c[i];
      return r;
   }


Using your "weak pure" definition, f's "pure" would be a NOOP - that is not
what most people would expect, and is not a sane purity implementation.
It's not a problem for trivial examples such as this one because inlining
should take care of it, but would make "pure" almost useless in real code, as
it would almost never be, to use your terminology again, "strongly" pure (and
couldn't be moved out of the loop).

Note that, even when using your "strong purity" definition, the compiler still does the wrong thing - some of the examples I gave previously in this bug are (and others can be trivially modified to be) inferred as "strongly" pure functions, when they are not pure at all.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 03, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #17 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-06-03 16:02:19 PDT ---
They aren't _my_ definitions. They're official. They've been discussed in the newsgroup. They've even been used by folks like Walter Bright in talks at conferences. How purity is implemented in D has been discussed and was decided a while ago. It works well and is not going to change. Weak purity solved a real need. All we had before was strong purity, and it was almost useless, because it was so limited. It is _far_ more useful now that it was before.

A pure function is clearly defined as a function which cannot access global or static state which is mutable. It doesn't matter how other languages use the term pure. That's how D uses it. And in cases where a function is strongly pure, you _do_ get the optimizations based on passing the same arguments to the same pure function multiple times that you'd expect from a more functional language.

If you don't like how D's pure works, that's fine - you're free to have your own opinion, be it dissenting or otherwise - but how pure works in D is _not_ going to change. If bugs are found in the compiler's implementation of it, they will be addressed, but at this point, the design is what it is.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #18 from Denis Shelomovskij <verylonglogin.reg@gmail.com> 2012-06-04 09:38:21 MSD ---
(In reply to comment #15)
> I really don't know how the documentation could be much clearer.

Once it will have examples showing what asserts have to/may/shouldn't pass and/or (I prefer and) what optimizations can be done. Even Setting Dynamic Array Length section has such examples but it is far more simple.

> As for stuff like strlen, in that case, you're doing the @system thing of
saying that yes, I know what I'm doing.

And the missing now words "What exactly does these pure functions consider as
`argument
value` and as `returned value`" from my original question because it's treated
by someone as "only pointer dereferencing" and by someone "access to any
logically accessible address".

Again, all misunderstanding of pure functions in D can be easily solved by just adding (lots of) examples with difficult cases into docs.

IMHO, Jonathan M Davis e.g. will save at least lots of his time (yes, and our time too) by just adding such examples with minimal comments into docs instead of writing such big answers.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #19 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-06-03 22:58:33 PDT ---
I honestly don't understand why much in the way of examples are needed. The documentation explains what pure is. When the compiler is able to optimize out calls to pure functions is an implementation detail - just like optimizations with const or immutable are. You use pure wherever you can, and the compiler will optimize where it can.

The documentation could go into more detail on weakly pure vs strongly pure (since it doesn't mention either), but that's pretty much the only relevant improvement that I can think of, and I know that Don would be annoyed by that, since he wants the terms strongly pure and weakly pure to die and just leave them as implementation details (though I think that he's the only one who really feels that way).

I think that there's a lot of overthinking of this going on here. The documentation quite clearly states what a pure function is and what it can and can't do. I don't see how more examples would really help much with that. But anyone has an idea that they think will improve the documentation, then feel free to create a pull request with the changes.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------