June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #30 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-06-04 03:42:40 PDT ---
> I think you've provided a good explanation of the high-level design of the pure keyword, more than once, but it seems that you are missing that this issue, at least as stated in comment 3, is actually about a very specific detail: The extent to which memory reachably by manipulating passed in pointers is still considered »local«, i.e. accessible by pure functions.

pure doesn't restrict pointers in any way shape or form. That's an @safe/@trusted/@system issue, and is completely orthogonal to pure.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #31 from klickverbot <code@klickverbot.at> 2012-06-04 06:18:02 PDT ---
(In reply to comment #30)
> pure doesn't restrict pointers in any way shape or form. That's an @safe/@trusted/@system issue, and is completely orthogonal to pure.

I guess I _might_ have understood what purity entails and what it doesn't… To quote myself, the question here is the extent to which memory reachable by manipulating passed in pointers is still considered local, i.e. accessible by pure functions. This, conceptually, has nothing to do with @safe/@trusted/@system, even though @safe code cannot manipulate pointers for other reasons.

There are two options: Either, allow pure functions taking pointers to read other memory locations in the same block of allocated values, or restrict access to just the data directly pointed at (which incidentally is also what @safe does, but, again, that's not relevant). Both options are equally valid, and I think the current »spec« is not clear on which one should apply.

The first option, which is currently implemented in DMD, allows functions like strlen() to be pure. On the other hand, it also makes the semantics/implications of `pure` a lot more complex, because it links it to something which is fundamentally not expressible by the type system, namely that for any level of indirection, surrounding parts of the memory might be accessible or not, depending on how it was originally allocated. This is assuming C semantics, because, as Timon mentioned as well, OTOH the D docs don't have a formal definition for this as all.

For example, consider »struct Node { int val; Node* next; } int foo(in Node* head) pure;«. Using the first rule, it is almost impossible to figure out statically what parts of the program state »foo(someHead)« depends on, because if any of the Node instances in the chain was allocated as part of a contiguous block (i.e. array), it would be legal for foo() to read them as well, even though the function calling foo() might not even have been involved in the construction of the list. Thus, the compiler is forced to always assume the worst case in terms of optimization (at least without elaborate DFA), which, in most D programs, is needlessly conservative.

The second option avoid such complications, and allows functions calls with parameters on the heap (and thus pointers) to receive the same kind of optimizations as if the parameters were passed on the stack, which might be impractical. It is also the expected behavior if you are thinking of a pointer literally just as an indirection to a single value stored somewhere else.

Personally, I am not sure what is the better choice; the second option seems like the cleaner design, but I can see the merits of the first one as well. But that's not my point – I am just trying to convince you that the »spec« (or whatever it should really be called) needs improvement in this area, because it frequently confuses people. Your revised version (#128) doesn't define »through their arguments« either, yet this is the crucial point.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185


Don <clugdbug@yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugdbug@yahoo.com.au


--- Comment #32 from Don <clugdbug@yahoo.com.au> 2012-06-04 06:31:41 PDT ---
(In reply to comment #31)
> (In reply to comment #30)
> > pure doesn't restrict pointers in any way shape or form. That's an @safe/@trusted/@system issue, and is completely orthogonal to pure.
> 
> I guess I _might_ have understood what purity entails and what it doesn't… To quote myself, the question here is the extent to which memory reachable by manipulating passed in pointers is still considered local, i.e. accessible by pure functions. This, conceptually, has nothing to do with @safe/@trusted/@system, even though @safe code cannot manipulate pointers for other reasons.

I
> 
> There are two options: Either, allow pure functions taking pointers to read other memory locations in the same block of allocated values, or restrict access to just the data directly pointed at (which incidentally is also what @safe does, but, again, that's not relevant). Both options are equally valid, and I think the current »spec« is not clear on which one should apply.
> 
> The first option, which is currently implemented in DMD, allows functions like strlen() to be pure. On the other hand, it also makes the semantics/implications of `pure` a lot more complex, because it links it to something which is fundamentally not expressible by the type system, namely that for any level of indirection, surrounding parts of the memory might be accessible or not, depending on how it was originally allocated. This is assuming C semantics, because, as Timon mentioned as well, OTOH the D docs don't have a formal definition for this as all.
> 
> For example, consider »struct Node { int val; Node* next; } int foo(in Node* head) pure;«. Using the first rule, it is almost impossible to figure out statically what parts of the program state »foo(someHead)« depends on, because if any of the Node instances in the chain was allocated as part of a contiguous block (i.e. array), it would be legal for foo() to read them as well, even though the function calling foo() might not even have been involved in the construction of the list. Thus, the compiler is forced to always assume the worst case in terms of optimization (at least without elaborate DFA), which, in most D programs, is needlessly conservative.

That's correct. You should not expect *any* optimizations from weakly pure functions. The ONLY purpose of weakly pure functions is to increase the number of strongly pure functions. In all other respects, they are no different from an impure function.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #33 from klickverbot <code@klickverbot.at> 2012-06-04 06:43:32 PDT ---
(In reply to comment #32)
> That's correct. You should not expect *any* optimizations from weakly pure functions. The ONLY purpose of weakly pure functions is to increase the number of strongly pure functions. In all other respects, they are no different from an impure function.

Const-pure functions invoked with immutable _arguments_ (even though parameters might only be const) can receive exactly the same amount of optimizations. Even if not implemented in DMD today (as are many other possible purity-related optimizations), this is very useful, because otherwise functions would have to accept immutable values just for the sake of optimization even though they could work with const values just as well otherwise.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #34 from Denis Shelomovskij <verylonglogin.reg@gmail.com> 2012-06-04 19:08:08 MSD ---
(In reply to comment #33)
> (In reply to comment #32)
> > That's correct. You should not expect *any* optimizations from weakly pure functions. The ONLY purpose of weakly pure functions is to increase the number of strongly pure functions. In all other respects, they are no different from an impure function.
> 
> Const-pure functions invoked with immutable _arguments_ (even though parameters might only be const) can receive exactly the same amount of optimizations. Even if not implemented in DMD today (as are many other possible purity-related optimizations), this is very useful, because otherwise functions would have to accept immutable values just for the sake of optimization even though they could work with const values just as well otherwise.

Have you noticed that as I wrote in comment 20 strong unsafe pure functions like
---
size_t f(size_t) nothrow pure;
---
also almost always can't be optimized out?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #35 from Denis Shelomovskij <verylonglogin.reg@gmail.com> 2012-06-04 19:18:33 MSD ---
For Jonathan M Davis: here (as before) when I say "optimization" I mean
"doesn't behave such way that can be optimized" which means "doesn't behave
such way that is expected/desired (IMHO)/etc.".

Example (for everybody):
---
int f(size_t) pure;

__gshared int tmp;
void g(size_t, ref int dummy = tmp) pure;

void h(size_t a, size_t b) pure
{
    int res = f(a);
    g(b);
    assert(res == f(a)); // may fail, no guaranties by language!
}
---

So pure looks for me more then just useless. It looks dangerous because it confuses people and forces them to think that the second `assert` will pass. At least, with existing docs (or with pull 128).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #36 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-06-04 08:45:00 PDT ---
> int f(size_t) pure;

> __gshared int tmp;
> void g(size_t, ref int dummy = tmp) pure;

> void h(size_t a, size_t b) pure
> {
>    int res = f(a);
>    g(b);
>    assert(res == f(a)); // may fail, no guaranties by language!
>}

Your g(b) causes h to be impure, because it accesses tmp, which is __gshared. Also, as far as eliding additional calls to pure functions, at present, they only occur within the same line, and I think that may only ever occur within the same expression (it's either expression or statement, I'm not sure which). So, the eliding of additional pure function calls is going to be quite rare. The _primary_ benefit of pure is how it enables you to reason about your code. You _know_ that f doesn't mess with anything other than the argument that you passed to it without having to look at its body at all.

Oh, and the assertion _is_ guaranteed to pass. a and res are both value types. Neither res nor a are passed to anything or accessed in any way other than in the the lines with the calls to f, and even if g were impure, and it screwed with whatever argument was passed as the first argument to the h call, it wouldn't be able to mess with the value of a, because it was already copied.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #37 from klickverbot <code@klickverbot.at> 2012-06-04 09:03:18 PDT ---
(In reply to comment #34)
> […] strong unsafe pure functions […]

Please note that @safe-ty of a function has nothing to do with purity. Yes in a @system/@trusted pure function, it's easy to do impure things, but if you do, it's your fault, not that of the language/type system.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #38 from art.08.09@gmail.com 2012-06-04 09:08:38 PDT ---
(In reply to comment #23)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > (In reply to comment #12)
> > > > (In reply to comment #11)
> > > > > Pointers may only access their own memory blocks, therefore exactly those blocks participate in argument value and return value.
> > > > 
> > > > What does 'their own memory block' mean?
> > > 
> > > The allocated memory block it points into.
> > 
> > But, as the bounds are unknown to the compiler, it does not have the this information, it has to assume everything is reachable via the pointer.
> 
> 1. It does not need the information. Dereferencing a pointer outside the valid
>    bounds results in undefined behavior. Therefore the compiler can just ignore
>    the possibility.

The problem is there are no "valid bounds". Unless you'd like to declare
   (char* p) {return p[1];}
as invalid, which as you yourself say is restrictive (but IMO acceptable for
pure functions, at least the ones that are automatically inferred as pure).

> 2. It can gain some information at the call site. Eg:
> 
> int foo(const(int)* y)pure;
> 
> void main(){
>     int* x = new int;
>     int* y = new int;
>     auto a = foo(x);
>     auto b = foo(y);
>     auto c = foo(x);
> 
>     assert(a == c);
> }

According to certain replies in this report, that assertion could fail. :)

But i get what you're saying - now consider this foo() definition instead:

   int foo()(const(int)* y) {
      int r;
      foreach (i; 0..size_t.max)
         r += y[i];
      return r;
   }

   /* same main () */

The compiler will treat foo() as pure, so if it would be able to act on the
a==c assumption above, it could also do the same here. And now it would be
completely wrong - the function doesn't even try to pretend that it's pure, yet
it will be inferred as if it were and there's no (clean) way to prevent that.
If the compiler optimizes based on a==c, it will miscompile the program.
This is why the restrictions on what is accessed via a pointer in a pure
function is necessary. Note it only matters for templates/literals/lambdas, ie
the cases where purity is inferred; the programmer can always add the purity
tag when he knows it is (logically) safe (eg most C string functions).

And yes, my example code doesn't make sense as-is, but it only servers to illustrate the problem, there are sane implementations of foo(T*p) which under the right conditions will have the same issues.

BTW, is my foo() above @safe? According to the compiler here - it is.


> 3. Aliasing is the classic optimization killer even without 'pure'.

Yes. Maybe it's a good thing that D doesn't attempt to define it, given the amount of confusion something like "pure" causes...


> 4. Invalid use of pointers can break every other aspect of the type system.
>    Why single out 'pure' ?

It has nothing to do with "invalid use of pointers", unless, again, p[1] is deemed invalid.


> > This is
> > why i suggested above that only dereferencing a pointer should be allowed in
> > pure functions.
> > 
> 
> This is too restrictive.

What else do you want to be able to do with a pointer in a pure function? Dereferencing it and working with the value itself should work, anything else? Note that you should be able to explicitly tell the compiler to assume something is pure even when the code accesses more than just the pointed-to element.


> > And one way to make it work is to forbid dereferencing pointers and require fat ones. Then the bounds would be known.
> 
> The bounds are usually known only at runtime.
> The compiler does not have more to work with.
> From the compiler's point of view, an array access out of bounds
> and an invalid pointer dereference are very similar.

Having well defined aliasing rules would help, yes, but I think that's beyond the scope of this bug.


> > > > and, if the access isn't restricted somehow, makes the function dependent on global memory state.
> > > 
> > > ? A function independent of memory state is useless.
> > 
> > int n(int i) {return i+42;}
> > 
> 
> Where do you store the parameter 'i' if not in some memory location?

I said "global memory state". The parameters are *local* state, just like variables - they can not escape (you can't return their address) and the values depend only on function inputs. Arguments containing references can be seen as part of the global state, but those are explicitly defined as inputs that the function depends on. And that definition wrt to pointers is exactly what this bug is about.


> > > f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler is not allowed to perform optimizations that change defined program behavior.
> > 
> > f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable.
> 
> Then it is the caller's fault. What is considered reachable is well-defined, and f4 must document its valid inputs.

f4() takes a pointer; AFAICT you've said above that it should be able to do more than just dereference it. So what exactly is considered reachable?


> > The compiler can
> > assume that a pure function does not access any mutable state other than what
> > can be directly or indirectly reached via the arguments -- that is what
> > function purity is all about. If the compiler has to assume that a pure
> > function that takes a pointer argument can read or modify everything, the
> > "pure" tag becomes worthless.
> 
> No pointer _argument_ necessary.
> 
> int foo()pure{
>     enum int* everything = cast(int*)...;
>     return *everything;
> }
> 
> As I already pointed out, unsafe language features can be used to subvert the

p[i] can be just as dangerous as the cast. The questions is - can the compiler treat a function containing these constructs as still pure? If the programmer says so, it's fine - purity by convention works.


> type system. If pure functions should be restricted to the safe subset, they can be marked @safe, or compiled with the -safe compiler switch.

   int foo()(int* y) @safe {
      int r;
      foreach (i; 0..size_t.max)
         r += y[i]++;
      return r;
   }

But it's not related to this bug.


> > And what's worse, it allows other "truly" pure
> > function to call our immoral one.
> > 
> 
> Nothing wrong with that.

It is wrong - if a pure functions can be optimized out and it calls another one that has side effects. Again, the case when a human incorrectly tags a function is not really the problem, it's when the compiler does that behind the programmers back.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #39 from klickverbot <code@klickverbot.at> 2012-06-04 09:13:14 PDT ---
(In reply to comment #38)
> BTW, is my foo() above @safe? According to the compiler here - it is.

If so, please open a new issue – this is clearly a bug.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------