June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #20 from Denis Shelomovskij <verylonglogin.reg@gmail.com> 2012-06-04 11:54:40 MSD ---
(In reply to comment #19)
> I honestly don't understand why much in the way of examples are needed.

OK. I have written some examples. Are they too obvious to not be in docs? Honestly, I'll be amazed if most of D programmers have thought about most of that cases.

Examples:

pure functions (not sure if @system only or @safe too) in D are guaranteed to be pure only if used according to it's documentation. There is no guarantees in other case.
---
/// b argument have to be true or result will depend on global state size_t f(size_t i, bool b) pure; // strongly pure

void main()
{
    size_t i1 = f(1, false); // can depend on global state
    size_t i2 = f(1, false); // f is free to produce different result here
    // And if second f call is optimized out using i2 = i1,
    // (because f is strongly pure) a program will behave
    // differently in release mode so be careful.
}
---

For @system pure functions, it's your responsibility to pass correct arguments to functions. These functions (even strongly pure) can be impure for "incorrect" arguments and even results in "undefined behavior".
---
extern (C) size_t strlen(in char* s) nothrow pure; // strongly pure

/// cstr must be zero-ended
size_t myStrlen(in char[] cstr) pure // strongly pure
{
    return strlen(cstr.ptr);
}

void main()
{
    char[3] str = "abc";
    // str isn't zero-ended so myStrlen call
    // results in undefined behavior.
    size_t l1 = myStrlen(str);
    size_t l2 = myStrlen(str); // can give different result
}
---

@system strongly pure functions often can't be optimized out:
---
extern (C) size_t strlen(in char* s) nothrow pure; // strongly pure

void f(in char* cstr, int* n) pure
{
    // strlen have to be executed every iteration,
    // because compiler doesn't know if n is
    // connected with cstr someway
    for(size_t i = 0; i < strlen(cstr); ++i)
    {
        *n += cstr[i];
    }
}
---

Same apply even if these functions hasn't pointers/arrays in it's signature:
---
size_t f(size_t) nothrow pure; // strongly pure

void g(size_t i1, ref size_t i2) pure
{
    // f have to be executed every iteration,
    // because compiler doesn't know if i1 is
    // connected with i2 someway (f can expect
    // that it's argument is an address of i2)
    for(size_t i = 0; i < f(i1); ++i)
    {
        i2 *= 3;
    }
}
---

One has to carefully watch if a function is strongly pure by it's signature (the compiler is guaranteed to determine function purity type by it's signature only to prevent different behavior between cases with/without a signature):
---
void f(size_t x) pure // strongly pure, can't have side effects
{
    *cast(int*) x = 5; // undefined behavior
}


__gshared int tmp;
void g(size_t x, ref int dummy = tmp) pure // weakly pure, can have side
effects
{
    *cast(int*) x = 5; // correct
}
---

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #21 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-06-04 01:22:53 PDT ---
???

Why would you be marking a function as pure if it can access global state? The
compiler would flag that unless you cheated through casts or the use of
extern(C) functions where you marked the declaration as pure but not the
definition (since pure isn't part of the name mangling for extern(C)
functions).

Also, none of your examples using in are strongly pure. At present, the parameters must be _immutable_ or implicitly convertible to immutable for the function to be strongly pure. The only way that const or in would work is if they were passed immutable arguments, but the compiler doesn't treat that as strongly pure right now.

@system has _nothing_ to do with purity. There's no need to bring it up. It's just that @system will let you do dirty tricks (such as casting) to get around pure. Certainly, an @system pure function isn't pure based on its arguments unless it's doing something very wrong. The function would have to be specifically trying to break purity to do that, and then it's the same as when you're dealing with const and the like. There's no need to even bring it up. It's a given with _anything_ where you can cast to do nasty @system stuff.

Adding a description of weakly pure vs strongly pure to the documentation may be valuable, but adding any examples like these would be pointless without it. Also, if you'll notice, the documentation in general is very light on unnecessary examples. It explains exactly what the feature does and gives minimal examples on it. Any that are added should add real value.

pure functions cannot access global mutable state or call any other functions which aren't pure. The compiler will give an error if a function marked as pure does either of those things. What the compiler does in terms of optimizations is up to its implementation. I don't see how going into great detail on whether this particular function signature or that particular function signature can be optimized is going to help much.

It seems to me that the core problem is that many programmers are having a hard time understanding that all that pure means is that pure functions cannot access global mutable state or call any other functions which aren't pure. They keep thinking that it means more than that, and it doesn't. The compiler will use that information to do optimizations where it can (which aren't even always related to strongly pure - e.g. combining const and weakly pure enable optimizations, just not the kind which elide function calls). If programmers would just believe what the description says about what pure means and stop trying to insist that it must mean more than that, I think that they would be a lot less confused. In some respects, discussing stuff like weakly pure and strongly pure just confuses matters. They're effectively implementation details of how some pure-related optimizations are triggered.

It's so very simple and understandable if you leave it at something like "pure functions cannot access global or static variables which are at all mutable - either by the pure function or anything else - and they cannot call other functions which are not pure." That tells you all that you really need to know, and is quite valuable even if _zero_ optimizations were done based on pure, because it helps immensely in being able to think about and understand your program, because you know that a pure function cannot mutate anything which isn't passed to it. I think that you're just overthinking this and overcomplicating things.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #22 from Denis Shelomovskij <verylonglogin.reg@gmail.com> 2012-06-04 13:07:21 MSD ---
(In reply to comment #21)
> Why would you be marking a function as pure if it can access global state? The
> compiler would flag that unless you cheated through casts or the use of
> extern(C) functions where you marked the declaration as pure but not the
> definition (since pure isn't part of the name mangling for extern(C)
> functions).

From your comment before:
> As for stuff like strlen, in that case, you're doing the @system thing of saying that yes, I know what I'm doing. I know that this function isn't marked as pure, because it's a C function, but I also know that it _is_ actually pure.

`strlen` is now pure (marked by Andrei Alexandrescu) and it can access global state once used with non-zero-ended string. I just made situation more evident.

> Also, none of your examples using in are strongly pure. At present, the parameters must be _immutable_ or implicitly convertible to immutable for the function to be strongly pure. The only way that const or in would work is if they were passed immutable arguments, but the compiler doesn't treat that as strongly pure right now.

From your comment before:
> When the compiler can guarantee that all of a pure function's arguments _cannot_ be altered by that function, _then_ it is strongly pure.

So I just don't know how strlen can change its argument...

> @system has _nothing_ to do with purity. There's no need to bring it up.

IMHO, yes it is. Because @safe and @system pure functions looks very different for me. And yes, I can be wrong.

> It's just that @system will let you do dirty tricks (such as casting) to get around
> pure. Certainly, an @system pure function isn't pure based on its arguments
> unless it's doing something very wrong. The function would have to be
> specifically trying to break purity to do that, and then it's the same as when
> you're dealing with const and the like. There's no need to even bring it up.
> It's a given with _anything_ where you can cast to do nasty @system stuff.

Does strlen doing something very wrong or specifically trying to break purity when it accessing random memory?

> Adding a description of weakly pure vs strongly pure to the documentation may be valuable, but adding any examples like these would be pointless without it. Also, if you'll notice, the documentation in general is very light on unnecessary examples. It explains exactly what the feature does and gives minimal examples on it. Any that are added should add real value.
> 
> pure functions cannot access global mutable state or call any other functions which aren't pure. The compiler will give an error if a function marked as pure does either of those things. What the compiler does in terms of optimizations is up to its implementation. I don't see how going into great detail on whether this particular function signature or that particular function signature can be optimized is going to help much.

Yes it is because as I wrote:
> Once it will have examples showing what asserts have to/may/shouldn't pass and/or (I prefer and) what optimizations can be done.

optimizations = what asserts should pure functions confirm = what is pure function

> It seems to me that the core problem is that many programmers are having a hard time understanding that all that pure means is that pure functions cannot access global mutable state or call any other functions which aren't pure. They keep thinking that it means more than that, and it doesn't. The compiler will use that information to do optimizations where it can (which aren't even always related to strongly pure - e.g. combining const and weakly pure enable optimizations, just not the kind which elide function calls). If programmers would just believe what the description says about what pure means and stop trying to insist that it must mean more than that, I think that they would be a lot less confused. In some respects, discussing stuff like weakly pure and strongly pure just confuses matters. They're effectively implementation details of how some pure-related optimizations are triggered.

strlen and other system functions does access global state in some cases. It's pure. And I'm confused if there is no explanation on _how exactly pure functions can access global state_.

> It's so very simple and understandable if you leave it at something like "pure functions cannot access global or static variables which are at all mutable - either by the pure function or anything else - and they cannot call other functions which are not pure."

No. They call everything that want and do everything they want (see druntme pull 198). They just should behave like a pure functions for a user. And I don't clearly understand what does it mean "to behave like a pure function". That's why this issue is created. That's why I want to see what asserts should pure functions confirm.

> That tells you all that you really need to know,
> and is quite valuable even if _zero_ optimizations were done based on pure,

Again, I'm not interesting in optimizations for optimization now. They just can explain what is a pure function.

> because it helps immensely in being able to think about and understand your program, because you know that a pure function cannot mutate anything which isn't passed to it.

It gives me nothing because I still doesn't know what is passed to it as I wrote:
> What exactly does these pure functions consider as `argument value` and as `returned value`?


> I think that you're just overthinking this and overcomplicating things.

May be. Just like a contrary case.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #23 from timon.gehr@gmx.ch 2012-06-04 02:22:54 PDT ---
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > (In reply to comment #11)
> > > > Pointers may only access their own memory blocks, therefore exactly those blocks participate in argument value and return value.
> > > 
> > > What does 'their own memory block' mean?
> > 
> > The allocated memory block it points into.
> 
> But, as the bounds are unknown to the compiler, it does not have the this information, it has to assume everything is reachable via the pointer.

1. It does not need the information. Dereferencing a pointer outside the valid
   bounds results in undefined behavior. Therefore the compiler can just ignore
   the possibility.
2. It can gain some information at the call site. Eg:

int foo(const(int)* y)pure;

void main(){
    int* x = new int;
    int* y = new int;
    auto a = foo(x);
    auto b = foo(y);
    auto c = foo(x);

    assert(a == c);
}

3. Aliasing is the classic optimization killer even without 'pure'.
4. Invalid use of pointers can break every other aspect of the type system.
   Why single out 'pure' ?

> This is
> why i suggested above that only dereferencing a pointer should be allowed in
> pure functions.
> 

This is too restrictive.


> And one way to make it work is to forbid dereferencing pointers and require fat ones. Then the bounds would be known.

The bounds are usually known only at runtime.
The compiler does not have more to work with.
From the compiler's point of view, an array access out of bounds
and an invalid pointer dereference are very similar.

> > > and, if the access isn't restricted somehow, makes the function dependent on global memory state.
> > 
> > ? A function independent of memory state is useless.
> 
> int n(int i) {return i+42;}
> 

Where do you store the parameter 'i' if not in some memory location?


> > f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler is not allowed to perform optimizations that change defined program behavior.
> 
> f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable.

Then it is the caller's fault. What is considered reachable is well-defined, and f4 must document its valid inputs.

> The compiler can
> assume that a pure function does not access any mutable state other than what
> can be directly or indirectly reached via the arguments -- that is what
> function purity is all about. If the compiler has to assume that a pure
> function that takes a pointer argument can read or modify everything, the
> "pure" tag becomes worthless.

No pointer _argument_ necessary.

int foo()pure{
    enum int* everything = cast(int*)...;
    return *everything;
}

As I already pointed out, unsafe language features can be used to subvert the type system. If pure functions should be restricted to the safe subset, they can be marked @safe, or compiled with the -safe compiler switch.

> And what's worse, it allows other "truly" pure
> function to call our immoral one.
> 

Nothing wrong with that.

> Hmm, another way out of this could be to require all pointers args in a pure
> function to target 'immutable' - but that, again, seems to limiting; "bool f(in Struct* s)" could not be pure.

This is why the restriction was dropped.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #24 from timon.gehr@gmx.ch 2012-06-04 02:41:16 PDT ---
(In reply to comment #22)
> 
> `strlen` is now pure (marked by Andrei Alexandrescu) and it can access global state once used with non-zero-ended string. I just made situation more evident.
> 

It may not be used with a non-zero-ended string.
See eg. http://www.cplusplus.com/reference/clibrary/cstring/strlen/

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #25 from klickverbot <code@klickverbot.at> 2012-06-04 03:07:52 PDT ---
I am partly playing Devil's advocate here, but:

(In reply to comment #23)
> > This is
> > why i suggested above that only dereferencing a pointer should be allowed in
> > pure functions.
> > 
> This is too restrictive.

Why?

> > And one way to make it work is to forbid dereferencing pointers and require fat ones. Then the bounds would be known.
> 
> The bounds are usually known only at runtime.
> The compiler does not have more to work with.
> From the compiler's point of view, an array access out of bounds
> and an invalid pointer dereference are very similar.

There is an important semantic difference between these two – a slice is a bounded region of memory, whereas a pointer per se just represents a reference to a single value.
---
int foo(int* p) pure {
  return *(p - 1); // Is this legal?
}

auto a = new int[10];
foo(a.ptr + 1);
---

> > > ? A function independent of memory state is useless.
> > 
> > int n(int i) {return i+42;}
> Where do you store the parameter 'i' if not in some memory location?

In a register, but that's besides the point – which is that the type of i, int, makes it clear that n depends on exactly four bytes of memory. In »struct Node { Node* next; } void foo(Node* n) pure;«, on the other hand, following your interpretation foo() might depend on an almost arbitrarily large amount of memory (consider e.g. uninitialized memory in the area between a heap-allocated Node instance and the end of the block where it resides, which, if interpreted as Node instance(s), might have »false pointers« to other memory blocks, etc.).

> > > f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler is not allowed to perform optimizations that change defined program behavior.
> > 
> > f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable.
> 
> Then it is the caller's fault. What is considered reachable is well-defined […]

Is it? Could you please repeat the definition then, and point out how this is clear from the definition of purity according to the spec, »Pure functions are functions that produce the same result for the same arguments«.

> and f4 must document its valid inputs.
---
/// Passing anything other than `false` is illegal.
int g_state;
void foo(bool neverTrue) pure {
   if (neverTrue) g_state = 42;
}
---

Should this be allowed to be pure? Well, if strlen is, then ostensibly yes, but isn't this too permissive of an interpretation, as the type system can't actually guarantee it? Shouldn't rather a cast to pure at the _call site_ be required if called with know good values, just as in other cases where the type system can't prove a certain invariant, but the programmer can? Purity by convention works just fine without the pure keyword as well…

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #26 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-06-04 03:19:22 PDT ---
I'd actually argue that the line "Pure functions are functions that produce the same result for the same arguments" should be removed from the spec. Ostensibly, yes. The same arguments will result in the same result, but that doesn't really have anything to do with how pure is defined. It's more like it's a side effect of the fact that you can't access global mutable state. It's true that the compiler will elide additional function calls within an expression in cases where the same function is called multiple times with the same arguments and the compiler can guarantee that the result will be the same, but that's arguably an implementation detail of the optimizer.

While the origin and original motivation for pure in D was to enable optimizations based on functional purity (multiple calls to the same function with the same arguments are guaranteed to have the same results), that's not really what pure in D does now, and talking about that clouds the issue something awful, as this bug report demonstrates.

Pure means solely that the function cannot access any global or static variables which can be mutated either directly or indirectly once instantiated and that the function cannot call any other functions which are not pure. That enables the whole "same result for the same arguments" thing, but it does _not_ mean that in and of itself. The simple fact that an argument could have a function on it which returns the value of a mutable global variable without that variable being part of its state at all negates that.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #27 from klickverbot <code@klickverbot.at> 2012-06-04 03:38:12 PDT ---
(In reply to comment #26)
> While the origin and original motivation for pure in D was to enable optimizations based on functional purity (multiple calls to the same function with the same arguments are guaranteed to have the same results), that's not really what pure in D does now, and talking about that clouds the issue something awful, as this bug report demonstrates.

I think you've provided a good explanation of the high-level design of the pure keyword, more than once, but it seems that you are missing that this issue, at least as stated in comment 3, is actually about a very specific detail: The extent to which memory reachably by manipulating passed in pointers is still considered »local«, i.e. accessible by pure functions.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #28 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-06-04 03:39:33 PDT ---
https://github.com/D-Programming-Language/d-programming-language.org/pull/128

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 04, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #29 from timon.gehr@gmx.ch 2012-06-04 03:39:52 PDT ---
(In reply to comment #25)
> I am partly playing Devil's advocate here, but:
> 
> (In reply to comment #23)
> > > This is
> > > why i suggested above that only dereferencing a pointer should be allowed in
> > > pure functions.
> > > 
> > This is too restrictive.
> 
> Why?

Because safety is an orthogonal concern. eg. strlen is a pure function.
By the same way of reasoning, all unsafe features could be banned in all parts
of the code, not just in pure functions.

> 
> > > And one way to make it work is to forbid dereferencing pointers and require fat ones. Then the bounds would be known.
> > 
> > The bounds are usually known only at runtime.
> > The compiler does not have more to work with.
> > From the compiler's point of view, an array access out of bounds
> > and an invalid pointer dereference are very similar.
> 
> There is an important semantic difference between these two – a slice is a bounded region of memory, whereas a pointer per se just represents a reference to a single value.

Yes, 'per se'. Effectively, it references all memory in the same allocated memory block. (This is also the view taken by the GC.)

> ---
> int foo(int* p) pure {
>   return *(p - 1); // Is this legal?
> }
> 

If it is legal depends on whether or not *(p-1) is part of the same memory
block. A conservative analysis (as is done in @safe code) would have to flag
the access as illegal.

> auto a = new int[10];
> foo(a.ptr + 1);
> ---

a.ptr is a pointer. The arithmetics are flagged as illegal in @safe code even though it is safe. What do the examples show?


> 
> > > > ? A function independent of memory state is useless.
> > > 
> > > int n(int i) {return i+42;}
> > Where do you store the parameter 'i' if not in some memory location?
> 
> In a register, but that's besides the point

Indeed, because a register is just memory after all.

> – which is that the type of i, int,
> makes it clear that n depends on exactly four bytes of memory. In »struct Node
> { Node* next; } void foo(Node* n) pure;«, on the other hand, following your
> interpretation foo() might depend on an almost arbitrarily large amount of
> memory (consider e.g. uninitialized memory in the area between a heap-allocated
> Node instance and the end of the block where it resides,
> which, if interpreted as Node instance(s), might have »false pointers« to other memory blocks, etc.).
> 

The language does not define such a thing. Accessing this area therefore results in undefined behavior.

> > > > f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler is not allowed to perform optimizations that change defined program behavior.
> > > 
> > > f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable.
> > 
> > Then it is the caller's fault. What is considered reachable is well-defined […]
> 
> Is it? Could you please repeat the definition then,

It is written down in the C standard. There is no formal specification for D.

> and point out how this is
> clear from the definition of purity according to the spec,

This would not be defined in the pages about purity, but rather in the pages about pointer arithmetics, which are missing, presumably because they would be the same as in C.

> »Pure functions are
> functions that produce the same result for the same arguments«.
> 

This is not a definition of the 'pure' keyword. It relies on informal terms such as 'the same' and does not require annotation of a function. Therefore the sentence should be dropped from the documentation.

If a function is marked with 'pure', then it may not reference mutable free variables.

> > and f4 must document its valid inputs.
> ---
> /// Passing anything other than `false` is illegal.
> int g_state;
> void foo(bool neverTrue) pure {
>    if (neverTrue) g_state = 42;
> }
> ---
> 
> Should this be allowed to be pure? Well, if strlen is, then ostensibly yes, but

No, because it is trivial to devise an equivalent implementation that does not require the compiler to read documentation comments:

int g_state;
void foo(bool neverTrue) pure in{assert(!neverTrue);} body { }

The same does not hold for 'strlen', therefore the analogy immediately breaks down.

> isn't this too permissive of an interpretation, as the type system can't actually guarantee it? Shouldn't rather a cast to pure at the _call site_ be required if called with know good values, just as in other cases where the type system can't prove a certain invariant, but the programmer can?

The type system of an unsafe language cannot prove _any_ invariants, because unsafe operations may result in undefined behavior. This does not imply we'd better have to drop the entire type system.

> Purity by convention works just fine without the pure keyword as well…

This is not only about purity by convention, it is about memory safety by convention. In @safe code, all the concerns raised immediately disappear.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------