[Issue 8185] New: Pure functions and pointers
June 02, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185

           Summary: Pure functions and pointers
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: spec
          Severity: major
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: verylonglogin.reg@gmail.com


--- Comment #0 from Denis Shelomovskij <verylonglogin.reg@gmail.com> 2012-06-02 12:10:50 MSD ---
Looks like there is a big problem with pure functions and pointers.

Consider these functions:
---
int*   f1(in int*   i) pure;
int**  f2(in int**  i) pure;
void*  g1(in void*  p) pure;
void** g2(in void** p) pure;

struct MyArray { int* p; size_t len; }
void** h(in MyArray arg) pure;
---
The Question: What exactly do these pure functions consider as `argument value` and as `returned value`? It looks like this is neither documented nor obvious.

I see only three ways to document it properly (yes, the main problem is with
the `h` function):
 * disallow pure functions to accept pointers or types with pointers;
 * once pure function accepts a pointer it is considered depending on all
process memory;
 * state with BIG RED LETTERS that pure function depends on the address only
and restrict dereferencing of the pointer on a compiler level.

The second way obviously just means the function isn't pure any more.
The third way means the pointer isn't a pointer any more so I'd prefer to
replace it with "The first way" + "f(cast(size_t) ptr)".

More than that, the situation is very dangerous now. E.g. one can consider `strlen` to be pure. It should be clearly stated that purity is compiler checkable, not user checkable with examples like `strlen`. See discussion in Issue 3057.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
June 02, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185


klickverbot <code@klickverbot.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |code@klickverbot.at
           Severity|major                       |enhancement


--- Comment #1 from klickverbot <code@klickverbot.at> 2012-06-02 01:44:18 PDT ---
The current behavior is by design, and perfectly fine – note that `pure` in D just means that a function doesn't access global (mutable) state. A pointer somewhere isn't a problem either, since the caller must have obtained the address from somewhere, and if it was indeed from global state, the calling code couldn't be pure.
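
A minimal sketch of this rule, assuming a hypothetical module-level variable `counter` and hypothetical helpers `readThrough` and `readGlobal` (none of them come from the report): a pure function may dereference a pointer it was handed, but it may not name mutable global state itself.

```d
int counter; // hypothetical mutable module-level state

// Reads only through its parameter, so it is accepted as pure.
int readThrough(const(int)* p) pure
{
    return *p;
}

// Not marked pure: it reads mutable global state directly.
int readGlobal()
{
    return counter;
}

void main()
{
    int local = 3;
    assert(readThrough(&local) == 3);

    // An impure caller may pass the address of a global;
    // the pure function itself still never names `counter`.
    counter = 5;
    assert(readThrough(&counter) == 5);
    assert(readGlobal() == 5);

    // A pure function literal that reads `counter` does not compile.
    static assert(!__traits(compiles, () pure { return counter; }));
}
```

The impure `main` is the one that obtains `&counter`; a pure caller could not have done so, which is the argument made above.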

Do you have any suggestions on how to make this clearer in the spec? I admit that the design can take some time to wrap one's head around, but I'm not sure what's the best way to make the concept easier to grasp.

June 02, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #2 from klickverbot <code@klickverbot.at> 2012-06-02 03:12:04 PDT ---
Also, please note that issue 3057 is really old – I think at that point we didn't even have the relaxed purity rules yet.

June 02, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #3 from Denis Shelomovskij <verylonglogin.reg@gmail.com> 2012-06-02 14:29:01 MSD ---
(In reply to comment #1)
> The current behavior is by design, and perfectly fine – note that `pure` in D just means that a function doesn't access global (mutable) state. A pointer somewhere isn't a problem either, since the caller must have obtained the address from somewhere, and if it was indeed from global state, the calling code couldn't be pure.

OK. Looks like everything works but I don't understand how. So could you please answer the question (read this to the end).

According to http://dlang.org/function.html#pure-functions
> Pure functions are functions that produce the same result for the same arguments.

And my original question is
> The Question: What exactly do these pure functions consider as `argument
value` and as `returned value`?

Illustration:
---
int f(in int* p) pure;

void g()
{
    auto arr = new int[5];
    auto res = f(arr.ptr);

    assert(res == f(arr.ptr));

    assert(res == f(arr.ptr + 1)); // *p isn't changed

    arr[1] = 7;
    assert(res == f(arr.ptr)); // neither p nor *p is changed

    arr[0] = 7;
    assert(res == f(arr.ptr)); // p isn't changed
}
---
Which asserts must pass?

The second assert is here according to http://klickverbot.at/blog/2012/05/purity-in-d/  (yes, it's the "Indirections in the Return Type?" section, but the sentences look general and I think they can be read this way):
> The first essential point are addresses, respectively the definition of equality applied when considering referential transparency. In functional languages, the actual memory address that some value resides at is usually of little to no importance. D being a system programming language, however, exposes this concept.

June 02, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185


klickverbot <code@klickverbot.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |normal


--- Comment #4 from klickverbot <code@klickverbot.at> 2012-06-02 07:50:05 PDT ---
(In reply to comment #3)
> And my original question is
> > The Question: What exactly do these pure functions consider as `argument
> value` and as `returned value`?
> 
> Illustration:
> ---
> int f(in int* p) pure;

Thanks for the example, this certainly makes your concerns easier to see. You are right, the spec is really not clear in this regard – but in my opinion, only a single interpretation makes sense, in that it is actually enforceable by the compiler:

---
>     auto res = f(arr.ptr);
>     assert(res == f(arr.ptr));
This one obviously has to pass.

>     assert(res == f(arr.ptr + 1)); // *p isn't changed
Might fail, f is allowed to return cast(int)p.

>     arr[1] = 7;
>     assert(res == f(arr.ptr)); // neither p nor *p is changed
Must pass, reading/modifying random bits of memory inside pure functions is obviously a bad idea. Bad idea meaning that pointer arithmetic is disallowed in @safe code anyway, and in @system code, you as the programmer are responsible for not violating the type system guarantees – for example, you can just call any impure function in a pure context using a cast. This also means that e.g. C string functions cannot be pure in D.

>     arr[0] = 7;
>     assert(res == f(arr.ptr)); // p isn't changed
Might fail, as discussed in the »What about Referential Transparency« section of the article – only if the parameters are _transitively_ equal (as defined by their type), then pure functions are guaranteed to return the same value.

> The second assert is here according to http://klickverbot.at/blog/2012/05/purity-in-d/.
Then this aspect of the article is apparently not as clear as it could be – thanks for the feedback, I'll incorporate it in the next revision.
---
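
As a sketch of the "transitively equal" reading, assume a hypothetical body for `f` that only dereferences `p` (the original declaration has no body, so this is one possible implementation, not the one under discussion):

```d
// Hypothetical body: the result depends only on *p.
int f(in int* p) pure
{
    return *p;
}

void main()
{
    auto arr = new int[5];     // all elements start at 0
    auto res = f(arr.ptr);

    assert(res == f(arr.ptr)); // same pointer, same pointee

    arr[1] = 7;                // this f never looks at arr[1]
    assert(res == f(arr.ptr));

    arr[0] = 7;                // the pointee changed, so the argument is no longer transitively equal
    assert(res != f(arr.ptr));
}
```

With this body the `arr.ptr + 1` case would also return 0 before any writes, but an `f` that returns `cast(int) p` would not, which is why that assert is only "might fail".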

Do you disagree with any of these points? If so, I'd be happy to provide a more in-depth explanation of my view, so we can clarify the spec afterwards.

June 02, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185


art.08.09@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |art.08.09@gmail.com


--- Comment #5 from art.08.09@gmail.com 2012-06-02 08:22:14 PDT ---
(In reply to comment #0)

> I see only three ways to document it properly (yes, the main problem is with
> `h` function):

>  * once pure function accepts a pointer it is considered depending on all
> process memory;

That would work, but would probably be too limiting.


 * Allow only dereferencing the pointer, disallow any kind of indexing. Note
it's not trivial, as pointer arithmetic should still work. But probably doable,
by disallowing dereferencing at all, and making a special exception for
accessing via an unmodified argument. This would also have to work recursively,
so it basically comes down to introducing a special kind of pointer, that
behaves a bit more like a reference. The alternatives are the ones you listed,
either banning pointers or assuming the function depends on everything -
neither is really acceptable. A pure function shouldn't deal with unbounded
arrays, so this kind of restriction should be fine (the alternative is to have
to slice everything, which is not a sane solution, eg when working with
pointers to structs)

June 02, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #6 from Denis Shelomovskij <verylonglogin.reg@gmail.com> 2012-06-02 19:59:12 MSD ---
(In reply to comment #4)
> (In reply to comment #3)
> >     assert(res == f(arr.ptr + 1)); // *p isn't changed
> Might fail, f is allowed to return cast(int)p.

Am I understanding correctly that:
---
int[] f() pure;
int g(in int[] a) pure;
int gs(in int[] a) @safe pure;

void h()
{
    assert(g(f()) == g(f()));   // May or may not pass
    assert(gs(f()) == gs(f())); // Should pass
}
---
?

> >     arr[1] = 7;
> >     assert(res == f(arr.ptr)); // neither p nor *p is changed
> Must pass,...

So this code is invalid:
---
void f(int* i) pure @safe // or unsafe, doesn't matter
{ ++i[1]; }
---
and this is invalid too:
---
struct MyArray {
    int* p;
    size_t len;

    ...

    int opIndex(size_t i) pure @safe // or unsafe, doesn't matter
    in { assert(i < len); }
    body {
        return p[i];
    }
}
---
?

And this is valid:
---
void f(int* i) pure @safe // or unsafe, doesn't matter
{ ++*i; }
---
?

> reading/modifying random bits of memory inside pure functions is
> obviously a bad idea. Bad idea meaning that pointer arithmetic is disallowed in
> @safe code anyway, and in @system code, you as the programmer are responsible
> for not violating the type system guarantees – for example, you can just call
> any impure function in a pure context using a cast. This also means that e.g. C
> string functions cannot be pure in D.

I'm a bit confused because I didn't mention the @safe attribute. If you have time, I'd like to see the differences between @safe and unsafe pure functions covered in your article, because these look like really different things.

> > The second assert is here according to http://klickverbot.at/blog/2012/05/purity-in-d/.
> Then this aspect of the article is apparently not as clear as it could be – thanks for the feedback, I'll incorporate it in the next revision.

Not sure, my English is rather bad so I could just have misunderstood something.

> Do you disagree with any of these points? If so, I'd be happy to provide a more in-depth explanation of my view, so we can clarify the spec afterwards.


`void f(void*) pure;` is still unclear to me. What can it do? What can it do
if it's @safe?

And I completely fail to understand why pure functions can't be optimized out, as Steven Schveighoffer said in a comment on druntime pull 198:
> The fact that it returns mutable makes it weak pure (the optimizer cannot remove any calls to gc_malloc)
(yes, this is a general question, not pointers only)

June 03, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185


Steven Schveighoffer <schveiguy@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy@yahoo.com


--- Comment #7 from Steven Schveighoffer <schveiguy@yahoo.com> 2012-06-02 17:48:23 PDT ---
(In reply to comment #3)
> 
> According to http://dlang.org/function.html#pure-functions
> > Pure functions are functions that produce the same result for the same arguments.

This is certainly true.  However, it's neither practical nor always possible for the compiler to determine if a call can be optimized out.  Consider that on any call to a pure function that takes mutable data, the function could modify the data, so even calling with the exact same pointer again may result in a new effective parameter.

However, if a function has only immutable or implicitly convertible to immutable parameters and return values, the function *can* be optimized out, because it's guaranteed nothing ever changes.

This situation is what has been called "strong pure".  It's the equivalent to functional language purity.

It's possible in certain situations for a "weak pure" function to be considered strong pure.  For example, consider a function which takes a const parameter, and returns a const.  Pass an immutable into it, and nothing could possibly have changed before the next call, it can be optimized out.  The compiler does not take advantage of these yet.
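
A small sketch of the weak/strong distinction, using two hypothetical functions `bump` and `sum` (they are illustrations, not code from druntime or the spec). Both compile as `pure`, but only the second has the "same arguments, same result" property a caller could rely on for call elimination:

```d
// Weakly pure: may mutate data reachable through its parameter.
void bump(int* p) pure
{
    ++*p;
}

// Only immutable data comes in and a value type goes out
// (what the thread calls "strong pure").
int sum(immutable(int)[] xs) pure
{
    int total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

void main()
{
    int n = 1;
    bump(&n);
    bump(&n); // two calls with the same pointer, two visible effects
    assert(n == 3);

    immutable(int)[] data = [1, 2, 3];
    assert(sum(data) == 6);
    assert(sum(data) == sum(data)); // nothing reachable from `data` can change
}
```

Dropping the second `bump(&n)` call would change the program, whereas the repeated `sum(data)` calls are interchangeable; whether a given compiler actually merges them is a separate question.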

> And my original question is
> > The Question: What exactly do these pure functions consider as `argument
> value` and as `returned value`?

The argument value is all the data reachable via the parameters.  The return value is all the data reachable via the result.

For pointers, you are under the same rules as normal functions -- @safe functions cannot use pointers, unsafe ones can.  If an unsafe pure function is called, a certain degree of freedom to screw up is available, just like any other unsafe function.

> int f(in int* p) pure;
> 
> void g()
> {
>     auto arr = new int[5];
>     auto res = f(arr.ptr);
> 
>     assert(res == f(arr.ptr));

obviously this passes, all the parameters are identical, and nothing could have changed between the two calls.  The call will not currently be optimized out, because the compiler isn't smart enough yet.

> 
>     assert(res == f(arr.ptr + 1)); // *p isn't changed

may or may not pass, parameter is different.

> 
>     arr[1] = 7;
>     assert(res == f(arr.ptr)); // neither p nor *p is changed

may or may not pass.  f is not @safe, so it could possibly access arr[1].

> 
>     arr[0] = 7;
>     assert(res == f(arr.ptr)); // p isn't changed

may or may not pass, the parameter is different.

> And I completely misunderstand why pure functions can't be optimized out as Steven Schveighoffer said in druntime pull 198 comment:

I hope I have helped to further your understanding with this post.  Don just looked up the original thread which outlined the weak-pure proposal, which was submitted to digitalmars.D in August 2010.  You may want to read that entire thread.

In general response to this bug, I'm unsure how pointers should be treated by the optimizer.  My gut feeling is the compiler/optimizer should trust the code "knows what it's doing." and so should expect that the code implicitly knows how much data it can access after the pointer.

Consider an interesting case, using BSD sockets:

int f(immutable sockaddr *addr) pure;

sockaddr is a specific size, yet it's a "base class" of different types of address structures.  Typically, one casts the sockaddr into the correct struct based on the sa_family member.

But this may technically mean f accesses more data than it is given, based on a rigid interpretation of the type system.  Should the compiler enforce this given it makes this kind of function practically useless?  I think not.

June 03, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185


Jonathan M Davis <jmdavisProg@gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg@gmx.com


--- Comment #8 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-06-02 21:29:24 PDT ---
This isn't true:

> @safe functions cannot use pointers, unsafe ones can.

@safe functions can use pointers just fine. Pointers themselves are considered @safe (e.g. the AA's in operator works just fine in @safe code). It's unsafe pointer operations such as pointer arithmetic which are not @safe.
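
A short sketch of that distinction, with a hypothetical helper `deref` and a hypothetical table: dereferencing a pointer and using the AA `in` operator compile in @safe code, while pointer arithmetic does not.

```d
// Dereferencing a valid pointer is allowed in @safe code.
int deref(const(int)* p) @safe pure
{
    return *p;
}

void main() @safe
{
    int[string] table = ["one": 1];

    // The AA `in` operator yields a pointer to the value, or null.
    if (auto p = "one" in table)
        assert(deref(p) == 1);
    assert(("two" in table) is null);

    // Pointer arithmetic is rejected in @safe code.
    static assert(!__traits(compiles, () @safe { int* q; q += 1; }));
}
```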

June 03, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #9 from Denis Shelomovskij <verylonglogin.reg@gmail.com> 2012-06-03 10:23:09 MSD ---
Such a mess! The more people write here, the more different opinions I see. IMHO, Walter and Andrei must also participate here to help reach a conclusion (or to finally mix everything up).
