View mode: basic / threaded / horizontal-split · Log in · Help
April 10, 2008
Re: pure or not pure?
"Janice Caron" wrote
> On 10/04/2008, Steven Schveighoffer wrote:
>>  Then I should be able to do
>>
>>  h(x)
>>  {
>>     statement1;
>>     statement2;
>>  }
>>
>>  f(x)
>>  {
>>    h(x);
>>  }
>>
>>  g(x)
>>  {
>>    h(x);
>>  }
>
> What you're asking for is what someone else called "amoral" (slightly
> pure). You're suggesting that h be pure-/ish/. It doesn't have to be
> completely pure per se, but it must be pure /enough/ such that when
> embedded within f or g, f and g remain pure.
>
> I have no idea whether or not D will be able to accomodate this
> concept. Strictly functional programming doesn't have it. D might or
> might not. If we do, I suspect it will require another keyword.

We may be able to do it without a keyword.  For example, if a pure function 
returns an invariant(char)[], it means that the return value can be 
memoized, if not, it means that the return value is well-defined, but 
another call to the function will return another copy of the same data. 
Reordering is OK, because it always returns the same data.  You could even 
wrap this in another version that returns invariant data, which then allows 
for memoization, if you know you're going to call it a lot, and don't care 
about mutating the data afterwards.

>>  char[] c = new char[5];
>>
>> is invalid inside a pure function because the compiler cannot verify
>>  uniqueness of mutable heap data, right?
>
> No. As I said, new is special. It's integral to the language.

Of course, but it is like a building block.  If I have a function that is 
pure, and that function calls nothing but other pure functions AND 'new', 
then why can't it also be proven to be pure?

-Steve
April 10, 2008
Re: pure or not pure?
On 10/04/2008, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> Of course, but it is like a building block.  If I have a function that is
>  pure, and that function calls nothing but other pure functions AND 'new',
>  then why can't it also be proven to be pure?

It's not the calling of new that's a problem, it's the returning of
mutable data.

If we allow "pureish" functions, then maybe a "pureish" function could
return mutable data, but not a truly function.

The reason is that the compiler is allowed to cache the result (the
return value from any pure function), and to re-use that result,
instead of calling the function, whenever the input compares equal
with last time. If any of that data is mutable, it all goes haywire.

Seems to me that what's really needed here is a macro, because a macro
is basically just a function that's always inlined, so the function
body will be parsed /inside/ the pure specifier. Maybe macros could
solve that problem completely.
April 10, 2008
Re: pure or not pure?
On 10/04/2008, Janice Caron <caron800@googlemail.com> wrote:
>  If we allow "pureish" functions, then maybe a "pureish" function could
>  return mutable data, but not a truly function.

I meant:
...but not a truly pure function.
April 10, 2008
Re: pure or not pure?
On Thu, 10 Apr 2008 20:34:29 +0400, Janice Caron <caron800@googlemail.com>  
wrote:

> On 10/04/2008, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
>
>>  char[] c = new char[5];
>>
>> is invalid inside a pure function because the compiler cannot verify
>>  uniqueness of mutable heap data, right?
>
> No. As I said, new is special. It's integral to the language.

OK, new is 'special', but what about malloc or my own allocator? (I happen  
to work in a gamedev and we never use new nor malloc).
I don't see how

special extern(C) void* malloc(size_t size);

differs from:

void* my_malloc(size_t size)
{
   return malloc(size);
}

and why can't it be 'special' too.
April 10, 2008
Re: pure or not pure?
Steven Schveighoffer wrote:
> "Georg Wrede" wrote
>>Steven Schveighoffer wrote:
>>>"Janice Caron" wrote
>>>
>>>So how is new char[] a pure function?  And why can't I 'wrap' new?  For 
>>>instance, if I have a function:
>>>
>>>int[] createIncreasingSequence(int s, int e, int step){...}
>>>
>>>which creates an array of ints that start at s and end at e, increasing 
>>>by step each time, why can't I make that pure?  i.e. I want to specify my 
>>>own way to initialize an array.  If I do that in 10 different pure 
>>>functions, you're saying I have to re-implement that code in each one?
>>
>>The return value must be immutable if you're going to use it several 
>>times.
> 
> Agreed.  But I don't want to use it several times, I want to use it once :)
> 
> I want it to act like new does.  For example, if I have
> 
> char[] c = new char[5];
> 
> in two different pure functions, new better not return the same exact memory 
> space for both!  It should be called every time.  But I can't do this with a 
> custom function because pure functions can only call other pure functions. 
> Basically, I'm asking if there will be a way to move a common piece of a 
> function that allocates and initializes data out of a pure function and into 
> another.
> 
> It would be cool if the compiler did not perform special optimizations (such 
> as memoization) on pure functions that return mutable heap data, but still 
> enforced the rules of purity.
> 
>>And if you want to cache the return values outside of the function (as in 
>>only returning a new array when new parameters are used, and otherwise 
>>just a reference to a previously returned one), then I guess I don't have 
>>an opinion. :-?
>>
>>>This was misleading on my part.  I should have asked instead of these two 
>>>questions, if substr is allowed to be pure, can it be called from a 
>>>non-pure function?  In the context of being called from a pure function, 
>>>src is unique, because pure has to have unique data.  But in the context 
>>>of calling from a non-pure function, it might not be...
>>
>>Ultimately, any function is called from main(). Which is a non-pure 
>>function.

I'll have to take that back. Of course main can be a pure function. Take 
the unix sort function for an example. It receives data (which by the 
way, is /assumed/ immutable, that is, the input file is assumed not to 
change while sort is running) and _without_ otherwise changing its 
environment (read, files on the file system), returns a value (the 
sorted input).

Actually a lot of unix functionality is based on this. (Almost (just so 
I'm not caught lying :-) )) all the unix commands that take stdin as 
input and stdout as output, can be considered pure functions. (That most 
of them also output to stderr doesn't count in this discussion. :-) Nor 
the fact that they can change files (like the tee command) if the user 
wants. Also, they're excellent examples of functions that can be pure or 
not pure, depending on the invocation.)

> This is a special case of pure function.  Let's say I have a pure function:
> 
> pure invariant(char)[] f()
> {
>     char[] c = new char[15];
>     g(c);
>     return assumeUnique!(c);
> }
> 
> Now, if g is a pure function that takes a char[] argument and accesses the 
> data referenced by the argument, it could be OK to call it from another pure 
> function because we could assume that the pure function can only use heap 
> data that is 'local' to the function.  But it might be invalid to call such 
> a pure function from a non-pure function because data held by a non-pure 
> function is not always considered 'local'.  On the other hand, if a pure 
> function can hold pointers to heap data that is shared, but just can't use 
> it, then a pure function that takes mutable heap data might not be able to 
> use it, so a function like g() isn't possible.

I'd say that

 - you can call a pure function from a non-pure function, anytime

 - you can call a non-pure function from a pure function, BUT THAT 
makes your pure function lose its purity. It'll still work, however. 
(And of course, the compiler could be polite enough to remind you of 
losing purity -- if the compiler can see that you're actually trying to 
have a pure function, as opposed to the function being pure just by 
happenstance.)

For a program none of this matters. I'd say that you can mix freely 
whatever you like. BUT, if you wish the compiler has the possibility to 
do serious optimizing, the you better not soil your pure functions.

I'd say, that is your only punishment for frivolous mixing. The data 
doesn't get corrupted, and the sky doesn't turn checkered above you.

> If pure functions only allow invariant heap parameters and invariant heap 
> returns, then this whole discussion becomes moot, but if they accept 
> non-invariant heap parameters, or allow non-invariant heap returns, I want 
> to know what the rules for those are.  It seems to me that only allowing 
> invariant heap data is a severe limitation that will leave D functional 
> programming crippled in the face of an actual FP language.

One of the dreamed benefits of deciding that _D_ pure functions can only 
access invariant data is to ensure that the compiler can do memoisation 
when it wants to. That is, then the programmer doesn't have to do 
memoisation by hand. This lessens bugs, eases coding, and gives the 
compiler potentially big possibilities of very serious optimizing. (Not 
that DMD probably will achieve such in some time, but still. After all, 
we're designing a laguage here, too, which should make it _possible_ to 
write such compilers.)

If we define

 - pure functions can only receive immutable data as arguments
 - pure functions can only access immutable other data
 - pure functions can only return immutable data
 - pure functions can't change or output other than the return "value"
 - pure functions can only use other pure functions

and then we define

 - "return value" can also be a reference to immutable heap data
 - pure functions can use the heap as their local storage

and we decide that the last one means that pure functions can create and 
manipulate mutable data on the heap, but only so that nobody else has 
access to this data (including previous, later, or concurrent instances 
of the same function),

then we should be half-way to the goals A/W are trying to reach.

---

Now, why all this heap stuff?

Simply because even Real Functional Languages, that make it look like 
the stack is the only place in the world, can actually use the heap all 
they want, only they do it behind the back of the programmer. It's not 
about *literally* using the stack only, it's about the metaphor.

This metaphor is dictated by the fact that pure functionality means that 
stuff is passed via return values only.

Now, in D (which is a multi-paradigm, practical language, for Systems 
Programming), we don't want to restrict ourselves any more by purity 
than we'd want to do that with OO, for example.

Before C++ was invented, we had languages that were OO-only, and 
languages that were non-OO. (Even Tiobe still uses this distinction, and 
I always wonder to which category they've put D.) The reason for this 
either-or was simply that people were new to OO, and hadn't yet figured 
out how (or whether) they can mix OO and procedural.

To take another example, when I got a VIC-20 in the early eighties, it 
only had a Microsoft Basic interpreter. You couldn't do structured 
programming with it simply because the facilities were missing in the 
language. This made the programming effort more than exponential into 
the number of lines written. Today, I only do structural programming (as 
in versus unstructural), and admire Walter, who still mixes structured 
and unstructured, quite fluently. (Just see Phobos source, especially 
the C files.)

Now, with D we are charting new ground here. We are (arguably) the first 
to make a serious effort of mixing functional (as in pure) with 
procedural programming.


And this is where the heap stuff comes along. It would be impractical 
for D to pass everything via the stack. First of all, non-trivial 
progams would potentially need huge stack spaces, and in many 
environments the stack space allocated for a program is given at startup 
time. So, either we need to allocate stack just-in-case, or risk running 
out of stack in mid-run. Wasteful, and impolite, especially in a 
multiuser environment.

But to achieve this, we, at this point, have to decide that both input 
to pure functions and output (i.e. the return "value") have to be 
invariant. Only this lets us get away with only "pretending" to use the 
stack, while in essence we pass references to heap stuff via the stack.

---

Later, when we're more at ease with mixing functional with 
non-functional, we might well find ways to ease these restrictions. But 
that might be several years after we've actually started using this in 
real projects.


PS, oh Bob, I wish Walter would comment on this post, in any way. 
Suppose my understanding on this differs from that of A/W, then I'd only 
be misleading everybody here.
April 10, 2008
Re: pure or not pure?
On 10/04/2008, Koroskin Denis <2korden+dmd@gmail.com> wrote:
>  OK, new is 'special', but what about malloc or my own allocator? (I happen
> to work in a gamedev and we never use new nor malloc).
>  I don't see how
>
>  special extern(C) void* malloc(size_t size);
>
>  differs from:
>
>  void* my_malloc(size_t size)
>  {
>    return malloc(size);
>  }
>
>  and why can't it be 'special' too.

Because you're modifying global state.

One of the reasons why new is special is because you never have to
delete (because the garbage collector takes care of it). This is
directly comparable to, for example, Haskell, where new "things" are
created all the time, and are never freed. It's built into the system.
More - it /is/ the system. That's the status of new. It's the de facto
heap allocation method in D.

If you want to use custom allocators, there's nothing to stop you, but
that doesn't change the fact that the minute you modify global state,
you have a function that isn't pure.

I think you're worrying too much. There should never be a need to call
malloc and free inside a pure function. If you need a function that
does impure stuff, just write the function anyway, and don't call it
pure.
April 10, 2008
Re: pure or not pure?
On 10/04/2008, Janice Caron <caron800@googlemail.com> wrote:
> On 10/04/2008, Koroskin Denis <2korden+dmd@gmail.com> wrote:
>  >  OK, new is 'special', but what about malloc or my own allocator? (I happen
>  > to work in a gamedev and we never use new nor malloc).
>  >  I don't see how
>  >
>  >  special extern(C) void* malloc(size_t size);
>  >
>  >  differs from:
>  >
>  >  void* my_malloc(size_t size)
>  >  {
>  >    return malloc(size);
>  >  }

Sorry, I should have added: /neither/ of those functions is pure.

In fact, I strongly doubt that any function declared extern(C) will be
pure, for the simple and obvious reason that C doesn't have the "pure"
keyword. :-)
April 10, 2008
Re: pure or not pure?
Steven Schveighoffer wrote:
> Now, I realize I have very similar code that initializes the int arrays, so 
> I want to put that into a function that creates and initializes an array 
> with the given parameters:
> 
> int[] createIncreasingSequence(int length, int start, int step)
> {
>     int[] result = new int[length];
>     for(int i = 0; i < result.length; i++)
>         result[i] = start + (i * step);
>     return result;
> }
> 
> Now, I think we all agree that createIncreasingSequence is as pure as 
> calling new, right?  That is, it doesn't affect any global state (except for 
> allocating heap data) and will return an equivalent array every time the 
> same arguments are given.  But if I can't declare it as 'pure', then I can't 
> call it from f and g:

Yes. I think your function is pure. Except for the fact that the 
returned heap data has to be pure.

> pure int f()
> {
>    int[] x = createIncreasingSequence(5, 1, 1);
> 
>    /* do stuff */
>    return result;
> }
> 
> pure int g()
> {
>    int[] y = createIncreasingSequence(15, 10, 3);
> 
>    /* do stuff */
>    return result;
> }
> 
> So should there be a way to define createIncreasingSequence such that I can 
> call it from a pure function?  Or do I have to re-implement it in every pure 
> function I use?

You should be able to call it from a pure function.

> Not having the ability to pass around mutable heap data to and from pure 
> functions is going to limit severely the usefulness of pure functions.

This example does not need mutable data. And it is an example of how you 
can get along just fine with immutable data.

(It is also an example of CTFE. That is, already existing DMD should 
convert f and g into compile time constants. :-) But that's obviously 
besides the point here!)
April 10, 2008
Re: pure or not pure?
"Janice Caron" wrote
> On 10/04/2008, Steven Schveighoffer wrote:
>> Of course, but it is like a building block.  If I have a function that is
>>  pure, and that function calls nothing but other pure functions AND 
>> 'new',
>>  then why can't it also be proven to be pure?
>
> It's not the calling of new that's a problem, it's the returning of
> mutable data.
>
> If we allow "pureish" functions, then maybe a "pureish" function could
> return mutable data, but not a truly (a pure) function.
>
> The reason is that the compiler is allowed to cache the result (the
> return value from any pure function), and to re-use that result,
> instead of calling the function, whenever the input compares equal
> with last time. If any of that data is mutable, it all goes haywire.

I look at it differently.

To me, a pure function is a function that has no side effects and has 
promised to return the same thing every time it is called.  Allocating 
memory is the only function that is considered pure, but alters global state 
(i.e. the heap), and so we assume it's pure.  A function can still allocate 
data and be considered pure.

A function whose result can be memoized is to me a function that is pure, 
and also one that returns either stack or invariant data.  So a function has 
to be pure to be memoized, but not all pure functions can be memoized.

If you separate out the memoization optimization, then it's possible to have 
both worlds, without any extra keywords or terminology.  We are breaking new 
ground here, so this is the time to explore options that are not part of the 
precedent.

You might not get the compiler optimizations with a pure function that 
returns mutable heap data that you would with one that returns invariant 
heap data, but sometimes that is desirable.  The compiler cannot know all 
things about all optimizations that have been or ever will be.  At some 
point, it has to be nudged in the right direction.

> Seems to me that what's really needed here is a macro, because a macro
> is basically just a function that's always inlined, so the function
> body will be parsed /inside/ the pure specifier. Maybe macros could
> solve that problem completely.

Maybe, but this results in code bloat :)  There are always tradeoffs, it's a 
question of what is important to you.  I used to work with a device that had 
256 bytes of RAM and 8K of code space.  When working on that device, I 
learned how important it was to split common functionality into functions.

-Steve
April 10, 2008
Re: pure or not pure?
Leandro Lucarella wrote:
> Steven Schveighoffer, el 10 de abril a las 10:42 me escribiste:
> 
>>So let me restate: not having the ability to pass around UNIQUE mutable heap 
>>data to and from pure functions is going to limit severely the usefulness of 
>>pure functions.
> 
> 
> This is the exact same problem of the mutable methods of a class used in a
> stack allocated temporary of a pure function, talked in another thread.
> 
> class A {
> 	int x;
> 	void f() { x = 1; }
> }
> 
> pure void g()
> {
> 	scope a = new A;
> 	a.f();
> }
> 
> g() is clearly pure (it has no side effects, besides allocating memory in
> the stack, which shouldn't be considered a side effect, I guess allocating
> in the heap shouldn't be considered a side effect either, since new is
> thread safe). Even when A.f() is not pure, nor const.
> 
> There is an implementation problem, about how hard is to detect that by
> the compiler, but in theory there is no reason for g() not being pure.

g doesn't return a value.
1 2 3 4
Top | Discussion index | About this forum | D home