April 10, 2008
"Janice Caron" wrote
> On 10/04/2008, Steven Schveighoffer wrote:
>>  Then I should be able to do
>>
>>  h(x)
>>  {
>>     statement1;
>>     statement2;
>>  }
>>
>>  f(x)
>>  {
>>    h(x);
>>  }
>>
>>  g(x)
>>  {
>>    h(x);
>>  }
>
> What you're asking for is what someone else called "amoral" (slightly pure). You're suggesting that h be pure-/ish/. It doesn't have to be completely pure per se, but it must be pure /enough/ such that when embedded within f or g, f and g remain pure.
>
> I have no idea whether or not D will be able to accomodate this concept. Strictly functional programming doesn't have it. D might or might not. If we do, I suspect it will require another keyword.

We may be able to do it without a keyword.  For example, if a pure function returns an invariant(char)[], it means that the return value can be memoized, if not, it means that the return value is well-defined, but another call to the function will return another copy of the same data. Reordering is OK, because it always returns the same data.  You could even wrap this in another version that returns invariant data, which then allows for memoization, if you know you're going to call it a lot, and don't care about mutating the data afterwards.

>>  char[] c = new char[5];
>>
>> is invalid inside a pure function because the compiler cannot verify
>>  uniqueness of mutable heap data, right?
>
> No. As I said, new is special. It's integral to the language.

Of course, but it is like a building block.  If I have a function that is pure, and that function calls nothing but other pure functions AND 'new', then why can't it also be proven to be pure?

-Steve


April 10, 2008
On 10/04/2008, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> Of course, but it is like a building block.  If I have a function that is
>  pure, and that function calls nothing but other pure functions AND 'new',
>  then why can't it also be proven to be pure?

It's not the calling of new that's a problem, it's the returning of mutable data.

If we allow "pureish" functions, then maybe a "pureish" function could return mutable data, but not a truly function.

The reason is that the compiler is allowed to cache the result (the return value from any pure function), and to re-use that result, instead of calling the function, whenever the input compares equal with last time. If any of that data is mutable, it all goes haywire.

Seems to me that what's really needed here is a macro, because a macro is basically just a function that's always inlined, so the function body will be parsed /inside/ the pure specifier. Maybe macros could solve that problem completely.
April 10, 2008
On 10/04/2008, Janice Caron <caron800@googlemail.com> wrote:
>  If we allow "pureish" functions, then maybe a "pureish" function could
>  return mutable data, but not a truly function.

I meant:
...but not a truly pure function.
April 10, 2008
On Thu, 10 Apr 2008 20:34:29 +0400, Janice Caron <caron800@googlemail.com> wrote:

> On 10/04/2008, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
>
>>  char[] c = new char[5];
>>
>> is invalid inside a pure function because the compiler cannot verify
>>  uniqueness of mutable heap data, right?
>
> No. As I said, new is special. It's integral to the language.

OK, new is 'special', but what about malloc or my own allocator? (I happen to work in a gamedev and we never use new nor malloc).
I don't see how

special extern(C) void* malloc(size_t size);

differs from:

void* my_malloc(size_t size)
{
   return malloc(size);
}

and why can't it be 'special' too.
April 10, 2008
Steven Schveighoffer wrote:
> "Georg Wrede" wrote
>>Steven Schveighoffer wrote:
>>>"Janice Caron" wrote
>>>
>>>So how is new char[] a pure function?  And why can't I 'wrap' new?  For instance, if I have a function:
>>>
>>>int[] createIncreasingSequence(int s, int e, int step){...}
>>>
>>>which creates an array of ints that start at s and end at e, increasing by step each time, why can't I make that pure?  i.e. I want to specify my own way to initialize an array.  If I do that in 10 different pure functions, you're saying I have to re-implement that code in each one?
>>
>>The return value must be immutable if you're going to use it several times.
> 
> Agreed.  But I don't want to use it several times, I want to use it once :)
> 
> I want it to act like new does.  For example, if I have
> 
> char[] c = new char[5];
> 
> in two different pure functions, new better not return the same exact memory space for both!  It should be called every time.  But I can't do this with a custom function because pure functions can only call other pure functions. Basically, I'm asking if there will be a way to move a common piece of a function that allocates and initializes data out of a pure function and into another.
> 
> It would be cool if the compiler did not perform special optimizations (such as memoization) on pure functions that return mutable heap data, but still enforced the rules of purity.
> 
>>And if you want to cache the return values outside of the function (as in only returning a new array when new parameters are used, and otherwise just a reference to a previously returned one), then I guess I don't have an opinion. :-?
>>
>>>This was misleading on my part.  I should have asked instead of these two questions, if substr is allowed to be pure, can it be called from a non-pure function?  In the context of being called from a pure function, src is unique, because pure has to have unique data.  But in the context of calling from a non-pure function, it might not be...
>>
>>Ultimately, any function is called from main(). Which is a non-pure function.

I'll have to take that back. Of course main can be a pure function. Take the unix sort function for an example. It receives data (which by the way, is /assumed/ immutable, that is, the input file is assumed not to change while sort is running) and _without_ otherwise changing its environment (read, files on the file system), returns a value (the sorted input).

Actually a lot of unix functionality is based on this. (Almost (just so I'm not caught lying :-) )) all the unix commands that take stdin as input and stdout as output, can be considered pure functions. (That most of them also output to stderr doesn't count in this discussion. :-) Nor the fact that they can change files (like the tee command) if the user wants. Also, they're excellent examples of functions that can be pure or not pure, depending on the invocation.)

> This is a special case of pure function.  Let's say I have a pure function:
> 
> pure invariant(char)[] f()
> {
>     char[] c = new char[15];
>     g(c);
>     return assumeUnique!(c);
> }
> 
> Now, if g is a pure function that takes a char[] argument and accesses the data referenced by the argument, it could be OK to call it from another pure function because we could assume that the pure function can only use heap data that is 'local' to the function.  But it might be invalid to call such a pure function from a non-pure function because data held by a non-pure function is not always considered 'local'.  On the other hand, if a pure function can hold pointers to heap data that is shared, but just can't use it, then a pure function that takes mutable heap data might not be able to use it, so a function like g() isn't possible.

I'd say that

 - you can call a pure function from a non-pure function, anytime

 - you can call a non-pure function from a pure function, BUT THAT makes your pure function lose its purity. It'll still work, however. (And of course, the compiler could be polite enough to remind you of losing purity -- if the compiler can see that you're actually trying to have a pure function, as opposed to the function being pure just by happenstance.)

For a program none of this matters. I'd say that you can mix freely whatever you like. BUT, if you wish the compiler has the possibility to do serious optimizing, the you better not soil your pure functions.

I'd say, that is your only punishment for frivolous mixing. The data doesn't get corrupted, and the sky doesn't turn checkered above you.

> If pure functions only allow invariant heap parameters and invariant heap returns, then this whole discussion becomes moot, but if they accept non-invariant heap parameters, or allow non-invariant heap returns, I want to know what the rules for those are.  It seems to me that only allowing invariant heap data is a severe limitation that will leave D functional programming crippled in the face of an actual FP language.

One of the dreamed benefits of deciding that _D_ pure functions can only access invariant data is to ensure that the compiler can do memoisation when it wants to. That is, then the programmer doesn't have to do memoisation by hand. This lessens bugs, eases coding, and gives the compiler potentially big possibilities of very serious optimizing. (Not that DMD probably will achieve such in some time, but still. After all, we're designing a laguage here, too, which should make it _possible_ to write such compilers.)

If we define

 - pure functions can only receive immutable data as arguments
 - pure functions can only access immutable other data
 - pure functions can only return immutable data
 - pure functions can't change or output other than the return "value"
 - pure functions can only use other pure functions

and then we define

 - "return value" can also be a reference to immutable heap data
 - pure functions can use the heap as their local storage

and we decide that the last one means that pure functions can create and manipulate mutable data on the heap, but only so that nobody else has access to this data (including previous, later, or concurrent instances of the same function),

then we should be half-way to the goals A/W are trying to reach.

---

Now, why all this heap stuff?

Simply because even Real Functional Languages, that make it look like the stack is the only place in the world, can actually use the heap all they want, only they do it behind the back of the programmer. It's not about *literally* using the stack only, it's about the metaphor.

This metaphor is dictated by the fact that pure functionality means that stuff is passed via return values only.

Now, in D (which is a multi-paradigm, practical language, for Systems Programming), we don't want to restrict ourselves any more by purity than we'd want to do that with OO, for example.

Before C++ was invented, we had languages that were OO-only, and languages that were non-OO. (Even Tiobe still uses this distinction, and I always wonder to which category they've put D.) The reason for this either-or was simply that people were new to OO, and hadn't yet figured out how (or whether) they can mix OO and procedural.

To take another example, when I got a VIC-20 in the early eighties, it only had a Microsoft Basic interpreter. You couldn't do structured programming with it simply because the facilities were missing in the language. This made the programming effort more than exponential into the number of lines written. Today, I only do structural programming (as in versus unstructural), and admire Walter, who still mixes structured and unstructured, quite fluently. (Just see Phobos source, especially the C files.)

Now, with D we are charting new ground here. We are (arguably) the first to make a serious effort of mixing functional (as in pure) with procedural programming.


And this is where the heap stuff comes along. It would be impractical for D to pass everything via the stack. First of all, non-trivial progams would potentially need huge stack spaces, and in many environments the stack space allocated for a program is given at startup time. So, either we need to allocate stack just-in-case, or risk running out of stack in mid-run. Wasteful, and impolite, especially in a multiuser environment.

But to achieve this, we, at this point, have to decide that both input to pure functions and output (i.e. the return "value") have to be invariant. Only this lets us get away with only "pretending" to use the stack, while in essence we pass references to heap stuff via the stack.

---

Later, when we're more at ease with mixing functional with non-functional, we might well find ways to ease these restrictions. But that might be several years after we've actually started using this in real projects.


PS, oh Bob, I wish Walter would comment on this post, in any way. Suppose my understanding on this differs from that of A/W, then I'd only be misleading everybody here.

April 10, 2008
On 10/04/2008, Koroskin Denis <2korden+dmd@gmail.com> wrote:
>  OK, new is 'special', but what about malloc or my own allocator? (I happen
> to work in a gamedev and we never use new nor malloc).
>  I don't see how
>
>  special extern(C) void* malloc(size_t size);
>
>  differs from:
>
>  void* my_malloc(size_t size)
>  {
>    return malloc(size);
>  }
>
>  and why can't it be 'special' too.

Because you're modifying global state.

One of the reasons why new is special is because you never have to delete (because the garbage collector takes care of it). This is directly comparable to, for example, Haskell, where new "things" are created all the time, and are never freed. It's built into the system. More - it /is/ the system. That's the status of new. It's the de facto heap allocation method in D.

If you want to use custom allocators, there's nothing to stop you, but that doesn't change the fact that the minute you modify global state, you have a function that isn't pure.

I think you're worrying too much. There should never be a need to call malloc and free inside a pure function. If you need a function that does impure stuff, just write the function anyway, and don't call it pure.
April 10, 2008
On 10/04/2008, Janice Caron <caron800@googlemail.com> wrote:
> On 10/04/2008, Koroskin Denis <2korden+dmd@gmail.com> wrote:
>  >  OK, new is 'special', but what about malloc or my own allocator? (I happen
>  > to work in a gamedev and we never use new nor malloc).
>  >  I don't see how
>  >
>  >  special extern(C) void* malloc(size_t size);
>  >
>  >  differs from:
>  >
>  >  void* my_malloc(size_t size)
>  >  {
>  >    return malloc(size);
>  >  }

Sorry, I should have added: /neither/ of those functions is pure.

In fact, I strongly doubt that any function declared extern(C) will be pure, for the simple and obvious reason that C doesn't have the "pure" keyword. :-)
April 10, 2008
Steven Schveighoffer wrote:
> Now, I realize I have very similar code that initializes the int arrays, so I want to put that into a function that creates and initializes an array with the given parameters:
> 
> int[] createIncreasingSequence(int length, int start, int step)
> {
>     int[] result = new int[length];
>     for(int i = 0; i < result.length; i++)
>         result[i] = start + (i * step);
>     return result;
> }
> 
> Now, I think we all agree that createIncreasingSequence is as pure as calling new, right?  That is, it doesn't affect any global state (except for allocating heap data) and will return an equivalent array every time the same arguments are given.  But if I can't declare it as 'pure', then I can't call it from f and g:

Yes. I think your function is pure. Except for the fact that the returned heap data has to be pure.

> pure int f()
> {
>    int[] x = createIncreasingSequence(5, 1, 1);
> 
>    /* do stuff */
>    return result;
> }
> 
> pure int g()
> {
>    int[] y = createIncreasingSequence(15, 10, 3);
> 
>    /* do stuff */
>    return result;
> }
> 
> So should there be a way to define createIncreasingSequence such that I can call it from a pure function?  Or do I have to re-implement it in every pure function I use?

You should be able to call it from a pure function.

> Not having the ability to pass around mutable heap data to and from pure functions is going to limit severely the usefulness of pure functions.

This example does not need mutable data. And it is an example of how you can get along just fine with immutable data.

(It is also an example of CTFE. That is, already existing DMD should convert f and g into compile time constants. :-) But that's obviously besides the point here!)

April 10, 2008
"Janice Caron" wrote
> On 10/04/2008, Steven Schveighoffer wrote:
>> Of course, but it is like a building block.  If I have a function that is
>>  pure, and that function calls nothing but other pure functions AND
>> 'new',
>>  then why can't it also be proven to be pure?
>
> It's not the calling of new that's a problem, it's the returning of mutable data.
>
> If we allow "pureish" functions, then maybe a "pureish" function could return mutable data, but not a truly (a pure) function.
>
> The reason is that the compiler is allowed to cache the result (the return value from any pure function), and to re-use that result, instead of calling the function, whenever the input compares equal with last time. If any of that data is mutable, it all goes haywire.

I look at it differently.

To me, a pure function is a function that has no side effects and has promised to return the same thing every time it is called.  Allocating memory is the only function that is considered pure, but alters global state (i.e. the heap), and so we assume it's pure.  A function can still allocate data and be considered pure.

A function whose result can be memoized is to me a function that is pure, and also one that returns either stack or invariant data.  So a function has to be pure to be memoized, but not all pure functions can be memoized.

If you separate out the memoization optimization, then it's possible to have both worlds, without any extra keywords or terminology.  We are breaking new ground here, so this is the time to explore options that are not part of the precedent.

You might not get the compiler optimizations with a pure function that returns mutable heap data that you would with one that returns invariant heap data, but sometimes that is desirable.  The compiler cannot know all things about all optimizations that have been or ever will be.  At some point, it has to be nudged in the right direction.

> Seems to me that what's really needed here is a macro, because a macro is basically just a function that's always inlined, so the function body will be parsed /inside/ the pure specifier. Maybe macros could solve that problem completely.

Maybe, but this results in code bloat :)  There are always tradeoffs, it's a question of what is important to you.  I used to work with a device that had 256 bytes of RAM and 8K of code space.  When working on that device, I learned how important it was to split common functionality into functions.

-Steve


April 10, 2008
Leandro Lucarella wrote:
> Steven Schveighoffer, el 10 de abril a las 10:42 me escribiste:
> 
>>So let me restate: not having the ability to pass around UNIQUE mutable heap data to and from pure functions is going to limit severely the usefulness of pure functions.
> 
> 
> This is the exact same problem of the mutable methods of a class used in a
> stack allocated temporary of a pure function, talked in another thread.
> 
> class A {
> 	int x;
> 	void f() { x = 1; }
> }
> 
> pure void g()
> {
> 	scope a = new A;
> 	a.f();
> }
> 
> g() is clearly pure (it has no side effects, besides allocating memory in
> the stack, which shouldn't be considered a side effect, I guess allocating
> in the heap shouldn't be considered a side effect either, since new is
> thread safe). Even when A.f() is not pure, nor const.
> 
> There is an implementation problem, about how hard is to detect that by
> the compiler, but in theory there is no reason for g() not being pure.

g doesn't return a value.