Ideas, thoughts, and criticisms, part two. About functions.

August 27, 2002
Posted by Antti Sykäri
Ideas, thoughts, and criticisms (part 2)
----------------------------------------

(The topic says functions. Today's rant is actually about a bit more than that: multiple-value expressions, the unification of expressions and statements, and a sidenote about how to implement generics like I would do it. (The C++ way that is. *g*))

Here we go again.

I thought a bit about functions.

Let's start from an example.

http://www.digitalmars.com/d/function.html contains the following example:

int foo(int x, out int y, inout int z, int q);

To me it just doesn't seem right. Something is wrong. Maybe it is the words "inout" and "out", which I just feel should not be there. (Sorry, all due respect to IDL, but I don't use it, I don't like it, and I don't think it's worthwhile modeling a language to match IDL syntax. Just my opinion. Just my two cents, or something.)

Probably it's just my view that there should not be _two_ places where a function can return values: the "ordinary" return value and then the "out" return value. Could the function not return multiple values?

int, int foo(int x, inout int z, int q);

Now something is still wrong. I moved the function's second argument, "int y", to the left of the function name. Everything's fine except that we can't see its name there. We cannot document the meaning of the function's return values, because we cannot give them names.

But it has always been so. A function's return value has indeed been nameless. In C, that is. But I don't see why it should be that way. Let's give names to the return values of the function:

int val, int y
foo(int x, inout int z, int q);

Now that's pretty. We have documented our return values (they have names), functions can return multiple values, and they are all in one place. Now we need to see what we can do about the poor inout variable. Maybe a reference à la C++?

int val, int y
foo(int x, int& z, int q);

IMHO this syntax is nicer to read because there are no "inout" words there, and the "in" and "out" values are grouped nicely.
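For comparison, here is a rough sketch of how the same grouping can be approximated in C++ as it exists today, assuming std::tuple for the return values. The body of foo is made up purely for illustration, since the original example only gives a declaration.

```cpp
#include <tuple>

// Hypothetical body: the original only shows the declaration of foo.
// Returns (val, y); z stays an in/out parameter, passed by reference.
inline std::tuple<int, int> foo(int x, int& z, int q)
{
    int val = x + q;   // made-up value for the ordinary return value
    int y   = x * q;   // made-up value for what used to be "out int y"
    z += 1;            // the in/out parameter is read and updated in place
    return std::make_tuple(val, y);
}
```

With C++17 the caller can unpack the result as `auto [val, y] = foo(2, z, 3);`, which is close in spirit to the proposed syntax, although the names live at the call site rather than in the declaration.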

This would also get rid of a little annoyance in C++: if you want a function that takes no arguments, you just write:

int f();        // function taking no arguments, returning int

but if you want to return nothing, you have to use void:

void f(int);    // function taking int, returning nothing

Now, if you want to return nothing, just use:

f();

This return value thing has some other implications, too. It doesn't come cheap. Grab a good hold on your chairs.

We need a new type of expression altogether; let's call it a multivalued expression. Like I said, I'm no grammar expert, so I'm not going to concentrate on that side. But it seems we'd have to scrap the comma expression that has served us so well. Let's figure out what to replace it with later, and for now concentrate on the multivalued function thing.

int val1, int val2 f();

g()
{
    // you could, of course, do this:
    int x, y;
    x, y = f();

    // and naturally this:
    x, y = y, x;

    // and maybe this:
    int a, b = f();

    // now if you wanted to ignore one of the return values, what to
    // do then?
    x = f();    // only take the first one
    , y = f();  // only take the second one (but this does not fulfill my
                // aesthetic needs, so maybe not like this...)
    // so perhaps something else is needed (this _could_ be interpreted
    // as assignment to an unnamed variable of type void, or just as
    // syntactic sugar for "int dummy, y = f();"):
    void, y = f();
}
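As a point of reference, each of the assignments in g() above has a rough counterpart in today's C++ standard library. The sketch below assumes a hypothetical f() that returns the made-up values (1, 2); std::tie and std::ignore play the roles of the multi-value assignment and the "void" placeholder.

```cpp
#include <tuple>

// Hypothetical f(): the values 1 and 2 are invented for illustration.
inline std::tuple<int, int> f()
{
    return std::make_tuple(1, 2);
}

// Mirrors the assignments in g() above and returns the sum of what it read.
inline int g()
{
    int x = 0, y = 0;
    std::tie(x, y) = f();                    // x, y = f();       x == 1, y == 2
    std::tie(x, y) = std::make_tuple(y, x);  // x, y = y, x;      x == 2, y == 1
    auto [a, b] = f();                       // int a, b = f();   a == 1, b == 2
    x = std::get<0>(f());                    // x = f();          x == 1
    std::tie(std::ignore, y) = f();          // void, y = f();    y == 2
    return x + y + a + b;
}
```

The library version is noisier than the proposed syntax, which is more or less the point of the proposal.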

While we're at it, why not exploit the naming of the return values a bit. We could actually represent returning multiple values as returning a struct. Suppose that we have two f's, so we need to specify one in order to get a type out of it:

int   val1,   int val2 f(int x)     { return x, x*2;    }
float val1, float val2 f(float x)   { return x, x*2.0f; }

// assuming that the form f() is unevaluated, and only its type is
// requested via the .type property:
// for practical purposes, f_int_return_type is a struct { int val1; int val2; }
// f(int) is there just for disambiguation purposes.
alias f(int).type f_int_return_type;

g()
{
    f_int_return_type ret = f(5);
    assert(ret.val2 == 10);
}

With this, and the "with" expression (or even without this, but with "with" expression and some sweet syntactic sugar) we could also do

g()
{
    with (f(5))
    {
        do_whatever_with(val1, val2);
    }
}
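Something close to both ideas can be sketched in C++17, assuming explicitly declared result structs (IntResult and FloatResult are names invented here) in place of the implicit struct the proposal would generate. decltype on an unevaluated call stands in for the ".type" property, and a structured binding stands in for "with".

```cpp
#include <type_traits>
#include <utility>

// Hypothetical result types; the proposal would generate these implicitly.
struct IntResult   { int   val1; int   val2; };
struct FloatResult { float val1; float val2; };

inline IntResult   f(int x)   { return {x, x * 2}; }
inline FloatResult f(float x) { return {x, x * 2.0f}; }

// The call inside decltype is never evaluated; it only selects the overload,
// much like "f(int).type" in the text.
using f_int_return_type = decltype(f(std::declval<int>()));

static_assert(std::is_same<f_int_return_type, IntResult>::value,
              "f(int) should yield IntResult");

// Rough counterpart of: with (f(5)) { do_whatever_with(val1, val2); }
inline int sum_of_results()
{
    auto [val1, val2] = f(5);
    return val1 + val2;
}
```

Here `f_int_return_type ret = f(5);` gives a value whose `ret.val2` is 10, as in the assert of the original example.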

Now, there is just no end to the syntactic sugar if that's the way to go. Wouldn't it taste good if you could do the following:

int, int f(int x) {           // of course, you don't need to name them if you
    return x, x+1;            // don't want to. I just didn't bother to
}                             // invent names right now

g(int first, int second, int third)
{
    printf("%d, %d, %d\n", first, second, third);
}

main()
{
    g(1, f(2));     // will print "1, 2, 3"
}

Now how does that sound?
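For what it's worth, splicing a multi-value result into an argument list can be imitated in C++17 with std::tuple_cat and std::apply. This is only a sketch of the idea: g is changed here to return its formatted text instead of printing it, and call_g_spliced is a helper name made up for the example.

```cpp
#include <cstdio>
#include <string>
#include <tuple>

inline std::tuple<int, int> f(int x)
{
    return std::make_tuple(x, x + 1);
}

// Returns the formatted line instead of printing it, so it is easy to check.
inline std::string g(int first, int second, int third)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%d, %d, %d", first, second, third);
    return buf;
}

// Rough counterpart of g(1, f(2)): prepend 1 to the tuple returned by f(2),
// then unpack the resulting three-element tuple into g's parameters.
inline std::string call_g_spliced()
{
    return std::apply(g, std::tuple_cat(std::make_tuple(1), f(2)));
}
```

The call has to spell out the splicing by hand, whereas in the proposal the compiler would flatten f(2) into the argument list on its own.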

And of course, we wouldn't have to specify the return values on a "return" line if we didn't want to. It could be done as they do in Pascal, by assigning to the named return values (I'm not a Pascal programmer, mind you, but I could imagine this style of returning values being easier to optimize or whatever. I'm no premature optimizer either. No way, not me. *g*).

For example, suppose that we have a function that calculates a sine and a cosine. That's not really out of this world: the x87 math coprocessor has an instruction for it. It's called fsincos (see intel:ia32_vol1), and it produces the sine and cosine of an argument faster than an fsin and an fcos in succession. So, if we would like to commit a little sin (no pun intended) and premature-optimize a bit:

module intrinsic; // or whatever

float sin_ret,
float cos_ret
inline sincos(float argument)
{
    asm
    {
        // now, here we assume that the compiler can handle putting
        // argument and the sine and cosine in the right places...
        // I might be wrong in my assembly, I just learned it yesterday.
        // This is just an example.
        fld     argument    // push the argument onto the FPU stack
        fsincos             // leaves the cosine in st(0), the sine in st(1)
        fstp    cos_ret     // pop the cosine
        fstp    sin_ret     // pop the sine
    }
}
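A portable stand-in for the same interface can be sketched in C++ without any inline assembly. The struct SinCos and the function name sincos_pair are invented for this sketch, and whether a given compiler fuses the two library calls into a single instruction is not something the code below can promise.

```cpp
#include <cmath>

// Hypothetical result type carrying the two named return values.
struct SinCos { float sin_ret; float cos_ret; };

// Calls std::sin and std::cos separately; any fusing into one
// instruction is left entirely to the compiler and library.
inline SinCos sincos_pair(float argument)
{
    return {std::sin(argument), std::cos(argument)};
}
```

A caller would write something like `SinCos r = sincos_pair(angle);` and then use `r.sin_ret` and `r.cos_ret`.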

Or, for a more common example, everyone probably knows that the DIV instruction on the IA-32 architecture (intel:ia32_vol2) produces not only the quotient but also the remainder as a side effect; so we could actually make

int quotient, int remainder inline op.div(int dividend, int divisor)
{
    // assembly code to DIV dividend by divisor,
    // and move EAX to quotient (which probably is EAX anyway)
    // and move EDX to remainder.
    // If one of the return values turns out not to be used, the
    // compiler is smart enough to drop the unneeded MOV from EDX/EAX.
}

and then we could have (since modulo is in no way special)

int remainder, int quotient inline op.mod(int dividend, int divisor)
{
    quotient, remainder = dividend / divisor; // will call op.div()
}

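The C and C++ standard libraries already expose this pairing through std::div, which returns the quotient and remainder together. A small sketch, with DivResult, div_mod and mod being names made up here:

```cpp
#include <cstdlib>

// Hypothetical result type with the two named return values from the text.
struct DivResult { int quotient; int remainder; };

// One library call yields both values (truncating toward zero).
inline DivResult div_mod(int dividend, int divisor)
{
    std::div_t r = std::div(dividend, divisor);
    return {r.quot, r.rem};
}

// Modulo defined in terms of the combined operation, as suggested above.
inline int mod(int dividend, int divisor)
{
    return div_mod(dividend, divisor).remainder;
}
```

For instance, `div_mod(17, 5)` yields a quotient of 3 and a remainder of 2.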
Oh yeah, I forgot what to do with the poor comma operator.

My idea would be to simply do away with it and replace it with the block expression, so that

expression1, expression2;
would yield expression1 in a single-valued context
(and the pair (expression1, expression2) in a multivalued context)

and

{ expression1;
  expression2; }

would yield expression2, and that would be more general anyway.
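To make the contrast concrete: in C and C++ as they stand, the comma operator evaluates both operands and yields the right-hand one, which is exactly the job the block expression would take over. The sketch below shows the existing comma operator next to an immediately invoked lambda, which is one way to approximate a block that yields its last expression. Both function names are invented for the example.

```cpp
// Today's comma operator: evaluates left to right, yields the right operand.
inline int comma_result(int& counter)
{
    return (++counter, counter * 10);
}

// A block used as an expression, approximated with a lambda that is
// called on the spot; the last "return" plays the role of expression2.
inline int block_result(int& counter)
{
    return [&] {
        ++counter;             // expression1
        return counter * 10;   // expression2, the value of the whole block
    }();
}
```

Both functions produce the same value for the same input, so nothing is lost by moving the "yield the last one" behaviour from the comma to the block.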

Of course, that would require a serious rearrangement of things. An expression would be a statement, and a statement would be an expression. And everything would have to be functional, not just half-functional as it is now.

Besides, I like the idea of functional-style "if" where if is an expression, not a statement:

// the return value need not be named, since the name of the function
// documents it clearly enough
int max(int lhs, int rhs)
{
     return
        if (lhs < rhs)
            rhs;
        else
            lhs;
}

Of course, that can be taken further by making the last statement/expression of the function yield the return value:

int max(int lhs, int rhs)
{
    // if yields one of its statements
    if (lhs < rhs)
        rhs;
    else
        lhs;
}

(This of course renders needless that weird trophy of the last century, the infamous ?: operator, which is a bit annoying to use repeatedly. You have possibly seen this, and run away from it as fast as you could:

    int x = test1() ? something
                    : test2() ? somethingElse
                              : testEvenMore() ? yetanotherthing
                                               : lastTest() ? awwintired
                                                            : phew;
)

If you went so far as to make tail recursion optimization a feature of the language (as in Scheme), we would be sure to see code like:

// recursive helper function
private int inline max(int[] array, int maxSoFar, uint idx)
{
    if (idx == array.length)
        maxSoFar;
    else
        max(array, max(array[idx], maxSoFar), idx + 1);

}

// the main entrance to the helper function.
// return maximum entry in the array.
// in case of empty array, return int.min
int inline max(int[] array)
{
    return max(array, int.min, 0);
}

Quelle élégance! Well, that's a matter of opinion, but this kind of stuff _would_ make it easier to use functional-style elements in D and make it even more multi-paradigm than C++ is. (For better or for worse.)
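The same accumulator-passing shape can be written in C++ today, with the caveat that the C++ standard does not guarantee tail-call elimination, so deep recursion may still consume stack. This is a sketch using std::vector in place of a D array; max_from and max_of are names chosen here to avoid clashing with std::max.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Recursive helper: carries the best value seen so far as an accumulator.
// The recursive call is in tail position, but C++ does not promise to
// turn it into a loop.
inline int max_from(const std::vector<int>& array, int maxSoFar, std::size_t idx)
{
    if (idx == array.size())
        return maxSoFar;
    else
        return max_from(array, std::max(array[idx], maxSoFar), idx + 1);
}

// Entry point: returns the maximum entry, or the smallest int for an
// empty array (the counterpart of int.min in the text).
inline int max_of(const std::vector<int>& array)
{
    return max_from(array, std::numeric_limits<int>::min(), 0);
}
```

With the if-as-expression idea from earlier, the two explicit "return" keywords in the helper are exactly what would disappear.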

Now while we're at it, and we've invented multivalued expressions (tuples, if you like that name better) on the way, let's go generic. But let's not go totally generic quite yet. Only generic when it comes to the argument values. See, let's first introduce a new keyword, "rest", our friend on the way to the general handling of the parameters:

int inline sum()
{
    return 0;
}

// calculate the sum of its arguments.
int inline sum(int head, rest tail)
{
    // head is the first argument,
    // and "tail", formed with the help of the "rest" keyword,
    // is the tuple containing the rest of them. tail may be
    // empty, which is as good as "void".
    // Of course, you'd better inline this function.
    return head + sum(tail);
}

Now what happens when the compiler starts to generate the code for, for example, sum(1, 2, 3), is:

sum(1, 2, 3)
-> sum(head = 1, tail = (2, 3))
   -> 1
    + sum(2, 3)
      -> 2
       + sum(3)
         -> 3
          + sum()
            -> 0
-> 1 + 2 + 3
-> 6

And then it stops. If the user had provided something like sum(1, x, 2) instead, it would've generated the code to calculate 1 + x + 2. Et cetera. Of course, there might be a way to do stuff like this at run time too, but it just might not be needed because of arrays. But let's see something that arrays are not able to do. On to the generics.
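The head/rest recursion in the sum example maps fairly directly onto a C++11 variadic template, where a parameter pack plays the part of the "rest" keyword. A minimal sketch, assuming all arguments are ints as in the original:

```cpp
// Base case: no arguments left, the sum is 0.
constexpr int sum()
{
    return 0;
}

// Recursive case: "head" is the first argument and the pack "tail"
// holds the rest of them (possibly none).
template <typename... Rest>
constexpr int sum(int head, Rest... tail)
{
    return head + sum(tail...);
}

// With constant arguments the whole expansion is done at compile time.
static_assert(sum(1, 2, 3) == 6, "1 + 2 + 3");
```

A call such as `sum(1, x, 2)` with a run-time x expands to the same chain of calls, which the compiler is free to inline down to 1 + x + 2.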

I'd really really really very much like to have a generic function syntax which I could use like I can in C++. Only if it can be fitted into the existing D language, of course.

Like this:

print() { }

print(int i)    {    printf("%d", i);    }
print(float f)  {    printf("%f", f);    }
print(String s) {  /* print a string */  }
print(...)      { /* you get the point*/ }

// note we don't need a template parameter for tail since it's
// implicitly matched to be the right type
template<type T>
inline print(T head, rest tail)
{
    print(head);
    print(tail);
}

Then you could do, type-safely,

print("value of i is = ", i, "\n", /* as many arguments as you like */);

and the compiler will generate you nice code for that. Inline-only. Of course.

Of course, it goes without saying that C++ did this ages ago with its << syntax, which is IMHO kind of a cool thing, but it turns out some people don't like it. Maybe this syntax is friendlier. It can probably be made better. Give me your best shots.

And please apply the same C++-style generic function syntax to the examples above (you probably did already) and see how it shines in comparison to any other generic syntax ever invented. It would be very nice to see that in D, too.

Oh yeah, let me warn you (regarding multiple return values): functions returning multiple values might (or might not) also mess with function pointer syntax and/or overloading issues. (I'm thinking now that since an i32 (*f)() must pass back just one value, an i32, i32 (*f)() could not be used as one, because it might clobber the stack or registers or whatever. But on the other hand, if it uses an unused part of the stack, what the hell: just leave the caller the first value and leave to the programmer the problem that he has a function which does unused computation.)

Say what you think.

Antti.

References:

(intel:ia32_vol1)
http://www.intel.com/design/pentium4/manuals/245470.htm

(intel:ia32_vol2)
http://www.intel.com/design/pentium4/manuals/245471.htm