Jump to page: 1 2
Thread overview
type variables
Aug 02, 2020
Bruce Carneal
Aug 02, 2020
Paul Backus
Aug 02, 2020
Bruce Carneal
Aug 02, 2020
Bruce Carneal
Aug 02, 2020
Stefan Koch
Aug 02, 2020
Bruce Carneal
Aug 02, 2020
Bruce Carneal
Aug 02, 2020
Paul Backus
Aug 02, 2020
Bruce Carneal
Aug 02, 2020
Stefan Koch
Aug 02, 2020
Paul Backus
Aug 02, 2020
Bruce Carneal
Aug 02, 2020
Stefan Koch
Aug 02, 2020
Paul Backus
Aug 02, 2020
Bruce Carneal
Aug 02, 2020
Paul Backus
Aug 02, 2020
Bruce Carneal
Aug 02, 2020
Stefan Koch
Aug 02, 2020
Bruce Carneal
August 02, 2020
Recent discussions in other threads, and at beerconf, have prompted thoughts about type variables.  Below is a snapshot of my thinking on the topic. Corrections/simplifications/additions are solicited, particularly from those who know more about this than I do.  (if you're in doubt, answer in the affirmative)


Type variables carry with them quite a few things.  Most understandably they carry structural information, how the builtin types and compile-time constants were combined to form this type.

They can also carry naming information: phobos Typedef, struct field names, method names, aliasings, scope information as to where/how the type was formed, ...

Current dlang type 'variables' are created in a strictly functional style.  IOW, they come from declarative syntax, CTFE constants, and possibly recursive application of templates.

Pure functional programming is great wrt correctness, it's working today, but it's not-so-great when it comes to readability/maintainability.  For starters, composition is a challenge.  Unwinding recursions in your head is another challenge.  Debugging is another challenge.

Additionally, any template recursions extend the type names in a very ugly way.  Yes, the extension will give you a unique (unintelligible ginormous) name but that's about it.  Seems that we should be able to get a unique mangle without the garbage, something that a human could read and have a prayer of understanding while the universe is still warm.

So, what if we had mutable type variables that were canonicalized/frozen/vetted by the compiler?

Mutations might initially be restricted to those that could be independently checked for correctness.  This could get us off the ground wrt composition and avoid a full (re)check when a correct/concrete type is needed.

Structural equivalence could be factored out from name equivalence, both for speed and functionality.

Type functions should become both more powerful and more readable.

I believe as Andrei apparently does, that we're still in the early phases of meta programming, that we need more not less meta capability, and that rebasing the compiler to, itself, be both simpler and more capable in the meta realm will pay big ongoing dividends.

As noted above, destruction requested.












August 02, 2020
On Sunday, 2 August 2020 at 00:19:46 UTC, Bruce Carneal wrote:
> Current dlang type 'variables' are created in a strictly functional style.  IOW, they come from declarative syntax, CTFE constants, and possibly recursive application of templates.

In D, currently, there are a few different ways you can refer to a type.

1) By its name.

    int x; // refers to `int` by name

2) By an alias.

    alias T = int;
    T x; // refers to `int` by the alias `T`

3) By a template parameter.

    template foo(T) {
        T x; // refers to any type by the parameter `T`
    }

4) By a typeof expression.

    typeof(42) x; // refers to `int` by the expression `typeof(42)`

5) By a __traits expression.

    struct Node {
        int data;
        __traits(parent, data)* next;
        // refers to `Node` by the expression `__traits(parent, data)`
    }

6) By a string mixin.

    mixin("int") x; // refers to `int` by the string mixin `mixin("int")`

By "type variables", I am going to assume you mean #2 and #3, aliases and template parameters, since they both involve giving an existing type a new name.

Currently, these "type variables" are immutable, in the sense that once a name is given to a type, the same name cannot later be given to a new type. For example, you are not allowed to write code like this:

    alias T = int;
    T x = 42;
    T = string; // error: can't assign to an alias
    T s = "hello";

In general, this is a good thing--having `T` refer to two different types at two different points in the program makes the code harder to read (because you have to keep track of which type it refers to) and harder to modify (because you have to make sure the changes to `T` and its uses remain in the correct order relative to one another).

However, in certain specific contexts, the ability to modify an alias or a template parameter can be useful. This is where proposals like type functions come in: they provide a context in which aliases like `T` can be mutated, while still leaving them immutable in "normal" D code.

> Pure functional programming is great wrt correctness, it's working today, but it's not-so-great when it comes to readability/maintainability.  For starters, composition is a challenge.  Unwinding recursions in your head is another challenge.  Debugging is another challenge.

The main issue with recursion is not that it is difficult to understand or maintain, but that it has poor performance. In order to process an argument list of length N using template recursion, N separate template instantiations are required. Using mutation instead of recursion would reduce the memory required by such templates from O(N) to O(1).

This is the primary motivation behind proposals like Stefan Koch's type functions and Manu's `...` operator.

> Additionally, any template recursions extend the type names in a very ugly way.  Yes, the extension will give you a unique (unintelligible ginormous) name but that's about it.  Seems that we should be able to get a unique mangle without the garbage, something that a human could read and have a prayer of understanding while the universe is still warm.

As far as I'm aware, the problem with ginormous names had nothing to do with template *recursion*. Rather, it resulted from the fact that code making heavy use of templates (for example, UFCS chains of std.algorithm and std.range functions) generated names in which the *same exact* type names were *repeated* many, many times.

If you know of an example of excessively large symbol names resulting *specifically* from template recursion, I would be interested to hear about it.
August 02, 2020
On Sunday, 2 August 2020 at 02:21:30 UTC, Paul Backus wrote:
> On Sunday, 2 August 2020 at 00:19:46 UTC, Bruce Carneal wrote:
>> Current dlang type 'variables' are created in a strictly functional style.  IOW, they come from declarative syntax, CTFE constants, and possibly recursive application of templates.
>
> In D, currently, there are a few different ways you can refer to a type.
>
> 1) By its name.
>
>     int x; // refers to `int` by name
>
> 2) By an alias.
..
> 3) By a template parameter.
..
> 4) By a typeof expression.
..
> 5) By a __traits expression.
..
> 6) By a string mixin.
..
> By "type variables", I am going to assume you mean #2 and #3, aliases and template parameters, since they both involve giving an existing type a new name.

I was thinking more about gaining access to a mutable form which could be converted to/from concrete types as represented by the compiler under the hood rather than the current methods of creating types in the source.  Note that "mixin" reduces to the others above and the others, of course, reduce to the compiler's internal form.

>
> Currently, these "type variables" are immutable, in the sense that once a name is given to a type, the same name cannot later be given to a new type. For example, you are not allowed to
...
>
> In general, this is a good thing--having `T` refer to two different types ...

Yes, unrestricted type mutation capability is a non goal. So today we have a large and growing zoo of forms that we can utilize to work with types.  A mutable class convertible to/from whatever the compiler is using "under the hood" might be a better way.  It might be used to implement the zoo while limiting further special casing.  Switching metaphors, we'd have less unsprung weight going forward.

>
> However, in certain specific contexts, the ability to modify an alias or a template parameter can be useful. This is where proposals like type functions come in: they provide a context in which aliases like `T` can be mutated, while still leaving them immutable in "normal" D code.

Yes, there are other ways to achieve better readability without exposing types as completely as I've sketched.

>
>> Pure functional programming is great wrt correctness, it's working today, but it's not-so-great when it comes to readability/maintainability.  For starters, composition is a challenge.  Unwinding recursions in your head is another challenge.  Debugging is another challenge.
>
> The main issue with recursion is not that it is difficult to understand or maintain, but that it has poor performance. In order to process an argument list of length N using template recursion, N separate template instantiations are required. Using mutation instead of recursion would reduce the memory required by such templates from O(N) to O(1).
>
> This is the primary motivation behind proposals like Stefan Koch's type functions and Manu's `...` operator.
>

Yes, I misspoke.  It's the potential expanse of the template invocations that's the issue.  Recursion tends to improve readability.

>> Additionally, any template recursions extend the type names in a very ugly way.  Yes, the extension will give you a unique (unintelligible ginormous) name but that's about it.  Seems that we should be able to get a unique mangle without the garbage, something that a human could read and have a prayer of understanding while the universe is still warm.
>
> As far as I'm aware, the problem with ginormous names had nothing to do with template *recursion*. Rather, it resulted from the fact that code making heavy use of templates (for example, UFCS chains of std.algorithm and std.range functions) generated names in which the *same exact* type names were *repeated* many, many times.
>
> If you know of an example of excessively large symbol names resulting *specifically* from template recursion, I would be interested to hear about it.

Yes, as above I misspoke here, it's not recursion that's the issue rather it's the mechanism that is employed wrt naming that could show up when recurring (but mostly elsewhere).

Factoring out naming from the "anonymous" structural aspects of types seems like a good way to go.  If people want to match on type structure for some reason, great.  If they want to create ginormous names, well, OK.

Thanks for the response.
















August 02, 2020
On Sunday, 2 August 2020 at 04:08:00 UTC, Bruce Carneal wrote:
[snip]
>
> Factoring out naming from the "anonymous" structural aspects of types seems like a good way to go.  If people want to match on type structure for some reason, great.  If they want to create ginormous names, well, OK.
>

Did some background reading on LLVM.  The LLVM type system provides both named and unnamed literal/structural forms.

Type equality checks within LLVM, both named and otherwise, appear to be non-trivial in general (saw a 2020 paper highlighting the screw-ups and workarounds).

Earlier LLVM work aimed to speed things up there by reducing the cost of "uniqueing" mutable type representations (turning deep/expensive equality checks in to pointer comparisons IIUC).  As a guess, DMD already does this type of thing, separating deeper commonalities from shallower differences (names, attributes, and the like).

Looks like many of LLVM's more recent problems stem from C/C++ related issues.  Again not a problem for DMD.

Still, even concrete/lowered type representation is much less a "solved" problem than I imagined if LLVM is anything to go by.

The improvements suggested by Manu and Stefan are looking pretty good.









August 02, 2020
On Sunday, 2 August 2020 at 06:36:44 UTC, Bruce Carneal wrote:

> Earlier LLVM work aimed to speed things up there by reducing the cost of "uniqueing" mutable type representations (turning deep/expensive equality checks in to pointer comparisons IIUC).
>  As a guess, DMD already does this type of thing, separating deeper commonalities from shallower differences (names, attributes, and the like).

No, dmd uses the mangle to compare types,
as the mangle has to be unique and identical for identical types.


> Looks like many of LLVM's more recent problems stem from C/C++ related issues.  Again not a problem for DMD.
>
> Still, even concrete/lowered type representation is much less a "solved" problem than I imagined if LLVM is anything to go by.

Hmm yes, everyone has their own type representation, as much as I like bashing LLVM
for this they can't be faulted as they try to be a general framework.

> The improvements suggested by Manu and Stefan are looking pretty good.

Thanks!
August 02, 2020
On Sunday, 2 August 2020 at 02:21:30 UTC, Paul Backus wrote:

>> Pure functional programming is great wrt correctness, it's working today, but it's not-so-great when it comes to readability/maintainability.  For starters, composition is a challenge.  Unwinding recursions in your head is another challenge.  Debugging is another challenge.
>
> The main issue with recursion is not that it is difficult to understand or maintain, but that it has poor performance. In order to process an argument list of length N using template recursion, N separate template instantiations are required. Using mutation instead of recursion would reduce the memory required by such templates from O(N) to O(1).
>
> This is the primary motivation behind proposals like Stefan Koch's type functions and Manu's `...` operator.

While performance is indeed the original motivation,
I come to appreciate being able to use imperative style
in type operations a lot.
It's always good to have the choice.

Take this for example.
int[] iota(int start, int finish)
{
    int[] result = [];
    result.length = finish - start;
    foreach(i;start .. finish)
    {
        result[i - start] = i;
    }
    return result;
}

Whereas the recursive function for this,
is something I do not even want to put on here.
Because it's way to complicated for this simple task.

>> Additionally, any template recursions extend the type names in a very ugly way.  Yes, the extension will give you a unique (unintelligible ginormous) name but that's about it.  Seems that we should be able to get a unique mangle without the garbage, something that a human could read and have a prayer of understanding while the universe is still warm.
>
> As far as I'm aware, the problem with ginormous names had nothing to do with template *recursion*. Rather, it resulted from the fact that code making heavy use of templates (for example, UFCS chains of std.algorithm and std.range functions) generated names in which the *same exact* type names were *repeated* many, many times.
>
> If you know of an example of excessively large symbol names resulting *specifically* from template recursion, I would be interested to hear about it.

There are some examples where, even with mangle compression the mangles are still execcsively long.
Large string swtiches are the easy demonstaration here.
Those are not due to recusion but recursion certainly does not help.

Because the intermediate templates are created explicitly and therefore can't be optimised away in the general(!) case.
August 02, 2020
On Sunday, 2 August 2020 at 12:16:32 UTC, Stefan Koch wrote:
> On Sunday, 2 August 2020 at 06:36:44 UTC, Bruce Carneal wrote:
>
>> Earlier LLVM work aimed to speed things up there by reducing the cost of "uniqueing" mutable type representations (turning deep/expensive equality checks in to pointer comparisons IIUC).
>>  As a guess, DMD already does this type of thing, separating deeper commonalities from shallower differences (names, attributes, and the like).
>
> No, dmd uses the mangle to compare types,
> as the mangle has to be unique and identical for identical types.

Thanks for this peek under the hood.  I've read a little of the sdc code but have not dipped in to dmd yet.  Should have done so before opining as I did.

>
>
>> Looks like many of LLVM's more recent problems stem from C/C++ related issues.  Again not a problem for DMD.
>>
>> Still, even concrete/lowered type representation is much less a "solved" problem than I imagined if LLVM is anything to go by.
>
> Hmm yes, everyone has their own type representation, as much as I like bashing LLVM
> for this they can't be faulted as they try to be a general framework.
>

Yes. Reportedly LLVM has had problems with a mutable type representation that is "uniqueified" after the fact.  IIUC, DMD simplifies this significantly by constraining the evolution of the internal type representation.  That constraint appears to be so strong, strongly pure IIUC, that unique names could be provided far upstream of any deep template invocations.

>> The improvements suggested by Manu and Stefan are looking pretty good.
>
> Thanks!


August 02, 2020
On Sunday, 2 August 2020 at 14:42:39 UTC, Bruce Carneal wrote:
> On Sunday, 2 August 2020 at 12:16:32 UTC, Stefan Koch wrote:
>>
>> Hmm yes, everyone has their own type representation, as much as I like bashing LLVM
>> for this they can't be faulted as they try to be a general framework.
>>
>
> Yes. Reportedly LLVM has had problems with a mutable type representation that is "uniqueified" after the fact.  IIUC, DMD simplifies this significantly by constraining the evolution of the internal type representation.  That constraint appears to be so strong, strongly pure IIUC, that unique names could be provided far upstream of any deep template invocations.
>

Separately, isn't structural comparison (comparison without names) important for template bloat reduction?  Does DMD use names and attributes to qualify access up front but reduce to low level equivalence before handing off to the back end?

Since you have to let the back end do deduplication in the LTO case my guess is that the front end doesn't waste time on it.



August 02, 2020
On Sunday, 2 August 2020 at 12:31:00 UTC, Stefan Koch wrote:
> Take this for example.
> int[] iota(int start, int finish)
> {
>     int[] result = [];
>     result.length = finish - start;
>     foreach(i;start .. finish)
>     {
>         result[i - start] = i;
>     }
>     return result;
> }
>
> Whereas the recursive function for this,
> is something I do not even want to put on here.
> Because it's way to complicated for this simple task.

In fact, the naive recursive version of iota is actually even simpler:

int[] iota(int start, int finish)
{
    if (start > finish)
        return [];
    else
        return [start] ~ iota(start + 1, finish);
}

Again, the problem with this function is not complexity or readability, but performance. It performs many unnecessary memory allocations, and consumes (finish - start) stack frames instead of 1.
August 02, 2020
On Sunday, 2 August 2020 at 15:04:07 UTC, Paul Backus wrote:
> On Sunday, 2 August 2020 at 12:31:00 UTC, Stefan Koch wrote:
>> Take this for example.
>> int[] iota(int start, int finish)
>> {
>>     int[] result = [];
>>     result.length = finish - start;
>>     foreach(i;start .. finish)
>>     {
>>         result[i - start] = i;
>>     }
>>     return result;
>> }
>>
>> Whereas the recursive function for this,
>> is something I do not even want to put on here.
>> Because it's way to complicated for this simple task.
>
> In fact, the naive recursive version of iota is actually even simpler:
>
> int[] iota(int start, int finish)
> {
>     if (start > finish)
>         return [];
>     else
>         return [start] ~ iota(start + 1, finish);
> }
>
> Again, the problem with this function is not complexity or readability, but performance. It performs many unnecessary memory allocations, and consumes (finish - start) stack frames instead of 1.

Yes.  Sorry to have misspoken wrt recursion early on in the thread.  As you note here recursion can be, and often is, easy on the eyes.  And, as you also note, that seductive simplicity sometimes masks a not-insignificant price.

« First   ‹ Prev
1 2