Jump to page: 1 2 3
Thread overview
Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators
Jun 21, 2018
Per Nordlöw
Jun 22, 2018
Stefan Koch
Jun 22, 2018
Stefan Koch
Jun 24, 2018
Per Nordlöw
Jun 24, 2018
Seb
Jun 24, 2018
Per Nordlöw
Jun 24, 2018
Per Nordlöw
Jun 24, 2018
Per Nordlöw
Jun 24, 2018
Per Nordlöw
Jun 24, 2018
Per Nordlöw
Jun 24, 2018
Timoses
Jun 25, 2018
Jonathan M Davis
Jun 25, 2018
Per Nordlöw
Jun 25, 2018
Per Nordlöw
Jun 25, 2018
Jonathan M Davis
Jun 25, 2018
Timoses
Jun 25, 2018
Per Nordlöw
Jun 24, 2018
Per Nordlöw
Jun 24, 2018
Jonathan M Davis
Jun 24, 2018
Nordlöw
June 21, 2018
I've discovered the annoying fact that std.conv.to doesn't scale for enum to string conversion when the enum has hundreds of members. This because of a call to `NoDuplicates` which has (at least) O(n*log(n) time and space complexity.

So I've come up with


/** Faster implementation of `std.conv.to`.
 */
string toString(T)(T value) @safe pure nothrow @nogc
if (is(T == enum))
{
    final switch (value)
    {
        static foreach (member; __traits(allMembers, T))
        {
        case __traits(getMember, T, member):
            return member;
        }
    }
}

///
@safe pure nothrow @nogc unittest
{
    enum E { unknown, x, y, z, }
    assert(E.x.toString == "x");
    assert(E.y.toString == "y");
    assert(E.z.toString == "z");
}


The question know is: How do I make this support enums with enumerator aliases without needing to call `NoDuplicates`?

For instance, this should work

///
@safe pure nothrow @nogc unittest
{
    enum E { unknown, x, y, z, z_ = z, }
    assert(E.x.toString == "x");
    assert(E.y.toString == "y");
    assert(E.z.toString == "z");
    assert(E.z_.toString == "z");
}

June 21, 2018
On 6/21/18 6:46 PM, Per Nordlöw wrote:
> I've discovered the annoying fact that std.conv.to doesn't scale for enum to string conversion when the enum has hundreds of members. This because of a call to `NoDuplicates` which has (at least) O(n*log(n) time and space complexity.
> 
> So I've come up with
> 
> 
> /** Faster implementation of `std.conv.to`.
>   */
> string toString(T)(T value) @safe pure nothrow @nogc
> if (is(T == enum))
> {
>      final switch (value)
>      {
>          static foreach (member; __traits(allMembers, T))
>          {
>          case __traits(getMember, T, member):
>              return member;
>          }
>      }
> }
> 
> ///
> @safe pure nothrow @nogc unittest
> {
>      enum E { unknown, x, y, z, }
>      assert(E.x.toString == "x");
>      assert(E.y.toString == "y");
>      assert(E.z.toString == "z");
> }
> 
> 
> The question know is: How do I make this support enums with enumerator aliases without needing to call `NoDuplicates`?

You can't. That's why it's in there. And I can't think of a better algorithm than O(nlgn). That's what it would cost to sort anyway.

The sucky thing is, the compiler is *already* doing a sort on the items in the switch, and *already* doing the duplicate check. It would be cool to be able to leverage this mechanism to avoid the library solution, but I don't know how we can do that, as the semantics for switch are well defined, and there's no other way to hook this builtin functionality.

One thing that may be worthwhile is checking to see if a CTFE solution is better than a template solution. In other words:

string[] dedup(string[] identifiers) { ... }

enum items = dedup(__traits(allMembers, T));

static foreach(member; items) ...

Just avoiding all the template symbol generation may make it worth it, even if the dedup function is O(n^2) complexity.

-Steve
June 22, 2018
On Friday, 22 June 2018 at 00:50:05 UTC, Steven Schveighoffer wrote:
> On 6/21/18 6:46 PM, Per Nordlöw wrote:
>> I've discovered the annoying fact that std.conv.to doesn't scale for enum to string conversion when the enum has hundreds of members. This because of a call to `NoDuplicates` which has (at least) O(n*log(n) time and space complexity.
>> 
>> So I've come up with
>> 
>> 
>> /** Faster implementation of `std.conv.to`.
>>   */
>> string toString(T)(T value) @safe pure nothrow @nogc
>> if (is(T == enum))
>> {
>>      final switch (value)
>>      {
>>          static foreach (member; __traits(allMembers, T))
>>          {
>>          case __traits(getMember, T, member):
>>              return member;
>>          }
>>      }
>> }
>> 
>> ///
>> @safe pure nothrow @nogc unittest
>> {
>>      enum E { unknown, x, y, z, }
>>      assert(E.x.toString == "x");
>>      assert(E.y.toString == "y");
>>      assert(E.z.toString == "z");
>> }
>> 
>> 
>> The question know is: How do I make this support enums with enumerator aliases without needing to call `NoDuplicates`?
>
> You can't. That's why it's in there. And I can't think of a better algorithm than O(nlgn). That's what it would cost to sort anyway.
>
> The sucky thing is, the compiler is *already* doing a sort on the items in the switch, and *already* doing the duplicate check. It would be cool to be able to leverage this mechanism to avoid the library solution, but I don't know how we can do that, as the semantics for switch are well defined, and there's no other way to hook this builtin functionality.
>
> One thing that may be worthwhile is checking to see if a CTFE solution is better than a template solution. In other words:
>
> string[] dedup(string[] identifiers) { ... }
>
> enum items = dedup(__traits(allMembers, T));
>
> static foreach(member; items) ...
>
> Just avoiding all the template symbol generation may make it worth it, even if the dedup function is O(n^2) complexity.
>
> -Steve


I do have my own ctfe utils for this
{https://forum.dlang.org/post/bfnwstkafhfgihavtzsz@forum.dlang.org}, however without dedup (because I'd rather be alerted when it happens) what you have to do is to sort the enum values.
(I'd suggest a non-recursive mergesort.)
However if there are no duplicates there's no need to deduplicate, the reason it is deduplicated it because the switch_statement forbids duplicates.

I may add that this performs very well with newCTFE I don't know about the old engine, but I'd assume even there it's faster.
June 22, 2018
On 6/22/18 1:41 PM, Stefan Koch wrote:
> On Friday, 22 June 2018 at 00:50:05 UTC, Steven Schveighoffer wrote:
>> On 6/21/18 6:46 PM, Per Nordlöw wrote:
>>> I've discovered the annoying fact that std.conv.to doesn't scale for enum to string conversion when the enum has hundreds of members. This because of a call to `NoDuplicates` which has (at least) O(n*log(n) time and space complexity.
>>>
>>> So I've come up with
>>>
>>>
>>> /** Faster implementation of `std.conv.to`.
>>>   */
>>> string toString(T)(T value) @safe pure nothrow @nogc
>>> if (is(T == enum))
>>> {
>>>      final switch (value)
>>>      {
>>>          static foreach (member; __traits(allMembers, T))
>>>          {
>>>          case __traits(getMember, T, member):
>>>              return member;
>>>          }
>>>      }
>>> }
>>>
>>> ///
>>> @safe pure nothrow @nogc unittest
>>> {
>>>      enum E { unknown, x, y, z, }
>>>      assert(E.x.toString == "x");
>>>      assert(E.y.toString == "y");
>>>      assert(E.z.toString == "z");
>>> }
>>>
>>>
>>> The question know is: How do I make this support enums with enumerator aliases without needing to call `NoDuplicates`?
>>
>> You can't. That's why it's in there. And I can't think of a better algorithm than O(nlgn). That's what it would cost to sort anyway.
>>
>> The sucky thing is, the compiler is *already* doing a sort on the items in the switch, and *already* doing the duplicate check. It would be cool to be able to leverage this mechanism to avoid the library solution, but I don't know how we can do that, as the semantics for switch are well defined, and there's no other way to hook this builtin functionality.
>>
>> One thing that may be worthwhile is checking to see if a CTFE solution is better than a template solution. In other words:
>>
>> string[] dedup(string[] identifiers) { ... }
>>
>> enum items = dedup(__traits(allMembers, T));
>>
>> static foreach(member; items) ...
>>
>> Just avoiding all the template symbol generation may make it worth it, even if the dedup function is O(n^2) complexity.
>>
> 
> 
> I do have my own ctfe utils for this
> {https://forum.dlang.org/post/bfnwstkafhfgihavtzsz@forum.dlang.org}, however without dedup (because I'd rather be alerted when it happens) what you have to do is to sort the enum values.
> (I'd suggest a non-recursive mergesort.)

How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.

> However if there are no duplicates there's no need to deduplicate, the reason it is deduplicated it because the switch_statement forbids duplicates.

If you can sort, deduplicating is as easy as checking if the value is the same as the previous. So really, we just need a good compile-time sort for strings.

> I may add that this performs very well with newCTFE I don't know about the old engine, but I'd assume even there it's faster.
I think your algorithm is roughly the same as Nordlow's. Basically you are using the compiler to sort the array (as it does anyway for a switch statement).

-Steve
June 22, 2018
On 6/22/18 2:15 PM, Steven Schveighoffer wrote:
> On 6/22/18 1:41 PM, Stefan Koch wrote:
>> (I'd suggest a non-recursive mergesort.)
> 
> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.

This kind of sounded like a fun experiment, so I tried it:

https://run.dlang.io/gist/7c23567e0fa2e3b663d0b0695d3c257e

No idea of the performance or capability given hundreds of enum members, but something new to try.

-Steve
June 22, 2018
On Friday, 22 June 2018 at 20:23:51 UTC, Steven Schveighoffer wrote:
> On 6/22/18 2:15 PM, Steven Schveighoffer wrote:
>> On 6/22/18 1:41 PM, Stefan Koch wrote:
>>> (I'd suggest a non-recursive mergesort.)
>> 
>> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.
>
> This kind of sounded like a fun experiment, so I tried it:
>
> https://run.dlang.io/gist/7c23567e0fa2e3b663d0b0695d3c257e
>
> No idea of the performance or capability given hundreds of enum members, but something new to try.
>
> -Steve

run.dlang.io is down for me.
maybe you can provide a gh link?


June 22, 2018
On 6/22/18 4:56 PM, Stefan Koch wrote:
> On Friday, 22 June 2018 at 20:23:51 UTC, Steven Schveighoffer wrote:
>> On 6/22/18 2:15 PM, Steven Schveighoffer wrote:
>>> On 6/22/18 1:41 PM, Stefan Koch wrote:
>>>> (I'd suggest a non-recursive mergesort.)
>>>
>>> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.
>>
>> This kind of sounded like a fun experiment, so I tried it:
>>
>> https://run.dlang.io/gist/7c23567e0fa2e3b663d0b0695d3c257e
>>
>> No idea of the performance or capability given hundreds of enum members, but something new to try.
> 
> run.dlang.io is down for me.
> maybe you can provide a gh link?
> 
> 

https://gist.github.com/run-dlang/d023233104927e61ba69e79c03d266e4

-Steve
June 24, 2018
On Friday, 22 June 2018 at 00:50:05 UTC, Steven Schveighoffer wrote:
> The sucky thing is, the compiler is *already* doing a sort on the items in the switch, and *already* doing the duplicate check. It would be cool to be able to leverage this mechanism to avoid the library solution, but I don't know how we can do that, as the semantics for switch are well defined, and there's no other way to hook this builtin functionality.

I would like to see a new trait named, say, `primaryMembers` or `nonAliasMembers` that returns exactly what the switch needs. I believe this is motivated by the fact that this is a serious issue; Without the user notices, the use of enums combined with .to!string or io sucks up more and more RAM in the compiler as new members are added.

If enough (hundreds) of members are added you can get out of RAM, which is what happened to me. What do you think about that idea?

We should plot compilation time and memory usage with enumerator count so we can reason about the severity of this issue.
June 24, 2018
On Friday, 22 June 2018 at 20:56:58 UTC, Stefan Koch wrote:
>>> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.

What about creating a bit-array with the size of the enumerator count and use that to detect duplicates? How well would a mutating bitarray play in CTFE?
June 24, 2018
On Friday, 22 June 2018 at 20:56:58 UTC, Stefan Koch wrote:
> On Friday, 22 June 2018 at 20:23:51 UTC, Steven Schveighoffer wrote:
>> On 6/22/18 2:15 PM, Steven Schveighoffer wrote:
>>> On 6/22/18 1:41 PM, Stefan Koch wrote:
>>>> (I'd suggest a non-recursive mergesort.)
>>> 
>>> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.
>>
>> This kind of sounded like a fun experiment, so I tried it:
>>
>> https://run.dlang.io/gist/7c23567e0fa2e3b663d0b0695d3c257e
>>
>> No idea of the performance or capability given hundreds of enum members, but something new to try.
>>
>> -Steve
>
> run.dlang.io is down for me.
> maybe you can provide a gh link?

Huh? There was never any downtime.
Do you still have troubles? (if so, it would be good to figure out what's happening)
« First   ‹ Prev
1 2 3