Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators

Thread overview

Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators
Jun 21, 2018 Per Nordlöw
Jun 22, 2018 Steven Schveighoffer
Jun 22, 2018 Stefan Koch
Jun 22, 2018 Steven Schveighoffer
Jun 22, 2018 Steven Schveighoffer
Jun 22, 2018 Stefan Koch
Jun 22, 2018 Steven Schveighoffer
Jun 24, 2018 Per Nordlöw
Jun 24, 2018 Seb
Jun 24, 2018 Per Nordlöw
Jun 24, 2018 Steven Schveighoffer
Jun 24, 2018 Per Nordlöw
Jun 24, 2018 Per Nordlöw
Jun 24, 2018 Per Nordlöw
Jun 24, 2018 Steven Schveighoffer
Jun 24, 2018 Per Nordlöw
Jun 24, 2018 Timoses
Jun 25, 2018 Jonathan M Davis
Jun 25, 2018 Per Nordlöw
Jun 25, 2018 Per Nordlöw
Jun 25, 2018 Jonathan M Davis
Jun 25, 2018 Timoses
Jun 25, 2018 Per Nordlöw
Jun 24, 2018 Per Nordlöw
Jun 24, 2018 Jonathan M Davis
Jun 24, 2018 Nordlöw

June 21, 2018

Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators

Posted by Per Nordlöw

Per Nordlöw

I've discovered the annoying fact that std.conv.to doesn't scale for enum to string conversion when the enum has hundreds of members. This because of a call to `NoDuplicates` which has (at least) O(n*log(n) time and space complexity.

So I've come up with


/** Faster implementation of `std.conv.to`.
 */
string toString(T)(T value) @safe pure nothrow @nogc
if (is(T == enum))
{
    final switch (value)
    {
        static foreach (member; __traits(allMembers, T))
        {
        case __traits(getMember, T, member):
            return member;
        }
    }
}

///
@safe pure nothrow @nogc unittest
{
    enum E { unknown, x, y, z, }
    assert(E.x.toString == "x");
    assert(E.y.toString == "y");
    assert(E.z.toString == "z");
}


The question know is: How do I make this support enums with enumerator aliases without needing to call `NoDuplicates`?

For instance, this should work

///
@safe pure nothrow @nogc unittest
{
    enum E { unknown, x, y, z, z_ = z, }
    assert(E.x.toString == "x");
    assert(E.y.toString == "y");
    assert(E.z.toString == "z");
    assert(E.z_.toString == "z");
}

June 21, 2018

Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators

Posted by Steven Schveighoffer
in reply to Per Nordlöw

Steven Schveighoffer

Posted in reply to Per Nordlöw

On 6/21/18 6:46 PM, Per Nordlöw wrote:
> I've discovered the annoying fact that std.conv.to doesn't scale for enum to string conversion when the enum has hundreds of members. This because of a call to `NoDuplicates` which has (at least) O(n*log(n) time and space complexity.
> 
> So I've come up with
> 
> 
> /** Faster implementation of `std.conv.to`.
>   */
> string toString(T)(T value) @safe pure nothrow @nogc
> if (is(T == enum))
> {
>      final switch (value)
>      {
>          static foreach (member; __traits(allMembers, T))
>          {
>          case __traits(getMember, T, member):
>              return member;
>          }
>      }
> }
> 
> ///
> @safe pure nothrow @nogc unittest
> {
>      enum E { unknown, x, y, z, }
>      assert(E.x.toString == "x");
>      assert(E.y.toString == "y");
>      assert(E.z.toString == "z");
> }
> 
> 
> The question know is: How do I make this support enums with enumerator aliases without needing to call `NoDuplicates`?

You can't. That's why it's in there. And I can't think of a better algorithm than O(nlgn). That's what it would cost to sort anyway.

The sucky thing is, the compiler is *already* doing a sort on the items in the switch, and *already* doing the duplicate check. It would be cool to be able to leverage this mechanism to avoid the library solution, but I don't know how we can do that, as the semantics for switch are well defined, and there's no other way to hook this builtin functionality.

One thing that may be worthwhile is checking to see if a CTFE solution is better than a template solution. In other words:

string[] dedup(string[] identifiers) { ... }

enum items = dedup(__traits(allMembers, T));

static foreach(member; items) ...

Just avoiding all the template symbol generation may make it worth it, even if the dedup function is O(n^2) complexity.

-Steve

June 22, 2018

Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators

Posted by Stefan Koch
in reply to Steven Schveighoffer

Stefan Koch

Posted in reply to Steven Schveighoffer

On Friday, 22 June 2018 at 00:50:05 UTC, Steven Schveighoffer wrote:
> On 6/21/18 6:46 PM, Per Nordlöw wrote:
>> I've discovered the annoying fact that std.conv.to doesn't scale for enum to string conversion when the enum has hundreds of members. This because of a call to `NoDuplicates` which has (at least) O(n*log(n) time and space complexity.
>> 
>> So I've come up with
>> 
>> 
>> /** Faster implementation of `std.conv.to`.
>>   */
>> string toString(T)(T value) @safe pure nothrow @nogc
>> if (is(T == enum))
>> {
>>      final switch (value)
>>      {
>>          static foreach (member; __traits(allMembers, T))
>>          {
>>          case __traits(getMember, T, member):
>>              return member;
>>          }
>>      }
>> }
>> 
>> ///
>> @safe pure nothrow @nogc unittest
>> {
>>      enum E { unknown, x, y, z, }
>>      assert(E.x.toString == "x");
>>      assert(E.y.toString == "y");
>>      assert(E.z.toString == "z");
>> }
>> 
>> 
>> The question know is: How do I make this support enums with enumerator aliases without needing to call `NoDuplicates`?
>
> You can't. That's why it's in there. And I can't think of a better algorithm than O(nlgn). That's what it would cost to sort anyway.
>
> The sucky thing is, the compiler is *already* doing a sort on the items in the switch, and *already* doing the duplicate check. It would be cool to be able to leverage this mechanism to avoid the library solution, but I don't know how we can do that, as the semantics for switch are well defined, and there's no other way to hook this builtin functionality.
>
> One thing that may be worthwhile is checking to see if a CTFE solution is better than a template solution. In other words:
>
> string[] dedup(string[] identifiers) { ... }
>
> enum items = dedup(__traits(allMembers, T));
>
> static foreach(member; items) ...
>
> Just avoiding all the template symbol generation may make it worth it, even if the dedup function is O(n^2) complexity.
>
> -Steve


I do have my own ctfe utils for this
{https://forum.dlang.org/post/bfnwstkafhfgihavtzsz@forum.dlang.org}, however without dedup (because I'd rather be alerted when it happens) what you have to do is to sort the enum values.
(I'd suggest a non-recursive mergesort.)
However if there are no duplicates there's no need to deduplicate, the reason it is deduplicated it because the switch_statement forbids duplicates.

I may add that this performs very well with newCTFE I don't know about the old engine, but I'd assume even there it's faster.

June 22, 2018

Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators

Posted by Steven Schveighoffer
in reply to Stefan Koch

Steven Schveighoffer

Posted in reply to Stefan Koch

On 6/22/18 1:41 PM, Stefan Koch wrote:
> On Friday, 22 June 2018 at 00:50:05 UTC, Steven Schveighoffer wrote:
>> On 6/21/18 6:46 PM, Per Nordlöw wrote:
>>> I've discovered the annoying fact that std.conv.to doesn't scale for enum to string conversion when the enum has hundreds of members. This because of a call to `NoDuplicates` which has (at least) O(n*log(n) time and space complexity.
>>>
>>> So I've come up with
>>>
>>>
>>> /** Faster implementation of `std.conv.to`.
>>>   */
>>> string toString(T)(T value) @safe pure nothrow @nogc
>>> if (is(T == enum))
>>> {
>>>      final switch (value)
>>>      {
>>>          static foreach (member; __traits(allMembers, T))
>>>          {
>>>          case __traits(getMember, T, member):
>>>              return member;
>>>          }
>>>      }
>>> }
>>>
>>> ///
>>> @safe pure nothrow @nogc unittest
>>> {
>>>      enum E { unknown, x, y, z, }
>>>      assert(E.x.toString == "x");
>>>      assert(E.y.toString == "y");
>>>      assert(E.z.toString == "z");
>>> }
>>>
>>>
>>> The question know is: How do I make this support enums with enumerator aliases without needing to call `NoDuplicates`?
>>
>> You can't. That's why it's in there. And I can't think of a better algorithm than O(nlgn). That's what it would cost to sort anyway.
>>
>> The sucky thing is, the compiler is *already* doing a sort on the items in the switch, and *already* doing the duplicate check. It would be cool to be able to leverage this mechanism to avoid the library solution, but I don't know how we can do that, as the semantics for switch are well defined, and there's no other way to hook this builtin functionality.
>>
>> One thing that may be worthwhile is checking to see if a CTFE solution is better than a template solution. In other words:
>>
>> string[] dedup(string[] identifiers) { ... }
>>
>> enum items = dedup(__traits(allMembers, T));
>>
>> static foreach(member; items) ...
>>
>> Just avoiding all the template symbol generation may make it worth it, even if the dedup function is O(n^2) complexity.
>>
> 
> 
> I do have my own ctfe utils for this
> {https://forum.dlang.org/post/bfnwstkafhfgihavtzsz@forum.dlang.org}, however without dedup (because I'd rather be alerted when it happens) what you have to do is to sort the enum values.
> (I'd suggest a non-recursive mergesort.)

How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.

> However if there are no duplicates there's no need to deduplicate, the reason it is deduplicated it because the switch_statement forbids duplicates.

If you can sort, deduplicating is as easy as checking if the value is the same as the previous. So really, we just need a good compile-time sort for strings.

> I may add that this performs very well with newCTFE I don't know about the old engine, but I'd assume even there it's faster.
I think your algorithm is roughly the same as Nordlow's. Basically you are using the compiler to sort the array (as it does anyway for a switch statement).

-Steve

June 22, 2018

Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators

Posted by Steven Schveighoffer
in reply to Steven Schveighoffer

Steven Schveighoffer

Posted in reply to Steven Schveighoffer

On 6/22/18 2:15 PM, Steven Schveighoffer wrote:
> On 6/22/18 1:41 PM, Stefan Koch wrote:
>> (I'd suggest a non-recursive mergesort.)
> 
> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.

This kind of sounded like a fun experiment, so I tried it:

https://run.dlang.io/gist/7c23567e0fa2e3b663d0b0695d3c257e

No idea of the performance or capability given hundreds of enum members, but something new to try.

-Steve

June 22, 2018

Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators

Posted by Stefan Koch
in reply to Steven Schveighoffer

Stefan Koch

Posted in reply to Steven Schveighoffer

On Friday, 22 June 2018 at 20:23:51 UTC, Steven Schveighoffer wrote:
> On 6/22/18 2:15 PM, Steven Schveighoffer wrote:
>> On 6/22/18 1:41 PM, Stefan Koch wrote:
>>> (I'd suggest a non-recursive mergesort.)
>> 
>> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.
>
> This kind of sounded like a fun experiment, so I tried it:
>
> https://run.dlang.io/gist/7c23567e0fa2e3b663d0b0695d3c257e
>
> No idea of the performance or capability given hundreds of enum members, but something new to try.
>
> -Steve

run.dlang.io is down for me.
maybe you can provide a gh link?

June 22, 2018

Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators

Posted by Steven Schveighoffer
in reply to Stefan Koch

Steven Schveighoffer

Posted in reply to Stefan Koch

On 6/22/18 4:56 PM, Stefan Koch wrote:
> On Friday, 22 June 2018 at 20:23:51 UTC, Steven Schveighoffer wrote:
>> On 6/22/18 2:15 PM, Steven Schveighoffer wrote:
>>> On 6/22/18 1:41 PM, Stefan Koch wrote:
>>>> (I'd suggest a non-recursive mergesort.)
>>>
>>> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.
>>
>> This kind of sounded like a fun experiment, so I tried it:
>>
>> https://run.dlang.io/gist/7c23567e0fa2e3b663d0b0695d3c257e
>>
>> No idea of the performance or capability given hundreds of enum members, but something new to try.
> 
> run.dlang.io is down for me.
> maybe you can provide a gh link?
> 
> 

https://gist.github.com/run-dlang/d023233104927e61ba69e79c03d266e4

-Steve

June 24, 2018

Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators

Posted by Per Nordlöw
in reply to Steven Schveighoffer

Per Nordlöw

Posted in reply to Steven Schveighoffer

On Friday, 22 June 2018 at 00:50:05 UTC, Steven Schveighoffer wrote:
> The sucky thing is, the compiler is *already* doing a sort on the items in the switch, and *already* doing the duplicate check. It would be cool to be able to leverage this mechanism to avoid the library solution, but I don't know how we can do that, as the semantics for switch are well defined, and there's no other way to hook this builtin functionality.

I would like to see a new trait named, say, `primaryMembers` or `nonAliasMembers` that returns exactly what the switch needs. I believe this is motivated by the fact that this is a serious issue; Without the user notices, the use of enums combined with .to!string or io sucks up more and more RAM in the compiler as new members are added.

If enough (hundreds) of members are added you can get out of RAM, which is what happened to me. What do you think about that idea?

We should plot compilation time and memory usage with enumerator count so we can reason about the severity of this issue.

June 24, 2018

Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators

Posted by Per Nordlöw
in reply to Stefan Koch

Per Nordlöw

Posted in reply to Stefan Koch

On Friday, 22 June 2018 at 20:56:58 UTC, Stefan Koch wrote:
>>> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.

What about creating a bit-array with the size of the enumerator count and use that to detect duplicates? How well would a mutating bitarray play in CTFE?

June 24, 2018

Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators

Posted by Seb
in reply to Stefan Koch

Seb

Posted in reply to Stefan Koch

On Friday, 22 June 2018 at 20:56:58 UTC, Stefan Koch wrote:
> On Friday, 22 June 2018 at 20:23:51 UTC, Steven Schveighoffer wrote:
>> On 6/22/18 2:15 PM, Steven Schveighoffer wrote:
>>> On 6/22/18 1:41 PM, Stefan Koch wrote:
>>>> (I'd suggest a non-recursive mergesort.)
>>> 
>>> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.
>>
>> This kind of sounded like a fun experiment, so I tried it:
>>
>> https://run.dlang.io/gist/7c23567e0fa2e3b663d0b0695d3c257e
>>
>> No idea of the performance or capability given hundreds of enum members, but something new to try.
>>
>> -Steve
>
> run.dlang.io is down for me.
> maybe you can provide a gh link?

Huh? There was never any downtime.
Do you still have troubles? (if so, it would be good to figure out what's happening)

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation