Thread overview | ||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
June 21, 2018 Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators | ||||
---|---|---|---|---|
| ||||
I've discovered the annoying fact that std.conv.to doesn't scale for enum to string conversion when the enum has hundreds of members. This because of a call to `NoDuplicates` which has (at least) O(n*log(n) time and space complexity. So I've come up with /** Faster implementation of `std.conv.to`. */ string toString(T)(T value) @safe pure nothrow @nogc if (is(T == enum)) { final switch (value) { static foreach (member; __traits(allMembers, T)) { case __traits(getMember, T, member): return member; } } } /// @safe pure nothrow @nogc unittest { enum E { unknown, x, y, z, } assert(E.x.toString == "x"); assert(E.y.toString == "y"); assert(E.z.toString == "z"); } The question know is: How do I make this support enums with enumerator aliases without needing to call `NoDuplicates`? For instance, this should work /// @safe pure nothrow @nogc unittest { enum E { unknown, x, y, z, z_ = z, } assert(E.x.toString == "x"); assert(E.y.toString == "y"); assert(E.z.toString == "z"); assert(E.z_.toString == "z"); } |
June 21, 2018 Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators | ||||
---|---|---|---|---|
| ||||
Posted in reply to Per Nordlöw | On 6/21/18 6:46 PM, Per Nordlöw wrote:
> I've discovered the annoying fact that std.conv.to doesn't scale for enum to string conversion when the enum has hundreds of members. This because of a call to `NoDuplicates` which has (at least) O(n*log(n) time and space complexity.
>
> So I've come up with
>
>
> /** Faster implementation of `std.conv.to`.
> */
> string toString(T)(T value) @safe pure nothrow @nogc
> if (is(T == enum))
> {
> final switch (value)
> {
> static foreach (member; __traits(allMembers, T))
> {
> case __traits(getMember, T, member):
> return member;
> }
> }
> }
>
> ///
> @safe pure nothrow @nogc unittest
> {
> enum E { unknown, x, y, z, }
> assert(E.x.toString == "x");
> assert(E.y.toString == "y");
> assert(E.z.toString == "z");
> }
>
>
> The question know is: How do I make this support enums with enumerator aliases without needing to call `NoDuplicates`?
You can't. That's why it's in there. And I can't think of a better algorithm than O(nlgn). That's what it would cost to sort anyway.
The sucky thing is, the compiler is *already* doing a sort on the items in the switch, and *already* doing the duplicate check. It would be cool to be able to leverage this mechanism to avoid the library solution, but I don't know how we can do that, as the semantics for switch are well defined, and there's no other way to hook this builtin functionality.
One thing that may be worthwhile is checking to see if a CTFE solution is better than a template solution. In other words:
string[] dedup(string[] identifiers) { ... }
enum items = dedup(__traits(allMembers, T));
static foreach(member; items) ...
Just avoiding all the template symbol generation may make it worth it, even if the dedup function is O(n^2) complexity.
-Steve
|
June 22, 2018 Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Friday, 22 June 2018 at 00:50:05 UTC, Steven Schveighoffer wrote: > On 6/21/18 6:46 PM, Per Nordlöw wrote: >> I've discovered the annoying fact that std.conv.to doesn't scale for enum to string conversion when the enum has hundreds of members. This because of a call to `NoDuplicates` which has (at least) O(n*log(n) time and space complexity. >> >> So I've come up with >> >> >> /** Faster implementation of `std.conv.to`. >> */ >> string toString(T)(T value) @safe pure nothrow @nogc >> if (is(T == enum)) >> { >> final switch (value) >> { >> static foreach (member; __traits(allMembers, T)) >> { >> case __traits(getMember, T, member): >> return member; >> } >> } >> } >> >> /// >> @safe pure nothrow @nogc unittest >> { >> enum E { unknown, x, y, z, } >> assert(E.x.toString == "x"); >> assert(E.y.toString == "y"); >> assert(E.z.toString == "z"); >> } >> >> >> The question know is: How do I make this support enums with enumerator aliases without needing to call `NoDuplicates`? > > You can't. That's why it's in there. And I can't think of a better algorithm than O(nlgn). That's what it would cost to sort anyway. > > The sucky thing is, the compiler is *already* doing a sort on the items in the switch, and *already* doing the duplicate check. It would be cool to be able to leverage this mechanism to avoid the library solution, but I don't know how we can do that, as the semantics for switch are well defined, and there's no other way to hook this builtin functionality. > > One thing that may be worthwhile is checking to see if a CTFE solution is better than a template solution. In other words: > > string[] dedup(string[] identifiers) { ... } > > enum items = dedup(__traits(allMembers, T)); > > static foreach(member; items) ... > > Just avoiding all the template symbol generation may make it worth it, even if the dedup function is O(n^2) complexity. > > -Steve I do have my own ctfe utils for this {https://forum.dlang.org/post/bfnwstkafhfgihavtzsz@forum.dlang.org}, however without dedup (because I'd rather be alerted when it happens) what you have to do is to sort the enum values. (I'd suggest a non-recursive mergesort.) However if there are no duplicates there's no need to deduplicate, the reason it is deduplicated it because the switch_statement forbids duplicates. I may add that this performs very well with newCTFE I don't know about the old engine, but I'd assume even there it's faster. |
June 22, 2018 Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On 6/22/18 1:41 PM, Stefan Koch wrote: > On Friday, 22 June 2018 at 00:50:05 UTC, Steven Schveighoffer wrote: >> On 6/21/18 6:46 PM, Per Nordlöw wrote: >>> I've discovered the annoying fact that std.conv.to doesn't scale for enum to string conversion when the enum has hundreds of members. This because of a call to `NoDuplicates` which has (at least) O(n*log(n) time and space complexity. >>> >>> So I've come up with >>> >>> >>> /** Faster implementation of `std.conv.to`. >>> */ >>> string toString(T)(T value) @safe pure nothrow @nogc >>> if (is(T == enum)) >>> { >>> final switch (value) >>> { >>> static foreach (member; __traits(allMembers, T)) >>> { >>> case __traits(getMember, T, member): >>> return member; >>> } >>> } >>> } >>> >>> /// >>> @safe pure nothrow @nogc unittest >>> { >>> enum E { unknown, x, y, z, } >>> assert(E.x.toString == "x"); >>> assert(E.y.toString == "y"); >>> assert(E.z.toString == "z"); >>> } >>> >>> >>> The question know is: How do I make this support enums with enumerator aliases without needing to call `NoDuplicates`? >> >> You can't. That's why it's in there. And I can't think of a better algorithm than O(nlgn). That's what it would cost to sort anyway. >> >> The sucky thing is, the compiler is *already* doing a sort on the items in the switch, and *already* doing the duplicate check. It would be cool to be able to leverage this mechanism to avoid the library solution, but I don't know how we can do that, as the semantics for switch are well defined, and there's no other way to hook this builtin functionality. >> >> One thing that may be worthwhile is checking to see if a CTFE solution is better than a template solution. In other words: >> >> string[] dedup(string[] identifiers) { ... } >> >> enum items = dedup(__traits(allMembers, T)); >> >> static foreach(member; items) ... >> >> Just avoiding all the template symbol generation may make it worth it, even if the dedup function is O(n^2) complexity. >> > > > I do have my own ctfe utils for this > {https://forum.dlang.org/post/bfnwstkafhfgihavtzsz@forum.dlang.org}, however without dedup (because I'd rather be alerted when it happens) what you have to do is to sort the enum values. > (I'd suggest a non-recursive mergesort.) How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place. > However if there are no duplicates there's no need to deduplicate, the reason it is deduplicated it because the switch_statement forbids duplicates. If you can sort, deduplicating is as easy as checking if the value is the same as the previous. So really, we just need a good compile-time sort for strings. > I may add that this performs very well with newCTFE I don't know about the old engine, but I'd assume even there it's faster. I think your algorithm is roughly the same as Nordlow's. Basically you are using the compiler to sort the array (as it does anyway for a switch statement). -Steve |
June 22, 2018 Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On 6/22/18 2:15 PM, Steven Schveighoffer wrote: > On 6/22/18 1:41 PM, Stefan Koch wrote: >> (I'd suggest a non-recursive mergesort.) > > How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place. This kind of sounded like a fun experiment, so I tried it: https://run.dlang.io/gist/7c23567e0fa2e3b663d0b0695d3c257e No idea of the performance or capability given hundreds of enum members, but something new to try. -Steve |
June 22, 2018 Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Friday, 22 June 2018 at 20:23:51 UTC, Steven Schveighoffer wrote:
> On 6/22/18 2:15 PM, Steven Schveighoffer wrote:
>> On 6/22/18 1:41 PM, Stefan Koch wrote:
>>> (I'd suggest a non-recursive mergesort.)
>>
>> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.
>
> This kind of sounded like a fun experiment, so I tried it:
>
> https://run.dlang.io/gist/7c23567e0fa2e3b663d0b0695d3c257e
>
> No idea of the performance or capability given hundreds of enum members, but something new to try.
>
> -Steve
run.dlang.io is down for me.
maybe you can provide a gh link?
|
June 22, 2018 Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On 6/22/18 4:56 PM, Stefan Koch wrote: > On Friday, 22 June 2018 at 20:23:51 UTC, Steven Schveighoffer wrote: >> On 6/22/18 2:15 PM, Steven Schveighoffer wrote: >>> On 6/22/18 1:41 PM, Stefan Koch wrote: >>>> (I'd suggest a non-recursive mergesort.) >>> >>> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place. >> >> This kind of sounded like a fun experiment, so I tried it: >> >> https://run.dlang.io/gist/7c23567e0fa2e3b663d0b0695d3c257e >> >> No idea of the performance or capability given hundreds of enum members, but something new to try. > > run.dlang.io is down for me. > maybe you can provide a gh link? > > https://gist.github.com/run-dlang/d023233104927e61ba69e79c03d266e4 -Steve |
June 24, 2018 Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Friday, 22 June 2018 at 00:50:05 UTC, Steven Schveighoffer wrote:
> The sucky thing is, the compiler is *already* doing a sort on the items in the switch, and *already* doing the duplicate check. It would be cool to be able to leverage this mechanism to avoid the library solution, but I don't know how we can do that, as the semantics for switch are well defined, and there's no other way to hook this builtin functionality.
I would like to see a new trait named, say, `primaryMembers` or `nonAliasMembers` that returns exactly what the switch needs. I believe this is motivated by the fact that this is a serious issue; Without the user notices, the use of enums combined with .to!string or io sucks up more and more RAM in the compiler as new members are added.
If enough (hundreds) of members are added you can get out of RAM, which is what happened to me. What do you think about that idea?
We should plot compilation time and memory usage with enumerator count so we can reason about the severity of this issue.
|
June 24, 2018 Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On Friday, 22 June 2018 at 20:56:58 UTC, Stefan Koch wrote:
>>> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.
What about creating a bit-array with the size of the enumerator count and use that to detect duplicates? How well would a mutating bitarray play in CTFE?
|
June 24, 2018 Re: Phobos' std.conv.to-conversion from enum to string doesn't scale beyond hundreds of enumerators | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On Friday, 22 June 2018 at 20:56:58 UTC, Stefan Koch wrote:
> On Friday, 22 June 2018 at 20:23:51 UTC, Steven Schveighoffer wrote:
>> On 6/22/18 2:15 PM, Steven Schveighoffer wrote:
>>> On 6/22/18 1:41 PM, Stefan Koch wrote:
>>>> (I'd suggest a non-recursive mergesort.)
>>>
>>> How will that perform in CTFE? I'm concerned about swapping values making it allocate new arrays all over the place.
>>
>> This kind of sounded like a fun experiment, so I tried it:
>>
>> https://run.dlang.io/gist/7c23567e0fa2e3b663d0b0695d3c257e
>>
>> No idea of the performance or capability given hundreds of enum members, but something new to try.
>>
>> -Steve
>
> run.dlang.io is down for me.
> maybe you can provide a gh link?
Huh? There was never any downtime.
Do you still have troubles? (if so, it would be good to figure out what's happening)
|
Copyright © 1999-2021 by the D Language Foundation