Thread overview
[Issue 9821] New: Smarter conversion of strings to enums
Mar 26, 2013
Jared Miller
Mar 26, 2013
Andrej Mitrovic
Mar 26, 2013
Jared Miller
Mar 27, 2013
Jared Miller
Mar 27, 2013
Jonathan M Davis
Mar 27, 2013
Jared Miller
March 26, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9821

           Summary: Smarter conversion of strings to enums
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: jared@economicmodeling.com


--- Comment #0 from Jared Miller <jared@economicmodeling.com> 2013-03-26 13:01:24 PDT ---
Currently std.conv.to requires a string to be the *member name* of an enum in order to convert it. However, a standard use case when sharing data is to serialize enum variables using their underlying values, since different programs should not be expected to use the same enum naming scheme internally. The std.conv.to template currently cannot handle such a conversion (from the string form of the underlying value to the enum type), and the other direction (converting enum values to strings) also requires a workaround. To serialize data portably, you shouldn't emit enum values as the names of the symbols used in the source code.
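For the other direction, here is a minimal sketch of the workaround. It assumes the goal is to emit the underlying value instead of the member name, and it casts to std.traits.OriginalType before calling to!string. The helper name valueString is hypothetical and is not part of Phobos.

```d
import std.conv : to;
import std.traits : OriginalType;

enum MyEnum { Foo = 1, Baz = 7 }

// Hypothetical helper: stringify an enum by its underlying value,
// not by its member name (which is what to!string does by default).
string valueString(E)(E e)
    if (is(E == enum))
{
    return to!string(cast(OriginalType!E) e);
}

void main()
{
    assert(to!string(MyEnum.Baz) == "Baz");  // default: member name
    assert(valueString(MyEnum.Baz) == "7");  // by underlying value
}
```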

This is a significant annoyance that surfaces in std.csv.csvReader, which requires all data going into an enum to be serialized as the enum member name, not the string representation of its underlying type.

Example and my current workaround:

-------------
import std.algorithm, std.conv, std.stdio, std.string, std.traits;

enum MyEnum {
    Foo = 1,
    Baz = 7
}

void main()
{
    writeln( to!MyEnum(7) );              // ok.
    writeln( to!MyEnum("Baz") );          // ok.
    try {
        writeln( to!MyEnum("7") );        // throws
    }
    catch(ConvException e) {
        writeln( e.msg );
    }

    writeln( strToEnum!MyEnum("7") );     // ok.
    writeln( strToEnum!MyEnum("Baz") );   // ok.
}

/*
 * Current workaround for a smarter conversion.
 */
E strToEnum(E, S)(S str)
    if(is(E == enum) && isSomeString!S)
{
    if(countUntil([__traits(allMembers,E)], str) > -1)
        return to!E(str);
    else {
        auto underlyingValue = to!(OriginalType!E)(str);
        if(countUntil([EnumMembers!E], underlyingValue) > -1)
            return cast(E)(underlyingValue);
        else
            throw new ConvException(format(
                    "Value '%s' cannot be converted to enum %s",
                    underlyingValue, E.stringof));
    }
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 26, 2013


Andrej Mitrovic <andrej.mitrovich@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrej.mitrovich@gmail.com


--- Comment #1 from Andrej Mitrovic <andrej.mitrovich@gmail.com> 2013-03-26 14:01:11 PDT ---
It would have to become a new function, not std.conv.to, see https://github.com/D-Programming-Language/phobos/pull/897

March 26, 2013



--- Comment #2 from Jared Miller <jared@economicmodeling.com> 2013-03-26 16:43:27 PDT ---
Perhaps I'm being naive, but why not modify the current string-to-enum parse()
overload so that it (1) first tries to convert using enum member names, as it
currently does, and *only if* that fails, then (2) tries to convert the string
to the enum base type?

The current functionality would remain the same, except that the conversion would succeed instead of failing in those cases where a string => base type => enum conversion is possible. (Sorry if this isn't the formal way of submitting a patch; it's more of an explanation of what I mean.)

2126,2129c2126,2148
<
<     throw new ConvException(
<         Target.stringof ~ " does not have a member named '"
<         ~ to!string(s) ~ "'");
---
>     else
>     {
>         OriginalType!Target baseVal;
>         try {
>             baseVal = to!(OriginalType!Target)(s);
>         }
>         catch(ConvException e) {
>             throw new ConvException(
>                 "'" ~ to!string(s) ~ "' is not a member name of "
>                 ~ Target.stringof ~ " and is not convertible to "
>                 ~ (OriginalType!Target).stringof );
>         }
>         if(countUntil([EnumMembers!Target], baseVal) != -1)
>         {
>             return cast(Target)(baseVal);
>         }
>         else
>         {
>             throw new ConvException(
>                 "'" ~ to!string(s) ~ "' is not a member name or value of "
>                 ~ Target.stringof);
>         }
>     }

If this is not desirable, I would be okay with closing this issue and filing one for std.csv.csvReader to work around it, since that's mainly where it really causes problems (and possibly in other deserialization code like std.json too). At my workplace we've hit the same issue with reading from databases and have a special case to handle enums.
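Below is a minimal sketch of the csvReader workaround mentioned above, assuming an integral enum. The enum column is declared with the base type, and each value is then converted with to!MyEnum, which checks membership. The RawRow layout and the readKinds helper are hypothetical names used only for illustration.

```d
import std.conv : to;
import std.csv : csvReader;
import std.traits : OriginalType;

enum MyEnum { Foo = 1, Baz = 7 }

// Hypothetical row layout: the enum column uses the base type (int here),
// because csvReader would otherwise expect member names such as "Baz".
struct RawRow
{
    string name;
    OriginalType!MyEnum kind;
}

// Hypothetical helper: read rows, then convert each base value to the enum.
MyEnum[] readKinds(string text)
{
    MyEnum[] kinds;
    foreach (row; csvReader!RawRow(text))
    {
        // to!MyEnum on an int checks membership and throws ConvException
        // for values such as 3 that are not members of MyEnum.
        kinds ~= to!MyEnum(row.kind);
    }
    return kinds;
}

void main()
{
    assert(readKinds("a,7\nb,1\n") == [MyEnum.Baz, MyEnum.Foo]);
}
```

This is still the "special case" described in the thread: every reader has to know which columns are enums.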

March 26, 2013


bearophile_hugs@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs@eml.cc


--- Comment #3 from bearophile_hugs@eml.cc 2013-03-26 16:59:43 PDT ---
(In reply to comment #2)
> Perhaps I'm being naive, but why not modify the current string-to-enum parse()
> overload so that it (1) first tries to convert using enum member names, as it
> currently does, and *only if* that fails, then (2) tries to convert the string
> to the enum base type?

This is a bad idea. It's much better to keep the semantics tidy, to avoid troubles down the line.

March 27, 2013



--- Comment #4 from Jared Miller <jared@economicmodeling.com> 2013-03-26 17:36:17 PDT ---
> This is a bad idea. It's much better to keep the semantics tidy, to avoid troubles down the line.

Sure, I understand if there's reluctance to change the meaning of to() in this case. It's just unfortunate that for enums the original decision to convert by member name makes it harder to work with real-world data (though I admit member names are nice to read on-screen).

But there should be at least some alternative function for converting strings to enums using the base type, and it should be a standard option for any (de)serialization code in Phobos. Considering how frequently string serialization comes up in the real world, it will be well worth it.

March 27, 2013



--- Comment #5 from bearophile_hugs@eml.cc 2013-03-26 18:20:03 PDT ---
(In reply to comment #4)

> Sure, I understand if there's reluctance to change the meaning of to() in this
> case.

What I meant to say is that this semantics is bad:

> first tries to convert using enum member names, as it
> currently does, and *only if* that fails, then (2) tries to convert the string
> to the enum base type?

You try a conversion, and if it fails, then you _stop_. Otherwise you are going into a swamp.
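Here is a contrived illustration (not from the thread) of how a name-then-value fallback can go wrong. In a string-based enum whose values collide with member names, the same input string maps to different members under the two rules. A value written "by value" as "red" (that is, Swap.blue) would be read back by a name-first parser as Swap.red, with no error. The Swap enum is hypothetical.

```d
import std.conv : to;
import std.traits : OriginalType;

// Contrived string-based enum whose values collide with member names.
enum Swap : string { red = "blue", blue = "red" }

void main()
{
    // Rule 1 (what to! does): match the member *name*.
    assert(to!Swap("red") == Swap.red);

    // Rule 2 (by value): "red" is the underlying value of Swap.blue.
    auto byValue = cast(Swap) to!(OriginalType!Swap)("red");
    assert(byValue == Swap.blue);
}
```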


> But there should be at least some alternative function for converting strings to enums using the base type,

I think I have asked for such a function in another Bugzilla issue.

March 27, 2013


Jonathan M Davis <jmdavisProg@gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg@gmx.com


--- Comment #6 from Jonathan M Davis <jmdavisProg@gmx.com> 2013-03-26 18:23:30 PDT ---
> But there should be at least some alternative function for converting strings to enums using the base type

Then just use to with the base type. And if you're dealing with generic code or don't want to hard code what the base type is, then use std.traits.OriginalType:

to!(OriginalType!MyEnum)(str);

March 27, 2013



--- Comment #7 from Jared Miller <jared@economicmodeling.com> 2013-03-27 12:11:26 PDT ---
(In reply to comment #6)
> Then just use to with the base type. And if you're dealing with generic code or don't want to hard code what the base type is, then use std.traits.OriginalType:
> 
> to!(OriginalType!MyEnum)(str);

Right, that gets me to the base type. Eventually I want the enum type, so at minimum I need to do:

// Naive, unless I check that the value is a member
MyEnum eNaive = cast(MyEnum)(to!(OriginalType!MyEnum)(str));
// Strict: to!MyEnum on the base value throws if it is not a member
MyEnum eStrict = to!MyEnum(to!(OriginalType!MyEnum)(str));

So, it's not hard, it's just always a special case. Some "expected" things don't work:

enum MyEnum { Foo = 1, Bar = 7 }
MyEnum e1;
MyEnum e2 = to!MyEnum("7"); // throws
readf("%d", &e1);           // error: no matching unformatValue
readf("%d", &cast(int)e1);  // ok, works like C, avoids proper checks
writef("%d", e1);           // ok

Are my expectations just odd? Very possible :) I deal with a lot of data munging and this little enum quirk ends up requiring special handling in every bit of generic read/write code, unless I just ban enums entirely from all data conversion code. C/C++ always treat them as ints and D always treats them as member-name strings in std.conv -- you see the potential incompatibility. I realize D has to deal with more complexity since more base types are allowed, but it does break with tradition for integral base types.
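For the readf case, one checked alternative to &cast(int)e1 is to read into the base type and then let to!E verify membership. The sketch below uses std.format.formattedRead on a string so it runs without stdin. The readEnumValue helper is hypothetical and assumes an integral base type.

```d
import std.conv : to, ConvException;
import std.exception : assertThrown;
import std.format : formattedRead;
import std.traits : OriginalType, isIntegral;

enum MyEnum { Foo = 1, Bar = 7 }

// Hypothetical helper: read the base type first, then convert with to!E,
// which throws ConvException if the value is not a member of E.
E readEnumValue(E)(string input)
    if (is(E == enum) && isIntegral!(OriginalType!E))
{
    OriginalType!E base;
    formattedRead(input, "%d", &base);  // parses the integer from input
    return to!E(base);
}

void main()
{
    assert(readEnumValue!MyEnum("7") == MyEnum.Bar);
    assertThrown!ConvException(readEnumValue!MyEnum("3"));  // not a member
}
```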

I expected something like C#, which stringifies enums to the member name by default (like D), but whose Enum.Parse accepts strings representing *either* a member name or a value.

So, maybe std.conv could allow C#-style parsing for integral enums only. Or, there could be a toImpl overload that's restricted to integral enums and takes a required flag for "by name" or "by value". It would also be handy to have enum overloads of unformatValue. Mainly, what I don't like to see is the default member-name conversion creeping into other components like csvReader.
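Here is a sketch of the required-flag idea, written as a free function rather than an actual toImpl overload. parseEnum and EnumParse are hypothetical names, not Phobos API. Because the flag is mandatory, each call applies exactly one rule, which also sidesteps the name-then-value fallback objected to earlier in the thread.

```d
import std.conv : to, ConvException;
import std.exception : assertThrown;
import std.traits : OriginalType, isIntegral, isSomeString;

enum MyEnum { Foo = 1, Baz = 7 }

// Hypothetical flag: the caller must pick exactly one rule.
enum EnumParse { byName, byValue }

// Hypothetical function (not part of Phobos), restricted to integral enums.
E parseEnum(E, S)(S s, EnumParse mode)
    if (is(E == enum) && isIntegral!(OriginalType!E) && isSomeString!S)
{
    if (mode == EnumParse.byName)
        return to!E(s);                      // member name only, like to!E today
    return to!E(to!(OriginalType!E)(s));     // value only, membership-checked
}

void main()
{
    assert(parseEnum!MyEnum("Baz", EnumParse.byName) == MyEnum.Baz);
    assert(parseEnum!MyEnum("7", EnumParse.byValue) == MyEnum.Baz);
    assertThrown!ConvException(parseEnum!MyEnum("7", EnumParse.byName));
    assertThrown!ConvException(parseEnum!MyEnum("Baz", EnumParse.byValue));
}
```

A component like csvReader could then pass the flag through, instead of hard-coding the member-name rule.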

So I guess what I'm looking for is either:

(1) "No, we've thought it over and enums, regardless of base type, should nearly always be (de)serialized by member name -- doing it by value is a rare use case so you should write your own wrappers for anything that uses std.conv", or

(2) "Yes, parsing string/char to integral enums by value is common enough that components in Phobos should offer that option where appropriate."

Thanks and I hope I've made my issue clearer.
