Range of enum type values
December 27, 2019
Hi all,
  I am wondering about the valid range of values of an enum type. I couldn't find anything explicit about this in the language specification.

Consider this code:
```
enum Flags
{
    A = 1,
    B = 2,
    C = 4
}

bool rangeCheck(Flags f)
{
    return (f >= Flags.min) && (f <= Flags.max);
}
bool preciseCheck(Flags f)
{
    return (f == Flags.A) || (f == Flags.B) || (f == Flags.C);
}
```

Is `rangeCheck` guaranteed to return true? Is `preciseCheck` guaranteed to return true?
A variable of type Flags is always initialized to Flags.A.
Integer assignment is not allowed.
So `Flags f` should always have a value of A, B, or C, right?

No. This code is accepted:
```
Flags getFlags()
{
    return Flags.A | Flags.B; // and so is `^`, `&`, `+`, `*`, ...
}
```
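To make the consequence concrete, here is a minimal sketch that feeds such combined values to the two checks from the first snippet. It assumes, as stated above, that `|` on two `Flags` operands yields a `Flags`; the variable names `ab` and `ac` are only illustrative.

```d
enum Flags
{
    A = 1,
    B = 2,
    C = 4
}

bool rangeCheck(Flags f)
{
    return (f >= Flags.min) && (f <= Flags.max);
}

bool preciseCheck(Flags f)
{
    return (f == Flags.A) || (f == Flags.B) || (f == Flags.C);
}

void main()
{
    Flags ab = Flags.A | Flags.B; // value 3: inside [min, max], but not a member
    Flags ac = Flags.A | Flags.C; // value 5: above Flags.max

    assert(rangeCheck(ab) && !preciseCheck(ab));
    assert(!rangeCheck(ac) && !preciseCheck(ac));
}
```

So neither check can be assumed to hold for an arbitrary `Flags` value under the current behavior.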

I'd like to have the value range implications of the use of operators on enum values explicitly mentioned in the spec.

Current compiler behavior results in an infinite value range. (but it's implicit behavior, i.e. not explicitly mentioned in spec)

- Currently, are operations resulting in a value larger than the underlying integer storage type UB, like for normal signed integers?
- Should we limit the range of valid values of the Flags enum (C++ defines valid range to be [0..7])?
- Do we want to limit operations allowed on enum types? Or change the result type? (e.g. the type of `Flags + Flags` is `int` instead of `Flags`.)

cheers,
  Johan

December 27, 2019
On 27.12.19 13:14, Johan Engelen wrote:
> 
> Current compiler behavior results in an infinite value range. (but it's implicit behavior, i.e. not explicitly mentioned in spec)
> 
> - Currently, are operations resulting in a value larger than the underlying integer storage type UB,

They are @safe. You can't have UB in @safe code.

> like for normal signed integers?

Signed integers have wraparound semantics.
https://forum.dlang.org/thread/n23bo3$qe$1@digitalmars.com

The spec mentions this for AddExpressions (but the example only shows it for uint): https://dlang.org/spec/expression.html
"If both operands are of integral types and an overflow or underflow occurs in the computation, wrapping will happen."

There simply _can't_ be any UB in signed integer operations, as they are considered @safe.
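A small sketch of the wrapping behavior described by the quoted spec sentence, for both a signed and an unsigned operand. The function name `addOne` is made up for illustration; the values are taken at run time so the example does not depend on constant folding.

```d
// Hypothetical helper, used only to perform the addition at run time.
int addOne(int x) @safe pure nothrow @nogc
{
    return x + 1; // wraps on overflow, per the quoted spec text
}

void main() @safe
{
    // Signed overflow wraps around to the smallest value.
    assert(addOne(int.max) == int.min);

    // Unsigned underflow wraps around to the largest value.
    uint u = 0;
    u -= 1;
    assert(u == uint.max);
}
```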

> - Should we limit the range of valid values of the Flags enum (C++ defines valid range to be [0..7])?
> - Do we want to limit operations allowed on enum types? Or change the result type? (e.g. the type of `Flags + Flags` is `int` instead of `Flags`.)

I see those options:

1. The valid range is the full range of the underlying type (as DMD treats it now).

2. The range is [1..4]. In this case, the operations have to promote their operands to the enum base type, and most casts to enum types must be @system.

3. The range is [0..7]. In this case, only operations that preserve this range (such as bitwise operators) should yield the enum type, and other operations should promote their operands to the enum base type, and most casts to enum types must be @system.


Personally, I think 2 makes most sense (especially with `final switch`, as the current semantics forces compilers to insert default cases there), but this would be a breaking language change.
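
The [0..7] range of option 3 is the set of values reachable by OR-ing members together. As a rough sketch (not a proposal for how a compiler would implement it), that range can be expressed as a library check; `allBits` and `inCombinedRange` are hypothetical names introduced here.

```d
import std.traits : EnumMembers, OriginalType;

enum Flags
{
    A = 1,
    B = 2,
    C = 4
}

// Hypothetical helper: OR together every member of an enum.
OriginalType!E allBits(E)() if (is(E == enum))
{
    OriginalType!E mask = 0;
    foreach (member; EnumMembers!E)
        mask |= member;
    return mask;
}

// Hypothetical check for the [0..7] range of option 3:
// no bit outside the union of all members may be set.
bool inCombinedRange(Flags f)
{
    enum mask = allBits!Flags(); // evaluated at compile time
    return (f & ~mask) == 0;
}

void main()
{
    static assert(allBits!Flags() == 7);
    assert(inCombinedRange(Flags.A | Flags.B)); // 3
    assert(inCombinedRange(cast(Flags) 0));     // empty set of flags
    assert(!inCombinedRange(cast(Flags) 8));    // bit outside the mask
}
```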
December 27, 2019
On 12/27/19 9:12 AM, Timon Gehr wrote:
> On 27.12.19 13:14, Johan Engelen wrote:
>>
>> Current compiler behavior results in an infinite value range. (but it's implicit behavior, i.e. not explicitly mentioned in spec)
>>
>> - Currently, are operations resulting in a value larger than the underlying integer storage type UB,
> 
> They are @safe. You can't have UB in @safe code.
> 
>> like for normal signed integers?
> 
> Signed integers have wraparound semantics.
> https://forum.dlang.org/thread/n23bo3$qe$1@digitalmars.com
> 
> The spec mentions this for AddExpressions (but the example only shows it for uint): https://dlang.org/spec/expression.html
> "If both operands are of integral types and an overflow or underflow occurs in the computation, wrapping will happen."
> 
> There simply _can't_ be any UB in signed integer operations, as they are considered @safe.
> 
>> - Should we limit the range of valid values of the Flags enum (C++ defines valid range to be [0..7])?
>> - Do we want to limit operations allowed on enum types? Or change the result type? (e.g. the type of `Flags + Flags` is `int` instead of `Flags`.)
> 
> I see those options:
> 
> 1. The valid range is the full range of the underlying type (as DMD treats it now).
> 
> 2. The range is [1..4]. In this case, the operations have to promote their operands to the enum base type, and most casts to enum types must be @system.
> 
> 3. The range is [0..7]. In this case, only operations that preserve this range (such as bitwise operators) should yield the enum type, and other operations should promote their operands to the enum base type, and most casts to enum types must be @system.
> 
> 
> Personally, I think 2 makes most sense (especially with `final switch`, as the current semantics forces compilers to insert default cases there), but this would be a breaking language change.

We have another option, which I like: only allow bitwise operations on enums that are flagged as allowing them (either with a UDA, or via some other mechanism). Many languages actually treat enums just like structs, where you can add operators and functions. This is also a possibility.

This is also a breaking change, but I don't want the compiler complaining about final switch on enums that are not intended to be bitwise flags. So I'd prefer 2 over 3.

-Steve
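
The struct-like alternative can be approximated in today's D at the library level. The following is only a sketch of that idea, not the language change suggested above: `FlagSet` and its `has` method are hypothetical names, and the wrapper defines the bitwise operators while leaving arithmetic ones undefined.

```d
enum Flags
{
    A = 1,
    B = 2,
    C = 4
}

// Hypothetical wrapper: only |, & and ^ are defined on it.
struct FlagSet
{
    private int bits;

    this(Flags f) { bits = f; }

    FlagSet opBinary(string op)(FlagSet rhs) const
        if (op == "|" || op == "&" || op == "^")
    {
        FlagSet result;
        result.bits = mixin("bits " ~ op ~ " rhs.bits");
        return result;
    }

    // True if the given flag is set.
    bool has(Flags f) const { return (bits & f) != 0; }
}

void main()
{
    auto s = FlagSet(Flags.A) | FlagSet(Flags.B);
    assert(s.has(Flags.A) && s.has(Flags.B) && !s.has(Flags.C));

    // Arithmetic on the wrapper is rejected at compile time.
    static assert(!__traits(compiles, FlagSet(Flags.A) + FlagSet(Flags.B)));
}
```

With this split, the plain `Flags` enum would be left for uses such as `final switch`, and combinations would live in the wrapper type.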
December 27, 2019
On Friday, 27 December 2019 at 14:58:59 UTC, Steven Schveighoffer wrote:
> On 12/27/19 9:12 AM, Timon Gehr wrote:
>> On 27.12.19 13:14, Johan Engelen wrote:
>>>
>>> Current compiler behavior results in an infinite value range. (but it's implicit behavior, i.e. not explicitly mentioned in spec)
>>>
>>> - Currently, are operations resulting in a value larger than the underlying integer storage type UB,
>> 
>> They are @safe. You can't have UB in @safe code.
>> 
>>> like for normal signed integers?
>> 
>> Signed integers have wraparound semantics.
>> https://forum.dlang.org/thread/n23bo3$qe$1@digitalmars.com

Thanks for the correction!
I hope someone finds the time to make that more explicit in the spec.

>> The spec mentions this for AddExpressions (but the example only shows it for uint): https://dlang.org/spec/expression.html
>> "If both operands are of integral types and an overflow or underflow occurs in the computation, wrapping will happen."
>> 
>> There simply _can't_ be any UB in signed integer operations, as they are considered @safe.

I don't accept this argument [*], but no argument needed here. Just needs some clarification in spec text.

>>> - Should we limit the range of valid values of the Flags enum (C++ defines valid range to be [0..7])?
>>> - Do we want to limit operations allowed on enum types? Or change the result type? (e.g. the type of `Flags + Flags` is `int` instead of `Flags`.)
>> 
>> I see those options:
>> 
>> 1. The valid range is the full range of the underlying type (as DMD treats it now).
>> 
>> 2. The range is [1..4]. In this case, the operations have to promote their operands to the enum base type, and most casts to enum types must be @system.
>> 
>> 3. The range is [0..7]. In this case, only operations that preserve this range (such as bitwise operators) should yield the enum type, and other operations should promote their operands to the enum base type, and most casts to enum types must be @system.
>> 
>> 
>> Personally, I think 2 makes most sense (especially with `final switch`, as the current semantics forces compilers to insert default cases there), but this would be a breaking language change.
>
> We have another option, which I like: only allow bitwise operations on enums that are flagged as allowing them (either with a UDA, or via some other mechanism). Many languages actually treat enums just like structs, where you can add operators and functions. This is also a possibility.
>
> This is also a breaking change, but I don't want the compiler complaining about final switch on enums that are not intended to be bitwise flags. So I'd prefer 2 over 3.

Let's separate the discussion into what it _currently is_ and what _it might be in future_.

Current language behavior:
enum value range = full range of base type; integer operations work as-if the type is the base type.

Future:
Several options + lots of discussion ;-) and DIP needed.

Can I summarize it like that?

cheers,
  Johan



[*]
Let's not go off on a tangent, but there is enough UB in D that I do not accept that @safe=="no UB" argument. One example that comes to mind is bitshifting by more than the operand bit width: "illegal" is what the spec says but that doesn't make sense for runtime shift values and, in practice, turns into UB at runtime. ;-)
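
For run-time shift counts, code that wants to stay clear of the case the spec calls "illegal" can guard the shift itself. A minimal sketch, assuming a hypothetical helper `shiftLeftChecked`; returning 0 for oversized counts is a choice made for this example, not something the spec prescribes.

```d
// Hypothetical helper: never performs a shift by 32 or more bits.
uint shiftLeftChecked(uint value, uint count) @safe pure nothrow @nogc
{
    enum bitWidth = uint.sizeof * 8; // 32
    return count < bitWidth ? value << count : 0;
}

void main() @safe
{
    assert(shiftLeftChecked(1, 4) == 16);
    assert(shiftLeftChecked(1, 31) == 0x8000_0000);
    assert(shiftLeftChecked(1, 32) == 0); // guarded: no shift is executed
}
```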

December 28, 2019
On 27.12.19 19:12, Johan Engelen wrote:
> 
> [*]
> Let's not go off on a tangent,

It's not a tangent, it's a powerful tool to decide whether something has any business being UB or not.

> but there is enough UB in D that I do not accept that @safe=="no UB" argument.

https://dlang.org/articles/safed.html

"In D, we expect the vast majority of programmers to operate within the safe subset of D, which we call SafeD. The safety and the ease of use of SafeD is comparable to Java—in fact Java programs can be machine-translated into this safe subset of D. SafeD is easy to learn and it keeps the programmers away from undefined behaviors. It is also very efficient."

"[...] you are guaranteed not to encounter any undefined behavior."

https://dlang.org/spec/memory-safe-d.html

"Therefore, the safe subset of D consists only of programming language features that are guaranteed to never result in memory corruption. See this article for a rationale."

("this article" links to
https://dlang.org/articles/safed.html)

> One example that comes to mind is bitshifting by more than the operand bit width: "illegal" is what the spec says but that doesn't make sense for runtime shift values and, in practice, turns into UB at runtime. ;-)

What that means is not that UB is allowed in @safe code, but rather that the spec hasn't been properly updated after @safe was introduced to clarify what "illegal" means here. It should mean that the returned value is arbitrary, not that the behavior of the entire program will be arbitrary. I think Walter has said as much before, but I can't find the post.

@safe is meant to imply no memory corruption. @safe implies no UB, because UB can lead to any behavior, including memory corruption. UB allows compilers to insert arbitrary code execution exploits. How can you call that @safe?
December 28, 2019
On 27.12.19 19:12, Johan Engelen wrote:
> 
> Current language behavior:
> enum value range = full range of base type; integer operations work as-if the type is the base type.
> 
> Future:
> Several options + lots of discussion ;-) and DIP needed.
> 
> Can I summarize it like that?

I think so, but you might want to add that currently, e.g. `cast(E)x` is @safe for an `enum E:typeof(x){ ... }`
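
A short sketch of that point, assuming the behavior described in this thread: a `@safe` function casts an arbitrary `int` to an enum with an `int` base type, and the result matches no member. The names `E` and `fromInt` are illustrative only.

```d
enum E : int
{
    a = 1,
    b = 2
}

// Accepted in @safe code, per the post above.
E fromInt(int x) @safe
{
    return cast(E) x;
}

void main() @safe
{
    E e = fromInt(42);
    assert(e != E.a && e != E.b); // matches no member
    assert(cast(int) e == 42);    // the underlying value is preserved
}
```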
December 28, 2019
On 28.12.19 14:33, Timon Gehr wrote:
> https://dlang.org/articles/safed.html
> 
> "In D, we expect the vast majority of programmers to operate within the safe subset of D, which we call SafeD. The safety and the ease of use of SafeD is comparable to Java—in fact Java programs can be machine-translated into this safe subset of D. SafeD is easy to learn and it keeps the programmers away from undefined behaviors. It is also very efficient."
> 
> "[...] you are guaranteed not to encounter any undefined behavior."
> 
> https://dlang.org/spec/memory-safe-d.html
> 
> "Therefore, the safe subset of D consists only of programming language features that are guaranteed to never result in memory corruption. See this article for a rationale."
> 
> ("this article" links to
> https://dlang.org/articles/safed.html)

Also: https://dlang.org/spec/function.html#function-safety

"Safe functions are functions that are statically checked to exhibit no possibility of undefined behavior."

"Safe functions are marked with the @safe attribute."
December 28, 2019
I am skeptical about the value of major breaking changes with enums at this point, as it doesn't seem like there are a lot of undetected bugs emanating from the fairly loose definition of them.

Related to this is the ability to specify a range of values for a type, rather than enumerating them.
December 28, 2019
On Saturday, 28 December 2019 at 13:33:25 UTC, Timon Gehr wrote:
> 
> @safe is meant to imply no memory corruption. @safe implies no UB, because UB can lead to any behavior, including memory corruption. UB allows compilers to insert arbitrary code execution exploits. How can you call that @safe?

I think we are talking about different things here.

You are saying: the spec says @safe means no UB, and if the spec doesn't say it then it simply needs updating. A number of passages say so.

I am saying: regardless of what the spec and any of those articles promise, current D behavior is that @safe _can_ have UB in it.

I know most people don't like to hear or acknowledge it. But I think it is better to be realistic about this. `@safe` currently does _not_ mean the code is super safe.

One could say that the compilers are just not standard-compliant, nor is the spec itself, but the problem is bigger than that. Adding null dereference checks everywhere is not what (I think) people want. So there will always be UB potential in @safe code with interface method calls or class member variable access. I know about the argument "we specify that reading from NULL must result in a segfault". It misses the point: a "null dereference" doesn't mean "access address 0" (let alone that it would stop e.g. systems programmers from actually using address 0). Member variable access often does not access address 0x0, nor does an interface method call.

(I've been in these discussions too many times now. I'll try to stop arguing it.)

-Johan
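
The remark that a member access through a null reference need not touch address 0 can be illustrated with field offsets. This sketch uses a struct rather than a class to keep the layout simple; `Big` and `tailAddress` are made-up names, and nothing here actually dereferences a null pointer.

```d
// Illustrative type; the name and sizes are made up for this sketch.
struct Big
{
    ubyte[1 << 20] padding; // 1 MiB of leading data
    int tail;
}

// The field sits 1 MiB past the start of the object.
static assert(Big.tail.offsetof == (1 << 20));

// Address that an access to `tail` would touch for a given base address.
size_t tailAddress(size_t base) @safe pure nothrow @nogc
{
    return base + Big.tail.offsetof;
}

void main() @safe
{
    // For a null base the access lands at 0x10_0000, not at address 0.
    assert(tailAddress(0) == 0x10_0000);
}
```

Whether that address is mapped, and so whether the access faults, depends on the platform and the size of the offset.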

December 28, 2019
On Saturday, 28 December 2019 at 20:20:46 UTC, Walter Bright wrote:
> I am skeptical about the value of major breaking changes with enums at this point, as it doesn't seem like there are a lot of undetected bugs emanating from the fairly loose definition of them.

Yeah, I agree. It's good to clarify things in the spec though, to prevent someone (i.e. me) from trying to use enum range information for optimization.

https://github.com/dlang/dlang.org/pull/2728

Can we add a text like: "The enum type can be used in operator expressions (like AddExpression): the resulting type is the enum type, and the resulting value is computed by performing the operation as if the type is the enum base type. A variable of type enum does not have to have a value that corresponds with any of the enum members; the range of valid values for an enum typed variable is [basetype.min ... basetype.max]."

I'm not so satisfied with this text though.

-Johan
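
The clauses of the proposed wording above can be checked against a small example. This is a sketch that assumes an enum with an `int` base type, as in the opening post; enums with smaller base types may behave differently because of integer promotion and are not covered here.

```d
enum Flags
{
    A = 1,
    B = 2,
    C = 4
}

// The result type of an operator expression is the enum type.
static assert(is(typeof(Flags.A + Flags.C) == Flags));
static assert(is(typeof(Flags.C * Flags.C) == Flags));

void main() @safe
{
    // The value is computed as if the operands had the base type.
    Flags sum = Flags.A + Flags.C;
    Flags product = Flags.C * Flags.C;
    assert(cast(int) sum == 5);
    assert(cast(int) product == 16);

    // Neither result corresponds to a member, and `product` lies
    // outside [Flags.min, Flags.max].
    assert(Flags.min == Flags.A && Flags.max == Flags.C);
    assert(product > Flags.max);
}
```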
