size_t index=-1; (page 2) - D Programming Language Discussion Forum

March 17, 2016

Re: size_t index=-1;

Posted by tsbockman
in reply to Jonathan M Davis

On Thursday, 17 March 2016 at 01:57:16 UTC, Jonathan M Davis wrote:
> Just assigning one to the other really isn't a problem, and sometimes you _want_ the wraparound. If you assume that it's always the case that assigning a negative value to an unsigned type is something that programmers don't want to do, then you haven't programmed in C enough.

Greater than 90% of the time, even in low level code, an assignment, comparison, or any other operation that mixes signed and unsigned types is being done directly (without bounds checking) only for speed, laziness, or ignorance - not because 2's complement mapping of negative to positive values is actually desired.

Forcing deliberate invocations of 2's complement mapping between signed and unsigned types to be explicitly marked is a good thing, seeing as the intended semantics are fundamentally different. I interpret any resistance to this idea simply as evidence that we haven't yet made it sufficiently easy/pretty to be explicit.

Any idea that it's actually *desirable* for code to be ambiguous in this way is just silly.
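To make the "explicitly marked" idea concrete, here is a minimal sketch of the two intents written out separately. The helper names `wrapToUnsigned` and `checkedToUnsigned` are made up for illustration; `std.conv.to` and `ConvOverflowException` are from Phobos.

```d
import std.conv : to, ConvOverflowException;
import std.exception : assertThrown;

// Hypothetical helper: the cast says "I want the 2's complement wraparound".
uint wrapToUnsigned(int value)
{
    return cast(uint) value;
}

// Hypothetical helper: value-preserving conversion.
// std.conv.to throws ConvOverflowException if the value does not fit.
uint checkedToUnsigned(int value)
{
    return value.to!uint;
}

void main()
{
    assert(wrapToUnsigned(-1) == uint.max);   // deliberate wraparound
    assert(checkedToUnsigned(42) == 42);      // in range, value unchanged
    assertThrown!ConvOverflowException(checkedToUnsigned(-1));
}
```

With the two spelled differently, a reader can tell which one the author meant; with a plain `uint u = s;` they cannot.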

March 17, 2016

Re: size_t index=-1;

Posted by tsbockman
in reply to Jonathan M Davis

On Thursday, 17 March 2016 at 01:57:16 UTC, Jonathan M Davis wrote:
> or wrap your integers in types that have more restrictive rules. IIRC, at least one person around here has done that already so that they can catch integer overflow - which is basically what you're complaining about here.

That's me (building on Robert Schadek's work):
    https://code.dlang.org/packages/checkedint

Although I should point out that my `SmartInt` actually has *less* restrictive rules than the built-in types - all possible combinations of size and signedness are both allowed and safe for all operations, without any explicit casts. A lot of what `SmartInt` does depends on (minimal) extra runtime logic, which imposes a ~30% performance penalty (when integer math is actually the bottleneck) with good compiler optimizations (GDC or LDC).

But, a lot of it could also be done at no runtime cost, by leveraging VRP. C's integer math rules are really pretty bad, even when taking performance into account. Something as simple as by default promoting to a signed, rather than unsigned, type would prevent many bugs in practice, at zero cost (except that it would be a breaking change).

There is also `SafeInt` with "more restrictive rules", if it is for some reason necessary to work inside the limitations of the built-in basic integer types.
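As a small illustration of the promotion rule mentioned above (this uses only built-in types, not the `checkedint` API; `mixedDivide` is a made-up name), mixing `int` and `uint` promotes to `uint`, so a negative operand is reinterpreted before the division happens:

```d
// Hypothetical helper: the result type is inferred from the built-in rules.
auto mixedDivide(int a, uint b)
{
    return a / b; // `a` is converted to uint before dividing
}

// The common type of int and uint is uint, not a signed type.
static assert(is(typeof(mixedDivide(0, 1u)) == uint));

void main()
{
    // -6 becomes 4_294_967_290 as a uint, so the quotient is not -3.
    assert(mixedDivide(-6, 2) == 2_147_483_645);

    // Staying in a signed type gives the mathematically expected result.
    assert(-6 / 2 == -3);
}
```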

March 17, 2016

Re: size_t index=-1;

Posted by Steven Schveighoffer
in reply to Mathias Lang

On 3/16/16 6:37 PM, Mathias Lang wrote:
> On Wednesday, 16 March 2016 at 21:49:05 UTC, Steven Schveighoffer wrote:
>> No, please don't. Assigning a signed value to an unsigned (and vice
>> versa) is very useful, and there is no good reason to break this.
>
> I'm not talking about removing it completely. The implicit conversion
> should only happen when it's safe:
>
> ```
> int s;
> if (s >= 0) // VRP saves the day
> {
>    uint u = s;
> }
> ```
>
> ```
> uint u;
>
> if (u > short.max)
>    throw new Exception("Argument out of range");
> // Or `assert`
> short s = u;
> ```

Converting unsigned to signed or vice versa (of the same size type) is safe. No information is lost. It's the comparison between the two which confuses the heck out of people. I think we can solve 80% of the problems by just fixing that. And the bug report says it's preapproved from Walter and Andrei.
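A minimal sketch of the comparison problem, assuming the current behaviour where the signed operand is converted to unsigned before comparing (`naiveLess` and `safeLess` are made-up names):

```d
// Hypothetical helper: relies on the built-in mixed comparison.
bool naiveLess(int a, uint b)
{
    return a < b; // `a` is converted to uint first
}

// Hypothetical helper: handles the negative case before comparing as unsigned.
bool safeLess(int a, uint b)
{
    return a < 0 || cast(uint) a < b;
}

void main()
{
    assert(!naiveLess(-1, 1)); // -1 is compared as uint.max, so "-1 < 1" is false
    assert(safeLess(-1, 1));   // mathematically correct answer
    assert(safeLess(0, 1) && !safeLess(2, 1));
}
```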

VRP on steroids would be nice, but I don't think it's as trivial to solve.

-Steve

March 17, 2016

Re: size_t index=-1;

Posted by tsbockman
in reply to Steven Schveighoffer

On Thursday, 17 March 2016 at 17:09:46 UTC, Steven Schveighoffer wrote:
> Converting unsigned to signed or vice versa (of the same size type) is safe. No information is lost.

Saying that "no information is lost" in such a case, is like saying that if I encrypt my hard drive and then throw away the password, "no information is lost". Technically this is true: the bit count is the same as it was before.

In practice, though, the knowledge of how information is encoded is essential to actually using it.

In the same way, using `cast(ulong)` to pass `-1L` to a function that expects a `ulong` results in a de-facto loss of information, because that `-1L` can no longer be distinguished from `ulong.max`, despite the fundamental semantic difference between the two.
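A short sketch of what I mean (the function name `received` is made up; it just hands back what the callee sees):

```d
// Hypothetical callee: it only ever sees a ulong.
ulong received(ulong x)
{
    return x;
}

void main()
{
    long minusOne = -1L;

    // Inside the callee, "-1" and "the largest ulong" are the same value.
    assert(received(cast(ulong) minusOne) == received(ulong.max));

    // The bits survive a round trip, but only if the caller remembers
    // that the value was signed to begin with.
    assert(cast(long) received(cast(ulong) minusOne) == -1L);
}
```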

> VRP on steroids would be nice, but I don't think it's as trivial to solve.

D's current VRP is actually surprisingly anemic: it doesn't even understand integer comparisons, or the range restrictions implied by the predicate when a certain branch of an `if` statement is taken.

Lionello Lunesu made a PR a while back that adds these two features, and it makes the compiler feel a lot smarter. (The PR was not accepted at the time, but I have since revived it:
    https://github.com/D-Programming-Language/dmd/pull/5229)
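A rough way to probe what VRP accepts is `__traits(compiles, ...)`. The first two results below are ones I'm sure of; the third depends on whether the compiler you use does branch-aware VRP, so the sketch only prints it rather than asserting anything (the enum names are made up):

```d
import std.stdio : writeln;

// Masking narrows the value range to 0 .. 255, so the implicit conversion is accepted.
enum maskedFits = __traits(compiles, { int v; ubyte b = v & 0xFF; });

// An arbitrary int does not fit in a ubyte, so this is rejected.
enum plainFits = __traits(compiles, { int v; ubyte b = v; });

// Range implied by the `if` predicate; the answer depends on the compiler version.
enum branchFits = __traits(compiles, {
    int v;
    if (v >= 0 && v <= 255)
    {
        ubyte b = v;
    }
});

static assert(maskedFits);
static assert(!plainFits);

void main()
{
    writeln("branch-aware VRP: ", branchFits);
}
```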

March 18, 2016

Re: size_t index=-1;

Posted by Ola Fosheim Grøstaf
in reply to tsbockman

On Thursday, 17 March 2016 at 22:46:01 UTC, tsbockman wrote:
> In the same way, using `cast(ulong)` to pass `-1L` to a function that expects a `ulong` results in a de-facto loss of information, because that `-1L` can no longer be distinguished from `ulong.max`, despite the fundamental semantic difference between the two.

Only providing modular arithmetic is a significant language design flaw, but as long as all integers are defined to be modular then there is no fundamental semantic difference either.

Of course, comparisons beyond equality don't work for modular arithmetic either, irrespective of sign...

You basically have to decide whether you want a line or a circle; Walter chose the circle for integers and the line for floating point. The circle is usually the wrong model, but that does not change the language definition...
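A tiny sketch of the circle versus the line, assuming IEEE doubles evaluated at `double` precision (the function names are made up):

```d
// Hypothetical helper: integer arithmetic wraps around (the circle).
int nextInt(int v)
{
    return v + 1;
}

// Hypothetical helper: floating point runs off the end to infinity (the line).
double twice(double v)
{
    return v * 2;
}

void main()
{
    assert(nextInt(int.max) == int.min);           // wraps to the other end
    assert(twice(double.max) == double.infinity);  // saturates instead of wrapping
}
```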

March 18, 2016

Re: size_t index=-1;

Posted by Marc Schütz
in reply to Steven Schveighoffer

On Thursday, 17 March 2016 at 17:09:46 UTC, Steven Schveighoffer wrote:
> On 3/16/16 6:37 PM, Mathias Lang wrote:
>> On Wednesday, 16 March 2016 at 21:49:05 UTC, Steven Schveighoffer wrote:
>>> No, please don't. Assigning a signed value to an unsigned (and vice
>>> versa) is very useful, and there is no good reason to break this.
>>
>> I'm not talking about removing it completely. The implicit conversion
>> should only happen when it's safe:
>>
>> ```
>> int s;
>> if (s >= 0) // VRP saves the day
>> {
>>    uint u = s;
>> }
>> ```
>>
>> ```
>> uint u;
>>
>> if (u > short.max)
>>    throw new Exception("Argument out of range");
>> // Or `assert`
>> short s = u;
>> ```
>
> Converting unsigned to signed or vice versa (of the same size type) is safe. No information is lost.

Strictly speaking yes, but typically, an `int` isn't used as a bit-pattern but as an integer (it's in the name). Such behaviour is very undesirable for integers.

> It's the comparison between the two which confuses the heck out of people. I think we can solve 80% of the problems by just fixing that.

That's probably true, anyway.

March 18, 2016

Re: size_t index=-1;

Posted by Steven Schveighoffer
in reply to tsbockman

On 3/17/16 6:46 PM, tsbockman wrote:
> On Thursday, 17 March 2016 at 17:09:46 UTC, Steven Schveighoffer wrote:
>> Converting unsigned to signed or vice versa (of the same size type) is
>> safe. No information is lost.
>
> Saying that "no information is lost" in such a case, is like saying that
> if I encrypt my hard drive and then throw away the password, "no
> information is lost". Technically this is true: the bit count is the
> same as it was before.

It's hard to throw away the "key" of 2's complement math.

> In practice, though, the knowledge of how information is encoded is
> essential to actually using it.

In practice, a variable that is unsigned or signed is expected to behave like it is declared. I don't think anyone expects differently.

When I see:

    size_t x = -1;

I expect x to behave like an unsigned size_t that represents -1. There is no ambiguity here. Where it gets confusing is if you didn't mean to type size_t. But the compiler can't know that.

When you start doing comparisons, then ambiguity creeps in. The behavior is well defined, but not very intuitive. You can get into trouble even without mixing signed/unsigned types. For example:

    for(size_t i = 0; i < a.length - 1; ++i)

This is going to crash when a.length == 0. Better to do this:

    for(size_t i = 0; i + 1 < a.length; ++i)

Unsigned math can be difficult, there is no doubt. But we can't just disable it, or disable unsigned conversions.
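Here is a small runnable sketch of the loop bound issue (the function name `pairCount` is made up; it just counts iterations of the safe form of the loop):

```d
// Hypothetical helper: counts adjacent pairs using the safe loop condition.
size_t pairCount(const int[] a)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < a.length; ++i)
        ++n;
    return n;
}

void main()
{
    int[] empty;

    // With an empty array the naive bound wraps around to size_t.max,
    // so `i < a.length - 1` would be true and a[i] would be out of bounds.
    assert(empty.length - 1 == size_t.max);

    // Moving the "+ 1" to the other side keeps everything non-negative.
    assert(pairCount(empty) == 0);
    assert(pairCount([1, 2, 3]) == 2);
}
```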

> In the same way, using `cast(ulong)` to pass `-1L` to a function that
> expects a `ulong` results in a de-facto loss of information, because
> that `-1L` can no longer be distinguished from `ulong.max`, despite the
> fundamental semantic difference between the two.

Any time you cast a type, the original type information is lost. But in this case, no bits are lost. In this case, the function is declaring "I don't care what your original type was, I want to use ulong". If it desires to know the original type, it should use a template parameter instead.

Note, I have made these mistakes myself, and I understand what you are asking for and why you are asking for it. But these are bugs. The user is telling the compiler to do one thing, and expecting it to do something else. It's not difficult to fix, and in fact, many lines of code are written specifically to take advantage of these rules. This is why we cannot remove them. The benefit is not worth the cost.

>> VRP on steroids would be nice, but I don't think it's as trivial to
>> solve.
>
> D's current VRP is actually surprisingly anemic: it doesn't even
> understand integer comparisons, or the range restrictions implied by the
> predicate when a certain branch of an `if` statement is taken.
>
> Lionello Lunesu made a PR a while back that adds these two features, and
> it makes the compiler feel a lot smarter. (The PR was not accepted at
> the time, but I have since revived it:
>      https://github.com/D-Programming-Language/dmd/pull/5229)

I'm not compiler-savvy enough to have an opinion on the PR, but I think more sophisticated VRP would be good.

-Steve

March 18, 2016

Re: size_t index=-1;

Posted by tsbockman
in reply to Ola Fosheim Grøstaf

On Friday, 18 March 2016 at 05:20:35 UTC, Ola Fosheim Grøstaf wrote:
> Only providing modular arithmetic is a significant language design flaw, but as long as all integers are defined to be modular then there is no fundamental semantic difference either.

`ulong.max` and `-1L` are fundamentally different semantically, even with two's complement modular arithmetic.

Just because a few operations (addition and subtraction, mainly) can use a common implementation for both, does not change that. Division, for example, cannot be done correctly without knowing whether the inputs are signed or not.
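For example (a minimal sketch; the function names are made up), the same bit pattern gives different quotients depending on the signedness, while addition does not care:

```d
// Hypothetical helpers: identical source text, different instruction semantics.
long signedHalf(long v)
{
    return v / 2;
}

ulong unsignedHalf(ulong v)
{
    return v / 2;
}

void main()
{
    long minusOne = -1L;
    ulong sameBits = cast(ulong) minusOne; // equals ulong.max

    // Division must know the signedness of its inputs.
    assert(signedHalf(minusOne) == 0);
    assert(unsignedHalf(sameBits) == ulong.max / 2);

    // Addition produces the same bits either way.
    assert(cast(ulong)(minusOne + 5) == sameBits + 5);
}
```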

March 18, 2016

Re: size_t index=-1;

Posted by tsbockman
in reply to Steven Schveighoffer

On Friday, 18 March 2016 at 14:51:34 UTC, Steven Schveighoffer wrote:
> Note, I have made these mistakes myself, and I understand what you are asking for and why you are asking for it. But these are bugs. The user is telling the compiler to do one thing, and expecting it to do something else. It's not difficult to fix, and in fact, many lines of code are written specifically to take advantage of these rules. This is why we cannot remove them. The benefit is not worth the cost.

Actually, I think I confused things for you by mentioning `cast(ulong)`.

I'm not asking for a Java-style "no unsigned" system (I hate that; it's one of my biggest annoyances with Java). Rather, I'm picking on *implicit* conversions between signed and unsigned.

I'm basically saying, "because information is lost when casting between signed and unsigned, all such casts should be explicit". This could make code rather verbose - except that from my experiments, with decent VRP the compiler can actually be surprisingly smart about warning only in those cases where implicit casting is really a bad idea.

March 18, 2016

Re: size_t index=-1;

Posted by Jonathan M Davis
in reply to tsbockman

On Friday, March 18, 2016 23:48:32 tsbockman via Digitalmars-d-learn wrote:
> I'm basically saying, "because information is lost when casting between signed and unsigned, all such casts should be explicit".

See. Here's the fundamental disagreement. _No_ information is lost when converting between signed and unsigned integers. e.g.

    int i = -1;
    uint ui = i;
    int j = ui;
    assert(j == -1);

But even if you convinced us, you'd have to convince Walter. And based on previous discussions on this subject, I think that you have an _extremely_ low chance of that. He didn't even think that it was a problem that

    void foo(bool bar) {}
    void foo(long bar) {}
    foo(1);

resulted in a call to the bool overload, when pretty much everyone else did. The only thing that I'm aware of that Walter has thought _might_ be something that we should change is allowing the comparison between signed and unsigned integers, and if you read what he says in the bug report for it, he clearly doesn't think it's a big problem:

https://issues.dlang.org/show_bug.cgi?id=259

And that's something that clearly causes bugs in a way that converting between signed and unsigned integers does not. You're fighting for a lost cause on this one.
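For anyone who wants to try the bool overload case mentioned above, here is a minimal sketch. The overloads return a string purely so the choice can be checked; as far as I know this is still how overload resolution treats the literals `0` and `1`, but treat that as an assumption about the compiler you have.

```d
// Same overload set as above, returning which one was picked.
string foo(bool bar)
{
    return "bool";
}

string foo(long bar)
{
    return "long";
}

void main()
{
    // The literal 1 fits in a bool, and bool is the more specialized overload.
    assert(foo(1) == "bool");

    // The literal 2 does not convert to bool, so only the long overload matches.
    assert(foo(2) == "long");
}
```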

- Jonathan M Davis


Copyright © 1999-2021 by the D Language Foundation