March 21, 2016
Re: size_t index=-1;
Posted in reply to Steven Schveighoffer

On Monday, 21 March 2016 at 17:38:35 UTC, Steven Schveighoffer wrote:
> Your definition of when "implicit casting is really a bad idea" is almost certainly going to include cases where it really isn't a bad idea.

This logic can be applied to pretty much any warning condition or safety/correctness-related compiler feature; if it were followed consistently, the compiler would just always trust the programmer, like an ancient C or C++ compiler with warnings turned off.

> The compiler isn't all-knowing, and there will always be cases where the user knows best (and did the conversion intentionally).

That's what explicit casts are for.

> An obvious one is:
>
> void foo(ubyte[] x)
> {
>     int len = x.length;
> }
>
> (let's assume 32-bit CPU) I'm assuming the compiler would complain about this, since technically, len could be negative! Disallowing such code or requiring a cast is probably too much.

But that *is* a bad idea - there have been real-world bugs caused by doing stuff like that without checking.

With respect to your specific example:

1) The memory limit on a true 32-bit system is 4GiB, not 2GiB. Even with an OS that reserves some of the address space, as much as 3GiB or 3.5GiB may be exposed to a user-space process in practice.

2) Many 32-bit CPUs have Physical Address Extension, which allows them to support far more than 4GiB. Even a non-PAE-aware process will probably still be offered at least 3GiB on such a system.

3) Just because your program is 32-bit does *not* mean that it will only ever run on 32-bit CPUs. On a 64-bit system, a single 32-bit process could easily have access to ~3GiB of memory.

4) Even on an embedded system (which D doesn't really support right now, anyway) with a true 2GiB memory limit, you still have the problem that such highly platform-dependent code is difficult to find and update when the time comes to port the code to more powerful hardware.

These kinds of things are why D has fixed-size integer types: to encourage writing portable code, without too many invisible assumptions about the precise details of the execution environment.
March 21, 2016
Re: size_t index=-1;
Posted in reply to tsbockman

On 3/21/16 4:27 PM, tsbockman wrote:
> On Monday, 21 March 2016 at 17:38:35 UTC, Steven Schveighoffer wrote:
>> Your definition of when "implicit casting is really a bad idea" is
>> almost certainly going to include cases where it really isn't a bad idea.
>
> This logic can be applied to pretty much any warning condition or
> safety/correctness-related compiler feature; if it were followed
> consistently the compiler would just always trust the programmer, like
> an ancient C or C++ compiler with warnings turned off.

Right, if we were starting over, I'd say let's make sure you can't make these kinds of mistakes. We are not starting over, though, and existing code will have intentional uses of the existing behavior that are NOT bugs. Even then, it may have been rejected by Walter, since a goal is making C code easy to port.

Note that we already have experience with such a thing: if(arr). Fixing it is easy - just write if(arr.ptr). It was rejected because major users of this "feature" did not see any useful improvement -- all their usages were sound.

>> The compiler isn't all-knowing, and there will always be cases where
>> the user knows best (and did the conversion intentionally).
>
> That's what explicit casts are for.

Existing code doesn't need to cast. People are lazy; I would only insert a cast if I needed to. Most valid code works fine without casts, so you are going to flag lots of valid code as a nuisance.

>> An obvious one is:
>>
>> void foo(ubyte[] x)
>> {
>>     int len = x.length;
>> }
>>
>> (let's assume 32-bit CPU) I'm assuming the compiler would complain
>> about this, since technically, len could be negative! Disallowing such
>> code or requiring a cast is probably too much.
>
> But that *is* a bad idea - there have been real-world bugs caused by
> doing stuff like that without checking.

It depends on the situation. foo may know that x is going to be short enough to fit in an int.

The question becomes: if in 99% of cases the user converted to a signed value intentionally, and of the remaining 1% of cases, 99% are harmless "errors", then this is going to be just a nuisance update, and it will fail to be accepted.

> With respect to your specific example:
>
> 1) The memory limit on a true 32-bit system is 4GiB, not 2GiB. Even with
> an OS that reserves some of the address space, as much as 3GiB or 3.5GiB
> may be exposed to a user-space process in practice.

Then make it long len = x.length on a 64-bit system.

The only reason I said to assume it's 32-bit is that on a 64-bit CPU, using int is already an error. The architecture wasn't important for the example.

-Steve
March 21, 2016
Re: size_t index=-1;
Posted in reply to Steven Schveighoffer

On Monday, 21 March 2016 at 22:29:46 UTC, Steven Schveighoffer wrote:
> It depends on the situation. foo may know that x is going to be short enough to fit in an int.
>
> The question becomes, if 99% of cases the user knows that he was converting to a signed value intentionally, and in the remaining 1% of cases, 99% of those were harmless "errors", then this is going to be just a nuisance update, and it will fail to be accepted.

My experimentation strongly suggests that your "99.99% false positive" figure is way, *way* off. This stuff is both:

1) Harder for people to get right than you think (you can't develop good intuition about the extent of the problem unless you spend some time thoroughly auditing existing code bases specifically looking for this kind of problem), and also

2) Easier for the compiler to figure out than you think - I was really surprised at how short the list of problems flagged by the compiler was when I tested Lionello Lunesu's work on the current D codebase.

The false positive rate would certainly be *much* lower than your outlandish 10,000 : 1 estimate, given a good compiler implementation.

>> With respect to your specific example:
>>
>> 1) The memory limit on a true 32-bit system is 4GiB, not 2GiB. Even with
>> an OS that reserves some of the address space, as much as 3GiB or 3.5GiB
>> may be exposed to a user-space process in practice.
>
> Then make it long len = x.length on a 64-bit system.
>
> Only reason I said assume it's 32-bit, is because on 64-bit CPU, using int is already an error. The architecture wasn't important for the example.

Huh? The point of mine which you quoted applies specifically to 32-bit systems. 32-bit array lengths can be greater than `int.max`.

Did you mean to reply to point #3, instead?
March 21, 2016
Re: size_t index=-1;
Posted in reply to tsbockman

On 3/21/16 7:43 PM, tsbockman wrote:
> On Monday, 21 March 2016 at 22:29:46 UTC, Steven Schveighoffer wrote:
>> It depends on the situation. foo may know that x is going to be short
>> enough to fit in an int.
>>
>> The question becomes, if 99% of cases the user knows that he was
>> converting to a signed value intentionally, and in the remaining 1% of
>> cases, 99% of those were harmless "errors", then this is going to be
>> just a nuisance update, and it will fail to be accepted.
>
> My experimentation strongly suggests that your "99.99% false positive"
> figure is way, *way* off. This stuff is both:

Maybe. What would be a threshold that people would find acceptable?

> 1) Harder for people to get right than you think (you can't develop good
> intuition about the extent of the problem, unless you spend some time
> thoroughly auditing existing code bases specifically looking for this
> kind of problem), and also

It matters not to the person who is very aware of the issue and doesn't write buggy code. His code "breaks" too.

I would estimate that *most* uses of if(arr) in the wild were/are incorrect. However, in one particular user's code, *0* were incorrect, even though he used it extensively. This kind of problem is what led to the change being reverted. I suspect this change would be far more likely to create headaches than to help.

> 2) Easier for the compiler to figure out than you think - I was really
> surprised at how short the list of problems flagged by the compiler was,
> when I tested Lionello Lunesu's work on the current D codebase.

This is highly subjective to whose code you run it on.

> The false positive rate would certainly be *much* lower than your
> outlandish 10,000 : 1 estimate, given a good compiler implementation.

I wouldn't say it's outlandish, given my understanding of the problem. The question is: does the pain justify the update? I haven't run it against my code - or any code, really - but I can see how someone could be very good at making correct uses of the implicit conversion.

>>> With respect to your specific example:
>>>
>>> 1) The memory limit on a true 32-bit system is 4GiB, not 2GiB. Even with
>>> an OS that reserves some of the address space, as much as 3GiB or 3.5GiB
>>> may be exposed to a user-space process in practice.
>>
>> Then make it long len = x.length on a 64-bit system.
>>
>> Only reason I said assume it's 32-bit, is because on 64-bit CPU, using
>> int is already an error. The architecture wasn't important for the
>> example.
>
> Huh? The point of mine which you quoted applies specifically to 32-bit
> systems. 32-bit array lengths can be greater than `int.max`.
>
> Did you mean to reply to point #3, instead?

You seem to spend a lot of time focusing on 32-bit architecture, which was not my point at all. My point is that most arrays and their uses are short enough to be handled with a signed value as the length. If this is a generic library function, sure, we should handle all possibilities. That doesn't mean someone's command-line utility processing strings from the argument list should have to worry about it (as an example). Breaking perfectly good code is something we should strive to avoid.

-Steve
March 22, 2016
Re: size_t index=-1;
Posted in reply to Steven Schveighoffer

On Tuesday, 22 March 2016 at 00:18:54 UTC, Steven Schveighoffer wrote:
> On 3/21/16 7:43 PM, tsbockman wrote:
>> The false positive rate would certainly be *much* lower than your
>> outlandish 10,000 : 1 estimate, given a good compiler implementation.
>
> I wouldn't say it's outlandish given my understanding of the problem. The question is, does the pain justify the update? I haven't run it against my code or any code really, but I can see how someone is very good at making correct uses of the implicit conversion.
Well that's the real problem here then, isn't it?
I wouldn't want this stuff "fixed" either, if I thought false positives would outnumber useful warnings by 10,000 : 1.
However, I already *know* that's not the case, from my own tests. But at this point I'm obviously not going to convince you, except by compiling some concrete statistics on what got flagged in some real code bases.
And this I plan to do (in some form or other), once `checkedint` and/or the fix for DMD issue 259 are really ready. People can make an informed decision about the trade-offs then.
Copyright © 1999-2021 by the D Language Foundation