December 01

On Thursday, 30 November 2023 at 11:54:01 UTC, IGotD- wrote:

> One of the corner stones in D is that there are hardly any implicit conversions.

I don't think that is true. In modern languages these would not be allowed:

typeof(null) -> T*
Often I want a pointer that always points to a T, e.g. for function arguments.

uint -> int
int -> uint
Even C compilers can catch these with warnings on.

int -> dchar
byte -> char

Then there's the problem of 0 and 1 matching a bool overload instead of int.

enum Enum {member = 1}
Enum e = Enum.member << 2; // wait, 4 isn't in Enum
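
For reference, all of these are currently accepted; a quick sketch (default compiler flags assumed):

void main()
{
    int   i = -1;
    uint  u = i;      // int -> uint: accepted, u is now uint.max
    uint  big = 3_000_000_000;
    int   j = big;    // uint -> int: accepted, j is now negative
    dchar d = j;      // int -> dchar: accepted, even though j is not a code point
    byte  b = -1;
    char  c = b;      // byte -> char: accepted
    bool  ok = 1;     // the literals 0 and 1 also convert to bool
}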

I hope most of these can be disallowed in D3.

December 01

On Friday, 1 December 2023 at 12:38:05 UTC, Nick Treleaven wrote:

> I hope most of these can be disallowed in D3.

I meant future editions of D.

December 01

On Thursday, 30 November 2023 at 21:28:51 UTC, Steven Schveighoffer wrote:

> So clearly it uses ubyte as the index for iteration, and also somehow converts the length to ubyte instead of the other way around.
>
> But it doesn't use it exactly, because modifying idx doesn't change the loop. So I don't see why it uses ubyte for the actual index instead of size_t.

This code:

foreach(ubyte idx, v; new int[270])
{
    writeln(idx);
}

Lowers to:

{
	scope int[] __r72 = (new int[](270LU))[];
	ubyte __key71 = cast(ubyte)0u;
	for (; cast(int)__key71 < cast(int)cast(ubyte)__r72.length; cast(int)__key71 += 1)
	{
		int v = __r72[cast(ulong)__key71];
		ubyte idx = __key71;
		writeln(idx);
	}
}

From dmd -vcg-ast. So the array length is cast to ubyte, giving 14 (270 mod 256), which is why the loop runs only 14 times instead of 270.

> Honestly, maybe the easiest fix here is just to fix the actual lowering to be more sensical (I would have expected 270 iterations with repeated indexes 0 to 13 after going through 255).

Well I get a deprecation message:

Deprecation: foreach: loop index implicitly converted from `size_t` to `ubyte`

So I think it will become an error (at least in a future edition anyway).

December 10

Now that arrays are becoming templates defined in druntime, maybe it will be possible to specify the array index size at declaration time, and this problem will somehow go away?

December 11

On Wednesday, 29 November 2023 at 15:48:25 UTC, Steven Schveighoffer wrote:

> On Wednesday, 29 November 2023 at 14:56:50 UTC, Steven Schveighoffer wrote:
>
> > I don’t know how many times I get caught with size_t indexes but I want them to be int or uint. It’s especially painful in my class that I’m teaching where I don’t want to yet explain why int doesn’t work there and have to introduce casting or use to!int. All for the possibility that I have an array larger than 2 billion elements.
>
> I am forgetting why we removed this in the first place.
>
> Can we have the compiler insert an assert at the loop start that the bounds are in range when you use a smaller int type? Clearly the common case is that the array is small enough for int indexes.
>
> For those who are unaware, this used to work:
>
> auto arr = [1, 2, 3];
> foreach(int idx, v; arr) {
>     ...
> }
>
> But was removed at some point. I think it should be brought back (we are bringing stuff back now, right? Like hex strings?)
>
> -Steve

Couldn’t you write a function withIntIndex or withIndexType!int such that you can check the array is indeed short enough?

auto withIndexType(T : ulong, U)(U[] array)
{
    static struct WithIndexType
    {
        U[] array;
        int opApplyImpl(DG)(scope DG callback)
        {
            for (T index = 0; index < cast(T)array.length; ++index)
            {
                if (auto result = callback(index, array[index])) return result;
            }
            return 0;
        }
        // Expose a concrete instantiation as opApply so foreach can infer the loop variable types.
        alias opApply = opApplyImpl!(int delegate(T, ref U));
    }
    assert(array.length < T.max, "withIndexType: array length is too big for index type");
    return WithIndexType(array);
}

The alias opApply makes it so that this works (notice @safe on main):

void main() @safe
{
    double[] xs = new double[](120);
    foreach (i, ref d; xs.withIndexType!byte)
    {
        static assert(is(typeof(i) == byte));
        static assert(is(typeof(d) == double));
        // your part :)
    }
}

In all honesty, I don’t know why the alias trick even works, but using it, the compiler can infer the foreach types and instantiate the opApplyImpl template with the concrete type of the loop delegate.

March 24

On Wednesday, 29 November 2023 at 14:56:50 UTC, Steven Schveighoffer wrote:

> I don’t know how many times I get caught with size_t indexes but I want them to be int or uint. It’s especially painful in my class that I’m teaching where I don’t want to yet explain why int doesn’t work there and have to introduce casting or use to!int. All for the possibility that I have an array larger than 2 billion elements.

Yes! This! Right now I have 22 of these deprecation warnings every time I compile my program. I was going to start a new thread recommending this feature be un-deprecated. I'm happy to find this old thread with Steve suggesting this very thing, and also glad to see most people here are on my side. Having to add another variable to do an explicit cast would be ugly.
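
By that I mean something like this:

    int[] arr = [1, 2, 3];
    foreach (idx, entry; arr)      // idx is size_t
    {
        int i = cast(int) idx;     // the extra variable and explicit cast
        // ... use i instead of idx from here on ...
    }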

I don't want this to actually be removed at some point and break my code, which should be perfectly acceptable as it is. It should just be taken out of deprecation.

When I was thinking of starting this thread myself, I had the feeling that there would be some kind of objection from programmers more experienced than me. But it looks like Jonathan M Davis was the only one here to give a serious argument why it shouldn't be allowed.

On Thursday, 30 November 2023 at 15:25:52 UTC, Jonathan M Davis wrote:

> Because size_t is uint on 32-bit systems, using int with foreach works just fine aside from the issue of signed vs unsigned (which D doesn't consider to be a narrowing conversion, for better or worse). So, someone could use int with foreach on a 32-bit system and have no problems, but when they move to a 64-bit system, it could become a big problem, because there, size_t is ulong. So, code that worked fine on a 32-bit system could then break on a 64-bit system (assuming that it then starts operating on arrays that are larger than a 32-bit system could handle).

An interesting, not bad point, but I don't think it's enough to justify removing this language feature. It's just too unlikely of a scenario to be worth removing a feature which improves things far more often than not.

Firstly, how often would it be that a program wouldn't explicitly require more array values than uint can fit, but is still capable of filling the array beyond that in places when the maximum array size is enough?

For someone to do all the development and testing of their program on a 32-bit system must be a rare scenario. Even 10 years ago, if someone was running a 32-bit desktop operating system, it meant that either they had one of the older computers still in use, or they stupidly chose the 32-bit version even though their computer was 64-bit capable. The kinds of people who would use a programming language like D aren't the most likely people to make such mistakes. Those that write programs that other people use are even less likely. Now that Windows no longer comes in 32-bit versions, those days are largely behind us.

There are probably some people around using D for embedded applications, which may involve 32-bit microcontrollers. In 2024 and beyond, this is the only scenario where someone may realistically use D and do all the testing on a 32-bit system. They would then need to move the same program to a 64-bit system after testing for the problem to emerge. I just don't think this is likely enough to be worth removing the feature.

In the unlikely event this problem does happen, it's just one more of many places where bugs can occur. It might never happen at all, and if it were going to, it probably already would have back when 32-bit systems were more common. If there are no known cases of this, then I think it's safe to remove it from deprecation.

Maybe disallow it from functions marked @safe, but generally, I think this feature should be allowed without deprecation warnings.

March 24
When size_t is 64 bits, the reason:

    foreach (int i; 0 .. array.length)

gives an error is the same reason that:

    size_t s;
    int i = s;

gives an error. A 64 bit integer cannot be converted to a 32 bit integer without risk of losing data.

If the loop is rewritten as:

    foreach (i; 0 .. array.length)

then the problem goes away.

size_t is 64 bits for a machine with 64 bit pointers. This also means that registers are 64 bits in size, so no memory and no performance is saved by making the index 32 bits. It's the "natural" size for an index.

BTW, there have been endless bugs in C programs when converting them between 16-32-64 bits, because C doesn't give an error when converting from a larger integer type to a smaller one. It just truncates. These bugs tend to be hidden and hard to track down. Such overflows are also exploited for malware purposes.

D is doing the right thing and the code is portable between 32<=>64 bits pretty much by default. When was the last time you heard of a 32<=>64 bit porting bug with D? I don't recall one. Many of D's design decisions came from living through the C/C++ conversion bugs of the 16-32-64 bit transitions.

Just use:

    foreach (i; 0 .. array.length)

and let the compiler take care of it for you.
March 24
On 11/29/2023 11:27 AM, Steven Schveighoffer wrote:
> A great point from CyberShadow on the original PR that added the deprecation:
> 
> https://github.com/dlang/dmd/pull/8941#issuecomment-496306412

In that case, the array length is known at compile time and VRP (value range propagation) should apply.

https://issues.dlang.org/show_bug.cgi?id=24450
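
That is, something like the following ought to be accepted once the length is statically known to fit (a sketch):

    int[10] arr;               // length known at compile time
    foreach (ubyte i, v; arr)  // 10 fits in ubyte, so no data can be lost
    {
        // ...
    }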
March 24

On Sunday, 24 March 2024 at 16:33:06 UTC, Walter Bright wrote:

> When size_t is 64 bits, the reason:
>
>     foreach (int i; 0 .. array.length)

I suppose I wasn't clear enough. When I say I want it to work without errors, I specifically meant in the following format:

    foreach (uint i, entry; array)

I suppose it would be nice for the other foreach format (over an index range) too. But for the format shown above, it would be as easy as removing a deprecation.

My position is that this format should be allowed and removed from deprecation for the foreseeable future, except perhaps in code marked @safe.

I don't know if it's ever considered acceptable practice for runtime warnings to be automatically inserted into a program, but perhaps for debug builds a runtime warning could be inserted whenever the array is longer than the maximum value of the type of i, or perhaps more than 3/4 of it (as in the code above).
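
Hypothetically, something along these lines (the function name, threshold and message are just placeholders):

    void foo(int[] array)
    {
        import std.stdio : stderr;

        // Hypothetical check the compiler could insert before the loop in a debug build:
        debug
        {
            if (array.length > uint.max / 4 * 3)
                stderr.writeln("runtime warning: array length ", array.length,
                    " is close to the limit of the uint loop index");
        }

        foreach (uint i, entry; array)   // the deprecated form under discussion
        {
            // ...
        }
    }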

But if it's still considered unacceptable to have the now-deprecated format shown above be brought back as a language feature, I suggest the following as a compromise:

    foreach (cast uint i, entry; array)

and also

    foreach (cast uint i; 0 .. array.length)
March 25

On Sunday, 24 March 2024 at 08:23:03 UTC, Liam McGillivray wrote:

> On Thursday, 30 November 2023 at 15:25:52 UTC, Jonathan M Davis wrote:
>
> > Because size_t is uint on 32-bit systems, using int with foreach works just fine aside from the issue of signed vs unsigned (which D doesn't consider to be a narrowing conversion, for better or worse). So, someone could use int with foreach on a 32-bit system and have no problems, but when they move to a 64-bit system, it could become a big problem, because there, size_t is ulong. So, code that worked fine on a 32-bit system could then break on a 64-bit system (assuming that it then starts operating on arrays that are larger than a 32-bit system could handle).
>
> An interesting, not bad point, but I don't think it's enough to justify removing this language feature. It's just too unlikely of a scenario to be worth removing a feature which improves things far more often than not.

It's good to make any integer truncation visible rather than implicit; that's the main reason. And given that it will be safe to use a smaller integer type than size_t when the array length is statically known (after https://github.com/dlang/dmd/pull/16334), people might in future expect a specified index type to be verified as able to hold every index in the array.

> Firstly, how often would it be that a program wouldn't explicitly require more array values than uint can fit, but is still capable of filling the array beyond that in places when the maximum array size is enough?

The 64-bit version of the program may be expected to handle more data than the 32-bit version. That could even be the reason why it was ported to 64-bit.

...

> Maybe disallow it from functions marked @safe,

@safe is for memory-safety, it shouldn't be conflated with other types of safety.

> But if it's still considered unacceptable to have the now-deprecated format shown above be brought back as a language feature, I suggest the following as a compromise:
>
>     foreach (cast uint i, entry; array)

Possibly a wrapper could be made which asserts that the length fits:

    foreach (i, entry; array.withIndex!uint)
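
A minimal sketch of what such a wrapper might look like (along the same lines as the withIndexType example earlier in the thread):

    auto withIndex(T, U)(U[] array)
    {
        static struct Result
        {
            U[] array;
            int opApply(scope int delegate(T, ref U) dg)
            {
                foreach (i, ref e; array)
                    if (auto r = dg(cast(T) i, e))
                        return r;
                return 0;
            }
        }
        // The assert makes the truncation visible instead of silent.
        assert(array.length <= T.max, "withIndex: array length does not fit in the index type");
        return Result(array);
    }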