Thread overview | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
|
September 30, 2016 Should r.length of type ulong be supported on 32-bit systems? | ||||
---|---|---|---|---|
| ||||
https://github.com/dlang/phobos/pull/4827 still allows that but specifies that phobos algorithms are not required to. -- Andrei |
September 30, 2016 Re: Should r.length of type ulong be supported on 32-bit systems? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 9/30/2016 7:31 PM, Andrei Alexandrescu wrote:
> https://github.com/dlang/phobos/pull/4827 still allows that but specifies that
> phobos algorithms are not required to. -- Andrei
A couple instances where a length may be longer than the address space:
1. large files
2. sequences of mathematically generated values
I don't see a particular advantage to restricting hasLength() to be size_t or ulong. Whether an algorithm requires the length to be within the addressable capabilities of the CPU or not should be up to the algorithm.
There is also cause for r.length to return an int even on a 64 bit system - it can be more efficient.
|
October 01, 2016 Re: Should r.length of type ulong be supported on 32-bit systems? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Saturday, 1 October 2016 at 02:31:23 UTC, Andrei Alexandrescu wrote: > https://github.com/dlang/phobos/pull/4827 still allows that but specifies that phobos algorithms are not required to. -- Andrei In at least some cases it's bad. See: http://forum.dlang.org/thread/vqeeyqprqphwktebgaiw@forum.dlang.org?page=1 |
October 01, 2016 Re: Should r.length of type ulong be supported on 32-bit systems? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 09/30/2016 11:58 PM, Walter Bright wrote:
> On 9/30/2016 7:31 PM, Andrei Alexandrescu wrote:
>> https://github.com/dlang/phobos/pull/4827 still allows that but
>> specifies that
>> phobos algorithms are not required to. -- Andrei
>
> A couple instances where a length may be longer than the address space:
>
> 1. large files
> 2. sequences of mathematically generated values
>
> I don't see a particular advantage to restricting hasLength() to be
> size_t or ulong. Whether an algorithm requires the length to be within
> the addressable capabilities of the CPU or not should be up to the
> algorithm.
>
> There is also cause for r.length to return an int even on a 64 bit
> system - it can be more efficient.
Indeed the "I have a dream" cases are quite the lure. The reality in the field, however, is that Phobos thoroughly assumes length has type size_t. The problem is defining hasLength in ways that are at the same time general and that also work is extremely laborious. Just grep through Consider:
1. std.algorithm.comparison:1007 (abridged)
const rc = mulu(r, c, overflow);
if (_matrix.length < rc)
So here we assume length is comparable for inequality with a const size_t. That should probably rule out signed integrals.
2. ibid, 1694 (abridged)
static if (hasLength!(Range1) && hasLength!(Range2))
{
return r1.length == r2.length;
}
Here we assume any two ranges satisfying hasLength have lengths comparable for equality. That projects a rather tenuous definition of hasLength: "the range must define a length that is comparable for equality with any other range that also defines a length". Not impossible, but definitely not what we have right now, de facto or de jure.
3. ibid, 634 (abridged)
immutable len = min(r1.length, r2.length);
So here two lengths must be comparable for ordering, have a common type, and have an immutable constructor.
4. ibid, 656 (abridged)
return threeWay(r1.length, r2.length);
Here threeWay takes two size_t. So both lengths must be at least convertible implicitly to size_t.
I got to the end of the first file I scanned with grep, and I don't know what a robust definition of hasLength should be - outside of size_t itself. The more locations one examines, the more the capabilities required get closer to those of size_t - or the more latent bugs exist in phobos if we want to support something else.
We have two options here.
1. Support a range of types (e.g.: any integral and any user-defined type that supports the following list of primitives: ...) for length with hasLength, and then have each range-based algorithm define its own additional restrictions if they have such. That means right now a bunch of phobos code is incorrect and needs to be fixed.
2. Require lengths to be size_t.
I'm all for generality, but of the useful kind. It seems to me supporting a complicated definition of length is a protracted proposition with little upside. That's why I'm going with the engineering solution in the PR.
Andrei
|
September 30, 2016 Re: Should r.length of type ulong be supported on 32-bit systems? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 9/30/2016 10:04 PM, Andrei Alexandrescu wrote:
> I'm all for generality, but of the useful kind. It seems to me supporting a
> complicated definition of length is a protracted proposition with little upside.
> That's why I'm going with the engineering solution in the PR.
You make a good case. Agreed.
|
October 01, 2016 Re: Should r.length of type ulong be supported on 32-bit systems? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Friday, September 30, 2016 22:31:23 Andrei Alexandrescu via Digitalmars-d wrote:
> https://github.com/dlang/phobos/pull/4827 still allows that but specifies that phobos algorithms are not required to. -- Andrei
Every time this comes up, I strongly argue in favor of requiring that length be size_t. Occasionally, that restriction is annoying, but allowing anything else plays havoc with generic code. It's even worse when you have to start dealing with the fact that the same code needs to compile and function the same on both 32-bit and 64-bit systems.
At one point, iota was changed to support lengths of long or ulong in order to give you the full range of numbers for a range of long or ulong on 32-bit systems, and that's caused us a fair bit of grief in various places in Phobos, because everything was assuming that length was size_t and doing anything else can get fairly complicated. And I think that it's still the case that a fair bit of Phobos will fall flat on its face when dealing with a length that isn't size_t. Fortunately, it looks like iota was finally fixed to use size_t for length again.
This is some loss of expressiveness as a result of requiring that length be size_t, but it's rarely a problem, and it simplifies code that deals with length significantly.
- Jonathan M Davis
|
October 01, 2016 Re: Should r.length of type ulong be supported on 32-bit systems? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Saturday, 1 October 2016 at 08:09:45 UTC, Jonathan M Davis wrote:
> On Friday, September 30, 2016 22:31:23 Andrei Alexandrescu via Digitalmars-d wrote:
>> [...]
>
> Every time this comes up, I strongly argue in favor of requiring that length be size_t. Occasionally, that restriction is annoying, but allowing anything else plays havoc with generic code. It's even worse when you have to start dealing with the fact that the same code needs to compile and function the same on both 32-bit and 64-bit systems.
>
> [...]
+1
|
October 01, 2016 Re: Should r.length of type ulong be supported on 32-bit systems? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Saturday, 1 October 2016 at 02:31:23 UTC, Andrei Alexandrescu wrote:
> https://github.com/dlang/phobos/pull/4827 still allows that but specifies that phobos algorithms are not required to. -- Andrei
I don't have much experience, but IMHO the length should be restricted to be any of the built-in unsigned integral types (ubyte, ushort, uint, size_t, ulong). This way, all comparisons for equality or inequality are defined and correct, and the common type exists and is the one you'd expect. Also, every other operation on them works as expected, because smaller types are widened and the absence of signed types guarantees no strange results.
The only problem that arises with this definition of length is that functions cannot take a length parameter as size_t, because that could truncate it. Possible solutions, ordered from the best to the worst:
1) functions should try not to take lengths; they should take ranges or slices, which can be queried for their length;
2) functions could take lengths as the greatest possible type (ulong);
3) functions could be parameterized on the length type;
4) a minority of functions could, if deemed really necessary (for performance or whatever), use size_t, and it should be documented that they expect the length to be limited by the addressable length of the machine, and for which reason they do.
|
October 01, 2016 Re: Should r.length of type ulong be supported on 32-bit systems? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Lodovico Giaretta | On 10/01/2016 05:30 AM, Lodovico Giaretta wrote: > On Saturday, 1 October 2016 at 02:31:23 UTC, Andrei Alexandrescu wrote: >> https://github.com/dlang/phobos/pull/4827 still allows that but >> specifies that phobos algorithms are not required to. -- Andrei > > I don't have much experience, but IMHO the length should be restricted > to be any of the built-in unsigned integral types (ubyte, ushort, uint, > size_t, ulong). This way, all comparisons for equality or inequality are > defined and correct, and the common type exists and is the one you'd > expect. Also, every other operation on them works as expected, because > smaller types are widened and the absence of signed types guarantees no > strange results. I've looked through phobos and sizes smaller than size_t present less risk. The issue is that e.g. code is liable to do: auto len = r.length; and the proceeds to use len tacitly assuming it's a size_t. If r.length were allowed to return e.g. ubyte, doing something like ++len may overflow a lot earlier than the code would expect. I made a quick pass through Phobos looking for potential issues. I found a couple in replaceLast (uses arithmetic on indexes obtained with auto) but not much else (in fairness I stopped at std/file.d in whatever order "git grep" lists files). So in all likelihood sizes smaller than size_t may be easier to make work. However, this seems to complicate definitions without a corresponding payoff. For example a natural consequence would be to have operator[] take the same type as the length. That would definitely cause problems because people pass index computations to operator[] such as 1, idx + 1, or $ - 1. > The only problem that arises with this definition of length is that > functions cannot take a length parameter as size_t, because that could > truncate it. I think Jonathan is right: sizes longer than size_t have numerous issues - in fact enough that the limiting decision was made (by a group of contributors and approved by us) to force iota.length to return size_t. Andrei |
October 02, 2016 Re: Should r.length of type ulong be supported on 32-bit systems? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Am Sat, 1 Oct 2016 01:04:01 -0400 schrieb Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>: > On 09/30/2016 11:58 PM, Walter Bright wrote: > > On 9/30/2016 7:31 PM, Andrei Alexandrescu wrote: > >> https://github.com/dlang/phobos/pull/4827 still allows that but > >> specifies that > >> phobos algorithms are not required to. -- Andrei > > > > A couple instances where a length may be longer than the address space: > > > > 1. large files > > 2. sequences of mathematically generated values > > > > I don't see a particular advantage to restricting hasLength() to be size_t or ulong. Whether an algorithm requires the length to be within the addressable capabilities of the CPU or not should be up to the algorithm. > > > > There is also cause for r.length to return an int even on a 64 bit system - it can be more efficient. > > Indeed the "I have a dream" cases are quite the lure. The reality in the field, however, is that Phobos thoroughly assumes length has type size_t. The problem is defining hasLength in ways that are at the same time general and that also work is extremely laborious. Just grep through Consider: > > 1. std.algorithm.comparison:1007 (abridged) > > const rc = mulu(r, c, overflow); > if (_matrix.length < rc) > > So here we assume length is comparable for inequality with a const size_t. That should probably rule out signed integrals. > > 2. ibid, 1694 (abridged) > > static if (hasLength!(Range1) && hasLength!(Range2)) > { > return r1.length == r2.length; > } > > Here we assume any two ranges satisfying hasLength have lengths comparable for equality. That projects a rather tenuous definition of hasLength: "the range must define a length that is comparable for equality with any other range that also defines a length". Not impossible, but definitely not what we have right now, de facto or de jure. > > 3. ibid, 634 (abridged) > > immutable len = min(r1.length, r2.length); > > So here two lengths must be comparable for ordering, have a common type, and have an immutable constructor. > > 4. ibid, 656 (abridged) > > return threeWay(r1.length, r2.length); > > Here threeWay takes two size_t. So both lengths must be at least convertible implicitly to size_t. > > I got to the end of the first file I scanned with grep, and I don't know what a robust definition of hasLength should be - outside of size_t itself. The more locations one examines, the more the capabilities required get closer to those of size_t - or the more latent bugs exist in phobos if we want to support something else. > > We have two options here. > > 1. Support a range of types (e.g.: any integral and any user-defined type that supports the following list of primitives: ...) for length with hasLength, and then have each range-based algorithm define its own additional restrictions if they have such. That means right now a bunch of phobos code is incorrect and needs to be fixed. > > 2. Require lengths to be size_t. > > I'm all for generality, but of the useful kind. It seems to me supporting a complicated definition of length is a protracted proposition with little upside. That's why I'm going with the engineering solution in the PR. > > > Andrei Me too, I haven't looked into it, but I noticed that all the points you mentioned are resolved if length was required to be a natural number, i.e. unsigned basic type. Based on these points alone, I tend to agree with Walter. Note: Point 4. is arguable, because `threeWay` is a local function designed to compare char types and only used once to compare lengths under the (silent) assumption that they are < int.max. I.e. It would be cleaner to replace that call with an explicit comparison. -- Marco |
Copyright © 1999-2021 by the D Language Foundation