Thread overview | |||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
December 29, 2012 Ranges longer than size_t.max | ||||
---|---|---|---|---|
| ||||
There was a discussion a while ago about the type returned by .length for ranges. I believe the conclusion was that it should always be size_t. My question now is what to do with really large ranges? For example, if we have a range of permutations of the elements in a set, when the set is larger than 21 the number of permutations is 21 factorial, which is over 2^64. As far as I see, there are three options: 1. Allow ranges to return BigInt for length. 2. Allow ranges like this but assert when .length overflows. 3. Allow ranges like this and just allow .length to overflow dangerously. 4. Do not allow ranges like this. Option 1 solves the problem, but significantly complicates Phobos development for the rare case. Option 2 works, and is practical, although runtime asserts are undesirable. Option 3 is current Phobos practice (presumably because overflow is unlikely). See example below. This may be acceptable currently, but perhaps less so when overflow is more likely. -------------------------------- auto a = iota(size_t.max / 2 + 1); auto b = chain(a, a); writeln(a.length); // 9223372036854775808 writeln(b.length); // 0 -------------------------------- Option 4 works, but it would be a shame to lose these ranges. Thoughts? |
December 29, 2012 Re: Ranges longer than size_t.max | ||||
---|---|---|---|---|
| ||||
Posted in reply to Peter Alexander | On 2012-18-29 02:12, Peter Alexander <peter.alexander.au@gmail.com> wrote: > There was a discussion a while ago about the type returned by .length for ranges. I believe the conclusion was that it should always be size_t. > > My question now is what to do with really large ranges? For example, if we have a range of permutations of the elements in a set, when the set is larger than 21 the number of permutations is 21 factorial, which is over 2^64. As far as I see, there are three options: > > 1. Allow ranges to return BigInt for length. > 2. Allow ranges like this but assert when .length overflows. > 3. Allow ranges like this and just allow .length to overflow dangerously. > 4. Do not allow ranges like this. > > Option 1 solves the problem, but significantly complicates Phobos development for the rare case. > > Option 2 works, and is practical, although runtime asserts are undesirable. > > Option 3 is current Phobos practice (presumably because overflow is unlikely). See example below. This may be acceptable currently, but perhaps less so when overflow is more likely. > -------------------------------- > auto a = iota(size_t.max / 2 + 1); > auto b = chain(a, a); > writeln(a.length); // 9223372036854775808 > writeln(b.length); // 0 > -------------------------------- > > Option 4 works, but it would be a shame to lose these ranges. > > Thoughts? 64 bits ought to be enough for anyone. :p This is a bit of a tough one, and I believe the root problem is that user-defined types are second-class citizens. Specifically, CommonType is incapable of finding the common type of BigInt and int. This makes it hard to find the correct return type for e.g. chain. Behind the scenes, CommonType is using ?: to figure out the correct type. If D supported opImplicitCastFrom (as suggested in WalterAndrei.pdf), ?: could figure out the correct type, and chain's .length could be: private template LengthType(Range) { alias typeof(Range.length) LengthType; } @property auto length() { CommonType!(staticMap!(LengthType, R)) result = 0; foreach (i, Unused; R) { result += source[i].length; } return result; } This would thus support BigInt .length just as well as size_t, byte, or whatever. -- Simen |
December 29, 2012 Re: Ranges longer than size_t.max | ||||
---|---|---|---|---|
| ||||
Posted in reply to Peter Alexander | On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
> There was a discussion a while ago about the type returned by .length for ranges. I believe the conclusion was that it should always be size_t.
>
> My question now is what to do with really large ranges? For example, if we have a range of permutations of the elements in a set, when the set is larger than 21 the number of permutations is 21 factorial, which is over 2^64. As far as I see, there are three options:
>
> 1. Allow ranges to return BigInt for length.
> 2. Allow ranges like this but assert when .length overflows.
> 3. Allow ranges like this and just allow .length to overflow
> dangerously.
> 4. Do not allow ranges like this.
>
> Option 1 solves the problem, but significantly complicates Phobos development for the rare case.
>
> Option 2 works, and is practical, although runtime asserts are undesirable.
>
> Option 3 is current Phobos practice (presumably because overflow is unlikely). See example below. This may be acceptable currently, but perhaps less so when overflow is more likely.
> --------------------------------
> auto a = iota(size_t.max / 2 + 1);
> auto b = chain(a, a);
> writeln(a.length); // 9223372036854775808
> writeln(b.length); // 0
> --------------------------------
>
> Option 4 works, but it would be a shame to lose these ranges.
>
> Thoughts?
I'd be very strongly inclined to go with #4 and just say that anyone who actually cares about numbers that large should use 64-bit. Needing ranges of length greater than uint.max is definitely not the norm, and I would expect people caring that much about computational stuff to be using 64-bit anyway, since odds are they'll need the memory. Allowing for length to be anything other than size_t is extremely annoying for generic code.
- Jonathan M Davis
|
December 29, 2012 Re: Ranges longer than size_t.max | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
> On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
> I'd be very strongly inclined to go with #4 and just say that anyone who
> actually cares about numbers that large should use 64-bit. Needing ranges of
> length greater than uint.max is definitely not the norm, and I would expect
> people caring that much about computational stuff to be using 64-bit anyway,
> since odds are they'll need the memory. Allowing for length to be anything
> other than size_t is extremely annoying for generic code.
Sorry, I don't think I was clear on what the issue is.
I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes isn't even enough (for example in the case of a permutations range). I'm asking if allowing length to return BigInt would be reasonable.
|
December 29, 2012 Re: Ranges longer than size_t.max | ||||
---|---|---|---|---|
| ||||
Posted in reply to Simen Kjaeraas | On Saturday, 29 December 2012 at 01:56:04 UTC, Simen Kjaeraas wrote:
> @property auto length()
> {
> CommonType!(staticMap!(LengthType, R)) result = 0;
> foreach (i, Unused; R)
> {
> result += source[i].length;
> }
> return result;
> }
>
> This would thus support BigInt .length just as well as size_t, byte,
> or whatever.
This doesn't solve the issue. For example, even with your update, the code snippet I provided in the original post with iota and chain would still break (assuming iota(size_t.max).length returns a size_t).
Also, if we allow BigInt, even with better support in CommonType it would significantly complicate Phobos with LengthType!Range popping up everywhere.
|
December 29, 2012 Re: Ranges longer than size_t.max | ||||
---|---|---|---|---|
| ||||
Posted in reply to Peter Alexander | On 12/28/12 9:21 PM, Peter Alexander wrote:
> On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
>> On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
>> I'd be very strongly inclined to go with #4 and just say that anyone who
>> actually cares about numbers that large should use 64-bit. Needing
>> ranges of
>> length greater than uint.max is definitely not the norm, and I would
>> expect
>> people caring that much about computational stuff to be using 64-bit
>> anyway,
>> since odds are they'll need the memory. Allowing for length to be
>> anything
>> other than size_t is extremely annoying for generic code.
>
> Sorry, I don't think I was clear on what the issue is.
>
> I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes
> isn't even enough (for example in the case of a permutations range). I'm
> asking if allowing length to return BigInt would be reasonable.
I think it would complicate a lot of things for the benefit of a few cases.
Andrei
|
December 29, 2012 Re: Ranges longer than size_t.max | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
> I'd be very strongly inclined to go with #4 and just say that anyone who actually cares about numbers that large should use 64-bit.
Uhm, 64-bit?
What about large files (> 4 GiB) on 32-bit systems?
(Say, if a range wants to provide random access to the bytes of an 8 GiB file?)
And what about large bit vectors? A 600 MiB bit vector needs > 2^32 bits for length...
|
December 29, 2012 Re: Ranges longer than size_t.max | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei Alexandrescu wrote:
> On 12/28/12 9:21 PM, Peter Alexander wrote:
>> I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes
>> isn't even enough (for example in the case of a permutations range). I'm
>> asking if allowing length to return BigInt would be reasonable.
>
> I think it would complicate a lot of things for the benefit of a few cases.
I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?
* Don't provide .length at all
* Provide it as size_t, and assert on overflow
* Provide it as size_t, and silently overflow
|
December 29, 2012 Re: Ranges longer than size_t.max | ||||
---|---|---|---|---|
| ||||
Posted in reply to Peter Alexander | On Saturday, December 29, 2012 10:00:44 Peter Alexander wrote:
> On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei
>
> Alexandrescu wrote:
> > On 12/28/12 9:21 PM, Peter Alexander wrote:
> >> I'm already assuming 64-bit. What I'm saying is that 64-bit
> >> sometimes
> >> isn't even enough (for example in the case of a permutations
> >> range). I'm
> >> asking if allowing length to return BigInt would be reasonable.
> >
> > I think it would complicate a lot of things for the benefit of a few cases.
>
> I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?
>
> * Don't provide .length at all
> * Provide it as size_t, and assert on overflow
> * Provide it as size_t, and silently overflow
I expect that it'll be silently overflow in most cases, because that's a lot simpler and the problem is almost never an issue. It also incurs a performance penalty in non-release mode where there's almost never any benefit from it. But if you want to provide pull requests with assertions for overflow in various Phobos algorithms, then they'll probably be merged in.
- Jonathan M Davis
|
December 29, 2012 Re: Ranges longer than size_t.max | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mehrdad | On Saturday, December 29, 2012 07:12:49 Mehrdad wrote:
> On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis
>
> wrote:
> > I'd be very strongly inclined to go with #4 and just say that anyone who actually cares about numbers that large should use 64-bit.
>
> Uhm, 64-bit?
>
> What about large files (> 4 GiB) on 32-bit systems?
>
> (Say, if a range wants to provide random access to the bytes of
> an 8 GiB file?)
You bring up a good point, but not even arrays support that, so stuff like std.mmfile can't possibly work with large files on 32-bit systems, not with the current design anyway (as it gives you an array to operate on). It may be that you just need to do something different if you want to operate on large files on 32-bit systems. I don't know.
Plenty of stuff right now in Phobos isn't going to work right now if you try and have a length of 64 bits on a 32-bit system. size_t is used pretty much everywhere. And it's going to make generic code a _lot_ less pleasant if we have to worry about using anything other than size_t. It's already been causing us problems that iota does, and we've discussed making it so that it doesn't. But even if we allow ulong for length and indexing on 32-bit systems, that's a completely different ball game from permitting BigInt.
So, I'd _very_ much like to always restruct length and indices to size_t for ranges. It will make all of the generic handling of that stuff far more pleasant. But we may very well be forced to permit ulong on 32-bit systems due to practical concerns such as large file handling on 32-bit systems.
Oh, how I wish that 32-bit systems would just die out. It would make so many things so much easier. No such luck on that yet though, much as most PC processors are 64-bit now.
- Jonathan M Davis
|
Copyright © 1999-2021 by the D Language Foundation