Jump to page: 1 24  
Page
Thread overview
Ranges longer than size_t.max
Dec 29, 2012
Peter Alexander
Dec 29, 2012
Simen Kjaeraas
Dec 29, 2012
Peter Alexander
Dec 29, 2012
Jonathan M Davis
Dec 29, 2012
Peter Alexander
Dec 29, 2012
Peter Alexander
Dec 29, 2012
Jonathan M Davis
Dec 29, 2012
Peter Alexander
Dec 29, 2012
Mehrdad
Dec 29, 2012
Jonathan M Davis
Dec 29, 2012
Dmitry Olshansky
Dec 29, 2012
Mehrdad
Dec 30, 2012
SomeDude
Dec 30, 2012
Dmitry Olshansky
Dec 31, 2012
Era Scarecrow
Dec 31, 2012
Stewart Gordon
Dec 31, 2012
Era Scarecrow
Dec 30, 2012
Jonathan M Davis
Jan 02, 2013
SomeDude
Dec 31, 2012
Marco Leise
Dec 29, 2012
Stewart Gordon
Dec 29, 2012
Peter Alexander
Dec 31, 2012
Stewart Gordon
Dec 29, 2012
monarch_dodra
Dec 29, 2012
Jonathan M Davis
Dec 31, 2012
Era Scarecrow
Dec 31, 2012
Peter Alexander
Dec 31, 2012
Era Scarecrow
Dec 31, 2012
monarch_dodra
Dec 31, 2012
Era Scarecrow
December 29, 2012
There was a discussion a while ago about the type returned by .length for ranges. I believe the conclusion was that it should always be size_t.

My question now is what to do with really large ranges? For example, if we have a range of permutations of the elements in a set, when the set is larger than 21 the number of permutations is 21 factorial, which is over 2^64. As far as I see, there are three options:

1. Allow ranges to return BigInt for length.
2. Allow ranges like this but assert when .length overflows.
3. Allow ranges like this and just allow .length to overflow dangerously.
4. Do not allow ranges like this.

Option 1 solves the problem, but significantly complicates Phobos development for the rare case.

Option 2 works, and is practical, although runtime asserts are undesirable.

Option 3 is current Phobos practice (presumably because overflow is unlikely). See example below. This may be acceptable currently, but perhaps less so when overflow is more likely.
--------------------------------
auto a = iota(size_t.max / 2 + 1);	
auto b = chain(a, a);
writeln(a.length); // 9223372036854775808
writeln(b.length); // 0
--------------------------------

Option 4 works, but it would be a shame to lose these ranges.

Thoughts?
December 29, 2012
On 2012-18-29 02:12, Peter Alexander <peter.alexander.au@gmail.com> wrote:

> There was a discussion a while ago about the type returned by .length for ranges. I believe the conclusion was that it should always be size_t.
>
> My question now is what to do with really large ranges? For example, if we have a range of permutations of the elements in a set, when the set is larger than 21 the number of permutations is 21 factorial, which is over 2^64. As far as I see, there are three options:
>
> 1. Allow ranges to return BigInt for length.
> 2. Allow ranges like this but assert when .length overflows.
> 3. Allow ranges like this and just allow .length to overflow dangerously.
> 4. Do not allow ranges like this.
>
> Option 1 solves the problem, but significantly complicates Phobos development for the rare case.
>
> Option 2 works, and is practical, although runtime asserts are undesirable.
>
> Option 3 is current Phobos practice (presumably because overflow is unlikely). See example below. This may be acceptable currently, but perhaps less so when overflow is more likely.
> --------------------------------
> auto a = iota(size_t.max / 2 + 1);	
> auto b = chain(a, a);
> writeln(a.length); // 9223372036854775808
> writeln(b.length); // 0
> --------------------------------
>
> Option 4 works, but it would be a shame to lose these ranges.
>
> Thoughts?

64 bits ought to be enough for anyone. :p

This is a bit of a tough one, and I believe the root problem is that
user-defined types are second-class citizens. Specifically, CommonType
is incapable of finding the common type of BigInt and int. This makes
it hard to find the correct return type for e.g. chain. Behind the
scenes, CommonType is using ?: to figure out the correct type. If D
supported opImplicitCastFrom (as suggested in WalterAndrei.pdf), ?:
could figure out the correct type, and chain's .length could be:

  private template LengthType(Range)
  {
      alias typeof(Range.length) LengthType;
  }

  @property auto length()
  {
      CommonType!(staticMap!(LengthType, R)) result = 0;
      foreach (i, Unused; R)
      {
          result += source[i].length;
      }
      return result;
  }

This would thus support BigInt .length just as well as size_t, byte,
or whatever.

-- 
Simen
December 29, 2012
On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
> There was a discussion a while ago about the type returned by .length for ranges. I believe the conclusion was that it should always be size_t.
> 
> My question now is what to do with really large ranges? For example, if we have a range of permutations of the elements in a set, when the set is larger than 21 the number of permutations is 21 factorial, which is over 2^64. As far as I see, there are three options:
> 
> 1. Allow ranges to return BigInt for length.
> 2. Allow ranges like this but assert when .length overflows.
> 3. Allow ranges like this and just allow .length to overflow
> dangerously.
> 4. Do not allow ranges like this.
> 
> Option 1 solves the problem, but significantly complicates Phobos development for the rare case.
> 
> Option 2 works, and is practical, although runtime asserts are undesirable.
> 
> Option 3 is current Phobos practice (presumably because overflow is unlikely). See example below. This may be acceptable currently, but perhaps less so when overflow is more likely.
> --------------------------------
> auto a = iota(size_t.max / 2 + 1);
> auto b = chain(a, a);
> writeln(a.length); // 9223372036854775808
> writeln(b.length); // 0
> --------------------------------
> 
> Option 4 works, but it would be a shame to lose these ranges.
> 
> Thoughts?

I'd be very strongly inclined to go with #4 and just say that anyone who actually cares about numbers that large should use 64-bit. Needing ranges of length greater than uint.max is definitely not the norm, and I would expect people caring that much about computational stuff to be using 64-bit anyway, since odds are they'll need the memory. Allowing for length to be anything other than size_t is extremely annoying for generic code.

- Jonathan M Davis
December 29, 2012
On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
> On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
> I'd be very strongly inclined to go with #4 and just say that anyone who
> actually cares about numbers that large should use 64-bit. Needing ranges of
> length greater than uint.max is definitely not the norm, and I would expect
> people caring that much about computational stuff to be using 64-bit anyway,
> since odds are they'll need the memory. Allowing for length to be anything
> other than size_t is extremely annoying for generic code.

Sorry, I don't think I was clear on what the issue is.

I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes isn't even enough (for example in the case of a permutations range). I'm asking if allowing length to return BigInt would be reasonable.
December 29, 2012
On Saturday, 29 December 2012 at 01:56:04 UTC, Simen Kjaeraas wrote:
>   @property auto length()
>   {
>       CommonType!(staticMap!(LengthType, R)) result = 0;
>       foreach (i, Unused; R)
>       {
>           result += source[i].length;
>       }
>       return result;
>   }
>
> This would thus support BigInt .length just as well as size_t, byte,
> or whatever.

This doesn't solve the issue. For example, even with your update, the code snippet I provided in the original post with iota and chain would still break (assuming iota(size_t.max).length returns a size_t).

Also, if we allow BigInt, even with better support in CommonType it would significantly complicate Phobos with LengthType!Range popping up everywhere.

December 29, 2012
On 12/28/12 9:21 PM, Peter Alexander wrote:
> On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
>> On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
>> I'd be very strongly inclined to go with #4 and just say that anyone who
>> actually cares about numbers that large should use 64-bit. Needing
>> ranges of
>> length greater than uint.max is definitely not the norm, and I would
>> expect
>> people caring that much about computational stuff to be using 64-bit
>> anyway,
>> since odds are they'll need the memory. Allowing for length to be
>> anything
>> other than size_t is extremely annoying for generic code.
>
> Sorry, I don't think I was clear on what the issue is.
>
> I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes
> isn't even enough (for example in the case of a permutations range). I'm
> asking if allowing length to return BigInt would be reasonable.

I think it would complicate a lot of things for the benefit of a few cases.

Andrei
December 29, 2012
On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
> I'd be very strongly inclined to go with #4 and just say that anyone who actually cares about numbers that large should use 64-bit.



Uhm, 64-bit?

What about large files (> 4 GiB) on 32-bit systems?

(Say, if a range wants to provide random access to the bytes of an 8 GiB file?)

And what about large bit vectors? A 600 MiB bit vector needs > 2^32 bits for length...
December 29, 2012
On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei Alexandrescu wrote:
> On 12/28/12 9:21 PM, Peter Alexander wrote:
>> I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes
>> isn't even enough (for example in the case of a permutations range). I'm
>> asking if allowing length to return BigInt would be reasonable.
>
> I think it would complicate a lot of things for the benefit of a few cases.

I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?

* Don't provide .length at all
* Provide it as size_t, and assert on overflow
* Provide it as size_t, and silently overflow
December 29, 2012
On Saturday, December 29, 2012 10:00:44 Peter Alexander wrote:
> On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei
> 
> Alexandrescu wrote:
> > On 12/28/12 9:21 PM, Peter Alexander wrote:
> >> I'm already assuming 64-bit. What I'm saying is that 64-bit
> >> sometimes
> >> isn't even enough (for example in the case of a permutations
> >> range). I'm
> >> asking if allowing length to return BigInt would be reasonable.
> > 
> > I think it would complicate a lot of things for the benefit of a few cases.
> 
> I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?
> 
> * Don't provide .length at all
> * Provide it as size_t, and assert on overflow
> * Provide it as size_t, and silently overflow

I expect that it'll be silently overflow in most cases, because that's a lot simpler and the problem is almost never an issue. It also incurs a performance penalty in non-release mode where there's almost never any benefit from it. But if you want to provide pull requests with assertions for overflow in various Phobos algorithms, then they'll probably be merged in.

- Jonathan M Davis
December 29, 2012
On Saturday, December 29, 2012 07:12:49 Mehrdad wrote:
> On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis
> 
> wrote:
> > I'd be very strongly inclined to go with #4 and just say that anyone who actually cares about numbers that large should use 64-bit.
> 
> Uhm, 64-bit?
> 
> What about large files (> 4 GiB) on 32-bit systems?
> 
> (Say, if a range wants to provide random access to the bytes of
> an 8 GiB file?)

You bring up a good point, but not even arrays support that, so stuff like std.mmfile can't possibly work with large files on 32-bit systems, not with the current design anyway (as it gives you an array to operate on). It may be that you just need to do something different if you want to operate on large files on 32-bit systems. I don't know.

Plenty of stuff right now in Phobos isn't going to work right now if you try and have a length of 64 bits on a 32-bit system. size_t is used pretty much everywhere. And it's going to make generic code a _lot_ less pleasant if we have to worry about using anything other than size_t. It's already been causing us problems that iota does, and we've discussed making it so that it doesn't. But even if we allow ulong for length and indexing on 32-bit systems, that's a completely different ball game from permitting BigInt.

So, I'd _very_ much like to always restruct length and indices to size_t for ranges. It will make all of the generic handling of that stuff far more pleasant. But we may very well be forced to permit ulong on 32-bit systems due to practical concerns such as large file handling on 32-bit systems.

Oh, how I wish that 32-bit systems would just die out. It would make so many things so much easier. No such luck on that yet though, much as most PC processors are 64-bit now.

- Jonathan M Davis
« First   ‹ Prev
1 2 3 4