December 29, 2012
Ranges longer than size_t.max
There was a discussion a while ago about the type returned by 
.length for ranges. I believe the conclusion was that it should 
always be size_t.

My question now is what to do with really large ranges? For 
example, if we have a range of permutations of the elements in a 
set, then for a set of 21 or more elements the number of 
permutations is at least 21 factorial, which is over 2^64. As far 
as I see, there are four options:

1. Allow ranges to return BigInt for length.
2. Allow ranges like this but assert when .length overflows.
3. Allow ranges like this and just allow .length to overflow 
dangerously.
4. Do not allow ranges like this.

Option 1 solves the problem, but significantly complicates Phobos 
development for the rare case.
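
For a sense of scale, here's a quick illustrative sketch (factorial is my 
own helper, not Phobos code) of what an exact option-1 length looks like:

```d
import std.bigint : BigInt;
import std.stdio : writeln;

// Hypothetical helper: the number of permutations of an n-element set
// is n!, computed exactly with BigInt as option 1 would require.
BigInt factorial(uint n)
{
    BigInt result = BigInt(1);
    foreach (i; 2 .. n + 1)
        result *= i;
    return result;
}

void main()
{
    writeln(factorial(20)); // 2432902008176640000 -- still fits in ulong
    writeln(factorial(21)); // 51090942171709440000 -- exceeds ulong.max
}
```

21! is the first factorial larger than ulong.max (18446744073709551615), 
which is why 21-element sets are the cutoff.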

Option 2 works, and is practical, although runtime asserts are 
undesirable.

Option 3 is current Phobos practice (presumably because overflow 
is unlikely). See example below. This may be acceptable 
currently, but perhaps less so when overflow is more likely.
--------------------------------
auto a = iota(size_t.max / 2 + 1);	
auto b = chain(a, a);
writeln(a.length); // 9223372036854775808
writeln(b.length); // 0
--------------------------------

Option 4 works, but it would be a shame to lose these ranges.

Thoughts?
December 29, 2012
Re: Ranges longer than size_t.max
On 2012-12-29 02:12, Peter Alexander <peter.alexander.au@gmail.com> wrote:

> There was a discussion a while ago about the type returned by .length  
> for ranges. I believe the conclusion was that it should always be size_t.
>
> My question now is what to do with really large ranges? For example, if
> we have a range of permutations of the elements in a set, then for a set
> of 21 or more elements the number of permutations is at least 21
> factorial, which is over 2^64. As far as I see, there are four options:
>
> 1. Allow ranges to return BigInt for length.
> 2. Allow ranges like this but assert when .length overflows.
> 3. Allow ranges like this and just allow .length to overflow dangerously.
> 4. Do not allow ranges like this.
>
> Option 1 solves the problem, but significantly complicates Phobos  
> development for the rare case.
>
> Option 2 works, and is practical, although runtime asserts are  
> undesirable.
>
> Option 3 is current Phobos practice (presumably because overflow is  
> unlikely). See example below. This may be acceptable currently, but  
> perhaps less so when overflow is more likely.
> --------------------------------
> auto a = iota(size_t.max / 2 + 1);	
> auto b = chain(a, a);
> writeln(a.length); // 9223372036854775808
> writeln(b.length); // 0
> --------------------------------
>
> Option 4 works, but it would be a shame to lose these ranges.
>
> Thoughts?

64 bits ought to be enough for anyone. :p

This is a bit of a tough one, and I believe the root problem is that
user-defined types are second-class citizens. Specifically, CommonType
is incapable of finding the common type of BigInt and int. This makes
it hard to find the correct return type for e.g. chain. Behind the
scenes, CommonType is using ?: to figure out the correct type. If D
supported opImplicitCastFrom (as suggested in WalterAndrei.pdf), ?:
could figure out the correct type, and chain's .length could be:

  // Helper: the type of a given range's .length.
  private template LengthType(Range)
  {
      alias typeof(Range.length) LengthType;
  }

  // chain's .length: sum each source range's length in their common
  // type, which could then be BigInt if any of the ranges used it.
  @property auto length()
  {
      CommonType!(staticMap!(LengthType, R)) result = 0;
      foreach (i, Unused; R)
      {
          result += source[i].length;
      }
      return result;
  }

This would thus support BigInt .length just as well as size_t, byte,
or whatever.
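
The limitation is easy to demonstrate (assuming current std.traits 
behaviour; I believe CommonType yields void here because neither BigInt 
nor int implicitly converts to the other under ?:):

```d
import std.bigint : BigInt;
import std.traits : CommonType;

// ?: cannot pick between BigInt and int (no implicit conversion in
// either direction), so CommonType falls back to void. With something
// like opImplicitCastFrom it could yield BigInt instead.
static assert(is(CommonType!(BigInt, int) == void));

void main() {}
```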

-- 
Simen
December 29, 2012
Re: Ranges longer than size_t.max
On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
> There was a discussion a while ago about the type returned by
> .length for ranges. I believe the conclusion was that it should
> always be size_t.
> 
> My question now is what to do with really large ranges? For
> example, if we have a range of permutations of the elements in a
> set, then for a set of 21 or more elements the number of
> permutations is at least 21 factorial, which is over 2^64. As far
> as I see, there are four options:
> 
> 1. Allow ranges to return BigInt for length.
> 2. Allow ranges like this but assert when .length overflows.
> 3. Allow ranges like this and just allow .length to overflow
> dangerously.
> 4. Do not allow ranges like this.
> 
> Option 1 solves the problem, but significantly complicates Phobos
> development for the rare case.
> 
> Option 2 works, and is practical, although runtime asserts are
> undesirable.
> 
> Option 3 is current Phobos practice (presumably because overflow
> is unlikely). See example below. This may be acceptable
> currently, but perhaps less so when overflow is more likely.
> --------------------------------
> auto a = iota(size_t.max / 2 + 1);
> auto b = chain(a, a);
> writeln(a.length); // 9223372036854775808
> writeln(b.length); // 0
> --------------------------------
> 
> Option 4 works, but it would be a shame to lose these ranges.
> 
> Thoughts?

I'd be very strongly inclined to go with #4 and just say that anyone who 
actually cares about numbers that large should use 64-bit. Needing ranges of 
length greater than uint.max is definitely not the norm, and I would expect 
people caring that much about computational stuff to be using 64-bit anyway, 
since odds are they'll need the memory. Allowing for length to be anything 
other than size_t is extremely annoying for generic code.

- Jonathan M Davis
December 29, 2012
Re: Ranges longer than size_t.max
On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
> On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
> I'd be very strongly inclined to go with #4 and just say that
> anyone who actually cares about numbers that large should use
> 64-bit. Needing ranges of length greater than uint.max is
> definitely not the norm, and I would expect people caring that
> much about computational stuff to be using 64-bit anyway, since
> odds are they'll need the memory. Allowing for length to be
> anything other than size_t is extremely annoying for generic
> code.

Sorry, I don't think I was clear on what the issue is.

I'm already assuming 64-bit. What I'm saying is that 64-bit 
sometimes isn't even enough (for example in the case of a 
permutations range). I'm asking if allowing length to return 
BigInt would be reasonable.
December 29, 2012
Re: Ranges longer than size_t.max
On Saturday, 29 December 2012 at 01:56:04 UTC, Simen Kjaeraas 
wrote:
>   @property auto length()
>   {
>       CommonType!(staticMap!(LengthType, R)) result = 0;
>       foreach (i, Unused; R)
>       {
>           result += source[i].length;
>       }
>       return result;
>   }
>
> This would thus support BigInt .length just as well as size_t, 
> byte,
> or whatever.

This doesn't solve the issue. For example, even with your update, 
the code snippet I provided in the original post with iota and 
chain would still break (assuming iota(size_t.max).length returns 
a size_t).

Also, if we allow BigInt, even with better support in CommonType 
it would significantly complicate Phobos with LengthType!Range 
popping up everywhere.
December 29, 2012
Re: Ranges longer than size_t.max
On 12/28/12 9:21 PM, Peter Alexander wrote:
> On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
>> On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
>> I'd be very strongly inclined to go with #4 and just say that anyone
>> who actually cares about numbers that large should use 64-bit. Needing
>> ranges of length greater than uint.max is definitely not the norm, and
>> I would expect people caring that much about computational stuff to be
>> using 64-bit anyway, since odds are they'll need the memory. Allowing
>> for length to be anything other than size_t is extremely annoying for
>> generic code.
>
> Sorry, I don't think I was clear on what the issue is.
>
> I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes
> isn't even enough (for example in the case of a permutations range). I'm
> asking if allowing length to return BigInt would be reasonable.

I think it would complicate a lot of things for the benefit of a few cases.

Andrei
December 29, 2012
Re: Ranges longer than size_t.max
On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
> I'd be very strongly inclined to go with #4 and just say that
> anyone who actually cares about numbers that large should use
> 64-bit.

Uhm, 64-bit?

What about large files (> 4 GiB) on 32-bit systems?

(Say, if a range wants to provide random access to the bytes of 
an 8 GiB file?)

And what about large bit vectors? A 600 MiB bit vector holds more 
than 2^32 bits, so even its length won't fit in a 32-bit size_t...
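
The arithmetic behind that claim, just to check the numbers:

```d
import std.stdio : writeln;

void main()
{
    // 600 MiB = 600 * 2^20 bytes, 8 bits per byte.
    immutable ulong bits = 600UL * (1UL << 20) * 8;
    writeln(bits);               // 5033164800
    writeln(bits > (1UL << 32)); // true: a 32-bit size_t cannot even
                                 // index the individual bits
}
```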
December 29, 2012
Re: Ranges longer than size_t.max
On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei Alexandrescu wrote:
> On 12/28/12 9:21 PM, Peter Alexander wrote:
>> I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes
>> isn't even enough (for example in the case of a permutations range).
>> I'm asking if allowing length to return BigInt would be reasonable.
>
> I think it would complicate a lot of things for the benefit of a few
> cases.

I agree, but the range of permutations will be longer than 2^64 
on occasion, so what do we do?

* Don't provide .length at all
* Provide it as size_t, and assert on overflow
* Provide it as size_t, and silently overflow
December 29, 2012
Re: Ranges longer than size_t.max
On Saturday, December 29, 2012 10:00:44 Peter Alexander wrote:
> On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei Alexandrescu wrote:
> > On 12/28/12 9:21 PM, Peter Alexander wrote:
> > > I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes
> > > isn't even enough (for example in the case of a permutations range).
> > > I'm asking if allowing length to return BigInt would be reasonable.
> > 
> > I think it would complicate a lot of things for the benefit of a few
> > cases.
> 
> I agree, but the range of permutations will be longer than 2^64 on
> occasion, so what do we do?
> 
> * Don't provide .length at all
> * Provide it as size_t, and assert on overflow
> * Provide it as size_t, and silently overflow

I expect that it'll silently overflow in most cases, because that's a lot 
simpler and the problem almost never arises. Checking also incurs a performance 
penalty in non-release mode, where there's almost never any benefit from it. But 
if you want to provide pull requests with assertions for overflow in various 
Phobos algorithms, then they'll probably be merged in.
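
Such an assertion could be as simple as a wraparound check on unsigned 
addition. A sketch of what such a pull request might do (checkedLengthSum 
is a made-up name, not a Phobos function):

```d
// For unsigned types, a + b wraps around exactly when the result is
// smaller than a, so overflow is cheap to detect after the fact.
size_t checkedLengthSum(size_t a, size_t b)
{
    immutable size_t sum = a + b;
    assert(sum >= a, ".length overflowed size_t");
    return sum;
}

void main()
{
    immutable half = size_t.max / 2 + 1;  // the iota length from the example
    assert(checkedLengthSum(half, half - 1) == size_t.max); // fine
    // checkedLengthSum(half, half); // would trip the assert, not return 0
}
```

With something like this inside chain's .length, the b.length == 0 result 
from the original post would become an assertion failure in non-release 
builds instead of a silent wrap.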

- Jonathan M Davis
December 29, 2012
Re: Ranges longer than size_t.max
On Saturday, December 29, 2012 07:12:49 Mehrdad wrote:
> On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
> > I'd be very strongly inclined to go with #4 and just say that
> > anyone who actually cares about numbers that large should use
> > 64-bit.
> 
> Uhm, 64-bit?
> 
> What about large files (> 4 GiB) on 32-bit systems?
> 
> (Say, if a range wants to provide random access to the bytes of
> an 8 GiB file?)

You bring up a good point, but not even arrays support that, so stuff like 
std.mmfile can't possibly work with large files on 32-bit systems, not with the 
current design anyway (as it gives you an array to operate on). It may be that 
you just need to do something different if you want to operate on large files on 
32-bit systems. I don't know.

Plenty of stuff in Phobos isn't going to work right now if you try to 
have a length of 64 bits on a 32-bit system. size_t is used pretty much 
everywhere. And it's going to make generic code a _lot_ less pleasant if we 
have to worry about using anything other than size_t. It's already been 
causing us problems that iota uses ulong for its length, and we've discussed 
making it so that it doesn't. But even if we allow ulong for length and 
indexing on 32-bit systems, that's a completely different ball game from 
permitting BigInt.

So, I'd _very_ much like to always restrict length and indices to size_t for 
ranges. It would make all of the generic handling of that stuff far more 
pleasant. But we may very well be forced to permit ulong on 32-bit systems due 
to practical concerns such as large file handling.

Oh, how I wish that 32-bit systems would just die out. It would make so many 
things so much easier. No such luck on that yet though, even though most PC 
processors are 64-bit now.

- Jonathan M Davis