December 29, 2012
On 12/29/2012 1:30 PM, Jonathan M Davis wrote:
> On Saturday, December 29, 2012 07:12:49 Mehrdad wrote:
>> On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis
>>
>> wrote:
>>> I'd be very strongly inclined to go with #4 and just say that
>>> anyone who actually cares about numbers that large should use
>>> 64-bit.
>>
>> Uhm, 64-bit?
>>
>> What about large files (> 4 GiB) on 32-bit systems?
>>
>> (Say, if a range wants to provide random access to the bytes of
>> an 8 GiB file?)
>
> You bring up a good point, but not even arrays support that, so stuff like
> std.mmfile can't possibly work with large files on 32-bit systems, not with the
> current design anyway (as it gives you an array to operate on). It may be that
> you just need to do something different if you want to operate on large files on
> 32-bit systems. I don't know.

s/array/slice
It maps a view of part of the file. Obviously one can't map a part larger than the address space.

>
> Plenty of stuff right now in Phobos isn't going to work right now if you try
> and have a length of 64 bits on a 32-bit system. size_t is used pretty much
> everywhere. And it's going to make generic code a _lot_ less pleasant if we
> have to worry about using anything other than size_t. It's already been
> causing us problems that iota does, and we've discussed making it so that it
> doesn't. But even if we allow ulong for length and indexing on 32-bit systems,
> that's a completely different ball game from permitting BigInt.
>
> So, I'd _very_ much like to always restrict length and indices to size_t for
> ranges. It will make all of the generic handling of that stuff far more
> pleasant. But we may very well be forced to permit ulong on 32-bit systems due
> to practical concerns such as large file handling on 32-bit systems.
>

I think the 32-bit version has to bite the bullet and try ulong/uint deduction.

> Oh, how I wish that 32-bit systems would just die out.
> It would make so many
> things so much easier. No such luck on that yet though, much as most PC
> processors are 64-bit now.
>

Yeah, but sadly they are practical ;)
As 32-bit is now dominating the MCU world...
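
A quick sketch of what the ulong/uint deduction could look like (all names here are hypothetical, not Phobos API): on 32-bit targets a file-backed range exposes a 64-bit length, while ordinary in-memory ranges keep size_t.

```d
struct FileByteRange
{
    ulong fileSize;  // may exceed 2^32 even when size_t is 32 bits

    // A ulong length makes files > 4 GiB representable on 32-bit targets,
    // at the cost of breaking the usual size_t convention.
    @property ulong length() const { return fileSize; }

    ubyte opIndex(ulong i) const
    {
        assert(i < fileSize);
        return 0;  // a real implementation would seek and read here
    }
}

void main()
{
    auto r = FileByteRange(8UL * 1024 * 1024 * 1024);  // an 8 GiB file
    assert(r.length > uint.max);  // the length no longer fits in 32 bits
}
```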

-- 
Dmitry Olshansky
December 29, 2012
On 12/29/12 4:00 AM, Peter Alexander wrote:
> On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei Alexandrescu wrote:
>> On 12/28/12 9:21 PM, Peter Alexander wrote:
>>> I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes
>>> isn't even enough (for example in the case of a permutations range). I'm
>>> asking if allowing length to return BigInt would be reasonable.
>>
>> I think it would complicate a lot of things for the benefit of a few
>> cases.
>
> I agree, but the range of permutations will be longer than 2^64 on
> occasion, so what do we do?
>
> * Don't provide .length at all
> * Provide it as size_t, and assert on overflow
> * Provide it as size_t, and silently overflow

I'd say give it a different name, i.e. bigLength.

Andrei
December 29, 2012
On Saturday, 29 December 2012 at 09:31:02 UTC, Jonathan M Davis wrote:
> not even arrays support that, so stuff like std.mmfile can't possibly work with large files on 32-bit systems


I didn't mean mmfile, I meant random access to the bytes of a regular file. No need for mapping it to memory.
December 29, 2012
On Saturday, 29 December 2012 at 14:23:09 UTC, Andrei Alexandrescu wrote:
> On 12/29/12 4:00 AM, Peter Alexander wrote:
>> On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei Alexandrescu wrote:
>>> On 12/28/12 9:21 PM, Peter Alexander wrote:
>>>> I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes
>>>> isn't even enough (for example in the case of a permutations range). I'm
>>>> asking if allowing length to return BigInt would be reasonable.
>>>
>>> I think it would complicate a lot of things for the benefit of a few
>>> cases.
>>
>> I agree, but the range of permutations will be longer than 2^64 on
>> occasion, so what do we do?
>>
>> * Don't provide .length at all
>> * Provide it as size_t, and assert on overflow
>> * Provide it as size_t, and silently overflow
>
> I'd say give it a different name, i.e. bigLength.

But then the range doesn't have a length, even though it would be useful to have it in a lot of cases.

How about, have .length, and allow it to dangerously overflow, but also provide .bigLength, which returns a BigInt for when people need it?
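A sketch of what that could look like (hypothetical design, not an existing Phobos API): length stays size_t and may wrap, while bigLength gives the exact count.

```d
import std.bigint;

struct Permutations
{
    size_t n;  // number of elements being permuted

    // Dangerous: silently wraps once n! exceeds size_t.max.
    @property size_t length() const
    {
        size_t result = 1;
        foreach (i; 2 .. n + 1) result *= i;
        return result;
    }

    // Exact count for callers that need it.
    @property BigInt bigLength() const
    {
        BigInt result = BigInt(1);
        foreach (i; 2 .. n + 1) result *= i;
        return result;
    }
}
```

With n = 21 the factorial exceeds 2^64, so length wraps while bigLength stays exact.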

December 29, 2012
On 29/12/2012 01:18, Peter Alexander wrote:
> There was a discussion a while ago about the type returned by .length
> for ranges. I believe the conclusion was that it should always be size_t.
>
> My question now is what to do with really large ranges? For example, if
> we have a range of permutations of the elements in a set, when the set
> is larger than 21 the number of permutations is 21 factorial, which is
> over 2^64. As far as I see, there are three options:

They look like four to me.

> 1. Allow ranges to return BigInt for length.
> 2. Allow ranges like this but assert when .length overflows.
> 3. Allow ranges like this and just allow .length to overflow dangerously.
> 4. Do not allow ranges like this.
<snip>

5. Don't allow such ranges to define a .length property.

From the std.range documentation:

template hasLength(R)
    Returns true if R has a length member that returns an integral type. R does not have to be a range. Note that length is an optional primitive as no range must implement it. Some ranges do not store their length explicitly, some cannot compute it without actually exhausting the range (e.g. socket streams), and some other ranges may be infinite.

There you go.  For reasons such as this, ranges aren't required to define a length property.  As well as I/O ranges and infinite ranges, ranges where the length is liable to be greater than ulong.max are among those that naturally won't have this property.

If we define a length that may overflow, then sooner or later some code will be called on it that calls hasLength on the range, finds it to be true, and then relies on length and malfunctions because the value returned doesn't match the actual length.
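
The failure mode can be seen with a toy range whose stored length has wrapped (hypothetical example, not Phobos code):

```d
import std.range : hasLength;

struct Wrapping
{
    // Pretend the true element count is size_t.max + 2; the stored
    // value has already wrapped around to 1.
    @property size_t length() const { return 1; }
    @property bool empty() const { return false; }
    @property int front() const { return 0; }
    void popFront() {}
}

static assert(hasLength!Wrapping);  // generic code sees a "valid" length

void main() {}
```

Any algorithm that trusts hasLength here — say, preallocating a buffer of r.length elements — silently operates on the wrong count.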

Stewart.
December 29, 2012
On 12/29/12 3:38 PM, Peter Alexander wrote:
> How about, have .length, and allow it to dangerously overflow, but also
> provide .bigLength, which returns a BigInt for when people need it?

I'd think you convinced yourself it's a bad idea while writing it. I did the same :o).

Andrei
December 29, 2012
On Saturday, 29 December 2012 at 20:59:56 UTC, Stewart Gordon wrote:
> On 29/12/2012 01:18, Peter Alexander wrote:
>> My question now is what to do with really large ranges? For example, if
>> we have a range of permutations of the elements in a set, when the set
>> is larger than 21 the number of permutations is 21 factorial, which is
>> over 2^64. As far as I see, there are three options:
>
> They look like four to me.

It's also three, for large values of three :-)


> If we define a length that may overflow, then sooner or later some code will be called on it that calls hasLength on the range, finds it to be true, and then relies on length and malfunctions because the value returned doesn't match the actual length.

Using this logic, chain shouldn't have length either (it can overflow, as I demonstrated in the original post), but it does, and it should. There are many ranges in Phobos that may overflow in length.

The problem with permutations is that, for the kinds of permutations you are likely to use often, length will not overflow. It would be really useful to have length available when this is the case.
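
The wrap-around is plain modular arithmetic; a chain-style length is just the sum of its parts (illustrative numbers, not the Phobos implementation):

```d
void main()
{
    size_t a = size_t.max - 5;  // length of the first range
    size_t b = 10;              // length of the second range
    size_t total = a + b;       // a chain-style sum wraps modulo 2^bits
    assert(total == 4);         // silently wrong: the true total is size_t.max + 5
}
```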

December 29, 2012
On Saturday, 29 December 2012 at 20:59:56 UTC, Stewart Gordon
wrote:
>
> 5. Don't allow such ranges to define a .length property.
>

I still think there is a fundamental difference between:
5a: Don't allow such ranges to define a .length property.
and
5b: Don't guarantee *support* for such ranges.

The current status quo in Phobos is "5b" BTW. There are a few
select algorithms that do support it, but only because the implementers decided to go the extra mile, and it usually makes things more complicated than they need to be.

BTW iota, in certain cases, will have a length returned in ulong.
This is used to test *certain* algorithms, but really, nothing is
guaranteed.

At the very least, there was a discussion some time ago that for
a (finite) RA range, range[range.length] must compile. Ditto for
slicing.

This was never enforced though... Which is a shame (IMO)...
...Well, I was the one to suggest that actually, but I never did it, so blame falls on me :/

I'll put it back on my todo list.
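
The proposed check can be expressed as a compile-time test (a sketch, not the actual Phobos constraint):

```d
// true if indexing the range with its own length type compiles
enum indexableByLength(R) = is(typeof((R r) => r[r.length]));

struct Good
{
    int[] data;
    @property size_t length() const { return data.length; }
    int opIndex(size_t i) const { return data[i]; }
}

struct Bad
{
    @property ulong length() const { return 0; }
    int opIndex(uint i) const { return 0; }  // too narrow: r[r.length] won't compile
}

static assert(indexableByLength!Good);
static assert(!indexableByLength!Bad);

void main() {}
```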
December 29, 2012
On Saturday, December 29, 2012 22:18:53 monarch_dodra wrote:
> At the very least, there was a discussion some time ago that for a (finite) RA range, range[range.length] must compile. Ditto for slicing.
> 
> This was never enforced though... Which is a shame (IMO)... ...Well, I was the one to suggest that actually, but I never did it, so blame falls on me :/
> 
> I'll put it back on my todo list.

It's now enforced that finite random access ranges have length, so we're part of the way there.

The big thing that we really need though is for issue #7177 to be implemented.

http://d.puremagic.com/issues/show_bug.cgi?id=7177

If that gets implemented, then we can require that random access ranges and ranges with slicing fully support $.

We'd probably still need to check range[range.length] though, just to check that length and indexing are properly compatible.
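
Once user-defined `$` works, supporting it is a matter of defining opDollar; a minimal sketch (hypothetical range, assuming the proposed lowering of `$` to `r.opDollar()`):

```d
struct Squares
{
    size_t n;
    @property size_t length() const { return n; }
    size_t opIndex(size_t i) const { assert(i < n); return i * i; }
    size_t opDollar() const { return n; }  // $ inside r[...] lowers to r.opDollar()
}

void main()
{
    auto r = Squares(10);
    assert(r[$ - 1] == 81);  // $ is 10 here, so this reads r[9]
}
```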

- Jonathan M Davis
December 30, 2012
On Saturday, 29 December 2012 at 06:12:52 UTC, Mehrdad wrote:
> Uhm, 64-bit?
>
> What about large files (> 4 GiB) on 32-bit systems?
>
> (Say, if a range wants to provide random access to the bytes of an 8 GiB file?)
>
> And what about large bit vectors? A 600 MiB bit vector needs > 2^32 bits for length...

AFAIK, no 32-bit OS supports files larger than 4 GiB anyway.
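
(For the record, the bit-vector arithmetic above checks out — a sanity check:)

```d
void main()
{
    // 600 MiB = 600 * 2^20 bytes; each byte holds 8 bits.
    ulong bits = 600UL * (1 << 20) * 8;
    assert(bits == 5_033_164_800);  // about 5.03e9 bits
    assert(bits > uint.max);        // so a 32-bit length can't hold it
}
```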