February 17, 2011
I assume I'm in the minority here, but I don't see a need for such a function.

Andrei

On 2/17/11 1:40 PM, David Simcha wrote:
> Fair points.  You've convinced me (since I didn't have a very strong opinion before).  Let's go with int.  I've also come to believe that ilength is the best name because it's negligible extra typing compared to .length, so people will actually use it.  Proposed function for inclusion in object:
>
> /**
> Returns the length of an array as a 32-bit signed integer.  Verifies
> with an assert that arr.length <= int.max unless this can be proven
> at compile time.  This is a useful shortcut for getting the
> length of arrays that cannot plausibly have length longer than
> int.max (about 2 billion) as a 32-bit integer even if building
> for a 64-bit target.  It's also useful for converting an array length
> to a signed integer while verifying the safety of this
> conversion if asserts are enabled.
> */
> @property int ilength(T)(const T[] arr) pure nothrow @safe {
>      static if(size_t.sizeof > uint.sizeof || T.sizeof == 1) {
>          assert(arr.length <= int.max,
> "Cannot get integer length of array with >int.max elements."
>          );
>      }
>
>      return cast(int) arr.length;
> }
>
>
> On Thu, Feb 17, 2011 at 2:24 PM, Don Clugston <dclugston at googlemail.com <mailto:dclugston at googlemail.com>> wrote:
>
>     On 17 February 2011 17:10, David Simcha <dsimcha at gmail.com
>     <mailto:dsimcha at gmail.com>> wrote:
>      > Can you elaborate on why?  Unsigned seems like the perfect type
>     for an array
>      > length.
>
>     An array length is a positive integer, you want to treat it as an
>     integer, not as a bag of bits.
>     Using an unsigned type is like using a cast, to be avoided whenever
>     possible. They're a breeding ground for bugs,
>     and a disturbingly large fraction of programmers don't understand them.
>
>
>
>     On 17 February 2011 18:02, David Simcha <dsimcha at gmail.com
>     <mailto:dsimcha at gmail.com>> wrote:
>      > My main gripe with going with int is that it eliminates the
>     possibility of
>      > making ilength() a noop that just returns .length on 32.  The
>     assert would
>      > still be necessary.
>
>     But the assert adds value.
>     Note that an assert is only required for element types of size 1 --
>     ubyte, byte, char.
>     On everything else, it's still a no-op.
>
>
>      > On Thu, Feb 17, 2011 at 9:48 AM, Don Clugston
>     <dclugston at googlemail.com <mailto:dclugston at googlemail.com>>
>      > wrote:
>      >>
>      >> On 17 February 2011 14:59, David Simcha <dsimcha at gmail.com
>     <mailto:dsimcha at gmail.com>> wrote:
>      >> > Hey guys,
>      >> >
>      >> > Kagamin just came up with a simple but great idea to mitigate the
>      >> > pedantic
>      >> > nature of 64-bit to 32-bit integer conversions in cases where
>     using
>      >> > size_t
>      >> > doesn't cut it.  Examples are storing arrays of indices into other
>      >> > arrays,
>      >> > where using size_t would be a colossal waste of space if it's
>     safe to
>      >> > assume
>      >> > none of the arrays will be billions of elements long.
>
>
>      >> >  int or uint?  I used int only b/c that was the example on the
>      >> > newsgroup,
>      >> > but I think uint makes more sense.
>      >>
>      >> I *strongly* oppose uint. We should take every possible
>     opportunity to
>      >> reduce usage of unsigned numbers.
>      >> 'i' implies immutable.
>      >> How about intlength  (or intLength ?)
>     _______________________________________________
>     phobos mailing list
>     phobos at puremagic.com <mailto:phobos at puremagic.com>
>     http://lists.puremagic.com/mailman/listinfo/phobos
>
>
>
>
February 17, 2011
On Thu, Feb 17, 2011 at 5:28 PM, Andrei Alexandrescu <andrei at erdani.com>wrote:

> I assume I'm in the minority here, but I don't see a need for such a function.
>
> Andrei
>

The point you're missing is that arrays are extremely common and give you no choice about integer width.  Most of the time, if you don't need the extra width and want to avoid either the viral effects of using wide integers or the need to insert casts all over the place, you just use a narrower integer (most values in any program represent quantities that can't plausibly exceed the maximum of some fixed-width type).  For the vast majority of quantities people deal with, 32-bit ints are more than enough, so they tend to be what's used most in practice.

With arrays, you don't have that choice, even though the *vast* majority of the time int is plenty and you tend to know when it isn't.  Therefore, you get stuck either dealing with the viralness of array.length being a ulong on 64 or with the ugliness of manually inserting casts everywhere.  You gain virtually nothing in safety in exchange because arrays are almost never (and when I say almost never I mean I've never seen one in my life) over 2 billion elements long.

The point is that by adding this function you lose epsilon in safety, where epsilon is some tiny but nonzero value, and gain a whole bunch in usability.  Realistically, using size_t everywhere is not a good idea (though it admittedly is a good idea in a large portion of cases) for *at least* two reasons, i.e. two that I've encountered already:

1.  Storing arrays of indices into other arrays with size_t's is incredibly wasteful unless there's a realistic chance that one of the arrays you're indexing could be billions of elements long.

2.  Some libraries written in other languages that interface with D (GTK, BLAS and LAPACK come to mind) always assume, even on 64, that int is enough for the length of an array.  The ugliness and tediousness of putting casts everywhere to accommodate this is slowly breeding insanity in me.
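To make the tedium concrete, here's a sketch; dotSketch below is a made-up stand-in for the kind of int-length C routine I mean, not an actual BLAS/LAPACK prototype:

```d
// Hypothetical stand-in for a C library routine that takes an int length,
// the way BLAS/LAPACK interfaces do.  NOT a real BLAS function.
extern(C) double dotSketch(int n, const(double)* x, const(double)* y) {
    double sum = 0;
    foreach (i; 0 .. n) sum += x[i] * y[i];
    return sum;
}

void main() {
    double[] a = [1.0, 2.0, 3.0];
    double[] b = [4.0, 5.0, 6.0];

    // a.length is a ulong on 64-bit targets, so every single call site
    // needs an explicit narrowing cast (or a helper like ilength):
    auto r = dotSketch(cast(int) a.length, a.ptr, b.ptr);
    assert(r == 32.0); // 1*4 + 2*5 + 3*6
}
```

Multiply that cast by every call into such a library and the tedium adds up fast.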

--Dave
February 17, 2011
I don't see a problem with using size_t throughout. The machine will be as good or better at operating on size_t (compared to int).

The size argument has historically been made about double and float. Today people use double throughout, and float on occasion when they want to optimize storage.

@Don: Making it signed adds insult to injury. It means that we're inconsistent about the way we handle nonnegative entities in the language and its library.


Andrei

On 2/17/11 4:46 PM, David Simcha wrote:
> [...]
February 17, 2011
Have you actually tried porting any application code to 64-bit?  Phobos and other similarly generic libraries don't count, because code that generic legitimately can't assume that no arrays are going to be billions of elements long.

On 2/17/2011 6:00 PM, Andrei Alexandrescu wrote:
> I don't see a problem with using size_t throughout. The machine will be as good or better at operating on size_t (compared to int).
>
> The size argument has been historically made about double and float. Today people use double throughout and float on occasion when they want to optimize storage.
>
> @Don: Making it signed adds insult to injury. It means that we're inconsistent about the way we handle nonnegative entities in the language and its library.
>
>
> Andrei
>
> [...]

February 17, 2011
I don't know, but having both length and ilength seems like it could be confusing. I mean, how can you give a simple answer to the question of which one should be used?

It seems like there are two criteria to consider when answering this question: whether you prefer a 32-bit fixed-width value (useful when storing a lot of them, not useful for local variables) and whether you want a signed value (useful to avoid some issues when subtracting indices). What if you only need one of these two things?
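For instance, here's a minimal sketch of the subtraction issue:

```d
void main() {
    size_t i = 2, j = 5;

    // Unsigned subtraction wraps around instead of going negative,
    // so this "obviously false" comparison passes:
    assert(i - j > 0);

    // With signed values, the arithmetic behaves as you'd expect:
    assert(cast(long) i - cast(long) j == -3);
}
```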

On 2011-02-17, at 17:28, Andrei Alexandrescu wrote:

> I assume I'm in the minority here, but I don't see a need for such a function.
> 
> Andrei
> 
> On 2/17/11 1:40 PM, David Simcha wrote:
>> Fair points.  You've convinced me (since I didn't have a very strong opinion before).  Let's go with int.  I've also come to believe that ilength is the best name because it's negligible extra typing compared to .length, so people will actually use it.  Proposed function for inclusion in object:
>> 
>> /**
>> Returns the length of an array as a 32-bit signed integer.  Verifies
>> with an assert that arr.length <= int.max unless this can be proven
>> at compile time.  This is a useful shortcut for getting the
>> length of a arrays that cannot plausibly have length longer than
>> int.max (about 2 billion) as a 32-bit integer even if building
>> for a 64-bit target.  It's also useful for converting an array length
>> to a signed integer while verifying the safety of this
>> conversion if asserts are enabled.
>> */
>> @property int ilength(T)(const T[] arr) pure nothrow @safe {
>>     static if(size_t.sizeof > uint.sizeof || T.sizeof == 1) {
>>         assert(arr.length <= int.max,
>> "Cannot get integer length of array with >int.max elements."
>>         );
>>     }
>> 
>>     return cast(int) arr.length;
>> }
>> 
>> 
>> On Thu, Feb 17, 2011 at 2:24 PM, Don Clugston <dclugston at googlemail.com <mailto:dclugston at googlemail.com>> wrote:
>> 
>>    On 17 February 2011 17:10, David Simcha <dsimcha at gmail.com
>>    <mailto:dsimcha at gmail.com>> wrote:
>>     > Can you elaborate on why?  Unsigned seems like the perfect type
>>    for an array
>>     > length.
>> 
>>    An array length is a positive integer, you want to treat it as an
>>    integer, not as a bag of bits.
>>    Using an unsigned types is like using a cast, to be avoided whenever
>>    possible. They're a breeding ground for bugs,
>>    and a disturbingly large fraction of programmers don't understand them.
>> 
>> 
>> 
>>    On 17 February 2011 18:02, David Simcha <dsimcha at gmail.com
>>    <mailto:dsimcha at gmail.com>> wrote:
>>     > My main gripe with going with int is that it eliminates the
>>    possibility of
>>     > making ilength() a noop that just returns .length on 32.  The
>>    assert would
>>     > still be necessary.
>> 
>>    But the assert adds value.
>>    Note that an assert is only required on arrays of size 1 --  ubyte,
>>    byte, char.
>>    On everything else, it's still a no-op.
>> 
>> 
>>     > On Thu, Feb 17, 2011 at 9:48 AM, Don Clugston
>>    <dclugston at googlemail.com <mailto:dclugston at googlemail.com>>
>>     > wrote:
>>     >>
>>     >> On 17 February 2011 14:59, David Simcha <dsimcha at gmail.com
>>    <mailto:dsimcha at gmail.com>> wrote:
>>     >> > Hey guys,
>>     >> >
>>     >> > Kagamin just came up with a simple but great idea to mitigate the
>>     >> > pedantic
>>     >> > nature of 64-bit to 32-bit integer conversions in cases where
>>    using
>>     >> > size_t
>>     >> > doesn't cut it.  Examples are storing arrays of indices into other
>>     >> > arrays,
>>     >> > where using size_t would be a colossal waste of space if it's
>>    safe to
>>     >> > assume
>>     >> > none of the arrays will be billions of elements long.
>> 
>> 
>>     >> >  int or uint?  I used int only b/c that was the example on the
>>     >> > newsgroup,
>>     >> > but I think uint makes more sense.
>>     >>
>>     >> I *strongly* oppose uint. We should take every possible
>>    opportunity to
>>     >> reduce usage of unsigned numbers.
>>     >> 'i' implies immutable.
>>     >> How about intlength  (or intLength ?)
>>    _______________________________________________
>>    phobos mailing list
>>    phobos at puremagic.com <mailto:phobos at puremagic.com>
>>    http://lists.puremagic.com/mailman/listinfo/phobos
>> 
>> 
>> 
>> 
>> _______________________________________________
>> phobos mailing list
>> phobos at puremagic.com
>> http://lists.puremagic.com/mailman/listinfo/phobos
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



February 17, 2011
On 2/17/11 5:10 PM, David Simcha wrote:
> Have you actually tried porting any application code to 64? Phobos and other similarly generic libraries don't count because code that's that generic legitimately can't assume that no arrays are going to be billions of elements long.

Code that uses the unrecommended practice of mixing int and uint with size_t everywhere will indeed be difficult to port to 64 bits. But that's a problem with the code, and giving that unrecommended practice legitimacy by making it look good is aiming at the wrong target.

Use size_t for sizes and it's golden. You can't go wrong. On the rare occasions when you want to store arrays of indexes, do the cast by hand; don't ask the standard library to give it a nice face by making the assumption for you.
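That is, something along these lines (a sketch, names illustrative):

```d
void main() {
    auto data = new int[](100); // int elements are zero-initialized

    // Indexes stored compactly as uint, with the narrowing conversion
    // made explicit at the one place it happens:
    uint[] zeroIndexes;
    foreach (i; 0 .. data.length)   // i is a size_t
        if (data[i] == 0)
            zeroIndexes ~= cast(uint) i;

    assert(zeroIndexes.length == 100);
}
```

The cast is visible at exactly one call site, where you made the decision that the range fits.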


Andrei
February 17, 2011
On 2/17/11 5:58 PM, David Simcha wrote:
> Such a statement is technically correct but amazingly pedantic in a world where there is legacy code in other languages, crufty old D code from before there was a 64-bit compiler to test it on, and the need to optimize storage sometimes, but not too many billion-element arrays.  If anyone has *ever in their entire life* worked with an array with over 2 billion elements (which would demonstrate that the unsafeness isn't purely theoretical), please speak up.

I do work with arrays over 2 billion elements. But my point is simpler: I don't need to prove anything - you need to prove that working with size_t aggravates you, and very frequently.

Andrei
February 17, 2011
On Thursday, February 17, 2011 15:31:01 Andrei Alexandrescu wrote:
> On 2/17/11 5:10 PM, David Simcha wrote:
> > Have you actually tried porting any application code to 64? Phobos and other similarly generic libraries don't count because code that's that generic legitimately can't assume that no arrays are going to be billions of elements long.
> 
> Code that uses the unrecommended practice of mixing int and uint with size_t everywhere will be indeed difficult to port to 64 bits. But that's a problem with the code, and giving that unrecommended practice legitimacy by making it look good is aiming at the wrong target.
> 
> Use size_t for sizes and it's golden. You can't go wrong. On the rare occasions when you want to store arrays of indexes, do the cast by hand, don't ask the standard library to give it a nice face by making the assumption for you.

I concur. If you use auto and/or size_t, it's rarely a problem that size_t changes its size based on the architecture. Using int or uint for indices is generally wrong. If you really need an int or uint, then a cast should be used. And if you're really running into this problem all over the place, because you frequently save array indices (which I would posit is _not_ something that is normally done very often, let alone in a manner where it would be a problem just to save them as size_t), then you can create ilength in your own code. But I really think that not only would ilength promote poor practices, but it would generally lead to less maintainable code, because it would be _really_ easy to mix up ilength and length.
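A quick sketch of what I mean:

```d
void main() {
    int[] arr = [10, 20, 30];

    // auto infers size_t here, so this code is correct on both
    // 32- and 64-bit targets with no casts at all:
    auto len = arr.length;

    foreach (i; 0 .. len)   // i is a size_t as well
        arr[i] += 1;

    assert(arr == [11, 21, 31]);
}
```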

And honestly, I would say that the fact that someone is running into errors with array indices in 64-bit land because they weren't using size_t just shows that the code was faulty to begin with. They should have been using size_t. At least the compiler will now point it out to them.

- Jonathan M Davis