November 25, 2008
"bearophile" <bearophileHUGS@lycos.com> wrote in message news:gghsa1$2u0c$1@digitalmars.com...
> Andrei Alexandrescu:
>> The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.
>
> That can be solved making array.length signed.
> Can you list few other annoying situations?
>

I disagree. If you start using that as a solution, then you may as well eliminate unsigned values entirely.

I think the root problem with disallowing mixed-sign operations is that math just doesn't work that way. What I mean by that is, disallowing mixed-sign operations implies that we have these nice, cleanly separated worlds of "signed math" and "unsigned math". But depending on the operator, the signs/ordering of the operands, and what the operands actually represent, math has a tendency to switch back and forth between the signed ("can be negative") and unsigned ("can't be negative") worlds. So if we have a type system that forces us to jump through hoops every time that world-switch happens, and we then decide that it's justifiable to say "well, let's fix it for array.length by tossing that over to the 'can be negative' world, even though it cuts our range of allowable values in half", then there's nothing stopping us from solving the rest of the cases by throwing them over the "can be negative" wall as well. All of a sudden, we have no unsigned.
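
To make the world-switch concrete, here is a minimal sketch in D. The helper names naiveLastIndex and lengthDelta are made up for illustration; the point is only that subtracting from an unsigned length can leave the unsigned world, and the type does not follow.

```d
import std.stdio;

// Hypothetical helper: index of the last element, computed naively.
size_t naiveLastIndex(const int[] arr)
{
    // arr.length is unsigned (size_t), so for an empty array this wraps
    // around to size_t.max instead of becoming -1.
    return arr.length - 1;
}

// Difference of two lengths: the mathematical result may be negative,
// so both operands are moved to the signed world explicitly.
long lengthDelta(const int[] a, const int[] b)
{
    return cast(long) a.length - cast(long) b.length;
}

void main()
{
    int[] empty;
    int[] three = [1, 2, 3];

    writeln(naiveLastIndex(three));               // 2
    writeln(naiveLastIndex(empty) == size_t.max); // true
    writeln(lengthDelta(empty, three));           // -3
}
```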

Just a thought: Maybe some sort of built-in "units" system could help here? Instead of just making array.length "signed" or "unsigned" and leaving it at that, add a "units system" and tag array.length as being a length, with length tags carrying the connotation that negative is disallowed. Adding/subtracting a pure constant to a length would cause the constant to be automatically tagged as a "length delta" (which can be negative). And the units system would, of course, contain the rule that a length delta added/subtracted from a length results in a length. The units system could then translate all of that into "signed vs unsigned".
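
A rough library-level approximation of that idea, as a sketch only: the Length and LengthDelta types below are hypothetical, constants are not tagged automatically as a built-in system would do, and the non-negativity rule is checked at run time with an assert.

```d
import std.stdio;

// Hypothetical unit: signed difference between two lengths.
struct LengthDelta
{
    long value;
}

// Hypothetical unit: a non-negative number of elements.
struct Length
{
    size_t value;

    // Length +/- LengthDelta yields a Length (checked at run time).
    Length opBinary(string op, T)(T rhs) const
        if ((op == "+" || op == "-") && is(T == LengthDelta))
    {
        immutable long delta = (op == "+") ? rhs.value : -rhs.value;
        immutable long result = cast(long) value + delta;
        assert(result >= 0, "a length cannot be negative");
        return Length(cast(size_t) result);
    }

    // Length - Length yields a LengthDelta, which may be negative.
    LengthDelta opBinary(string op, T)(T rhs) const
        if (op == "-" && is(T == Length))
    {
        return LengthDelta(cast(long) value - cast(long) rhs.value);
    }
}

void main()
{
    auto len = Length(3);
    auto shorter = len - LengthDelta(1); // still a Length
    auto delta = Length(1) - len;        // a LengthDelta

    writeln(shorter.value); // 2
    writeln(delta.value);   // -2
}
```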


November 25, 2008
On Tue, 25 Nov 2008 16:56:17 -0500, bearophile wrote:
> Andrei Alexandrescu:
>> The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.
> 
> That can be solved making array.length signed.

Is that conceptually clean/clear? (If so, I'd like to request an array of length -1.)

I like Andrei's proposal because it keeps clarity in such cases: sizes are non-negative quantities. Once you start subtracting ints, the result is possibly not a size anymore; in such cases you want the user to decide explicitly.

-- Daniel
November 25, 2008
"bearophile" <bearophileHUGS@lycos.com> wrote in message news:gghc97$1mfo$1@digitalmars.com...
> Steven Schveighoffer:
>> lol!!!
>
> I know, I know... :-) But when people do errors so often, the error is elsewhere, in the original choice of that word to denote how many items an iterable has.
>
> In my libs I have defined len() like this, that I use now and then (where running speed isn't essential):
>
> long len(TyItems)(TyItems items) {
>    static if (HasLength!(TyItems))
>        return items.length;
>    else {
>        long len;
>        // this generates: foreach (p1, p2, p3; items) len++;  with a variable number of p1, p2...
>        mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items) len++;");
>        return len;
>    }
> } // End of len(items)
>
> /// ditto
> long len(TyItems, TyFun)(TyItems items, TyFun pred) {
>    static assert(IsCallable!(TyFun), "len(): predicate must be a callable");
>    long len;
>
>    static if (IsAA!(TyItems)) {
>        foreach (key, val; items)
>            if (pred(key, val))
>                len++;
>    } else static if (is(typeof(TyItems.opApply))) {
>        mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items)
>            if (pred(" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "))
>                len++;");
>    } else {
>        foreach (el; items)
>            if (pred(el))
>                len++;
>    }
>
>    return len;
> } // End of len(items, pred)
>
> alias len!(string) strLen; /// ditto
> alias len!(int[]) intLen; /// ditto
> alias len!(float[]) floatLen; /// ditto
>
> Having a global callable like len() instead of an attribute is (sometimes) better, because you can use it for example like this (this is working syntax of my dlibs):
>
> children.sort(&len!(string));
> That sorts the array of strings "children" according to the given callable
> key, that is the len of the strings.
>

If we ever get extension methods, then maybe something along these lines would be nice:

extension typeof(T.length) len(T t)
{
    return t.length;
}


November 26, 2008
Sean Kelly wrote:
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>> (You may want to check your system's date, unless of course you traveled
>> in time.)
>> Russell Lewis wrote:
>>> I'm of the opinion that we should make mixed-sign operations a
>>> compile-time error.  I know that it would be annoying in some
>>> situations, but IMHO it gives you clearer, more reliable code.
>> The problem is, it's much more annoying than one might imagine. Even
>> array.length - 1 is up for scrutiny. Technically, even array.length + 1
>> is a problem because 1 is really a signed int. We could provide
>> exceptions for constants, but exceptions are generally not solving the
>> core issue.
> 
> Perhaps not, but the fact that constants are signed integers has been
> mentioned as a problem before.  Would making these polysemous
> values help at all?  That seems to be what your proposal is effectively
> trying to do anyway.

Well, with constants we can do many tricks; I mentioned an extreme example. Polysemy does indeed help, but my latest design (described in the post starting this thread) gets away with simple subtyping. I like polysemy (the name is really cool :o)) but I don't want to be concept-heavy: if a classic technique works, I'd use that and save polysemy for a tougher task that cannot be comfortably tackled with existing means.

Andrei
November 26, 2008
Nick Sabalausky wrote:
> "bearophile" <bearophileHUGS@lycos.com> wrote in message news:gghc97$1mfo$1@digitalmars.com...
>> Steven Schveighoffer:
>>> lol!!!
>> I know, I know... :-) But when people do errors so often, the error is elsewhere, in the original choice of that word to denote how many items an iterable has.
>>
>> In my libs I have defined len() like this, that I use now and then (where running speed isn't essential):
>>
>> long len(TyItems)(TyItems items) {
>>    static if (HasLength!(TyItems))
>>        return items.length;
>>    else {
>>        long len;
>>        // this generates: foreach (p1, p2, p3; items) len++;  with a variable number of p1, p2...
>>        mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items) len++;");
>>        return len;
>>    }
>> } // End of len(items)
>>
>> /// ditto
>> long len(TyItems, TyFun)(TyItems items, TyFun pred) {
>>    static assert(IsCallable!(TyFun), "len(): predicate must be a callable");
>>    long len;
>>
>>    static if (IsAA!(TyItems)) {
>>        foreach (key, val; items)
>>            if (pred(key, val))
>>                len++;
>>    } else static if (is(typeof(TyItems.opApply))) {
>>        mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items)
>>            if (pred(" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "))
>>                len++;");
>>    } else {
>>        foreach (el; items)
>>            if (pred(el))
>>                len++;
>>    }
>>
>>    return len;
>> } // End of len(items, pred)
>>
>> alias len!(string) strLen; /// ditto
>> alias len!(int[]) intLen; /// ditto
>> alias len!(float[]) floatLen; /// ditto
>>
>> Having a global callable like len() instead of an attribute is (sometimes) better, because you can use it for example like this (this is working syntax of my dlibs):
>>
>> children.sort(&len!(string));
>> That sorts the array of strings "children" according to the given callable key, that is the len of the strings.
>>
> 
> If we ever get extension methods, then maybe something along these lines would be nice:
> 
> extension typeof(T.length) len(T t)
> {
>     return t.length;
> }
> 
> 

Already works:

uint len(A) (in A x) { return x.length; }
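
A slightly more portable variant, as a sketch: assuming a target where .length is a 64-bit size_t, a fixed uint return type would not accept it implicitly, so this version lets the compiler infer the return type. The call syntax in main assumes a compiler that accepts free functions called with member syntax.

```d
import std.stdio;

// Same idea as above, but with an inferred return type so it also
// compiles where .length is wider than uint.
auto len(A)(in A x)
{
    return x.length;
}

void main()
{
    int[] xs = [10, 20, 30];

    writeln(len(xs));   // 3
    writeln(xs.len);    // 3, free function called with member syntax
    writeln("abc".len); // 3
}
```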
November 26, 2008
Andrei Alexandrescu wrote:
> D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.
> 
> A classic problem with C and C++ integer arithmetic is that any operation involving at least an unsigned integral receives automatically an unsigned type, regardless of how silly that actually is, semantically. About the only advantage of this rule is that it's simple. IMHO it only has disadvantages from then on.
> 
> The following operations suffer from the "abusive unsigned syndrome" (u is an unsigned integral, i is a signed integral):
> 
> (1) u + i, i + u
> (2) u - i, i - u
> (3) u - u
> (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C requires that these all return unsigned, ouch)
> (5) u < i, i < u, u <= i etc. (all ordering comparisons)
> (6) -u

I think that most of these problems are caused by C enforcing a foolish consistency between literals and variables.
The idea that literals like '0' and '1' are of type int is absurd, and has caused a torrent of problems. '0' is just '0'.

uint a = 1;
does NOT contain an 'implicit conversion from int to uint', any more than there are implicit conversions from naturals to integers in mathematics. So I really like the polysemous types idea.

For example, when is it reasonable to use -u?
It's useful with literals like
uint a = -1u; which is equivalent to uint a = 0xFFFF_FFFF.
Anywhere else, it's probably a bug.
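
A small sketch of the difference (the function name negate and the variable length are only for illustration): the literal form is a deliberate "all bits set", while the same negation applied to a variable silently wraps.

```d
import std.stdio;

// Negates an unsigned value; the result wraps modulo 2^32.
uint negate(uint u)
{
    return -u;
}

void main()
{
    uint a = -1u;              // literal: intentional all-ones pattern
    writeln(a == 0xFFFF_FFFF); // true

    uint length = 3;
    writeln(negate(length));   // 4294967293, almost certainly not intended
}
```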

My suspicion is, that if you allowed all signed-unsigned operations when at least one was a literal, and made everything else illegal, you'd fix most of the problems. In particular, there'd be a big reduction in people abusing 'uint' as a primitive range-limited int.

Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus on the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!').
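
To illustrate that downside, a minimal sketch (the helper name reinterpret is hypothetical): a perfectly valid uint just above int.max has no int representation, and forcing it into one flips it to int.min.

```d
import std.stdio;

// The full range of uint is 0 .. int.max * 2 + 1.
static assert(uint.max == cast(uint) int.max * 2 + 1);

// Hypothetical helper: forces a uint into an int with an explicit cast.
int reinterpret(uint u)
{
    return cast(int) u;
}

void main()
{
    uint count = cast(uint) int.max + 1; // valid uint, too large for int

    writeln(count);              // 2147483648
    writeln(reinterpret(count)); // -2147483648
}
```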

Interestingly, none of these problems exist in assembly language programming, where every arithmetic instruction affects the overflow flag (for signed operations) as well as the carry flag (for unsigned).
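
Something similar can be imitated in D with druntime's core.checkedint module, whose adds and addu functions set a bool flag on overflow. The sketch below (helper names signedOverflows and unsignedCarries are made up) judges the same addition under both interpretations, much like reading the overflow and carry flags after one add instruction.

```d
import core.checkedint : adds, addu;
import std.stdio;

// True if a + b overflows when both are treated as signed.
bool signedOverflows(int a, int b)
{
    bool overflow = false; // plays the role of the overflow flag
    adds(a, b, overflow);
    return overflow;
}

// True if a + b wraps when both are treated as unsigned.
bool unsignedCarries(uint a, uint b)
{
    bool carry = false; // plays the role of the carry flag
    addu(a, b, carry);
    return carry;
}

void main()
{
    // Same bit patterns, two readings of the result.
    writeln(signedOverflows(int.max, 1));            // true
    writeln(unsignedCarries(cast(uint) int.max, 1)); // false
    writeln(unsignedCarries(uint.max, 1));           // true
}
```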
November 26, 2008
Nick Sabalausky Wrote:

> happens, and we then decide that it's justifiable to say "well, let's fix it for array.length by tossing that over to the 'can be negative' world, even though it cuts our range of allowable values in half", then there's nothing stopping us from solving the rest of the cases by throwing them over the "can be negative" wall as well. All of a sudden, we have no unsigned.

Well... cutting the range in half need not be a problem. After all, a thought was floating around that structs shouldn't be larger than a couple of KB; note that an array of shorts with a signed length already spans the entire 32-bit address space.
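
The arithmetic behind that remark, as a sketch assuming a 32-bit signed length and 2-byte shorts (the names maxElements and maxBytes are only for illustration):

```d
import std.stdio;

// Assumption: lengths are 32-bit signed, so the largest length is int.max.
enum ulong maxElements = int.max;
enum ulong maxBytes = maxElements * short.sizeof;

// Two bytes short of the full 4 GiB that a 32-bit pointer can address.
static assert(maxBytes == (1UL << 32) - 2);

void main()
{
    writeln(maxBytes); // 4294967294
}
```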
November 26, 2008
"KennyTM~" <kennytm@gmail.com> wrote in message news:ggipu6$26mr$1@digitalmars.com...
> Nick Sabalausky wrote:
>> "bearophile" <bearophileHUGS@lycos.com> wrote in message news:gghc97$1mfo$1@digitalmars.com...
>>> Steven Schveighoffer:
>>>> lol!!!
>>> I know, I know... :-) But when people do errors so often, the error is elsewhere, in the original choice of that word to denote how many items an iterable has.
>>>
>>> In my libs I have defined len() like this, that I use now and then (where running speed isn't essential):
>>>
>>> long len(TyItems)(TyItems items) {
>>>    static if (HasLength!(TyItems))
>>>        return items.length;
>>>    else {
>>>        long len;
>>>        // this generates: foreach (p1, p2, p3; items) len++;  with a variable number of p1, p2...
>>>        mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items) len++;");
>>>        return len;
>>>    }
>>> } // End of len(items)
>>>
>>> /// ditto
>>> long len(TyItems, TyFun)(TyItems items, TyFun pred) {
>>>    static assert(IsCallable!(TyFun), "len(): predicate must be a callable");
>>>    long len;
>>>
>>>    static if (IsAA!(TyItems)) {
>>>        foreach (key, val; items)
>>>            if (pred(key, val))
>>>                len++;
>>>    } else static if (is(typeof(TyItems.opApply))) {
>>>        mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items)
>>>            if (pred(" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "))
>>>                len++;");
>>>    } else {
>>>        foreach (el; items)
>>>            if (pred(el))
>>>                len++;
>>>    }
>>>
>>>    return len;
>>> } // End of len(items, pred)
>>>
>>> alias len!(string) strLen; /// ditto
>>> alias len!(int[]) intLen; /// ditto
>>> alias len!(float[]) floatLen; /// ditto
>>>
>>> Having a global callable like len() instead of an attribute is (sometimes) better, because you can use it for example like this (this is working syntax of my dlibs):
>>>
>>> children.sort(&len!(string));
>>> That sorts the array of strings "children" according to the given
>>> callable key, that is the len of the strings.
>>>
>>
>> If we ever get extension methods, then maybe something along these lines would be nice:
>>
>> extension typeof(T.length) len(T t)
>> {
>>     return t.length;
>> }
>>
>>
>
> Already works:
>
> uint len(A) (in A x) { return x.length; }

Oh, right. For some stupid reason I was forgetting that the param would always be an array and therefore be eligible for the existing array property syntax (and that .length always returns a uint).


November 26, 2008
Nick Sabalausky:
> Oh, right. For some stupid reason I was forgetting that the param would always be an array and therefore be eligible for the existing array property syntax (and that .length always returns a uint).

From the len() code I have posted you can see there are other places where you want to use len(), in particular to count the number of items that a lazy generator (opApply for now) yields.
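
A stripped-down sketch of that use case, not the dlibs code itself: the UpTo struct is a made-up generator with a single-argument opApply and no .length, and this len simply runs the whole iteration to count.

```d
import std.stdio;

// Hypothetical lazy generator: yields 0 .. n-1 through opApply and has
// no .length property.
struct UpTo
{
    int n;

    int opApply(scope int delegate(ref int) dg)
    {
        foreach (i; 0 .. n)
        {
            int item = i;
            if (auto result = dg(item))
                return result; // the loop body asked to stop early
        }
        return 0;
    }
}

// Counts items by iterating over all of them: O(n), unlike .length.
long len(T)(T items)
{
    long count = 0;
    foreach (item; items)
        count++;
    return count;
}

void main()
{
    writeln(len(UpTo(5)));   // 5
    writeln(len([1, 2, 3])); // 3
}
```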

Bye,
bearophile
November 26, 2008
bearophile wrote:
> Nick Sabalausky:
>> Oh, right. For some stupid reason I was forgetting that the param would always be an array and therefore be eligible for the existing array property syntax (and that .length always returns a uint).
> 
> From the len() code I have posted you can see there are other places where you want to use len(), in particular to count the number of items that a lazy generator (opApply for now) yields.
> 
> Bye,
> bearophile

I'm rather wary of a short and suggestive name that embodies a linear operation. I recall there was a discussion about that a while ago in this newsgroup. I'd rather call it linearLength or something that suggests it's a best-effort function that may take O(n).
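
For comparison, a sketch using the walkLength function from Phobos's std.range, which follows this naming idea: the result of filter has no .length, so the count has to traverse it, and the name makes that cost visible at the call site. The helper countEvens is made up for the example.

```d
import std.algorithm.iteration : filter;
import std.range : walkLength;
import std.stdio;

// Hypothetical helper: counts the even numbers by walking a lazy range.
size_t countEvens(int[] xs)
{
    // filter is lazy and offers no .length; walkLength traverses it in O(n).
    return xs.filter!(x => x % 2 == 0).walkLength;
}

void main()
{
    writeln(countEvens([1, 2, 3, 4, 5, 6])); // 3
}
```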

Andrei