August 13, 2020
On Thursday, 13 August 2020 at 18:40:40 UTC, matheus wrote:
> On Thursday, 13 August 2020 at 13:33:19 UTC, bachmeier wrote:
>> ...
>> The source of wrong behavior is vec.length having type ulong. It would be very unusual for someone to even think about that.
>
> May I ask what type should it be?

Signed (size_t, the length of the machine's address space)

Just as in Java (as an improvement of C++):

https://stackoverflow.com/questions/211311/what-is-the-data-type-for-length-property-for-java-arrays

It is an int. See the Java Language Specification, section 10.7.


Initially, Java didn't have unsigned integer types; they were added as late as Java 8, together with extra methods to handle them properly:

https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html

"""
int: By default, the int data type is a 32-bit signed two's complement integer, which has a minimum value of -2^31 and a maximum value of 2^31-1. In Java SE 8 and later, you can use the int data type to represent an unsigned 32-bit integer, which has a minimum value of 0 and a maximum value of 2^32-1. Use the Integer class to use int data type as an unsigned integer. See the section The Number Classes for more information. Static methods like compareUnsigned, divideUnsigned etc have been added to the Integer class to support the arithmetic operations for unsigned integers.
"""

In contrast, D performs the potentially harmful conversion *silently*.

August 13, 2020
On Thursday, 13 August 2020 at 18:51:09 UTC, mw wrote:
> [snip]
>
> It is an int. See the Java Language Specification, section 10.7.
>
>
> Initially, Java didn't have unsigned integer types; they were added as late as Java 8, together with extra methods to handle them properly:
>
> https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
> [snip]

In other words, it was not added until 2014, and even then it was done in a backwards-compatible way that doesn't let you actually declare unsigned ints; you can only call certain methods that treat ordinary ints as unsigned.
August 13, 2020
On Thu, Aug 13, 2020 at 06:51:09PM +0000, mw via Digitalmars-d wrote:
> On Thursday, 13 August 2020 at 18:40:40 UTC, matheus wrote:
> > On Thursday, 13 August 2020 at 13:33:19 UTC, bachmeier wrote:
> > > ...
> > > The source of wrong behavior is vec.length having type ulong. It
> > > would be very unusual for someone to even think about that.
> > 
> > May I ask what type should it be?
> 
> Signed (size_t, the length of the machine's address space)
[...]

size_t is unsigned, because the address space of a 64-bit machine is 2^64, but a signed value would only be able to address half of that space (2^63).
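
For reference, a minimal D sketch of that relationship. It assumes nothing beyond the built-in aliases: `ptrdiff_t` is the signed counterpart of `size_t`, and the checks hold on both 32-bit and 64-bit targets:

```d
import std.stdio;

void main()
{
    // size_t is the unsigned integer type with the same width as a pointer.
    static assert(size_t.sizeof == (void*).sizeof);
    static assert(size_t.min == 0);

    // ptrdiff_t is the signed type of the same width.
    static assert(ptrdiff_t.sizeof == size_t.sizeof);
    static assert(ptrdiff_t.min < 0);

    // The signed maximum is half the unsigned maximum (rounded down).
    static assert(size_t.max / 2 == ptrdiff_t.max);

    writeln("size_t.max    = ", size_t.max);
    writeln("ptrdiff_t.max = ", ptrdiff_t.max);
}
```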


T

-- 
Bomb technician: If I'm running, try to keep up.
August 13, 2020
On Thursday, 13 August 2020 at 19:03:38 UTC, H. S. Teoh wrote:
> On Thu, Aug 13, 2020 at 06:51:09PM +0000, mw via Digitalmars-d wrote:
>> On Thursday, 13 August 2020 at 18:40:40 UTC, matheus wrote:
>> > On Thursday, 13 August 2020 at 13:33:19 UTC, bachmeier wrote:
>> > > ...
>> > > The source of wrong behavior is vec.length having type ulong. It
>> > > would be very unusual for someone to even think about that.
>> > 
>> > May I ask what type should it be?
>> 
>> Signed (size_t, the length of the machine's address space)
> [...]
>
> size_t is unsigned, because the address space of a 64-bit machine is 2^64, but a signed value would only be able to address half of that space (2^63).

Yes, I know that, that's why I put it in the brackets.

But for practical purposes, half that space is large enough: 2^63 = 9,223,372,036,854,775,808. Are you sure your machine has that much memory installed? (Very roughly, 9 billion GB.)

August 13, 2020
On Thursday, 13 August 2020 at 19:11:24 UTC, mw wrote:
> On Thursday, 13 August 2020 at 19:03:38 UTC, H. S. Teoh wrote:
>> On Thu, Aug 13, 2020 at 06:51:09PM +0000, mw via Digitalmars-d wrote:
>>> On Thursday, 13 August 2020 at 18:40:40 UTC, matheus wrote:
>>> > On Thursday, 13 August 2020 at 13:33:19 UTC, bachmeier wrote:
>>> > > ...
>>> > > The source of wrong behavior is vec.length having type ulong. It
>>> > > would be very unusual for someone to even think about that.
>>> > 
>>> > May I ask what type should it be?
>>> 
>>> Signed (size_t, the length of the machine's address space)
>> [...]
>>
>> size_t is unsigned, because the address space of a 64-bit machine is 2^64, but a signed value would only be able to address half of that space (2^63).
>
> Yes, I know that, that's why I put it in the brackets.
>
> But for practical purposes, half that space is large enough: 2^63 = 9,223,372,036,854,775,808. Are you sure your machine has that much memory installed? (Very roughly, 9 billion GB.)

One should always use unsigned whenever possible, as it generates better code. Many believe that division by 2 is simply a shift, but that is not so for signed values.

ssize_t fun_slow(ssize_t x)
{
    return x/2;
}

size_t fun_fast(size_t x)
{
    return x/2u;
}

fun_slow(long):                           # @fun_slow(long)
        mov     rax, rdi
        shr     rax, 63
        add     rax, rdi
        sar     rax
        ret
fun_fast(unsigned long):                           # @fun_fast(unsigned long)
        mov     rax, rdi
        shr     rax
        ret
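
The extra instructions are there because signed division rounds toward zero while an arithmetic right shift rounds toward negative infinity, so the two disagree for odd negative values. A small D sketch of the difference follows; the function names are made up for illustration, and `halfViaShift` mirrors the shr 63 / add / sar sequence above:

```d
import std.stdio;

// Signed division truncates toward zero.
long halfByDivision(long x) { return x / 2; }

// A plain arithmetic shift rounds toward negative infinity instead.
long halfByPlainShift(long x) { return x >> 1; }

// Add the sign bit (1 for negative x, 0 otherwise) before shifting,
// which is what the compiler emits for the signed division.
long halfViaShift(long x) { return (x + (x >>> 63)) >> 1; }

void main()
{
    foreach (long x; [-4L, -3L, 3L, 4L])
    {
        writefln("x = %2d: x/2 = %2d, x>>1 = %2d, corrected shift = %2d",
                 x, halfByDivision(x), halfByPlainShift(x), halfViaShift(x));
    }
}
```

Only the x = -3 row differs: the division and the corrected shift both give -1, while the plain shift gives -2.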

August 13, 2020
On Thursday, 13 August 2020 at 19:24:11 UTC, Tove wrote:
> One should always use unsigned whenever possible, as it generates better code. Many believe that division by 2 is simply a shift, but that is not so for signed values.

I'm fine with that. In many areas of language design, we need to make a choice between correctness and raw performance.

But at least we also need an *explicit*, visible warning message after we've made that choice:

-- especially warnings about *correctness* when the choice was made favoring performance
-- if the choice was made favoring correctness, the user will notice the performance when the program runs.


Personally, I will favor correctness over performance in my program design decisions: make it correct first, and faster later; you never know beforehand where your program's bottleneck is.

I'm sure you know the famous quote:

"Premature optimization is the root of all evil!"

August 13, 2020
On Thursday, 13 August 2020 at 19:03:38 UTC, H. S. Teoh wrote:
> size_t is unsigned, because the address space of a 64-bit machine is 2^64, but a signed value would only be able to address half of that space (2^63).

The address bus on existing processors only uses like 48 bits and even there the lower three are reserved cuz of alignment.

But besides, even if you wanted it all, the signed negative value has the same bit pattern as the high bit set anyway so it isn't like the cpu would care, assuming it was actually mapped.

On 32 bit it makes a little more sense to say unsigned but even there the same bit pattern logic applies anyway.
August 13, 2020
On Thu, Aug 13, 2020 at 07:40:28PM +0000, mw via Digitalmars-d wrote:
> On Thursday, 13 August 2020 at 19:24:11 UTC, Tove wrote:
> > One should always use unsigned whenever possible, as it generates better code. Many believe that division by 2 is simply a shift, but that is not so for signed values.
> 
> I'm fine with that. In many areas of language design, we need to make a choice between correctness and raw performance.
> 
> But at least we also need an *explicit*, visible warning message after we've made that choice:

I agree that the compiler should at least warn or prohibit implicit conversions between signed/unsigned. It has been the source of quite a number of frustrating bugs over the years -- frustrating mostly because implicit conversion yields unexpected results yet due to code breakage it's unlikely to ever change.

Unfortunately I don't see the situation changing anytime soon, unless somebody comes up with a *really* convincing argument that can win Walter over.  After the flop with the recent bool != int DIP, I've kinda given up hope that this area of D (int promotion rules, including implicit conversion) will ever improve.

I don't agree with making array length signed, though. The language should not whitewash the harsh reality of the underlying hardware, even if we make concessions in the way of warning the user of potentially unexpected/unwanted semantics, such as when there's implicit conversion between signed/unsigned values.
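
To make the kind of bug concrete, here is a minimal D sketch (the helper names are made up for illustration). The first comparison compiles without complaint and gives the surprising answer; the other two helpers show explicit alternatives that already exist: a cast to `ptrdiff_t`, and a run-time checked conversion via `std.conv.to`:

```d
import std.conv : ConvOverflowException, to;
import std.stdio;

// i is implicitly converted to size_t, so -1 becomes a huge positive value.
bool lessImplicit(int i, const int[] vec) { return i < vec.length; }

// Converting the length to the signed ptrdiff_t keeps the comparison signed.
bool lessSigned(int i, const int[] vec) { return i < cast(ptrdiff_t) vec.length; }

// std.conv.to checks the value at run time and throws if it does not fit.
bool fitsInSizeT(int i)
{
    try
    {
        auto converted = to!size_t(i);
        return true;
    }
    catch (ConvOverflowException)
    {
        return false;
    }
}

void main()
{
    int[] vec = [1, 2, 3];
    writeln(lessImplicit(-1, vec)); // false: -1 was converted to unsigned
    writeln(lessSigned(-1, vec));   // true
    writeln(fitsInSizeT(-1));       // false: the conversion is rejected
}
```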


T

-- 
The irony is that Bill Gates claims to be making a stable operating system and Linus Torvalds claims to be trying to take over the world. -- Anonymous
August 13, 2020
On Thursday, 13 August 2020 at 19:03:38 UTC, H. S. Teoh wrote:
> On Thu, Aug 13, 2020 at 06:51:09PM +0000, mw via Digitalmars-d wrote:
>> On Thursday, 13 August 2020 at 18:40:40 UTC, matheus wrote:
>> > On Thursday, 13 August 2020 at 13:33:19 UTC, bachmeier wrote:
>> > > ...
>> > > The source of wrong behavior is vec.length having type ulong. It
>> > > would be very unusual for someone to even think about that.
>> > 
>> > May I ask what type should it be?
>> 
>> Signed (size_t, the length of the machine's address space)
> [...]
>
> size_t is unsigned, because the address space of a 64-bit machine is 2^64, but a signed value would only be able to address half of that space (2^63).
>
While the rationale makes sense and I'm definitely in the camp of unsigned size_t, signed addresses can access the whole address range without problems. The other half is simply addressed with negative numbers, and the last address, 0xFFFFFFFFFFFFFFFF, is -1L. That's what Apple II Integer BASIC did: its only variable type was a signed 16-bit integer, which is why the monitor was entered with CALL -151 and not CALL 65385.
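
The bit-pattern equivalence is easy to check in D with explicit casts (a small sketch; the variable name is made up):

```d
import std.stdio;

void main()
{
    // -1 as a 64-bit signed value has every bit set, i.e. the last address.
    static assert(cast(ulong)(-1L) == 0xFFFF_FFFF_FFFF_FFFF);

    // The Apple II case: the same 16 bits read as signed or as unsigned.
    short monitorEntry = -151;
    writeln(cast(ushort) monitorEntry);            // 65385
    writefln("0x%04X", cast(ushort) monitorEntry); // 0xFF69
}
```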

August 13, 2020
On Thursday, 13 August 2020 at 07:22:18 UTC, mw wrote:
>
> import std.stdio;
>
> void main() {
>   long   a = -5000;
>   size_t b = 2;
>   long   c = a / b;
>   writeln(c);
> }
>
>
> $ dmd divbug.d
> $ ./divbug
> 9223372036854773308
>

Feels correct to me!

When a binary operator mixes a signed and an unsigned integer of the same size, the signed operand is converted to unsigned.

This is how it works in C and C++ and we wouldn't be able to port C code to D if this were to be changed.
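
For completeness, a minimal sketch of that behaviour and of an explicit workaround. The function names are made up, and `ulong` is used instead of `size_t` so the result does not depend on the target (`size_t` is only 64 bits wide on 64-bit platforms):

```d
import std.stdio;

// a is converted to ulong before the division; the ulong result is then
// implicitly converted back to long on return.
long divImplicit(long a, ulong b) { return a / b; }

// Casting the unsigned operand keeps the whole division signed.
// This assumes b fits in a long, which holds for any realistic array length.
long divSigned(long a, ulong b) { return a / cast(long) b; }

void main()
{
    writeln(divImplicit(-5000, 2)); // 9223372036854773308
    writeln(divSigned(-5000, 2));   // -2500
}
```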