February 04, 2022
On 2/4/2022 6:01 AM, Paul Backus wrote:
>> Trying to hide the reality of how computer integer arithmetic works, and how integral promotions work, is a prescription for endless frustration and inevitable failure.
> 2s-complement is "the reality of how computer integer arithmetic works," but there is nothing fundamental or necessary about C's integer promotion rules, and plenty of system-level languages get by without them.

The integral promotion rules came about because of how the PDP-11 instruction set worked, as C was developed on an -11. But this has carried over into modern CPUs. Consider:

void tests(short* a, short* b, short* c) { *c = *a * *b; }
        0F B7 07                movzx   EAX,word ptr [RDI]
66      0F AF 06                imul    AX,[RSI]
66      89 02                   mov     [RDX],AX
        C3                      ret

void testi(int* a, int* b, int* c) { *c = *a * *b; }
        8B 07                   mov     EAX,[RDI]
        0F AF 06                imul    EAX,[RSI]
        89 02                   mov     [RDX],EAX
        C3                      ret

You're paying a 3-byte size penalty for using short arithmetic rather than int arithmetic. It's slower, too.

Generally speaking, int should be used for most calculations, short and byte for storage.

(Modern CPUs have long been deliberately optimized and tuned for C semantics.)
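
A minimal C sketch of the promotion rule under discussion, assuming a typical platform with a 16-bit short and a 32-bit int (the function names are made up for illustration): the multiply is carried out in int, and narrowing only happens when the result is stored back into 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the type that short * short is evaluated in.
   Integer promotion converts both operands to int first. */
size_t product_size(short a, short b)
{
    return sizeof(a * b);   /* sizeof does not evaluate the multiply */
}

/* The full-width product, as the promoted arithmetic sees it. */
int32_t mul_wide(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* What ends up in 16-bit storage: only the low 16 bits of the product. */
uint16_t mul_stored(int16_t a, int16_t b)
{
    return (uint16_t)((uint32_t)mul_wide(a, b) & 0xFFFFu);
}
```

With a = b = 300, mul_wide gives 90000, while the 16 bits that survive the store hold 24464 (90000 - 65536).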
February 04, 2022
On 2/4/2022 12:31 PM, H. S. Teoh wrote:
> PDP-11 instructions no longer resemble how modern machines work, though.
> What made sense back then may not necessarily make sense anymore today.

The PDP-11, no, but modern machines *definitely* hew to how C works; see my other post in this thread.

February 04, 2022
On Fri, Feb 04, 2022 at 08:50:35PM +0000, Mark via Digitalmars-d wrote:
> On Friday, 4 February 2022 at 04:28:37 UTC, Walter Bright wrote:
> > There's really no fix for that other than making the effort to understand 2s-complement. Some noble attempts:
> > 
> > Java: disallowed all unsigned types. Wound up having to add that back in as a hack.
> 
> How many people actually use (and need) unsigned integers?

I do. They are very useful in APIs where I expect only non-negative values. Marking the parameter type as uint makes it clear exactly what's expected, instead of using circumlocutions like taking int with an in-contract that x>=0.  Also, when you're dealing with bitmasks, you WANT unsigned types. Using signed types for that will cause values to get munged by unwanted sign extensions, and in general just cause grief and needless complexity where an unsigned type would be completely straightforward.

Also, for a systems programming language unsigned types are necessary, because they are a closer reflection of the reality at the hardware level.
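
To make the sign-extension problem with bitmasks concrete, here is a small C sketch (the helper names are hypothetical, not from any library). A signed 8-bit mask with its top bit set picks up 24 extra one-bits when widened, while the unsigned version stays as written.

```c
#include <stdint.h>

/* Widening a signed 8-bit mask replicates the sign bit into the upper bits. */
uint32_t widen_signed(int8_t mask)
{
    return (uint32_t)mask;
}

/* Widening an unsigned 8-bit mask pads with zeros. */
uint32_t widen_unsigned(uint8_t mask)
{
    return (uint32_t)mask;
}

/* Hypothetical helper: does any bit of flags fall under the widened mask? */
int overlaps(uint32_t flags, uint32_t widened_mask)
{
    return (flags & widened_mask) != 0;
}
```

The bit pattern 0x80 is -128 as an int8_t, so widen_signed(-128) yields 0xFFFFFF80 and suddenly matches flag bits well outside the original byte, whereas widen_unsigned(0x80) yields 0x00000080.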


> If 99% of users don't need them, that's a good case for relegating them to a library type.  This wasn't possible in Java because it doesn't support operator overloading, without which dealing with such types would have been quite annoying.

Needing a library type for manipulating bitmasks would make D an utter joke of a systems programming language.


T

February 04, 2022
On Friday, 4 February 2022 at 21:13:10 UTC, Walter Bright wrote:
> The integral promotion rules came about because of how the PDP-11 instruction set worked, as C was developed on an -11. But this has carried over into modern CPUs. Consider:
>
> [...]
>
> You're paying a 3-byte size penalty for using short arithmetic rather than int arithmetic. It's slower, too.
>
> Generally speaking, int should be used for most calculations, short and byte for storage.

Sure. That's a reason why I, the programmer, might want to use int instead of short or byte in my code. But if, for whatever reason, I've chosen to use short or byte in spite of the performance penalties, I would rather not have the language second-guess me on that choice.
February 04, 2022

On 2/4/22 4:54 PM, Paul Backus wrote:
> On Friday, 4 February 2022 at 21:13:10 UTC, Walter Bright wrote:
>> The integral promotion rules came about because of how the PDP-11 instruction set worked, as C was developed on an -11. But this has carried over into modern CPUs. Consider:
>>
>> [...]
>>
>> You're paying a 3-byte size penalty for using short arithmetic rather than int arithmetic. It's slower, too.
>>
>> Generally speaking, int should be used for most calculations, short and byte for storage.
>
> Sure. That's a reason why I, the programmer, might want to use int instead of short or byte in my code. But if, for whatever reason, I've chosen to use short or byte in spite of the performance penalties, I would rather not have the language second-guess me on that choice.

Yeah, the user doesn't care how the compiler does the instructions. They care about the outcome. If they want to assign it back to a byte, they probably don't care about losing the extra precision. Otherwise, they would assign to an int.

I don't think anyone is arguing that the result of the operation should be truncated to a byte, even if assigned to an int.

-Steve

February 04, 2022
On Friday, 4 February 2022 at 21:13:10 UTC, Walter Bright wrote:
> It's slower, too.

Not anymore.  And div can be faster on smaller integers.


> You're paying a 3-byte size penalty for using short arithmetic rather than int arithmetic.

1. You are very careful to demonstrate short arithmetic, not byte arithmetic, which is the same size as int arithmetic on x86.

2. Cycle-counting (or byte-counting) is not a sensible approach to language design.  It is relevant to language implementation, maybe; and whole-program performance may be relevant to language design; but these sorts of changes are marginal and should not get in the way of correct semantics.

3. Your code example actually does exactly what you suggest--using short arithmetic for storage.  It just happens that in this case using short calculations rather than int calculations yields the same result and smaller code.

4. (continued from 3) in a larger, more interesting expression, regardless of language semantics, the compiler will generally be free to use ints for intermediates.
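
Points 3 and 4 rest on a property of 2s-complement (modular) arithmetic: for addition, subtraction and multiplication, truncating after every step gives the same low bits as computing everything wide and truncating once. A small C sketch with hypothetical function names; note that the property does not extend to division, right shifts or comparisons.

```c
#include <stdint.h>

/* Compute a*b + c entirely in 32 bits, truncate once at the end. */
uint16_t eval_wide(uint16_t a, uint16_t b, uint16_t c)
{
    uint32_t r = (uint32_t)a * (uint32_t)b + (uint32_t)c;
    return (uint16_t)r;
}

/* Compute a*b + c, truncating every intermediate back to 16 bits. */
uint16_t eval_narrow(uint16_t a, uint16_t b, uint16_t c)
{
    uint16_t p = (uint16_t)((uint32_t)a * (uint32_t)b);  /* low 16 bits of the product */
    uint16_t s = (uint16_t)((uint32_t)p + (uint32_t)c);  /* low 16 bits of the sum */
    return s;
}
```

Both functions return the same value for every input, which is why a compiler may pick whichever width is cheaper for such intermediates.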
February 04, 2022

On Friday, 4 February 2022 at 21:13:10 UTC, Walter Bright wrote:
> The integral promotion rules came about because of how the PDP-11 instruction set worked, as C was developed on an -11. But this has carried over into modern CPUs. Consider:
>
> void tests(short* a, short* b, short* c) { *c = *a * *b; }
>         0F B7 07                movzx   EAX,word ptr [RDI]
> 66      0F AF 06                imul    AX,[RSI]
> 66      89 02                   mov     [RDX],AX
>         C3                      ret
>
> void testi(int* a, int* b, int* c) { *c = *a * *b; }
>         8B 07                   mov     EAX,[RDI]
>         0F AF 06                imul    EAX,[RSI]
>         89 02                   mov     [RDX],EAX
>         C3                      ret
>
> You're paying a 3-byte size penalty for using short arithmetic rather than int arithmetic. It's slower, too.

Larger code size is surely more stressful for the instruction cache, but the slowdown caused by this is most likely barely measurable on modern processors.

> Generally speaking, int should be used for most calculations, short and byte for storage.
>
> (Modern CPUs have long been deliberately optimized and tuned for C semantics.)

I generally agree, but this is only valid for the regular scalar code. Autovectorizable code taking advantage of SIMD instructions looks a bit different. Consider:

void tests(short* a, short* b, short* c, int n) { while (n--) *c++ = *a++ * *b++; }
  <...>
  50:       f3 0f 6f 04 07       	movdqu (%rdi,%rax,1),%xmm0
  55:       f3 0f 6f 0c 06       	movdqu (%rsi,%rax,1),%xmm1
  5a:       66 0f d5 c1          	pmullw %xmm1,%xmm0
  5e:       0f 11 04 02          	movups %xmm0,(%rdx,%rax,1)
  62:       48 83 c0 10          	add    $0x10,%rax
  66:       4c 39 c0             	cmp    %r8,%rax
  69:       75 e5                	jne    50 <tests+0x50>
  <...>

7 instructions, which are doing 8 multiplications per inner loop iteration.

void testi(int* a, int* b, int* c, int n) { while (n--) *c++ = *a++ * *b++; }
 <...>
 188:       f3 0f 6f 04 07       	movdqu (%rdi,%rax,1),%xmm0
 18d:       f3 0f 6f 0c 06       	movdqu (%rsi,%rax,1),%xmm1
 192:       66 0f 38 40 c1       	pmulld %xmm1,%xmm0
 197:       0f 11 04 02          	movups %xmm0,(%rdx,%rax,1)
 19b:       48 83 c0 10          	add    $0x10,%rax
 19f:       4c 39 c0             	cmp    %r8,%rax
 1a2:       75 e4                	jne    188 <testi+0x48>
 <...>

7 instructions, which are doing 4 multiplications per inner loop iteration.

The code size increases substantially, because there are large prologue and epilogue parts before and after the inner loop. But performance improves considerably when processing large arrays, and the 16-bit version is roughly twice as fast as the 32-bit version (because each 128-bit XMM register holds either 8 shorts or 4 ints).

If we want the D language to be SIMD friendly, then discouraging the use of short and byte types for local variables isn't the best idea.

February 04, 2022

On Friday, 4 February 2022 at 22:11:10 UTC, Steven Schveighoffer wrote:
> I don't think anyone is arguing that the result of the operation should be truncated to a byte, even if assigned to an int.

And yet, that's exactly what happens if you use int and long:

int a = int.max;
long b = a + 1;
writeln(b > 0); // false

I think there are reasonable arguments to be made on both sides, but having both behaviors in the same language is a bit of a mess, don't you think?
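
The same effect can be reproduced in C. A hedged sketch, using unsigned types because signed overflow is undefined behaviour in C (whereas D defines it to wrap), and assuming a 32-bit int; the function names are illustrative only.

```c
#include <stdint.h>

/* The addition happens at 32 bits and wraps before the result is widened. */
uint64_t add_then_widen(uint32_t a)
{
    return a + 1u;
}

/* Widening first keeps the carry out of bit 31. */
uint64_t widen_then_add(uint32_t a)
{
    return (uint64_t)a + 1u;
}
```

For a = UINT32_MAX the first function returns 0 and the second returns 4294967296, even though both results are stored in a 64-bit variable.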

February 04, 2022

On 2/4/22 5:22 PM, Paul Backus wrote:
> On Friday, 4 February 2022 at 22:11:10 UTC, Steven Schveighoffer wrote:
>> I don't think anyone is arguing that the result of the operation should be truncated to a byte, even if assigned to an int.
>
> And yet, that's exactly what happens if you use int and long:
>
>     int a = int.max;
>     long b = a + 1;
>     writeln(b > 0); // false
>
> I think there are reasonable arguments to be made on both sides, but having both behaviors in the same language is a bit of a mess, don't you think?

Yes, I would prefer for the compiler to determine if the overflow is needed, and generate appropriate instructions based on that.

The reason this doesn't come up often for int -> long is because a) you aren't usually converting an int-only operation to a long, and b) overflow of an int is rare.

-Steve

February 04, 2022
On 2/4/2022 6:12 AM, Adam D Ruppe wrote:
> I'd allow that. The input is a and b, they're both short, so let the output truncate back to short implicitly too. Just like with int, there's some understanding that yes, there is a high word produced by the multiply, but it might not fit and I don't need the compiler nagging me like I'm some kind of ignoramus.

As I observed before, there is no solution. Just different problems. It's best to stick with a scheme that has well-understood issues and works best with the common CPU architectures.