February 04, 2022
On Friday, 4 February 2022 at 23:25:14 UTC, Walter Bright wrote:
> On 2/4/2022 6:12 AM, Adam D Ruppe wrote:
>> I'd allow that. The input is a and b, they're both short, so let the output truncate back to short implicitly too. Just like with int, there's some understanding that yes, there is a high word produced by the multiply, but it might not fit and I don't need the compiler nagging me like I'm some kind of ignoramus.
>
> As I observed before, there is no solution. Just different problems. It's best to stick with a scheme that has well-understood issues and works best with the common CPU architectures.

I don't think you understand my proposal, which is closer to C's existing rules than D is now.
February 04, 2022
On Friday, 4 February 2022 at 23:25:14 UTC, Walter Bright wrote:
> As I observed before, there is no solution. Just different problems. It's best to stick with a scheme that has well-understood issues and works best with the common CPU architectures.

I think the anecdote regarding Gosling demonstrates that these issues are not well understood.
February 04, 2022
On 2/4/2022 2:15 PM, Elronnd wrote:
> On Friday, 4 February 2022 at 21:13:10 UTC, Walter Bright wrote:
>> It's slower, too.
> 
> Not anymore.  And div can be faster on smaller integers.

The code size penalty is still there.


>> You're paying a 3 byte size penalty for using short arithmetic rather than int arithmetic.
> 
> 1. You are very careful to demonstrate short arithmetic, not byte arithmetic, which is the same size as int arithmetic on x86.

The penalty for byte arithmetic is the shortage of registers. Even so, if you're talking about a general solution, not treating bytes differently from shorts, then I only need mention the shorts.

Also, implying I have nefarious motives here is not called for.


> 2. Cycle-counting (or byte-counting) is not a sensible approach to language design.  It is relevant to language implementation, maybe; and whole-program performance may be relevant to language design; but these sorts of changes are marginal and should not get in the way of correct semantics.

That's fine unless you're using a systems programming language, where the customers expect performance.

Remember the recent deal with the x87 where dmd would keep the extra precision around, to avoid the double rounding problem? I propagated this to dmc, and it cost me a design win. The customer benchmarked it on 'float' arithmetic, and pronounced dmc 10% slower. The double rounding issue did not interest him.


> 3. Your code example actually does exactly what you suggest--using short arithmetic for storage.

The load instructions still use the extra operand size override bytes.


> It just happens that in this case using short calculations rather than int calculations yields the same result and smaller code.

It's not "just happens". Every short load will incur an extra byte. I compiled it with gcc -O, too, just so nobody will accuse me of sabotaging the result with dmd.


> 4. (continued from 3) in a larger, more interesting expression, regardless of language semantics, the compiler will generally be free to use ints for intermediates.

If it does, then you'll have other truncation problems depending on how the optimization of the expression plays out. Unless you went the x87 route and slowed everything down by truncating every subexpression to short.

Seriously, I've been around the block with this for 40 years. There are no magic solutions. The obvious solutions all simply have other problems. The integral promotion rules really are the most practical solution. It's best to simply spend a few moments learning them, and you'll be fine.

February 04, 2022
On 2/4/2022 2:18 PM, Siarhei Siamashka wrote:
> If we want D language to be SIMD friendly, then discouraging the use of `short` and `byte` types for local variables isn't the best idea.

SIMD is its own world, and why D has vector types as a core language feature. I never had much faith in autovectorization.
February 04, 2022
Amusingly, when a signed divide instruction is not available, signed division is done by saving the signs of the operands, negating the negative ones to get unsigned magnitudes, doing the unsigned divide, then negating the result according to the original signs.

Unsigned operations are the core of how CPUs work; signed computations are another layer on top of that.
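
A minimal sketch of that sequence in D, for illustration only. The function name `signedDivViaUnsigned` is made up here, and the sketch assumes 32-bit operands and a non-zero divisor:

```d
// Hypothetical helper: signed 32-bit division built only from an unsigned divide.
// The caller must ensure b != 0.
int signedDivViaUnsigned(int a, int b)
{
    // Save the signs: the quotient is negative when exactly one operand is.
    bool negative = (a < 0) != (b < 0);

    // Take the magnitudes in unsigned space. Subtracting from 0u also
    // handles int.min, whose magnitude does not fit in an int.
    uint ua = a < 0 ? 0u - cast(uint) a : cast(uint) a;
    uint ub = b < 0 ? 0u - cast(uint) b : cast(uint) b;

    uint uq = ua / ub; // the only divide performed is unsigned

    // Negate the result according to the original signs.
    return cast(int)(negative ? 0u - uq : uq);
}
```

The result truncates toward zero, which matches what the built-in `/` does for `int` operands.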
February 04, 2022
On Friday, 4 February 2022 at 23:36:11 UTC, Adam Ruppe wrote:
> I don't think you understand my proposal, which is closer to C's existing rules than D is now.

To reiterate:

C's rule: int promote, DO allow narrowing implicit conversion.

D's rule: int promote, do NOT allow narrowing implicit conversion unless VRP passes.

My proposed rule: int promote, do NOT allow narrowing implicit conversion unless VRP passes OR the requested conversion is the same as the largest input type (with literals excluded unless their value is obviously out of range).


So there's no change to the actual calculation. Just loosening D's currently strict implicit conversion rule back to something closer to C's permissive standard.

There'd be zero changes to codegen. No modification of intermediate values. It just allows implicit conversions back to the input *just like C does*.
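
A small sketch of how the three rules differ on the same expression. The code compiles under today's D; what C and the proposed rule would accept is described in comments only, and the function names are just for illustration:

```d
// Current D behaviour; comments note what C and the proposed rule would do.
short scale(short a, short b)
{
    // a * b is computed as int under all three rules (integral promotion).

    // short r = a * b;
    //   C:        accepted, the int result is narrowed implicitly.
    //   D today:  rejected, VRP cannot prove the product fits in a short.
    //   Proposal: accepted, because short is the largest input type.

    short r = cast(short)(a * b); // what D requires today
    return r;
}

ubyte lowNibble(ubyte x)
{
    // Accepted by D today: VRP proves (x & 0x0F) lies within 0 .. 15.
    ubyte r = x & 0x0F;
    return r;
}
```

In every case the multiply itself is the same int operation; only the question of whether the cast must be spelled out changes.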
February 04, 2022
On Friday, 4 February 2022 at 23:43:28 UTC, Walter Bright wrote:
> The penalty for byte arithmetic is the shortage of registers.

On 64-bit, there are as many byte registers as word registers.  (More, technically, but the high-half registers should be avoided at all costs.)


> Implying I have nefarious motives here is not called for.

Yes.  My bad.


>> these sorts of changes are marginal and should not get in the way of correct semantics.
>
> That's fine unless you're using a systems programming language, where the customers expect performance.

If a customer wants int ops to be generated, they can use ints.  There is nothing preventing them from doing this, as has been pointed out else-thread.


>> 3. Your code example actually does exactly what you suggest--using short arithmetic for storage.
>
> The load instructions still use the extra operand size override bytes.

I do not follow.  Your post said:

> Generally speaking, int should be used for most calculations, short and byte for storage.

How am I to store shorts without an operand-size override prefix?


>> It just happens that in this case using short calculations rather than int calculations yields the same result and smaller code.
>
> It's not "just happens". Every short load will incur an extra byte. I compiled it with gcc -O, too, just so nobody will accuse me of sabotaging the result with dmd.

In this case I was referring to the multiply.  It was possible to load the second register, perform a 32-bit multiply, and then store the truncated result.  In a different context, this might have been worthwhile.


>> 4. (continued from 3) in a larger, more interesting expression, regardless of language semantics, the compiler will generally be free to use ints for intermediates.
>
> If it does, then you'll have other truncation problems depending on how the optimization of the expression plays out. Unless you went the x87 route and slowed everything down by truncating every subexpression to short.

Example: ubyte x,y,z,w; w = x + y + z.

(((x + y) mod 2^32 mod 2^8) + z) mod 2^32 mod 2^8 is the same as (((x + y) mod 2^32) + z) mod 2^32 mod 2^8.  The mod 2^32 are implicit in the use of 32-bit registers; the mod 2^8 are explicit truncation.  The former form, with two explicit truncations, can be rewritten as the latter form, getting rid of the intermediate truncation, giving the exact same result as with promotion.
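
A quick way to check that equivalence is brute force over all ubyte inputs. This is only a sketch with made-up function names, written with explicit casts so that it compiles under today's D:

```d
// Truncate after every addition, as byte-width arithmetic would.
ubyte sumTruncatingEachStep(ubyte x, ubyte y, ubyte z)
{
    ubyte t = cast(ubyte)(x + y); // intermediate mod 2^8
    return cast(ubyte)(t + z);    // final mod 2^8
}

// Keep the intermediate in an int and truncate once at the end.
ubyte sumTruncatingOnce(ubyte x, ubyte y, ubyte z)
{
    return cast(ubyte)(x + y + z); // computed as int, single mod 2^8
}

// Exhaustively compare the two forms over all 2^24 input combinations.
bool formsAgree()
{
    foreach (x; 0 .. 256)
        foreach (y; 0 .. 256)
            foreach (z; 0 .. 256)
            {
                auto a = sumTruncatingEachStep(cast(ubyte) x, cast(ubyte) y, cast(ubyte) z);
                auto b = sumTruncatingOnce(cast(ubyte) x, cast(ubyte) y, cast(ubyte) z);
                if (a != b)
                    return false;
            }
    return true;
}
```

The check covers only the case where the final result is stored back into a ubyte, which is the case the example is about.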
February 04, 2022
On 2/4/2022 3:55 PM, Elronnd wrote:
> On Friday, 4 February 2022 at 23:43:28 UTC, Walter Bright wrote:
>> The penalty for byte arithmetic is the shortage of registers.
> On 64-bit, there are as many byte registers as word registers. (More, technically, but the high-half registers should be avoided at all costs.)

Access to those byte registers requires an additional REX byte.


>> That's fine unless you're using a systems programming language, where the customers expect performance.
> If a customer wants int ops to be generated, they can use ints. There is nothing preventing them from doing this, as has been pointed out else-thread.

Actually, they'd need to insert casts to int for subexpressions. This is not going to be appealing.


>>> 3. Your code example actually does exactly what you suggest--using short arithmetic for storage.
>>
>> The load instructions still use the extra operand size override bytes.
> 
> I do not follow.  Your post said:
> 
>> Generally speaking, int should be used for most calculations, short and byte for storage.
> 
> How am I to store shorts without an operand-size override prefix?

Consider more complex expressions than load and store.


>>> 4. (continued from 3) in a larger, more interesting expression, regardless of language semantics, the compiler will generally be free to use ints for intermediates.
>>
>> If it does, then you'll have other truncation problems depending on how the optimization of the expression plays out. Unless you went the x87 route and slowed everything down by truncating every subexpression to short.
> 
> Example: ubyte x,y,z,w; w = x + y + z.
> 
> (((x + y) mod 2^32 mod 2^8) + z) mod 2^32 mod 2^8 is the same as (((x + y) mod 2^32) + z) mod 2^32 mod 2^8.  The mod 2^32 are implicit in the use of 32-bit registers; the mod 2^8 are explicit truncation.  The former form, with two explicit truncations, can be rewritten as the latter form, getting rid of the intermediate truncation, giving the exact same result as with promotion.

Consider:

    byte a, b;
    int d = a + b;

You're going to get surprising results with your proposal.
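
To make the difference concrete, here is a sketch that emulates both behaviours in today's D. It assumes a rule under which byte + byte is evaluated at byte width before being widened, which may not be exactly what was proposed, and the function names are made up:

```d
// With integral promotion (D and C today): operands are widened first.
int sumPromoted(byte a, byte b)
{
    int d = a + b;
    return d;
}

// Emulates a hypothetical rule where byte + byte is evaluated at byte
// width and only then widened to int.
int sumAtByteWidth(byte a, byte b)
{
    byte narrow = cast(byte)(a + b); // wraps modulo 2^8
    int d = narrow;
    return d;
}
```

With a = 100 and b = 100, `sumPromoted` gives 200 while `sumAtByteWidth` gives -56.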
February 04, 2022
On 2/4/2022 3:41 PM, Elronnd wrote:
> I think the anecdote regarding Gosling demonstrates that these issues are not well understood.

None of the other proposals are better understood.
February 05, 2022

On Friday, 4 February 2022 at 23:45:31 UTC, Walter Bright wrote:
> On 2/4/2022 2:18 PM, Siarhei Siamashka wrote:
>> If we want D language to be SIMD friendly, then discouraging the use of `short` and `byte` types for local variables isn't the best idea.
>
> SIMD is its own world, and why D has vector types as a core language feature. I never had much faith in autovectorization.

I don't have much faith in autovectorization quality either, but this feature is provided for free by the GCC and LLVM backends. And right now excessively paranoid errors about byte/short variables coerce the users into one of these two unattractive alternatives:

  • litter the code with ugly casts
  • change types of temporary variables to ints and waste some vectorization opportunities

When the signal/noise ratio is bad, then it's natural that the users start ignoring error messages. Beginners are effectively trained to apply casts without thinking just to shut up the annoying compiler and it leads to situations like this: https://forum.dlang.org/thread/uqeobimtzhuyhvjpvkvz@forum.dlang.org

I see VRP as just a band-aid, which helps very little, but causes a lot of inconveniences.

My suggestion:

  1. Implement wrapping_add, wrapping_sub, wrapping_mul intrinsics similar to Rust, this is easy and costs nothing.
  2. Implement an experimental -ftrapv option in one of the D compilers (most likely GDC or LDC) to catch both signed and unsigned overflows at runtime. Or maybe add function attributes to enable/disable this functionality with finer-grained control. Yes, I know that this violates the current D language spec, which requires two's complement wraparound for everything, but it doesn't matter for a fancy experimental option.
  3. Run some tests with -ftrapv and check how many arithmetic overflows are actually triggered in Phobos. Replace the affected arithmetic operators with intrinsics if the wrapping behavior is actually intended.
  4. In the long run consider updating the language spec.

Benefits: even if -ftrapv turns out to have a high overhead, this would still become a useful tool for testing arithmetic overflow safety in applications. Having something is better than having nothing.
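
A rough sketch of what steps 1 and 2 could look like from the user's side. Both `wrappingAdd` and `trappingAdd` are hypothetical names, not existing intrinsics. The only library call used is `adds` from druntime's `core.checkedint`, which reports signed overflow through a flag:

```d
import core.checkedint : adds;

// Hypothetical helper in the spirit of Rust's wrapping_add: the wraparound
// is requested explicitly, so no cast is needed at the call site.
T wrappingAdd(T)(T a, T b)
    if (__traits(isIntegral, T))
{
    return cast(T)(a + b);
}

// Rough stand-in for what a -ftrapv style check might do for int addition:
// detect the overflow at runtime instead of silently wrapping.
int trappingAdd(int a, int b)
{
    bool overflow = false;
    int r = adds(a, b, overflow); // sets the flag on signed overflow
    if (overflow)
        assert(0, "signed overflow in trappingAdd");
    return r;
}
```

Code that really wants wraparound would call `wrappingAdd!ubyte(x, y)` and stay quiet under a trapping build, while plain `+` would be the operator that gets checked.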