March 27, 2022

On Saturday, 26 March 2022 at 19:30:13 UTC, Walter Bright wrote:
> On 3/25/2022 4:33 PM, Murilo wrote:
>> I have been waiting for years now, can't you guys just add these 2 types to the language?
>
> I did some work implementing it as a native type. But after a while I realized that it was not appropriate to implement it that way; it should be a library type, like complex is.

I guess I must have missed this "complexity" detail, as I haven't contributed to the development of D. I think it should stay at the library level. Obviously, the library is sufficient for Murilo...

SDB@79

March 28, 2022
On Saturday, 26 March 2022 at 19:30:13 UTC, Walter Bright wrote:
> https://dlang.org/phobos/core_int128.html

I tried to find faults in its divide and modulo so that I could rant about wideint.d being rewritten, but actually core.int128 seems to be better and correct (possibly more correct with negative modulo even). :)

So now it's all about adding the operator overloads, and done with that particular complaint!
March 28, 2022

On Friday, 25 March 2022 at 23:50:39 UTC, Adam Ruppe wrote:
> What would you do with it? There's a number of alternatives available today depending what you need.

There is literally no way to implement a big number lib that'll optimize properly without an integer that is 2x what the machine supports. That means 128 bits on 64-bit machines. This is important for cryptography, for instance.

March 28, 2022
On Monday, 28 March 2022 at 18:37:27 UTC, Guillaume Piolat wrote:
> On Saturday, 26 March 2022 at 19:30:13 UTC, Walter Bright wrote:
>> https://dlang.org/phobos/core_int128.html
>
> I tried to find faults in its divide and modulo so that I could rant about wideint.d being rewritten, but actually core.int128 seems to be better and correct (possibly more correct with negative modulo even). :)
>
> So now it's all about adding the operator overloads, and done with that particular complaint!

With LDC 1.27.1:

```d
import core.int128 : Cent, mul;

Cent foobar(Cent a, Cent b) {
    return mul(a, b);
}
```

Codegen:
```asm
nothrow @nogc @safe example.Cent example.foobar(example.Cent, example.Cent):
        push    r15
        push    r14
        push    rbx
        mov     r9d, edi
        mov     r8, rdi
        shr     r8, 32
        mov     r11d, esi
        mov     r10, rsi
        shr     r10, 32
        mov     ebx, edx
        mov     rax, rbx
        imul    r11, rbx
        imul    rbx, r9
        imul    rax, r8
        mov     r14d, ebx
        shr     rbx, 32
        add     rbx, rax
        mov     eax, ebx
        shr     rbx, 32
        add     rbx, r11
        imul    r10d, edx
        shr     rdx, 32
        mov     r11, rdx
        imul    r11, r9
        add     rax, r11
        mov     r11, rdx
        imul    r11, r8
        mov     r15d, ebx
        add     r15, r11
        mov     r11, rax
        shr     r11, 32
        add     r11, r15
        imul    edx, esi
        add     edx, r10d
        mov     r10d, ecx
        imul    r10, r9
        mov     esi, r11d
        add     rsi, r10
        imul    r8d, ecx
        add     r8d, edx
        shr     rcx, 32
        imul    ecx, edi
        add     ecx, r8d
        shl     rax, 32
        or      rax, r14
        shl     rcx, 32
        add     rcx, rbx
        movabs  rdx, -4294967296
        and     rcx, rdx
        add     rcx, r11
        and     rcx, rdx
        add     rsi, rcx
        mov     rdx, rsi
        pop     rbx
        pop     r14
        pop     r15
        ret
```

Now let's see what we can get with clang and `__uint128_t`; same backend, so the comparison is fair:

```cpp
__uint128_t foobar(__uint128_t a, __uint128_t b) {
    return a * b;
}
```

codegen:
```asm
foobar(unsigned __int128, unsigned __int128):                            # @foobar(unsigned __int128, unsigned __int128)
        mov     r8, rdx
        mov     rax, rdx
        mul     rdi
        imul    rsi, r8
        add     rdx, rsi
        imul    rcx, rdi
        add     rdx, rcx
        ret
```

Why do I even have to argue that case?
March 29, 2022

On Monday, 28 March 2022 at 19:35:10 UTC, deadalnix wrote:
> Why do I even have to argue that case?

Because different architectures and compilers do different things. And some people value correctness over speed; especially if we later get 128-bit registers, we want it to work exactly as expected when it gets recompiled.

Though I'm sure you know this and are just doing a raw comparison of size.

Writing these functions in pure C you can manage to get the job done, but it's a lot more work and takes a lot more steps. I ended up writing some of my hardware-specific functions 3 times: 32-bit, 64-bit and generic. 32/64-bit worked quite well with asm to speed things up for any size you'd want. Generic however didn't like it as much and is actually SLOWER than 32-bit, but should work on anything and even works in CTFE (to make it as close to a native type as I could).

What I wrote should be close to the level of the cpp example, just won't be inlined; though I suppose I could write an inline one just for 128-bit, which would be a lot easier to inline one level up...

March 29, 2022

On Tuesday, 29 March 2022 at 06:28:17 UTC, Era Scarecrow wrote:
> On Monday, 28 March 2022 at 19:35:10 UTC, deadalnix wrote:
>> Why do I even have to argue that case?
>
> Because different architectures and compilers do different things. And some people value correctness over speed; especially if we later get 128-bit registers, we want it to work exactly as expected when it gets recompiled.

No, I picked the exact same toolchain on purpose, so that the approaches themselves can be compared.

There is no correctness vs speed tradeoff here; both pieces of code are correct. One is going to be significantly faster, and, in addition, one is going to optimize better with its surroundings, so what you'll see in practice is an even wider gap than what is presented above.

This approach will not work. I know because I specifically worked on making LLVM optimize this type of code, and I know how much harder it is to get good code in the presence of 128-bit integers versus in their absence.

The CPU has a lot of instructions to help handle large integers. When you let the backend do its magic, it can leverage them. When you instead give it good old small-integer code, and it has to infer the meaning from it, reconstruct the large-integer ops you meant to be doing, and optimize that, you introduce so many failure points that it's practically impossible to get a competitive result.

Once again, both of the examples above use THE EXACT SAME toolchain; the only difference is the approach to 128-bit integers in the frontend.

March 29, 2022
On Monday, 28 March 2022 at 18:40:34 UTC, deadalnix wrote:
> On Friday, 25 March 2022 at 23:50:39 UTC, Adam Ruppe wrote:
>> What would you do with it? There's a number of alternatives available today depending what you need.
>
> There is literally no way to implement a big number lib

Murilo might not need a big number lib at all.

For astronomy, there's a good chance floating point will do the job.
March 29, 2022

On Tuesday, 29 March 2022 at 12:06:54 UTC, deadalnix wrote:
> On Tuesday, 29 March 2022 at 06:28:17 UTC, Era Scarecrow wrote:
>> [...]
>
> No, I picked the exact same toolchain on purpose, so that the approaches themselves can be compared.
>
> [...]

I agree. Having dmd as a fast backend is OK, but if something as basically trivial as this cannot be implemented then we are in trouble.

That being said, it may be possible to lower the cent type to the druntime type as a hack; then LDC and GDC can do what they need to do.

March 29, 2022

On Tuesday, 29 March 2022 at 12:20:21 UTC, max haughton wrote:
> That being said, it may be possible to lower the cent type to the druntime type as a hack; then LDC and GDC can do what they need to do.

In intel-intrinsics, unsupported vector types are emulated:

- all vectors in DMD 32-bit
- all vectors in DMD 64-bit with an option that disables D_SIMD
- AVX vectors in GDC without AVX

so it's just a matter of adding a version there, and some work, to have backend support AND work everywhere.

March 29, 2022
On Tuesday, 29 March 2022 at 12:13:31 UTC, Adam D Ruppe wrote:
> On Monday, 28 March 2022 at 18:40:34 UTC, deadalnix wrote:
>> On Friday, 25 March 2022 at 23:50:39 UTC, Adam Ruppe wrote:
>>> What would you do with it? There's a number of alternatives available today depending what you need.
>>
>> There is literally no way to implement a big number lib
>
> Murilo might not need a big number lib at all.
>
> For astronomy, there's a good chance floating point will do the job.

That's not going to work if you are comparing things that are extremely big with things that are extremely small.

- Alex