May 13
On Thursday, 13 May 2021 at 12:06:01 UTC, Witold Baryluk wrote:
> On Thursday, 13 May 2021 at 11:58:50 UTC, Witold Baryluk wrote:
>> On Thursday, 13 May 2021 at 01:59:15 UTC, Andrei Alexandrescu wrote:
>>
> I just tested, using his benchmark code, on my somewhat older AMD Zen+ CPU, which is clocked at 2.8GHz (so actually slower than either the M1 or the tested Xeon):
>
> I got 1.156ns per u32 divide using hardware divide. If I normalize this to 3.2GHz, it becomes 1.01ns.
>
> 0.399ns (or 0.349ns normalized to 3.2GHz) when using `libdivide`. So exactly the same speed as the M1 (0.351ns).

Zen3 is about 2 to 3 times faster than Zen1 for both latency and throughput of 32/64 idiv. So if your results are accurate, Zen3 is 2 or 3 times faster than the M1.
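
For reference, the clock normalization in the quoted figures can be reproduced with a few lines of Python. This is only a sketch: `normalize_ns` is a made-up helper, and it assumes the cycle count per divide stays the same at both clock speeds, which ignores turbo and memory effects.

```python
def normalize_ns(measured_ns, measured_ghz, target_ghz):
    """Scale a per-operation time to another clock, assuming equal cycle counts."""
    cycles = measured_ns * measured_ghz   # ns * (cycles per ns) = cycles per operation
    return cycles / target_ghz            # back to ns at the target clock

hw = normalize_ns(1.156, 2.8, 3.2)    # hardware divide on the 2.8GHz Zen+
lib = normalize_ns(0.399, 2.8, 3.2)   # libdivide on the same machine
print(f"hardware: {hw:.4f} ns, libdivide: {lib:.4f} ns")
```

Both values land close to the 1.01ns and 0.349ns quoted above.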



May 14
On Thursday, 13 May 2021 at 22:40:06 UTC, Ola Fosheim Grøstad wrote:
> And in this "benchmark" the division could've been moved out of the inner loop by a less-than-braindead compiler.

Eh, no. I was wrong. Moral: never program right before bedtime.

May 27
It's a strange thing to optimise... I seem to do an integer divide so infrequently that I can't imagine a measurable improvement in most code I've ever written if it were substantially faster. I feel like I stopped doing frequent integer divides right about the same time computers got FPUs...

On Thu, May 13, 2021 at 12:00 PM Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

>
> https://www.reddit.com/r/programming/comments/nawerv/benchmarking_division_and_libdivide_on_apple_m1/
>
> Integral division is the strongest arithmetic operation.
>
> I have a friend who knows some M1 internals. He said it's really Star Trek stuff.
>
> This will seriously challenge other CPU producers.
>
> What perspectives do we have to run the compiler on M1 and produce M1 code?
>


May 27
On Thursday, 27 May 2021 at 08:46:20 UTC, Manu wrote:
> It's a strange thing to optimise... I seem to do an integer divide so infrequently that I can't imagine a measurable improvement in most code I've ever written if it were substantially faster. I feel like I stopped doing frequent integer divides right about the same time computers got FPUs...

Maybe Swift programmers don't think in terms of 2^n? Apple wants Swift to be a system-level language that is as easy to use as a high-level language. Or maybe they have analyzed apps in the App Store and found that people use div more than we know?

I personally feel bad if I don't make a data structure compatible with 2^n algos... Except in Python, there I only feel bad if the code isn't looking clean.
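
To make the 2^n point concrete: when a table or ring buffer has a power-of-two capacity, wrapping an index needs only a bitwise AND instead of a modulo, and a modulo by a size not known at compile time is an integer divide. A small sketch, with `is_power_of_two` and `wrap_index` as illustrative names only (Python is used for readability; the saving only matters in compiled code):

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0

def wrap_index(i, capacity):
    """Equivalent to i % capacity for non-negative i, but without a divide."""
    assert is_power_of_two(capacity), "mask trick needs a 2^n capacity"
    return i & (capacity - 1)

# The mask and the modulo agree for every non-negative index.
for i in range(100):
    assert wrap_index(i, 8) == i % 8
```

With a capacity such as 10 the mask no longer matches `i % 10`, which is why the data structure has to be sized to 2^n up front.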

May 27
On Thursday, 27 May 2021 at 08:46:20 UTC, Manu wrote:
> It's a strange thing to optimise... I seem to do an integer divide so infrequently that I can't imagine a measurable improvement in most code I've ever written if it were substantially faster. I feel like I stopped doing frequent integer divides right about the same time computers got FPUs...
>

There are a few places where it matters. Some cryptographic operations for instance, or data compression/decompression. Memory allocators tend to rely on it, not heavily, but the rest of the system depends heavily on them.

More generally, the problem with x86 divide isn't its perf per se, but the fact that it is not pipelined on Intel machines (no idea about AMD).
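
When the divisor stays fixed for a while (an allocator size class, say), the hardware divide can be sidestepped by precomputing a multiplier once and then doing a multiply and a shift per division. That is the general idea behind `libdivide`, mentioned earlier in the thread. What follows is a sketch of the textbook construction for 32-bit unsigned values, not libdivide's actual code; `make_magic` and `fast_div` are hypothetical names.

```python
N = 32  # width of the unsigned dividend in bits

def make_magic(d):
    """Precompute (multiplier, shift) for dividing N-bit unsigned values by d."""
    assert 1 <= d < 2**N
    l = (d - 1).bit_length()       # ceil(log2(d))
    shift = N + l
    m = (2**shift + d - 1) // d    # ceil(2**shift / d); can need N+1 bits
    return m, shift

def fast_div(n, magic):
    """Return n // d for 0 <= n < 2**N using one multiply and one shift."""
    m, shift = magic
    return (n * m) >> shift

magic7 = make_magic(7)             # done once, the only real divide
assert fast_div(100, magic7) == 100 // 7
assert fast_div(2**32 - 1, magic7) == (2**32 - 1) // 7
```

Python's big integers hide the fact that the multiplier can be 33 bits wide; a C or D version has to handle that extra bit explicitly, which is part of what a library like libdivide takes care of.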
May 27
On Thursday, 27 May 2021 at 12:50:52 UTC, deadalnix wrote:
> On Thursday, 27 May 2021 at 08:46:20 UTC, Manu wrote:
>> It's a strange thing to optimise... I seem to do an integer divide so infrequently that I can't imagine a measurable improvement in most code I've ever written if it were substantially faster. I feel like I stopped doing frequent integer divides right about the same time computers got FPUs...
>>
>
> There are a few places where it matters. Some cryptographic operations for instance, or data compression/decompression. Memory allocators tend to rely on it, not heavily, but the rest of the system depends heavily on them.
>
> More generally, the problem with x86 divide isn't its perf per se, but the fact that it is not pipelined on Intel machines (no idea about AMD).

Not pipelined!?

https://www.uops.info/table.html?search=idiv&cb_lat=on&cb_tp=on&cb_uops=on&cb_ports=on&cb_SKL=on&cb_ZEN3=on&cb_measurements=on&cb_doc=on&cb_base=on
May 27
On Thursday, 27 May 2021 at 17:20:41 UTC, Max Haughton wrote:
> Not pipelined!?

On older Intel CPUs integer divide started to use parts of floating point divide logic to save space. Still pipelined... But ineffective.