[Issue 24254] LDC crash on Epyc Bergamo
November 21, 2023
https://issues.dlang.org/show_bug.cgi?id=24254

kinke <kinke@gmx.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kinke@gmx.net
          Component|dmd                         |druntime
                 OS|Linux                       |All

--- Comment #1 from kinke <kinke@gmx.net> ---
According to the backtrace, the problem is in druntime's `core.cpuid` - of the **host compiler's** druntime used to build LDC. Which one did you use? The issue might have been fixed in recent druntime already.

--
November 21, 2023
https://issues.dlang.org/show_bug.cgi?id=24254

Jure Pečar <jurij.pecar@embl.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|druntime                    |dmd
                 OS|All                         |Linux

--- Comment #2 from Jure Pečar <jurij.pecar@embl.de> ---
I don't know what official binaries are built with, my build used gcc 12.3, llvm 16.0.6 and ldc 1.24 to build ldc 1.35.

--
November 21, 2023
https://issues.dlang.org/show_bug.cgi?id=24254

kinke <kinke@gmx.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|dmd                         |druntime
                 OS|Linux                       |All

--- Comment #3 from kinke <kinke@gmx.net> ---
Please leave my component and hardware changes in place, this has absolutely nothing to do with the DMD compiler.

The official LDC binaries are compiled with itself, so v1.35 is built with v1.35. So from your description, we only know that `core.cpuid` of the LDC v1.24 druntime, i.e. druntime v2.094, doesn't support your CPU. But as your LDC v1.24 host compiler works, this means that the host druntime used for building that LDC v1.24 works.

--
November 21, 2023
https://issues.dlang.org/show_bug.cgi?id=24254

--- Comment #4 from Jure Pečar <jurij.pecar@embl.de> ---
Sorry this is my first time meeting D ecosystem, I'm having trouble following your feedback.

So by "host compiler" you mean the previous version of LDC that was used to build current version of LDC?

If that's the case, can you tell me which version of LDC started recognizing and working with zen4c cpus?

--
November 21, 2023
https://issues.dlang.org/show_bug.cgi?id=24254

--- Comment #5 from kinke <kinke@gmx.net> ---
(In reply to Jure Pečar from comment #4)
> Sorry this is my first time meeting D ecosystem, I'm having trouble following your feedback.

No worries.

> So by "host compiler" you mean the previous version of LDC that was used to build current version of LDC?

Yes.

> If that's the case, can you tell me which version of LDC started recognizing and working with zen4c cpus?

I don't know if it is working in current druntime. You could e.g. launch some Ubuntu/Debian container (min Ubuntu 20.04 for glibc) and try to run the official v1.35 in there. If there's no startup error, druntime v2.105 probably works.

FWIW, the problematic module is https://github.com/dlang/dmd/blob/master/druntime/src/core/cpuid.d. As the name suggests, it uses/depends on the CPUID instruction. I can only tell you that everything works on my workstation, a Threadripper 3960X.

--
November 21, 2023
https://issues.dlang.org/show_bug.cgi?id=24254

--- Comment #6 from Jure Pečar <jurij.pecar@embl.de> ---
I'll try to wade through this cpuid detection logic in an attempt to spot something. How would I narrow down the approximate location in the code where the crash happens?

FYI, Sambamba (and LDC) works fine on zen4 CPUs such as Genoa and Genoa-X. /proc/cpuinfo reports identical cpuid level and flags for all three. The only difference for zen4c (Bergamo) should be a smaller cache. Does that help us narrow down the issue?

--
November 21, 2023
https://issues.dlang.org/show_bug.cgi?id=24254

--- Comment #7 from Jure Pečar <jurij.pecar@embl.de> ---
Here's a diff of `cpuid -1` output from 32c Genoa (-) and 128c Bergamo (+):

@@ -3,16 +3,16 @@
    version information (1/eax):
       processor type  = primary processor (0)
       family          = 0xf (15)
-      model           = 0x1 (1)
-      stepping id     = 0x1 (1)
+      model           = 0x0 (0)
+      stepping id     = 0x2 (2)
       extended family = 0xa (10)
-      extended model  = 0x1 (1)
+      extended model  = 0xa (10)
       (family synth)  = 0x19 (25)
-      (model synth)   = 0x11 (17)
-      (simple synth)  = AMD EPYC (4th Gen) (Genoa B1) [Zen 4], 5nm
+      (model synth)   = 0xa0 (160)
+      (simple synth)  = AMD Ryzen (Bergamo) [Zen 4c], 5nm
    miscellaneous (1/ebx):
-      process local APIC physical ID = 0x10 (16)
-      maximum IDs for CPUs in pkg    = 0x40 (64)
+      process local APIC physical ID = 0xd6 (214)
+      maximum IDs for CPUs in pkg    = 0xff (255)
       CLFLUSH line size              = 0x8 (8)
       brand index                    = 0x0 (0)
    brand id = 0x00 (0): unknown
@@ -80,7 +80,7 @@
       RDRAND instruction                      = true
       hypervisor guest status                 = false
    cache and TLB information (2):
-   processor serial number = 00A1-0F11-0000-0000-0000-0000
+   processor serial number = 00AA-0F02-0000-0000-0000-0000
    deterministic cache parameters (4):
       --- cache 0 ---
       cache type                         = no more caches (0)
@@ -287,7 +287,7 @@
       bit width of fixed counters              = 0x0 (0)
       anythread deprecation                    = false
    x2APIC features / processor topology (0xb):
-      extended APIC ID                      = 16
+      extended APIC ID                      = 214
       --- level 0 ---
       level number                          = 0x0 (0)
       level type                            = thread (1)
@@ -296,8 +296,8 @@
       --- level 1 ---
       level number                          = 0x1 (1)
       level type                            = core (2)
-      bit width of level & previous levels  = 0x6 (6)
-      number of logical processors at level = 0x40 (64)
+      bit width of level & previous levels  = 0x8 (8)
+      number of logical processors at level = 0x100 (256)
       --- level 2 ---
       level number                          = 0x2 (2)
       level type                            = invalid (0)
@@ -401,13 +401,13 @@
       highest COS number supported             = 0xf (15)
    extended processor signature (0x80000001/eax):
       family/generation = 0xf (15)
-      model           = 0x1 (1)
-      stepping id     = 0x1 (1)
+      model           = 0x0 (0)
+      stepping id     = 0x2 (2)
       extended family = 0xa (10)
-      extended model  = 0x1 (1)
+      extended model  = 0xa (10)
       (family synth)  = 0x19 (25)
-      (model synth)   = 0x11 (17)
-      (simple synth)  = AMD EPYC (4th Gen) (Genoa B1) [Zen 4], 5nm
+      (model synth)   = 0xa0 (160)
+      (simple synth)  = AMD Ryzen (Bergamo) [Zen 4c], 5nm
    extended feature flags (0x80000001/edx):
       x87 FPU on chip                       = true
       virtual-8086 mode enhancement         = true
@@ -469,7 +469,7 @@
       LLC performance counter extensions     = true
       MWAITX/MONITORX supported              = true
       Address mask extension support         = true
-   brand = "AMD EPYC 9334 32-Core Processor                "
+   brand = "AMD EPYC 9754 128-Core Processor               "
    L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax):
       instruction # entries     = 0x40 (64)
       instruction associativity = 0xff (255)
@@ -509,7 +509,7 @@
       line size (bytes)     = 0x40 (64)
       lines per tag         = 0x1 (1)
       associativity         = 0x9 (9)
-      size (in 512KB units) = 0x100 (256)
+      size (in 512KB units) = 0x200 (512)
    RAS Capability (0x80000007/ebx):
       MCA overflow recovery support = true
       SUCCOR support                = true
@@ -566,8 +566,8 @@
       branch sampling feature support          = false
       (vuln to branch type confusion synth)    = false
    Size Identifiers (0x80000008/ecx):
-      number of threads                   = 0x40 (64)
-      ApicIdCoreIdSize                    = 0x6 (6)
+      number of threads                   = 0x100 (256)
+      ApicIdCoreIdSize                    = 0x8 (8)
       performance time-stamp counter size = 40 bits (0)
    Feature Extended Size (0x80000008/edx):
       max page count for INVLPGB instruction = 0x7 (7)
@@ -714,13 +714,13 @@
       line size in bytes              = 0x40 (64)
       physical line partitions        = 0x1 (1)
       number of ways                  = 0x10 (16)
-      number of sets                  = 32768
+      number of sets                  = 16384
       write-back invalidate           = true
       cache inclusive of lower levels = false
-      (synth size)                    = 33554432 (32 MB)
-   extended APIC ID = 16
+      (synth size)                    = 16777216 (16 MB)
+   extended APIC ID = 214
    Core Identifiers (0x8000001e/ebx):
-      core ID          = 0x8 (8)
+      core ID          = 0x6b (107)
       threads per core = 0x2 (2)
    Node Identifiers (0x8000001e/ecx):
       node ID             = 0x0 (0)
@@ -799,14 +799,14 @@
       number of LBR stack entries           = 0x10 (16)
       number of avail Northbridge perf ctrs = 0x10 (16)
       number of available UMC PMCs          = 0x20 (32)
-      active UMCs bitmask                   = 0x6db
+      active UMCs bitmask                   = 0xfff
    Multi-Key Encrypted Memory Capabilities (0x80000023):
       secure host multi-key memory support = true
       number of encryption key IDs         = 0x3f (63)
    0x80000024 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x80000025 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    AMD Extended CPU Topology (0x80000026):
-      extended APIC ID                        = 16
+      extended APIC ID                        = 214
       --- level 0 ---
       level number                            = 0x0 (0)
       level type                              = core (1)
@@ -821,9 +821,9 @@
       CMPXCHG8B                = true
       conditional move/compare = true
       PREFETCH/PREFETCHW       = true
-   (multi-processing synth) = multi-core (c=32), hyper-threaded (t=2)
+   (multi-processing synth) = multi-core (c=128), hyper-threaded (t=2)
    (multi-processing method) = AMD leaf 0xb
-   (APIC widths synth): CORE_width=5 SMT_width=1
-   (APIC synth): PKG_ID=0 CORE_ID=8 SMT_ID=0
-   (uarch synth) = AMD Zen 4, 5nm
-   (synth) = AMD EPYC (4th Gen) (Genoa B1) [Zen 4], 5nm
+   (APIC widths synth): CORE_width=7 SMT_width=1
+   (APIC synth): PKG_ID=0 CORE_ID=107 SMT_ID=0
+   (uarch synth) = AMD Zen 4c, 5nm
+   (synth) = AMD Ryzen (Bergamo) [Zen 4c], 5nm

Since that cpuid.d is mostly poking around these register values, I'm pretty sure that the key to fixing this issue is hiding in here.
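
For readers following the diff, here is a minimal sketch of how the "synth" values relate to the raw fields shown above. It assumes the usual x86 convention (the extended family is added when the base family is 0xF, and the extended model is shifted in as the high nibble for base family 0x6 or 0xF) and the usual cache size product of ways, partitions, line size and sets. The function names are hypothetical and this is not code from cpuid.d.

```python
def synth_family(base_family: int, ext_family: int) -> int:
    """Displayed family: the extended family only counts when base family is 0xF."""
    if base_family == 0xF:
        return base_family + ext_family
    return base_family


def synth_model(base_family: int, base_model: int, ext_model: int) -> int:
    """Displayed model: the extended model becomes the high nibble for family 0x6/0xF."""
    if base_family in (0x6, 0xF):
        return (ext_model << 4) | base_model
    return base_model


def cache_size_bytes(ways: int, partitions: int, line_size: int, sets: int) -> int:
    """Cache size as reported by the deterministic cache parameters."""
    return ways * partitions * line_size * sets


if __name__ == "__main__":
    # Values copied from the Bergamo (+) side of the diff.
    print(hex(synth_family(0xF, 0xA)))         # family synth
    print(hex(synth_model(0xF, 0x0, 0xA)))     # model synth
    print(cache_size_bytes(16, 1, 64, 16384))  # L3 synth size in bytes
```

With the Genoa (-) values (model 0x1, extended model 0x1, 32768 sets) the same functions give model 0x11 and a 32 MB L3, which matches the removed lines in the diff.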

--
November 23, 2023
https://issues.dlang.org/show_bug.cgi?id=24254

--- Comment #8 from Jure Pečar <jurij.pecar@embl.de> ---
Keen eyes in easybuild community noticed that cpuid.d uses ubyte for numcores in a couple of places. For example, function getcacheinfoCPUID4 uses uint for numcores, but function getAMDcacheinfo uses ubyte. It also doesn't differentiate between cores and threads so I assume it walks all logical cpus there in the loop on lines 633-641. Bergamo has 128 cores, 256 threads, ubyte rolls over and then on line 659 you divide something by numcores. Boom.

To test this hypothesis, I disabled SMT on one of the Bergamo nodes. Indeed, LDC then works as expected:

# ldc2 Error: No source files

So I'd say the fix is to just s/ubyte/uint/g on cpuid.d. And check if you do any similar things elsewhere.
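
The rollover described above can be illustrated with a small sketch. It is written in Python rather than D, emulates an unsigned 8-bit counter by masking with 0xFF, and assumes the CPU reports a "count minus one" value of 255 (inferred from the 256 logical processors in the cpuid output). The names `narrow_core_count`, `wide_core_count` and `share_per_core` are hypothetical; this is not the actual druntime code.

```python
def narrow_core_count(raw_minus_one: int) -> int:
    """Emulate storing (raw + 1) in an unsigned 8-bit integer (D's ubyte)."""
    return (raw_minus_one + 1) & 0xFF


def wide_core_count(raw_minus_one: int) -> int:
    """Emulate the same computation in an unsigned 32-bit integer (D's uint)."""
    return (raw_minus_one + 1) & 0xFFFFFFFF


def share_per_core(total: int, numcores: int) -> int:
    """Any per-core division fails once numcores has wrapped around to 0."""
    return total // numcores


if __name__ == "__main__":
    print(narrow_core_count(255))  # wraps around to 0
    print(wide_core_count(255))    # stays at 256
    try:
        share_per_core(16 * 1024, narrow_core_count(255))
    except ZeroDivisionError:
        print("division by zero, the analogue of the crash")
```

With SMT disabled the reported value would presumably be 127, so the 8-bit result is 128 and the division succeeds, which is consistent with the test on the Bergamo node.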

Thanks,

--
November 23, 2023
https://issues.dlang.org/show_bug.cgi?id=24254

--- Comment #9 from Dlang Bot <dlang-bot@dlang.rocks> ---
@kinke created dlang/dmd pull request #15859 "core.cpuid: Fix div-by-zero on AMD CPUs with 256 (physical?) cores" mentioning this issue:

- core.cpuid: Fix div-by-zero on AMD CPUs with 256 (physical?) cores

  See:
https://en.wikipedia.org/wiki/CPUID#EAX=80000008h:_Virtual_and_Physical_address_Sizes

  This *might* fix Issue 24254, although I'd expect the read value for
  that CPU to be 127 (*physical* cores minus 1), not the problematic 255.

https://github.com/dlang/dmd/pull/15859

--
November 23, 2023
https://issues.dlang.org/show_bug.cgi?id=24254

--- Comment #10 from kinke <kinke@gmx.net> ---
(In reply to Jure Pečar from comment #8)
> Keen eyes in easybuild community noticed that cpuid.d uses ubyte for numcores in a couple of places […]

Thank you, and please send those keen eyes my regards. :)

--