Thread overview
Fastest Prime Sieve, in D
Jun 28, 2019
Jabari Zakiya
Jun 29, 2019
Jabari Zakiya
Jun 29, 2019
Exil
June 28, 2019
This is a continuation of a previous thread on the Segmented Sieve of Zakiya (SSoZ):

https://forum.dlang.org/post/ozwapetxgcxkbzjjslqe@forum.dlang.org

to present the D version of the current state-of-art of the work.

The D version source code also performs the Fastest Primes Sieve for Twin Primes.

https://gist.github.com/jzakiya/ae93bfa03dbc8b25ccc7f97ff8ad0f61

It's a direct translation of the (original) Nim (www.nim-lang.org) version.

https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e

For those interested, here are two places to get my current paper on it.
I need to update it (again) to incorporate this new algorithm, and results.

https://www.academia.edu/37952623/The_Use_of_Prime_Generators_to_Implement_Fast_Twin_Primes_Sieve_of_Zakiya_SoZ_Applications_to_Number_Theory_and_Implications_for_the_Riemann_Hypotheses

https://www.scribd.com/document/395415391/The-Use-of-Prime-Generators-to-Implement-Fast-Twin-Primes-Sieve-Of-Zakiya-SoZ-Applications-to-Number-Theory-and-Implications-for-the-Riemann-Hypoth

I compiled the D source using the latest LDC 1.16.0. When comparing times to Nim it's comparable but becomes slower as inputs sizes increase. Here are some times for the Nim version compared to `primesieve` -- https://github.com/kimwalisch/primesieve

```
➜  ~ inxi -SCM
System:    Host: localhost.localdomain Kernel: 5.1.11-pclos1 x86_64 bits: 64 Desktop: KDE Plasma 5.16.1 Distro: PCLinuxOS 2019
Machine:   Type: Laptop System: System76 product: Gazelle v: gaze10 serial: <root required>
           Mobo: System76 model: Gazelle v: gaze10 serial: <root required> UEFI [Legacy]: American Megatrends v: 1.05.08
           date: 03/31/2016
CPU:       Topology: Quad Core model: Intel Core i7-6700HQ bits: 64 type: MT MCP L2 cache: 6144 KiB
           Speed: 1185 MHz min/max: 800/3500 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 4: 800 5: 800 6: 800 7: 800 8: 800
```

```
            N          | twinprimes_ssoz |   primesieve  | Ratio |
    -------------------|-----------------|---------------|-------|
       100_000_000_000 |       4.628     |      5.837    |  1.26 |
       500_000_000_000 |      23.308     |     32.765    |  1.41 |
     1_000_000_000_000 |      47.966     |     69.705    |  1.45 |
     5_000_000_000_000 |     265.746     |    388.233    |  1.46 |
    10_000_000_000_000 |     572.803     |    821.755    |  1.43 |
    50_000_000_000_000 |    3004.487     |   4620.525    |  1.54 |
   100_000_000_000_000 |    6307.521     |   9724.568    |  1.54 |
   200_000_000_000_000 |   13603.531     |  20279.547    |  1.49 |
```
I'm hoping people can help improve it and make it faster than it currently is.

Thanks.

Jabari
June 29, 2019
On Friday, 28 June 2019 at 19:32:41 UTC, Jabari Zakiya wrote:
> This is a continuation of a previous thread on the Segmented Sieve of Zakiya (SSoZ):
>
> https://forum.dlang.org/post/ozwapetxgcxkbzjjslqe@forum.dlang.org
>
> to present the D version of the current state-of-art of the work.
>
> The D version source code also performs the Fastest Primes Sieve for Twin Primes.
>
> https://gist.github.com/jzakiya/ae93bfa03dbc8b25ccc7f97ff8ad0f61
>
> It's a direct translation of the (original) Nim (www.nim-lang.org) version.
>
> https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e
>
> For those interested, here are two places to get my current paper on it.
> I need to update it (again) to incorporate this new algorithm, and results.
>
> https://www.academia.edu/37952623/The_Use_of_Prime_Generators_to_Implement_Fast_Twin_Primes_Sieve_of_Zakiya_SoZ_Applications_to_Number_Theory_and_Implications_for_the_Riemann_Hypotheses
>
> https://www.scribd.com/document/395415391/The-Use-of-Prime-Generators-to-Implement-Fast-Twin-Primes-Sieve-Of-Zakiya-SoZ-Applications-to-Number-Theory-and-Implications-for-the-Riemann-Hypoth
>
> I compiled the D source using the latest LDC 1.16.0. When comparing times to Nim it's comparable but becomes slower as inputs sizes increase. Here are some times for the Nim version compared to `primesieve` -- https://github.com/kimwalisch/primesieve
>
> ```
> ➜  ~ inxi -SCM
> System:    Host: localhost.localdomain Kernel: 5.1.11-pclos1 x86_64 bits: 64 Desktop: KDE Plasma 5.16.1 Distro: PCLinuxOS 2019
> Machine:   Type: Laptop System: System76 product: Gazelle v: gaze10 serial: <root required>
>            Mobo: System76 model: Gazelle v: gaze10 serial: <root required> UEFI [Legacy]: American Megatrends v: 1.05.08
>            date: 03/31/2016
> CPU:       Topology: Quad Core model: Intel Core i7-6700HQ bits: 64 type: MT MCP L2 cache: 6144 KiB
>            Speed: 1185 MHz min/max: 800/3500 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 4: 800 5: 800 6: 800 7: 800 8: 800
> ```
>
> ```
>             N          | twinprimes_ssoz |   primesieve  | Ratio |
>     -------------------|-----------------|---------------|-------|
>        100_000_000_000 |       4.628     |      5.837    |  1.26 |
>        500_000_000_000 |      23.308     |     32.765    |  1.41 |
>      1_000_000_000_000 |      47.966     |     69.705    |  1.45 |
>      5_000_000_000_000 |     265.746     |    388.233    |  1.46 |
>     10_000_000_000_000 |     572.803     |    821.755    |  1.43 |
>     50_000_000_000_000 |    3004.487     |   4620.525    |  1.54 |
>    100_000_000_000_000 |    6307.521     |   9724.568    |  1.54 |
>    200_000_000_000_000 |   13603.531     |  20279.547    |  1.49 |
> ```
> I'm hoping people can help improve it and make it faster than it currently is.
>
> Thanks.
>
> Jabari

Timing comparisons on my system. Nim code compiled with GCC 8.3.1 (system has LLVM 8.0.0).

```
            N          |       Nim       |       D       | Ratio |
    -------------------|-----------------|---------------|-------|
       100_000_000_000 |       4.628     |      4.619    |  1.00 |
       500_000_000_000 |      23.308     |     24.071    |  1.03 |
     1_000_000_000_000 |      47.966     |     49.703    |  1.04 |
     5_000_000_000_000 |     265.746     |    276.847    |  1.04 |
    10_000_000_000_000 |     572.803     |    590.881    |  1.03 |
    50_000_000_000_000 |    3009.485     |   3046.913    |  1.01 |
   100_000_000_000_000 |    6307.521     |   6594.283    |  1.04 |
   200_000_000_000_000 |   13603.531     |  14706.743    |  1.08 |
```


June 29, 2019
On Saturday, 29 June 2019 at 12:28:35 UTC, Jabari Zakiya wrote:
> Timing comparisons on my system. Nim code compiled with GCC 8.3.1 (system has LLVM 8.0.0).
>
> ```
>             N          |       Nim       |       D       | Ratio |
>     -------------------|-----------------|---------------|-------|
>        100_000_000_000 |       4.628     |      4.619    |  1.00 |
>        500_000_000_000 |      23.308     |     24.071    |  1.03 |
>      1_000_000_000_000 |      47.966     |     49.703    |  1.04 |
>      5_000_000_000_000 |     265.746     |    276.847    |  1.04 |
>     10_000_000_000_000 |     572.803     |    590.881    |  1.03 |
>     50_000_000_000_000 |    3009.485     |   3046.913    |  1.01 |
>    100_000_000_000_000 |    6307.521     |   6594.283    |  1.04 |
>    200_000_000_000_000 |   13603.531     |  14706.743    |  1.08 |
> ```

Could be the difference between NIM's GC and D's, seeing as you are continually allocating new arrays. Could just be that the GCC backend for NIM generates better assembly than LLVM. Could profile the two and look at the assembly to see where the additional time is being spent.