Jump to page: 1 25  
Page
Thread overview
October 10
Hi.

I hope this is the right place to request this, if not please tell me a better one.

I had looked at D, and played with it some circa 2010~2012, but time and life took my priorities away. But I'm still interested in learning different languages, but there are so many more now it's hard to devote the time to learn them to some high level of proficiency.

Subsequently, I started learning Nim, because I find the syntax and constructs simpler and more familiar than most (I'm coming mostly from a Ruby background).

I have developed in Nim a program to find twinprimes that seems to be the fastest of its type, compared to primesieve (https://primesieve.org/) which claims to be the fastest, written in C++.

Here is the code to my Nim implementation of twinprimes_ssoz.

https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e

The total file is just 318 lines of code, with about 60 separate lines of comments.
The code is extensively commented per line to explain what's happening.
Reading the references given in the code introduction will explain the general process.
See  "The Segmented Sieve of Zakiya (SSoZ)"
https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

I am in the process of writing up a paper to explain this application, and implementation. What I am requesting here is for a person(s) who is an "expert" (very good) to create a very fast D version, using whatever tricks it has to maximize performance.

I would like to include in my paper a good comparison of various implementations in different compiled languages (C/C++, D, Nim, etc) to show how it performs with each.

This algorithm is designed to be run in multiprocessing environments (more core/threads, better performance). The Nim version (0.18.0, 0.19.0 recently released) uses its native parallel processing structure. In C++, et al, it may be best implemented using OPenMP, or CUDA, etc. These are implementation details one versed in a language could determine.

Well that's it. I am willing to share performance data, compared to primesive, if you like. The beauty of this algorithm|implementation is its simplicity in design, and minimal code. It shouldn't be that hard to understand to determine how to translate into D. Of course, I'm am fully available to answer questions and provide help.

Thanks

Jabari


October 10
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote:
> [...]

Looking forward to this :)
October 10
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote:
> https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e

As i understand, main thread preallocates global memory and tracks it, and other threads don't track it?
October 10
On Wednesday, 10 October 2018 at 20:43:01 UTC, Kagamin wrote:
> On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote:
>> https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e
>
> As i understand, main thread preallocates global memory and tracks it, and other threads don't track it?

Here's an abreviated elementary explanation of the algorithm and implementation.
You will need to read (download) my paper "The Segmented Sieve of Zakiya (SSoZ)"
because I will make reference to things in it, to keep this as short as possible.

https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

To really understand the math and theory of the algorithm requires primarily
just understanding Table 3 on page 3 of the paper, as it encapsulates everything.
You can read the paper to understand the language of the algorithm used in the code.

Twinprimes are 2 (odd) primes that differ by only 2 eg. 3-5, 5-7, 11-13, 29-31.

Table 3 shows all the residues values (prime candidates|pcs) and residues groups
(resgroups|columns) to find all the primes upto 541 using P5 (modpg = 30, rescnt = 8).

For a given value N, it will be represented with a PG table of some number of
resgroups, with max size I call Kmax (the regroups value N residues in).

Using P5, I only need to sieve primes along the residue tracks (restracks) that can
produce twinprimes, here 11-13, 17-19, 29-31. Thus I create 3 byte arrays, one for each twinpair, and use the lower 2 bits to represent the upper and lower twinprime restracks.

Then for each twinpair (here 3) I run 3 thread which perform the SSoZ on the entire Kmax length of resgroups in parallel. At the end I accumulate the results and print them out. This, in a nutshell, is what the algorithm does. The paper gives you enough to understand the fundamental nature of the algorithm, though I've learned so much more than in 2014. :)

The larger the PG the more twinpair restracks (see page 14) there are to use. For larger numbers you want to use the largest PG possible that 'fits' into the hardware cpu. All my development|testing was done on laptops using Intel I5|I7 cpus with 4 or 8 threads. I'm really interested how it performs on other platforms (ARM, AMD, PowerPC, etc).


The main proc "twinprimes_ssoz" manages program flow and set as follows:

1) accepts the input number (an integer) in "val" from the cli

2) sets "num" to be first odd number < "val" if "val" even

3) calls "selectPG" with "num" to select optimum Prime Generator (PG) parameters

4) compute various working parameters per selected PG and number value (see refs)

5) compute global values for number of primes, and their values, <= sqrt(num) for
selected PG with proc "sozpg"

6) Set initial twinprime count for selected PG

7) Then with proc "segsieve" allocate one thread to perform SSoZ (segmented sieve of zakiya)
on each twinprime pair residues (determined by selected PG), and count number of twinprmes
computed in each thread.

8) Then determine true total twinprime count and last twinprime <= "num"

It also is timing different intervals and prints out diagnostics and final output.


The proc "twins_sieve" is the primary function that manages and performs the SSoZ for
a given twinprim pair parameters in each thread. The more threads the faster the process goes.

I'll provide more info later. I have to run now. I wanted to get this out now while I
was at my laptop, and online.
October 10
On 10/10/2018 03:05 PM, Jabari Zakiya wrote:
> https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

It would be great if you could provide a link to a freely downloadable version of this.
October 11
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote:
> I would like to include in my paper a good comparison of various implementations in different compiled languages (C/C++, D, Nim, etc) to show how it performs with each.

If you want help with your paper, possibly some kind of decent financial incentive would be appropriate. If the algorithm benefits from more threads than finding or creating an implementation that runs on a GPU would probably be the true performance test. CPUs have like 4-8 cores in the mainstream? A GPU has hundreds, though with some limitations.
October 11
On Wednesday, 10 October 2018 at 22:25:17 UTC, Neia Neutuladh wrote:
> On 10/10/2018 03:05 PM, Jabari Zakiya wrote:
>> https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ
>
> It would be great if you could provide a link to a freely downloadable version of this.

You can download the paper for free from that link. Did you have trouble doing it?

Here's another link to paper.

https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_


Here, again, is the link to the Nim code. Just install Nim (current 0.19.0) and compile and run it per instructions in code. I recommend people do that to see its outputs and verify its results.

https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e
October 11
On Thursday, 11 October 2018 at 00:22:10 UTC, tide wrote:
> On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote:
>> I would like to include in my paper a good comparison of various implementations in different compiled languages (C/C++, D, Nim, etc) to show how it performs with each.
>
> If you want help with your paper, possibly some kind of decent financial incentive would be appropriate. If the algorithm benefits from more threads than finding or creating an implementation that runs on a GPU would probably be the true performance test. CPUs have like 4-8 cores in the mainstream? A GPU has hundreds, though with some limitations.

I'm writing the paper anyway (just like the others), so other implementations are icing on the cake to show implementation variations, as a benefit to readers. Maybe if I set up a website and created a Rosetta Code repo for people to post their different language implementations, and offer a T-shirt for fastest implementation. :-)

Yes, a GPU based implementation would be the epitome for this algorithm, by far. This is actually why I have gotten the algorithm to this implementation so that the number crunching can all be done in parallel threads. (It would also be screamingly fast done in hardware in a FPGA too.) However, I only have standard consumer grade laptops. Hopefully someone(s) with sufficient hardware, interest, and time, will take this upon themselves to do this and publicize their results.
October 10
On 10/10/2018 07:52 PM, Jabari Zakiyth wrote:
> On Wednesday, 10 October 2018 at 22:25:17 UTC, Neia Neutuladh wrote:
>> On 10/10/2018 03:05 PM, Jabari Zakiya wrote:
>>> https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ
>>
>> It would be great if you could provide a link to a freely downloadable
>> version of this.
>
> You can download the paper for free from that link. Did you have trouble
> doing it?

I think the problem is, scribd requires an account, which they apparently happy to link to an existing Google or Facebook account.

> Here's another link to paper.
>
> https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_

Similarly, that one requires a Google account.

> Here, again, is the link to the Nim code. Just install Nim (current
> 0.19.0) and compile and run it per instructions in code. I recommend
> people do that to see its outputs and verify its results.
>
> https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e

That works! :)

Ali

October 11
On Thursday, 11 October 2018 at 05:11:20 UTC, Ali Çehreli wrote:
> On 10/10/2018 07:52 PM, Jabari Zakiyth wrote:
> > On Wednesday, 10 October 2018 at 22:25:17 UTC, Neia Neutuladh
> wrote:
> >> On 10/10/2018 03:05 PM, Jabari Zakiya wrote:
> >>> 
> https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ
> >>
> >> It would be great if you could provide a link to a freely
> downloadable
> >> version of this.
> >
> > You can download the paper for free from that link. Did you
> have trouble
> > doing it?
>
> I think the problem is, scribd requires an account, which they apparently happy to link to an existing Google or Facebook account.
>
> > Here's another link to paper.
> >
> > 
> https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_
>
> Similarly, that one requires a Google account.
>
> > Here, again, is the link to the Nim code. Just install Nim
> (current
> > 0.19.0) and compile and run it per instructions in code. I
> recommend
> > people do that to see its outputs and verify its results.
> >
> > 
> https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e
>
> That works! :)
>
> Ali

What country are you trying to get access from, because I know people in the US have gotten the papers from those link, for free and without an account. It may also have something to do with your browser. They work for me in various modern browsers, including the Tor Browser.


« First   ‹ Prev
1 2 3 4 5