Thread overview
Multi-threaded sorting of text file
Jan 04, 2020
MGW
Jan 04, 2020
Alex
Jan 05, 2020
Ali Çehreli
January 04, 2020
Need help:
There's a large text file (hundreds of thousands of lines).
The structure is as follows:
2345|wedwededwedwedwe ......
872625|rfrferwewweww .....
23|rergrferfefer ....
................

The file needs to be sorted by the first field, giving:
23|rergrferfefer.......
2345|wedwededwedwedwe.......
872625|rfrferwewweww.......

There are also N CPUs (from 4 to 8) and 16 GB of memory. I need to
come up with an algorithm in D for fast sorting using multithreading.


January 04, 2020
On Saturday, 4 January 2020 at 07:51:49 UTC, MGW wrote:
> Need help:
> There's a large text file (hundreds of thousands of lines).
> The structure is as follows:
> 2345|wedwededwedwedwe ......
> 872625|rfrferwewweww .....
> 23|rergrferfefer ....
> ................
>
> The file needs to be sorted by the first field, giving:
> 23|rergrferfefer.......
> 2345|wedwededwedwedwe.......
> 872625|rfrferwewweww.......
>
> There are also N CPUs (from 4 to 8) and 16 GB of memory. I need to
> come up with an algorithm in D for fast sorting using multithreading.

As far as I know, there isn't a native one in D. Maybe I overlooked something on code.dlang.org. But there are plenty of implementations out there in the wild. I found this one on the first try:
https://stackoverflow.com/questions/23531625/multithreaded-sorting-application/23532317


January 04, 2020
On 1/3/20 11:51 PM, MGW wrote:
> Need help:
> There's a large text file (hundreds of thousands of lines).

How long are the lines? If they are around 1 KB each, 100 MB or so would fit in memory just fine. There is a parallel quicksort example on the std.parallelism page:

  https://dlang.org/phobos/std_parallelism.html
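
As a rough illustration of the in-memory approach for this file format, here is a minimal sketch, not a tuned implementation. It does not reproduce the quicksort example from that page; it swaps in a simpler scheme: parse the keys in parallel, sort fixed-size chunks on the task pool, then merge the sorted chunks. The names Rec, byKey, sortLines and nChunks are hypothetical, and the code assumes every line contains '|' and that the key fits in a ulong.

```d
import std.algorithm : map, multiwayMerge, sort;
import std.array : array;
import std.conv : to;
import std.parallelism : parallel;
import std.range : chunks;
import std.string : indexOf;

// A line together with its numeric key, parsed once up front.
struct Rec
{
    ulong key;
    string line;
}

// Ordering used for both the chunk sorts and the final merge.
bool byKey(Rec a, Rec b)
{
    return a.key < b.key;
}

// Hypothetical helper: sorts lines of the form "<number>|<rest>" by <number>.
// Assumes every line contains '|', the key fits in a ulong, and nChunks >= 1.
string[] sortLines(string[] lines, size_t nChunks = 4)
{
    if (lines.length == 0)
        return [];

    // Parse keys in parallel; each iteration writes only its own slot.
    auto recs = new Rec[lines.length];
    foreach (i, line; parallel(lines))
        recs[i] = Rec(line[0 .. line.indexOf('|')].to!ulong, line);

    // Sort each chunk in place on the task pool, one chunk per work unit.
    immutable chunkSize = (recs.length + nChunks - 1) / nChunks;
    auto parts = recs.chunks(chunkSize).array;
    foreach (part; parallel(parts, 1))
        part.sort!byKey();

    // Merge the sorted chunks into one ordered sequence of lines.
    return multiwayMerge!byKey(parts).map!(r => r.line).array;
}
```

Parsing the key once per line, rather than inside the comparison, keeps the sort itself cheap. The final merge here is single-threaded, so the speedup is limited to the parsing and chunk-sorting phases.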

> The structure is as follows:
> 2345|wedwededwedwedwe ......
> 872625|rfrferwewweww .....
> 23|rergrferfefer ....
> .................
> 
> The file needs to be sorted by the first field, giving:
> 23|rergrferfefer.......
> 2345|wedwededwedwedwe.......
> 872625|rfrferwewweww.......

Are you going to write the result back to a file? Then you would hardly notice any improvement from parallelism, because the relative slowness of I/O would dominate the overall performance.
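
One way to check this on the actual data is to time the three phases separately. The following is a minimal single-threaded sketch under the same assumptions about the format (every line contains '|' and the key fits in a ulong); Timings and timedSort are hypothetical names, and the paths are whatever the caller passes in.

```d
import core.time : Duration;
import std.algorithm : schwartzSort;
import std.array : array;
import std.conv : to;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : File;
import std.string : indexOf;

// Wall-clock time spent in each phase.
struct Timings
{
    Duration readTime;
    Duration sortTime;
    Duration writeTime;
}

// Hypothetical helper: reads inPath, sorts by the numeric first field,
// writes the result to outPath, and reports how long each phase took.
Timings timedSort(string inPath, string outPath)
{
    Timings t;
    auto sw = StopWatch(AutoStart.yes);

    // Phase 1: read all lines into memory.
    auto lines = File(inPath).byLineCopy.array;
    t.readTime = sw.peek;
    sw.reset();

    // Phase 2: sort, computing each line's key only once.
    lines.schwartzSort!(l => l[0 .. l.indexOf('|')].to!ulong);
    t.sortTime = sw.peek;
    sw.reset();

    // Phase 3: write the sorted lines back out.
    auto output = File(outPath, "w");
    foreach (line; lines)
        output.writeln(line);
    output.close();
    t.writeTime = sw.peek;

    return t;
}
```

If readTime and writeTime together are much larger than sortTime, parallelizing the sort cannot shorten the total run by much.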

> 
> There are also N CPUs (from 4 to 8) and 16 GB of memory. I need to
> come up with an algorithm in D for fast sorting using multithreading.
> 
> 

Ali