Thread overview
First experience with Threads
Oct 06, 2012
Era Scarecrow
Oct 06, 2012
Ali Çehreli
Oct 06, 2012
Era Scarecrow
Oct 06, 2012
Era Scarecrow
October 06, 2012
 Just a little experience and perhaps some help on the subject. This is a partial repost from another forum too. I've always seen how much of an annoyance threading is just to follow along (the API alone), but programming it is even more annoying. I've never actually done multi-threaded programming, so this is a first for me.


 First the problem. Trying to load up a data structure (that's fairly big) can take a fair amount of time, but if the records and structures never need to touch each other, there's no reason they cannot be handled on separate cores/threads (or that's my logic on it anyway).


 In order to try and use more cores, I've split the loading and unpacking stages into separate steps. So first off, within half a second memory is filled with the 80 MB of data and all the records are separated. Now that they are separated, they can all be unpacked on different cores.

 Part of the problem is when the thread actually activates: just because you start a thread doesn't mean it runs right away (it runs when it's ready), and any data the delegate still refers to is effectively a volatile pointer (at least in VisualD), and that data may change before the thread gets to it. So...

[code]
  import core.thread;

  class Record {
    //and stuff
    void loadSubRecords() { /*...*/ }
  }
  Record[] recordList; //and stuff

  foreach(rec; recordList) {
    //every delegate captures the same loop variable 'rec'
    Thread th = new Thread( () { rec.loadSubRecords(); } );
    th.start();
  }
[/code]

 rec (and even ref rec) may change at any time (worse, during its update or before the thread even starts). So if we go with copying an index instead, it improves things a bit. So long as the index is copied before the next foreach iteration it's fine; otherwise i may still change and the thread may do something unwanted.

[code]
  foreach(i, rec; recordList) {
    Thread th = new Thread( ()
       {
         //copy the index, but 'i' may already have moved on by now
         size_t index = i;
         recordList[index].loadSubRecords();
       });
    th.start();
  }
[/code]

Several other combinations came up. I think I found an easy way to handle it without adding unneeded mutexes and whatnot. What seems to work: if I pack all the data the job needs into a structure and have that structure start the thread (from inside), then the chances of the problem happening go away (hopefully completely).

[code]
  //or something similar
  struct Packed {
    Thread thread;
    Record record;
    void run() {
      assert(record !is null);
      thread = new Thread( (){ record.loadSubRecords(); } );
      thread.start();
    }
  }

  //bad way of thread handling, but makes sense.
  Packed[] obj;
  obj.length = recordList.length;

  foreach(i, rec; recordList) {
    obj[i].record = rec; //class is a reference type, remember
    obj[i].run(); //returns right away, but the thread keeps running
  }
  thread_joinAll(); //from core.thread; waits for all started threads
[/code]

 So long as the records (and subrecords) never touch each other, mutexes and semaphores aren't needed 90% of the time.

 Now since the record count in the original file is 40k, having 40k threads is not only dumb but also expensive to set up. So instead I set up job groups.

[code]
  struct PackedList {
    Thread thread;
    Record[] recordList;

    void runWork() {
      foreach(rec; recordList)
        rec.loadSubRecords();
    }

    void run() {
      assert(recordList.length > 0);
      thread = new Thread( (){ this.runWork(); } );
      thread.start();
    }
  }
[/code]

 With this basic idea, drop a thousand records into one PackedList and start it, then grab another thousand and drop them into another PackedList, and so on. They'll each run until their workload is done.
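 Roughly something like this (untested sketch; the chunk size of 1000 is just a number I picked):

[code]
  enum chunkSize = 1000; //arbitrary group size
  PackedList[] jobs;

  for(size_t start; start < recordList.length; start += chunkSize) {
    size_t end = start + chunkSize;
    if(end > recordList.length)
      end = recordList.length;

    PackedList job;
    job.recordList = recordList[start .. end]; //slice, no copying
    jobs ~= job;
  }

  foreach(ref job; jobs)
    job.run(); //each PackedList chews through its own slice

  thread_joinAll(); //wait for every group to finish
[/code]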

 Is there a suggested magic number of threads per core you should use? If you have, say, a quad core, you can have 4 threads going (obviously), but if they go to sleep waiting on system resources or something (loading a file, saving, whatever), then a core may sit unused. It makes sense to have 2 per core, since if one goes quiet the core has another it can pick up. I'm guessing 2-4 per core would be the number of threads for this type of work.
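 For what it's worth, the core count can at least be queried at runtime instead of hard-coding it (a small sketch; the 2-per-core figure is just my guess from above):

[code]
  import core.cpuid : threadsPerCPU;

  enum perCore = 2;                            //just the guess from above
  size_t jobCount = threadsPerCPU * perCore;   //hardware threads * factor

  //split the 40k records into that many roughly-equal groups
  size_t groupSize = (recordList.length + jobCount - 1) / jobCount;
[/code]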
October 06, 2012
On 10/06/2012 06:17 AM, Era Scarecrow wrote:
> if the records and structures
> never need to touch each other, there's no reason they cannot be handled
> on separate cores/threads (or that's my logic on it anyway).

Have you considered std.parallelism? If you can represent the data as a slice, then a parallel foreach loop on that data is all you need:

  foreach (data; parallel(dataSlice)) {
      // ... each data will be handled individually in parallel ...
  }
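Applied to your recordList, it could look something like this (an untested sketch):

  import std.parallelism;

  foreach (rec; parallel(recordList)) {
      // each record's loadSubRecords() runs on a worker of the
      // default task pool; no manual Thread handling needed
      rec.loadSubRecords();
  }

The default pool uses totalCPUs - 1 worker threads plus the thread that runs the foreach, so it also more or less answers your threads-per-core question for CPU-bound work.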

There is the following chapter about that module, which covers most of std.parallelism:

  http://ddili.org/ders/d.en/parallelism.html

Even though I have made a second pass to include the apparently-newly-added features, there are some features of std.parallelism that are missing in the chapter.

Although you don't seem to need it, there is also message passing concurrency:

  http://ddili.org/ders/d.en/concurrency.html

Ali

October 06, 2012
On Saturday, 6 October 2012 at 14:01:30 UTC, Ali Çehreli wrote:
> Have you considered std.parallelism? If you can represent the data as a slice, then a parallel foreach loop on that data is all you need:

> There is the following chapter about that module, which covers  most of std.parallelism:

 Still heavily relying on TDPL, which covered concurrency, message passing, and shared, but not std.parallelism. On the other hand, std.parallelism does look like it contains more of what I wanted.

> Even though I have made a second pass to include the apparently-newly-added features, there are some features of std.parallelism that are missing in the chapter.

> Although you don't seem to need it, there is also message passing concurrency:

 For the moment I wanted to avoid message passing and shared, as they seem more complex than they need to be for now. I'm writing a merger (for game files), and in there you have records that modify other records, and records that don't. Only records that modify other records need to (and can) run in parallel; the rest, if they qualify, just get added.
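 Very roughly, I'm picturing something like this (just a sketch; modifiesOthers, prepare, loadedRecords, and merged are made-up names standing in for the real merger logic):

[code]
  import std.parallelism;

  Record[] modifiers, plain;
  foreach(rec; loadedRecords) {              //loadedRecords: hypothetical input
    if(rec.modifiesOthers) modifiers ~= rec; //hypothetical test
    else plain ~= rec;
  }

  foreach(rec; parallel(modifiers))
    rec.prepare();   //each record's own heavy work, independent of the rest

  foreach(rec; plain)
    merged ~= rec;   //the rest just get added
[/code]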

 So once again, thank you and I'll give it a try after I read through it.
October 06, 2012
 Well, I've tried using parallel as shown and it appears to be as efficient as my own struct/job-based approach, which is very promising. I'll consider using it more later. Still got plenty of reading and work to do before I get there.