Thread overview
From C++14 and Java 1.8
Apr 21, 2013
bearophile
Apr 21, 2013
dsimcha
Apr 21, 2013
bearophile
Apr 21, 2013
dsimcha
Apr 22, 2013
Martin Nowak
Apr 21, 2013
bearophile
April 21, 2013
Through Reddit I've seen this interesting report about C++14:

http://isocpp.org/blog/2013/04/trip-report-iso-c-spring-2013-meeting


Variable templates: sometimes in D I've felt a little desire for a shorter syntax, a templated alias:

alias Foo(T) = Bar!(T, "red");

But in the end I think this that syntax sugar isn't so necessary.

- - - - - - - - -

Dynamic Arrays: they replace the variable-sized stack-allocated arrays of C99. I'd like something like this in D2 (Issue 9832) with a better integration with the type system.

- - - - - - - - -

optional<T>: see the little maybeTo helper function I have suggested in
http://d.puremagic.com/issues/show_bug.cgi?id=6840

-----------------------------------

From another thread about the future Java 1.8 I've seen Java parallelSort:

http://blog.sanaulla.info/2013/04/08/arrays-sort-versus-arrays-parallelsort/


That blog post explains quickly how it works:

<<
Arrays#parallelSort uses Fork/Join framework introduced in Java 7 to assign the sorting tasks to multiple threads available in the thread pool. This is called eating your own dog food. Fork/Join implements a work stealing algorithm where in a idle thread can steal tasks queued up in another thread.
An overview of Arrays#parallelSort:

The method uses a threshold value and any array of size lesser than the threshold value is sorted using the Arrays#sort() API (i.e sequential sorting). And the threshold is calculated considering the parallelism of the machine, size of the array and is calculated as:

private static final int getSplitThreshold(int n) {
  int p = ForkJoinPool.getCommonPoolParallelism();
  int t = (p > 1) ? (1 + n / (p << 3)) : n;
  return t < MIN_ARRAY_SORT_GRAN ? MIN_ARRAY_SORT_GRAN : t;
}

Once its decided whether to sort the array in parallel or in serial, its now to decide how to divide the array in to multiple parts and then assign each part to a Fork/Join task which will take care of sorting it and then another Fork/Join task which will take care of merging the sorted arrays. The implementation in JDK 8 uses this approach:
- Divide the array into 4 parts.
- Sort the first two parts and then merge them.
- Sort the next two parts and then merge them.
And the above steps are repeated recursively with each part until the size of the part to sort is not lesser than the threshold value calculated above.
>>


I think it's worth adding something similar as strategy of std.algorithm.sort.

Bye,
bearophile
April 21, 2013
On Sunday, 21 April 2013 at 12:08:54 UTC, bearophile wrote:
> Arrays#parallelSort uses Fork/Join framework introduced in Java 7 to assign the sorting tasks to multiple threads available in the thread pool. This is called eating your own dog food. Fork/Join implements a work stealing algorithm where in a idle thread can steal tasks queued up in another thread.
> An overview of Arrays#parallelSort:
>
> The method uses a threshold value and any array of size lesser than the threshold value is sorted using the Arrays#sort() API (i.e sequential sorting). And the threshold is calculated considering the parallelism of the machine, size of the array and is calculated as:
>
> private static final int getSplitThreshold(int n) {
>   int p = ForkJoinPool.getCommonPoolParallelism();
>   int t = (p > 1) ? (1 + n / (p << 3)) : n;
>   return t < MIN_ARRAY_SORT_GRAN ? MIN_ARRAY_SORT_GRAN : t;
> }
>
> Once its decided whether to sort the array in parallel or in serial, its now to decide how to divide the array in to multiple parts and then assign each part to a Fork/Join task which will take care of sorting it and then another Fork/Join task which will take care of merging the sorted arrays. The implementation in JDK 8 uses this approach:
> - Divide the array into 4 parts.
> - Sort the first two parts and then merge them.
> - Sort the next two parts and then merge them.
> And the above steps are repeated recursively with each part until the size of the part to sort is not lesser than the threshold value calculated above.
>>>
>
>
> I think it's worth adding something similar as strategy of std.algorithm.sort.

FWIW, I created a parallel sort in D a while back using std.parallelism.  It was part of std.parallel_algorithm, a half-finished project that I abandoned because I was disappointed at how poorly most of it was scaling in practice, probably due to memory bandwidth.  If you have some expensive-to-compare types, though, it may be worthwhile.

https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d
April 21, 2013
dsimcha:

> I abandoned because I was disappointed at how poorly most of it was scaling in practice, probably due to memory bandwidth.

Then do you know why the Java version seems to be advantageous (with four cores)?

Bye,
bearophile
April 21, 2013
On Sunday, 21 April 2013 at 13:30:32 UTC, bearophile wrote:
> dsimcha:
>
>> I abandoned because I was disappointed at how poorly most of it was scaling in practice, probably due to memory bandwidth.
>
> Then do you know why the Java version seems to be advantageous (with four cores)?
>
> Bye,
> bearophile

I don't know Java very well, but possiblities include:

1.  Sorting using a virtual or otherwise non-inlined comparison function.  This makes the sorting require much more CPU time but not a lot more memory bandwidth.  It does beg the question, though, of why the comparison function isn't inlined, especially since modern JITs can sometimes inline virtual functions.

2.  Different hardware than I tested on, maybe with better memory bandwidth.

3.  Expensive comparison functions.  I didn't test this in D either because I couldn't think of a good use case.  I tested the D parallel sort using small primitive types (ints and floats and stuff).
April 21, 2013
> Dynamic Arrays: they replace the variable-sized stack-allocated arrays of C99. I'd like something like this in D2 (Issue 9832) with a better integration with the type system.

With some more changes:

http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130415/078426.html

Bye,
bearophile
April 22, 2013
On 04/21/2013 04:46 PM, dsimcha wrote:
> 2.  Different hardware than I tested on, maybe with better memory
> bandwidth.

Your implementation performs a lot of copying. Maybe an in-place parallel sort algorithm would perform better, e.g. parallel quicksort.


> 3.  Expensive comparison functions.  I didn't test this in D either
> because I couldn't think of a good use case.  I tested the D parallel
> sort using small primitive types (ints and floats and stuff).

String sorting is a good use case with slightly higher comparison cost.