December 31, 2018
On 12/31/18 8:20 AM, Atila Neves wrote:
> On Sunday, 30 December 2018 at 22:31:27 UTC, Walter Bright wrote:
>> On 12/30/2018 5:18 AM, Atila Neves wrote:
>>> On Saturday, 29 December 2018 at 09:29:30 UTC, Walter Bright wrote:
>>>> http://aras-p.info/blog/2018/12/28/Modern-C-Lamentations/
>>>>
>>>> Time to show off your leet D skilz and see how good we can do it in D!
>>>
>>> I'm on it. As I write, I'm timing (compile and run time) several C++, D, and Rust implementations and writing a blog post. I'm only on the 2nd implementation but D is winning... :)
>>>
>>> I'm going to shamelessly steal Timon's range code in this thread and the generator one posted on reddit.
>>
>> Wow! I'm looking forward to it.
> 
> Blog:
> 
> https://atilanevesoncode.wordpress.com/2018/12/31/comparing-pythagorean-triples-in-c-d-and-rust/ 
> 
> 
> Reddit:
> 
> https://www.reddit.com/r/programming/comments/ab71ag/comparing_pythagorean_triples_in_c_d_and_rust/ 
> 
> 
> 
> Hacker news something something dark side.
> 
> 

I tried the version I came up with with a dedicated looping range (https://run.dlang.io/is/hkqhvZ), cuts the range version down to 550ms (my timings are on my macbook, but my baseline was literally exactly what yours was, 154ms).

In any case, it does seem like the optimizer/compiler is much better at dealing with nested loops than 3 ranges. This is probably a case where the low level code is so fast (doing 3 multiplications, an addition and an equality check) that the loop processing becomes significant.

-Steve
December 31, 2018
On Sun, Dec 30, 2018 at 02:37:59PM -0800, Walter Bright via Digitalmars-d wrote:
> On 12/30/2018 6:27 AM, H. S. Teoh wrote:
> > Recently I noticed that LDC now compiles every function into their own section
> 
> DMD has always done that.

Then it must be doing something wrong, since running dmd with -L-gc-sections produces a 2 MB executable, but running ldc2 (without any special options) produces a 456 KB executable. Exactly the same set of source files. No dependencies on compiler-specific features in the code.

To put this more in perspective, I re-tested this with a trivial Hello World program:

	import std.stdio;
	void main() {
		writeln("Hello world");
	}

Compile this with dmd:

	$ dmd -L--gc-sections test.d
	$ ls -l test
	-rwxrwxr-x 1 hsteoh hsteoh 967416 Dec 31 07:43 test

Compile this with ldc2:

	$ ldc2 test.d
	hsteoh@crystal:/tmp$ \ls -l test
	-rwxrwxr-x 1 hsteoh hsteoh 24632 Dec 31 07:44 test

Note the order of magnitude difference in size, and that ldc2 achieves this by default, with no additional options needed.

How do you make dmd produce the same (or comparable) output?


> > and runs LTO, including GC of unreferenced sections, by default.
> 
> I've tried using the GC feature of ld, but it would produce executables that crashed.
> 
> But that was years ago, perhaps ld has improved. If someone would like to turn that on with dmd, it would be a worthwhile experiment.

Maybe it's because certain required sections need to be marked in some way so that the linker doesn't discard them?


T

-- 
The irony is that Bill Gates claims to be making a stable operating system and Linus Torvalds claims to be trying to take over the world. -- Anonymous
December 31, 2018
On 12/31/18 10:15 AM, Steven Schveighoffer wrote:
> On 12/31/18 8:20 AM, Atila Neves wrote:
>> On Sunday, 30 December 2018 at 22:31:27 UTC, Walter Bright wrote:
>>> On 12/30/2018 5:18 AM, Atila Neves wrote:
>>>> On Saturday, 29 December 2018 at 09:29:30 UTC, Walter Bright wrote:
>>>>> http://aras-p.info/blog/2018/12/28/Modern-C-Lamentations/
>>>>>
>>>>> Time to show off your leet D skilz and see how good we can do it in D!
>>>>
>>>> I'm on it. As I write, I'm timing (compile and run time) several C++, D, and Rust implementations and writing a blog post. I'm only on the 2nd implementation but D is winning... :)
>>>>
>>>> I'm going to shamelessly steal Timon's range code in this thread and the generator one posted on reddit.
>>>
>>> Wow! I'm looking forward to it.
>>
>> Blog:
>>
>> https://atilanevesoncode.wordpress.com/2018/12/31/comparing-pythagorean-triples-in-c-d-and-rust/ 
>>
>>
>> Reddit:
>>
>> https://www.reddit.com/r/programming/comments/ab71ag/comparing_pythagorean_triples_in_c_d_and_rust/ 
>>
>>
>>
>> Hacker news something something dark side.
>>
>>
> 
> I tried the version I came up with with a dedicated looping range (https://run.dlang.io/is/hkqhvZ), cuts the range version down to 550ms (my timings are on my macbook, but my baseline was literally exactly what yours was, 154ms).
> 
> In any case, it does seem like the optimizer/compiler is much better at dealing with nested loops than 3 ranges. This is probably a case where the low level code is so fast (doing 3 multiplications, an addition and an equality check) that the loop processing becomes significant.
> 

And the answer was provided by someone who examined the compiler output:

https://www.reddit.com/r/programming/comments/ab71ag/comparing_pythagorean_triples_in_c_d_and_rust/ecy6rqn/

And it fixes the problem with Timon's D version as well, new time down to 172ms (see my reply to that post).

-Steve
December 31, 2018
On 12/31/2018 7:51 AM, H. S. Teoh wrote:
> How do you make dmd produce the same (or comparable) output?

I don't know. To find out would take some time comparing the .o files.

December 31, 2018
On 12/31/2018 8:05 AM, Steven Schveighoffer wrote:
> And the answer was provided by someone who examined the compiler output:
> 
> https://www.reddit.com/r/programming/comments/ab71ag/comparing_pythagorean_triples_in_c_d_and_rust/ecy6rqn/

For reference, the reddit comment is:

"Looking at the generated code in compiler explorer, it looks like the Rust compiler is not hoisting the multiplications out of the loops, while the C++ compiler (I used clang) does. Furthermore, it seems like using `for x in y..=z` etc results in quite convoluted conditions.
This code seems to perform the same as the C++: https://godbolt.org/z/-nzALh
It looks like there's some things to fix in the rust compiler.."



> And it fixes the problem with Timon's D version as well, new time down to 172ms 


One thing to consider is optimizers are designed by looking at the code generated by popular programming languages and popular programming methods. Hence the traditional loop structure gets a heluva lot of attention. The range version is fairly new, and so it will likely do less well.

> (see my reply to that post).

Your reply is:

"Ooh, that's interesting. Same issue with the D version.
I had to work a bit on it, but this does work and is 172ms vs the ~1000ms:

  return
    recurrence!"a[n-1]+1"(1)
    .then!((z) {
       auto ztotal = z * z;
       return iota(1, z + 1).then!((x) {
           auto xtotal = x * x;
           return iota(x, z + 1)
              .filter!(y => y * y + xtotal == total)
              .map!(y => tuple(x,y,z));
              });
       });
"

Atila, can you please update the blog post?
December 31, 2018
On 12/31/18 1:43 PM, Walter Bright wrote:
> On 12/31/2018 8:05 AM, Steven Schveighoffer wrote:
>  > (see my reply to that post).
> 
> Your reply is:
> 
> "Ooh, that's interesting. Same issue with the D version.
> I had to work a bit on it, but this does work and is 172ms vs the ~1000ms:
> 
>    return
>      recurrence!"a[n-1]+1"(1)
>      .then!((z) {
>         auto ztotal = z * z;
>         return iota(1, z + 1).then!((x) {
>             auto xtotal = x * x;
>             return iota(x, z + 1)
>                .filter!(y => y * y + xtotal == total)
>                .map!(y => tuple(x,y,z));
>                });
>         });
> "
> 
> Atila, can you please update the blog post?

This isn't a fair comparison though -- I'm doing work here that in the other languages the compiler is doing (hoisting the multiplications outside the inner loops). It's not a straight port.

I just wanted to point out that this accounts for how the rust and C++ versions are faster than the D range versions. It would be good to look into why the D compilers are not seeing that optimization possibility.

-Steve
December 31, 2018
On Monday, 31 December 2018 at 13:20:35 UTC, Atila Neves wrote:
> Blog:
>
> https://atilanevesoncode.wordpress.com/2018/12/31/comparing-pythagorean-triples-in-c-d-and-rust/
>
> Reddit:
>
> https://www.reddit.com/r/programming/comments/ab71ag/comparing_pythagorean_triples_in_c_d_and_rust/
>
>
> Hacker news something something dark side.

LTO often helps optimize range code (and phobos code generally), so I tried it out on the variants in your repo. Indeed, the range variants improve, but not the others. Still doesn't bring the range version into line with other three variants.

Compile lines:
* No LTO:  ldc2 -O2
* LTO:  ldc2 -O2 -flto=thin -defaultlib=phobos2-ldc-lto,druntime-ldc-lto

LDC 1.13.0; Macbook Pro, 16GB RAM.

The runtime for simple, lambda, and generator variants came in at 120ms each for both LTO off and LTO on. The range version was 1080ms for no LTO, and 780ms with LTO.

I bumped N up to 3000 to see if more differentiation would show up. There was some, but overall the results were still very consistent with N at 1000.

--Jon

December 31, 2018
On 30.12.18 22:12, sarn wrote:
> On Sunday, 30 December 2018 at 16:51:07 UTC, Jacob Carlborg wrote:
>> On 2018-12-30 15:46, Andrei Alexandrescu wrote:
>>> On 12/29/18 9:03 PM, Timon Gehr wrote:
>>>> alias then(alias a)=(r)=>map!a(r).joiner;
>>> The "then" abstraction is pretty awesome. Thanks!
>>
>> Isn't that usually called "flatMap"?
> 
> Yeah, in Haskell monad terminology, "then" means >>, but flatMap is >>= ("bind").  So flatMap is a less confusing name for some people.

Pun intended. Welcome to D, where 'enum' means 'const', 'const' means 'readonly', 'lazy' means 'by name', 'assert' means 'assume' and 'real' does not mean 'real' (in fact, I really like the 'ireal' and 'creal' keywords, pity they are being phased out). :)

The wider context is that I have argued many times that it makes sense to put 'trivial' one-liners like this one into Phobos even if for no other reason than to standardize their names.
December 31, 2018
On 12/31/2018 2:28 PM, Timon Gehr wrote:
> Welcome to D, where 'enum' means 'const', 'const' means 'readonly', 'lazy' means 'by name', 'assert' means 'assume' and 'real' does not mean 'real' (in fact, I really like the 'ireal' and 'creal' keywords, pity they are being phased out). :)

D's "by name" are the template alias parameters.
January 01, 2019
On 2018-12-31 16:51, H. S. Teoh wrote:

> Then it must be doing something wrong, since running dmd with
> -L-gc-sections produces a 2 MB executable, but running ldc2 (without any
> special options) produces a 456 KB executable. Exactly the same set of
> source files. No dependencies on compiler-specific features in the code.
> 
> To put this more in perspective, I re-tested this with a trivial Hello
> World program:
> 
> 	import std.stdio;
> 	void main() {
> 		writeln("Hello world");
> 	}
> 
> Compile this with dmd:
> 
> 	$ dmd -L--gc-sections test.d
> 	$ ls -l test
> 	-rwxrwxr-x 1 hsteoh hsteoh 967416 Dec 31 07:43 test
> 
> Compile this with ldc2:
> 
> 	$ ldc2 test.d
> 	hsteoh@crystal:/tmp$ \ls -l test
> 	-rwxrwxr-x 1 hsteoh hsteoh 24632 Dec 31 07:44 test
> 
> Note the order of magnitude difference in size, and that ldc2 achieves
> this by default, with no additional options needed.
> 
> How do you make dmd produce the same (or comparable) output?

For me I get comparable output with DMD and LDC:

LDC: 932 KB
DMD (--gc-sections): 957 KB
DMD: 988 KB

This is running using Docker containers (which are running Ubuntu).

Funny thing, on macOS it's the opposite of your experience:

LDC: 5 MB
LDC (-dead_strip): 957 KB
DMD: 882 KB
DMD (-dead_strip): 361 KB

-- 
/Jacob Carlborg