January 12, 2013
On Fri, Jan 11, 2013 at 01:36:21PM -0800, Walter Bright wrote:
> On 1/11/2013 1:21 PM, Dmitry Olshansky wrote:
> >12-Jan-2013 00:50, Walter Bright wrote:
[...]
> >>buildTrie:
> >>
> >>     Contains a number of functions that return auto, but no mention
> >>of what is returned. While I like auto, in these cases it is not helpful, because the user needs to know what type is returned.
> >>
> >
> >Trust me, you won't like it when you see it :) That's part of the reason it's hidden.  But you are correct that documentation on these artifacts is *ehm* ... sketchy.  Will fix/update.
> 
> If the compiler type is hard to grok, ok, but at least the documentation should make it clear what is returned.
[...]

Would an alias be appropriate in this case, so that we can put a concrete name to the function return type? Or is that an abuse of alias?


T

-- 
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
January 12, 2013
On 1/11/2013 4:14 PM, H. S. Teoh wrote:
> Would an alias be appropriate in this case, so that we can put a
> concrete name to the function return type? Or is that an abuse of alias?

Actually, that sounds like a good idea.

January 12, 2013
On Friday, 11 January 2013 at 20:57:57 UTC, Dmitry Olshansky wrote:
> You can print total counts after each bench; there is a TLS variable written at the end of it. But anyway, I like your numbers! :)

Okay, I couldn't resist having a short look at the results, specifically the benchmark of the new isSymbol implementation, where LDC beats DMD by roughly 10x. The reason for the nice performance results is mainly that LDC optimizes the classifyCall loop containing the trie lookup down to the following fairly optimal piece of code (eax is the overall counter that gets stored to lastCount):

---
  40bc90:       8b 55 00                mov    edx,DWORD PTR [rbp+0x0]
  40bc93:       89 d6                   mov    esi,edx
  40bc95:       c1 ee 0d                shr    esi,0xd
  40bc98:       40 0f b6 f6             movzx  esi,sil
  40bc9c:       0f b6 34 31             movzx  esi,BYTE PTR [rcx+rsi*1]
  40bca0:       48 83 c5 04             add    rbp,0x4
  40bca4:       0f b6 da                movzx  ebx,dl
  40bca7:       c1 e6 05                shl    esi,0x5
  40bcaa:       c1 ea 08                shr    edx,0x8
  40bcad:       83 e2 1f                and    edx,0x1f
  40bcb0:       09 f2                   or     edx,esi
  40bcb2:       41 0f b7 14 50          movzx  edx,WORD PTR [r8+rdx*2]
  40bcb7:       c1 e2 08                shl    edx,0x8
  40bcba:       09 da                   or     edx,ebx
  40bcbc:       48 c1 ea 06             shr    rdx,0x6
  40bcc0:       4c 01 ca                add    rdx,r9
  40bcc3:       48 8b 14 d1             mov    rdx,QWORD PTR [rcx+rdx*8]
  40bcc7:       48 0f a3 da             bt     rdx,rbx
  40bccb:       83 d0 00                adc    eax,0x0
  40bcce:       48 ff cf                dec    rdi
  40bcd1:       75 bd                   jne    40bc90
---

The code DMD generates for the lookup, on the other hand, is pretty ugly, including several values being spilled to the stack, and also doesn't get inlined.

This is, of course, just a microbenchmark, but it is cases like this that make me wish we would just use LLVM (or GCC, for that matter) for the reference compiler – and I'm not talking about the slightly Frankensteinian endeavor that LDC is here. Walter, my intention is not at all to doubt your ability as a compiler writer; we all know the stories of how you used to annoy the team leads at the big companies by beating their performance numbers single-handedly, and I'm sure you could e.g. fix your backend to match the performance of the LDC-generated code for Dmitry's benchmark in no time. The question is just: are we as a community big and resourceful enough to justify spending time on that?

Sure, there would still be things we would have to fix ourselves when using another backend, such as SEH support in LLVM. But performance will always be a central selling point of a language like D, and do we really want to take on the burden of keeping up with the competition ourselves, when we can just draw on the work of full-time backend developers at Intel, AMD, Apple and others for free? Given the current developments in microprocessors and given that applications such as graphics and scientific computing are naturally a good fit for D, what's next? You taking a year off from active language development to implement an auto-vectorizer for your backend?

I know this question has been brought up before (if never really answered), and I don't want to start another futile discussion, but given the developments in the compiler/languages landscape over the last few years, it strikes me as an increasingly bad decision to stick with an obscure, poorly documented backend which nobody knows how to use – and nobody wants to learn how to use either, because, oops, they couldn't even redistribute their own work.

Let's put aside all the other arguments (most of which I didn't even mention) for a moment, even the performance aspect; I think that the productivity aspect alone, both regarding duplicated work and accessibility of the project to new developers, makes it hard to justify forgoing the momentum of an established backend project like LLVM. [1]

Maybe it is naïve to think that the situation could ever change for DMD. But I sincerely hope that the instant a promising self-hosted (as far as the frontend goes) compiler project shows up on the horizon, it will gain the necessary amount of official endorsement – and manpower, especially in the form of your (Walter's) expertise – to make that final, laborious stretch to release quality. If we just sit there and wait for somebody to come along with a new production-ready compiler which is better, faster and shinier than DMD, we will wait for a long, long time – this might happen for a Lisp dialect, but not for D.

Sorry for the rant, [2]
David



[1] The reasons for which I'm focusing on LLVM here are not so much its technical qualities as its liberal BSD-like license – if it is good enough for Apple, Intel (also a compiler vendor) and their lawyer teams, it is probably also good enough for us. The code could even be integrated into commercial products such as DMC without problems.


[2] And for any typos which might undermine my credibility – it is way too early in the morning here.
January 12, 2013
On 1/11/2013 9:17 PM, David Nadlinger wrote:
> Sorry for the rant, [2]
> David

No problem. I think it's great that you're the champion for LDC. Having 3 robust compilers for D is only a win, and I don't think having 3 takes away from D at all.

The 3 compilers have different strengths and weaknesses. For example, LDC doesn't work with VS, and so would have cost us the possible design win at Remedy.
January 12, 2013
12-Jan-2013 05:31, Walter Bright wrote:
> On 1/11/2013 4:14 PM, H. S. Teoh wrote:
>> Would an alias be appropriate in this case, so that we can put a
>> concrete name to the function return type? Or is that an abuse of alias?
>
> Actually, that sounds like a good idea.
>

Except that it's created out of template parameters by wrapping them in a couple of templates. So no, can't do.

-- 
Dmitry Olshansky
January 12, 2013
12-Jan-2013 09:17, David Nadlinger wrote:
> On Friday, 11 January 2013 at 20:57:57 UTC, Dmitry Olshansky wrote:
>> You can print total counts after each bench; there is a TLS variable
>> written at the end of it. But anyway, I like your numbers! :)
>
> Okay, I couldn't resist having a short look at the results, specifically
> the benchmark of the new isSymbol implementation, where LDC beats DMD by
> roughly 10x. The reason for the nice performance results is mainly that
> LDC optimizes the classifyCall loop containing the trie lookup down to
> the following fairly optimal piece of code (eax is the overall counter
> that gets stored to lastCount):

So these are legit? Coooooool!

BTW I'm getting about 2-3x better numbers on 32-bit DMD with an oldish AMD K10. Can you test the 32-bit versions as well? Could it be some glitch in the 64-bit codegen?

>
> ---
>    40bc90:       8b 55 00                mov    edx,DWORD PTR [rbp+0x0]
>    40bc93:       89 d6                   mov    esi,edx
>    40bc95:       c1 ee 0d                shr    esi,0xd
>    40bc98:       40 0f b6 f6             movzx  esi,sil
>    40bc9c:       0f b6 34 31             movzx  esi,BYTE PTR [rcx+rsi*1]
>    40bca0:       48 83 c5 04             add    rbp,0x4
>    40bca4:       0f b6 da                movzx  ebx,dl
>    40bca7:       c1 e6 05                shl    esi,0x5
>    40bcaa:       c1 ea 08                shr    edx,0x8
>    40bcad:       83 e2 1f                and    edx,0x1f
>    40bcb0:       09 f2                   or     edx,esi
>    40bcb2:       41 0f b7 14 50          movzx  edx,WORD PTR [r8+rdx*2]
>    40bcb7:       c1 e2 08                shl    edx,0x8
>    40bcba:       09 da                   or     edx,ebx
>    40bcbc:       48 c1 ea 06             shr    rdx,0x6
>    40bcc0:       4c 01 ca                add    rdx,r9
>    40bcc3:       48 8b 14 d1             mov    rdx,QWORD PTR [rcx+rdx*8]
>    40bcc7:       48 0f a3 da             bt     rdx,rbx
>    40bccb:       83 d0 00                adc    eax,0x0
>    40bcce:       48 ff cf                dec    rdi
>    40bcd1:       75 bd                   jne    40bc90
> ---

This looks quite nice indeed.

>
> The code DMD generates for the lookup, on the other hand, is pretty
> ugly, including several values being spilled to the stack, and also
> doesn't get inlined.

To be honest, one of the major problems I see with DMD is the lack of a principled, reliable inliner. Currently it may or may not inline two equivalent pieces of code just because one of them has an early return, a switch statement, or whatever. And it's about time to start inlining functions with loops, as it's not the 90s anymore.


> [1] The reasons for which I'm focusing on LLVM here are not so much its
> technical qualities as its liberal BSD-like license – if it is good
> enough for Apple, Intel (also a compiler vendor) and their lawyer teams,
> it is probably also for us. The code could even be integrated into
> commercial products such as DMC without problems.
>

I like LLVM, and nearly everybody in the industry likes it. Another example is AMD: they are building their compiler infrastructure for GPUs on top of LLVM.

> [2] And for any typos which might undermine my credibility – it is way
> too early in the morning here.


-- 
Dmitry Olshansky
January 12, 2013
On Friday, 11 January 2013 at 19:31:13 UTC, Dmitry Olshansky wrote:
It's been long overdue to present the work I did during the last GSOC, as it was a summer, not winter, of code after all. Unfortunately some compiler bugs, a new job :) and unrelated events of importance have postponed its release beyond measure.
>
> Anyway it's polished and ready for the good old collective destruction called peer review. I'm looking for a review manager.
> [SNIP]

I don't have anything to add, but I still want to voice my appreciation for your work.

Well done. I hope this makes it through.
January 12, 2013
On 2013-01-11 20:31, Dmitry Olshansky wrote:
> It's been long overdue to present the work I did during the last GSOC,
> as it was a summer, not winter, of code after all. Unfortunately some
> compiler bugs, a new job :) and unrelated events of importance have
> postponed its release beyond measure.
>

I assume that "sicmp" is faster than "icmp"? In that case you might want to mention that in the documentation.


-- 
/Jacob Carlborg
January 12, 2013
On Sat, Jan 12, 2013 at 12:20:46PM +0400, Dmitry Olshansky wrote:
> 12-Jan-2013 05:31, Walter Bright wrote:
> >On 1/11/2013 4:14 PM, H. S. Teoh wrote:
> >>Would an alias be appropriate in this case, so that we can put a concrete name to the function return type? Or is that an abuse of alias?
> >
> >Actually, that sounds like a good idea.
> >
> 
> Except that it's created out of template parameters by wrapping them in a couple of templates. So no, can't do.
[...]

I was going to suggest something like:

	alias typeof(return) ConcreteName;

But then I realized it's impossible to use ConcreteName in the function signature, which makes it pointless to have the alias since it won't show up in the Ddoc. :-(

Although, maybe something like this *might* be possible (but it's sorta getting to the point of being ridiculously complex just for the sake of making ddoc display a single identifier):

	template ConcreteName(A,B,C) {
		alias ConcreteName = SomeTemplate!(A, OtherTemplate!B, ...);
	}

	ConcreteName!(A,B,C) func(A,B,C,D)(D args) {
		ConcreteName!(A,B,C) retVal = ...;
		return retVal;
	}


T

-- 
Three out of two people have difficulties with fractions. -- Dirk Eddelbuettel
January 12, 2013
12-Jan-2013 20:16, H. S. Teoh wrote:
> On Sat, Jan 12, 2013 at 12:20:46PM +0400, Dmitry Olshansky wrote:
>> 12-Jan-2013 05:31, Walter Bright wrote:
>>> On 1/11/2013 4:14 PM, H. S. Teoh wrote:
>>>> Would an alias be appropriate in this case, so that we can put a
>>>> concrete name to the function return type? Or is that an abuse of
>>>> alias?
>>>
>>> Actually, that sounds like a good idea.
>>>
>>
>> Except that it's created out of template parameters by wrapping them
>> in a couple of templates. So no, can't do.
> [...]
>
> I was going to suggest something like:
>
> 	alias typeof(return) ConcreteName;
>
> But then I realized it's impossible to use ConcreteName in the function
> signature, which makes it pointless to have the alias since it won't
> show up in the Ddoc. :-(
>
> Although, maybe something like this *might* be possible (but it's sorta
> getting to the point of being ridiculously complex just for the sake of
> making ddoc display a single identifier):
>
> 	template ConcreteName(A,B,C) {
> 		alias ConcreteName = SomeTemplate!(A, OtherTemplate!B, ...);
> 	}
>
> 	ConcreteName!(A,B,C) func(A,B,C,D)(D args) {
> 		ConcreteName!(A,B,C) retVal = ...;
> 		return retVal;
> 	}

For basic needs there are two aliases for custom Tries that map dchar -> bool and dchar -> some integer.

http://blackwhale.github.com/phobos/uni.html#CodepointSetTrie
http://blackwhale.github.com/phobos/uni.html#CodepointTrie

Both accept a custom selection of bits per trie level. The docs are not really helpful in this area yet; I'll make another pass and add more examples, a glossary, and so on.

-- 
Dmitry Olshansky