May 27, 2009
bearophile, el 27 de mayo a las 15:55 me escribiste:
> Christian Kamm:
> > The release 0.9.1 of LDC, the LLVM based compiler for the D programming language, contains the following major improvements:
> 
> Very good. I'll try this too.
> 
> 
> >    * turn GC allocations to allocas if possible
> 
> Phobos1 of DMD too has alloca, so can this optimization be folded in DMD too?

I don't think so, they are D specific LLVM optimization passes, see: http://www.dsource.org/projects/ldc/browser/gen/passes

> >    * simplify or remove certain calls to D runtime functions
> 
> What calls?

From what I saw in: http://www.dsource.org/projects/ldc/browser/gen/passes/SimplifyDRuntimeCalls.cpp

- Remove libcall for arr.length = N if N <= arr.length
- Remove libcall for cast(T[]) arr if it's safe to do so.
- Remove libcall if the return value is unused.
- Turn slice copies into llvm.memcpy when safe


-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
May 27, 2009
bearophile wrote:
> Christian Kamm:
>> The release 0.9.1 of LDC, the LLVM based compiler for the D programming language, contains the following major improvements:
> 
> Very good. I'll try this too.
> 
> 
>>    * turn GC allocations to allocas if possible
> 
> Phobos1 of DMD too has alloca, so can this optimization be folded in DMD too?

These optimizations are done on LLVM IR, so unless DMD switches to LLVM they'll need to be completely reimplemented for DMD.

This one is only done for certain GC allocations by the way, not all of them. The ones currently implemented are:
 * new Struct/int/float/etc.,
 * uninitialized arrays (used for arr1 ~ arr2, for instance),
 * zero-initialized arrays (e.g. new int[N])
 * new Class, unless
   a) it has a destructor,
   b) it has a custom allocator (overloads new), or
   c) it has a custom deallocator (overloads delete).

>>    * simplify or remove certain calls to D runtime functions
> 
> What calls?

Besides the GC calls turned into allocas, these are currently implemented:
 * For "arr.length = N", the runtime call is deleted if
   a) the resulting .ptr is unused,
   b) N == 0, or
   c) both arr.length and N are constant integers, and N <= arr.length.
      I'd like to extend this one to a more general N <= arr.length (i.e. not just constant integers), but I'm not quite sure what the best way to implement it is yet.

 * For "cast(T[]) some_array", the compiler generates a runtime call to compute the new array length. This call will be optimized out if
   a) the old length was 0,
   b) T.sizeof == U.sizeof, where U is the element type of some_array, or
   c) T.sizeof % U.sizeof == 0

 * arr1[] = arr2[] is turned into a memcpy() if alias analysis can prove they don't overlap. (Only relevant if assertions or array bounds checks are enabled, as they both are by default, otherwise they're turned into memcpy() in the first place)

 * Runtime calls that don't do anything but allocate GC memory (and optionally initialize it) are deleted if their return value is unused.
May 27, 2009
First of all: Thanks for your your great work.

Unfortunately I still have the debugging problem already described in the debugger group (using the x86-64 build on Ubuntu 9.04).

I compile a programm using

ldmd -g -debug test.d

Then I try to debug using gdb with the D patches:

(gdb) list
1	../sysdeps/x86_64/elf/start.S: No such file or directory.
	in ../sysdeps/x86_64/elf/start.S


Any hints on this?

Thanks and best regards,
Timo
May 27, 2009
Frits van Bommel:

Thank you for your answers.

> This one is only done for certain GC allocations by the way, not all of them.
> The ones currently implemented are:
>   * new Struct/int/float/etc.,
>   * uninitialized arrays (used for arr1 ~ arr2, for instance),
>   * zero-initialized arrays (e.g. new int[N])
>   * new Class, unless
>     a) it has a destructor,
>     b) it has a custom allocator (overloads new), or
>     c) it has a custom deallocator (overloads delete).

I'm trying to find situations where that's true, but in two small programs that use both structs and classes (that don't escape the scope and follow your unless list) I see:

call	_d_allocmemoryT
call _d_allocclass
Are those calls to variants of alloca()?

While looking for those alloca I have also tested code that has the following two lines one after the other:
    auto a = new int[1000];
    a[] = 2;

That code is very common, because you currently can't write:
    auto a = new int[1000] = 2;

The latest LDC compiles that as:

	pushl	%esi
	subl	$4016, %esp
	leal	16(%esp), %esi
	movl	%esi, (%esp)
	movl	$4000, 8(%esp)
	movl	$0, 4(%esp)
	call	memset
	movl	%esi, (%esp)
	movl	$2, 8(%esp)
	movl	$1000, 4(%esp)
	call	_d_array_init_i32

I think the memset may be avoided.

Bye and thank you,
bearophile
May 27, 2009
Great news, happy to see x86_64 on the feature list :)

May 27, 2009
Robert Clipsham wrote:
...
> I actually thought about giving this a try about a week ago. I managed to get all but one file to compile with the current version of the D2 front end (2.021 I believe)... Not bad considering my (lack of) C/C++ knowledge. If there's some interest for it I suppose I could give it another shot and try to merge the latest front end, it'd be a good chance for me to finally learn C/C++ (I made the <del>mistake</del> decision to learn D first... I found it too awesome to bother learning the obviously inferior C/C++ :P).

I'm interested, for sure.


May 27, 2009
bearophile wrote:
> Frits van Bommel:
> 
> Thank you for your answers.
> 
>> This one is only done for certain GC allocations by the way, not all of them. The ones currently implemented are:
>>   * new Struct/int/float/etc.,
>>   * uninitialized arrays (used for arr1 ~ arr2, for instance),
>>   * zero-initialized arrays (e.g. new int[N])
>>   * new Class, unless
>>     a) it has a destructor,
>>     b) it has a custom allocator (overloads new), or
>>     c) it has a custom deallocator (overloads delete).
> 
> I'm trying to find situations where that's true, but in two small programs that use both structs and classes (that don't escape the scope and follow your unless list) I see:
> 
> call	_d_allocmemoryT
> call _d_allocclass
> Are those calls to variants of alloca()?

No, those are GC allocations.

This small program contains no gc allocations with ldc -O3:
-----
struct Struct {
    int i, j = 4;
}

class Class {
    int i, j = 6;
}

int frob(T)(T t) {
    t.i = 4;
    return t.j;
}

int withStruct() {
    return frob(new Struct);
}

int withClass() {
    return frob(new Class);
}
-----

It does still contain them when inlining is disabled, as it is by default with -O2 (aka -O); this seems to be because the LLVM pass that adds parameter attributes (like nocapture, better known as 'scope' in these newsgroups) is missing from the default list of optimizations :(. I'll fix this in the repository soon.

Another constraint I forgot to mention: it doesn't work for allocations in loops, because it's tricky to figure out whether the allocation is still reachable when the loop reaches the same position again.
(For this reason, the pass by default runs before each inliner run and once after all inlining is done since the inliner can inline code into loops, yet allows for simplifications that make escape analysis more accurate)

> While looking for those alloca I have also tested code that has the following two lines one after the other:
>     auto a = new int[1000];
>     a[] = 2;
> 
> That code is very common, because you currently can't write:
>     auto a = new int[1000] = 2;
> 
> The latest LDC compiles that as:
> 
> 	pushl	%esi
> 	subl	$4016, %esp
> 	leal	16(%esp), %esi
> 	movl	%esi, (%esp)
> 	movl	$4000, 8(%esp)
> 	movl	$0, 4(%esp)
> 	call	memset
> 	movl	%esi, (%esp)
> 	movl	$2, 8(%esp)
> 	movl	$1000, 4(%esp)
> 	call	_d_array_init_i32
> 
> I think the memset may be avoided.

That's trickier to get right, because the optimizer would have to look ahead to see the new memset call is always followed by the initialization, with no reads in between.
The 1-byte element case can probably be handled by LLVM if _d_array_init_i8 is replaced by another memset, though. (and similarly, _d_array_init_i16 could be handled for cases like 0xFFFF, but not 0x1234, by turning it into memset).
May 28, 2009
Frits van Bommel:

Thanks for your answers (and your work).

> That's trickier to get right, because the optimizer would have to look ahead to see the new memset call is always followed by the initialization, with no reads in between.

I understand, and that may be more complex to do if the code is multi-thread. It's not a very important optimization, because most times the initialization time is a small percentage of the time spent to process the array. But it may deserve to receive a look as a medium-low-priority optimization anyway.

Bye,
bearophile
May 28, 2009
On Wed, May 27, 2009 at 9:32 PM, Frits van Bommel <fvbommel@remwovexcapss.nl> wrote:
> Robert Clipsham wrote:
>>
>> Andrei Alexandrescu wrote:
>>>
>>> Great! Quick question - are you supporting or considering supporting D2? I looked on the project's home page and couldn't find that information. There's mention of druntime for D2 which makes me hopeful.
>>>
>>> Thanks,
>>>
>>> Andrei
>>
>> D2 support in LDC has been started, but it does not have any of D2's features implemented. I believe all that has been done is the frontend added. Last time I checked ldc + d2 would not compile.
>>
>> I believe that the plan is to make D1 solid (more so than DMD), before moving on to D2. One of the developers might be able to give a more accurate status.
>
> That's pretty accurate, actually.
>
> A while back somebody sent in a patch to get D2 support working, but by the time it got properly reviewed it was already outdated...
>
> The main problem on this front is that nobody currently working on LDC is particularly interested in D2.
>

The way the frontend is developed is also a factor.

Right now we have two frontend source dirs. One for D1 and one for D2. Every time a new DMD release is out, this means: cleaning out the files we don't need (or can't use) in LDC, merging the changes into LDC, repeating for D2.

But merging isn't always trivial. Since we have quite a lot of changes and conflicts are common.

Also often LDC bugreports are frontend related, and both D1 and D2 need identical fixes applied, this is often forgotten.

So, for now we're focusing on D1 and unfortunately D2 has fallen behind to a point where it's just plain broken.

I would like to see more improvements to the way the DMD frontend is developed - regarding other users than the DMD compiler - before picking up D2 again. More specifically:

* Seperation of frontend and backend needs to be more clear. The file structure is one thing, but it would be nice if the DMD backend specific parts in the frontend code could be versioned out for DMD. A simple ` #if IN_DMD ' would be a big improvement.

* put the code under version control, that could simplify pulling fixes into our tree.

I was going to put a few more entries on this list, but I could probably provide some patches there instead of just asking for things to happen .. But the main point is, D1 support is more than enough work already, however these two simple improvements would be a big help, and will benefit others who wish to use the frontend in the future as well.

-Tomas
May 28, 2009
Robert Clipsham Wrote:

> Not bad considering my (lack of) C/C++ knowledge. If there's some interest for it I suppose I could give it another shot and try to merge the latest front end, it'd be a good chance for me to finally learn C/C++ (I made the <del>mistake</del> decision to learn D first... I found it too awesome to bother learning the obviously inferior C/C++ :P).

I found recently that properly designed C++ code can live happily without all that esoteric macro/template crap and can be pretty readable and understandable even using nasty antipatterns. This being achieved simply by using C++ subset that is supported on various platforms. Code that does the job instead of casting black magic.