Jump to page: 1 28  
Page
Thread overview
Creeping Bloat in Phobos
Sep 27, 2014
Walter Bright
Sep 27, 2014
Peter Alexander
Sep 27, 2014
H. S. Teoh
Sep 28, 2014
deadalnix
Sep 27, 2014
Brad Roberts
Sep 27, 2014
Walter Bright
Sep 27, 2014
Brad Roberts
Sep 27, 2014
Walter Bright
Sep 29, 2014
Martin Nowak
Sep 29, 2014
Martin Nowak
Sep 27, 2014
David Nadlinger
Sep 27, 2014
Walter Bright
Sep 27, 2014
H. S. Teoh
Sep 27, 2014
bearophile
Sep 27, 2014
H. S. Teoh
Sep 28, 2014
Marc Schütz
Sep 28, 2014
Marco Leise
Sep 28, 2014
bearophile
Sep 28, 2014
Walter Bright
Sep 28, 2014
bearophile
Sep 28, 2014
Walter Bright
Sep 29, 2014
Marco Leise
Sep 28, 2014
Walter Bright
Sep 28, 2014
bearophile
Sep 28, 2014
Walter Bright
Sep 28, 2014
bearophile
Sep 28, 2014
Walter Bright
Sep 29, 2014
monarch_dodra
Sep 29, 2014
monarch_dodra
Sep 28, 2014
H. S. Teoh
Sep 28, 2014
Dmitry Olshansky
Sep 28, 2014
Walter Bright
Sep 29, 2014
Marco Leise
Sep 29, 2014
Dicebot
Sep 29, 2014
monarch_dodra
Sep 28, 2014
Dmitry Olshansky
Sep 28, 2014
Walter Bright
Sep 29, 2014
Dmitry Olshansky
Sep 29, 2014
Marco Leise
Sep 29, 2014
monarch_dodra
Sep 28, 2014
Walter Bright
Oct 01, 2014
Kagamin
Sep 28, 2014
Peter Alexander
Sep 28, 2014
Uranuz
Sep 28, 2014
H. S. Teoh
Sep 28, 2014
John Colvin
Sep 28, 2014
Walter Bright
Sep 28, 2014
Uranuz
Sep 28, 2014
Dmitry Olshansky
Sep 28, 2014
Uranuz
Sep 28, 2014
Dmitry Olshansky
Sep 29, 2014
ketmar
Sep 29, 2014
Marco Leise
Sep 28, 2014
Walter Bright
Sep 29, 2014
Marco Leise
Sep 27, 2014
bearophile
Sep 27, 2014
Walter Bright
Sep 27, 2014
Peter Alexander
Sep 27, 2014
Walter Bright
Sep 29, 2014
Martin Nowak
Sep 28, 2014
Dmitry Olshansky
Sep 28, 2014
Walter Bright
Sep 29, 2014
Dicebot
Sep 30, 2014
Dmitry Olshansky
September 27, 2014
From time to time, I take a break from bugs and enhancements and just look at what some piece of code is actually doing. Sometimes, I'm appalled. Phobos, for example, should be a lean and mean fighting machine:


http://www.nbcnews.com/id/38545625/ns/technology_and_science-science/t/king-tuts-chariots-were-formula-one-cars/#.VCceNmd0xjs

Instead, we have something more akin to:


http://untappedcities.com/2012/10/31/roulez-carrosses-carriages-of-versailles-arrive-in-arras/

More specifically, I looked at std.file.copy():

  https://github.com/D-Programming-Language/phobos/blob/master/std/file.d

Which is 3 lines of code:

  void copy(in char[] from, in char[] to) {
        immutable result = CopyFileW(from.tempCStringW(), to.tempCStringW(), false);
        if (!result)
            throw new FileException(to.idup);
  }

Compiling this code for Windows produces the rather awful:

_D3std4file4copyFxAaxAaZv       comdat
        assume  CS:_D3std4file4copyFxAaxAaZv
L0:             push    EBP
                mov     EBP,ESP
                mov     EDX,FS:__except_list
                push    0FFFFFFFFh
                lea     EAX,-0220h[EBP]
                push    offset _D3std4file4copyFxAaxAaZv[0106h]
                push    EDX
                mov     FS:__except_list,ESP
                sub     ESP,8
                sub     ESP,041Ch
                push    0
                push    dword ptr 0Ch[EBP]
                push    dword ptr 8[EBP]
                call    near ptr _D3std8internal7cstring21__T11tempCSÇàÆTuTaZÇìÆFNbNixAaZSÇ┬├3Res
                mov     dword ptr -4[EBP],0
                lea     EAX,-0220h[EBP]
                call    near ptr _D3std8internal7cstring21__T11tempCStringTuTaZ11tempCStringFNbNixAaZ3Res3ptrMxFNaNbNdNiNfZPxu
                push    EAX
                lea     EAX,-0430h[EBP]
                push    dword ptr 014h[EBP]
                push    dword ptr 010h[EBP]
                call    near ptr _D3std8internal7cstring21__T11tempCSÇàÆTuTaZÇìÆFNbNixAaZSÇ┬├3Res
                mov     dword ptr -4[EBP],1
                lea     EAX,-0430h[EBP]
                call    near ptr _D3std8internal7cstring21__T11tempCStringTuTaZ11tempCStringFNbNixAaZ3Res3ptrMxFNaNbNdNiNfZPxu
                push    EAX
                call    dword ptr __imp__CopyFileW@12
                mov     -01Ch[EBP],EAX
                mov     dword ptr -4[EBP],0
                call    near ptr L83
                jmp short       L8F
L83:            lea     EAX,-0220h[EBP]
                call    near ptr _D3std8internal7cstring21__T11tempCStringTuTaZ11tempCStringFNbNixAaZ3Res6__dtorMFNbNiZv
                ret
L8F:            mov     dword ptr -4[EBP],0FFFFFFFFh
                call    near ptr L9D
                jmp short       LA9
L9D:            lea     EAX,-0430h[EBP]
                call    near ptr _D3std8internal7cstring21__T11tempCStringTuTaZ11tempCStringFNbNixAaZ3Res6__dtorMFNbNiZv
                ret
LA9:            cmp     dword ptr -01Ch[EBP],0
                jne     LF3
                mov     ECX,offset FLAT:_D3std4file13FileException7__ClassZ
                push    ECX
                call    near ptr __d_newclass
                add     ESP,4
                push    dword ptr 0Ch[EBP]
                mov     -018h[EBP],EAX
                push    dword ptr 8[EBP]
                call    near ptr _D6object12__T4idupTxaZ4idupFNaNbNdNfAxaZAya
                push    EDX
                push    EAX
                call    dword ptr __imp__GetLastError@0
                push    EAX
                push    dword ptr _D3std4file13FileException6__vtblZ[02Ch]
                push    dword ptr _D3std4file13FileException6__vtblZ[028h]
                push    095Dh
                mov     EAX,-018h[EBP]
                call    near ptr _D3std4file13FileException6__ctorMFNfxAakAyakZC3std4file13FileException
                push    EAX
                call    near ptr __d_throwc
LF3:            mov     ECX,-0Ch[EBP]
                mov     FS:__except_list,ECX
                mov     ESP,EBP
                pop     EBP
                ret     010h
                mov     EAX,offset FLAT:_D3std4file13FileException6__vtblZ[0310h]
                jmp     near ptr __d_framehandler

which is TWICE as much generated code as for D1's copy(), which does the same thing. No, it is not because D2's compiler sux. It's because it has become encrustified with gee-gaws, jewels, decorations, and other crap.

To scrape the barnacles off, I've filed:

https://issues.dlang.org/show_bug.cgi?id=13541
https://issues.dlang.org/show_bug.cgi?id=13542
https://issues.dlang.org/show_bug.cgi?id=13543
https://issues.dlang.org/show_bug.cgi?id=13544

I'm sure there's much more in std.file (and elsewhere) that can be done. Guys, when developing Phobos/Druntime code, please look at the assembler once in a while and see what is being wrought. You may be appalled, too.



September 27, 2014
On Saturday, 27 September 2014 at 20:57:53 UTC, Walter Bright wrote:
> From time to time, I take a break from bugs and enhancements and just look at what some piece of code is actually doing. Sometimes, I'm appalled.

Me too, and yes it can be appalling. It's pretty bad for even simple range chains, e.g.

import std.algorithm, std.stdio;
int main(string[] args) {
  return cast(int)args.map!("a.length").reduce!"a+b"();
}

Here's what LDC produces (with -O -inline -release -noboundscheck)

__Dmain:
0000000100001480	pushq	%r15
0000000100001482	pushq	%r14
0000000100001484	pushq	%rbx
0000000100001485	movq	%rsi, %rbx
0000000100001488	movq	%rdi, %r14
000000010000148b	callq	0x10006df10 ## symbol stub for: __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
0000000100001490	xorb	$0x1, %al
0000000100001492	movzbl	%al, %r9d
0000000100001496	leaq	_.str12(%rip), %rdx ## literal pool for: "/Users/pja/ldc2-0.14.0-osx-x86_64/bin/../import/std/algorithm.d"
000000010000149d	movq	0xcbd2c(%rip), %r8 ## literal pool symbol address: __D3std9algorithm24__T6reduceVAyaa3_612b62Z124__T6reduceTS3std9algorithm85__T9MapResultS633std10functional36__T8unaryFunVAyaa8_612e6c656e677468Z8unaryFunTAAyaZ9MapResultZ6reduceFNaNfS3std9algorithm85__T
00000001000014a4	movl	$0x2dd, %edi
00000001000014a9	movl	$0x3f, %esi
00000001000014ae	xorl	%ecx, %ecx
00000001000014b0	callq	0x10006e0a2 ## symbol stub for: __D3std9exception14__T7enforceTbZ7enforceFNaNfbLAxaAyamZb
00000001000014b5	movq	(%rbx), %r15
00000001000014b8	leaq	0x10(%rbx), %rsi
00000001000014bc	leaq	-0x1(%r14), %rdi
00000001000014c0	callq	0x10006df10 ## symbol stub for: __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
00000001000014c5	testb	$0x1, %al
00000001000014c7	jne	0x1000014fa
00000001000014c9	addq	$-0x2, %r14
00000001000014cd	addq	$0x20, %rbx
00000001000014d1	nopw	%cs:(%rax,%rax)
00000001000014e0	addq	-0x10(%rbx), %r15
00000001000014e4	movq	%r14, %rdi
00000001000014e7	movq	%rbx, %rsi
00000001000014ea	callq	0x10006df10 ## symbol stub for: __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
00000001000014ef	decq	%r14
00000001000014f2	addq	$0x10, %rbx
00000001000014f6	testb	$0x1, %al
00000001000014f8	je	0x1000014e0
00000001000014fa	movl	%r15d, %eax
00000001000014fd	popq	%rbx
00000001000014fe	popq	%r14
0000000100001500	popq	%r15
0000000100001502	ret

and for:

import std.algorithm, std.stdio;
int main(string[] args) {
  int r = 0;
  foreach (i; 0..args.length)
    r += args[i].length;
  return r;
}

__Dmain:
00000001000015c0	xorl	%eax, %eax
00000001000015c2	testq	%rdi, %rdi
00000001000015c5	je	0x1000015de
00000001000015c7	nopw	(%rax,%rax)
00000001000015d0	movl	%eax, %eax
00000001000015d2	addq	(%rsi), %rax
00000001000015d5	addq	$0x10, %rsi
00000001000015d9	decq	%rdi
00000001000015dc	jne	0x1000015d0
00000001000015de	ret

(and sorry, don't even bother looking at what dmd does...)

I'm not complaining about LDC here (although I'm surprised array.empty isn't inlined). The way ranges are formulated make them difficult to optimize. I think there's things we can do here in the library. Maybe I'll write up something about that at some point.

I think the takeaway here is that people should be aware of (a) what kind of instructions their code is generating, (b) what kind of instructions their code SHOULD be generating, and (c) what is practically possible for present-day compilers. Like you say, it helps to look at the assembled code once in a while to get a feel for this kind of thing. Modern compilers are good, but they aren't magic.

September 27, 2014
On Sat, Sep 27, 2014 at 09:59:17PM +0000, Peter Alexander via Digitalmars-d wrote:
> On Saturday, 27 September 2014 at 20:57:53 UTC, Walter Bright wrote:
> >From time to time, I take a break from bugs and enhancements and just look at what some piece of code is actually doing. Sometimes, I'm appalled.
> 
> Me too, and yes it can be appalling. It's pretty bad for even simple range chains, e.g.
> 
> import std.algorithm, std.stdio;
> int main(string[] args) {
>   return cast(int)args.map!("a.length").reduce!"a+b"();
> }

I vaguely recall somebody mentioning a while back that range-based code is poorly optimized because compilers weren't designed to recognize such patterns before. I wonder if there are ways for the compiler to recognize range primitives and apply special optimizations to them.

I do find, though, that gdc -O3 generally tends to do a pretty good job of reducing range-based code to near-minimal assembly. Sadly, dmd is changing too fast for gdc releases to catch up with the latest and greatest, so I haven't been using gdc very much recently. :-(


T

-- 
If Java had true garbage collection, most programs would delete themselves upon execution. -- Robert Sewell
September 27, 2014
What we're seeing here is pretty much the same problem that early c++ suffered from: abstraction penalty.  It took years of work to help overcome it, both from the compiler and the library.  Not having trivial functions inlined and optimized down through standard techniques like dead store elimination, value range propagation, various loop restructurings, etc means that code will look like what Walter and you have shown.  Given DMD's relatively weak inliner, I'm not shocked by Walter's example.  I am curious why ldc failed to inline those functions.

On 9/27/2014 2:59 PM, Peter Alexander via Digitalmars-d wrote:
> On Saturday, 27 September 2014 at 20:57:53 UTC, Walter Bright wrote:
>> From time to time, I take a break from bugs and enhancements and just
>> look at what some piece of code is actually doing. Sometimes, I'm
>> appalled.
>
> Me too, and yes it can be appalling. It's pretty bad for even simple
> range chains, e.g.
>
> import std.algorithm, std.stdio;
> int main(string[] args) {
>    return cast(int)args.map!("a.length").reduce!"a+b"();
> }
>
> Here's what LDC produces (with -O -inline -release -noboundscheck)
>
> __Dmain:
> 0000000100001480    pushq    %r15
> 0000000100001482    pushq    %r14
> 0000000100001484    pushq    %rbx
> 0000000100001485    movq    %rsi, %rbx
> 0000000100001488    movq    %rdi, %r14
> 000000010000148b    callq    0x10006df10 ## symbol stub for:
> __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
> 0000000100001490    xorb    $0x1, %al
> 0000000100001492    movzbl    %al, %r9d
> 0000000100001496    leaq    _.str12(%rip), %rdx ## literal pool for:
> "/Users/pja/ldc2-0.14.0-osx-x86_64/bin/../import/std/algorithm.d"
> 000000010000149d    movq    0xcbd2c(%rip), %r8 ## literal pool symbol
> address:
> __D3std9algorithm24__T6reduceVAyaa3_612b62Z124__T6reduceTS3std9algorithm85__T9MapResultS633std10functional36__T8unaryFunVAyaa8_612e6c656e677468Z8unaryFunTAAyaZ9MapResultZ6reduceFNaNfS3std9algorithm85__T
>
> 00000001000014a4    movl    $0x2dd, %edi
> 00000001000014a9    movl    $0x3f, %esi
> 00000001000014ae    xorl    %ecx, %ecx
> 00000001000014b0    callq    0x10006e0a2 ## symbol stub for:
> __D3std9exception14__T7enforceTbZ7enforceFNaNfbLAxaAyamZb
> 00000001000014b5    movq    (%rbx), %r15
> 00000001000014b8    leaq    0x10(%rbx), %rsi
> 00000001000014bc    leaq    -0x1(%r14), %rdi
> 00000001000014c0    callq    0x10006df10 ## symbol stub for:
> __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
> 00000001000014c5    testb    $0x1, %al
> 00000001000014c7    jne    0x1000014fa
> 00000001000014c9    addq    $-0x2, %r14
> 00000001000014cd    addq    $0x20, %rbx
> 00000001000014d1    nopw    %cs:(%rax,%rax)
> 00000001000014e0    addq    -0x10(%rbx), %r15
> 00000001000014e4    movq    %r14, %rdi
> 00000001000014e7    movq    %rbx, %rsi
> 00000001000014ea    callq    0x10006df10 ## symbol stub for:
> __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
> 00000001000014ef    decq    %r14
> 00000001000014f2    addq    $0x10, %rbx
> 00000001000014f6    testb    $0x1, %al
> 00000001000014f8    je    0x1000014e0
> 00000001000014fa    movl    %r15d, %eax
> 00000001000014fd    popq    %rbx
> 00000001000014fe    popq    %r14
> 0000000100001500    popq    %r15
> 0000000100001502    ret
>
> and for:
>
> import std.algorithm, std.stdio;
> int main(string[] args) {
>    int r = 0;
>    foreach (i; 0..args.length)
>      r += args[i].length;
>    return r;
> }
>
> __Dmain:
> 00000001000015c0    xorl    %eax, %eax
> 00000001000015c2    testq    %rdi, %rdi
> 00000001000015c5    je    0x1000015de
> 00000001000015c7    nopw    (%rax,%rax)
> 00000001000015d0    movl    %eax, %eax
> 00000001000015d2    addq    (%rsi), %rax
> 00000001000015d5    addq    $0x10, %rsi
> 00000001000015d9    decq    %rdi
> 00000001000015dc    jne    0x1000015d0
> 00000001000015de    ret
>
> (and sorry, don't even bother looking at what dmd does...)
>
> I'm not complaining about LDC here (although I'm surprised array.empty
> isn't inlined). The way ranges are formulated make them difficult to
> optimize. I think there's things we can do here in the library. Maybe
> I'll write up something about that at some point.
>
> I think the takeaway here is that people should be aware of (a) what
> kind of instructions their code is generating, (b) what kind of
> instructions their code SHOULD be generating, and (c) what is
> practically possible for present-day compilers. Like you say, it helps
> to look at the assembled code once in a while to get a feel for this
> kind of thing. Modern compilers are good, but they aren't magic.
September 27, 2014
On 9/27/2014 2:59 PM, Peter Alexander wrote:
> On Saturday, 27 September 2014 at 20:57:53 UTC, Walter Bright wrote:
>> From time to time, I take a break from bugs and enhancements and just look at
>> what some piece of code is actually doing. Sometimes, I'm appalled.
>
> Me too, and yes it can be appalling. It's pretty bad for even simple range
> chains, e.g.
>
> import std.algorithm, std.stdio;
> int main(string[] args) {
>    return cast(int)args.map!("a.length").reduce!"a+b"();
> }
>
> Here's what LDC produces (with -O -inline -release -noboundscheck)

Part of this particular case problem is not a compiler optimizer weakness, but that autodecode problem I've been throwing (!) chairs through windows on.

September 27, 2014
On Sat, Sep 27, 2014 at 03:26:35PM -0700, Walter Bright via Digitalmars-d wrote:
> On 9/27/2014 2:59 PM, Peter Alexander wrote:
> >On Saturday, 27 September 2014 at 20:57:53 UTC, Walter Bright wrote:
> >>From time to time, I take a break from bugs and enhancements and just look at what some piece of code is actually doing. Sometimes, I'm appalled.
> >
> >Me too, and yes it can be appalling. It's pretty bad for even simple range chains, e.g.
> >
> >import std.algorithm, std.stdio;
> >int main(string[] args) {
> >   return cast(int)args.map!("a.length").reduce!"a+b"();
> >}
> >
> >Here's what LDC produces (with -O -inline -release -noboundscheck)
> 
> Part of this particular case problem is not a compiler optimizer weakness, but that autodecode problem I've been throwing (!) chairs through windows on.

If we can get Andrei on board, I'm all for killing off autodecoding.


T

-- 
MAS = Mana Ada Sistem?
September 27, 2014
Walter Bright:

>> import std.algorithm, std.stdio;
>> int main(string[] args) {
>>   return cast(int)args.map!("a.length").reduce!"a+b"();
>> }
>>
>> Here's what LDC produces (with -O -inline -release -noboundscheck)
>
> Part of this particular case problem is not a compiler optimizer weakness, but that autodecode problem I've been throwing (!) chairs through windows on.

There is no char auto decoding in this program, right?

Note: in Phobos now we have std.algorithm.sum, that is better than reduce!"a+b"().

Bye,
bearophile
September 27, 2014
On 9/27/2014 3:26 PM, Brad Roberts via Digitalmars-d wrote:
> What we're seeing here is pretty much the same problem that early c++ suffered
> from: abstraction penalty.  It took years of work to help overcome it, both from
> the compiler and the library.  Not having trivial functions inlined and
> optimized down through standard techniques like dead store elimination, value
> range propagation, various loop restructurings, etc means that code will look
> like what Walter and you have shown.  Given DMD's relatively weak inliner, I'm
> not shocked by Walter's example.  I am curious why ldc failed to inline those
> functions.

Again, this accumulation of barnacles is not a failure of the optimizer. It's a failure of adding gee-gaws to the source code without checking their effect.

September 27, 2014
H. S. Teoh:

> If we can get Andrei on board, I'm all for killing off autodecoding.

Killing auto-decoding for std.algorithm functions will break most of my D2 code... perhaps we can do that in a D3 language.

Bye,
bearophile
September 27, 2014
On 9/27/2014 3:54 PM, Walter Bright via Digitalmars-d wrote:
> On 9/27/2014 3:26 PM, Brad Roberts via Digitalmars-d wrote:
>> What we're seeing here is pretty much the same problem that early c++
>> suffered
>> from: abstraction penalty.  It took years of work to help overcome it,
>> both from
>> the compiler and the library.  Not having trivial functions inlined and
>> optimized down through standard techniques like dead store
>> elimination, value
>> range propagation, various loop restructurings, etc means that code
>> will look
>> like what Walter and you have shown.  Given DMD's relatively weak
>> inliner, I'm
>> not shocked by Walter's example.  I am curious why ldc failed to
>> inline those
>> functions.
>
> Again, this accumulation of barnacles is not a failure of the optimizer.
> It's a failure of adding gee-gaws to the source code without checking
> their effect.

Look at Peter's example, it's better for this, I believe.  Why isn't empty being inlined?  That's a tiny little function with a lot of impact.

Of course there's more than just optimization, but it's a big player in the game too.
« First   ‹ Prev
1 2 3 4 5 6 7 8