June 24, 2013 Re: top time wasters in DMD, as reported by gprof
Posted in reply to Richard Webb
On 24.06.2013 18:15, Richard Webb wrote:
> On Monday, 24 June 2013 at 15:04:37 UTC, dennis luehring wrote:
>>
>> I know. My question was: how does that look using MSVC?
>>
>
>
> I just did a very quick test using the latest DMD source:
>
> Using the command line
>
> -release -unittest -c D:\DTesting\dmd.2.063\src\phobos\std\algorithm.d
>
>
> DMD built with DMC takes ~49 seconds to complete, but DMD built
> with VC2008 only takes ~12 seconds. (Need to get a proper VC
> build done to test it properly).
> Looks like the DMC build spends far more time allocating memory,
> even though the peak memory usage is only slightly lower in the
> VS version?
>
I got similar speed improvements using VS2010.

June 25, 2013 Re: top time wasters in DMD, as reported by gprof - VS2010/VTune results
Posted in reply to Richard Webb, with attachments
On 24.06.2013 18:15, Richard Webb wrote:
> DMD built with DMC takes ~49 seconds to complete, but DMD built
> with VC2008 only takes ~12 seconds. (Need to get a proper VC
> build done to test it properly).
> Looks like the DMC build spends far more time allocating memory,
> even though the peak memory usage is only slightly lower in the
> VS version?
I've done VS2012 + Intel VTune Amplifier XE 2013 profiling; see the attached zipped CSV file.

June 25, 2013 Re: top time wasters in DMD, as reported by gprof - VS2010/VTune results
Posted in reply to dennis luehring
On 25.06.2013 07:51, dennis luehring wrote:
> On 24.06.2013 18:15, Richard Webb wrote:
>> DMD built with DMC takes ~49 seconds to complete, but DMD built
>> with VC2008 only takes ~12 seconds. (Need to get a proper VC
>> build done to test it properly).
>> Looks like the DMC build spends far more time allocating memory,
>> even though the peak memory usage is only slightly lower in the
>> VS version?
>
> I've done VS2012 + Intel VTune Amplifier XE 2013 profiling; see the attached
> zipped CSV file.
Sorry, it was VS2010 (and the VTune trial).
The VTune results seem to differ from Walter's.

June 25, 2013 Re: top time wasters in DMD, as reported by gprof - VS2010/VTune results
Posted in reply to dennis luehring
On 25.06.2013 07:51, dennis luehring wrote:
> On 24.06.2013 18:15, Richard Webb wrote:
>> DMD built with DMC takes ~49 seconds to complete, but DMD built
>> with VC2008 only takes ~12 seconds. (Need to get a proper VC
>> build done to test it properly).
>> Looks like the DMC build spends far more time allocating memory,
>> even though the peak memory usage is only slightly lower in the
>> VS version?
>
> I've done VS2012 + Intel VTune Amplifier XE 2013 profiling; see the attached
> zipped CSV file.
>

The AMD CodeXL results are also different; both VTune and CodeXL are fully integrated into VS2010 and use the "same" settings.

BTW, nice to read: http://www.yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html

June 25, 2013 Re: top time wasters in DMD, as reported by gprof
Posted in reply to Martin Nowak
On Mon, 24 Jun 2013 21:01:36 +0200, Martin Nowak <code@dawg.eu> wrote:
> On 06/24/2013 08:43 PM, Martin Nowak wrote:
> >
> > I can try to install kernel debuginfo; that 12% might contain some useful information.
>
> http://codepad.org/gWrGvm40

Interesting. So, to troll a bit: do I see it right that DMD is mostly a Unicode conversion and memory allocation tool?

--
Marco

June 25, 2013 Re: top time wasters in DMD, as reported by gprof
Posted in reply to Marco Leise
On 25 June 2013 07:46, Marco Leise <Marco.Leise@gmx.de> wrote:
> On Mon, 24 Jun 2013 21:01:36 +0200,
> Martin Nowak <code@dawg.eu> wrote:
>
>> On 06/24/2013 08:43 PM, Martin Nowak wrote:
>> >
>> > I can try to install kernel debuginfo; that 12% might contain some useful information.
>>
>> http://codepad.org/gWrGvm40
>
> Interesting. So, to troll a bit: do I see it right that DMD is mostly a Unicode conversion and memory allocation tool?
>
The D front end does nothing *but* allocate memory... and sometimes from all this allocation (if your computer doesn't die) a compiled program is produced.
--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';

June 25, 2013 Re: top time wasters in DMD, as reported by gprof - VS2010/VTune results
Posted in reply to dennis luehring
On Tuesday, 25 June 2013 at 06:21:09 UTC, dennis luehring wrote:
> On 25.06.2013 07:51, dennis luehring wrote:
>> On 24.06.2013 18:15, Richard Webb wrote:
>>> DMD built with DMC takes ~49 seconds to complete, but DMD built
>>> with VC2008 only takes ~12 seconds. (Need to get a proper VC
>>> build done to test it properly).
>>> Looks like the DMC build spends far more time allocating memory,
>>> even though the peak memory usage is only slightly lower in the
>>> VS version?
>>
>> I've done VS2012 + Intel VTune Amplifier XE 2013 profiling; see the attached
>> zipped CSV file.
>
> The AMD CodeXL results are also different; both VTune and CodeXL are fully integrated into VS2010 and use the "same" settings.
>
> BTW, nice to read: http://www.yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html

In my experience, gprof tends to be pretty useless for actual profiling. I think the best approach is to use a sampling profiler such as 'perf' (part of the Linux kernel project; on a recent Debian/Ubuntu/Mint, type 'perf' into a console to find out which package to install; docs at https://perf.wiki.kernel.org/index.php/Tutorial), 'oprofile' (pretty much the same feature set as perf, sometimes hard to set up) or VTune, mentioned here. Never expect gprof to give you reliable data on how much time each function takes.

Callgrind/KCachegrind is also pretty good if your code doesn't spend a lot of time on I/O, system calls, etc. (the main code runs in a slow VM, so anything not running in that VM will seem to run much faster).

Furthermore, _none_ of these requires compiling with special flags. As for debug symbols, it's best to enable them together with optimizations. Optimizations are not a big issue: even if some functions get inlined, these tools give you per-line and per-instruction results. Not to mention cache hits/misses, branches, branch mispredictions and, if you use CPU-specific event IDs, whatever else your CPU can record. AND it doesn't measurably affect the performance of the profiled code unless you set an insanely high sample rate. And if this sounds difficult to configure, most of these tools (perf at the very least) have sane defaults that give way more useful results than gprof.

TLDR: gprof is horrible. Never use it for profiling. There are approximately 5 billion better tools that give more detailed results _and_ are easier to use.

I seriously need to write a blog post/article about this.

June 25, 2013 Re: top time wasters in DMD, as reported by gprof
Posted in reply to Iain Buclaw
On 6/25/13 2:13 AM, Iain Buclaw wrote:
> On 25 June 2013 07:46, Marco Leise <Marco.Leise@gmx.de> wrote:
>> On Mon, 24 Jun 2013 21:01:36 +0200,
>> Martin Nowak <code@dawg.eu> wrote:
>>
>>> On 06/24/2013 08:43 PM, Martin Nowak wrote:
>>>>
>>>> I can try to install kernel debuginfo; that 12% might contain some useful
>>>> information.
>>>
>>> http://codepad.org/gWrGvm40
>>
>> Interesting. So, to troll a bit: do I see it right that DMD is
>> mostly a Unicode conversion and memory allocation tool?
>>
>
> The D front end does nothing *but* allocate memory... and sometimes
> from all this allocation (if your computer doesn't die) a compiled
> program is produced.
Then maybe it should use its own malloc that takes the bump-the-pointer approach, carving allocations out of large chunks obtained from malloc.
Andrei
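Andrei's suggestion could be sketched roughly as follows in C++ (DMD itself was still written in C++ at the time). This is a minimal sketch under stated assumptions, not DMD's allocator: the class name `BumpAllocator`, the 1 MiB default chunk size and the rule that an oversized request gets a chunk of its own are all illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical "bump-the-pointer" (arena) allocator: it obtains large chunks
// from std::malloc and serves each request by advancing an offset inside the
// current chunk. Individual objects are never freed; every chunk is released
// at once when the allocator is destroyed.
class BumpAllocator {
public:
    explicit BumpAllocator(std::size_t chunkSize = 1 << 20)  // 1 MiB: an assumption
        : chunkSize_(chunkSize) {}

    BumpAllocator(const BumpAllocator&) = delete;
    BumpAllocator& operator=(const BumpAllocator&) = delete;

    ~BumpAllocator() {
        for (void* chunk : chunks_) std::free(chunk);
    }

    // align must be a power of two no larger than alignof(std::max_align_t),
    // since malloc only guarantees that alignment for each chunk start.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0 &&
               align <= alignof(std::max_align_t));
        // Round the bump offset up to the requested alignment.
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (current_ == nullptr || offset + size > capacity_) {
            // Chunk exhausted: start a fresh one. Oversized requests get a
            // dedicated chunk; the unused tail of the old chunk is wasted.
            std::size_t cap = size > chunkSize_ ? size : chunkSize_;
            chunks_.push_back(nullptr);  // grow bookkeeping first, before malloc
            void* mem = std::malloc(cap);
            if (mem == nullptr) {
                chunks_.pop_back();
                throw std::bad_alloc();
            }
            chunks_.back() = mem;
            current_ = static_cast<char*>(mem);
            capacity_ = cap;
            offset = 0;
        }
        used_ = offset + size;
        return current_ + offset;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::size_t chunkSize_;
    std::vector<void*> chunks_;
    char* current_ = nullptr;
    std::size_t used_ = 0;
    std::size_t capacity_ = 0;
};
```

Objects would then be constructed in place, e.g. `new (arena.allocate(sizeof(Node), alignof(Node))) Node(...)` for a hypothetical `Node` type. Their destructors never run, so the scheme suits the many small, long-lived, trivially destructible objects a compiler front end tends to create; the trade-off is that no memory is reclaimed until the whole arena goes away, and each allocation costs only a rounding step, a comparison and an addition in the common case.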
June 25, 2013 Re: top time wasters in DMD, as reported by gprof
Posted in reply to Marco Leise
On Tuesday, 25 June 2013 at 06:46:54 UTC, Marco Leise wrote:
> On Mon, 24 Jun 2013 21:01:36 +0200,
> Martin Nowak <code@dawg.eu> wrote:
>
>> On 06/24/2013 08:43 PM, Martin Nowak wrote:
>> >
>> > I can try to install kernel debuginfo; that 12% might contain some useful
>> > information.
>>
>> http://codepad.org/gWrGvm40
>
> Interesting. So, to troll a bit: do I see it right that DMD is
> mostly a Unicode conversion and memory allocation tool?
Maybe not DMD so much as GNU ld? Whatever the case, I'm not surprised to see GNU's iconv that high up in the list; in my experience, it's horrifically slow.
Ah, and a fun question: does that number change significantly when you modify your locale variables? (I think it should be enough to just export LC_CTYPE="C".)
-Wyatt

June 25, 2013 Re: top time wasters in DMD, as reported by gprof
Posted in reply to Andrei Alexandrescu
On 25 June 2013 17:56, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> On 6/25/13 2:13 AM, Iain Buclaw wrote:
>> On 25 June 2013 07:46, Marco Leise <Marco.Leise@gmx.de> wrote:
>>> On Mon, 24 Jun 2013 21:01:36 +0200,
>>> Martin Nowak <code@dawg.eu> wrote:
>>>
>>>> On 06/24/2013 08:43 PM, Martin Nowak wrote:
>>>>>
>>>>> I can try to install kernel debuginfo; that 12% might contain some
>>>>> useful information.
>>>>
>>>> http://codepad.org/gWrGvm40
>>>
>>> Interesting. So, to troll a bit: do I see it right that DMD is
>>> mostly a Unicode conversion and memory allocation tool?
>>
>> The D front end does nothing *but* allocate memory... and sometimes
>> from all this allocation (if your computer doesn't die) a compiled
>> program is produced.
>
> Then maybe it should use its own malloc that takes the bump-the-pointer
> approach, carving allocations out of large chunks obtained from malloc.

I meant it in the most light-hearted way possible. Still, it's no secret that heavily templated code coupled with string mixins can make memory allocations soar (e.g. Neat is a good example: IIRC it consumes at least 3GB of memory to compile one module). I do believe feep preloads the boehm-gc when compiling to mitigate this. :o)

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';

Copyright © 1999-2021 by the D Language Foundation