New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc() (page 3)

On Tuesday, 6 August 2013 at 17:48:57 UTC, Walter Bright wrote:
> On 8/6/2013 5:13 AM, Richard Webb wrote:
>> It's possible that other library routines are causing some of the remaining
>> difference from the MSVC build (e.g. the profiler suggests that the DMC build
>> spends somewhat more time inside memcpy than the MSVC build).
>>
>> Not sure if it's down to implementation or optimization though - might be down
>> to intrinsics/inlining and such? (the profile for the DMC build says it's using
>> ~1% of its time inside strlen and the profile for the MSVC build doesn't mention
>> it at all, which i guess is because it's using an intrinsic version).
>
>
> If it's inlined then it won't show up in the profile. And yes, it's possible MSVC has a faster memcpy(). After all, enormous effort has been poured into memcpy().

If you use a profiler with line or instruction granularity
(like perf on Linux), it will show up. On Windows, that would probably
be VTune and CodeAnalyst.

August 06, 2013

Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

Posted by Kiith-Sa
in reply to Kiith-Sa


On Tuesday, 6 August 2013 at 18:38:43 UTC, Kiith-Sa wrote:
> On Tuesday, 6 August 2013 at 17:48:57 UTC, Walter Bright wrote:
>> On 8/6/2013 5:13 AM, Richard Webb wrote:
>>> It's possible that other library routines are causing some of the remaining
>>> difference from the MSVC build (e.g. the profiler suggests that the DMC build
>>> spends somewhat more time inside memcpy than the MSVC build).
>>>
>>> Not sure if it's down to implementation or optimization though - might be down
>>> to intrinsics/inlining and such? (the profile for the DMC build says it's using
>>> ~1% of its time inside strlen and the profile for the MSVC build doesn't mention
>>> it at all, which i guess is because it's using an intrinsic version).
>>
>>
>> If it's inlined then it won't show up in the profile. And yes, it's possible MSVC has a faster memcpy(). After all, enormous effort has been poured into memcpy().
>
> If you use a profiler with line or instruction granularity
> (like perf on Linux), it will show up. On Windows, that would probably
> be VTune and CodeAnalyst.

(obviously, as a part of the function it was inlined into,
but you'll get the time consumed at lines/instructions from the inlined function)

On Saturday, August 03, 2013 14:55:29 Walter Bright wrote:
> The execrable existing implementation was scrapped, and the new one uses
> Windows HeapAlloc().
>
> http://ftp.digitalmars.com/snn.lib
>
> This is for testing porpoises, and of course for those that Feel Da Need For Speed.

But what if I prefer to test dolphins? ;)

- Jonathan M Davis

P.S. So long, and thanks for all the fish.

On 8/3/2013 3:28 PM, Jonathan M Davis wrote:
> On Saturday, August 03, 2013 14:55:29 Walter Bright wrote:
>> This is for testing porpoises, and of course for those that Feel Da Need For
>> Speed.
>
> But what if I prefer to test dolphins? ;)

They all look alike anyway, what's the difference?
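
For anyone wondering what "the new one uses Windows HeapAlloc()" might look like in practice, here is a minimal sketch. It is not the snn.lib source, which the thread does not show. The names sketch_malloc and sketch_free are hypothetical, using the process default heap with no flags is an assumption, and the non-Windows branch only exists so the sketch compiles elsewhere.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical names: sketch_malloc/sketch_free are illustrative only,
   not the functions shipped in snn.lib. */
static void *sketch_malloc(size_t size)
{
#ifdef _WIN32
    /* Assumption: forward to the process default heap with no flags.
       Without HEAP_GENERATE_EXCEPTIONS, HeapAlloc returns NULL on failure,
       which matches what callers of malloc() expect. */
    return HeapAlloc(GetProcessHeap(), 0, size);
#else
    /* Fallback so the sketch builds on non-Windows systems. */
    return malloc(size);
#endif
}

static void sketch_free(void *p)
{
    /* free(NULL) must be a no-op, so check before touching the heap API. */
    if (p == NULL)
        return;
#ifdef _WIN32
    HeapFree(GetProcessHeap(), 0, p);
#else
    free(p);
#endif
}
```

Calling sketch_malloc(16) and later passing the result to sketch_free() is the whole usage. A real replacement would also need realloc() and calloc() counterparts (HeapReAlloc and the HEAP_ZERO_MEMORY flag are the obvious candidates), which this sketch leaves out.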
