July 26, 2015
On Sunday, 26 July 2015 at 14:36:09 UTC, Johan Holmberg wrote:
> C++ with <stdio.h> style IO:   0.40s
> C++ with <fstream> style IO:   0.31s
> D 2.067:                       1.75s
> D 2.068 beta 2:                0.69s
> Perl:                          1.49s
> Python:                        1.86s
>
> So on Ubuntu, the C++ <fstream> version was clearly best. And the improvement in DMD 2.068 beta was "only" a factor of 2.5 over 2.067.
>
> /johan

It would be better to compare with LDC or GDC to match the same backend as C++. That is a little harder since they don't have 2.068 yet.
July 27, 2015
On 07/26/2015 09:04 PM, Jesse Phillips wrote:
> 
> It would be better to compare with LDC or GDC to match the same backend as C++. That is a little harder since they don't have 2.068 yet.

Reading a file is IO- and memcpy-limited; it has nothing to do with compiler optimizations. Clearly we must be doing some unnecessary copying or allocating.
July 27, 2015
Are you including program startup and exit in the timing? For comparison, can you include the timings of an empty do-nothing program in all the languages?
July 27, 2015
On Sun, Jul 26, 2015 at 5:36 PM, Andrei Alexandrescu via Digitalmars-d < digitalmars-d@puremagic.com> wrote:

> On 7/26/15 10:35 AM, Johan Holmberg via Digitalmars-d wrote:
>
>>
>> On Sat, Jul 25, 2015 at 10:12 PM, Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>
>>     On 7/25/15 1:53 PM, Johan Holmberg via Digitalmars-d wrote:
>> [...]
>>         I downloaded a dmd 2.068 beta and retried with my input file:
>>         now the D program takes 1.6s (a 10x improvement).
>>
>>     Great, though it still seems to be behind the C++ version, which is
>>     a bummer. -- Andrei
>> [... linux numbers removed ...]
>>
>
> I think we should investigate this and bring performance to par. Anyone interested? -- Andrei
>


Back on MacOS again, I thought I should try to run "Instruments" on my program. I'm not familiar with the DMD source code, but I did the following:

- downloaded the DMD source from Github + built it
- rebuilt my program with this dmd
- used Instruments (the MacOS profiler) on my program

Two things showed up in Instruments that seemed suspicious, both in "stdio.d":

1) calls to "__tls_get_addr" inside "readlnImpl" (taking 0.25s out of the
total 1.69s according to Instruments). I added "__gshared" to the static
variables "lineptr" and "n" to see if it had any effect (see below for
results).

2) calls to "std.algorithm.endsWith" inside File.ByLine.Impl.popFront (taking 0.10s according to Instruments). I replaced it with a simpler test using inline code.

The timings running my program normally (not using Instruments now), became as follows with the different versions of dmd:

dmd unmodified: 1.59s
dmd with change 1): 1.33s
dmd with change 1+2): 1.22s
C++ using <stdio.h>: 1.13s    (for comparison)

My changes to dmd are of course not correct, but my program still works as before at least. If 1) and 2) could be changed "the right way" the difference to the C++ program would be much smaller on MacOS (I haven't looked further into the Linux results).
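
To picture the two changes, here is a rough sketch in C++ rather than the actual Phobos code. It assumes that a D "static" variable is thread-local by default and that "__gshared" turns it into a plain process-wide global, much like an ordinary C++ global. All names below (tls_bytes_seen, shared_bytes_seen, ends_with_generic, ends_with_char) are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Change 1, pictured in C++: a thread-local variable versus a plain global.
// Each access to the thread-local one may go through a TLS lookup (the kind
// of __tls_get_addr call seen in the profile). The plain global avoids that,
// but it is shared by all threads and is not thread safe.
thread_local std::size_t tls_bytes_seen = 0;  // roughly a default D static
std::size_t shared_bytes_seen = 0;            // roughly a D __gshared static

std::size_t count_with_tls(const std::string& line) {
    for (std::size_t i = 0; i < line.size(); ++i) ++tls_bytes_seen;
    return tls_bytes_seen;
}

std::size_t count_with_shared(const std::string& line) {
    for (std::size_t i = 0; i < line.size(); ++i) ++shared_bytes_seen;
    return shared_bytes_seen;
}

// Change 2, pictured in C++: a generic suffix test (standing in for a
// library call such as endsWith) versus an inline one-character check.
bool ends_with_generic(const std::string& line, const std::string& terminator) {
    assert(!terminator.empty());  // an empty terminator would match trivially
    return line.size() >= terminator.size() &&
           line.compare(line.size() - terminator.size(),
                        terminator.size(), terminator) == 0;
}

bool ends_with_char(const std::string& line, char terminator) {
    return !line.empty() && line.back() == terminator;
}
```

For a single-character terminator both checks give the same answer; the second one just skips the general machinery. Whether either difference is measurable depends on the compiler and platform.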

Does this help us move forward?

/johan


July 27, 2015
On Monday, 27 July 2015 at 12:03:40 UTC, Johan Holmberg wrote:
> On Sun, Jul 26, 2015 at 5:36 PM, Andrei Alexandrescu via Digitalmars-d < digitalmars-d@puremagic.com> wrote:
>
>>[...]
>
>
> Back on MacOS again, I thought I should try to run "Instruments" on my program. I'm not familiar with the DMD source code, but I did the following:
>
> [...]

IIRC D's TLS is particularly slow on OS X.
July 27, 2015
On Mon, Jul 27, 2015 at 11:03 AM, via Digitalmars-d < digitalmars-d@puremagic.com> wrote:

> Are you including program startup and exit in the timing? For comparison, can you include the timings of an empty do-nothing program in all the languages?
>

Yes, I measure the whole program. But these startup/exit times are really small. Reading /dev/null takes 0.003s in both C++ and D, and 0.007s in Perl. "Nothing" compared to the other times.
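
Another way to take startup and exit out of the picture is to time only the read loop from inside the program. A minimal C++ sketch of that idea follows; it is not the benchmark program from this thread, and the names ReadTiming and time_line_read are made up for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>

// Result of one timed pass over a file.
struct ReadTiming {
    std::size_t lines;  // number of lines read
    double seconds;     // wall-clock time spent opening and reading
};

// Reads the file line by line and measures only that work, so process
// startup and exit are excluded from the reported time.
ReadTiming time_line_read(const std::string& path) {
    assert(!path.empty());
    const auto start = std::chrono::steady_clock::now();

    std::ifstream in(path);
    std::string line;
    std::size_t lines = 0;
    while (std::getline(in, line)) ++lines;

    const auto stop = std::chrono::steady_clock::now();
    return {lines, std::chrono::duration<double>(stop - start).count()};
}
```

Calling time_line_read("/dev/null") gives zero lines and a time that shows the fixed cost of opening a stream, which can then be compared with the time for the real input file.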

/johan


July 27, 2015
Martin Nowak <code+news.digitalmars@dawg.eu> wrote:
> On 07/26/2015 09:04 PM, Jesse Phillips wrote:
>> 
>> It would be better to compare with LDC or GDC to match the same backend as C++. That is a little harder since they don't have 2.068 yet.
> 
> Reading a file is IO- and memcpy-limited; it has nothing to do with compiler optimizations. Clearly we must be doing some unnecessary copying or allocating.

Or too many syscalls because of non-optimal buffering?
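
One way to test the buffering theory from the C++ <stdio.h> side is to vary the stdio buffer size and watch the timing (or count the read calls with a syscall tracer). The sketch below assumes a plain line-counting loop, not the actual benchmark; count_lines is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Counts newline characters in a file using a caller-chosen stdio buffer
// size. A larger buffer generally means fewer read system calls for the
// same amount of input. Returns -1 if the file cannot be opened.
long count_lines(const char* path, std::size_t buffer_size) {
    assert(buffer_size > 0);
    std::FILE* in = std::fopen(path, "r");
    if (in == nullptr) return -1;

    // setvbuf has to be called before any read on the stream. Passing
    // nullptr lets the C library allocate the buffer; the size is a request.
    std::setvbuf(in, nullptr, _IOFBF, buffer_size);

    long lines = 0;
    int c;
    while ((c = std::fgetc(in)) != EOF) {
        if (c == '\n') ++lines;
    }
    std::fclose(in);
    return lines;
}
```

Running it with, say, 4 KiB and then 1 MiB on the same file would show whether the number of read calls matters for this workload.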

Tobi
July 27, 2015
On Monday, 27 July 2015 at 08:52:07 UTC, Martin Nowak wrote:
> On 07/26/2015 09:04 PM, Jesse Phillips wrote:
>> 
>> It would be better to compare with LDC or GDC to match the same backend as C++. That is a little harder since they don't have 2.068 yet.
>
> Reading a file is IO- and memcpy-limited; it has nothing to do with compiler optimizations. Clearly we must be doing some unnecessary copying or allocating.

Unless the code being exercised is nothing more than a read system call and a call to memcpy, I'll stick with the notion that the backends may have something to do with it. Testing with the same backend would settle that.
July 29, 2015
On 2015-07-27 14:03, Johan Holmberg via Digitalmars-d wrote:

> Back on MacOS again, I thought I should try to run "Instruments" on my
> program. I'm not familiar with the DMD source code, but I did the following:
>
> - downloaded the DMD source from Github + built it
> - rebuilt my program with this dmd
> - used Instruments (the MacOS profiler) on my program
>
> Two things showed up in Instruments that seemed suspicious, both in
> "stdio.d":
>
> 1) calls to "__tls_get_addr" inside "readlnImpl" (taking 0.25s out of the
> total 1.69s according to Instruments). I added "__gshared" to the static
> variables "lineptr" and "n" to see if it had any effect (see below for
> results).
>
> 2) calls to "std.algorithm.endsWith" inside File.ByLine.Impl.popFront
> (taking 0.10s according to Instruments). I replaced it with a simpler
> test using inline code.
>
> The timings running my program normally (not using Instruments now),
> became as follows with the different versions of dmd:
>
> dmd unmodified: 1.59s
> dmd with change 1): 1.33s
> dmd with change 1+2): 1.22s
> C++ using <stdio.h>: 1.13s    (for comparison)

I recommend you also try using LDC. It has a better optimizer and is using native TLS on OS X.

-- 
/Jacob Carlborg
July 29, 2015
On Wed, Jul 29, 2015 at 11:47 AM, Jacob Carlborg via Digitalmars-d < digitalmars-d@puremagic.com> wrote:

> On 2015-07-27 14:03, Johan Holmberg via Digitalmars-d wrote:
>
>> The timings running my program normally (not using Instruments now), became as follows with the different versions of dmd:
>>
>> dmd unmodified: 1.59s
>> dmd with change 1): 1.33s
>> dmd with change 1+2): 1.22s
>> C++ using <stdio.h>: 1.13s    (for comparison)
>>
>
> I recommend you also try using LDC. It has a better optimizer and is using
> native TLS on OS X.
> /Jacob Carlborg
>


Is there an LDC that incorporates the changes coming in DMD 2.068 that made my code run 10x faster compared with 2.067? (The ones Andrei talked about in the StackOverflow link given earlier in this thread: https://github.com/D-Programming-Language/phobos/pull/3089 ).

I have tried "ldc2-0.15.2-beta1-osx-x86_64" and also built LDC from the Git-archive sources. In both cases I get times around 13s. This is close to my original "bad" numbers from DMD 2.067 (15s).

I assume I have to wait until there is an LDC using the same Phobos version as DMD 2.068 uses.

/johan