April 15, 2012
On Sunday, April 15, 2012 04:21:09 Joseph Rushton Wakeling wrote:
> On 14/04/12 23:03, q66 wrote:
> > He also uses a class. And -noboundscheck should be automatically induced
> > by
> > -release.
> 
> ... but the methods are marked as final -- shouldn't that substantially reduce any speed hit from using class instead of struct?

In theory. If they don't override anything, then that signals to the compiler that they don't need to be virtual, in which case, they _shouldn't_ be virtual, but that's up to the compiler to optimize, and I don't know how good it is about that right now. Certainly, if you had code like

class C
{
    final int foo() { return 42;}
}

and benchmarking showed that it was the same speed as

class C
{
    int foo() { return 42;}
}

when compiled with -O and -inline, then I'd submit a bug report (maybe an enhancement request?) on the compiler failing to make final functions non-virtual.

Also, if the function is doing enough work, then whether it's virtual or not really doesn't make any difference, because the function itself costs so much more than the extra cost of the virtual function call.
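For anyone who wants to check, a minimal benchmark sketch along these lines (the class names and loop count are made up for illustration; StopWatch lives in std.datetime.stopwatch in current Phobos):

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

class Virt { int foo() { return 42; } }        // virtual by default
class Fin  { final int foo() { return 42; } }  // final: eligible for a direct call

void main()
{
    enum N = 50_000_000;
    auto v = new Virt;
    auto f = new Fin;

    auto sw = StopWatch(AutoStart.yes);
    long sum1;
    foreach (i; 0 .. N) sum1 += v.foo();  // virtual dispatch each iteration
    immutable tVirt = sw.peek();

    sw.reset();
    long sum2;
    foreach (i; 0 .. N) sum2 += f.foo();  // candidate for direct call/inlining
    immutable tFin = sw.peek();

    writefln("virtual: %s  final: %s  (checksums: %s %s)", tVirt, tFin, sum1, sum2);
}
```

Compile both with -O -inline -release. Note that because the exact types are visible here, a sufficiently smart optimizer could devirtualize the "virtual" loop anyway, so the result needs care in interpreting.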

- Jonathan M Davis
April 15, 2012
On Saturday, April 14, 2012 19:31:40 Jonathan M Davis wrote:
> On Sunday, April 15, 2012 04:21:09 Joseph Rushton Wakeling wrote:
> > On 14/04/12 23:03, q66 wrote:
> > > He also uses a class. And -noboundscheck should be automatically induced
> > > by
> > > -release.
> > 
> > ... but the methods are marked as final -- shouldn't that substantially reduce any speed hit from using class instead of struct?
> 
> In theory. If they don't override anything, then that signals to the compiler that they don't need to be virtual, in which case, they _shouldn't_ be virtual, but that's up to the compiler to optimize, and I don't know how good it is about that right now. Certainly, if you had code like
> 
> class C
> {
>     final int foo() { return 42;}
> }
> 
> and benchmarking showed that it was the same speed as
> 
> class C
> {
>     int foo() { return 42;}
> }
> 
> when compiled with -O and -inline, then I'd submit a bug report (maybe an enhancement request?) on the compiler failing to make final functions non-virtual.

Actually, if you try and benchmark it, make sure that the code can't know that the reference is exactly a C. In theory, the compiler could be smart enough to know in a case such as

auto c = new C;
auto a = c.foo();

that c is exactly a C and that therefore, it can just inline the call to foo even if it's virtual. If c is set from another function, it can't do that. e.g.

auto c = bar();
auto a = c.foo();

The compiler _probably_ isn't that smart, but it might be, so you'd have to be careful about that.
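A sketch of that precaution (bar() and the module layout are hypothetical): keep the construction behind a function call, ideally in a separately compiled module, so the call site only knows the static type:

```d
import std.stdio : writeln;

class C
{
    int foo() { return 42; }  // virtual: no 'final'
}

// The call site below only sees that bar() returns a C; the object could in
// principle be an instance of a subclass, so c.foo() can't be devirtualized
// by exact-type analysis alone. Moving bar() into a separately compiled
// module makes this more robust against whole-program optimization.
C bar()
{
    return new C;
}

void main()
{
    auto c = bar();
    auto a = c.foo();
    writeln(a);  // 42
}
```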

- Jonathan M Davis
April 15, 2012
On Sunday, 15 April 2012 at 02:20:34 UTC, Joseph Rushton Wakeling wrote:
> On 14/04/12 23:03, q66 wrote:
>> He also uses a class. And -noboundscheck should be automatically induced by
>> -release.
>
> Ahh, THAT probably explains why some of my numerical code is so markedly different in speed when compiled using DMD with or without the -release switch.
It's a MAJOR difference -- between code taking, say, 5 minutes to run, compared to half an hour or more.

I know this isn't what your post was about, but you really should compile numerical code with GDC instead of DMD if you care about performance. It generates much faster floating point code.
April 15, 2012
On 15/04/12 04:37, jerro wrote:
> I know this isn't what your post was about, but you really should compile
> numerical code with GDC instead of DMD if you care about performance. It
> generates much faster floating point code.

It's exactly what I do. :-)
April 15, 2012
> On Saturday, 14 April 2012 at 19:51:21 UTC, Joseph Rushton Wakeling wrote:
>> GDC has all the regular gcc optimization flags available IIRC. The ones on the
>> GDC man page are just the ones specific to GDC.
> I'm not talking about compiler flags, but the "inline" keyword in the C++ source
> code. I saw some discussion about "@inline" but it seems not implemented (yet?).
> Well, that is not a priority for D anyway.
>
> About compiler optimizations, -finline-functions and -fweb are part of -O3. I
> tried to compile with -no-bounds-check, but it made no difference for DMD and GDC.
> It probably is part of -release as q66 said.

Ah yes, you're right.  I do wonder if your seeming speed differences are magnified because the whole operation is only 2-4 seconds long: if your algorithm were operating over a longer timeframe I think you'd likely find the relative speed differences decrease.  (I have a memory of a program that took ~0.004s with a C/C++ version and 1s with D, and the difference seemed to be just startup time for the D program.)

What really amazes me is the difference between g++, DMD and GDC in size of the executable binary.  100 orders of magnitude!

3 remarks about the D code.  One is that much of it still seems very "C-ish"; I'd be interested to see how speed and executable size differ if things like the file opening, or the reading of characters, are done with more idiomatic D code.  Sounds stupid as the C stuff should be fastest, but I've been surprised sometimes at how using idiomatic D formulations can improve things.

Second remark is more of a query -- can Predictor.p() and .update really be marked as pure?  Their result for a given input actually varies depending on the current values of cxt and ct, which are modified outside of function scope.

Third remark -- again a query -- why the GC.disable ... ?
April 15, 2012
>(I have a memory of a program that took
> ~0.004s with a C/C++ version and 1s with D, and the difference seemed to be just startup time for the D program.)

I have never seen anything like that. Usually the minimal time to run a D program is something like:

j@debian:~$ time ./hello
Hello world!

real	0m0.001s
user	0m0.000s
sys	0m0.000s


> What really amazes me is the difference between g++, DMD and GDC in size of the executable binary.  100 orders of magnitude!

With GDC, these flags (for gdmd):

-fdata-sections -ffunction-sections -L--gc-sections -L-l

help a lot if you want to reduce the size of the executable. Besides, this overhead is the standard library and runtime, and it won't be much larger in larger programs.
April 15, 2012
On Sunday, 15 April 2012 at 03:41:55 UTC, jerro wrote:
>>(I have a memory of a program that took
>> ~0.004s with a C/C++ version and 1s with D, and the difference seemed to be just startup time for the D program.)
>
> I have never seen anything like that. Usually the minimal time to run a D program is something like:
>
> j@debian:~$ time ./hello
> Hello world!
>
> real	0m0.001s
> user	0m0.000s
> sys	0m0.000s
>
>
>> What really amazes me is the difference between g++, DMD and GDC in size of the executable binary.  100 orders of magnitude!
>
> With GDC, these flags (for gdmd):
>
> -fdata-sections -ffunction-sections -L--gc-sections -L-l
>
> help a lot if you want to reduce the size of the executable. Besides, this overhead is the standard library and runtime, and it won't be much larger in larger programs.

The last flag should be -L-s

April 15, 2012
On 15/04/12 05:41, jerro wrote:
> I have never seen anything like that. Usually the minimal time to run a D
> program is something like:
>
> j@debian:~$ time ./hello
> Hello world!
>
> real 0m0.001s
> user 0m0.000s
> sys 0m0.000s

Yea, my experience too in general.  I can't remember exactly what I was testing, but if it's what I think it was (and I have just retested :-), the difference may have been less pronounced (maybe 0.080s for D compared to 0.004s for C++), and that would have been due to not enabling optimizations for D.

I have another pair of stupid D-vs.-C++ speed-test files where with optimizations engaged, D beats C++: the dominant factor is lots of output to console, so I guess this is D's writeln() beating C++'s cout.

>> What really amazes me is the difference between g++, DMD and GDC in size of
>> the executable binary. 100 orders of magnitude!
>
> With GDC, these flags (for gdmd):
>
> -fdata-sections -ffunction-sections -L--gc-sections -L-l
>
> help a lot if you want to reduce the size of the executable. Besides, this overhead is
> the standard library and runtime, and it won't be much larger in larger programs.

Ahh! I hadn't realized that the libphobos package on Ubuntu didn't install a compiled version of the library.  (DMD does.)
April 15, 2012
On Sunday, 15 April 2012 at 02:56:21 UTC, Joseph Rushton Wakeling wrote:
>> On Saturday, 14 April 2012 at 19:51:21 UTC, Joseph Rushton Wakeling wrote:
>>> GDC has all the regular gcc optimization flags available IIRC. The ones on the
>>> GDC man page are just the ones specific to GDC.
>> I'm not talking about compiler flags, but the "inline" keyword in the C++ source
>> code. I saw some discussion about "@inline" but it seems not implemented (yet?).
>> Well, that is not a priority for D anyway.
>>
>> About compiler optimizations, -finline-functions and -fweb are part of -O3. I
>> tried to compile with -no-bounds-check, but it made no difference for DMD and GDC.
>> It probably is part of -release as q66 said.
>
> Ah yes, you're right.  I do wonder if your seeming speed differences are magnified because the whole operation is only 2-4 seconds long: if your algorithm were operating over a longer timeframe I think you'd likely find the relative speed differences decrease.  (I have a memory of a program that took ~0.004s with a C/C++ version and 1s with D, and the difference seemed to be just startup time for the D program.)
Well, this doesn't seem to be true:

1.2MB compressible
encode:
C++:   0.11s (100%)
D-inl: 0.14s (127%)
decode
C++:   0.12s (100%)
D-inl: 0.16s (133%)

~200MB compressible
encode:
C++:   17.2s (100%)
D-inl: 21.5s (125%)
decode:
C++:   16.3s (100%)
D-inl: 24.5s (150%)

3.8GB, barely compressible
encode:
C++:   412s  (100%)
D-inl: 512s  (124%)


> What really amazes me is the difference between g++, DMD and GDC in size of the executable binary.  100 orders of magnitude!
I remarked on this in another topic before, with a simple "hello world". I need to update there, now that I've got DMD working. BTW, it is 2 orders of magnitude.

>
> 3 remarks about the D code.  One is that much of it still seems very "C-ish"; I'd be interested to see how speed and executable size differ if things like the file opening, or the reading of characters, are done with more idiomatic D code.
>  Sounds stupid as the C stuff should be fastest, but I've been surprised sometimes at how using idiomatic D formulations can improve things.

Well, it may indeed be faster, especially the IO, which depends on things like buffering and so on. But for this I just wanted something as close to the C++ code as possible.

> Second remark is more of a query -- can Predictor.p() and .update really be marked as pure?  Their result for a given input actually varies depending on the current values of cxt and ct, which are modified outside of function scope.
Yeah, I don't know. I just threw those qualifiers at the compiler and saw what stuck. And I was testing the decode output specifically so I could easily see if it was corrupted. But maybe it wasn't corrupted because the compiler doesn't optimize based on "pure" yet... there was no speed difference either... so...

>
> Third remark -- again a query -- why the GC.disable ... ?
Just to be sure that the speed difference wasn't caused by the GC. It didn't make any speed difference either, and it is indeed a bad idea.
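For reference, if one does want to take the GC out of a measurement, the usual pattern is something like this (a sketch; allocations still succeed while collection is paused):

```d
import core.memory : GC;

void main()
{
    GC.disable();             // pause collections; allocations still work
    scope (exit) GC.enable(); // re-enable even if an exception is thrown

    // ... timing-sensitive work goes here ...
    auto buf = new ubyte[](1024);  // still allocated on the GC heap
}
```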


April 15, 2012
On 15/04/12 09:23, ReneSac wrote:
>> What really amazes me is the difference between g++, DMD and GDC in size of
>> the executable binary. 100 orders of magnitude!
> I remarked on this in another topic before, with a simple "hello world". I need
> to update there, now that I've got DMD working. BTW, it is 2 orders of magnitude.

Ack.  Yes, 2 orders of magnitude, 100 _times_ larger.  That'll teach me to send comments to a mailing list at an hour of the morning so late that it can be called early. ;-)

>> Sounds stupid as the C stuff should be fastest, but I've been surprised
>> sometimes at how using idiomatic D formulations can improve things.
>
> Well, it may indeed be faster, especially the IO, which depends on things
> like buffering and so on. But for this I just wanted something as close to
> the C++ code as possible.

Fair dos.  Seems worth trying the idiomatic alternatives before assuming the speed difference is always going to be so great, though.

> Yeah, I don't know. I just threw those qualifiers at the compiler and saw
> what stuck. And I was testing the decode output specifically so I could easily
> see if it was corrupted. But maybe it wasn't corrupted because the compiler
> doesn't optimize based on "pure" yet... there was no speed difference either... so...

I think it's because although there's mutability outside the function, those variables are still internal to the struct.  e.g. if you try and compile this code:

      int k = 23;

      pure int twotimes(int a)
      {
            auto b = 2*a;
            auto c = 2*k;
            return b+c;
      }

... it'll fail, but if you try and compile,

      struct TwoTimes
      {
            int k = 23;

            pure int twotimes(int a)
            {
                  auto b = 2*a;
                  auto c = 2*k;
                  return b+c;
            }
      }

... the compiler accepts it.  Whether that's because it's acceptably pure, or because the compiler just doesn't detect this case of impurity, is another matter.  The int k is certainly mutable from outside the scope of the function, so AFAICS it _should_ be disallowed.
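My understanding (worth checking against the language spec) is that this is deliberate rather than a missed check: a pure member function receives `this` as an implicit parameter, and D's "weak" purity allows a pure function to read and write mutable state reachable through its parameters -- it only forbids touching global or static mutable state. A sketch of the distinction:

```d
int g = 23;  // module-level mutable state

struct S
{
    int k = 23;

    // OK: k is reached through the implicit `this` parameter ("weakly pure").
    pure int viaThis() { return 2 * k; }

    // Would not compile: g is mutable static data.
    // pure int viaGlobal() { return 2 * g; }
}

// OK for the same reason: s.k is reached through an explicit parameter.
pure int viaParam(ref S s) { return 2 * s.k; }

void main()
{
    S s;
    assert(s.viaThis() == 46);
    assert(viaParam(s) == 46);
}
```

Only functions whose arguments carry no mutable indirection ("strongly pure") are eligible for optimizations like call elision, which may also explain why marking Predictor.p() and .update as pure made no speed difference.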