June 02, 2013
On Friday, 31 May 2013 at 09:43:05 UTC, Manu wrote:
>  - Require explicit export when you want to create shared objects.
>>
>> This one is an enabler for the optimizer to finalize virtual methods. With
>> this mechanism in place, the compiler knows all the overrides and can
>> finalize many calls during LTO. I especially like that one, as it allows
>> stripping many symbols at link time and allows other LTO in general (for
>> instance, the compiler can choose custom calling conventions for some
>> methods, knowing all call sites).
>>
>> The explicit export option has my preference; however, it requires that
>> symbols used in shared libs are explicitly declared export. I think we
>> shouldn't break the virtual-by-default behavior, but we still need to
>> figure out a way to make things more performant on this point.
>>
>
> You're concerned about the magnitude of breakage that introducing an explicit
> virtual requirement would cause. This would seem to be a much, much bigger
> breakage to me, and I don't think it's intuitive.

Officially, we are only just starting to support shared objects. I know you use them, and that this would be a breaking change for you. However, I think it is reasonable to expect that most people do not use them yet. Still, the migration should be handled carefully.

This change is actually a fairly big deal, as it empowers the optimizer quite a lot, and not only for virtual functions. It can do far more than a developer could by manually annotating individual functions.

This is proven technology, not the mythical "sufficiently smart compiler". To be fair, I'd push for this change even if final were the default, as it enables many other optimizations.

As the example shows, manually annotating everything isn't going to provide more than 5% to 7%, which is irrelevant in most code paths, while a compiler could improve the situation significantly and automatically.
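A minimal D sketch of the idea (the class names are hypothetical, and the finalization itself would be done by the compiler/LTO pass, not written by hand):

// Not exported: every override of area() must live in this binary, so a
// whole-program/LTO pass could mark it final, devirtualize the call sites,
// and strip the unused symbols.
class Shape
{
    double area() { return 0; }
}

class Circle : Shape
{
    double r = 1;
    override double area() { return 3.14159265 * r * r; }
}

// Exported: other shared objects may subclass this, so its virtual calls
// cannot be finalized and its symbols have to be kept.
export class PublicShape
{
    double area() { return 0; }
}

With explicit export, everything not marked export is known to be internal to the binary, which is what gives the optimizer the complete picture of overrides described above.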
June 02, 2013
On 02.06.2013 08:33, Manu wrote:
> On 2 June 2013 01:19, Paulo Pinto <pjmlp@progtools.org> wrote:
> [...]
>
>     At least when I did my traineeship at CERN (2003-2004) that was the
>     case.
>
>
> I hope CERN has better software engineers than Cambridge University ;)
> Most of these guys are mathematicians and physicists first, and
> programmers second.

I was in the ATLAS group, which does real-time processing of the data coming out of the particle accelerator, as a way to filter out uninteresting data.

We also did offline, multicore, clustered analysis of the saved data.

They even rewrote the network stack to bypass the latency introduced by IP protocols.

You are right about the guys' backgrounds, but in our case we had monthly meetings about the performance of the whole stack, to look for spots worth improving.

--
Paulo



June 02, 2013
On 06/01/2013 04:58 PM, finalpatch wrote:
> However, I retested on a Windows 7 machine with the GDC compiler and the results were very different.
> 
> original: 545ms
> * the first 2 optimizations which helped the most on OSX with LDC have almost
> zero effect
> * hand unroll overloaded vector arithmetic operators - 280ms (265ms improvement)
> * "final" to all class methods - 200ms (80ms improvement)

What flags were you using with each compiler?
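For reference, a rough sketch of what "hand unroll overloaded vector arithmetic operators" might look like (a hypothetical Vec3, not the actual ray tracer code):

struct Vec3
{
    float x, y, z;

    Vec3 opBinary(string op)(Vec3 rhs) const
        if (op == "+" || op == "-" || op == "*")
    {
        return Vec3(mixin("x " ~ op ~ " rhs.x"),
                    mixin("y " ~ op ~ " rhs.y"),
                    mixin("z " ~ op ~ " rhs.z"));
    }
}

// Operator version: concise, but each + and * builds a temporary Vec3
// that the backend then has to optimize away.
Vec3 madd(Vec3 a, Vec3 b, Vec3 c)
{
    return a + b * c;
}

// Hand-unrolled version: the per-component arithmetic is written out,
// which is the kind of change behind the 545ms -> 280ms step quoted above.
Vec3 maddUnrolled(Vec3 a, Vec3 b, Vec3 c)
{
    Vec3 r;
    r.x = a.x + b.x * c.x;
    r.y = a.y + b.y * c.y;
    r.z = a.z + b.z * c.z;
    return r;
}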

June 02, 2013
Hi Joseph,

The flags I used
OSX LDC: -O3 -release
WIN GDC: -O3 -fno-bounds-check -frelease

Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> writes:

> On 06/01/2013 04:58 PM, finalpatch wrote:
>> However, I retested on a Windows 7 machine with the GDC compiler and the results were very different.
>> 
>> original: 545ms
>> * the first 2 optimizations which helped the most on OSX with LDC have almost
>> zero effect
>> * hand unroll overloaded vector arithmetic operators - 280ms (265ms improvement)
>> * "final" to all class methods - 200ms (80ms improvement)
>
> What flags were you using with each compiler?
>

-- 
finalpatch
June 02, 2013
On 2013-06-01 23:08, Jonathan M Davis wrote:

> If you don't need polymorphism, then in general, you shouldn't use a class
> (though sometimes it might make sense simply because it's an easy way to get a
> reference type). Where it becomes more of a problem is when you need a few
> polymorphic functions and a lot of non-polymorphic functions (e.g. when a
> class has a few methods which get overridden and then a lot of properties
> which it makes no sense to override). In that case, you have to use a class,
> and then you have to mark a lot of functions as final. This is what folks like
> Manu and Don really don't like, particularly when they're in environments
> where the extra cost of the virtual function calls actually matters.

If a reference type is needed but not a polymorphic type, then a final class can be used.
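A minimal sketch of that pattern (hypothetical Counter type):

// Reference semantics without polymorphism: the class is final, so the
// methods declared here are non-virtual and can be called (and inlined)
// directly.
final class Counter
{
    private int count;

    void increment() { ++count; }
    int value() const { return count; }
}

void main()
{
    auto a = new Counter;  // still a reference type on the GC heap
    auto b = a;            // a and b refer to the same object
    b.increment();
    assert(a.value == 1);
}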

-- 
/Jacob Carlborg
June 02, 2013
On Sunday, June 02, 2013 11:53:26 Jacob Carlborg wrote:
> On 2013-06-01 23:08, Jonathan M Davis wrote:
> > If you don't need polymorphism, then in general, you shouldn't use a class (though sometimes it might make sense simply because it's an easy way to get a reference type). Where it becomes more of a problem is when you need a few polymorphic functions and a lot of non-polymorphic functions (e.g. when a class has a few methods which get overridden and then a lot of properties which it makes no sense to override). In that case, you have to use a class, and then you have to mark a lot of functions as final. This is what folks like Manu and Don really don't like, particularly when they're in environments where the extra cost of the virtual function calls actually matters.
> If a reference type is needed but not a polymorphic type, then a final class can be used.

Yes. The main problem is when you have a class with a few methods which should be virtual and a lot that shouldn't be. You're forced to mark a large number of functions as final. That burden can be lessened by using final with a colon rather than marking them individually, but what seems to happen inevitably is that programmers forget to mark any of them as final (Manu can rant quite a bit about that, as he's had to deal with it at work, and it's cost him quite a bit of time, because he has to go through every function which wasn't marked as final and determine whether it's actually supposed to be virtual or not). Having non-virtual be the default makes functions efficient by default.
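For illustration, a sketch of the "final with a colon" pattern (hypothetical class and members):

class Texture
{
    private int w, h;

    // these two are meant to be overridden, so they stay virtual
    void load(string path) { }
    void unload() { }

final:
    // everything from here down is non-virtual
    int width() const { return w; }
    int height() const { return h; }
    size_t sizeInBytes() const { return cast(size_t) w * h * 4; }
}

One final: label covers every member function that follows it, so the accessors cost no virtual dispatch, but it only helps if people remember to write it.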

- Jonathan M Davis
June 02, 2013
On 06/02/2013 11:32 AM, finalpatch wrote:
> The flags I used
> OSX LDC: -O3 -release
> WIN GDC: -O3 -fno-bounds-check -frelease

Does adding -inline make a difference to initial performance (i.e. before your manual interventions)?  I guess it's already covered by -O3 in both cases, but a while back I did notice some differences in "default" LDC and GDC performance that seemed to relate to inlining.
June 02, 2013
On Sunday, 2 June 2013 at 07:32:10 UTC, Manu wrote:
> On 2 June 2013 01:19, Paulo Pinto <pjmlp@progtools.org> wrote:
>
>> On 01.06.2013 16:24, Benjamin Thaut wrote:
>>
>>  On 01.06.2013 01:30, Manu wrote:
>>>
>>>> On 1 June 2013 09:15, bearophile <bearophileHUGS@lycos.com> wrote:
>>>>
>>>>     Manu:
>>>>
>>>>         On 1 June 2013 01:12, bearophile <bearophileHUGS@lycos.com> wrote:
>>>>
>>>>             Manu:
>>>>
>>>>
>>>>             Frankly, this is a textbook example of why STL is the
>>>>             spawn of satan. For some reason people are TAUGHT that
>>>>             it's reasonable to write code like this.
>>>>
>>>>
>>>>             There are many kinds of D code, not everything is a high
>>>>             performance
>>>>             ray-tracer or 3D game. So I'm sure there are many many
>>>>             situations where
>>>>             using the C++ STL is more than enough. As most tools, you
>>>>             need to know
>>>>             where and when to use them. So it's not a Satan-spawn :-)
>>>>
>>>>
>>>>         So why are we having this conversation at all then if faster
>>>>         isn't better in this instance?
>>>>
>>>>
>>>>     Faster is better in this instance.
>>>>     What's wrong is your thinking that the STL as the spawn of Satan in
>>>>     general.
>>>>
>>>>
>>>> Ah, but that's because it is ;)
>>>> Rule of thumb: never use STL in tight loops. Problem solved (well,
>>>> mostly)...
>>>>
>>>
>>> I have to agree here. Whenever you have a codebase that has to work on 9
>>> platforms and 6 compilers the S in STL vanishes. Also the
>>> implementations are so varying in quality that you might get really good
>>> performance on one platform but really bad on another. It seems like
>>> everyone in the games industry avoids STL like the plague.
>>>
>>> Kind Regards
>>> Benjamin Thaut
>>>
>>
>> I used to have that experience even with C, when I started using it around
>> 1994. C++ was even worse between CFront, ARM and ongoing standardization
>> work.
>>
>> As for STL, I can assure that HPC guys are huge fans of STL and Boost.
>>
>
> The funny thing about HPC guys though, at least in my experience (a bunch
> of researchers from Cambridge who I often give _basic_ optimisation tips),
> is they don't write/run 'high performance software', they're actually
> pretty terrible programmers and have a tendency to write really low
> performing software, but run it on super high performance computers, and
> then call the experience high performance computing...
> It bends my mind to see them demand an order of magnitude more computing
> power to run an algorithm that's hamstrung by poor choices of containers or
> algorithms that probably cost them an order of magnitude in performance ;)
> And then the Universities take their demands seriously and deliver them
> more hardware! O_O
>
> At least when I did my traineeship at CERN (2003-2004) that was the case.
>>
>
> I hope CERN has better software engineers than Cambridge University ;)
> Most of these guys are mathematicians and physicists first, and programmers
> second.

In my experience, physicists are terrible programmers. I should know, I am one! As soon as you step outside the realm of simple (< 10kloc), pure procedural code, the supposed "HPC" guys generally don't have the first clue how to write something fast.

CERN is responsible for the abomination that is ROOT, but to be fair to them there is a lot of good code from there too.
June 02, 2013
On 2 June 2013 12:05, Joseph Rushton Wakeling <joseph.wakeling@webdrake.net> wrote:
> On 06/02/2013 11:32 AM, finalpatch wrote:
>> The flags I used
>> OSX LDC: -O3 -release
>> WIN GDC: -O3 -fno-bounds-check -frelease
>
> Does adding -inline make a difference to initial performance (i.e. before your manual interventions)?  I guess it's already covered by -O3 in both cases, but a while back I did notice some differences in "default" LDC and GDC performance that seemed to relate to inlining.

-O3 is covered in the GDC case (turns on -finline-functions).


--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
June 02, 2013
On 06/02/2013 08:33 AM, Manu wrote:
> Most of these guys are mathematicians and physicists first, and programmers second.

You've hit the nail on the head, but it's also a question of priorities.  It's _essential_ that the maths or physics be understood and done right.  It's essential that the programs correctly reflect that maths or physics.  It's merely _desirable_ that the programs run as fast as possible, or be well designed from a maintenance point of view, or any of the other things that matter to trained software developers.  (In my day job I have to continually force myself to _not_ refactor or optimize my code, even though I'd get a lot of pleasure out of doing so, because it's working adequately and my work priority is to get results out of it.)

That in turn leads to a hiring situation where the preference is to have mathematicians or physicists who can program, rather than programmers who can learn the maths.  It doesn't help that because of the way academic funding is made available, the pay scales mean that it's not really possible to attract top-level developers (unless they have some kind of keen moral desire to work on academic research); in addition, you usually have to hire them as PhD students or postdocs or so on (I've also seen masters' students roped in to this end), which obviously constrains the range of people that you can hire and the range of skills that will be available, and also the degree of commitment these people can put into long-term vision and maintenance of the codebase.

There's also a training problem -- in my experience, most physics undergraduates are given a crash course in C++ in their first year and not much in the way of real computer science or development training.  In my case as a maths undergraduate the first opportunity to learn programming was in the 3rd year of my degree course, and it was a crash course in a very narrow subset of C dedicated towards numerical programming.  And if (like me) you then go on into research, you largely have to self-teach, which can lead to some very idiosyncratic approaches.

I hope that this will change, because programming is now an absolutely essential part of just about every avenue of scientific research.  But as it stands, it's a serious problem.