August 01, 2013
On Wednesday, 31 July 2013 at 22:58:56 UTC, Walter Bright wrote:
> On 7/31/2013 2:40 PM, Bill Baxter wrote:
>> Are you serious that you can't fathom how it could be confusing to someone than
>> talking about differences in run times?
>
> Yes.
>
> And no, I'm not talking about confusing to someone who lives in an undiscovered stone age tribe in the Amazon. I'm talking about computer programmers.

I'm only a casual programmer, and I love the speed metric you've used. A 75% speed increase means that the new compiler will be 75% through compiling a second equivalent program by the time the previous compiler finishes the first. It conjures images like the ones in the Mopar video you posted. That's amazing to me. Here's to burning rubber, Walter!
August 01, 2013
On Wednesday, 31 July 2013 at 22:58:56 UTC, Walter Bright wrote:
> On 7/31/2013 2:40 PM, Bill Baxter wrote:
>> Are you serious that you can't fathom how it could be confusing to someone than
>> talking about differences in run times?
>
> Yes.
>
> And no, I'm not talking about confusing to someone who lives in an undiscovered stone age tribe in the Amazon. I'm talking about computer programmers.
>
>
>> If you say something is faster than something else you want the two numbers to
>> be something you can relate to.  Like MPH.  Everyone has a clear concept of what
>> MPH is.  We use it every day.  So to say 25 MPH is 25% faster than 20 MPH is
>> perfectly clear.  But nobody talks about program execution speed in terms of
>> programs per second.
>
> Yes, they do, and certainly in "lines per second". Google it and see for yourself. And as you well understand, from using the same program to compile, the number of lines cancels out when comparing speeds.
>
> There is nothing mysterious or confusing about this. Seriously.
>
>
>> So I think it's pretty clear why that would be harder for
>> people to grok than changes in car speeds or run times.
>
> To be blunt, Baloney!
Can we please stop this dumb argument?

I think the source of the confusion is that programmers only use execution times to measure execution speed, and will often say they sped up a program by 50% when they halved the execution time, implying that the "speed" went up by 50%.  Well, as Walter points out, a real speed, i.e. output/sec, would go up 100% in that scenario, so the programmers' language is technically incorrect.  This breaks many programmers' intuitions, hence all the complaining in this thread.
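To make that concrete, here is a quick sketch (in Python, with an arbitrary 100-second baseline) of how the two phrasings relate:

```python
old_time = 100.0  # arbitrary baseline, in seconds
new_time = 57.0   # after a 43% reduction in compile time

time_reduction = 1 - new_time / old_time  # 0.43 -> "takes 43% less time"
speed_increase = old_time / new_time - 1  # ~0.754 -> "~75% faster"

# Halving the time (a "50% reduction") doubles the speed (a "100%
# increase"); that asymmetry is exactly what trips people up.
halved = 1 / 0.5 - 1  # 1.0, i.e. +100% speed
```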

However, this whole debate is a bikeshed topic: nobody really cares whether the gain was 43% or 75%, i.e. nobody is using that precision for any purpose anyway.  Walter is technically correct, and that's all that matters.

On Wednesday, 31 July 2013 at 23:26:32 UTC, Walter Bright wrote:
> On 7/31/2013 3:58 PM, John Colvin wrote:
>> It's a quite impressively unbalanced education that provides understanding of
>> memory allocation strategies, hashing and the performance pitfalls of integer
>> division, but not something as basic as a speed.
>
> Have you ever seen those cards that some "electrical engineers" carry around, with the following equations on them:
>
>     V = I * R
>     R = V / I
>     I = V / R
>
> ?
>
> I found it: https://docs.google.com/drawings/d/1StlhTYjiUEljnfVtFjP1BXLbixO30DIkbw-DpaYJoA0/edit?hl=en&pli=1
>
> Unbelievable. The author of it writes:
>
> "I'm going to explain to you how to use this cheat sheet in case you've never seen this before."
>
> http://blog.ricardoarturocabral.com/2010/07/electronic-electrical-cheat-sheets.html
>
> Makes you want to cry.
No real electrical engineer would ever use that card, as you connote with your quotes.  If they don't have Ohm's law and the resulting algebra drilled into their heads, they had better find another job.  I suspect that chart is for amateurs from other backgrounds who happen to be doing some electrical work.
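For what it's worth, the three formulas on that card are one relation rearranged; a throwaway numeric check (values arbitrary):

```python
I = 2.0    # current in amperes (arbitrary)
R = 5.0    # resistance in ohms (arbitrary)
V = I * R  # Ohm's law: 10.0 volts

# The other two "card" formulas are just algebraic rearrangements:
assert R == V / I
assert I == V / R
```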
August 01, 2013
On Thursday, 1 August 2013 at 07:47:25 UTC, Joakim wrote:
> On Wednesday, 31 July 2013 at 22:58:56 UTC, Walter Bright wrote:
>> Have you ever seen those cards that some "electrical engineers" carry around, with the following equations on them:
>>
>>    V = I * R
>>    R = V / I
>>    I = V / R
>>
>> ?
>>
>> I found it: https://docs.google.com/drawings/d/1StlhTYjiUEljnfVtFjP1BXLbixO30DIkbw-DpaYJoA0/edit?hl=en&pli=1
>>
>> Unbelievable. The author of it writes:
>>
>> "I'm going to explain to you how to use this cheat sheet in case you've never seen this before."
>>
>> http://blog.ricardoarturocabral.com/2010/07/electronic-electrical-cheat-sheets.html
>>
>> Makes you want to cry.
> No real electrical engineer would ever use that card, as you connote with your quotes.  If they don't have Ohm's law and the resulting algebra drilled into their heads, they had better find another job.  I suspect that chart is for amateurs from other backgrounds who happen to be doing some electrical work.

Screw engineers, *anybody* who doesn't know these laws shouldn't be allowed anywhere *near* electricity :D
August 02, 2013
31-Jul-2013 22:20, Walter Bright wrote:
> On 7/31/2013 8:26 AM, Dmitry Olshansky wrote:
>> Ouch... to boot it's always aligned by word size, so
>> key % sizeof(size_t) == 0
>> ...
>> rendering lower 2-3 bits useless, that would make straight slice lower
>> bits
>> approach rather weak :)
>
> Yeah, I realized that, too. Gotta shift it right 3 or 4 bits.

And that helped a bit... Anyhow, after making the integer hashing a bit more pervasive, power-of-2 tables stand up to their promise.
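As a sketch of the indexing trick (hypothetical code, not DMD's actual implementation): with a power-of-2 table length the bucket index is just a mask of the key's low bits, so word-aligned pointer keys have to be shifted right first, or their always-zero alignment bits waste most of the buckets:

```python
# Hypothetical bucket-index computation for a power-of-2 hash table.
def bucket_index(key: int, table_len: int) -> int:
    """table_len must be a power of two; key is a word-aligned address."""
    key >>= 4                     # drop the always-zero alignment bits
    return key & (table_len - 1)  # bitwise AND replaces the modulo

# Without the shift, keys aligned to 16 bytes would all collide in the
# same few buckets of a small mask.
```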

The pull request that reaps the minor speed benefit over the original (~2% speed gain!):
https://github.com/D-Programming-Language/dmd/pull/2436

Not bad, given that _aaGetRvalue takes only a fraction of the time itself.

I failed to see much of any improvement on Win32, though; allocations are dominating the picture.

And, sharing the joy of having a nice sampling profiler, here is what AMD CodeAnalyst has to say (top X functions by CPU clocks not halted).

Original DMD:

Function	 CPU clocks	 DC accesses	 DC misses
RTLHeap::Alloc	 49410	 520	 3624
Obj::ledata	 10300	 1308	 3166
Obj::fltused	 6464	 3218	 6
cgcs_term	 4018	 1328	 626
TemplateInstance::semantic	 3362	 2396	 26
Obj::byte	 3212	 506	 692
vsprintf	 3030	 3060	 2
ScopeDsymbol::search	 2780	 1592	 244
_pformat	 2506	 2772	 16
_aaGetRvalue	 2134	 806	 304
memmove	 1904	 1084	 28
strlen	 1804	 486	 36
malloc	 1282	 786	 40
Parameter::foreach	 1240	 778	 34
StringTable::search	 952	 220	 42
MD5Final	 918	 318	

Variation of DMD with pow-2 tables:

Function	 CPU clocks	 DC accesses	 DC misses
RTLHeap::Alloc	 51638	 552	 3538
Obj::ledata	 9936	 1346	 3290
Obj::fltused	 7392	 2948	 6
cgcs_term	 3892	 1292	 638
TemplateInstance::semantic	 3724	 2346	 20
Obj::byte	 3280	 548	 676
vsprintf	 3056	 3006	 4
ScopeDsymbol::search	 2648	 1706	 220
_pformat	 2560	 2718	 26
memcpy	 2014	 1122	 46
strlen	 1694	 494	 32
_aaGetRvalue	 1588	 658	 278
Parameter::foreach	 1266	 658	 38
malloc	 1198	 758	 44
StringTable::search	 970	 214	 24
MD5Final	 866	 274	 2


This underlines the point that the DMC RTL allocator is the biggest speed detractor. It is "followed" by ledata (could it be due to a linear search inside?) and, surprisingly, the tiny Obj::fltused is draining lots of cycles (is it called that often?).

-- 
Dmitry Olshansky
August 02, 2013
On 30 July at 11:13, Walter Bright wrote to me:
> On 7/30/2013 2:59 AM, Leandro Lucarella wrote:
> >I just want to point out that so many people getting this wrong (and even fighting to convince other people that the wrong interpretation is right) might be an indication that the message you wanted to give in that blog is not extremely clear :)
> 
> It never occurred to me that anyone would have any difficulty understanding the notion of "speed". After all, we deal with it every day when driving.

That's a completely different context, and I don't think anyone thinks in terms of percentages of speed in daily life (you just say "my car is twice as fast" or something like that; people hardly say "my car is 10% faster" in informal contexts).

For me the problem is that in informal contexts one tends to think in multipliers of speed, not percentages (or at least I do), and that is where the confusion comes from; it is somehow counterintuitive. I understood what you meant, but I had to think about it: my first reaction was to think you were saying the compiler took 1/4 of the original time. Then I did the math and verified that what you said was correct. But I had to do the math.

I'm not saying it is right or wrong for people to have this reflex of thinking in multipliers; I'm just saying that if you care about transmitting the message as clearly as you can, it is better to use numbers everybody can intuitively think about.

And this is in reply to Andrei too. I understand your POV, but if your main goal is communication (instead of education about side topics), I think it is better to stick with numbers and language that minimize confusion and misinterpretation.

Just a humble opinion of yours truly.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
You can try the best you can
If you try the best you can
The best you can is good enough
August 02, 2013
On 8/2/2013 6:16 AM, Dmitry Olshansky wrote:
> 31-Jul-2013 22:20, Walter Bright wrote:
>> On 7/31/2013 8:26 AM, Dmitry Olshansky wrote:
>>> Ouch... to boot it's always aligned by word size, so
>>> key % sizeof(size_t) == 0
>>> ...
>>> rendering lower 2-3 bits useless, that would make straight slice lower
>>> bits
>>> approach rather weak :)
>>
>> Yeah, I realized that, too. Gotta shift it right 3 or 4 bits.
>
> And that helped a bit... Anyhow, after making the integer hashing a bit more
> pervasive, power-of-2 tables stand up to their promise.
>
> The pull request that reaps the minor speed benefit over the original (~2% speed gain!):
> https://github.com/D-Programming-Language/dmd/pull/2436

2% is worth taking.


> Not bad, given that _aaGetRvalue takes only a fraction of the time itself.
>
> I failed to see much of any improvement on Win32, though; allocations are
> dominating the picture.
>
> And, sharing the joy of having a nice sampling profiler, here is what AMD
> CodeAnalyst has to say (top X functions by CPU clocks not halted).
>
> Original DMD:
>
> Function     CPU clocks     DC accesses     DC misses
> RTLHeap::Alloc     49410     520     3624
> Obj::ledata     10300     1308     3166
> Obj::fltused     6464     3218     6
> cgcs_term     4018     1328     626
> TemplateInstance::semantic     3362     2396     26
> Obj::byte     3212     506     692
> vsprintf     3030     3060     2
> ScopeDsymbol::search     2780     1592     244
> _pformat     2506     2772     16
> _aaGetRvalue     2134     806     304
> memmove     1904     1084     28
> strlen     1804     486     36
> malloc     1282     786     40
> Parameter::foreach     1240     778     34
> StringTable::search     952     220     42
> MD5Final     918     318
>
> Variation of DMD with pow-2 tables:
>
> Function     CPU clocks     DC accesses     DC misses
> RTLHeap::Alloc     51638     552     3538
> Obj::ledata     9936     1346     3290
> Obj::fltused     7392     2948     6
> cgcs_term     3892     1292     638
> TemplateInstance::semantic     3724     2346     20
> Obj::byte     3280     548     676
> vsprintf     3056     3006     4
> ScopeDsymbol::search     2648     1706     220
> _pformat     2560     2718     26
> memcpy     2014     1122     46
> strlen     1694     494     32
> _aaGetRvalue     1588     658     278
> Parameter::foreach     1266     658     38
> malloc     1198     758     44
> StringTable::search     970     214     24
> MD5Final     866     274     2
>
>
> This underlines the point that the DMC RTL allocator is the biggest speed
> detractor. It is "followed" by ledata (could it be due to a linear search
> inside?) and, surprisingly, the tiny Obj::fltused is draining lots of cycles
> (is it called that often?).

It's not fltused() that is taking up time; it is the static function following it. The sampling profiler you're using is unaware of non-global function names.

August 02, 2013
On 2013-08-02 15:44:13 +0000, Leandro Lucarella said:
> I'm not saying it is right or wrong for people to have this reflex of
> thinking in multipliers; I'm just saying that if you care about
> transmitting the message as clearly as you can, it is better to use
> numbers everybody can intuitively think about.
> 
> And this is in reply to Andrei too. I understand your POV, but if your
> main goal is communication (instead of education about side topics),
> I think it is better to stick with numbers and language that minimize
> confusion and misinterpretation.
> 
> Just a humble opinion of yours truly.


Fair enough. So what would have been a better way to convey the quantitative improvement?

Thanks,

Andrei

August 02, 2013
On Friday, 2 August 2013 at 17:16:30 UTC, Andrei Alexandrescu wrote:
> On 2013-08-02 15:44:13 +0000, Leandro Lucarella said:
>> I'm not saying it is right or wrong for people to have this reflex of
>> thinking in multipliers; I'm just saying that if you care about
>> transmitting the message as clearly as you can, it is better to use
>> numbers everybody can intuitively think about.
>> 
>> And this is in reply to Andrei too. I understand your POV, but if your
>> main goal is communication (instead of education about side topics),
>> I think it is better to stick with numbers and language that minimize
>> confusion and misinterpretation.
>> 
>> Just a humble opinion of yours truly.
>
>
> Fair enough. So what would have been a better way to convey the quantitative improvement?

Not to speak on Leandro's behalf, but I think the obvious answer is "Reduced compile times by 43%".

It's much more useful to express it that way because it's easier to apply. Say I have a program that takes 100 seconds to compile. Knowing that the compilation time is reduced by 43% makes it easy to see that my program will now take 57 seconds. Knowing that compilation is 75% faster doesn't help much at all - I have to get out a calculator and divide by 1.75.

It's always better to use a measure that is linear with what you care about. Here, most people care about how long their programs take to compile, not how many programs they can compile per second.
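To spell out the arithmetic (with the hypothetical 100-second baseline from above): the time-based figure applies linearly, while the speed-based one needs a division.

```python
old_time = 100.0  # hypothetical baseline in seconds

# "Reduced compile times by 43%": linear in what we care about.
new_time_from_reduction = old_time * (1 - 0.43)  # 57.0 s, done in your head

# "75% faster": the same fact, but you must divide by the speed multiplier.
new_time_from_speedup = old_time / 1.75          # ~57.1 s, calculator time

# The ~0.1 s gap is just rounding in the published 43% / 75% figures.
```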
August 02, 2013
On 8/2/13 10:44 AM, Peter Alexander wrote:
> Not to speak on Leandro's behalf, but I think the obvious answer is
> "Reduced compile times by 43%".
>
> It's much more useful to express it that way because it's easier to
> apply. Say I have a program that takes 100 seconds to compile. Knowing
> that the compilation time is reduced by 43% makes it easy to see that my
> program will now take 57 seconds. Knowing that compilation is 75% faster
> doesn't help much at all - I have to get out a calculator and divide by
> 1.75.
>
> It's always better to use a measure that is linear with what you care
> about. Here, most people care about how long their programs take to
> compile, not how many programs they can compile per second.

That's cool, thanks!

Andrei
August 02, 2013
Ha ha, I am a design/controls engineer who deals with speeds and accelerations on a daily basis, and yet I was also confused by Walter's statement.

I guess the confusion arises from what one expects (as opposed to understands) by the word "speed" in the given context.

In the context of compiling my SW programs, I only see a dark console with a blocked cursor which I cannot use, and every second waited is felt directly. I don't see any action or hint of speed. This makes me think that a faster compiler is supposed to make me wait less. This creates a kind of mental link between the word "speed" and the feeling of waiting. Hence the expectation: a 50% faster compiler should make me wait 50% less.

Instead of a dark console with a blocked cursor, if I saw lots of lines being compiled scrolling past at very high speed on the screen (like when installing some programs), then I would relate speed to the number of lines scrolling. And my expectation would probably change to: a 50% faster compiler would compile 50% more lines per second.

What I am saying is that even though we technically understand what speed is, it is the intuitive, subjective feeling based on the context that causes the experience of "something doesn't add up".

I will stop blabbering now.