June 05, 2018
On Tuesday, 5 June 2018 at 18:46:41 UTC, Timon Gehr wrote:
> On 05.06.2018 18:50, DigitalDesigns wrote:
>> With a for loop, it is pretty much a wrapper on internal cpu logic so it will be near as fast as possible.
>
> This is not even close to being true for modern CPUs. There are a lot of architectural and micro-architectural details that affect performance but are not visible or accessible in your for loop. If you care about performance, you will need to test anyway, as even rather sophisticated models of CPU performance don't get everything right.

Those optimizations are not part of the instruction set, so they are irrelevant here; they will occur with ranges too.

For loops HAVE a direct cpu semantic! Do you doubt this?


CPUs do not have range semantics. Ranges are layers on top of compiler semantics... you act like they are equivalent, but they are not! All range semantics must go through the library code, then to the compiler, then to the CPU. For loops in all major systems languages go almost directly to CPU instructions.

for(int i = 0; i < N; i++)

translates into increment, compare, and conditional-jump instructions.

There is absolutely no reason why any decent compiler would not use what the CPU has to offer. For loops are language semantics; ranges are library semantics. To pretend they are equivalent is wrong, and no amount of justifying will make them the same. I do not know of any commercially viable CPU that lacks loop instructions. I also know of no commercially viable compiler that does not wrap those instructions in a for-loop (or while, or whatever) syntax that maps almost directly to the CPU instructions.

> Also, it is often not necessary to be "as fast as possible". It is usually more helpful to figure out where the bottleneck is for your code and concentrate optimization effort there, which you can do more effectively if you can save time and effort for the remaining parts of your program by writing simple and obviously correct range-based code, which often will be fast as well.

It's also often not necessary to be "as slow as possible". I'm not asking about generalities but specifics. It's great to make generalizations about how things should be, but I would like to know how they are. Maybe in theory ranges could be more optimal than other semantics, but theory never equals practice.



June 05, 2018
On Tuesday, 5 June 2018 at 19:05:27 UTC, DigitalDesigns wrote:
> For loops HAVE a direct cpu semantic! Do you doubt this?

What are they?

And for bonus points, are they actually used by compilers?

Then the final question: is the generated code any different than inlined ranges?
June 05, 2018
On 6/5/18 3:05 PM, DigitalDesigns wrote:

> It's also often not necessary to be "as slow as possible". I'm not asking about generalities but specifics. It's great to make generalizations about how things should be, but I would like to know how they are. Maybe in theory ranges could be more optimal than other semantics, but theory never equals practice.

Again, I want to stress, ranges are not "as slow as possible", and it's clear from the numbers posted here that they are faster than for loops in practice, at least for this example.

-Steve

June 05, 2018
On Tuesday, 5 June 2018 at 19:05:27 UTC, DigitalDesigns wrote:
> For loops HAVE a direct cpu semantic! Do you doubt this?

...

Right. If you're gonna keep running your mouth off, how about looking at some disassembly then?

for(auto i=0; i<a.length; i+=strideAmount)

Using ldc -O4 -release for x86_64 processors, the initialiser translates to:

mov byte ptr [rbp + rcx], 0

The comparison translates to:

cmp r13, rcx
ja .LBB0_2

And the increment and store translates to:

mov byte ptr [rbp + rcx], 0
movsxd rcx, eax
add eax, 3

So. It uses four of the most basic instructions you can think of: mov, cmp, j<cond>, add.

Now, you might ask, what are the instructions that a range compiles down to when everything is properly inlined?

The initialisation, since it's a function, pulls from the stack.

mov rax, qword ptr [rsp + 16]
movsxd rcx, dword ptr [rsp + 32]

But the comparison looks virtually identical.

cmp rax, rcx
jb .LBB2_4

But how does it do the add? With some register magic.

movsxd rcx, edx
lea edx, [rcx + r9]

Now, what that looks like it's doing to me is combining the pointer load and index increment into those two instructions. One instruction fewer than the flat for loop.

In conclusion: the semantics you talk about are literally some of the most basic instructions in computing; and escaping the confines of a for loop for a foreach loop can let the compiler generate more efficient code than 50-year-old compsci concepts can.
June 05, 2018
On Tuesday, 5 June 2018 at 20:07:06 UTC, Ethan wrote:
> On Tuesday, 5 June 2018 at 19:05:27 UTC, DigitalDesigns wrote:
>> For loops HAVE a direct cpu semantic! Do you doubt this?
>
> ...
>
> Right. If you're gonna keep running your mouth off. How about looking at some disassembly then.
>
> for(auto i=0; i<a.length; i+=strideAmount)
>
> Using ldc -O4 -release for x86_64 processors, the initialiser translates to:
>
> mov byte ptr [rbp + rcx], 0
>
> The comparison translates to:
>
> cmp r13, rcx
> ja .LBB0_2
>
> And the increment and store translates to:
>
> mov byte ptr [rbp + rcx], 0
> movsxd rcx, eax
> add eax, 3
>
> So. It uses four of the most basic instructions you can think of: mov, cmp, j<cond>, add.
>
> Now, you might ask, what are the instructions that a range compiles down to when everything is properly inlined?
>
> The initialisation, since it's a function, pulls from the stack.
>
> mov rax, qword ptr [rsp + 16]
> movsxd rcx, dword ptr [rsp + 32]
>
> But the comparison looks virtually identical.
>
> cmp rax, rcx
> jb .LBB2_4
>
> But how does it do the add? With some register magic.
>
> movsxd rcx, edx
> lea edx, [rcx + r9]
>
> Now, what that looks like it's doing to me is combining the pointer load and index increment into those two instructions. One instruction fewer than the flat for loop.
>
> In conclusion: the semantics you talk about are literally some of the most basic instructions in computing; and escaping the confines of a for loop for a foreach loop can let the compiler generate more efficient code than 50-year-old compsci concepts can.

Ok asshat! You still don't get it! I didn't say ranges would not compile down to the same thing! Do you have trouble understanding the English language?

You don't seem to get the concept: ranges are library solutions, and someone can come along at any time, modify some code, and WHAM! They no longer compile down to your precious efficient instructions. That is far more unlikely to occur with language semantics. Why is that so difficult for you to understand? You sure have an attitude for someone who has difficulty with English.


June 05, 2018
On 6/5/18 5:22 PM, DigitalDesigns wrote:
> On Tuesday, 5 June 2018 at 20:07:06 UTC, Ethan wrote:

>> In conclusion. The semantics you talk about are literally some of the most basic instructions in computing; and that escaping the confines of a for loop for a foreach loop can let the compiler generate more efficient code than 50-year-old compsci concepts can.
> 
> Ok asshat! You still don't get it! I didn't say ranges would not compile down to the same thing! Do you have trouble understanding the English language?

Nope, he doesn't. Look at what you said:

"Maybe in theory ranges could be more optimal than other semantics but theory never equals practice. "

And now you have been shown (multiple times) that in practice ranges in fact outperform for loops. Including the assembly to prove it (which helps with this comment: "Having some "proof" that they are working well would ease my mind.")

So tone down the attitude; you got what you *clearly* asked for but seem reluctant to acknowledge. Ranges are good, and for loops are good too, but not as good. So maybe you should just use ranges with the correct optimization flags and call it a day? Or else use for loops and accept that even though they may not run as quickly, they are "safer" to use, since some malicious coder could come along and add sleeps inside the std.algorithm functions.

-Steve
June 05, 2018
On Tuesday, 5 June 2018 at 21:22:27 UTC, DigitalDesigns wrote:
> Ok asshat!

Take it to reddit. Back your arguments up with actual knowledge and intelligence, not unfounded aggression.
June 05, 2018
On Tuesday, 5 June 2018 at 21:52:03 UTC, Ethan wrote:
> On Tuesday, 5 June 2018 at 21:22:27 UTC, DigitalDesigns wrote:
>> Ok asshat!
>
> Take it to reddit. Back your arguments up with actual knowledge and intelligence, not unfounded aggression.

You are an idiot! You obviously do not understand basic logic. Unfounded aggression? Yep, nice way to pretend you didn't start it! Must be nice being the bully!
June 05, 2018
On Tuesday, 5 June 2018 at 21:54:20 UTC, DigitalDesigns wrote:
> You are an idiot!

Take it to reddit. Back your arguments up with actual knowledge and intelligence, not unfounded aggression.
June 05, 2018
On Tuesday, 5 June 2018 at 19:18:15 UTC, Adam D. Ruppe wrote:
> On Tuesday, 5 June 2018 at 19:05:27 UTC, DigitalDesigns wrote:
>> For loops HAVE a direct cpu semantic! Do you doubt this?
>
> What are they?
>
> And for bonus points, are they actually used by compilers?
>
> Then the final question: is the generated code any different than inlined ranges?

It doesn't matter! The point I made was not that ranges were slower, but that ranges exist as an abstraction on top of language semantics! That means they can never be faster than the language itself! Anything a range does can never be faster than doing it by hand.

This is not a hard concept to understand. I know everyone wants to defend their precious ranges, but what happens in some particular case is irrelevant. Ranges could be 100x faster in some specific case, but it doesn't change the fact that they are an abstraction on top of the language, not in the language.

I've already pointed out, and made it clear, I will do it one last time:

There is no guarantee (it doesn't have to be 100% absolute) from the compiler that a range's performance will even come close to hand-written code, or that it won't suddenly change when someone decides to "optimize" it and actually makes it worse! These problems are far less likely to occur with a well-established language semantic.

I can't believe people are really defending library solutions as not having these issues, but I guess that is the nature of most humans.