stride in slices (page 4)

On Tuesday, 5 June 2018 at 22:20:08 UTC, DigitalDesigns wrote:
> It doesn't matter! The issue that I said was not that ranges were slower but that ranges exist on an abstract on top of language semantics! that means that they can never be faster than the language itself! Anything that a range does can never be faster than doing it by hand.

This is the best part. Ranges *ARE* a language semantic.

https://tour.dlang.org/tour/en/basics/ranges

On Tuesday, 5 June 2018 at 21:35:03 UTC, Steven Schveighoffer wrote:
> On 6/5/18 5:22 PM, DigitalDesigns wrote:
>> On Tuesday, 5 June 2018 at 20:07:06 UTC, Ethan wrote:
>>
>>> In conclusion. The semantics you talk about are literally some of the most basic instructions in computing; and that escaping the confines of a for loop for a foreach loop can let the compiler generate more efficient code than 50-year-old compsci concepts can.
>>
>> Ok asshat! You still don't get it! I didn't say ranges would not compile down to the same thing! Do you have trouble understanding the English language?
>
> Nope, he doesn't. Look at what you said:
>
> "Maybe in theory ranges could be more optimal than other semantics but theory never equals practice. "
>
> And now you have been shown (multiple times) that in practice ranges in fact outperform for loops. Including the assembly to prove it (which helps with this comment: "Having some "proof" that they are working well would ease my mind.")

No, you have shown a few fucking cases, why are you guys attacking me for being dense? You can't prove that ranges are more optimal than direct semantics! Do it! I'd like to see you try!

> So tone down the attitude, you got what you *clearly* asked for but seem reluctant to acknowledge. Ranges are good, for loops are good too, but not as. So maybe you should just use ranges and use the correct optimization flags and call it a day? Or else use for loops and accept that even though they may not run as quickly, they are "safer" to use since some malicious coder could come along and add in sleeps inside the std.algorithm functions.
>
> -Steve

What it seems is that a few of you are upset because I didn't bow down to your precious range semantics and ask for clarification. At first I was jumped on then someone did some tests and found out that it wasn't so rosy like everyone thought. Of course, the work around is to force optimizations that fix the problem when the problem doesn't exist in for loops.
Then you come along and tell me that specific cases prove the general case... that is real dense. You know, it takes two to have an attitude! I asked for information regarding stride. I got the range version, it turned out to be slower in some corner case for some bizarre reason. I was then told it required optimizations (why? That is fishy why the corner case would be 200% slower for a weird edge case) and then I was told that ranges are always faster (which is what you just said because you act like one example proves everything). Every step of the way I am told "don't worry". You've already stepped in the shit once and you expect me to believe everything you say?

Why is it so hard to have a test suite that checks the performance of range constructs instead of just getting me to believe you? Huh? Do you really think I'm supposed to believe everything any asshat says on the net just because they want me to? Back up your beliefs, that simple. Provide timings for all the range functions in various combinations and give me a worst case scenario compared to their equivalent hand-coded versions. Once you do that then I will be able to make an informed decision rather than doing what you really want, which is except your world as authority regardless of the real truth.

On Tuesday, 5 June 2018 at 22:28:44 UTC, DigitalDesigns wrote:
> On Tuesday, 5 June 2018 at 21:35:03 UTC, Steven Schveighoffer wrote:
>> [...]
> [...]

Don't ranges evaluate lazily in some cases? So they'll avoid unnecessary work... and be much faster and more efficient, if I'm correct.

> [...]
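
For readers wondering what lazy evaluation means here, the following is a minimal sketch and not something posted in the thread. The helper name `callsNeeded` and the squaring function are made up for illustration. It counts how often the mapping function runs when only the first few elements of a large `map` pipeline are consumed.

```d
import std.algorithm.iteration : map;
import std.range : iota, take;

// Hypothetical helper: returns how many times the mapping function ran
// while consuming only the first `n` elements of a million-element pipeline.
int callsNeeded(size_t n)
{
    int calls = 0;
    auto squares = iota(0, 1_000_000).map!((int x) { ++calls; return x * x; });

    // Building the pipeline does no work yet.
    assert(calls == 0);

    // Consume only the first n elements; the rest are never computed.
    foreach (sq; squares.take(n)) {}
    return calls;
}

void main()
{
    assert(callsNeeded(3) == 3);
}
```

Laziness only avoids work that is never requested. It says nothing by itself about how fast the requested work runs, which is the question the rest of the thread argues about.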

June 06, 2018

Re: stride in slices

Posted by Timon Gehr
in reply to DigitalDesigns

On 05.06.2018 21:05, DigitalDesigns wrote:
> On Tuesday, 5 June 2018 at 18:46:41 UTC, Timon Gehr wrote:
>> On 05.06.2018 18:50, DigitalDesigns wrote:
>>> With a for loop, it is pretty much a wrapper on internal cpu logic so it will be near as fast as possible.
>>
>> This is not even close to being true for modern CPUs. There are a lot of architectural and micro-architectural details that affect performance but are not visible or accessible in your for loop. If you care about performance, you will need to test anyway, as even rather sophisticated models of CPU performance don't get everything right.
> 
> Those optimizations are not part of the instruction set so are irrelevant. They will occur with ranges too.
> ...

I was responding to claims that for loops are basically a wrapper on internal CPU logic and nearly as fast as possible. Both of those claims were wrong.

> For loops HAVE a direct cpu semantic! Do you doubt this?
> ...

You'd have to define what that means. (E.g., Google currently shows no hits for "direct CPU semantics".)

> 
> Cpu's do not have range semantics. Ranges are layers on top of compiler semantics... you act like they are equivalent, they are not!

I don't understand why you bring this up nor what you think it means.

The compiler takes a program and produces some machine code that has the right behavior. Performance is usually not formally specified. In terms of resulting behavior, code with explicit for loops and range-based code may have identical semantics. Which one executes faster depends on internal details of the compiler and the target architecture, and it may change over time, e.g. between compiler releases.
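
As an illustration of that point (a sketch added for this write-up, not code from the thread), here are two functions with identical observable behavior: one uses an explicit for loop, the other uses `std.range.stride`. The function names are invented for the example, and `step` is assumed to be non-zero.

```d
import std.algorithm.iteration : sum;
import std.range : stride;

// Hand-written loop: sum every `step`-th element of a slice.
int sumStridedLoop(int[] data, size_t step)
{
    int total = 0;
    for (size_t i = 0; i < data.length; i += step)
        total += data[i];
    return total;
}

// Range-based version of the same computation.
int sumStridedRange(int[] data, size_t step)
{
    return data.stride(step).sum;
}

void main()
{
    int[] data = [1, 2, 3, 4, 5, 6, 7];
    // Both visit elements 1, 4 and 7.
    assert(sumStridedLoop(data, 3) == 12);
    assert(sumStridedRange(data, 3) == 12);
}
```

Which of the two ends up faster is not decided by the source text. It depends on the compiler, its flags, and the target, so it has to be measured.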

> All range semantics must go through the library code then to the compiler then to cpu. For loops of all major systems languages go almost directly to cpu instructions.
> 
> for(int i = 0; i < N; i++)
> 
> translates in to either increment and loop or jump instructions.
> ...

Sure, or whatever else the compiler decides to do. It might even be translated into a memcpy call. Even if you want to restrict yourself to use only for loops, my point stands. Write maintainable code by default and let the compiler do what it does. Then optimize further in those cases where the resulting code is actually too slow. Test for performance regressions.
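
A rough sketch of what such a measurement could look like, using `benchmark` from `std.datetime.stopwatch`. This is an illustration only: the function names, the array size, and the iteration count are arbitrary assumptions, and the timings will vary with compiler and flags (for example `-O` and `-release`), so no results are shown.

```d
import std.algorithm.iteration : sum;
import std.array : array;
import std.datetime.stopwatch : benchmark;
import std.range : iota, stride;
import std.stdio : writefln;

// Hand-written loop over every second element.
long sumEveryOtherLoop(int[] data)
{
    long total = 0;
    for (size_t i = 0; i < data.length; i += 2)
        total += data[i];
    return total;
}

// Range-based equivalent; the 0L seed makes the sum accumulate in a long.
long sumEveryOtherRange(int[] data)
{
    return data.stride(2).sum(0L);
}

void main()
{
    int[] data = iota(0, 1_000_000).array;
    long sink; // accumulate results so the work is not trivially dead

    void runLoop() { sink += sumEveryOtherLoop(data); }
    void runRange() { sink += sumEveryOtherRange(data); }

    // Run each candidate 100 times; `times` holds one Duration per function.
    auto times = benchmark!(runLoop, runRange)(100);
    writefln("for loop: %s", times[0]);
    writefln("stride:   %s", times[1]);
    writefln("checksum: %s", sink);
}
```

Keeping a small harness like this next to performance-sensitive code makes it possible to notice when a compiler upgrade or a refactoring changes the picture.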

> There is absolutely no reason why any decent compiler would not use what the cpu has to offer. For loops are language semantics, Ranges are library semantics.

Not really. Also, irrelevant.

> To pretend they are equivalent is wrong and no amount of justifying will make them the same.

Again, I don't think this point is part of this discussion.

> I actually do not even know of any commercially viable cpu that exists without loop semantics.

What does it mean for a CPU to have "loop semantics"? CPUs typically have an instruction pointer register and possibly some built-in instructions to manipulate said instruction pointer. x86 has some built-in loop instructions, but I think they are just there for legacy support and not actually something you want to use in performant code.

> I also know of no commercially viable compiler that does not wrap those instructions in a for loop (or while, or whatever) like syntax that almost maps directly to the cpu instructions.
> ...

The compiler takes your for loop and generates some machine code. I don't think there is a "commercially viable" compiler that does not sometimes do things that are not direct. And even then, there is no very simple mapping from CPU instructions to observed performance, so the entire point is a bit moot.

>> Also, it is often not necessary to be "as fast as possible". It is usually more helpful to figure out where the bottleneck is for your code and concentrate optimization effort there, which you can do more effectively if you can save time and effort for the remaining parts of your program by writing simple and obviously correct range-based code, which often will be fast as well.
> 
> It's also often not necessary to be "as slow as possible".

This seems to be quoting an imaginary person. My point is that to get even faster code, you need to spend effort and often get lower maintainability. This is not always a good trade-off, in particular if the optimization does not improve performance a lot and/or the code in question is not executed very often.

> I'm not asking about generalities but specifics. It's great to make generalizations about how things should be but I would like to know how they are.

That's a bit unspecific.

> Maybe in theory ranges could be more optimal than other semantics but theory never equals practice.
> 

I don't know who this is addressed to. My point was entirely practical.

On 6/5/18 6:28 PM, DigitalDesigns wrote:
> Once you do that then I will be able to make an informed decision rather than doing what you really want, which is except your world as authority regardless of the real truth.

It's "accept" not "except". You need to *accept* my world as authority.

Here, this may help:

https://www.vocabulary.com/articles/chooseyourwords/accept-except/

-Steve

On Monday, 4 June 2018 at 18:47:02 UTC, Dennis wrote:
> On Monday, 4 June 2018 at 18:11:47 UTC, Steven Schveighoffer wrote:
>> BTW, do you have cross-module inlining on? I wonder if that makes a difference if you didn't have it on before. (I'm somewhat speaking from ignorance, as I've heard people talk about this limitation, but am not sure exactly when it's enabled)

Cross-module inlining is never implicitly enabled in LDC. Not having it enabled is definitely something that hurts performance of LDC generated code. Enable it with: `-enable-cross-module-inlining`. (May lead to missing symbols during linking when using templates with __FILE__ arguments.)

> I don't know much about this either. Clang has link-time optimization with -O4, but looking at the --help of LDC it turns out -O4 is equivalent to -O3 for D. Maybe someone else knows?

Clang and LDC treat `-O4` as `-O3`. To enable LTO (cross-module inlining and other big perf gains), you have to use `-flto=full` or `-flto=thin`, but it'll only give cross module inlining for modules that have been compiled with it. Notably: default Phobos/druntime is _not_ compiled with LTO enabled. LDC 1.9.0 release packages on Github ship with a second set of LTO Phobos/druntime libs. With LDC 1.9.0, you can do `-flto=<thin|full> -defaultlib=phobos2-ldc-lto,druntime-ldc-lto`, for maximum druntime/Phobos inlining.

-Johan

On Wednesday, 6 June 2018 at 14:23:29 UTC, Steven Schveighoffer wrote:
> On 6/5/18 6:28 PM, DigitalDesigns wrote:
>> Once you do that then I will be able to make an informed decision rather than doing what you really want, which is except your world as authority regardless of the real truth.
>
> It's "accept" not "except". You need to *accept* my world as authority.
>
> Here, this may help:
>
> https://www.vocabulary.com/articles/chooseyourwords/accept-except/
>
> -Steve

This is genius!! This made my day! I feel 40 years younger!!! It took me back to 3rd grade English class. I can remember when little Lissa Lou was sitting next to me on a fine Wednesday morning! She had just used a word in the wrong context and out of the blue the gestapo barged in shoved a grenade down her throat then tossed her out the window! It was a fun day! We all had icecream and danced like the little Nazi's we were! That was the good ol days! Thanks!
