September 20, 2013
On Fri, Sep 20, 2013 at 05:13:45PM +0200, Joseph Rushton Wakeling wrote:
> On 20/09/13 16:48, H. S. Teoh wrote:
> >A container should not be confused with a range. That way leads to dragons. :-P  (It's rather unfortunate that built-in arrays conflate the two, it leads to a lot of wrong code that works only with arrays but not with "real" ranges.)
> 
> Built-in arrays are not _always_ ranges.  Consider const(int[]) ... as I found out recently, it's _not_ a range, because you can't popFront on a const entity.

Which makes it even more confusing, since newbies would probably equate arrays (the container) with ranges from their frequent use in range examples.

Perhaps it's more useful to think of T[] not as an array per se, but as a *slice* of the underlying array data which is managed by druntime. I think I'm OK with saying that arrays (i.e. the underlying data) are containers, while slices (what we think of today as "arrays") are ranges.

Of course, the distinction isn't that clear cut, because appending to a "slice" creates a new array and returns a slice of that, so there's still some murky areas here.


T

-- 
Life would be easier if I had the source code. -- YHL
September 20, 2013
On 9/20/13 2:36 AM, bearophile wrote:
> In D it's often better to use std.algorithm/range instead of raw foreach
> loops (despite probably unlike C++ in D a heavy use of ranges leads to a
> lower performance, even if you use the LDC2 compiler).

Why would there be a performance loss?

Andrei

September 20, 2013
On 9/20/13 3:47 AM, Dmitry Olshansky wrote:
> 20-Sep-2013 14:00, Jacob Carlborg пишет:
>> On 2013-09-20 11:37, Szymon Gatner wrote:
>>
>>> If only std algorithms took containers (by that I mean things that
>>> container for accepts too) as arguments (and not iterators)... even in
>>> the form of new functions like foreach_all, transform_all etc.
>>
>> Can't a container be a range as well?
>>
>
> For Christ sake no, no and no. For starters range skips/drops elements
> when iterating, and thusly iteration has its own state that is useless
> for a container otherwise.
>
> The idea of "contange" spreads like virus, no matter how abominable the
> end result is. The fact that slices sometimes look like containers (BTW
> ill-suited for anything beyond primitives/PODish stuff )  must be the
> corner stone of this belief. Strengthened on the motto of trying to make
> user defined types to mimic behavior of built-ins it leads to a school
> of thought that it's fine to blend ownership and access.
>
> TL;DR: Suboptimal, unnatural and error prone are keywords.

Agreed. A container is a range factory.

Andrei

September 20, 2013
On 20/09/13 17:22, H. S. Teoh wrote:
> Which makes it even more confusing, since newbies would probably equate
> arrays (the container) with ranges from their frequent use in range
> examples.

Yes, it's completely non-obvious.  I think the first time I realized the range/container distinction was when I tried experimentally replacing some built-in arrays with std.container.Array and discovered that I couldn't use them in a foreach loop.

> Perhaps it's more useful to think of T[] not as an array per se, but as
> a *slice* of the underlying array data which is managed by druntime. I
> think I'm OK with saying that arrays (i.e. the underlying data) are
> containers, while slices (what we think of today as "arrays") are
> ranges.

It's not a bad description, but I'm not sure that takes into account the const(T[]) case.
September 20, 2013
On 20/09/13 17:28, Andrei Alexandrescu wrote:
> Why would there be a performance loss?

Depends on the particular case, but my experience is that _in practice_ stuff based around range interfaces can often be slower than raw iteration.

I don't think anyone's saying a performance loss is inevitable or unavoidable, but there currently is one.
September 20, 2013
On Friday, 20 September 2013 at 15:09:57 UTC, Dmitry Olshansky wrote:
> 20-Sep-2013 18:13, Szymon Gatner пишет:
>> On Friday, 20 September 2013 at 14:04:06 UTC, Dmitry Olshansky wrote:
>
>>> >> Can't a container be a range as well?
>>> >>
>>>
>>> >For Christ sake no, no and no.
>>>
>>> [... Because that would be ...]
>>>
>>> > TL;DR: Suboptimal, unnatural and error prone are keywords.
>>>
>>> Then your question - Why would it be suboptimal?
>>>
>>> Which your second reply seem to clearly explain: extra state placed
>>> where it doesn't belong. I can't easily correlate your two answers as
>>> they look as if the second one answers questions of the first.
>>> Anyhow we are in agreement here.
>>
>> Mind that "Can't a container be a range as well?" was not from me.
>
> Yup, but it was what I was answering ... when you asked about why would it be suboptimal... A container that is a range at the same time is suboptimal, that's all.
>
>> Still, moving computation over a range from for loop body into the range
>> implementation (like filtering) incurs no performance penalty and has
>> additional benefits of composability and maintainability. Do you mean
>> that is suboptimal?
>
> I made no statement on this I think. It shouldn't be ineffective.
> The only problem with it that I foresee is that in order to take advantage of vectorized SIMD in CPU it could be harder to auto-magically vectorize such code for compiler.
>
> For that matter I think e.g. providing explicitly a range of float4 fells like a better way to go.

Yup, we agree then :) I think I add to confusion also because I think about C++ and not D. I didn't realize that arrays/slices are ranges in D. That is not good indeed.
September 20, 2013
On 9/20/2013 8:54 AM, Joseph Rushton Wakeling wrote:
> On 20/09/13 17:28, Andrei Alexandrescu wrote:
>> Why would there be a performance loss?
>
> Depends on the particular case, but my experience is that _in practice_ stuff
> based around range interfaces can often be slower than raw iteration.
>
> I don't think anyone's saying a performance loss is inevitable or unavoidable,
> but there currently is one.

I know that, at least with dmd's back end, it's because the optimizer was built around the kinds of things that C++ programmers tend to write. The D range/algorithm generates unusual code (for the back end) that the back end doesn't optimize for.

For example, ranges tend to lump several variables together and call them a 'struct'. The back end is not tuned to deal with structs as an aggregate of discreet 'variables', meaning that such variables don't get assigned to registers. Structs are treated as a lump.

This is not a fundamental performance problem with D.

The back end needs to be improved to "dis-aggregate" structs back into discreet variables.
September 20, 2013
On 20/09/13 18:01, Walter Bright wrote:
> I know that, at least with dmd's back end, it's because the optimizer was built
> around the kinds of things that C++ programmers tend to write. The D
> range/algorithm generates unusual code (for the back end) that the back end
> doesn't optimize for.
>
> For example, ranges tend to lump several variables together and call them a
> 'struct'. The back end is not tuned to deal with structs as an aggregate of
> discreet 'variables', meaning that such variables don't get assigned to
> registers. Structs are treated as a lump.
>
> This is not a fundamental performance problem with D.
>
> The back end needs to be improved to "dis-aggregate" structs back into discreet
> variables.

I wouldn't single out DMD for criticism -- I don't know to what extent the underlying reasons overlap, but all the compilers cope less well with range-based material than they might in theory.

The canonical example would be something like,

    foreach (i; iota(10)) { ... }

which in theory shouldn't be any slower than,

    foreach (i; 0 .. 10) { ... }

but in practice is, no matter what the compiler.

September 20, 2013
On 9/20/13 7:55 AM, Craig Dillabaugh wrote:
> Excuse my lack of knowledge on Ranges, so maybe what I am proposing is
> infeasible, for a binary tree for example couldn't you have a
> InOrderRange, PreOrderRange, and PostOrderRange or alternately
> DepthFirstRange, and BreadFirstRange.  From a consumers view those would
> be linear operations?

The tree object would have functions returning by value each of these range types.

Andrei


September 20, 2013
On 9/20/13 8:13 AM, Joseph Rushton Wakeling wrote:
> On 20/09/13 16:48, H. S. Teoh wrote:
>> A container should not be confused with a range. That way leads to
>> dragons. :-P  (It's rather unfortunate that built-in arrays conflate the
>> two, it leads to a lot of wrong code that works only with arrays but not
>> with "real" ranges.)
>
> Built-in arrays are not _always_ ranges.  Consider const(int[]) ... as I
> found out recently, it's _not_ a range, because you can't popFront on a
> const entity.
>

Well you can't call popFront on any const range.

After some time I learned to make peace with the dual nature of built-in slices. Part of that was the failed experiment to introduce the type "new T[]". Another part is the simple realization that built-in slices are not "regular" ranges, they have deeper connections in D's object model, so they're allowed to be a little special.


Andrei