September 09, 2008
Andrei Alexandrescu wrote:

> A slice is a range alright without any extra adaptation. It has some extra functions, e.g. ~=, that are not defined for ranges.

Aren't slices const/readonly/whatnot and thus ~= not possible without copying/allocation?

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
September 09, 2008
Lars Ivar Igesund wrote:
> Andrei Alexandrescu wrote:
> 
>> A slice is a range alright without any extra adaptation. It has some
>> extra functions, e.g. ~=, that are not defined for ranges.
> 
> Aren't slices const/readonly/whatnot and thus ~= not possible without
> copying/allocation?

Well there's no change in semantics of slices (meaning T[]) between D1 and D2, so slices mean business as usual. Maybe you are referring to strings, aka invariant(char)[]?

Anyhow, today's ~= behaves really really erratically. I'd get rid of it if I could. Take a look at this:

import std.stdio;

void main(string args[]) {
    auto a = new int[10];
    a[] = 10;
    auto b = a;
    writeln(b);
    a = a[1 .. 5];
    a ~= [ 34, 345, 4324 ];
    writeln(b);
}

The program will print all 10s two times. But if we change a[1 .. 5] with a[0 .. 5] the behavior will be very different! a will grow "over" b, thus stomping over its content.

This is really bad because the behavior of a simple operation ~= depends on the history of the slice on the left hand side, something often extremely difficult to track, and actually impossible if the slice was received as an argument to a function.

IMHO such a faux pas is inadmissible for a modern language.


Andrei
September 09, 2008
Andrei Alexandrescu wrote:

> Lars Ivar Igesund wrote:
>> Andrei Alexandrescu wrote:
>> 
>>> A slice is a range alright without any extra adaptation. It has some extra functions, e.g. ~=, that are not defined for ranges.
>> 
>> Aren't slices const/readonly/whatnot and thus ~= not possible without copying/allocation?
> 
> Well there's no change in semantics of slices (meaning T[]) between D1
> and D2, so slices mean business as usual. Maybe you are referring to
> strings, aka invariant(char)[]?

No, I actually referred to what you say below. My point was that ~= is an unsafe operation on slices (not impossible as I apparently said), and thus you need copy-on-write to be safe from erratic behaviour.

> 
> Anyhow, today's ~= behaves really really erratically. I'd get rid of it if I could. Take a look at this:
> 
> import std.stdio;
> 
> void main(string args[]) {
>      auto a = new int[10];
>      a[] = 10;
>      auto b = a;
>      writeln(b);
>      a = a[1 .. 5];
>      a ~= [ 34, 345, 4324 ];
>      writeln(b);
> }
> 
> The program will print all 10s two times. But if we change a[1 .. 5] with a[0 .. 5] the behavior will be very different! a will grow "over" b, thus stomping over its content.
> 
> This is really bad because the behavior of a simple operation ~= depends on the history of the slice on the left hand side, something often extremely difficult to track, and actually impossible if the slice was received as an argument to a function.
> 
> IMHO such a faux pas is inadmissible for a modern language.
> 
> 
> Andrei

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
September 09, 2008
Andrei Alexandrescu, el  9 de septiembre a las 10:30 me escribiste:
> >I think STL refugees can deal with it. I think there is no point on keep compromising D's readability because of C/C++ (you just mentioned enum, another really bad choice to keep C/C++ refugees happy).
> 
> I agree. I just don't think that choosing one name over a synonym name compromises much.
> 
> >I find left and right a little obscure too, it all depends on your mental image of a range. Using front/back or begin/end is much more clearer.
> 
> I'd like to go with:
> 
> r.first
> r.last

Much better, thank you! =)

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
De todos los amigos que he tenido, eres el primero.
	-- Bender
September 09, 2008
BCS wrote:
> I was referring to the implementation as visible from the called code's side

opApply isn't going to go away, it will still be there as an alternative.
September 09, 2008
superdan wrote:
> Andrei Alexandrescu Wrote:
> 
>> What we want is a design that tells the truth. And a design that tells
>> the truth is this:
>>
>> r.isEmpty does whatever the hell it takes to make sure whether there's
>> data available or not. It is not const and it could throw an exception.
>>
>> v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
>> came through the wire, data that the client now owns as soon as getNext
>> returned. There is no extra copy, no extra allocation, and the real
>> thing has happened: data has been read from the outside and user code
>> was made the only owner of it.
> 
> this is really kool n the gang. there's a sore point tho. if i wanna read strings from a file no prob.
> 
> for (auto r = stringSucker(stdin); !r.isEmpty(); )
> {
>     string s = r.getNext();
>     // play with s
> }
> 
> but a new string is allocated every line. that's safe but slow. so i want some more efficient stuff. i should use char[] because string don't deallocate.
> 
> for (auto r = charArraySucker(stdin); !r.isEmpty(); )
> {
>     char[] s = r.getNext();
>     // play with s
> }
> 
> no improvement. same thing a new char[] is alloc each time. maybe i could do
> 
> for (auto r = charArraySucker(stdin); !r.isEmpty(); )
> {
>     char[] s = r.getNext();
>     // play with s
>     delete s;
> }
> 
> would this make stuff faster. maybe. maybe not. and it's not general. what i'd like is some way of telling the range, i'm done with this you can recycle and reuse it. it's a green green world.
> 
> for (auto r = charArraySucker(stdin); !r.isEmpty(); )
> {
>     char[] s = r.getNext();
>     // play with s
>     r.recycle(s);
> }
> 
> sig is recycle(ref ElementType!(R)).

This is a great point. Unfortunately that won't quite work properly. Equally unfortunately, you just revealed an issue with my design.

One goal of range design is that algorithms written for "inferior" ranges should work seamlessly with "superior" ranges. For example, find should work fine with any range, so we should write (with your idea):

R find(R, V)(R r, V v)
{
    ElementType!(R) tmp;
    for (; !r.isEmpty; r.recycle(tmp))
    {
        tmp = r.getNext;
        if (tmp == v) break;
    }
    return r;
}

This looks great but imagine we apply this to a collection of some costly ElementType!(R). Then getNext rightly returns a reference because there's no need to create a copy unless absolutely necessary. But then we realize that our look forces a copy no matter what! The copy was good for the input range. But it's bad for the actual container that wouldn't need to copy anything.

It looks like I need to reconsider the design mentioned by Lionello, in which both input iterators and forward iterators expose separate isEmpty/next/first operations. Then an input range r caches the last read value internally and gives access to it through r.first.

I have an idea on how to disallow at least egregious errors. Consider:

void fill(R, V)(R r, V v)
{
    for (; !r.isEmpty; r.next) r.first = v;
}

If "r.first = v;" compiles, then the following scandalous misuse goes unpunished:

fill(charArraySucker, "bogus");

A way to prevent r.first = v from compiling is to define a one-argument function first:

struct MyRange(T)
{
    private void first(W)(W whatever); // not implemented
    ref T first();
}

Expression r.first will resolve to the second function. Expression r.first = v will translate into r.first(v) so it will resolve to the first function, which will fail the protection test.

Then there remains the problem:

void bump(R, V)(R r)
{
    for (; !r.isEmpty; r.next) ++r.first;
}

bump(charArraySucker); // bogus

Sigh...


Andrei
September 09, 2008
Walter Bright wrote:
> BCS wrote:
>> I was referring to the implementation as visible from the called code's side
> 
> opApply isn't going to go away, it will still be there as an alternative.

I disagree with that as I think that would be one perfect place to simplify the language thus fulfilling bearophile's and many others' wish.

Say we refine ranges to work with foreach to perfection. Then we have:

1. A foreach that sucks

2. A foreach that rocks

The obvious question is, why keep the one that sucks?


Andrei
September 09, 2008
Andrei Alexandrescu wrote:
> Sean Kelly wrote:
>> Andrei Alexandrescu wrote:
>>> Sean Kelly wrote:
>>>> Andrei Alexandrescu wrote: ...
>>>>>
>>>>> After much thought and discussions among Walter, Bartosz and
>>>>> myself, I defined a range design and reimplemented all of
>>>>> std.algorithm and much of std.stdio in terms of ranges alone.
>>>>
>>>> Yup.  This is why I implemented all of Tango's algorithms specifically for arrays from the start--slices represent a
>>>> reasonable approximation of ranges, and this seems far preferable
>>>> to the iterator approach of C++.  Glad to hear that this is what
>>>> you've decided as well.
>>>
>>> That's great to hear, but I should warn you that moving from arrays
>>> to "the lowest range that will do" is not quite easy. Think of std::rotate for example.
>>
>> I'll admit that I find some of the features <algorithm> provides to
>> be pretty weird.  Has anyone ever actually wanted to sort something
>> other than a random-access range in C++?  Or rotate one, for example?
> 
> Great questions. I don't recall having needed to sort a list lately, but rotate is a great function that has an undeservedly odd name. What rotate does is to efficiently transform this:
> 
> a, b, c, d, e, f, A, B, C, D
> 
> into this:
> 
> A, B, C, D, a, b, c, d, e, f
> 
> I use that all the time because it's really a move-to-front operation.

Ah, so it's a bit like partition and select.  I use the two of those constantly, but haven't ever had a need for rotate.  Odd, I suppose, since they're so similar.

> In fact my algorithm2 implementation does not call it rotate anymore, it calls it moveToFront and allows you to move any subrange of a range to the front of that range efficiently. It's a useful operation in a great deal of lookup strategies.
> 
>> These operations are allowed, but to me they fall outside the realm
>> of useful functionality.  I suppose there may be some relation here
>> to Stepanov's idea of a computational basis.  Should an algorithm
>> operate on a range if it cannot do so efficiently?  And even if it
>> does, will anyone actually use it?
> 
> I think it all depends on what one's day-to-day work consists of. I was chatting to Walter about it and he confessed that, although he has a great deal of respect for std.algorithm, he's not using much of it. I told him back that I need 80% of std.algorithm on a daily basis. In fact that's why I wrote it - otherwise I wouldn't have had the time to put into it.

Exactly.  I implemented Tango's Array module for the same reason.  Other than rotate and stable_sort, I think the module has everything from <algorithm>, plus some added bits.

> This is because I make next to no money so I can afford to work on basic research, which is "important" in a long-ranging way. Today's computing is quite disorganized and great energy is expended on gluing together various pieces, protocols, and interfaces. I've worked in that environment quite a lot, and dealing with glue can easily become 90% of a day's work, leaving only little time to get occupied with a real problem, such as making a computer genuinely smarter or at least more helpful towards its user. All too often we put a few widgets on a window and the actual logic driving those buttons - the "smarts", the actual "work" gets drowned by details taking care of making that logic stick to the buttons.

I've never worked in that environment, but I would think that even such positions require the use of algorithms.  If not, then I wouldn't consider them to be software engineering positions.  As for research-- I'd say that's a fairly broad category.  My first salaried position was in R&D for a switched long-distance carrier, for example, but that's applied research as opposed to academic research, which I believe you're describing.  I think there are benefits to each, but the overlap is what truly interests me.

> I mentioned in a talk once that any programmer should know how to multiply two matrices. Why? Because if you don't, you can't tackle a variety of problems that can be easily expressed in terms of matrix multiplication, even though they have nothing to do with algebra (rotating figures, machine learning, fractals, fast series...). A person in the audience said that she never actually needs to multiply two matrices, so why bother? I gave an evasive response, but the reality was that that was a career-limiting state of affairs for her.

Yup.  This is a lot like the argument for Calculus in a CS curriculum. Entry-level software positions rarely require such things and yet I've been surprised at how many times they have proven useful over the years, simply for general problem-solving.  And there's certainly no debate about Linear Algebra--they may as well rename it "math for computer programming."


Sean
September 09, 2008
Andrei Alexandrescu wrote:
> 
> Then there remains the problem:
> 
> void bump(R, V)(R r)
> {
>     for (; !r.isEmpty; r.next) ++r.first;
> }
> 
> bump(charArraySucker); // bogus
> 
> Sigh...

And I suppose you don't want to return a const reference from first() because the user may want to operate on the value, if only on a temporary basis?  Let's say:

P findFirst(P, R, C)( R r, C c )
{
    for( ; !r.isEmpty; r.next )
    {
        // modify the temporary because a new element is expensive
        // to copy-construct
        if( auto p = c.contains( ++r.first ) )
            return p;
    }
    return P.init;
}

Hm... must the implementation prevent stupid mistakes such as your example?  Ideally, yes.  But I don't see a way to do so and yet allow for routines like findFirst() above.


Sean
September 09, 2008
Andrei Alexandrescu wrote:
> Lars Ivar Igesund wrote:
>> Andrei Alexandrescu wrote:
>>
>>> A slice is a range alright without any extra adaptation. It has some
>>> extra functions, e.g. ~=, that are not defined for ranges.
>>
>> Aren't slices const/readonly/whatnot and thus ~= not possible without
>> copying/allocation?
> 
> Well there's no change in semantics of slices (meaning T[]) between D1 and D2, so slices mean business as usual. Maybe you are referring to strings, aka invariant(char)[]?

I do think it's a fair point that ~= could be considered an operation that constructs a new container (an array, in this case) using a range (slice) as an initializer.  The weird issue right now is that there is effectively no difference between a slice and an array insofar as the language or code representation are concerned.  In many instances this is an advantage, but it leads to some issues, like the one you describe below.

> Anyhow, today's ~= behaves really really erratically. I'd get rid of it if I could. Take a look at this:
> 
> import std.stdio;
> 
> void main(string args[]) {
>     auto a = new int[10];
>     a[] = 10;
>     auto b = a;
>     writeln(b);
>     a = a[1 .. 5];
>     a ~= [ 34, 345, 4324 ];
>     writeln(b);
> }
> 
> The program will print all 10s two times. But if we change a[1 .. 5] with a[0 .. 5] the behavior will be very different! a will grow "over" b, thus stomping over its content.
> 
> This is really bad because the behavior of a simple operation ~= depends on the history of the slice on the left hand side, something often extremely difficult to track, and actually impossible if the slice was received as an argument to a function.
> 
> IMHO such a faux pas is inadmissible for a modern language.

This would be easy to fix by making arrays / slices fatter (by adding a capacity field, for example), but I'm still not convinced that's the right thing to do.  However, it may well be preferable to eliminating appending completely.  The obvious alternative would be either to resurrect head const (not gonna happen) or to make append always reallocation (not at all ideal).


Sean