September 09, 2008
superdan, on September 9 at 10:12 you wrote:
> > For how long will the memory stay "alive"? Will it use "regular" GC
> > semantics (i.e., freed when nobody points at it anymore)? In that case,
> > letting the programmer leave dangling pointers to data that should be
> > "dead" without crashing, wouldn't that make it easier to introduce
> > memory leaks?
> 
> such is the peril of gc. clearly meshing scoping with gc ain't gonna be perfect. but i like the next best thing. scarce resources are deallocated quick. memory stays around for longer. no dangling pointers.

I was talking about logical[1] memory leaks, which are possible even with GC.

int[] a;
if (condition) {
    Array!(int) b;
    a = b.all;
}

If you expect b's memory to be freed at the end of the scope, but
a retains it, that is a logical memory leak (you probably forgot to null
a before the scope ended). I think this kind of error should be detected
as soon as possible, rather than letting a keep using that memory (or leaking
it).

[1] http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Benefits

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Since I was a kid I wanted to be a doctor
But then I got sick and became a musician
September 09, 2008
Andrei Alexandrescu, on September 9 at 09:50 you wrote:
> Leandro Lucarella wrote:
> >Andrei Alexandrescu, on September 8 at 22:43 you wrote:
> >>I like the alternate names quite some. One thing, however, is that head and rear are not near-antonyms (which ideally they should be). Maybe front and rear would be an improvement. (STL uses front and back). Also,
> >What about head/tail? You certainly won't confuse STL refugees and functional guys will be at home ;)
> 
> You'll sure confuse the latter. To them, tail is everything except the head, e.g. a[1 .. $].

You are right =/

Anyway, I think it is better to confuse users of some other languages than to compromise D's readability...

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Karma police
arrest this man,
he talks in maths,
he buzzes like a fridge,
he's like a detuned radio.
September 09, 2008
Andrei Alexandrescu wrote:
> Hello,
> 
> 
> Walter, Bartosz and myself have been hard at work trying to find the right abstraction for iteration. That abstraction would replace the infamous opApply and would allow for external iteration, thus paving the way to implementing real generic algorithms.
> ...
> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
> 
> 
> Andrei

It looks very nice, though I have a few questions:
- Is std.range's source code available somewhere? Or is it just the documentation for now, with the implementation to follow? Because...
- How do you create a range? In the documentation it says that "Built-in slices T[] are a direct implementation of random-access ranges", so I guess a built-in slice is already a range. But if that is true...
- How is "void next(T)(ref T[] range)" implemented? If I pass a built-in slice to it, how does the template store the state of where in the range we are? Or maybe you'd need to do Range!(...) to create a range?
- What do I do to make a collection implement a range? Do I need to implement the templates in std.range using template conditions?

Sorry if my questions are silly, I don't know much about templated code.
September 09, 2008
Leandro Lucarella wrote:
> Andrei Alexandrescu, on September 9 at 07:47 you wrote:
>> Lionello Lunesu wrote:
>>> "Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message news:ga46ok$2s77$1@digitalmars.com...
>>>> I put together a short document for the range design. I definitely missed about a million things and have been imprecise about another million, so feedback would be highly appreciated. See:
>>>>
>>>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>>> This is just awesome. Thank you for tackling this issue.
>>> I agree with others that some names are not so obvious. Left/right? How do Arabic speakers feel about this : ) Begin/end seems more intuitive.
>> I don't know of that particular cultural sensitivity. Begin and end are
>> bad choices because they'd confuse the heck out of STL refugees. c.left
>> and c.right are actually STL's c.front() and c.back() or *c.begin() and
>> c.end()[-1], if you allow the notational abuse. But I sure hope some
>> good names will come along.
> 
> I think STL refugees can deal with it. I think there is no point in
> continuing to compromise D's readability because of C/C++ (you just
> mentioned enum, another really bad choice made to keep C/C++ refugees
> happy).

I agree. I just don't think that choosing one name over a synonym name compromises much.

> I find left and right a little obscure too; it all depends on your mental
> image of a range. Using front/back or begin/end is much clearer.

I'd like to go with:

r.first
r.last
r.next
r.pop

Not convinced about r.toBegin(s), r.toEnd(s), and r.fromEnd(s) yet, in the wake of a realization: I've noticed that the r.fromBegin(s) operation is not needed if we make the appropriate relaxations, because r.fromBegin(s) is really s.toEnd(r).

I've updated http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html with a drawing (right at the beginning) that illustrates what primitives we need. Maybe this will help us choose even better names.


Andrei
September 09, 2008
Ary Borenszweig wrote:
> Andrei Alexandrescu wrote:
>> Hello,
>>
>>
>> Walter, Bartosz and myself have been hard at work trying to find the right abstraction for iteration. That abstraction would replace the infamous opApply and would allow for external iteration, thus paving the way to implementing real generic algorithms.
>> ...
>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>>
>>
>> Andrei
> 
> It looks very nice, though I have a few questions:
> - Is std.range's source code available somewhere? Or is it just the documentation for now, with the implementation to follow?

There is an implementation that does not compile :o|.

> Because...
> - How do you create a range? In the documentation it says that "Built-in slices T[] are a direct implementation of random-access ranges", so I guess a built-in slice is already a range.

A slice is indeed a range, without any extra adaptation. It has some extra operations, e.g. ~=, that are not defined for ranges.

> But if that is true...
> - How is "void next(T)(ref T[] range)" implemented? If I pass a built-in slice to it, how does the template store the state of where in the range we are? Or maybe you'd need to do Range!(...) to create a range?

void next(T)(ref T[] range) { range = range[1 .. $]; }
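For concreteness, here is a sketch of how the companion slice primitives might look alongside next. The names isEmpty and first follow the proposal's vocabulary, but their slice implementations here are my assumption, not code from std.range:

```d
// Assumed counterparts of next() for built-in slices, as free functions.
// A T[] then satisfies the range interface with no wrapper at all.
bool isEmpty(T)(T[] range) { return range.length == 0; }
ref T first(T)(T[] range) { return range[0]; }
void next(T)(ref T[] range) { range = range[1 .. $]; }

unittest
{
    int[] a = [1, 2, 3];
    int sum = 0;
    // r starts as a copy of the slice; advancing it leaves a untouched
    for (auto r = a; !r.isEmpty; r.next)
        sum += r.first;
    assert(sum == 6);
    assert(a.length == 3);
}
```

Note that copying the slice copies only the view, which is what makes two independent ranges over the same data cheap.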

> - What do I do to make a collection implement a range? Do I need to implement the templates in std.range using template conditions?

Oh, much simpler. You don't need to use templates at all if you know the type in advance.

// assume a collection of ints
// using old names
struct Collection
{
    struct Range
    {
        bool isEmpty() { ... }
        ref int left() { ... }
        void next() { ... }
    }
    Range all() { ... }
}

Collection c;
for (auto r = c.all; !r.isEmpty; r.next)
{
    writeln(r.left);
}

The advantage of the above is not that it offers you looping over your collection. The advantage is that your collection can now use many of the algorithms in std.algorithm, and others written in terms of ranges.

Collection.Range is intimately connected with Collection because it understands the mechanics of walking the collection.

Your code won't currently compile because returning ref int from left() is not allowed.
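For what it's worth, here is a hedged sketch of the same example adjusted to compile today: left() returns by value to sidestep that ref limitation, and the array-backed internals are a stand-in of mine, not part of the proposal:

```d
// A minimal int collection exposing the proposed range interface.
struct Collection
{
    int[] data;

    struct Range
    {
        int[] slice;
        bool isEmpty() { return slice.length == 0; }
        int left() { return slice[0]; }         // by value, not ref
        void next() { slice = slice[1 .. $]; }
    }

    Range all() { return Range(data); }
}

unittest
{
    auto c = Collection([10, 20, 30]);
    int sum = 0;
    for (auto r = c.all; !r.isEmpty; r.next)
        sum += r.left;
    assert(sum == 60);
}
```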


Andrei
September 09, 2008
It is a nice idea to redesign the iterator and range.
I think that the proposal is not bad, but I have some notes about it, and some things that I would have done differently.

1) The simplest input (range) interface is just:
bool isEmpty();
T next();
iterator(T) release();

The first two I fully agree on; the third, I suppose, is there to allow resources to be released and possibly to transfer the data to another iterator... is it really needed?

Now I would see this simplest thing (let me call it an iterator) as the basic object for foreach looping.
*all* things on which foreach loops should be iterators.
If an object is not an iterator, there should be a standard way to convert it to one (.iterator, for example).
So if the compiler gets something that is not an iterator, it tries to see whether .iterator is implemented; if it is, it calls it and iterates on the result.
This lets many objects have a "default" iterator. Obviously an object could have other methods that return an iterator.

2) All the iterator-intersection methods are, in my opinion, difficult to memorize and rarely used; I would scrap them.
Instead I would add the comparison operation .atSamePlace(iterator!(T) y), which would say whether two iterators are at the same place. With it one gets back all the power of pointers, with a syntax and use that are understandable.
I understand the idea of covering all possibilities: if one wants them, a template can easily construct all the intersection iterators on top of .atSamePlace. Clearly calling such a template recursively is inefficient, but then I would say one should use a pair of iterators directly (in the worst case one could write a specialization that implements it more efficiently for the types that support it).
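As a hedged illustration of that claim, here is what one such reconstruction might look like. distance, SliceIter, and their members are hypothetical names of mine, built only on next and atSamePlace:

```d
// Count the elements between two iterators into the same sequence,
// using nothing but next and atSamePlace. O(n), which is exactly the
// inefficiency conceded above.
size_t distance(Iter)(Iter from, Iter to)
{
    size_t n = 0;
    for (; !from.atSamePlace(to); from.next)
        ++n;
    return n;
}

// A toy forward iterator over a slice, supporting the two primitives.
struct SliceIter
{
    int[] data;
    size_t pos;
    void next() { ++pos; }
    bool atSamePlace(SliceIter other) { return pos == other.pos; }
}

unittest
{
    auto seq = [1, 2, 3, 4];
    assert(distance(SliceIter(seq, 0), SliceIter(seq, 3)) == 3);
}
```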

3) copying: I would let the user freely copy and duplicate iterators as needed.

4) input-output
I think that the operations proposed are sound; I like them.

5) hierarchy of iterators
I would also classify iterators along another axis: size.
infinite (a stream) - finite (but of unknown size) - bounded (finite and of known size)

The other classification:
forward iterator (what I have called simply "iterator" until now)
bidirectional range: I understand this; these are basically two iterators, one from the beginning and one from the end, coupled together. I find it a little strange (I would just expect a pair of iterators), but I see that it might be useful.
bidirectional iterator: think of a doubly linked list. I don't think this class of iterators can easily be described just as a range; it often needs three points (start, end, actual_pos). I think it has its place (and it is not explicitly present in your list).
random iterator: this could also be called an array type, or a linearly indexed type.

So this is what "my" iterator/range would look like :)

Fawzi

September 09, 2008
downs wrote:
> Andrei Alexandrescu wrote:
>> What's the deal with destroyed but not deleted? Consider:
>> 
>> int[] a; if (condition) { Array!(int) b; a = b.all; } writeln(a);
>> 
>> This is a misuse of the array in that a range crawling on its back
>> has survived the array itself. What should happen now? Looking at
>> other languages:
>> 
>> 1) All Java objects are unowned, meaning the issue does not appear
>> in the first place, which is an advantage. The disadvantage is that
>> scarce resources must be carefully managed by hand in client code.
>> 
>> 2) C++ makes the behavior undefined because it destroys data AND recycles memory as soon as the array goes out of scope. Mayhem
>> ensues.
>> 
>> We'd like:
>> 
>> 1.5) Allow object ownership but also make the behavior of incorrect
>> code well-defined so it can be reasoned about, reproduced, and
>> debugged.
>> 
>> That's why I think an Array going out of scope should invoke
>> destructors for its data, and then obliterate contents with
>> ElementType.init. That way, an Array!(File) will properly close all
>> files AND put them in a "closed" state. At the same time, the
>> memory associated with the array will NOT be deallocated, so a
>> range surviving the array will never crash unrelated code, but
>> instead will see closed files all over.
>> 
> 
> I don't think this is a good thing, for reasons similar to the
> Error/Exception flaw - specifically, code that works in debug mode
> might end up failing in release mode.
> 
> To explain what I mean by Error/Exception flaw, consider the case of
> an array out of bounds error, wrapped carelessly in a try .. catch
> (Exception) block.
> 
> This would work fine in debug mode, and presumably retry the
> operation until it succeeded.
> 
> In release mode, however, the above would crash.
> 
> This is clearly undesirable, and arises directly from the fact that
> Error is derived from Exception, not the other way around or
> completely separate (as it clearly should be).
> 
> After all, an Error ![is] an Exception, since Exceptions are clearly
> defined as recoverable errors, and the set of unrecoverable errors is
> obviously not a subset of the recoverable ones.
> 
> This leads to my actual point: I suggest an extension of .init: the
> .fail state, indicating data that should not be accessed.
> 
> Any standard library function that encounters data that is
> intentionally in the .fail state should throw an Error.
> 
> For instance, the .fail state for strings could be a deliberately
> invalid UTF8 sequence.
> 
> When this state could reasonably come up in normal operations, it is
> recommended to use values that will readily be visible in a debugger,
> such as the classical 0xDEADBEEF.
> 
> This is imnsho superior to using .init to fill this memory, which
> doesn't tell the debugging programmer much about what exactly
> happened, and furthermore, might cause the program to treat "invalid"
> memory the same as "fresh" memory, if only by accident.

I hear you. I brought up the same exact design briefly with Bartosz last week. We called it T.invalid. He argued in favor of it. I thought it brings more complication than it's worth and was willing to go with T.init for simplicity's sake. Why deal with two empty states instead of one?

One nagging question is: what is T.fail for integral types? For pointers, fine, one could be found. For chars, fine too. But for integrals I'm not sure that e.g. T.min or T.max is a credible fail value.
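To make the difficulty concrete, a hypothetical fail!(T) template might be sketched as follows. The particular values are placeholders of mine, and the integral branch shows exactly why no credible choice exists:

```d
// Hypothetical per-type "fail" values. For char an invalid
// representation exists; for integrals every bit pattern is a legal
// value, so T.max below is a placeholder that silently collides with
// real data, which is precisely the nagging question.
template fail(T)
{
    static if (is(T == char))
        enum T fail = 0xFF;   // 0xFF never appears in well-formed UTF-8
    else
        enum T fail = T.max;  // placeholder only: a perfectly legal value!
}

unittest
{
    assert(fail!char == 0xFF);
    assert(fail!int == int.max); // indistinguishable from a real int.max
}
```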


Andrei
September 09, 2008
Andrei Alexandrescu wrote:
> downs wrote:
>> Andrei Alexandrescu wrote:
>>> What's the deal with destroyed but not deleted? Consider:
>>>
>>> int[] a; if (condition) { Array!(int) b; a = b.all; } writeln(a);
>>>
>>> This is a misuse of the array in that a range crawling on its back has survived the array itself. What should happen now? Looking at other languages:
>>>
>>> 1) All Java objects are unowned, meaning the issue does not appear in the first place, which is an advantage. The disadvantage is that scarce resources must be carefully managed by hand in client code.
>>>
>>> 2) C++ makes the behavior undefined because it destroys data AND recycles memory as soon as the array goes out of scope. Mayhem ensues.
>>>
>>> We'd like:
>>>
>>> 1.5) Allow object ownership but also make the behavior of incorrect code well-defined so it can be reasoned about, reproduced, and debugged.
>>>
>>> That's why I think an Array going out of scope should invoke destructors for its data, and then obliterate contents with ElementType.init. That way, an Array!(File) will properly close all files AND put them in a "closed" state. At the same time, the memory associated with the array will NOT be deallocated, so a range surviving the array will never crash unrelated code, but instead will see closed files all over.
>>>
>>
>> I don't think this is a good thing, for reasons similar to the Error/Exception flaw - specifically, code that works in debug mode might end up failing in release mode.
>>
>> To explain what I mean by Error/Exception flaw, consider the case of an array out of bounds error, wrapped carelessly in a try .. catch (Exception) block.
>>
>> This would work fine in debug mode, and presumably retry the operation until it succeeded.
>>
>> In release mode, however, the above would crash.
>>
>> This is clearly undesirable, and arises directly from the fact that Error is derived from Exception, not the other way around or completely separate (as it clearly should be).
>>
>> After all, an Error ![is] an Exception, since Exceptions are clearly defined as recoverable errors, and the set of unrecoverable errors is obviously not a subset of the recoverable ones.
>>
>> This leads to my actual point: I suggest an extension of .init: the .fail state, indicating data that should not be accessed.
>>
>> Any standard library function that encounters data that is intentionally in the .fail state should throw an Error.
>>
>> For instance, the .fail state for strings could be a deliberately invalid UTF8 sequence.
>>
>> When this state could reasonably come up in normal operations, it is recommended to use values that will readily be visible in a debugger, such as the classical 0xDEADBEEF.
>>
>> This is imnsho superior to using .init to fill this memory, which doesn't tell the debugging programmer much about what exactly happened, and furthermore, might cause the program to treat "invalid" memory the same as "fresh" memory, if only by accident.
> 
> I hear you. I brought up the same exact design briefly with Bartosz last week. We called it T.invalid. He argued in favor of it. I thought it brings more complication than it's worth and was willing to go with T.init for simplicity's sake. Why deal with two empty states instead of one.
> 
> One nagging question is, what is T.fail for integral types? For pointers fine, one could be found. For chars, fine too. But for integrals I'm not sure that e.g. T.min or T.max is a credible fail value.
> 
> 
> Andrei

For numbers, it should probably be "the same as .init". Not every error condition can be detected, sadly.

It would also be nice if a .fail value could be provided as an extension to typedef somehow... user-defined types will probably have their own failure indicators.
September 09, 2008
Fawzi Mohamed wrote:
> It is a nice idea to redesign the iterator and range.
> I think that the proposal is not bad, but I have some notes about it, and some things that I would have done differently.
> 
> 1) The simplest interface input (range) is just
> bool isEmpty();
> T next();
> iterator(T) release();

Actually next is getNext, and release returns R (the range type).

> The first two I fully agree on, the second one I suppose is there to allow resources to be released and possibly transfer the data to another iterator.. is it really needed?

Yes. Consider findAdjacent that finds two equal adjacent elements in a collection:

Range findAdjacent(alias pred = "a == b", Range)(Range r)
{
    if (r.isEmpty) return r;
    auto ahead = r;
    ahead.next;
    for (; !ahead.isEmpty; r.next, ahead.next)
    {
        if (binaryFun!(pred)(r.first, ahead.first)) return r;
    }
    return ahead;
}

The whole implementation fundamentally rests on the notion that you can copy a range into another, and that you can iterate the collection independently with two distinct ranges. If that's not true, findAdjacent will execute but yield nonsensical results.

Input iterators are not copyable: with an input iterator, "auto ahead = r;" will not compile. But they are movable, so you can relinquish control from one iterator to the other.
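In present-day D that distinction might be sketched with a disabled postblit. OneShot and its internals are hypothetical, with std.algorithm's move standing in for the release/relinquish operation:

```d
import std.algorithm.mutation : move;

// An input range over a one-shot source: copying is forbidden, but
// ownership can be transferred with move.
struct OneShot
{
    int[] buffered;        // stand-in for a non-replayable source
    @disable this(this);   // "auto ahead = r;" will not compile
    bool isEmpty() { return buffered.length == 0; }
    int getNext() { auto x = buffered[0]; buffered = buffered[1 .. $]; return x; }
}

unittest
{
    auto r = OneShot([1, 2]);
    static assert(!__traits(compiles, { auto ahead = r; }));
    auto s = move(r);      // ownership transferred; r is back to .init
    assert(s.getNext == 1);
    assert(r.isEmpty);
}
```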

> Now I would see this simplest thing (let me call it iterator) as the basic objects for foreach looping.
> *all* things on which foreach loops should be iterators.
> If an object is not a iterator there should be a standard way to convert it to one (.iterator for example).
> So if the compiler gets something that is not a iterator it tries to see if .iterator is implemented and if it is it calls it and iterates on it.
> This let many objects have a "default" iterator. Obviously an object could have other methods that return an iterator.

Fine. So instead of saying:

foreach (e; c.all) { ... }

you can say

foreach (e; c) { ... }

I think that's some dubious savings.

> 2) All the methods with intersection of iterator in my opinion are difficult to memorize, and rarely used, I would scrap them.
> Instead I would add the comparison operation .atSamePlace(iterator!(T)y) that would say if two iterators are at the same place. With it one gets back all the power of pointers, and with a syntax and use that are understandable.

But that comparison operation is not enough to implement anything of substance. Try your hand at a few classic algorithms and you'll see.

> I understand the idea of covering all possibilities, if one wants it with .atSamePlace a template can easily construct all possible intersection iterators. Clearly calling recursively such a template is inefficient, but I would say the then one should use directly a pair of iterators (in the worst case one could make a specialization that implements it more efficiently for the types that support it).
> 
> 3) copying: I would let the user freely copy and duplicate iterators if needed.

I like freedom too. But that kind of freedom is incorrect for input iterators.

> 4) input-output
> I think that the operations proposed are sound, I like them

Then you got to accept the consequences :o).

> 5) hierarchy of iterators
> I would classify the iterator also along another axis: size
> infinite (stream) - finite (but unknown size) - bounded (finite and known size)

Distinguishing such things can be of advantage sometimes, and could be added as a refinement to the five categories if shown useful.

> The other classification:
> forward iterator (what I called iterator until now)
> bidirectional range: I understand this, these are basically two iterators one from the beginning and the other from the end that are coupled together. I find it a little bit strange, I would just expect to have a pair of iterators... but I see that it might be useful
> bidirectional iterator: this is a doubly linked list, I think that this class of iterators cannot easily be described just as a range, it often needs three points (start,end,actual_pos), I think has its place (and is not explicitly present in your list)
> random_iterator: (this could be also called array type or linear indexed type).

I can't understand much of the above, sorry.

> So this is what "my" iterator/range would look like :)

I encourage you to realize your design. Before long you'll probably find even more issues with it than I mentioned above, but you'll come out better equipped to find proper solutions.


Andrei
September 09, 2008
downs wrote:
> For numbers, it should probably be "the same as .init". Not every
> error condition can be detected, sadly.

That further dilutes the benefits of T.fail.

> It would also be nice if a .fail value could be provided as an
> extension to typedef somehow .. user defined types will probably have
> their own possible failure indicators.

That further increases the cognitive cost of T.fail.

Not putting you down. I think the notion is good, but I think we need to thoroughly understand its costs and benefits before even raising it to Walter's level of consciousness :o).


Andrei