September 11, 2008
Bill Baxter wrote:
> On Thu, Sep 11, 2008 at 1:31 PM, Benji Smith <dlanguage@benjismith.net> wrote:
>> Parsers & regex engines move both forward and backward, as they try to match
>> characters to a pattern.
>>
>> Really, anything that uses an NFA or DFA to define patterns would benefit
>> from fast bidirectional iteration...
> 
> Good call.  I was about to post something mentioning that Turing
> machines but that seemed too academic.  Same class of thing as
> NFA/DFA/FSM.
> 
> The question is, though, would you really implement those things using
> a linked list?  I would expect most of those suckers work on arrays,
> and so can take advantage of the bidirectional nature of random access
> ranges.

Actually, Perl 6 will (assuming they ever finish it) finally allow regex matching against input streams:

http://dev.perl.org/perl6/doc/design/syn/S05.html

It's a big document. Search for the text "Matching against non-strings"

This kind of thing was one of the main arguments I made in my "Why Strings as Classes" thread, that got everyone's panties in a bunch and that no one else agreed with.

In that thread, I argued that Strings should be objects so that they can implement a CharSequence interface (or something like it). And then all the standard text-processing stuff could be written against that interface, allowing regex engines and parsers to be agnostic about whether they read from an actual in-memory string or from a (file|database|socket) input stream.

With the range proposal on the table, I'd be just as happy if all the D text-processing stuff in the standard libs was implemented against a Range!(T), where T is one of the char types. Especially if ranges can be infinite.

Bill Baxter wrote:
> Hmm, for FSMs you can't really define a good end state.  There may not
> be any particular end state. ... ah, but wait I forgot.  That's the
> beauty of a range -- the end "state" doesn't have to be a "state" per
> se.  It can be any predicate you want it to be.  "Range" is misleading
> in this case.  This is one of those cases where you just have to
> remember "range" means "current value plus stopping criterion".

That's what I was saying earlier.

I think the mechanics are good. And for contiguous, sequential containers, the word "range" is great. For other types of containers, or other selection/iteration scenarios, you can shoehorn your mental model into the "range" metaphor. But it's weird.

--benji
September 11, 2008
Benji Smith <dlanguage@benjismith.net> wrote:
> Bill Baxter wrote:
> > Hmm, for FSMs you can't really define a good end state.  There may not be any particular end state. ... ah, but wait I forgot.  That's the beauty of a range -- the end "state" doesn't have to be a "state" per se.  It can be any predicate you want it to be.  "Range" is misleading in this case.  This is one of those cases where you just have to remember "range" means "current value plus stopping criterion".
> 
> That's what I was saying earlier.
> 
> I think the mechanics are good. And for contiguous, sequential containers, the word "range" is great. For other types of containers, or other selection/iteration scenarios, you can shoehorn your mental model into the "range" metaphor. But it's weird.

It seems to me like a misuse of ranges.  Do you really want to iterate over a state machine?  FSM is a mailbox with a 'message' hole.  You put messages into it and it does things.  How do you iterate over a mailbox?
September 11, 2008
Sergey Gromov wrote:
> Benji Smith <dlanguage@benjismith.net> wrote:
>> Bill Baxter wrote:
>>> Hmm, for FSMs you can't really define a good end state.  There may not
>>> be any particular end state. ... ah, but wait I forgot.  That's the
>>> beauty of a range -- the end "state" doesn't have to be a "state" per
>>> se.  It can be any predicate you want it to be.  "Range" is misleading
>>> in this case.  This is one of those cases where you just have to
>>> remember "range" means "current value plus stopping criterion".
>> That's what I was saying earlier.
>>
>> I think the mechanics are good. And for contiguous, sequential containers, the word "range" is great. For other types of containers, or other selection/iteration scenarios, you can shoehorn your mental model into the "range" metaphor. But it's weird.
> 
> It seems to me like a misuse of ranges.  Do you really want to iterate over a state machine?  FSM is a mailbox with a 'message' hole.  You put messages into it and it does things.  How do you iterate over a mailbox?

Well, no. Not when you put it like that.

The example I posted earlier went something like this:

   MarkovModel<ApplicationState> model = ...;
   for (ApplicationState state : model) {
      state.doStuff();
   }

It's not a bad abstraction. The model handles all of the semantics of calculating the transition probabilities and selecting the next state, so that the foreach loop doesn't have to muss with those details.

Yeah, it's a total misuse of the "range" metaphor, and that's exactly what I'm saying. In Java, where I implemented this project, an "iterator" is a tiny bit of logic for returning objects in a (potentially endless, potentially reversible) sequence, primarily to support looping constructs. Just because there's no underlying range of objects doesn't mean they're not iterable.

Of course, Java iterators are *much* more limited constructs than these new D ranges. But I still think the concept has merit. And, like you said, calling them ranges makes them seem stupid. Because they're not ranges.

--benji
September 11, 2008
Benji Smith <dlanguage@benjismith.net> wrote:
> Sergey Gromov wrote:
> > Benji Smith <dlanguage@benjismith.net> wrote:
> >> Bill Baxter wrote:
> >>> Hmm, for FSMs you can't really define a good end state.  There may not be any particular end state. ... ah, but wait I forgot.  That's the beauty of a range -- the end "state" doesn't have to be a "state" per se.  It can be any predicate you want it to be.  "Range" is misleading in this case.  This is one of those cases where you just have to remember "range" means "current value plus stopping criterion".
> >> That's what I was saying earlier.
> >>
> >> I think the mechanics are good. And for contiguous, sequential containers, the word "range" is great. For other types of containers, or other selection/iteration scenarios, you can shoehorn your mental model into the "range" metaphor. But it's weird.
> > 
> > It seems to me like a misuse of ranges.  Do you really want to iterate over a state machine?  FSM is a mailbox with a 'message' hole.  You put messages into it and it does things.  How do you iterate over a mailbox?
> 
> Well, no. Not when you put it like that.
> 
> The example I posted earlier went something like this:
> 
>     MarkovModel<ApplicationState> model = ...;
>     for (ApplicationState state : model) {
>        state.doStuff();
>     }
> 
> It's not a bad abstraction. The model handles all of the semantics of calculating the transition probabilities and selecting the next state, so that the foreach loop doesn't have to muss with those details.
> 
> Yeah, it's a total misuse of the "range" metaphor, and that's exactly what I'm saying. In Java, where I implemented this project, an "iterator" is a tiny bit of logic for returning objects in a (potentially endless, potentially reversible) sequence, primarily to support looping constructs. Just because there's no underlying range of objects doesn't mean they're not iterable.
> 
> Of course, Java iterators are *much* more limited constructs than these new D ranges. But I still think the concept has merit. And, like you said, calling them ranges makes them seem stupid. Because they're not ranges.

Well, if you get an object out of there on every step, and that object source can exhaust at some point, then the abstraction is correct.

I agree that an input range is actually an arbitrary bounded iterator. But you also must agree that a random access iterator in C++ is actually an unbounded array.  You always can invent a better name for any particular case.  But C++ keeps calling them iterators to pronounce generocity and emphasize interchangeability to some degree.  You may not notice that calling a string pointer an iterator is a bit awkward and misleading, because you get used to it and learned to think that way.

There's no difference with ranges.  Some of them are actual ranges, some are not, some are plain abstractions.  You need to learn to think in ranges to use them naturally.  This is true for any new language you're learning, programming or human, if you want to use them efficiently.
September 11, 2008
Sergey Gromov wrote:
> Benji Smith <dlanguage@benjismith.net> wrote:
>> Sergey Gromov wrote:
>>> Benji Smith <dlanguage@benjismith.net> wrote:
>>>> Bill Baxter wrote:
>>>>> Hmm, for FSMs you can't really define a good end state.  There may not
>>>>> be any particular end state. ... ah, but wait I forgot.  That's the
>>>>> beauty of a range -- the end "state" doesn't have to be a "state" per
>>>>> se.  It can be any predicate you want it to be.  "Range" is misleading
>>>>> in this case.  This is one of those cases where you just have to
>>>>> remember "range" means "current value plus stopping criterion".
>>>> That's what I was saying earlier.
>>>>
>>>> I think the mechanics are good. And for contiguous, sequential containers, the word "range" is great. For other types of containers, or other selection/iteration scenarios, you can shoehorn your mental model into the "range" metaphor. But it's weird.
>>> It seems to me like a misuse of ranges.  Do you really want to iterate over a state machine?  FSM is a mailbox with a 'message' hole.  You put messages into it and it does things.  How do you iterate over a mailbox?
>> Well, no. Not when you put it like that.
>>
>> The example I posted earlier went something like this:
>>
>>     MarkovModel<ApplicationState> model = ...;
>>     for (ApplicationState state : model) {
>>        state.doStuff();
>>     }
>>
>> It's not a bad abstraction. The model handles all of the semantics of calculating the transition probabilities and selecting the next state, so that the foreach loop doesn't have to muss with those details.
>>
>> Yeah, it's a total misuse of the "range" metaphor, and that's exactly what I'm saying. In Java, where I implemented this project, an "iterator" is a tiny bit of logic for returning objects in a (potentially endless, potentially reversible) sequence, primarily to support looping constructs. Just because there's no underlying range of objects doesn't mean they're not iterable.
>>
>> Of course, Java iterators are *much* more limited constructs than these new D ranges. But I still think the concept has merit. And, like you said, calling them ranges makes them seem stupid. Because they're not ranges.
> 
> Well, if you get an object out of there on every step, and that object source can exhaust at some point, then the abstraction is correct.
> 
> I agree that an input range is actually an arbitrary bounded iterator.  But you also must agree that a random access iterator in C++ is actually an unbounded array.  You always can invent a better name for any particular case.  But C++ keeps calling them iterators to pronounce generocity and emphasize interchangeability to some degree.  You may not notice that calling a string pointer an iterator is a bit awkward and misleading, because you get used to it and learned to think that way.
> 
> There's no difference with ranges.  Some of them are actual ranges, some are not, some are plain abstractions.  You need to learn to think in ranges to use them naturally.  This is true for any new language you're learning, programming or human, if you want to use them efficiently.

I agree 100%, and also with Sergey's other post that some abstractions simply don't fit the range charter, or don't fit it naturally, or are not expressive enough for some rich iteration abstraction.

Maybe ranges are lousy for an HMM, but then does look like I want to run a host of generic algorithms against an HMM? That IS the question. Probably not. HMMs have their own algorithms, and I wouldn't think of finding/sorting/partitioning an HMM just as I wouldn't think of applying Viterbi to an array.

What I wanted was to make sure ranges are appropriate as higher-level abstractions that can replace STL-like iterators. My experience shows that they can. Not on 100% of occasions have they been a superior replacement, but I'm looking at a solid 80s at least. Add to that the advantage of better generators (which iterators make unpalatable because of the unsightly dummy end() requirement). When I also add the safety advantage of sinks (no more buffer overruns!!!), I feel we have a huge winner.

Of course, that doesn't mean ranges should be the be-all end-all of iteration.

This discussion reminds me of the "iterator craze" around 2000. People were discovering STL iterators and were trying to define and use the weirdest iterators. I remember distinctly a one-page ad on Dr. Dobb's Journal. They were looking for article writers. They mentioned the upcoming themes (e.g. networking, security, patterns, C++...). There was a _specific_ note: "Not interested in yet another iterator article".

That being said, damn I wish I had the time to make RegEx faster AND operating on input ranges...


Andrei
September 11, 2008
Brad Roberts wrote:

> That just means don't initialize

I know what the semantics of `T= void' is supposed to be, but your remark is only a result of Walters overloading of meanings to keywords.

It does not change the fact, that for all types _one_ more possibility exists to (not)initialize it, than it has legal values.

If there is one more possibility, then there are many more; including the possibility, that the initial value is in fact `void', i.e. illegal as an rvalue.

See
http://www.digitalmars.com/webnews/newsgroups.php?
art_group=digitalmars.D.bugs&article_id=15041
for an example that, D in fact uses `void' as an initilization value.

-manfred

-- 
If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)

September 11, 2008
Andrei Alexandrescu wrote:
> What I wanted was to make sure ranges are appropriate as higher-level abstractions that can replace STL-like iterators. My experience shows that they can. Not on 100% of occasions have they been a superior replacement, but I'm looking at a solid 80s at least. Add to that the advantage of better generators (which iterators make unpalatable because of the unsightly dummy end() requirement). When I also add the safety advantage of sinks (no more buffer overruns!!!), I feel we have a huge winner.

I agree.

My quibble with the name "range" is pretty minor, and I don't have any qualm with the semantics.

And "range" is certainly a better name for an iteration metaphor than "opApply".

:-)

--benji
September 11, 2008
First, let me add my support for the range proposal. It is in line with earlier suggestions but makes some crucial additions. I'm also very glad that one of the most influencing forces behind D has returned to those newsgroups.

Andrei Alexandrescu wrote:
> Robert Jacques wrote:
>> On Mon, 08 Sep 2008 20:37:41 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>>> Denis Koroskin wrote:
>>>> 4) We need some way of supporting dollar notation in user containers. The hack of using __dollar is bad (although it works).
>>>
>>> It doesn't work for multiple dimensions. There should be an opDollar(uint dim) that gives the library information on which argument count it occured in. Consider:
>>>
>>> auto x = matrix[$-1, $-1];
>>>
>>> Here the dollar's occurrences have different meanings. A good start would be to expand the above into:
>>>
>>> auto x = matrix[matrix.opDollar(0)-1, matrix.opDollar(1)-1];
>>
>> I'd also add that multiple dimension slicing should be supported. i.e.
>> auto x = matrix[2..5,0..$,3]
>> would become
>> auto x = matrix.opSlice(Slice!(size_t)(2,5),Slice!(size_t)(0,matrix.opDollar(0)),3) 
>>
>> with
>> struct Slice (T) { T start; T end; }
>> Strided slices would also be nice. i.e. matrix[0..$:10] // decimate the array
> 
> Multidimensional slicing can be implemented with staggered indexing:
> 
> matrix[2..5][0..$][3]
> 
> means: first, take a slice 2..5 that returns a matrix range one dimension smaller. Then, for that type take a slice from 0 to $. And so on.

Implementing multidimensional slicing in this way is quite irregular. One would expect m[2..5][1..2] to behave just like s[2..5][1..2] would for a regular array. (e.g. consider s being of the type char[]).

I implemented most of all of this (multidimensional arrays) a little more than two years ago. (Being able to implement it required me to write the patch that made it into DMD 0.166 that made it possible for structs and classes to have template member functions/operators.)

There was basically one Matrix type with rectangular, dense storage and a strided Slice type. I believe the only thing that makes sense when it comes to multidimensional slicing is how I implemented it:

m[i_1, i_2, i_3, ...],

where i_x is either a range or a singular index, and x corresponds to the dimension (1,2,3,...) results in a slice where each dimension indexed by a singular index is collapsed and dimensions indexed by a range is kept.

The resulting syntax is:

m[range(2,5), all, 3, range($-1,$)]

even though the possibility to use $ like this wasn't discovered until much later.

Of course, the optimal syntax would be something like:

m[2..5, 0..$, 3, $-1..$],

which would be easiest to implement by making a..b a value in it self of an integral range type. Intriguingly, integral ranges would just be an implementation of the same range concept you have already presented.

-- 
Oskar
September 11, 2008
"Bill Baxter" wrote
>> Thus one can implement iterators on top of ranges, but I'd argue that
>> ranges
>> are much easier to implement on top of iterators.
>
> Ranges are safer and easier to work with in most cases so it's worth it, or so the argument goes.  You don't buy it?

I can define an iterator and it doesn't mean that it makes ranges any less safe.  Just give me the choice, if I think iterators are a better fit, I might choose iterators.  But having to shoehorn ranges into an iterator form so that I do not wince at the ugliness of my code seems like unnecessary baggage.

I believe that when you are actually using a range of values, a range form is a much better, safer fit.  When you want just a pointer to a single value, then a pointer-form is a better fit.  But I want to be able to construct ranges from pointers.  I want to save pointers.  I want to use pointers to refer to elements in a collection.  I want to use pointers to move one-at-a-time along a node-based container.  I don't want to 'emulate' pointers using ranges.  I don't want the library to resist me doing what I find natural.

This goes back to a lot of the points I've brought up about 'safety' issues in D.  D is a systems language, I like the safety by default, but when I can gain something by breaking the safety, I want to be able to do it efficiently, and without resistance from the compiler.  Like logical const. I've proven it's possible to emulate, but at a performance disadvantage. This is no different, you can emulate iterators, but at a performance (and code bloat) disadvantage.  Granted the disadvantage isn't as big for this as it is for logical const, but the question still remains - if I can do it already, why is it so bad if it's supported natively?

Anyways, I'm going to leave the discussion, I think I've said all I can about my views.  I'm not really good at explaining things anyways.  But I will update dcollections with what I think is the best compromise.  Then I might have a better understanding of how ranges fit into a collection package.  The good news is I don't have to worry about the language not providing iterators, everything is going to be library based, so we can easily try out both ways and see which is easier to use.

-Steve


September 11, 2008
Benji Smith wrote:
> Bill Baxter wrote:
>> Ok, but I have yet to hear an actual use case that demands blazing
>> fast iteration both forwards and backwards.  In your shuffling video
>> there's no way moving the iterator back and forth is going to be the
>> bottleneck.  In my undo/redo stack example it is also far from being
>> on the critical path.    I think it goes back to the fact that going
>> back and forth randomly isn't a property of many algorithms.  In all
>> the examples I can think of it's more a property of how humans
>> interact with data.  And humans are slow compared to how long it takes
>> to update a few extra values.
> 
> Oh!! I thought of one!!
> 
> Parsers & regex engines move both forward and backward, as they try to match characters to a pattern.

They do backtracking, which is different than iterating backward.  I would suggest that a parser should use a stack of forward iterators instead.  That's my $.02, at least.

> Really, anything that uses an NFA or DFA to define patterns would benefit from fast bidirectional iteration...

DFAs can't backtrack, so they don't require backward movement through the input.  NFAs might, depending on the implementation (are you going to use guess-and-backtrack, or parallel execution?) but I would again suggest that a stack (or "TODO list") of forward iterators might work better than backtracking.