Ranges again

April 16, 2014
Ranges again
Posted by John Colvin
Permalink
John Colvin
Permalink
Construction and Initialisation
===============================

As previously discussed, it's annoying having the conflation of construction and initialisation. It leads to two options:

1) construction is initialisation. This is nasty for some ranges where the range is truly destructive, amongst a whole load of other problems.

2) lazy initialisation. This is annoying to implement (hello bugs) and potentially slow.


How about a convention to overcome this:


A new, optional, range primitive named initialize or prime is introduced that does the initialisation.

Constructors are not guaranteed to initialise a range if an initialize/prime member is present.

factory functions (e.g. std.range.iota for std.range.Iota) call the constructor, followed by initialise/prime if present, unless explicitly documented otherwise.

All entities that accept a range expect that range to be pre-initialised unless explicitly documented otherwise.


What we do:

Document it. Publicise it.

Change all phobos ranges that are implemented as private aggregates (struct/class) with public factory functions to the new convention. Note that many will not need any change at all as they have no nasty initialisation needs. This is a non-breaking change.

Consider changing some phobos ranges with public aggregates, where the improvement is sufficient to justify the potential breakage for people using the raw constructors.

Nothing else. All current code with it's various initialisation conventions will function exactly as before, any breakage is strictly controlled on a case-by-case basis for each individual range type.


Open Questions:
Does save always return an initialised range?


Iteration Protocol
==================

Proposal:

A) The   !empty -> front (as many times as you like*) -> popFront -> repeat   sequence is used for all ranges.

B) empty and front must appear to do what they say they do** when used according to the above, but may internally do whatever they want. One exception: Consecutive calls to front are *not* guaranteed to return the same value, as this doesn't play well with random access ranges, in particular std.algorithm.map.***

C) as a consequence of B: empty and front are both independently optional. Of course it is still an error to call front or popFront on an empty range.

D) Ranges are not guaranteed to be internally buffered (e.g. see caveat about front in B), but many will be for performance and/or correctness (see WRT streams below). Inevitably some sort of caching wrapper and a byEfficientChunk primitive are likely to emerge, whether standardised or not.

* not deliberately by design, more just as an accidental side-effect of B.
** non-destructive, does not advance the range, does what it says on the tin.
*** Also, a range might work with data that is being mutated while the range is iterating/being indexed/sliced.

WRT streams:

empty for streams requires (attempting to) read from the stream, which is in turn destructive. Thus, implementations of streams as ranges *must* be buffered, to preserve the illusion of empty being a simple non-destructive check. My understanding is that they will always be buffered on some level anyway for performance.


General notes on everything above
=================================
Ranges are not the solution to every possible iteration problem. They are not a panacea. I believe that attempts to overly generalise them will cause more harm than good. With some sensible conventions we can have a really great tool, but there will always be some situations where a different approach will be more performant, intuitive or safer. Yes, that means you sometimes can't use std.algorithm and std.range, which is sad, but you can't have everything.

Those who want to really squeeze every last bit of performance out of ranges or make use of non-conforming range-like objects can use UDAs to tag various properties of their types, and then have their generic algorithms special-case on those attributes for performance and/or correctness. This enables the concept of ranges to be extended without overly straining the core mechanics.
There might even be a place for some standardised UDAs in std.range and some special-casing in std.algorithm/range.

However we proceed, we need an "I don't satisfy this range requirement even though it looks like I do" indicator. I suggest using a UDA, introduced as standard (in std.range, not the language). E.g. @(std.range.notRandomAccessRange) myRange { ... } would fail isRandomAccessRange even if it contains the necessary primitives.



P.S.
Most of the parts of this have probably been proposed by other people before, which I've probably read and forgotten about. No originality claimed here :)
Forums