November 06, 2012
Le 06/11/2012 02:36, Andrei Alexandrescu a écrit :
> On 11/6/12 3:06 AM, H. S. Teoh wrote:
>> I've already given a number of
>> non-trivial examples of transient forward ranges, and apparently
>> deadalnix has also encountered the same issue.
>
> I'd like to build more of this argument. I pointed out that your range
> is actually defining .dup, not .save.

Yes my code can also fall in the .dup stuff.

If we actually make that difference, our case is moot, but a hell lot of code is broken as well.

For instance, the way map is defined is highly incorrect as each call to .front can pop a newly generated element as well, and I'm pretty that isn't the only one.

We are here back to equality vs identity, and it seems D also have some stuff to fix in that regard (notably struct and AA). Discussing that here only make the problem more confuse and is not helping as long as this question is open in D.
November 06, 2012
Le 06/11/2012 02:40, Andrei Alexandrescu a écrit :
> On 11/6/12 2:44 AM, deadalnix wrote:
>> To be honest, my biggest fear isn't that this proposal is rejected, but
>> that we fallback as default on the input range = transient / forward
>> range = non transient scheme, because we fail to come up with something
>> better, or that the status quo is choosen (as both seems to me worse
>> than the .transient proposal).
>
> I think the simplification of input range = transient and forward range
> = not transient has a lot going for it. It is simple, easy to explain
> and understand, and builds on simple real-life examples (buffered input
> and singly-linked lists). Clearly adding a new notion to the soup makes
> for more expressiveness, but it also makes for more complexity and
> subtlety in support for niche ranges. This should not be neglected.
>

At this point, if transient is out I'd prefer Jonathan's proposal were everything is non transient. This is clearly simpler to use and break less code.

Indeed, the added complexity of .transient exists. The beauty of it is that it is possible to write 100% correct code without even knowing .transient exists. This is why I like this option : the added complexity only exists for the programmer that what to explore the arcane of the language (which include you and me, but not most D users).
November 06, 2012
On Tue, Nov 06, 2012 at 02:03:45AM +0100, Jonathan M Davis wrote: [...]
> The only solutions that I see at this point are
> 
> 1. Exclude transience completely.
> 
> 2. Mark ranges as transient in some way and force algorithms to take this new trait into account.
> 
> 3. Insist that input ranges have transient fronts (save perhaps when it can be statically verified that they aren't based on front's type) and that anything requiring a non-transient front require forward ranges.
> 
> 4. Make all ranges non-transient but provide a way to get at a transient version of it and force algorithms to take this new property into account.
> 
> All 4 solutions have problems. They just have different problems.
[...]

Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it).

Then an algorithm like std.array.array can be written something like this:

	auto array(R)(R range) {
		auto a = appender!(ElementType!R)();
		while (!range.empty) {
			// Self-documenting: we expect to get a
			// persistent copy of the front value.
			a.put(range.copyFront);
			range.popFront();
		}
		return a.data;
	}

OTOH, an algorithm that is agnostic to transience, like find(), can be
written something like this:

	R find(R,S)(R haystack, S needle) {
		while (!haystack.empty) {
			// Self-documenting: we don't expect a
			// persistent front value.
			if (haystack.refFront == needle)
				return haystack;
			haystack.popFront();
		}
		return haystack;
	}

Of course, this immediately breaks many ranges, because they only have a single .front currently. Which is bad. So we make use of UFCS:

	@property auto copyFront(R)(R range) {
		// Not sure how to do this yet, the idea is that
		// copyFront should be default.
		return range.front;
	}
	@property auto refFront(R)(R range) {
		// By default, .refFront is the same as copyFront.
		return range.copyFront;
	}

ByLine can then do this:

	struct ByLine {
		char[] buf;

		@property auto copyFront() {
			return buf.dup;
		}
		@property auto refFront() {
			return buf;
		}
	}

I haven't worked out all the details yet, but this is the gist of it. While it's still not perfect, it is a slight improvement over the .transient proposal in that no new range types are involved.

There is still the issue of naming: I chose .copyFront and .refFront just for illustration purposes. I'm thinking probably we can make .front == .copyFront, so that it will automatically work with (almost) all current ranges, which should be non-transient. UFCS saves us from having to implement .refFront for everything except actual transient ranges, which should be only a few rare cases.

Of course, this has the disadvantage of needing to update existing algorithms to use .refFront or .copyFront where appropriate -- but then, we have to do that anyway if we're going to support transient ranges at all.

Maybe somebody can improve on this idea?


T

-- 
Long, long ago, the ancient Chinese invented a device that lets them see through walls. It was called the "window".
November 06, 2012
On Monday, November 05, 2012 18:36:53 H. S. Teoh wrote:
> Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it).

I thought of this a couple days ago but threw it away fairly quickly, because it breaks _everything_. And it definitely affects all ranges for the sake of a few.

- Jonathan M Davis
November 06, 2012
I don't think this whole issue has anything to do with ranges. I think this is an issue of assuming that the symbol = means "copy what's on the right to what's on the left". When in reality, = could mean: (if what's on the right has reference semantics) "make what's on the left reference the same thing that the thing on the right references".

I think all range types should be allowed to return whatever they want from their front property. It's the responsibility of the user of the range to *actually* copy what front returns (if that's what he intends), instead of assuming that = means copy.

In the code below, X marks the bug:

module main;

import std.stdio;
import std.range;

class MyData
{
    int _value;
}

struct MyFwdRange
{
    MyData _myData;

    this(MyData md)
    {
        _myData = md;
    }

    @property MyData front()
    {
        return _myData;
    }

    @property bool empty() const
    {
        return _myData._value > 10;
    }

    void popFront()
    {
        _myData._value++;
    }

    @property MyFwdRange save() const
    {
        auto copy = MyFwdRange();
        copy._myData = new MyData();
        copy._myData._value = _myData._value;
        return copy;
    }
}

void printFirstAndSecond(R)(R range)
    if (isForwardRange!MyFwdRange)
{
    //         X
    auto first = range.front; // That is *not* copying

    range.popFront();
    writeln(first._value, " ", range.front._value);
}

void main()
{
    auto mfr = MyFwdRange(new MyData());

    printFirstAndSecond(mfr); // prints: 1 1 (instead of 0 1)
}
November 06, 2012
On Tue, Nov 06, 2012 at 05:18:18AM +0100, Tommi wrote:
> I don't think this whole issue has anything to do with ranges. I think this is an issue of assuming that the symbol = means "copy what's on the right to what's on the left". When in reality, = could mean: (if what's on the right has reference semantics) "make what's on the left reference the same thing that the thing on the right references".
> 
> I think all range types should be allowed to return whatever they want from their front property. It's the responsibility of the user of the range to *actually* copy what front returns (if that's what he intends), instead of assuming that = means copy.
[...]

The problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an arbitrary type.  Unless we introduce a standardized deep copy operation to D, like a .clone method that all copyable types implement, this solution isn't really viable.


T

-- 
The computer is only a tool. Unfortunately, so is the user. -- Armaphine, K5
November 06, 2012
On Tue, Nov 06, 2012 at 04:12:17AM +0100, Jonathan M Davis wrote:
> On Monday, November 05, 2012 18:36:53 H. S. Teoh wrote:
> > Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it).
> 
> I thought of this a couple days ago but threw it away fairly quickly, because it breaks _everything_. And it definitely affects all ranges for the sake of a few.
[...]

If we make .copyFront == .front, then things will still mostly work. That is, all existing non-transient ranges and algorithms that expect non-transient ranges will still work. Thanks to UFCS, only transient ranges (which are "rare") need an explicit definition of .refFront.  If nothing else is done beyond that point, then we are back to the case where all ranges are non-transient, so everything works (just a tad slow).

Then we can update algos that can handle transient values to call .refFront instead of .front, and we have a similar sort of situation with the .transient proposal, where only algorithms that can handle transient values will get them. If indeed we want to support transient ranges correctly, then those algorithms have to be updated anyway, so this isn't much of an additional cost. Plus, we have reduced the amount of code per range to just an additional member in the rare transient range -- no extra range types are needed.

I don't know how much better we can get beyond this, as far as explicit support of transient is concerned. No matter what we do, as long as we acknowledge transient ranges as valid ranges, we have to, one way or another, (1) update all transient ranges to use whichever scheme we decide on, (2) update all transient-capable algorithms to handle transience correctly. We can't do any less than this, from this direction.

The alternative is either to reject all transient ranges, which currently seems to be the least effort option, and also not ideal.


T

-- 
It is of the new things that men tire --- of fashions and proposals and improvements and change. It is the old things that startle and intoxicate. It is the old things that are young. -- G.K. Chesterton
November 06, 2012
On Tue, Nov 06, 2012 at 03:36:54AM +0200, Andrei Alexandrescu wrote:
> On 11/6/12 3:06 AM, H. S. Teoh wrote:
> >I've already given a number of non-trivial examples of transient forward ranges, and apparently deadalnix has also encountered the same issue.
> 
> I'd like to build more of this argument. I pointed out that your range is actually defining .dup, not .save. I think an argument can be made that forward ranges with transient .front do not qualify as forward ranges, seeing as a forward range is modeled after a list that actually has data sitting in memory.
[...]

I suppose you could argue that way. But that still doesn't solve the problem of std.array.array with input ranges. Making std.array.array require forward ranges seems to defeat its purpose, to me. So even if you get rid of transience in forward ranges, you still need to make that distinction with input ranges. Which still requires large-scale code changes to existing Phobos algorithms. And which still requires some sort of complication to the existing range code.


T

-- 
Music critic: "That's an imitation fugue!"
November 06, 2012
On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:
> The problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an
> arbitrary type.

I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.

November 06, 2012
On Tue, Nov 06, 2012 at 05:49:44AM +0100, Tommi wrote:
> On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:
> >The problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an arbitrary type.
> 
> I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.

OK I worded that poorly. All I meant was that currently, there is no generic way to make a copy of something. It could be construed to be a bug or a language deficiency, but that's how things are currently.

One *could* introduce a new language construct for making a copy of something, of course, but that leads to all sorts of issues about implicit allocation, how to eliminate unnecessary implicit copying, etc.. It's not a simple problem, in spite of the simplicity of stating the problem.


T

-- 
Verbing weirds language. -- Calvin (& Hobbes)