November 06, 2012
On Mon, Nov 05, 2012 at 11:17:08PM -0800, Jonathan M Davis wrote:
> On Tuesday, November 06, 2012 08:49:26 Andrei Alexandrescu wrote:
> > On 11/6/12 4:36 AM, H. S. Teoh wrote:
> > > Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it).
> > 
> > We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc.
> > 
> > One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine.

I like .peekFront. It's easy to understand, and is self-documenting. And this approach is safe by default (all current code uses .front), and can be implemented incrementally (doesn't require updating all of Phobos at once to avoid transience-related bugs in the interim).


> peekFront would work better than copy, because whether front needs to be copied or not doesn't necessarily have much to do with its type.

Yeah, I think trying to guess transience from the return type is risky, and prone to bugs. OTOH, having a generic language-wide way to duplicate a value is something worth considering. I think it will allow us to solve a number of other nagging issues.


[...]
> At least this avoids the need to create more traits to test ranges for, since if we create a free function peekFront, all range types can just assume that it's there and create wrappers for it without caring whether the wrapped range defines it itself or uses the free function. And it's less complicated than the .transient suggestion. Though it _does_ introduce the possibility of front and peekFront returning completely different types, which could complicate things a bit. It might be better to require that they be identical to avoid that problem.

The free function can be made to always conform to the same type as .front:

	ElementType!R peekFront(R)(R range) { return range.front; }

Though it still doesn't prevent a member called .peekFront of a different type. We *could* use template constraints to enforce that though:

	auto myRangeFunc(R)(R range)
		if (is(typeof(range.front)==typeof(range.peekFront)))
	...

This could be put in a template that the other range-checking templates can call.


> For better or worse though, this approach would mean that byLine (or eachLine or whatever) wouldn't be reusing the buffer with foreach like they do now, though I suppose that you could make them have opApply which does the same thing as now (meaning that it effectively uses peekFront), and then any range- based functions would use front until they were updated to use peekFront if appropriate. But then again, maybe we want byLine/eachLine to copy by default, since that's safer, much as it's less efficient, since then we have safe by default but still have an explicit means to be more efficient. That fits in well with our general approach.

I think we should make byLine copy by default. Only if you explicitly call peekFront you get the non-copying behaviour. This fits better with D's motto of being safe by default, and fast if you know what you're doing.


> peekFront may be the way to go, but I think that we need to think through the consequences (like the potential problems caused by front and peekFront returning different types) before we decide on this.
[...]

For simplicity, I think we should just enforce typeof(.front) ==
typeof(.peekFront). We will get needless complications otherwise.


T

-- 
We've all heard that a million monkeys banging on a million typewriters will eventually reproduce the entire works of Shakespeare.  Now, thanks to the Internet, we know this is not true. -- Robert Wilensk
November 06, 2012
On Tuesday, 6 November 2012 at 16:09:59 UTC, H. S. Teoh wrote:
> On Mon, Nov 05, 2012 at 11:17:08PM -0800, Jonathan M Davis wrote:
>> On Tuesday, November 06, 2012 08:49:26 Andrei Alexandrescu wrote:
>> > On 11/6/12 4:36 AM, H. S. Teoh wrote:
>> > One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere".



That's not what "peek" means in any of the languages I've seen.
Peek just means "show me the element, but don't pop it off".

It does not have to do anything with how the caller uses the result.

It would be very confusing to have both front and peekFront.
November 06, 2012
Le 06/11/2012 07:49, Andrei Alexandrescu a écrit :
> On 11/6/12 4:36 AM, H. S. Teoh wrote:
>> Hmm. Another idea just occurred to me. The basic problem here is that we
>> are conflating two kinds of values, transient and persistent, under a
>> single name .front. What if we explicitly name them? Say, .copyFront for
>> the non-transient value and .refFront for the transient value (the
>> names are unimportant right now, let's consider the semantics of it).
>
> We could transfer that matter to the type of .front itself, i.e. define
> a function copy(x) that returns e.g. x for for string and x.dup for
> char[] etc. There would be problems on e.g. defining copy for structs
> with pointer and class reference fields etc.
>
> One quite simple approach would be to define (on the contrary)
> .peekFront, which means "yeah, I'd like to take a peek at the front but
> I don't plan to store it anywhere". That would entail we define eachLine
> etc. to return string from .front and char[] from .peekFront, and
> deprecate byLine.
>
>
> Andrei

Is it possible to have the pro and cons of peekFront vs transient ?
November 06, 2012
On 11/06/2012 07:56 AM, Andrei Alexandrescu wrote:
> On 11/6/12 7:48 AM, H. S. Teoh wrote:
>> On Tue, Nov 06, 2012 at 05:49:44AM +0100, Tommi wrote:
>>> On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:
>>>> The problem is that you can't do this in generic code, because
>>>> generic code by definition doesn't know how to copy an arbitrary
>>>> type.
>>>
>>> I'm not familiar with that definition of generic code. But I do feel
>>> that there's a pretty big problem with a language design if the
>>> language doesn't provide a generic way to make a copy of a variable.
>>> To be fair, e.g. C++ doesn't provide that either.
>>
>> OK I worded that poorly. All I meant was that currently, there is no
>> generic way to make a copy of something. It could be construed to be a
>> bug or a language deficiency, but that's how things are currently.
>>
>> One *could* introduce a new language construct for making a copy of
>> something, of course, but that leads to all sorts of issues about
>> implicit allocation, how to eliminate unnecessary implicit copying,
>> etc.. It's not a simple problem, in spite of the simplicity of stating
>> the problem.
>
> Here's where user defined @tributes would help a lot. We'd then define
> @owned to mention that a class reference field inside an object must be
> duplicated upon copy:
>
> class A { ... }
>
> struct B
> {
>     @owned A payload;
>     A another;
>     ...
> }
>
> That way a generic clone() routine could be written.
>
>
> Andrei

That is actually the first thing that I will use the new attribute system for.
November 07, 2012
On Tue, Nov 06, 2012 at 10:03:56PM +0100, deadalnix wrote:
> Le 06/11/2012 07:49, Andrei Alexandrescu a écrit :
> >On 11/6/12 4:36 AM, H. S. Teoh wrote:
> >>Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it).
> >
> >We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc.
> >
> >One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine.
[...]
> Is it possible to have the pro and cons of peekFront vs transient ?

I'll give it a shot, since nobody else is replying:

Both .peekFront and .transient require at least existing transient ranges to be modified, as well as algorithms that can take advantage of them.

- For .peekFront, existing transient ranges' .front is just renamed to
  .peekFront, and we add a new .front which returns the .dup (or
  otherwise copy) of .peekFront.  For .transient, we need to write a
  .transient method that returns a wrapper struct whose .front is
  transient.
   * So .peekFront is slightly simpler in this case.

- For .peekFront, algorithms that can handle transience can simply have
  all occurrences of .front replaced with .peekFront. Whereas for
  .transient, they will need to modified to explicitly call .transient,
  save that range, and use that instead of the passed-in range.
   * So .peekFront is slightly simpler in this case.

- For .peekFront, algorithms that *can't* handle transience will have to
  be rewritten. Same goes for .transient. The redeeming factor in both
  cases is that algorithms that aren't rewritten yet will continue to
  work as before, just a bit slower. They will also begin to work
  correctly with transient ranges even _before_ being rewritten, whereas
  right now they're broken.
   * Here, .peekFront and .transient are equal in terms of effort
     required.

- For .peekFront, the .front of *any* range is guaranteed to be
  non-transient, so we never have to worry about whether a particular
  range's .front is transient or not. User code has no possibility of
  passing a transient range into an algorithm that cannot handle it.
  For .transient, the .front of ranges is non-transient by default, but
  once you call .transient on it, you get a range whose .front is
  transient. There is a small possibility of wrong user code that passes
  a .transient range to an algorithm that can't handle it.
   * So .peekFront is slightly safer on this point.

These are the points that I can think of, off the top of my head. Are there any others?

Either way, both .peekFront and .transient requires rewriting currently transient ranges (which AFAIK is only byLine) and any algorithms that should be able to handle transience (there are a few I can think of, like std.algorithm.joiner, std.array.join, and maybe NWayUnion and writeln & co.). We can't do better than this if we're going to support transience at all.

I note, though, that the list of algorithms that need to be updated is shorter (perhaps significantly so) than the list of currently-broken algorithms that I posted earlier, because many of the algorithms in that list *cannot* be implemented to handle transience. E.g. there is no way to implement uniq on a transient input range since there's no way to save the previous value seen, so there's no way to compare two adjacent values. Likewise, findAdjacent cannot be implemented for transient ranges for the same reason. Once these are eliminated from the list, there should only be a few algorithms left. (And I already have a version of joiner that works correctly with transient ranges.)


T

-- 
Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder
November 07, 2012
Le 07/11/2012 19:24, H. S. Teoh a écrit :
> On Tue, Nov 06, 2012 at 10:03:56PM +0100, deadalnix wrote:
>> Le 06/11/2012 07:49, Andrei Alexandrescu a écrit :
>>> On 11/6/12 4:36 AM, H. S. Teoh wrote:
>>>> Hmm. Another idea just occurred to me. The basic problem here is
>>>> that we are conflating two kinds of values, transient and
>>>> persistent, under a single name .front. What if we explicitly name
>>>> them? Say, .copyFront for the non-transient value and .refFront for
>>>> the transient value (the names are unimportant right now, let's
>>>> consider the semantics of it).
>>>
>>> We could transfer that matter to the type of .front itself, i.e.
>>> define a function copy(x) that returns e.g. x for for string and
>>> x.dup for char[] etc. There would be problems on e.g. defining copy
>>> for structs with pointer and class reference fields etc.
>>>
>>> One quite simple approach would be to define (on the contrary)
>>> .peekFront, which means "yeah, I'd like to take a peek at the front
>>> but I don't plan to store it anywhere". That would entail we define
>>> eachLine etc. to return string from .front and char[] from
>>> .peekFront, and deprecate byLine.
> [...]
>> Is it possible to have the pro and cons of peekFront vs transient ?
>
> I'll give it a shot, since nobody else is replying:
>
> Both .peekFront and .transient require at least existing transient
> ranges to be modified, as well as algorithms that can take advantage of
> them.
>
> - For .peekFront, existing transient ranges' .front is just renamed to
>    .peekFront, and we add a new .front which returns the .dup (or
>    otherwise copy) of .peekFront.  For .transient, we need to write a
>    .transient method that returns a wrapper struct whose .front is
>    transient.
>     * So .peekFront is slightly simpler in this case.
>

Agreed !

> - For .peekFront, algorithms that can handle transience can simply have
>    all occurrences of .front replaced with .peekFront. Whereas for
>    .transient, they will need to modified to explicitly call .transient,
>    save that range, and use that instead of the passed-in range.
>     * So .peekFront is slightly simpler in this case.
>

.transient seems simpler to me in this case. Adding one line is easier than replacing all occurrences of something.

> - For .peekFront, algorithms that *can't* handle transience will have to
>    be rewritten. Same goes for .transient. The redeeming factor in both
>    cases is that algorithms that aren't rewritten yet will continue to
>    work as before, just a bit slower. They will also begin to work
>    correctly with transient ranges even _before_ being rewritten, whereas
>    right now they're broken.
>     * Here, .peekFront and .transient are equal in terms of effort
>       required.
>

Indeed.

> - For .peekFront, the .front of *any* range is guaranteed to be
>    non-transient, so we never have to worry about whether a particular
>    range's .front is transient or not. User code has no possibility of
>    passing a transient range into an algorithm that cannot handle it.
>    For .transient, the .front of ranges is non-transient by default, but
>    once you call .transient on it, you get a range whose .front is
>    transient. There is a small possibility of wrong user code that passes
>    a .transient range to an algorithm that can't handle it.
>     * So .peekFront is slightly safer on this point.
>
> These are the points that I can think of, off the top of my head. Are
> there any others?
>
> Either way, both .peekFront and .transient requires rewriting currently
> transient ranges (which AFAIK is only byLine) and any algorithms that
> should be able to handle transience (there are a few I can think of,
> like std.algorithm.joiner, std.array.join, and maybe NWayUnion and
> writeln&  co.). We can't do better than this if we're going to support
> transience at all.
>
> I note, though, that the list of algorithms that need to be updated is
> shorter (perhaps significantly so) than the list of currently-broken
> algorithms that I posted earlier, because many of the algorithms in that
> list *cannot* be implemented to handle transience. E.g. there is no way
> to implement uniq on a transient input range since there's no way to
> save the previous value seen, so there's no way to compare two adjacent
> values. Likewise, findAdjacent cannot be implemented for transient
> ranges for the same reason. Once these are eliminated from the list,
> there should only be a few algorithms left. (And I already have a
> version of joiner that works correctly with transient ranges.)
>

OK, overall, .peekFront seems like a good idea. Something I'm afraid with .peekFront is code duplication.

Let's take the joiner example. Joiner's transientness depend on its source transientness. This seems to me like a very common case for transformer ranges. If we choose the .peekFront, the naive thing to do is to implement the same algorithm twice, using front and peekFront, which is code duplication, and usually a bad idea.

Is a solution known here ? Could alias template parameter help ? I'm not sure what is the solution, but if one exist, I'm sold for .peekFront ( .transientFront seems like a better name as peek have other meaning in other languages ).
November 07, 2012
On Wednesday, November 07, 2012 21:31:13 deadalnix wrote:
> OK, overall, .peekFront seems like a good idea. Something I'm afraid with .peekFront is code duplication.
> 
> Let's take the joiner example. Joiner's transientness depend on its source transientness. This seems to me like a very common case for transformer ranges. If we choose the .peekFront, the naive thing to do is to implement the same algorithm twice, using front and peekFront, which is code duplication, and usually a bad idea.

Why would you need to duplicate anything?. If you can implement it using peekFront, then you use peekFront. If you can't, you use front. And anything which uses front when you could have used peekFront will still work. Also, if a free function peekFront which forwards to front is defined, then all range- based functions can use peekFront if they need it regardless of whether a range defines it (it's just that it would do the same as front in the case where the range didn't define it).

- Jonathan M Davis
November 07, 2012
On Wed, Nov 07, 2012 at 10:40:32PM +0100, Jonathan M Davis wrote:
> On Wednesday, November 07, 2012 21:31:13 deadalnix wrote:
> > OK, overall, .peekFront seems like a good idea. Something I'm afraid with .peekFront is code duplication.
> > 
> > Let's take the joiner example. Joiner's transientness depend on its source transientness. This seems to me like a very common case for transformer ranges. If we choose the .peekFront, the naive thing to do is to implement the same algorithm twice, using front and peekFront, which is code duplication, and usually a bad idea.
> 
> Why would you need to duplicate anything?. If you can implement it using peekFront, then you use peekFront. If you can't, you use front. And anything which uses front when you could have used peekFront will still work. Also, if a free function peekFront which forwards to front is defined, then all range- based functions can use peekFront if they need it regardless of whether a range defines it (it's just that it would do the same as front in the case where the range didn't define it).
[...]

Yeah, UFCS will kick in if the range itself doesn't define peekFront, then the free function peekFront will simply call the range's .front instead.

So basically: if an algo needs .front to be persistent, they use .front. If they don't care if .front is persistent, they use .peekFront.


T

-- 
You are only young once, but you can stay immature indefinitely. -- azephrahel
November 09, 2012
Le 07/11/2012 22:40, Jonathan M Davis a écrit :
> On Wednesday, November 07, 2012 21:31:13 deadalnix wrote:
>> OK, overall, .peekFront seems like a good idea. Something I'm afraid
>> with .peekFront is code duplication.
>>
>> Let's take the joiner example. Joiner's transientness depend on its
>> source transientness. This seems to me like a very common case for
>> transformer ranges. If we choose the .peekFront, the naive thing to do
>> is to implement the same algorithm twice, using front and peekFront,
>> which is code duplication, and usually a bad idea.
>
> Why would you need to duplicate anything?. If you can implement it using
> peekFront, then you use peekFront. If you can't, you use front. And anything
> which uses front when you could have used peekFront will still work. Also, if
> a free function peekFront which forwards to front is defined, then all range-
> based functions can use peekFront if they need it regardless of whether a
> range defines it (it's just that it would do the same as front in the case
> where the range didn't define it).
>
> - Jonathan M Davis

OK, seeing your answer and H. S Teoh, I think I badly expressed myself, because none of you understood. Let's try to explain it better.

So back on our joiner range. This range have a source, and will be given to a consumer as follow :
source -> joiner -> consumer .

Now let's consider that source provide its own, transient peekFront. Now joiner can provide both peekFront (based on source.peekFront), which is transcient and front (based on source.front) which is not transcient.

The fact is that both front and peekFront in joiner will have the same implementation, because the transience of joiner depends of the transience of its source (like most tranformer ranges).

Or, with sample code :

struct Joiner {
    R source;
    // Fields and range stuffs.

    @property front() {
        // Some computation using source.front
    }

    @property peekFront() {
        // Exact same computation using source.peekFront
    }
}

I hope this make the code duplication more apparent.

It is impossible for joiner to provide only a version with peekFront, because joiner.front will be transient, and if it doesn't provide the version with .peekFront, it can't take advantage of the improvement that transience can provide.

So my question is, how this duplication can be avoided ? If it can, .peekFront is definitively the way to go. But if it can't, this seems like a really problematic point.
November 09, 2012
On Fri, Nov 09, 2012 at 02:52:20PM +0100, deadalnix wrote: [...]
> OK, seeing your answer and H. S Teoh, I think I badly expressed myself, because none of you understood. Let's try to explain it better.
> 
> So back on our joiner range. This range have a source, and will be
> given to a consumer as follow :
> source -> joiner -> consumer .
> 
> Now let's consider that source provide its own, transient peekFront. Now joiner can provide both peekFront (based on source.peekFront), which is transcient and front (based on source.front) which is not transcient.
> 
> The fact is that both front and peekFront in joiner will have the same implementation, because the transience of joiner depends of the transience of its source (like most tranformer ranges).
> 
> Or, with sample code :
> 
> struct Joiner {
>     R source;
>     // Fields and range stuffs.
> 
>     @property front() {
>         // Some computation using source.front
>     }
> 
>     @property peekFront() {
>         // Exact same computation using source.peekFront
>     }
> }
> 
> I hope this make the code duplication more apparent.
> 
> It is impossible for joiner to provide only a version with peekFront, because joiner.front will be transient, and if it doesn't provide the version with .peekFront, it can't take advantage of the improvement that transience can provide.

Hmm, you're right. At first I thought .front would just be a thin
wrapper around .peekFront (e.g., return peekFront().dup), but this only
works in the source range; joiner wouldn't know how to do that.


> So my question is, how this duplication can be avoided ? If it can, .peekFront is definitively the way to go. But if it can't, this seems like a really problematic point.

My first thought is to use a mixin to generate the function bodies, but that's really ugly and doesn't really avoid duplication in the generated code regardless. :-/ I'll have to think about it more carefully.


T

-- 
It is impossible to make anything foolproof because fools are so ingenious. -- Sammy