Tricky semantics of ranges & potentially numerous Phobos bugs (page 2) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Tricky semantics of ranges & potentially numerous Phobos bugs (page 2)

October 16, 2012

Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted by deadalnix
in reply to H. S. Teoh

deadalnix

Posted in reply to H. S. Teoh

Le 16/10/2012 17:17, H. S. Teoh a écrit :
> On Tue, Oct 16, 2012 at 05:01:57PM +0200, Tommi wrote:
>> On Tuesday, 16 October 2012 at 05:49:11 UTC, Jonathan M Davis wrote:
>>> So, I don't really know what the right answer is, but I _really_
>>> don't like the idea of having to worry about the result of front
>>> changing after a call to popFront in every single function that ever
>>> uses front.
>>
>> Isn't the deeper underlying problem here the fact that there's no
>> generic way to make a deep copy in this language (does any language?)
>> Making a deep copy of front would be a clean solution. I don't know
>> how to implement this generic deep copying functionality though. For
>> example: what would be the semantics of deep-copying a range?
>
> I don't think the problem here is whether we have a deep copy or not,
> but that the semantics of ranges are not defined clearly enough. If we
> clearly state, for example, that whatever is returned by .front must
> persist beyond popFront(), then it would be up to the range to implement
> the deep copy, e.g., return the .dup or .idup'd array instead of a
> reference to an internal buffer.
>

It wouldn't be enough. Does several call to front recompute things ? See http://d.puremagic.com/issues/show_bug.cgi?id=8803

October 16, 2012

Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted by Peter Alexander
in reply to deadalnix

Peter Alexander

Posted in reply to deadalnix

On Tuesday, 16 October 2012 at 15:47:49 UTC, deadalnix wrote:
> It wouldn't be enough. Does several call to front recompute things ? See http://d.puremagic.com/issues/show_bug.cgi?id=8803

That's up to the range. You shouldn't rely on it caching anything (map does not, but it used to).

October 16, 2012

Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted by Jonathan M Davis

Jonathan M Davis

On Tuesday, October 16, 2012 06:30:53 H. S. Teoh wrote:
> On Mon, Oct 15, 2012 at 10:59:26PM -0700, Jonathan M Davis wrote:
> > On Monday, October 15, 2012 22:48:15 Jonathan M Davis wrote:
> > > So, I don't really know what the right answer is, but I _really_ don't like the idea of having to worry about the result of front changing after a call to popFront in every single function that ever uses front. In the general case, I just don't see how that's tenable. I'd _much_ rather that it be up to the programmer using abnormal ranges such as ByLine to use them correctly.
> > 
> > And actually, it seems to me that issues like this make it look like it was a mistake to make ranges like ByLine ranges in the first place. They should have just defined opApply so that they'd work nicely in foreach but not with range- based functions. They're clearly not going to work with a _lot_ of range-based functions.
> 
> [...]
> 
> But nothing about the definition of a range, as currently defined in std.range, guarantees that whatever value was returned by .front is still valid after popFront() is called.
> 
> I'm not saying that this should be the "correct" behaviour, but the current definition does not prohibit a range from, say, reusing an array to compute its next element. For example, one may have a range that returns the permutations of a given array, in which popFront() permutes the elements in-place. In this case, .front will become invalid once popFront() is called. Many of the current range-based functions will not work correctly in this case.
> 
> Of course, it's arguable whether such ranges should be admissible, but as far as the current definition goes, I don't see anything that states otherwise. If we don't make such things clear, then we're bound to run into pathological cases where bugs will creep in because of unstated assumptions in the code.

There's only so much that the compiler can check for you. Sure, we could say in the docs that front must still be valid after a call to popFront (and perhaps we should), but there's no way to validate that.

There's also no way to validate that front always returns the same value, or that popFront actually pops a value, or that it pops only one value, etc. Pretty much _all_ that we can verify with template constraints is function signatures. So, we can _never_ fully restrict range types to exactly what would be considered correct.

So, if we have expectations about how ranges should/must work, then that should be in the docs somewhere, but the definition of a range from the compiler's perspective can only ever be defined by eponymous templates which can't possibly verify correct behavior, meaning that something like ByLine will always be permitted even if it doesn't work, because it breaks the semantic design of ranges.

- Jonathan M Davis

October 16, 2012

Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted by H. S. Teoh

H. S. Teoh

On Tue, Oct 16, 2012 at 07:02:58PM +0200, Jonathan M Davis wrote:
> On Tuesday, October 16, 2012 06:30:53 H. S. Teoh wrote:
[...]
> > I'm not saying that this should be the "correct" behaviour, but the current definition does not prohibit a range from, say, reusing an array to compute its next element. For example, one may have a range that returns the permutations of a given array, in which popFront() permutes the elements in-place. In this case, .front will become invalid once popFront() is called. Many of the current range-based functions will not work correctly in this case.
> > 
> > Of course, it's arguable whether such ranges should be admissible, but as far as the current definition goes, I don't see anything that states otherwise. If we don't make such things clear, then we're bound to run into pathological cases where bugs will creep in because of unstated assumptions in the code.
> 
> There's only so much that the compiler can check for you. Sure, we could say in the docs that front must still be valid after a call to popFront (and perhaps we should), but there's no way to validate that.

I wasn't talking about compiler validation. I was talking about clearly defining, in the docs or otherwise, what exactly a range is, and what is expected of it. Right now, it's rather nebulous what exactly constitutes a range. I thought byLine() was a perfectly valid range, but apparently you think otherwise. The two aren't compatible, since they lead to wrong code that has buggy behaviour when passed something it doesn't expect.


[...]
> So, if we have expectations about how ranges should/must work, then that should be in the docs somewhere, but the definition of a range from the compiler's perspective can only ever be defined by eponymous templates which can't possibly verify correct behavior, meaning that something like ByLine will always be permitted even if it doesn't work, because it breaks the semantic design of ranges.
[...]

So what is (or should be) the semantic design of ranges? Let's work out a precise definition so that we have something to build on.


T

-- 
Three out of two people have difficulties with fractions. -- Dirk Eddelbuettel

October 16, 2012

Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted by Jonathan M Davis

Jonathan M Davis

On Tuesday, October 16, 2012 08:17:40 H. S. Teoh wrote:
> On Tue, Oct 16, 2012 at 05:01:57PM +0200, Tommi wrote:
> > On Tuesday, 16 October 2012 at 05:49:11 UTC, Jonathan M Davis wrote:
> > >So, I don't really know what the right answer is, but I _really_ don't like the idea of having to worry about the result of front changing after a call to popFront in every single function that ever uses front.
> > 
> > Isn't the deeper underlying problem here the fact that there's no generic way to make a deep copy in this language (does any language?) Making a deep copy of front would be a clean solution. I don't know how to implement this generic deep copying functionality though. For example: what would be the semantics of deep-copying a range?
> 
> I don't think the problem here is whether we have a deep copy or not, but that the semantics of ranges are not defined clearly enough. If we clearly state, for example, that whatever is returned by .front must persist beyond popFront(), then it would be up to the range to implement the deep copy, e.g., return the .dup or .idup'd array instead of a reference to an internal buffer.

1. There is no generic way to deep copy stuff. So, requiring a deep copy of front just plain doesn't make sense. It would be completely untenable. By doing so, you basically make it impossible to use the result of front beyond a call to popFront in the general case.

2. Often, a deep copy isn't needed in the least, even with a messed up range like ByLine. If it were an array of class objects rather than char, doing a deep copy would needlessly copy all of those class objects as well. It could quickly become a performance nightmare.

- Jonathan M Davis

October 16, 2012

Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted by Jonathan M Davis

Jonathan M Davis

On Tuesday, October 16, 2012 10:13:57 H. S. Teoh wrote:
> I wasn't talking about compiler validation. I was talking about clearly defining, in the docs or otherwise, what exactly a range is, and what is expected of it. Right now, it's rather nebulous what exactly constitutes a range. I thought byLine() was a perfectly valid range, but apparently you think otherwise. The two aren't compatible, since they lead to wrong code that has buggy behaviour when passed something it doesn't expect.

ByLine is perfectly valid range insofar as you realize that it's likely to go completely south if you use it in any way that could involve keeping front around after popFront has been called, which means that anything which relies on keeping front around isn't going to work. So, it's a range, but it's essentially an unsafe one (though I'm not sure that it's an un-@safe one).

So, it's fine that ByLine is a range as long as we're willing to put up with it not working with a lot of range-based functions because of its abnormal behavior. But I don't think that it's at all reasonable for range-based functions in general to not be able to rely on front returning the same type every time or on its value disappearing or becoming invalid in some way after a call to popFront. That's completely untenable IMHO.

Ranges _can_ define semantics which violate that, but they have to make it clear that they do so that programmers using them realize that they may not work right with a lot of range-based functions (which potentially makes it so that it they really shouldn't have been ranges in the first place).

> So what is (or should be) the semantic design of ranges? Let's work out a precise definition so that we have something to build on.

As far as front (or back) goes, range-based functions should be able to rely on

1. front returning the exact same value on every call until popFront has been called (though there's no guarantee that front won't have to be recalculated on each call, so assigning the result of front to a local variable is advisable for efficiency if front must be used multiple times before a call to popFront).

2. the result of front continuing to be valid and unchanged after popFront has been called if it was assigned to a variable.

Any range is free to violate this, but because range-based functions are free to rely on it, such ranges risk not working correctly with many range-based functions and must be used with caution.

- Jonathan M Davis

October 16, 2012

Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted by Tommi
in reply to Jonathan M Davis

Tommi

Posted in reply to Jonathan M Davis

On Tuesday, 16 October 2012 at 17:19:26 UTC, Jonathan M Davis wrote:
> 1. There is no generic way to deep copy stuff.

Could you elaborate on that a bit more?

How about a deep-assign operator? The compiler could implicitly define it at least for arrays and for structs that don't have mutable indirection:

struct MyStruct
{
    int _value;

    // implicit:
    ref MyStruct opDeepAssign(ref const MyStruct ms)
    {
        this = ms;
        return this;
    }
}

int[] array1 = [1, 2, 3];
int[] array2;
array2.opDeepAssign(array1); // array2 = array1.dup;

And there would be hasDeepAssign!T traits and whatnot.

October 16, 2012

Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted by David Gileadi
in reply to Jonathan M Davis

David Gileadi

Posted in reply to Jonathan M Davis

On 10/16/12 10:28 AM, Jonathan M Davis wrote:
> Any range is free to violate this, but because range-based functions are free
> to rely on it, such ranges risk not working correctly with many range-based
> functions and must be used with caution.

As a D dabbler, it seems wrong to me that a built-in range like byLine, which is often used in example code, should have weird side effects with the built-in methods in std.algorithm.  IMO this needs to be fixed one way or another, and a mere documentation change is likely not enough for casual D users.

October 16, 2012

Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted by Jonathan M Davis
in reply to David Gileadi

Jonathan M Davis

Posted in reply to David Gileadi

On Tuesday, October 16, 2012 10:58:23 David Gileadi wrote:
> On 10/16/12 10:28 AM, Jonathan M Davis wrote:
> > Any range is free to violate this, but because range-based functions are free to rely on it, such ranges risk not working correctly with many range-based functions and must be used with caution.
> 
> As a D dabbler, it seems wrong to me that a built-in range like byLine, which is often used in example code, should have weird side effects with the built-in methods in std.algorithm. IMO this needs to be fixed one way or another, and a mere documentation change is likely not enough for casual D users.

ByLine needs to do what it does for performance reasons, so its implementation makes a lot of sense, and it really wouldn't make sense for it to do what would be necessary to make it work as a normal range (since that would mean allocating a new buffer for every line), so if the fact that it's likely to be used and then misused by newbies is a big problem, then that's a strong argument for making it just use opApply rather than being a proper range. I certainly don't think that it makes sense to change how ranges work in general just to better support ByLine's weirdness. it would be far too restrictive to do so.

- Jonathan M Davis

October 16, 2012

Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted by Jonathan M Davis
in reply to Tommi

Jonathan M Davis

Posted in reply to Tommi

On Tuesday, October 16, 2012 19:51:09 Tommi wrote:
> On Tuesday, 16 October 2012 at 17:19:26 UTC, Jonathan M Davis
> 
> wrote:
> > 1. There is no generic way to deep copy stuff.

There is no standard function for copying anything. dup and idup are array- specific. It's a commonly used function name for a clone function on objects, but that's just a convention and not even necessarily a consistently used one. Some types are value types and so simply making a copy of them does a deep copy, whereas others are reference types and making a copy only does a shallow copy (hence why save was created for ranges). There's no way to look at a type and know how to correctly copy it. The best that could be done would be to standardize on dup, in which case it would be defined it for all built-in stuff via UFCS, and any user-defined stuff that wanted to be generically copied would define it. But we haven't done anything like that.

But really, not even that would be enough, because dup doesn't do a deep copy on arrays. It does something in between a shallow copy and a deep copy. It copies the array itself rather than the pointer, meaning that you get a totally separate array, but it doesn't do a deep copy of the elements, so if you had an array of reference types, anything done to the duped array's elements (other than assigning them new references) will affect the originals.

> How about a deep-assign operator? The compiler could implicitly define it at least for arrays and for structs that don't have mutable indirection:
> 
> struct MyStruct
> {
> int _value;
> 
> // implicit:
> ref MyStruct opDeepAssign(ref const MyStruct ms)
> {
> this = ms;
> return this;
> }
> }
> 
> int[] array1 = [1, 2, 3];
> int[] array2;
> array2.opDeepAssign(array1); // array2 = array1.dup;
> 
> And there would be hasDeepAssign!T traits and whatnot.

Something like that could probably be done, but that's not really all that different from just defining dup for stuff and having a hasDup trait, which can be done without any changes to the language like opDeepAssign would require. And again, it doesn't quite do the job due to how dup on arrays work. We'd probably need to add a function to the standard library (e.g. deepCopy) which would be defined for all built-in types and which user-defined types could define (with something hasDeepCopy being there to check for it), and then code could use that for doing deep copies. So again, it's possible, but it doesn't currently exist.

- Jonathan M Davis

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation