October 16, 2012 Re: Tricky semantics of ranges & potentially numerous Phobos bugs

On Tue, Oct 16, 2012 at 09:21:18PM +0200, Jonathan M Davis wrote:
[...]
> We'd probably need to add a function to the standard library (e.g. deepCopy) which would be defined for all built-in types and which user-defined types could define (with something hasDeepCopy being there to check for it), and then code could use that for doing deep copies. So again, it's possible, but it doesn't currently exist.
[...]

I think with D's compile-time introspection capabilities, it _should_ be possible to implement a generic deepCopy template that works for any type.

Of course, then one has to address tricky issues such as complex data structures that have interlinking parts; a blind recursive deep-copy may not have the desired effect (e.g., if we're deep-copying a graph and there are multiple paths (via references) to the same node, then we shouldn't end up with multiple copies of that node). Some care will also need to be taken to deal with cyclic structures, etc. And some optimizations can probably be done to avoid copying immutable objects, since that would be a waste of time & memory. Probably some kind of graph traversal algorithm can be used to address these issues, perhaps with the help of an AA or two to recreate the original linking structure in the copy.

T

--
He who sacrifices functionality for ease of use, loses both and deserves neither. -- Slashdotter
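
Such an introspection-driven deepCopy might look roughly like the following sketch. This is not Phobos code, just an illustration of the recursive decomposition being discussed; it assumes mutable fields and default-constructible classes, and it deliberately ignores base-class fields, pointers, associative arrays, immutable data, and the shared-node/cycle issues raised below.

import std.traits : isBasicType, isDynamicArray;

// Sketch only: recursively decompose a value into its parts and copy them.
T deepCopy(T)(T value)
{
    static if (isBasicType!T || is(T == enum))
    {
        return value;                   // value types are copied by value already
    }
    else static if (isDynamicArray!T)
    {
        auto result = value.dup;        // copy the array itself...
        foreach (ref e; result)
            e = deepCopy(e);            // ...then deep-copy each element
        return result;
    }
    else static if (is(T == class))
    {
        if (value is null)
            return null;
        auto copy = new T;              // assumes a default constructor
        foreach (i, member; value.tupleof)
            copy.tupleof[i] = deepCopy(member);
        return copy;
    }
    else static if (is(T == struct))
    {
        T copy = value;                 // field-wise copy first
        foreach (i, member; value.tupleof)
            copy.tupleof[i] = deepCopy(member);
        return copy;
    }
    else
        static assert(0, "deepCopy: unhandled type " ~ T.stringof);
}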

October 16, 2012 Re: Tricky semantics of ranges & potentially numerous Phobos bugs

On Tue, Oct 16, 2012 at 9:41 PM, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
> On Tue, Oct 16, 2012 at 09:21:18PM +0200, Jonathan M Davis wrote: [...]
>> We'd probably need to add a function to the standard library (e.g. deepCopy) which would be defined for all built-in types and which user-defined types could define (with something hasDeepCopy being there to check for it), and then code could use that for doing deep copies. So again, it's possible, but it doesn't currently exist.
> [...]
>
> I think with D's compile-time introspection capabilities, it _should_ be possible to implement a generic deepCopy template that works for any type.

I agree. Heck, IIRC I posted a putative deep-copy implementation here some years ago. Today's Phobos and __traits and the recently improved is() expression would make that even easier.

After all, any (aggregate) value in D can be decomposed into smaller parts, until the algorithm finds a leaf/terminal: either a value type or an array of value types.

> Of course, then one has to address tricky issues such as complex data structures that have interlinking parts; a blind recursive deep-copy may not have the desired effect (e.g., if we're deep-copying a graph and there are multiple paths (via references) to the same node, then we shouldn't end up with multiple copies of that node). Some care will also need to be taken to deal with cyclic structures, etc. And some optimizations can probably be done to avoid copying immutable objects, since that would be a waste of time & memory.

Good idea. Detecting immutable seems easy. Doing a graph traversal is a bit trickier.
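
As a tiny illustration of the "detecting immutable" point (assuming a deepCopy along the lines of the sketch above), an immutable value can simply be aliased instead of copied:

// Sketch only: skip the copy when the whole type is immutable, since
// immutable data can never change underneath us. (const alone is not
// enough -- other code may still hold a mutable reference to it.)
T copyUnlessImmutable(T)(T value)
{
    static if (is(T == immutable))
        return value;            // safe to share; a deep copy would be wasted work
    else
        return deepCopy(value);  // deepCopy is the hypothetical template sketched above
}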

October 16, 2012 Re: Tricky semantics of ranges & potentially numerous Phobos bugs

On Tue, Oct 16, 2012 at 11:00:18PM +0200, Philippe Sigaud wrote:
> On Tue, Oct 16, 2012 at 9:41 PM, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote: [...]
> > I think with D's compile-time introspection capabilities, it _should_ be possible to implement a generic deepCopy template that works for any type.
>
> I agree. Heck, IIRC I posted a putative deep-copy implementation here some years ago. Today's Phobos and __traits and the recently improved is() expression would make that even easier. After all, any (aggregate) value in D can be decomposed into smaller parts, until the algorithm finds a leaf/terminal: either a value type or an array of value types.

I just thought of a potential problem. Right now I don't think there's a way to tell whether a reference is a "hard" reference or an "external" reference. For example, if you have a tree in which nodes reference some global table of objects, you want to deep-copy only the tree nodes, not the objects they refer to; otherwise you may end up duplicating the entire global table every time you duplicate a tree.

> > Of course, then one has to address tricky issues such as complex data structures that have interlinking parts; a blind recursive deep-copy may not have the desired effect (e.g., if we're deep-copying a graph and there are multiple paths (via references) to the same node, then we shouldn't end up with multiple copies of that node). Some care will also need to be taken to deal with cyclic structures, etc. And some optimizations can probably be done to avoid copying immutable objects, since that would be a waste of time & memory.
>
> Good idea. Detecting immutable seems easy. Doing a graph traversal is a bit trickier.

I was thinking a bool[void*] should work as a way of keeping track of what has already been copied, so a simple depth-first traversal should work. And maybe another T[void*] for looking up cross-edges in the DFS tree.

T

--
One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie. -- The Silicon Valley Tarot
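
A sketch of that bookkeeping, using a single void*-keyed AA instead of two: presence in the map plays the role of the bool[void*], and the stored value is the already-made copy used to reconnect cross-edges and cycles. Node and its fields are hypothetical names for illustration.

// Sketch only: deep-copy a possibly-cyclic object graph while preserving sharing.
class Node
{
    int value;
    Node next;   // may point back into the graph (cycles, diamonds, ...)
    Node peer;   // a second path to a possibly-shared node
}

Node deepCopyGraph(Node root, ref Node[void*] copies)
{
    if (root is null)
        return null;

    // Already copied? Reuse that copy so shared nodes stay shared.
    if (auto existing = cast(void*) root in copies)
        return *existing;

    auto copy = new Node;
    copies[cast(void*) root] = copy;   // register before recursing so cycles terminate

    copy.value = root.value;
    copy.next  = deepCopyGraph(root.next, copies);
    copy.peer  = deepCopyGraph(root.peer, copies);
    return copy;
}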

October 16, 2012 Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted in reply to Jonathan M Davis

On Tuesday, 16 October 2012 at 17:06:06 UTC, Jonathan M Davis wrote:
> There's also no way to validate that front always returns the same value, or that popFront actually pops a value, or that it pops only one value, etc. Pretty much _all_ that we can verify with template constraints is function signatures. So, we can _never_ fully restrict range types to exactly what would be considered correct.
One option would be for the programmer to declare certain behavior decisions with an enum, and then to add a few new templates that check for those declarations and return true only when they are explicitly set. Perhaps an example:
import std.range : isInputRange;
import std.traits : hasMember;

enum RangeType { normal, mutating, popDiscarded /* ... */ }

template isRangeMutating(T)
if (hasMember!(T, "rangeType")) {
   enum isRangeMutating = T.rangeType == RangeType.mutating;
}

// function that cannot accept an input range that mutates any members due to
// popFront/popBack
void something(I)(I input)
if (isInputRange!(I) && (!isRangeMutating!(I)))
{ /*...*/ }

October 16, 2012 Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted in reply to Era Scarecrow

On Wednesday, October 17, 2012 00:27:08 Era Scarecrow wrote:
> On Tuesday, 16 October 2012 at 17:06:06 UTC, Jonathan M Davis wrote:
> > There's also no way to validate that front always returns the same value, or that popFront actually pops a value, or that it pops only one value, etc. Pretty much _all_ that we can verify with template constraints is function signatures. So, we can _never_ fully restrict range types to exactly what would be considered correct.
>
> One option would be for the programmer to declare certain behavior decisions with an enum, and then to add a few new templates that check for those declarations and return true only when they are explicitly set. Perhaps an example:
>
>
> enum RangeType { normal, mutating, popDiscarded /* ... */ }
>
>
> template isRangeMutating(T)
> if (hasMember!(T, "rangeType")) {
> enum isRangeMutating = T.rangeType == RangeType.mutating;
> }
>
>
> // function that cannot accept an input range that mutates any members due to
> // popFront/popBack
> void something(I)(I input)
> if (isInputRange!(I) && (!isRangeMutating!(I)))
> { /*...*/ }
We arguably have too many traits to check for ranges already. This is just overcomplicating things, IMHO. Ranges should not be this complicated; they're pushing it in that regard as it is.
- Jonathan M Davis

October 16, 2012 Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Jonathan M Davis wrote:
> On Tuesday, October 16, 2012 10:58:23 David Gileadi wrote:
> > On 10/16/12 10:28 AM, Jonathan M Davis wrote:
> > > Any range is free to violate this, but because range-based functions are free to rely on it, such ranges risk not working correctly with many range-based functions and must be used with caution.
> >
> > As a D dabbler, it seems wrong to me that a built-in range like byLine, which is often used in example code, should have weird side effects with the built-in methods in std.algorithm. IMO this needs to be fixed one way or another, and a mere documentation change is likely not enough for casual D users.
>
> ByLine needs to do what it does for performance reasons, so its implementation makes a lot of sense, and it really wouldn't make sense for it to do what would be necessary to make it work as a normal range (since that would mean allocating a new buffer for every line). So if the fact that it's likely to be used and then misused by newbies is a big problem, that's a strong argument for making it just use opApply rather than being a proper range. I certainly don't think that it makes sense to change how ranges work in general just to better support ByLine's weirdness; it would be far too restrictive to do so.

I've not really looked at byLine. Why do you have to do an allocation for each line? byChunk seems not to suffer from this problem (see http://d.puremagic.com/issues/show_bug.cgi?id=8085). Does this mean byChunk is less efficient?

I want to add that if byLine only implements opApply, then there should be some adapter that turns anything which has an opApply into a range. Then you would do: byLine().adapter().joiner().

And I also think that the documentation needs to clarify whether popFront may invalidate t, if t was obtained before via auto t = r.front.

Jens
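
For what it's worth, here is a hedged sketch of the simplest possible adapter of the kind Jens asks for; no such helper is claimed to exist in Phobos, and the name toEagerRange is made up. It runs the opApply-style iteration eagerly and collects the elements into an array, which is itself a (random-access) range; a lazy adapter would need a fiber or an explicit state machine.

import std.traits : ForeachType;

// Sketch only: eagerly drain anything foreach-able (ranges, arrays, or
// opApply-only types) into an array.
ForeachType!T[] toEagerRange(T)(T iterable)
{
    ForeachType!T[] result;
    foreach (e; iterable)
        result ~= e;   // note: for byLine this still aliases the reused buffer;
                       // a byLine-specific adapter would have to .idup each line
    return result;
}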

October 16, 2012 Re: Tricky semantics of ranges & potentially numerous Phobos bugs

On Wednesday, October 17, 2012 00:56:30 Jens Mueller wrote:
> I've not really looked at byLine. Why do you have to do an allocation
> for each line?
> byChunk seems to not suffer from this problem (see
> http://d.puremagic.com/issues/show_bug.cgi?id=8085). Does this mean
> byChunk is less efficient?
>
> I want to add that if byLine only implements opApply, then there should
> be some adapter that turns anything which has an opApply into a range.
> Then you have to do: byLine().adapter().joiner().
>
> And I also think that the documentation needs to clarify whether popFront may invalidate t, if t was obtained before via auto t = r.front.
ByLine and ByChunk are in the same boat. If they want to function sanely as ranges, then they need to allocate a new buffer for every line/chunk. Neither of them does that, because that's inefficient in many cases (particularly when you don't need to keep a particular line/chunk around for the next iteration). Rather, they reuse their buffers. That means that when popFront is called, the previous result of front gets overwritten, which screws over a lot of range-based algorithms. It's normally perfectly safe to assume that the result of front stays valid after popFront is called, but that's not the case with ByLine or ByChunk, which is what makes them so horrible as ranges.

If they use opApply, they still have the same problem insomuch as anyone using them has to understand that they reuse their buffers and must therefore dup the buffers if they want to keep anything around after an iteration, but at least then they won't be passed to range-based functions where they often won't work and will end up with weird, confusing results (e.g. try using std.array.array on ByLine). Providing a means of getting a range from them is fine, but it needs to allocate a new buffer for each element in the range, or there _will_ be problems.
- Jonathan M Davis
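
To make the std.array.array pitfall concrete, here is a small sketch (the file name is made up): byLine reuses one internal buffer, so collecting its fronts without copying leaves you with N slices of the same, repeatedly overwritten buffer.

import std.algorithm : map;
import std.array : array;
import std.stdio : File;

void main()
{
    auto f = File("example.txt");   // hypothetical input file

    // Likely wrong: every element aliases byLine's reused buffer, so after
    // iteration they all refer to (a prefix of) whatever was read last.
    // auto broken = f.byLine().array();

    // Safe: copy each line into fresh immutable storage before storing it,
    // i.e. allocate a new buffer per element, as described above.
    auto lines = f.byLine().map!(l => l.idup).array();
}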

October 17, 2012 Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted in reply to Jonathan M Davis

On Tuesday, 16 October 2012 at 23:14:01 UTC, Jonathan M Davis wrote:
>
> - Jonathan M Davis
Crazy idea:
Can't we just have a "byLineDeep" or "byLineSafe", but that returns actual deep copied immutable strings?
This way, "byLineDeep" is 100% safe, no tricks involved. Ranges are not made any more complicated then they already are.
Then, mark "byLine" as "potentially unsafe, but faster. May fail if used in an algorithm. Meant for manual iteratation. use at your own risk". Then, anybody who *knows* what he is doing can go for it.
Of course, I'd have preferred the naming to be the other way around (safe by default, unsafe on demand), but things are as they are (maybe).
----
I know I may not be used to using "byLine" yet, but the sole fact that it returns a "char[]" as opposed to a "string" has been an infinite source of frustration for me.
I'd *gladly* take a 4x penalty hit on something I won't even see anyway, if it means I can forget about having to worry about god damned aliased buffers.
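
A hedged sketch of what such a "byLineDeep" could look like (the name is the suggestion above, not an existing Phobos symbol): wrap byLine so every element is copied into an immutable string before it is handed out.

import std.algorithm : map;
import std.stdio : File;

// Sketch only: a byLine variant whose elements never alias the internal buffer.
auto byLineDeep(File f)
{
    return f.byLine().map!(line => line.idup);   // one allocation per line
}

With a wrapper like this, std.array.array, joiner, and friends behave as expected, at the cost of exactly the per-line allocation discussed above.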

October 17, 2012 Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted in reply to monarch_dodra

monarch_dodra:
> Can't we just have a "byLineDeep" or "byLineSafe" that returns actual deep-copied immutable strings?

See this old enhancement request of mine that they closed:
http://d.puremagic.com/issues/show_bug.cgi?id=4474

Bye,
bearophile

October 17, 2012 Re: Tricky semantics of ranges & potentially numerous Phobos bugs

Posted in reply to Jonathan M Davis

On 10/16/12 1:59 AM, Jonathan M Davis wrote:
> On Monday, October 15, 2012 22:48:15 Jonathan M Davis wrote:
>> So, I don't really know what the right answer is, but I _really_ don't like
>> the idea of having to worry about the result of front changing after a call
>> to popFront in every single function that ever uses front. In the general
>> case, I just don't see how that's tenable. I'd _much_ rather that it be up
>> to the programmer using abnormal ranges such as ByLine to use them
>> correctly.
>
> And actually, it seems to me that issues like this make it look like it was a
> mistake to make ranges like ByLine ranges in the first place. They should have
> just defined opApply so that they'd work nicely in foreach but not with range-
> based functions. They're clearly not going to work with a _lot_ of range-based
> functions.
I think integration of pure streams within ranges is important and beneficial.
Andrei