View mode: basic / threaded / horizontal-split · Log in · Help
April 30, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
On Thu, 30 Apr 2009 15:21:52 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:
> Robert Jacques wrote:
>> On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd@eldwood.com>  
>> wrote:
>>> In addition, there's this: suppose you have a struct containing a
>>> (mutable) array.  When you make a copy of that struct, you will almost
>>> always want to make a copy of the contained array.  Therefore value
>>> semantics should be the default, because it simplifies the most common
>>> use case.
>>  That's what struct ctors/dtors/opAssign are for. Personally, I think  
>> letting the struct designer decide the behaviour is better.
>
> The problem is what to do for the containers in Phobos.
>
> Andrei

Yes. I agree. Sorry, looking back, I interpreted what Rainer was talking  
about as applying to all structs and not as a supporting argument. It  
would have helped my understanding if the argument was concrete, i.e.  
about Rainer's actual experiences, and not a broad over-generalization.

Now, back on topic. I actually make heavy use of what amounts to structs  
containing arrays and don't want a deep copy >90% of the time. In fact, I  
haven't used any of the (my array-my array) deep copy functions and I only  
do copying when I have to convert my arrays to D style arrays or other  
container types. Also, D arrays are structs that contain C style arrays  
and I almost never dup them in final code. When I debug/prototype I do  
tend to use []= and dup more often. So I'd disagree that you'd almost  
always want to make a copy. Second, I don't think we should simplify the  
most common use case, but instead the majority of use cases. This is  
partly semantics, but partly not: everybody uses containers differently.

Also, Andrei, I would had preferred it if you had separated the discussion  
on arrays from Phobos containers. Many languages have arrays and a  
separate container library and I feel the decision of whether D's arrays  
should behave like Phobos' containers would be best left to after the best  
decision for Phobos is made.
April 30, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
On Thu, 30 Apr 2009 17:23:19 -0400, Rainer Deyke <rainerd@eldwood.com>  
wrote:
> Robert Jacques wrote:
>> On Thu, 30 Apr 2009 14:40:42 -0400, Rainer Deyke <rainerd@eldwood.com>
>> wrote:
>>> In addition, there's this: suppose you have a struct containing a
>>> (mutable) array.  When you make a copy of that struct, you will almost
>>> always want to make a copy of the contained array.  Therefore value
>>> semantics should be the default, because it simplifies the most common
>>> use case.
>>
>> That's what struct ctors/dtors/opAssign are for. Personally, I think
>> letting the struct designer decide the behaviour is better.
>
> I hope you're not suggesting writing ctor/dtor/opAssign for every single
> struct that uses arrays as value types.

I'm wasn't _suggesting_ anything. This is how D currently works, and is  
how any D library you're using that has structs that uses arrays as value  
types, works. Being surprised at my response gives me the feeling that  
you've never needed to code what you're talking about. If you have, your  
concrete example would be helpful as a use case in this discussion.

> It's possible to write a reference wrapper around a value type.  It's
> also possible to write a value wrapper around a reference type.
> However, the former is both easier and more efficient than the latter.

Yes and no. Yes, the deep copy mixin is more complex to write the first  
time, but after that it's easy to use. And second, it's just as efficient  
as the later, since the deep copy mixin generates the same code as a value  
type.
April 30, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
Robert Jacques wrote:
> Now, back on topic. I actually make heavy use of what amounts to structs 
> containing arrays and don't want a deep copy >90% of the time. In fact, 
> I haven't used any of the (my array-my array) deep copy functions and I 
> only do copying when I have to convert my arrays to D style arrays or 
> other container types. Also, D arrays are structs that contain C style 
> arrays and I almost never dup them in final code. When I debug/prototype 
> I do tend to use []= and dup more often. So I'd disagree that you'd 
> almost always want to make a copy. Second, I don't think we should 
> simplify the most common use case, but instead the majority of use 
> cases. This is partly semantics, but partly not: everybody uses 
> containers differently.

It must be a matter of style. Having used both semantics extensively, I 
personally think one never looks back after using value containers. When 
you fail to pass by reference there's a performance hit and a logic bug 
(changes are not effected). Both are local issues that manifest 
immediately, close to the place of the mistake, and are trivial to fix. 
But undue sharing is a global, long-distance issue - one much harder to 
tackle.

I agree that reference semantics is what people got used to in Java and 
other languages. For entity types, reference semantics make a ton of 
sense, but I think containers are not entity classes most of the time. 
But I also agree that through experience one can design around undue 
sharing. But it should be remembered that it took years to shake off 
security bugs in the JDK caused by undue sharing. (One that comes to 
mind was a sensitive linked list returned by a system API. User code 
could maliciously change it to mess with the system's security manager. 
Solution? They returned a compulsory copy of the list.)

So I'm not sure where this leaves us. I hear two good arguments:

* Some aren't bothered by reference semantics
* We are used to reference semantics from other languages

These are good arguments, but don't quite end the discussion.

> Also, Andrei, I would had preferred it if you had separated the 
> discussion on arrays from Phobos containers. Many languages have arrays 
> and a separate container library and I feel the decision of whether D's 
> arrays should behave like Phobos' containers would be best left to after 
> the best decision for Phobos is made.

*If* we add a new array built-in type, that will fundamentally affect 
Phobos because Phobos will design all containers to be like the pair 
array-slice. If we don't add a new built-in type, then I only have 
Phobos to discuss.

Also something that wasn't discussed that much is the connection of 
whatever design we devise, with the GC. I am mightily excited by the 
possibility to operate without GC or with tight reference counting, and 
I thought many around here would share that excitement. If we go for 
non-gc ways of memory management, that will probably affect container 
design.


Andrei
April 30, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
Andrei Alexandrescu wrote:
> A design that has one container type and several slice types that crawl
> it in various ways is very compelling, and I think it will be a huge
> selling point for D. It would be odd, not natural, to have arrays as a
> singularly distinct container that is in fact its own range. We get away
> with container == range for arrays partly because arrays are a simple
> structure, but that only blurs thinking and understanding of more
> complex containers. In fact, ranges do predict that any attempt to grow
> a range will cause topological issues in the owning container; that's
> why SList wisely (IMHO :o)) includes a template parameter that tells
> whether its topology is fixed or flexible.

It sounds like you’re suggesting that arrays become like all containers,
and that slices become a kind of range over the array.

Functions that now take arrays (which are possibly slices) and write to
them but don’t shrink or append to them, should therefore be rewritten
to take the range that covers the array.

B.T.W.:  What’s the syntax for a range that accesses the entire
container in index order?

—Joel Salomon
April 30, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
Rainer Deyke Wrote:

> Michel Fortin wrote:
> > On 2009-04-29 23:01:39 -0400, Rainer Deyke <rainerd@eldwood.com> said:
> >> Andrei Alexandrescu wrote:
> >>> 3. Reference semantics
> >> I'm strongly opposed to this option.  Either of the other options would
> >> be acceptable.
> > 
> > Andrei mentioned a couple of reasons against 3, which are yours?
> 
> I agree with all of Andrei's reasons.
> 
> In addition, there's this: suppose you have a struct containing a
> (mutable) array.  When you make a copy of that struct, you will almost
> always want to make a copy of the contained array.  Therefore value
> semantics should be the default, because it simplifies the most common
> use case.

I don't think copying an array member in a struct is as automatic as you imply. We really should figure out use cases that would or would not benefit from a copy. I'm sure we can find lazy range examples where copying would be a performance killer. Also, how deeply should copying go? Does it make sense to do a shallow copy of an array of reference types? What about passing an array into a function?
May 01, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
On Thu, 30 Apr 2009 15:57:17 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:
> Robert Jacques wrote:
>> On Wed, 29 Apr 2009 20:45:23 -0400, Andrei Alexandrescu  
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>>
>>> It has become increasingly clear we'll need to support arrays in
>>> addition to slices.
>>  No, Andrei it hasn't. A detailed paragraph (or more) explaining we you  
>> think so should be included in the full RFC.
>
> There are several converging pieces of evidence. Each can be debated in  
> separation, yet together they make for a rather strong case.
>
> People have complained about the incredibly bad, unsafe, and slow  
> behavior of appending. Even before the considerable recent slowdown of  
> ~=, there were speed concerns. Worse than that, the completely weird  
> behavior of slices that grow to stomp on other slices cannot be defended.

Actually, this is a problem due to an optimization in the current GC.  
Fixing it is rather trivial: simply store the maximum requested array end  
at either the beginning or end of the memory block.

> The idiom:
>
> array.length = n;
> array.length = 0;
> // append to array up to n...
>
> is rather disingenuous and looks odd to the casual reader.

My view is that this is an argument for a reserve function:
array.reserve = n;
// append to array up to n...

> As predicted by the "range theory" which predicates that a range will  
> never grow, only shrink, slices should always shrink. Growing is a  
> foreign operation for a slice. This is a big problem that goes beyond a  
> need for purity or cleanliness - it's just terrible to have a slice  
> stomping around, and then uncontrollably being rebound elsewhere and so  
> on. People pass slices into functions, functions modify them and also  
> append to them, and depending on the phase of the moon, some updates are  
> being seen in the caller.

I agree this is a problem, but the bugs are bugs in the implementation,  
not design. (see above)
But the bigger [ (x ~ y) may or may not create a new array ] resulting in  
either unseen intended updates or seen unintended updates affects both  
array and slice types.  Note that with the bugs fixed, this would only  
occur when concatenating to the logically complete array, which is why I  
say it affects both an array and slice types equally. Value semantics  
might help though.
I understand where you're coming from with "range theory", though I don't  
personally see that slices are impure: I view (x~y) as creating a  
logically new array and returning a view of it.

> So people have defined various types (such as ArrayCreator or  
> ArrayAppender or whatnot) that take care of that aspect. But really what  
> we're looking at is an Array type.

An Array type doesn't address the needs of ArrayCreator/ArrayAppender. The  
reason ArrayAppender exists is to optimize finding the array capacity.  
While this can be added to an array type, it can just as easily be added  
to a slice type. The main reason this enhancement got shot down long ago,  
was that the D community thought making arrays 3 words long, instead of 2,  
would result in a net performance loss. Personally, I'd love to see some  
real numbers proving or dis-proving this.

> Another issue is that we can get (partly) away with disowned slices  
> because of the garbage collector: whenever a slice is to be rebound,  
> whatever chunk it was bound to is left behind and will be duly  
> collected. If, however, we want to support other programming disciplines  
> that exercise stricter control over memory, the "parent" arrays must  
> become an explicit type that code can keep around and manipulate  
> directly.

I'm a little sad to hear you say this, as both Daniel Keep and I posted  
sample pseudo code on how to do slice types properly in both reference  
counted and manual managed memory systems. To reiterate:

struct T[]{
    T* ptr;
    size_t length;
    MemInfo* memInfo;
}

struct MemInfo {
    size_t refCount; //not included if not reference counting
    size_t capacity;
}

With meminfo being located at the start of the array's memory block. (i.e.  
com style)

> A design that has one container type and several slice types that crawl  
> it in various ways is very compelling, and I think it will be a huge  
> selling point for D. It would be odd, not natural, to have arrays as a  
> singularly distinct container that is in fact its own range. We get away  
> with container == range for arrays partly because arrays are a simple  
> structure, but that only blurs thinking and understanding of more  
> complex containers. In fact, ranges do predict that any attempt to grow  
> a range will cause topological issues in the owning container; that's  
> why SList wisely (IMHO :o)) includes a template parameter that tells  
> whether its topology is fixed or flexible.

I'd disagree with regard to a selling point/compelling. My lab maintains  
an open source C++ array/slice type pair as part of a larger robotics  
package. It's laid outed with an ArrayBase containing all the  
implementation. The Array type simply extends ArrayBase and adds memory  
management. And Slice extends Arraybase, only adding binding capabilities.  
And save for memory allocation, they are identical, i.e. the container and  
range are interchangeable.

> (Last but not least, .NET has arrays and slices. D's slices that  
> actually have an array testis mesh real bad with .NET because in .NET  
> you can't have a disowned slice.)

I've been following the blog. I was surprised that Cristian didn't just  
make all D arrays .NET ArraySegments and be done with it, but there's  
probably issues I don't know about. I'd like to hear more on what exactly  
the issue was.
May 01, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
Joel C. Salomon wrote:
> Andrei Alexandrescu wrote:
>> A design that has one container type and several slice types that crawl
>> it in various ways is very compelling, and I think it will be a huge
>> selling point for D. It would be odd, not natural, to have arrays as a
>> singularly distinct container that is in fact its own range. We get away
>> with container == range for arrays partly because arrays are a simple
>> structure, but that only blurs thinking and understanding of more
>> complex containers. In fact, ranges do predict that any attempt to grow
>> a range will cause topological issues in the owning container; that's
>> why SList wisely (IMHO :o)) includes a template parameter that tells
>> whether its topology is fixed or flexible.
> 
> It sounds like you’re suggesting that arrays become like all containers,
> and that slices become a kind of range over the array.

Yes. I think that that would be a good design.

> Functions that now take arrays (which are possibly slices) and write to
> them but don’t shrink or append to them, should therefore be rewritten
> to take the range that covers the array.

To clarify: T[] is a slice. It has one array testis in the form of the 
infamous ~=. And it has the problems that I described because of that 
gender ambiguity.

Most functions manipulate ranges (and therefore slices), not full array 
objects. So the addition of arrays would not be disruptive.

> B.T.W.:  What’s the syntax for a range that accesses the entire
> container in index order?

To get the "default" range for a container c, you write c[]. foreach 
automatically calls it. For a T[], asking a = b[] is an identity 
function. For a T[5], it happens to do the right thing.

Allow me to bring the std.range.SListRange example again, because it's a 
very good example outside of arrays. An approach used e.g. in LISP is to 
conflate lists as containers with lists as iterators in containers. So 
when you pass a list into a function it could not only look and change 
at the elements that the list contains, but it could rewire the nodes in 
the list any way it wants. The root of the list itself is a bit special 
because it is usually passed "by value" so the caller will hold the root 
even if the callee would want to even change that. This is not a huge 
trouble for LISP as lists are the focal data structure, but the behavior 
comes off as a bit odd for someone who'd like to manipulate lists and 
other structures in uniform ways.

So there comes the notion that the container is concerned with topology: 
how slots "sit" in memory, how they are arranged with respect to each 
other etc. The exact ways to manipulate topology and the tradeoffs 
involved are container-specific (for example it's cheap to prepend to a 
list but not an array). Ranges, on the other hand, do not allow 
topological changes - they only have uniform primitives for accessing 
the elements in containers, whereas they prevent any topological changes.

So a SListRange!(int, Topology.flexible) would be a container and offer 
topological changes such as insertAfter. When such a list is asked for a 
range (via c[]) it will return a SListRange!(int, Topology.fixed). That 
guy will know how to move about the host list, but would be unable to 
exact any changes to its topology. As such, SListRange!(int, 
Topology.fixed) is as useable as any other forward range.

Does any of this make sense? :o)


Andrei
May 01, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
Robert Jacques wrote:
> On Thu, 30 Apr 2009 15:57:17 -0400, Andrei Alexandrescu 
> <SeeWebsiteForEmail@erdani.org> wrote:
>> People have complained about the incredibly bad, unsafe, and slow 
>> behavior of appending. Even before the considerable recent slowdown of 
>> ~=, there were speed concerns. Worse than that, the completely weird 
>> behavior of slices that grow to stomp on other slices cannot be defended.
> 
> Actually, this is a problem due to an optimization in the current GC. 
> Fixing it is rather trivial: simply store the maximum requested array 
> end at either the beginning or end of the memory block.

How does an arbitrary slice get to the beginning/end of the memory 
block? I guess that's not a fat pointer anymore, it's an obese pointer.

> But the bigger [ (x ~ y) may or may not create a new array ]

x ~ y always creates a new array.

> I'm a little sad to hear you say this, as both Daniel Keep and I posted 
> sample pseudo code on how to do slice types properly in both reference 
> counted and manual managed memory systems.

No need to be sad. The code is obvious. What I was saying is that people 
actually need to get their hand on the container so they can control its 
scope.

>> A design that has one container type and several slice types that 
>> crawl it in various ways is very compelling, and I think it will be a 
>> huge selling point for D. It would be odd, not natural, to have arrays 
>> as a singularly distinct container that is in fact its own range. We 
>> get away with container == range for arrays partly because arrays are 
>> a simple structure, but that only blurs thinking and understanding of 
>> more complex containers. In fact, ranges do predict that any attempt 
>> to grow a range will cause topological issues in the owning container; 
>> that's why SList wisely (IMHO :o)) includes a template parameter that 
>> tells whether its topology is fixed or flexible.
> 
> I'd disagree with regard to a selling point/compelling. My lab maintains 
> an open source C++ array/slice type pair as part of a larger robotics 
> package. It's laid outed with an ArrayBase containing all the 
> implementation. The Array type simply extends ArrayBase and adds memory 
> management. And Slice extends Arraybase, only adding binding 
> capabilities. And save for memory allocation, they are identical, i.e. 
> the container and range are interchangeable.

How about a hashtable? A tree? Are the ranges crawling them identical 
with the containers? Not at all.

Arrays are probably the worst example to use as an archetype for a 
design because they are very simple. Pretty much any design looks a bit 
baroque when only applied to arrays.

>> (Last but not least, .NET has arrays and slices. D's slices that 
>> actually have an array testis mesh real bad with .NET because in .NET 
>> you can't have a disowned slice.)
> 
> I've been following the blog. I was surprised that Cristian didn't just 
> make all D arrays .NET ArraySegments and be done with it, but there's 
> probably issues I don't know about. I'd like to hear more on what 
> exactly the issue was.

We discussed at length. Without being an expert in .NET, I can't tell 
what the problem is, but I do know that if there was a simpler solution, 
Cristi would have probably found it.


Andrei
May 01, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
Andrei Alexandrescu:
> Does any of this make sense? :o)

Yes. I think problems aren't finished yet and more thinking is expected, but you are getting somewhere (I can tell it also because I am starting to understand. I am not bright. When even I understand it often means that the design is meaningful).
Regarding the foreach on the items I don't know/understand the syntactic need for the [] to scan.

Bye,
bearophile
May 01, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
bearophile wrote:
> Andrei Alexandrescu:
>> Does any of this make sense? :o)
> 
> Yes. I think problems aren't finished yet and more thinking is
> expected, but you are getting somewhere (I can tell it also because I
> am starting to understand. I am not bright.

Bright is only one :o).

> When even I understand it
> often means that the design is meaningful). Regarding the foreach on
> the items I don't know/understand the syntactic need for the [] to
> scan.

Oh, that's simple. A while ago people asked, how do I iterate a 
container (assuming the container is distinct from range). So I said, 
here's how I think:

Container!int c;
...
foreach (element; c.all)
{
   ...use element...
}

People said, that sucks! With opApply I don't need to append the .all 
thingie! So I told Walter to automatically call [] against the 
container. So people can now write:

foreach (element; c)
{
   ...use element...
}

All the container has to to is define opSlice() to return its "all" range.


Andrei
1 2 3 4 5 6 7
Top | Discussion index | About this forum | D home