May 01, 2009
On Thu, 30 Apr 2009 20:52:05 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Robert Jacques wrote:
>> On Thu, 30 Apr 2009 15:57:17 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>>> People have complained about the incredibly bad, unsafe, and slow behavior of appending. Even before the considerable recent slowdown of ~=, there were speed concerns. Worse than that, the completely weird behavior of slices that grow to stomp on other slices cannot be defended.
>>  Actually, this is a problem due to an optimization in the current GC. Fixing it is rather trivial: simply store the maximum requested array end at either the beginning or end of the memory block.
>
> How does an arbitrary slice get to the beginning/end of the memory block? I guess that's not a fat pointer anymore, it's an obese pointer.

Okay, so you have:

struct T[]{
    T* ptr;
    size_t length;
    MemInfo* memInfo;
}

struct MemInfo {
    size_t refCount; //not included if not reference counting
    size_t capacity;
}

And MemInfo is located in the first 2 words of the memory block.
Then the start of the memory block == memInfo*.
The end of the memory block to (cast(byte)memInfo* ) + memInfo.capacity;
T* and T*+length both point to locations between between the memory start and end (i.e. between memInfo* and memInfo* + capacity/(2*size_t) )

In terms of size, it keeps the pointer down to 3 words, which is only slightly chubbier than the current arrays. And given at the minimum you have to store a pointer the the ref count, its no larger than it has to be. (Exception: Manual memory management could with separate array types, could strip off this extra word)

>> But the bigger [ (x ~ y) may or may not create a new array ]
>
> x ~ y always creates a new array.

Opps, sorry. x ~= y.

>> I'm a little sad to hear you say this, as both Daniel Keep and I posted sample pseudo code on how to do slice types properly in both reference counted and manual managed memory systems.
>
> No need to be sad. The code is obvious. What I was saying is that people actually need to get their hand on the container so they can control its scope.

Could you clarify something for me. Is this a low-level implementation argument or a high-level semantic argument you're making?

>>> A design that has one container type and several slice types that crawl it in various ways is very compelling, and I think it will be a huge selling point for D. It would be odd, not natural, to have arrays as a singularly distinct container that is in fact its own range. We get away with container == range for arrays partly because arrays are a simple structure, but that only blurs thinking and understanding of more complex containers. In fact, ranges do predict that any attempt to grow a range will cause topological issues in the owning container; that's why SList wisely (IMHO :o)) includes a template parameter that tells whether its topology is fixed or flexible.
>>  I'd disagree with regard to a selling point/compelling. My lab maintains an open source C++ array/slice type pair as part of a larger robotics package. It's laid outed with an ArrayBase containing all the implementation. The Array type simply extends ArrayBase and adds memory management. And Slice extends Arraybase, only adding binding capabilities. And save for memory allocation, they are identical, i.e. the container and range are interchangeable.
>
> How about a hashtable? A tree? Are the ranges crawling them identical with the containers? Not at all.
> Arrays are probably the worst example to use as an archetype for a design because they are very simple. Pretty much any design looks a bit baroque when only applied to arrays.

My disagreement was limited to the disagree with regard to a selling point/compelling. Otherwise, I agree.
hmm... thinking out loud
Slices of lists, could be lists. But its a recipe for leaks and other issues.
Slices of trees, could be trees. Although these 'slices' couldn't actually walk the tree. (i.e. aren't ranges) And re-balancing causes logical bug.
But hash tables, queues, dequeues, stacks, etc don't have nice logical slices.

>>> (Last but not least, .NET has arrays and slices. D's slices that actually have an array testis mesh real bad with .NET because in .NET you can't have a disowned slice.)
>>  I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there's probably issues I don't know about. I'd like to hear more on what exactly the issue was.
>
> We discussed at length. Without being an expert in .NET, I can't tell what the problem is, but I do know that if there was a simpler solution, Cristi would have probably found it.
>
>
> Andrei

May 01, 2009
Andrei, how do you do shared, lock-free containers as value types?


May 01, 2009
Rainer Deyke wrote:
> Christopher Wright wrote:
>> Rainer Deyke wrote:
>>> Don't forget:
>>>  + Supports timely destruction of contents (i.e. RAII).
>> These are collections. RAII doesn't matter when you have a garbage
>> collector and the only resource you have is memory, unless you're quite
>> strapped for memory.
> 
> It also doesn't matter if the resource deallocation pixie deallocates
> all my non-memory resources the moment they are no longer being used,
> which is about as likely as only having memory resources in arrays.

Which requires the container to know to delete everything that it references when it dies. If it is dealing with reference types, that is not the correct behavior. It seems awkward to me to deal with value types with destructors, but it makes sense for reference counting.
May 01, 2009
Robert Jacques wrote:
> On Thu, 30 Apr 2009 17:23:19 -0400, Rainer Deyke <rainerd@eldwood.com> wrote:
>> It's possible to write a reference wrapper around a value type.  It's also possible to write a value wrapper around a reference type. However, the former is both easier and more efficient than the latter.
> 
> Yes and no. Yes, the deep copy mixin is more complex to write the first time, but after that it's easy to use. And second, it's just as efficient as the later, since the deep copy mixin generates the same code as a value type.

A simple deep-copy struct wrapper around a class type is horribly
inefficient compared to a direct struct type:
  - It uses an extra heap allocation per instance.
  - It uses an extra level of indirection.
  - It uses virtual function calls.
  - It allocates an extra reference to a virtual function table per
instance.

I suppose a very clever wrapper metaprogram or a really clever optimizing compiler might eliminate some of these inefficiencies.


-- 
Rainer Deyke - rainerd@eldwood.com
May 01, 2009
Andrei Alexandrescu Wrote:

> Joel C. Salomon wrote:
> > Andrei Alexandrescu wrote:
> >> A design that has one container type and several slice types that crawl it in various ways is very compelling, and I think it will be a huge selling point for D. It would be odd, not natural, to have arrays as a singularly distinct container that is in fact its own range. We get away with container == range for arrays partly because arrays are a simple structure, but that only blurs thinking and understanding of more complex containers. In fact, ranges do predict that any attempt to grow a range will cause topological issues in the owning container; that's why SList wisely (IMHO :o)) includes a template parameter that tells whether its topology is fixed or flexible.
> > 
> > It sounds like you’re suggesting that arrays become like all containers, and that slices become a kind of range over the array.
> 
> Yes. I think that that would be a good design.
> 
> > Functions that now take arrays (which are possibly slices) and write to
> > them but don’t shrink or append to them, should therefore be rewritten
> > to take the range that covers the array.
> 
> To clarify: T[] is a slice. It has one array testis in the form of the infamous ~=. And it has the problems that I described because of that gender ambiguity.
> 
> Most functions manipulate ranges (and therefore slices), not full array objects. So the addition of arrays would not be disruptive.
> 
> > B.T.W.:  What’s the syntax for a range that accesses the entire container in index order?
> 
> To get the "default" range for a container c, you write c[]. foreach automatically calls it. For a T[], asking a = b[] is an identity function. For a T[5], it happens to do the right thing.
> 
> Allow me to bring the std.range.SListRange example again, because it's a very good example outside of arrays. An approach used e.g. in LISP is to conflate lists as containers with lists as iterators in containers. So when you pass a list into a function it could not only look and change at the elements that the list contains, but it could rewire the nodes in the list any way it wants. The root of the list itself is a bit special because it is usually passed "by value" so the caller will hold the root even if the callee would want to even change that. This is not a huge trouble for LISP as lists are the focal data structure, but the behavior comes off as a bit odd for someone who'd like to manipulate lists and other structures in uniform ways.
> 
> So there comes the notion that the container is concerned with topology: how slots "sit" in memory, how they are arranged with respect to each other etc. The exact ways to manipulate topology and the tradeoffs involved are container-specific (for example it's cheap to prepend to a list but not an array). Ranges, on the other hand, do not allow topological changes - they only have uniform primitives for accessing the elements in containers, whereas they prevent any topological changes.
> 
> So a SListRange!(int, Topology.flexible) would be a container and offer topological changes such as insertAfter. When such a list is asked for a range (via c[]) it will return a SListRange!(int, Topology.fixed). That guy will know how to move about the host list, but would be unable to exact any changes to its topology. As such, SListRange!(int, Topology.fixed) is as useable as any other forward range.
> 
> Does any of this make sense? :o)
> 
> 
> Andrei

fuck man u dance around da shit but never step in it. the clear example is a file. messin' with a file is diff from messin' with ranges that read from da file. forget the lists n shit. file is a real example coz u also have to be sure of lifetime n shit. i mean u wanna close it eh.
May 01, 2009
On Thu, 30 Apr 2009 23:18:13 -0400, Rainer Deyke <rainerd@eldwood.com> wrote:
> Robert Jacques wrote:
>> On Thu, 30 Apr 2009 17:23:19 -0400, Rainer Deyke <rainerd@eldwood.com>
>> wrote:
>>> It's possible to write a reference wrapper around a value type.  It's
>>> also possible to write a value wrapper around a reference type.
>>> However, the former is both easier and more efficient than the latter.
>>
>> Yes and no. Yes, the deep copy mixin is more complex to write the first
>> time, but after that it's easy to use. And second, it's just as
>> efficient as the later, since the deep copy mixin generates the same
>> code as a value type.
>
> A simple deep-copy struct wrapper around a class type is horribly
> inefficient compared to a direct struct type:

Reference semantics don't require classes. D arrays have reference semantics, but are implemented as structs. Phobos could do something similar.

>   - It uses an extra heap allocation per instance.

And during a deep copy, you're often making a lot of heap copies. (N vs N+1 when N>>1)

>   - It uses an extra level of indirection.
>   - It uses virtual function calls.

Not if the class is final.

>   - It allocates an extra reference to a virtual function table per
> instance.

Huh? Isn't that part of the extra heap allocation?

Also, a really important question is: is it possible to implement are shared, lock-free containers as value types?

May 01, 2009
On 2009-04-30 15:57:17 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> As predicted by the "range theory" which predicates that a range will never grow, only shrink, slices should always shrink. Growing is a foreign operation for a slice. This is a big problem that goes beyond a need for purity or cleanliness - it's just terrible to have a slice stomping around, and then uncontrollably being rebound elsewhere and so on. People pass slices into functions, functions modify them and also append to them, and depending on the phase of the moon, some updates are being seen in the caller.

I think D slices are a good concept for pointers to a fixed-size array. If you write new int[4], you're creating a fixed-size array of four ints, and getting a slice to the whole. If you concatenate it with some other array, it creates a new one.

As you like to point out, the fun starts when you have arrays of mutable elements and want to resize them. If you have an array of four elements and write array.length = 5, then you're perhaps creating a new one, disconnected from the previous one, and perhaps not. This pose no problem if your slice is the only one pointing to the memory block, otherwise the results are rather unpredictable.

Well, containers are no better at that. If you have some slices of a container and you grow the container you may or may not disconnect all your slices (unless you're creating an indirect slice as {container*, int start, int end} or course).

Which makes me think that perhaps we could introduce a unique type attribute and just forbid growing dynamic arrays unless they are flagged as unique. That'd make "unique T[]" behave as the container and "T[]" as the slice. And perhaps we could build that on top of the concept of uniqueness Bartosz wants to be expressible in D for the concurrency model, as he said in his latest post:

> In traditional type systems there is no way to express the requirement that a message passed by reference must either be a monitor itself or behave like a C++ unique_ptr (an object that leaves no aliases behind). This requirement should be expressible in D, allowing the compiler to check for its violations.

<http://bartoszmilewski.wordpress.com/2009/04/26/multithreading-with-d/>

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

May 01, 2009
On 2009-04-30 20:52:05 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

>>> (Last but not least, .NET has arrays and slices. D's slices that actually have an array testis mesh real bad with .NET because in .NET you can't have a disowned slice.)
>> 
>> I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there's probably issues I don't know about. I'd like to hear more on what exactly the issue was.
> 
> We discussed at length. Without being an expert in .NET, I can't tell what the problem is, but I do know that if there was a simpler solution, Cristi would have probably found it.

As I understand it, the issue is that the .NET API generally doesn't expect ArraySegments, but Array, so Cristi wants T[] to be an array, not an array segment, in order to make the API usable without having to duplicate ArraySegments into Arrays every time. I'm doubtful having a standard container type will help that much, as T[] (the slice) still won't be accepted by the .NET API and the programmer would simply have to always use the container.

That the .NET framework wants to use containers everywhere instead of slices is not a reason for D to bend to .NET concept of arrays and slices.

But...

That makes me think that perhaps my concept of "unique T[]" as the container (see my previous post) might be interesting. Cristi could make the compiler keep "unique T[]" in an Array and implicitly convert it to an ArraySegment as soon as it becomes non-unique. That would mean the .NET API could accept regular D dynamic arrays, as long as they're unique.


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

May 01, 2009
On Fri, 01 May 2009 06:26:19 -0400, Michel Fortin <michel.fortin@michelf.com> wrote:

> On 2009-04-30 15:57:17 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
>
>> As predicted by the "range theory" which predicates that a range will never grow, only shrink, slices should always shrink. Growing is a foreign operation for a slice. This is a big problem that goes beyond a need for purity or cleanliness - it's just terrible to have a slice stomping around, and then uncontrollably being rebound elsewhere and so on. People pass slices into functions, functions modify them and also append to them, and depending on the phase of the moon, some updates are being seen in the caller.
>
> I think D slices are a good concept for pointers to a fixed-size array. If you write new int[4], you're creating a fixed-size array of four ints, and getting a slice to the whole. If you concatenate it with some other array, it creates a new one.
>
> As you like to point out, the fun starts when you have arrays of mutable elements and want to resize them. If you have an array of four elements and write array.length = 5, then you're perhaps creating a new one, disconnected from the previous one, and perhaps not. This pose no problem if your slice is the only one pointing to the memory block, otherwise the results are rather unpredictable.
>
> Well, containers are no better at that. If you have some slices of a container and you grow the container you may or may not disconnect all your slices (unless you're creating an indirect slice as {container*, int start, int end} or course).
>
> Which makes me think that perhaps we could introduce a unique type attribute and just forbid growing dynamic arrays unless they are flagged as unique. That'd make "unique T[]" behave as the container and "T[]" as the slice. And perhaps we could build that on top of the concept of uniqueness Bartosz wants to be expressible in D for the concurrency model, as he said in his latest post:
>
>> In traditional type systems there is no way to express the requirement that a message passed by reference must either be a monitor itself or behave like a C++ unique_ptr (an object that leaves no aliases behind). This requirement should be expressible in D, allowing the compiler to check for its violations.
>
> <http://bartoszmilewski.wordpress.com/2009/04/26/multithreading-with-d/>
>

I think you might of misunderstood the concept of a unique/mobile type. If you take a slice or a reference of an unique, you null the source. Which makes using unique T logically invalid as a container. Also, shared unique T can't be converted to shared T. This incompatibility is the whole point of having a unique type in the first place, and negates your concept of using unique T as containers. And generally, unique T implies shared storage, and can not be used for thread-local arrays.

Lastly, what you're trying to fix is a bug in the implementation of arrays (see my posts in this thread) and not a bug in the design.
May 01, 2009
On Fri, 01 May 2009 06:45:20 -0400, Michel Fortin <michel.fortin@michelf.com> wrote:

> On 2009-04-30 20:52:05 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
>
>>>> (Last but not least, .NET has arrays and slices. D's slices that actually have an array testis mesh real bad with .NET because in .NET you can't have a disowned slice.)
>>>  I've been following the blog. I was surprised that Cristian didn't just make all D arrays .NET ArraySegments and be done with it, but there's probably issues I don't know about. I'd like to hear more on what exactly the issue was.
>>  We discussed at length. Without being an expert in .NET, I can't tell what the problem is, but I do know that if there was a simpler solution, Cristi would have probably found it.
>
> As I understand it, the issue is that the .NET API generally doesn't expect ArraySegments, but Array, so Cristi wants T[] to be an array, not an array segment, in order to make the API usable without having to duplicate ArraySegments into Arrays every time. I'm doubtful having a standard container type will help that much, as T[] (the slice) still won't be accepted by the .NET API and the programmer would simply have to always use the container.
>
> That the .NET framework wants to use containers everywhere instead of slices is not a reason for D to bend to .NET concept of arrays and slices.

I agree. D shouldn't go borrowing bad design from other languages.