April 26, 2009
On Sat, 25 Apr 2009 19:11:57 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Robert Jacques wrote:
>> On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>> Also, I see a major logical problem with your proposal:
>> Given     A single owner is required for deterministic destruction.
>> Then      The array's owner is responsible for the deterministic destruction.
>> Therefore When the array's owner goes out of scope, it must destruct the array.
>> But Then  How does a function return an array?
>
> My understanding is that you don't know how C++ and D value semantics work. It would be impossible to answer your (albeit good) question without first writing a tutorial on the topic, which unfortunately I don't have the time for.

I know how value semantics work. But
1) You haven't mentioned that T[new] has value semantics, only that you were thinking about it.
2) Struct destructors are always called and don't get optimized away even if the memory copy does. (i.e. I know there's a nice optimization to get out of the extra copy, but I don't know how you'd do it in DMD currently).

>
>
> Andrei

April 26, 2009
On Sat, 25 Apr 2009 19:44:18 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Christopher Wright wrote:
>> Simple solution: put the array definition in object.d and try implementing arrays with reference counting or manual memory management.
>>  I think stored slices break manual memory management, even with a dedicated slice type; but they should work just fine with refcounting. If you don't want to change the language, object.Array will have to implement the logic for slices and for allocated arrays. It's a bit ugly, and it makes the Array type larger. Also, Array's reference count would need to be accessed by reference.
>
> There are three schemes that are good starting points:
>
> 1. As right now :o).
>
> 2. Using refcounting. Arrays will be 4 words long (begin, end, end-of-store, refcount*) and slices will be 3 words long (begin, end, and owner*)

Or, use (begin, length, store*) for both arrays and slices and have the store contain (start, capacity, refcount).

> 3. Using manual management. Arrays will be 3 words long (begin, end, end-of-store) and slices will be 2 words long (begin, end). This is close to C++/STL. This case is less safe but very efficient.

Why store end-of-store? It's trivial to compute from the length. (If you have separate slice and array types, that is)

> If we manage to integrate them all... that's quite the holy grail. And I think it's entirely possible with only a few changes to the language.
>
>
> Andrei

April 26, 2009

Andrei Alexandrescu wrote:
> It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics.
> 
> T[U] seems to have the same problem. If T[U] is the range, then how do you call the container?
> 
> If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices.
> 
> Andrei

I've got a few problems understanding this thread.

1. What exactly do you mean by "genuine container type"?  That says to me there's some ISO standard or something arrays don't conform to.  I saw earlier you specify the lack of a capacity field; is that really it?

   If so, why not call it a capacity-aware array (or capacitive array :P).

2. I don't see why you want these to have value semantics.  What's the motivation here.  Note that "because we need it" isn't a valid answer; a link to an explanation would be great, if one exists.  :)

3. I'm really uncomfortable with the idea that the semantics of slices changes when you switch off the GC.  That means you have a valid program before, you throw -gc, and suddenly you get inscrutable crashes.

As for the names, maybe we've got this backwards.  What about these:

T[]      -- a freshly created array
T[..]    -- a slice of an array
T[U]     -- a freshly created associative array
T[U..]   -- a slice of an associative array

Otherwise, creating a new array would be (new T[new]) which looks
stupid.  That, or it's (new T[x]) and the result type is (T[new]) which
is inconsistent with every other type in the language.

Also, this means static arrays stay as T[n].

On another topic, you mentioned that (using your notation again) T[new] would have four words: begin, end (but I liked length! :( ), end-of-capacity and refcount*.  Could you pull a COM trick and stick the capacity and refcount out the front of the array data?

  [ ? ] [ 4 ] [ 1 ] [ 2 ] [...] [...]
    ^     ^     ^           ^
    |     |     |           |
    |     |   begin        end
    |  capacity
 refcount

That allows T[new] to stay 8 bytes, and it also avoids needing a second allocation for the refcount (which should have the same lifetime as the array itself).

  -- Daniel
April 26, 2009
On Sat, 25 Apr 2009 20:17:00 -0400, Unknown W. Brackets <unknown@simplemachines.org> wrote:

> Let me say it a different way:
>
> split() returns results that are, often, stored or compared.  It is not often that these results are concatenated to or modified.  Possibly, they may not be sliced (further) often either.
>
> Such a case represents a perfect example of an array (or string) that need not have a greater capacity than its length.  Indeed; no property of the array is likely to change - its length, contents, capacity, or even ptr.  All are likely to remain constant until the object is deleted from memory.
>
> So, it would seem to me, this return value (as a quick example) could easily be changed to one that fits the following:
>
> 1. Is an array of data.
> 2. Does not have mutable contents (data.)
> 3. Does not allow its length or capacity to be extended (excuse me for considering this mutation.)
> 4. Is a reference, rather than a static array.
>
> My suggestion is that "char[]" should _be_ a StringBuilder, and "int[]" should _be_ an ArrayBuilder, and we should adopt a separate syntax for something that _isn't_ one of those.
>
> That said, I did just browse some of my D source files - an XML 1.0 parser/tree and a custom rfc-compliant ftp server - and nearly the only concatenation I use is building exception strings.  I'm a little surprised.  Only a couple times do I do anything a builder would be useful for.
>
> One time is on XML child nodes; since I'm building a list of children, it is a "builder" type thing.  However, my XML tree is fully incremental - that means you could traverse it before all of the data has been read from the net.  It would not be practical for me to use a builder that didn't have proper read access for this case.
>
> I really think using a "StringBuilder" class is just like the solution of using a "String" class.  D, in my opinion, proved the latter was not necessary; I hold that the former isn't either.
>
> -[Unknown]

Okay, I understand where you're coming from. Here are some counter-points
1) Immutable strings are often concatenated, which you don't address
2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't address
3) Rather large thread a while ago concluded the builder overhead was too high for general use.
4) You're proposal makes immutable char[] and char[] have different underlying representations.
April 26, 2009
On Sat, 25 Apr 2009 21:14:28 -0400, Daniel Keep <daniel.keep.lists@gmail.com> wrote:
> Andrei Alexandrescu wrote:
>> It looks we can't make it with only T[]. We need a genuine container
>> type, and T[new] was suggested. It would probably have value semantics.
>>
>> T[U] seems to have the same problem. If T[U] is the range, then how do
>> you call the container?
>>
>> If we follow through with a no-garbage-collected option for the core
>> types, then we also need to distinguish between a slice and its
>> container. The fact that we (almost) got away with T[] being at the same
>> time the container and the slice was that the garbage collector would
>> collect all unused slices.
>>
>> Andrei
>
> I've got a few problems understanding this thread.
>
> 1. What exactly do you mean by "genuine container type"?  That says to
> me there's some ISO standard or something arrays don't conform to.  I
> saw earlier you specify the lack of a capacity field; is that really it?
>
>    If so, why not call it a capacity-aware array (or capacitive array :P).

From a Model-View-Container standpoint, D's arrays mix both View and Container. Many think this is a good thing, but there is an argument for separating the views (slices) from the container (arrays). The whole capacity thing is separate from this issue and exists because 1) it's fixed in a free-list based allocator. 2) it's somewhat slow to query from the GC and 3) concatenating arrays, mainly when building strings, does it a lot.
P.S. Great names.

> 2. I don't see why you want these to have value semantics.  What's the
> motivation here.  Note that "because we need it" isn't a valid answer; a
> link to an explanation would be great, if one exists.  :)

It hasn't been explained so far. (Although there are some arguments for more general application of value semantics in older threads)

> 3. I'm really uncomfortable with the idea that the semantics of slices
> changes when you switch off the GC.  That means you have a valid program
> before, you throw -gc, and suddenly you get inscrutable crashes.

Actually, I'm pretty sure Andrei is proposing changing the semantics for both GC and no GC.

> As for the names, maybe we've got this backwards.  What about these:
>
> T[]      -- a freshly created array
> T[..]    -- a slice of an array
> T[U]     -- a freshly created associative array
> T[U..]   -- a slice of an associative array
>
> Otherwise, creating a new array would be (new T[new]) which looks
> stupid.  That, or it's (new T[x]) and the result type is (T[new]) which
> is inconsistent with every other type in the language.
>
> Also, this means static arrays stay as T[n].
>
> On another topic, you mentioned that (using your notation again) T[new]
> would have four words: begin, end (but I liked length! :( ),
> end-of-capacity and refcount*.  Could you pull a COM trick and stick the
> capacity and refcount out the front of the array data?
>
>   [ ? ] [ 4 ] [ 1 ] [ 2 ] [...] [...]
>     ^     ^     ^           ^
>     |     |     |           |
>     |     |   begin        end
>     |  capacity
>  refcount
>
> That allows T[new] to stay 8 bytes, and it also avoids needing a second
> allocation for the refcount (which should have the same lifetime as the
> array itself).

Or alternatively, both slices and arrays could store a pointer to the (refount, capacity) so you wouldn't need separate types.
April 26, 2009
1. Well, I think that immutable(T[]) ~ immutable(T[]) should be possible.  I thought it was.  Just that you cannot _append_ to immutable(T[]) which completely makes sense.

2. I don't know that it rarely concatenates.  I took Calc, but I am not a mathematician... mostly I use char and object references in my arrays.  Occasionally int[], and maybe real[] but it's not likely.

So, yes, I probably am not properly addressing this.  Frankly I do not know the use-cases where you would mutate such an array.

For objects, if you don't change the array length, it's unlikely you'll change its contents (pointers.)  Unfortunately, the referenced object would need to be mutable in many cases (although this is probably less than one would think offhand for the case of arrays of already-initialized objects.)

3. Okay.  I think for a multi-purpose builder, that's true.  However, speaking only of having an extended capacity (as is currently and as I was quoting), and not other features of a builder, I think what I say does make sense.

4. I don't think so.  Again, I'm not talking about a full-fledged builder (although I think more features of one would be possible without too much difference except fake methods), just about features.

So, to recap, I'm really talking about not over-allocating immutable arrays, but I'm not talking about changing current arrays.  Does that make my comments make more sense?

-[Unknown]


Robert Jacques wrote:
> On Sat, 25 Apr 2009 20:17:00 -0400, Unknown W. Brackets <unknown@simplemachines.org> wrote:
> 
>> Let me say it a different way:
>>
>> split() returns results that are, often, stored or compared.  It is not often that these results are concatenated to or modified.  Possibly, they may not be sliced (further) often either.
>>
>> Such a case represents a perfect example of an array (or string) that need not have a greater capacity than its length.  Indeed; no property of the array is likely to change - its length, contents, capacity, or even ptr.  All are likely to remain constant until the object is deleted from memory.
>>
>> So, it would seem to me, this return value (as a quick example) could easily be changed to one that fits the following:
>>
>> 1. Is an array of data.
>> 2. Does not have mutable contents (data.)
>> 3. Does not allow its length or capacity to be extended (excuse me for considering this mutation.)
>> 4. Is a reference, rather than a static array.
>>
>> My suggestion is that "char[]" should _be_ a StringBuilder, and "int[]" should _be_ an ArrayBuilder, and we should adopt a separate syntax for something that _isn't_ one of those.
>>
>> That said, I did just browse some of my D source files - an XML 1.0 parser/tree and a custom rfc-compliant ftp server - and nearly the only concatenation I use is building exception strings.  I'm a little surprised.  Only a couple times do I do anything a builder would be useful for.
>>
>> One time is on XML child nodes; since I'm building a list of children, it is a "builder" type thing.  However, my XML tree is fully incremental - that means you could traverse it before all of the data has been read from the net.  It would not be practical for me to use a builder that didn't have proper read access for this case.
>>
>> I really think using a "StringBuilder" class is just like the solution of using a "String" class.  D, in my opinion, proved the latter was not necessary; I hold that the former isn't either.
>>
>> -[Unknown]
> 
> Okay, I understand where you're coming from. Here are some counter-points
> 1) Immutable strings are often concatenated, which you don't address
> 2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't address
> 3) Rather large thread a while ago concluded the builder overhead was too high for general use.
> 4) You're proposal makes immutable char[] and char[] have different underlying representations.
April 26, 2009
(and I know I said "char[] should _be_ a StringBuilder", which probably caused the confusion - sorry.  I was trying to differentiate the two, I worded it poorly for what I actually meant.)

-[Unknown]


Unknown W. Brackets wrote:
> 1. Well, I think that immutable(T[]) ~ immutable(T[]) should be possible.  I thought it was.  Just that you cannot _append_ to immutable(T[]) which completely makes sense.
> 
> 2. I don't know that it rarely concatenates.  I took Calc, but I am not a mathematician... mostly I use char and object references in my arrays.  Occasionally int[], and maybe real[] but it's not likely.
> 
> So, yes, I probably am not properly addressing this.  Frankly I do not know the use-cases where you would mutate such an array.
> 
> For objects, if you don't change the array length, it's unlikely you'll change its contents (pointers.)  Unfortunately, the referenced object would need to be mutable in many cases (although this is probably less than one would think offhand for the case of arrays of already-initialized objects.)
> 
> 3. Okay.  I think for a multi-purpose builder, that's true.  However, speaking only of having an extended capacity (as is currently and as I was quoting), and not other features of a builder, I think what I say does make sense.
> 
> 4. I don't think so.  Again, I'm not talking about a full-fledged builder (although I think more features of one would be possible without too much difference except fake methods), just about features.
> 
> So, to recap, I'm really talking about not over-allocating immutable arrays, but I'm not talking about changing current arrays.  Does that make my comments make more sense?
> 
> -[Unknown]
> 
> 
> Robert Jacques wrote:
>> On Sat, 25 Apr 2009 20:17:00 -0400, Unknown W. Brackets <unknown@simplemachines.org> wrote:
>>
>>> Let me say it a different way:
>>>
>>> split() returns results that are, often, stored or compared.  It is not often that these results are concatenated to or modified.  Possibly, they may not be sliced (further) often either.
>>>
>>> Such a case represents a perfect example of an array (or string) that need not have a greater capacity than its length.  Indeed; no property of the array is likely to change - its length, contents, capacity, or even ptr.  All are likely to remain constant until the object is deleted from memory.
>>>
>>> So, it would seem to me, this return value (as a quick example) could easily be changed to one that fits the following:
>>>
>>> 1. Is an array of data.
>>> 2. Does not have mutable contents (data.)
>>> 3. Does not allow its length or capacity to be extended (excuse me for considering this mutation.)
>>> 4. Is a reference, rather than a static array.
>>>
>>> My suggestion is that "char[]" should _be_ a StringBuilder, and "int[]" should _be_ an ArrayBuilder, and we should adopt a separate syntax for something that _isn't_ one of those.
>>>
>>> That said, I did just browse some of my D source files - an XML 1.0 parser/tree and a custom rfc-compliant ftp server - and nearly the only concatenation I use is building exception strings.  I'm a little surprised.  Only a couple times do I do anything a builder would be useful for.
>>>
>>> One time is on XML child nodes; since I'm building a list of children, it is a "builder" type thing.  However, my XML tree is fully incremental - that means you could traverse it before all of the data has been read from the net.  It would not be practical for me to use a builder that didn't have proper read access for this case.
>>>
>>> I really think using a "StringBuilder" class is just like the solution of using a "String" class.  D, in my opinion, proved the latter was not necessary; I hold that the former isn't either.
>>>
>>> -[Unknown]
>>
>> Okay, I understand where you're coming from. Here are some counter-points
>> 1) Immutable strings are often concatenated, which you don't address
>> 2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't address
>> 3) Rather large thread a while ago concluded the builder overhead was too high for general use.
>> 4) You're proposal makes immutable char[] and char[] have different underlying representations.
April 26, 2009
Robert Jacques wrote:
> 1) Immutable strings are often concatenated, which you don't address

There are two types of concatenation: in-place (operator ~=) and not
(operator ~).  The former can take advantage of capacity to avoid new
allocation, the latter can not.

When concatenating a lot of immutable strings, it might make sense to make the left side mutable:

char[] tmp = string1.dup;
tmp ~= string2;
tmp ~= string3;
// ...
tmp ~= string99;
string result = tmp.idup;

> 2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't address

I frequently append individual elements to non-string arrays.


-- 
Rainer Deyke - rainerd@eldwood.com
April 26, 2009
So, how often do you have an int array that you do not intend to append elements to, but yet intend to change the values within it?  Excluding preallocation, of course.

Every time I use an int[], or honestly practically any array, I'm either creating it or reading it.  Nonetheless, I don't do much math with D (I'm a server guy), so it's interesting for me what uses other people have.

-[Unknown]


Rainer Deyke wrote:
>> 2) int[], real[], and basically anything not a string rarely
>> concatenates, but often mutable, which you don't address
> 
> I frequently append individual elements to non-string arrays.
> 
> 
April 26, 2009
On Sat, 25 Apr 2009 08:07:52 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

>It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics.
>
>T[U] seems to have the same problem. If T[U] is the range, then how do you call the container?
>
>If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices.
>
>
>Andrei

T![] T![U] (kidding)
T[:] T[:U]

FWIW, Qt uses by-value containers with COW and it looks like they are ok for many usages. I'd like to see a std.containers module defining containers with semantics based on a policy:

enum CopyPolicy
{
   Always,  // by value
   OnWrite, // by value with COW
   Never      // by reference
}

Then we could use the most appropriate policy for built-in arrays (I think it should be by-value with COW).

module object;
import std.containers;

template Array(T)
{
    alias CoolArray!(T, CopyPolicy.OnWrite) Array;
}

and still be able to use containers with different policies when necessary.