October 20, 2009
On Tue, 20 Oct 2009 21:01:17 +0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> grauzone wrote:
>> Steven Schveighoffer wrote:
>>> I still think having an Appender object or struct is a worthwhile thing, the "pre-allocate array then set length to zero" model is a hack at best.
>>  Would that work with Andrei's append cache at all? Setting the length to zero and then appending is like taking a slice of length 0 and then appending.
>>  Maybe introduce a readable/writable .capacity property that magically accesses the cache/GC?
>
> For my money, I'd get rid of that trick:
>
> a.length = 1000;
> a.length = 0;
> for (...) a ~= x;
>
>
> Andrei

I agree it's ugly but that's the best we have in D, and it looks like things are getting even worse...
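
For the record, a library answer already exists in embryonic form. Here is a minimal sketch using std.array.Appender, treating the exact member names (put, data) as assumptions about this era's Phobos:

```d
import std.array : appender;

void main()
{
    // Build the array through an Appender instead of the length trick:
    auto app = appender!(int[])();
    foreach (i; 0 .. 1000)
        app.put(i);            // amortized O(1); capacity tracked internally

    int[] result = app.data;   // hand back a finished slice
    assert(result.length == 1000);
    assert(result[42] == 42);
}
```

This makes the intent ("I am building an array") explicit, where the length trick only implies it.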
October 20, 2009
On Tue, Oct 20, 2009 at 10:05 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Bill Baxter wrote:
>>
>> To Andrei, do you really feel comfortable trying to explain this in your book?  It seems like it will be difficult to explain that ~= is sometimes efficient for appending, but not necessarily if you're working with a lot of arrays, because it keeps a cache under the hood that may or may not remember the actual underlying capacity of the array you're appending to; so you should probably use ArrayBuilder if you can, despite the optimization.
>
> I guess I'll try and let you all know.

I can also see this becoming an Effective D tip --
"""
#23  Use ArrayBuilder for appending

For common cases, appending to slices is fast.  However, the
performance depends on a hidden LRU cache that remembers the
capacities of the most recent N arrays.  This works fine until you
exceed that limit of N.  Unfortunately, as you compose code together,
it is easy to overflow that cache without realizing it, leading to
sudden performance drops for no apparent reason.  Thus we suggest you
always use ArrayBuilder when appending, rather than appending to
slices directly.
"""

Or not.  This is one of those places where some data is really needed.
 It may be that 99.9% of code is only actively appending to 4 arrays
or fewer.  It just seems too tricky that this innocent-looking code:

     int[] i;
     foreach(k; 1..20_000) {
             i ~= some_function(k);
     }

could hit a performance cliff based on how many arrays get used deep in the call chain of some_function().   Granted, cache issues can cause these kinds of cliffs for any kind of code, but I suspect this cliff would be particularly noticeable, given the slowness of allocations.

--bb
October 20, 2009
Andrei Alexandrescu wrote:
> grauzone wrote:
>> Steven Schveighoffer wrote:
>>> I still think having an Appender object or struct is a worthwhile thing, the "pre-allocate array then set length to zero" model is a hack at best.
>>
>> Would that work with Andrei's append cache at all? Setting the length to zero and then appending is like taking a slice of length 0 and then appending.
>>
>> Maybe introduce a readable/writable .capacity property that magically accesses the cache/GC?
> 
> For my money, I'd get rid of that trick:
> 
> a.length = 1000;
> a.length = 0;
> for (...) a ~= x;

Yes, that's what we currently use for setting the capacity of an array. And it looks stupid and non-intuitive; someone new to D might think it's a no-op.

Better way please?

> 
> Andrei
October 20, 2009
On Tue, 20 Oct 2009 12:30:57 -0400, Bill Baxter <wbaxter@gmail.com> wrote:

> On Tue, Oct 20, 2009 at 8:50 AM, Steven Schveighoffer
> <schveiguy@yahoo.com> wrote:
>> On Tue, 20 Oct 2009 11:10:20 -0400, Bill Baxter <wbaxter@gmail.com> wrote:
>>
>>> On Tue, Oct 20, 2009 at 6:25 AM, Steven Schveighoffer
>>> <schveiguy@yahoo.com> wrote:
>>>>
>>>> On Sun, 18 Oct 2009 17:05:39 -0400, Walter Bright
>>>> <newshound1@digitalmars.com> wrote:
>>>>
>>>>> The purpose of T[new] was to solve the problems T[] had with passing T[]
>>>>> to a function and then the function resizes the T[]. What happens with
>>>>> the
>>>>> original?
>>>>>
>>>>> The solution we came up with was to create a third array type, T[new],
>>>>> which was a reference type.
>>>>>
>>>>> Andrei had the idea that T[new] could be dispensed with by making a
>>>>> "builder" library type to handle creating arrays by doing things like
>>>>> appending, and then delivering a finished T[] type. This is similar to
>>>>> what
>>>>> std.outbuffer and std.array.Appender do, they just need a bit of
>>>>> refining.
>>>>>
>>>>> The .length property of T[] would then become an rvalue only, not an
>>>>> lvalue, and ~= would no longer be allowed for T[].
>>>>>
>>>>> We both feel that this would simplify D, make it more flexible, and
>>>>> remove
>>>>> some awkward corner cases like the inability to say a.length++.
>>>>>
>>>>> What do you think?
>>>>
>>>> At the risk of sounding like bearophile -- I've proposed 2 solutions in
>>>> the
>>>> past for this that *don't* involve creating a T[new] type.
>>>>
>>>> 1. Store the allocated length in the GC structure, then only allow
>>>> appending
>>>> when the length of the array being appended matches the allocated length.
>>>>
>>>> 2. Store the allocated length at the beginning of the array, and use a
>>>> bit
>>>> in the array length to determine if it starts at the beginning of the
>>>> block.
>>>>
>>>> The first solution has space concerns, and the second has lots more
>>>> concerns, but can help in the case of having to do a GC lookup to
>>>> determine
>>>> if a slice can be appended (you'd still have to lock the GC to do an
>>>> actual
>>>> append or realloc).  I prefer the first solution over the second.
>>>>
>>>> I like the current behavior *except* for appending.  Most of the time it
>>>> does what you want, and the syntax is beautiful.
>>>>
>>>> In regards to disallowing x ~= y, I'd propose you at least make it
>>>> equivalent to x = x ~ y instead of removing it.
>>>
>>> If you're going to do ~= a lot then you should convert to the dynamic
>>> array type.
>>> If you're not going to do ~= a lot, then you can afford to write out x = x
>>> ~ y.
>>>
>>> The bottom line is that it just doesn't make sense to append onto a
>>> "view" type.  It's really a kind of constness.  Having a view says the
>>> underlying memory locations you are looking at are fixed.  It doesn't
>>> make sense to imply there's an operation that can change those memory
>>> locations (other than shrinking the window to view fewer of them).
>>
>> Having the append operation extend into already allocated memory is an
>> optimization.  In this case, it's an optimization that can corrupt memory.
>>
>> If we can make append extend into already allocated memory *and* not cause
>> corruption, I don't see the downside.  And then there is one less array type
>> to deal with (, create functions that handle, etc.).
>>
>> Besides, I think Andrei's LRU solution is better than mine (and pretty much
>> in line with it).
>>
>> I still think having an Appender object or struct is a worthwhile thing, the
>> "pre-allocate array then set length to zero" model is a hack at best.
>
> But you still have the problem Andrei posted.  Code like this:
>
> void func(int[] x)
> {
>      x ~= 3;
>      x[0] = 42;
> }

Depending on what you want, you then rewrite:

> void func(int[] x)
> {
>      x[0] = 42;
>      x ~= 3;
> }

or


> void func(int[] x)
> {
>      x = x ~ 3;
>      x[0] = 42;
> }

Generally when you are appending, you are not also changing the original data, so you don't care whether it's an optimization or not.

Your code is obviously broken anyway, since *nobody* ever sees the 3.

> it'll compile and maybe run just fine, but there's no way to know if
> the caller will see the 42 or not.   Unpredictable behavior like that
> is breeding grounds for subtle bugs.

I'm sure we could spend days coming up with code that introduces subtle bugs.  You can't fix all bugs that people may write.  I don't think your scenario is very likely.

More importantly, the problem with the current appending behavior is this:

void foo(int[] x)
{
  x ~= 3;
  ...
}

That may have just corrupted some data that you don't own, so defensively, you should write:

void foo(int[] x)
{
  x = x ~ 3;
  ...
}

But with Andrei's solution, you cannot possibly corrupt data with this line.  Now, if you then go and set one of the values in the original array (like you did), you may or may not change the original array.  But since the function takes a mutable array, *you own the array*, so it is a mistake to pass in a non-const array and then expect it to remain unchanged.  If your goal is to affect the original array, then you should accept a ref argument, or else not append to it.
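
The ownership rule can be made concrete with a minimal sketch (the function names here are mine, purely illustrative):

```d
void appendByValue(int[] x)
{
    x ~= 3;            // only this local copy of the slice grows
}

void appendByRef(ref int[] x)
{
    x ~= 3;            // the caller's slice itself is updated
}

void main()
{
    int[] a = [1, 2];
    appendByValue(a);
    assert(a == [1, 2]);     // the caller never sees the 3
    appendByRef(a);
    assert(a == [1, 2, 3]);  // ref makes the append visible
}
```

The slice's length is passed by value, so without ref the caller's view can never grow, whatever the append does under the hood.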

-Steve
October 20, 2009
Steven Schveighoffer wrote:
> If your goal is to affect the original array, then you should accept a ref argument or not append to it.

I think that's an entirely reasonable (and easy to explain) stance.

Andrei
October 20, 2009
On Tue, Oct 20, 2009 at 11:30 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Steven Schveighoffer wrote:
>>
>> If your goal is to affect the original array, then you should accept a ref argument or not append to it.
>
> I think that's an entirely reasonable (and easy to explain) stance.

I've definitely spent time tracking down exactly such bugs, where I meant to make the argument a ref but didn't.  If the above is to be the official stance, then I think it should be enforced by the compiler.  Appending to non-ref slice args should be an error.

--bb
October 20, 2009
On Tue, 20 Oct 2009 14:43:16 -0400, Bill Baxter <wbaxter@gmail.com> wrote:

> On Tue, Oct 20, 2009 at 11:30 AM, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>> Steven Schveighoffer wrote:
>>>
>>> If your goal is to affect the original array, then you should accept a ref
>>> argument or not append to it.
>>
>> I think that's an entirely reasonable (and easy to explain) stance.
>
> I've definitely spent time tracking down exactly such bugs, where I
> meant to make the argument a ref but didn't.  If the above is to be
> the official stance, then I think it should be enforced by the
> compiler.  Appending to non-ref slice args should be an error.

Except when you are passing ownership of an array.  Basically, there are four modes:

T[] x : a) you are passing ownership to the function, and the array might not be deterministically usable after the function returns, e.g. T[] padAndCapitalize(T[]).  Usually such functions' return values are deterministic, and are the focus of what you care about.

  -or-

        b) you are lending the array to the function; only the array elements will be altered, e.g. replace().

ref T[] x : you are lending ownership of the array to the function, but you get ownership back, e.g. push().  Fully deterministic altering.

const(T)[] x : you retain ownership of the array; the function cannot alter it, e.g. toUpper().

So it's not possible to distinguish the first two modes, a) and b), from the signature alone; we must currently rely on documentation.  But the "append and modify" routines are pretty rare.

Essentially, what you want is a type where the length is const, but the data is mutable.  I think creating such a type should be possible in the library.
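
A hypothetical sketch of such a type (the name FixedSlice and its design are mine for illustration, not an existing Phobos type): the wrapper exposes the elements for mutation, but offers no way to assign the length or append.

```d
struct FixedSlice(T)
{
    private T[] data;

    this(T[] s) { data = s; }

    // Elements stay mutable through the view...
    ref T opIndex(size_t i) { return data[i]; }

    // ...but the length is read-only: no setter, no ~= overload.
    @property size_t length() const { return data.length; }
}

void main()
{
    int[] arr = [1, 2, 3];
    auto view = FixedSlice!int(arr);
    view[0] = 42;              // mutation is allowed
    assert(arr[0] == 42);      // and is visible through the original slice
    assert(view.length == 3);  // but the length cannot be changed
}
```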

But fixing the append problem alone is a huge step forward, since all of these cases could result in corruption of totally unrelated data if the arrays were appended to.  The const version is especially bad.

-Steve
October 30, 2009
I've just caught up on this thread.  Built-in dynamic arrays have been one of D's major features since the early days.  Now you're planning to remove this feature?

http://www.digitalmars.com/d/1.0/builtin.html
states that the C++ STL contains many types created to compensate for the limitations of the built-in array type, and that the power of D's built-in arrays largely eliminates the need for them.  So now you're suggesting that we do away with this power, and create the very library types the design was meant to avoid?

Walter Bright wrote:
<snip>
> The .length property of T[] would then become an rvalue only, not an lvalue, and ~= would no longer be allowed for T[].

I thought you were already moving that functionality into T[new].

I think T[new] versus T[] is actually a good design - it makes for a form of array length constancy as well as possibly getting rid of such nasties as bug 2093.

> We both feel that this would simplify D, make it more flexible, and remove some awkward corner cases like the inability to say a.length++.

I must've missed the discussion - what's wrong with fixing such expressions to work in terms of property setters/getters in the natural way?

Stewart.