June 07, 2009
On Sun, 07 Jun 2009 12:59:39 +0800, Sean Kelly <sean@invisibleduck.org> wrote:

> Steve Schveighoffer wrote:
>> On Sat, 06 Jun 2009 12:03:03 -0700, Sean Kelly wrote:
>>
>>> bearophile wrote:
>>>> Sean Kelly:
>>>>> Particularly in D2 where append
>>>>> operations on arrays are probably less common as a result of string
>>>>> being invariant.
>>>> Maybe they aren't very common because they are currently dead-slow.
>>>> Appending to an immutable string is a common operation. But I guess
>>>> array appenders will get more common...
>>> Yes but appending to an immutable string is never performed in place,
>>> which is the only time the extra space reserved by newCapacity matters.
>>>   I suspect the memory wasted by newCapacity is more of an issue than
>>> any time savings it provides.
>>  What gave you that idea?
>>  void main()
>> {
>>   auto str1 = "hello".idup;
>>   auto str2 = str1;
>>   str1 ~= "world";
>>   assert(str1.ptr == str2.ptr);
>> }
>
> auto str1 = "hello".idup;
> auto str2 = str3 = str1;
> str2 ~= " world";
> str3 ~= " garbage";
>
> Doesn't seem terribly safe to me.

Oh, file a bug report! You found the bug!

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
June 07, 2009
On 2009-06-07 04:07:45 +0200, Sean Kelly <sean@invisibleduck.org> said:

> Fawzi Mohamed wrote:
>> On 2009-06-06 21:07:40 +0200, Sean Kelly <sean@invisibleduck.org> said:
>> 
>>> Vladimir Panteleev wrote:
>>>> On Sat, 06 Jun 2009 20:19:45 +0300, Sean Kelly <sean@invisibleduck.org>
>>>> wrote:
>> [...]
>>>> 
>>>>> I've been debating for a while whether newCapacity shoulds exist at all.   The GC already tends to leave free space at the end of blocks, so why should the runtime second-guess it?
>>>> 
>>>> Do you mean at the end of pages, or pools?
>>> 
>>> Blocks.  Blocks smaller than 4k in the current allocator are sized in powers of two, so array appends already get Java's "double in size" behavior even without newCapacity.  newCapacity just has the potential to throw an array into the next larger block early, thus potentially wasting more space if the array will never be appended to. I think newCapacity isn't supposed to reserve extra space for blocks larger than 4k, but it's been a while since I've looked at it.
>> 
>> At the moment the behavior is exactly the opposite (leaving small arrays to the GC handling you just sketched): only arrays larger than a page get the special treatment. I think the idea is that appending to large arrays can be very expensive, so those are the ones worth reserving extra space for.
> 
> Hm.  I still think we'd be better off letting the user handle this.  If they're going to be appending and performance is important then they'll use an ArrayAppender anyway.  I'd rather not have extra space tacked onto the end of every array I create "just in case" I decide to append to it.

Well, it isn't so bad: newCapacity is used only when the user *is* appending in place and the array is not large enough.
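
For concreteness, here is a minimal observation harness (my own sketch, not runtime code, assuming the runtime behavior discussed in this thread): it prints the lengths at which an append has to move the array, which are exactly the growth points where newCapacity gets consulted.

import std.stdio;

void main()
{
    int[] a;
    int* last = null;
    foreach (i; 0 .. 1_000)
    {
        a ~= i;
        // The pointer changes only when the append could not proceed in the
        // current block and the runtime allocated a larger one.
        if (a.ptr !is last)
        {
            writefln("reallocated at length %s", a.length);
            last = a.ptr;
        }
    }
}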

Fawzi

June 07, 2009
bearophile wrote:

>> The following code even crashes LDC during the compilation, I'll ask in the LDC channel:<
> 
> The good ChristianK has already added it: http://www.dsource.org/projects/ldc/ticket/320

And thankfully Frits van Bommel has already fixed it: it consumes about 40kb of heap memory at runtime now.

This seems to be because we don't use the _d_arrayappend functions at all but emit a _d_arraysetlength instead. What's the memory growth behavior of that function? Should this be considered a bug?


June 07, 2009
On 2009-06-07 02:23:47 +0200, Lionello Lunesu <lio@lunesu.remove.com> said:

> Vladimir Panteleev wrote:
>> On Sat, 06 Jun 2009 18:12:58 +0300, Jarrett Billingsley <jarrett.billingsley@gmail.com> wrote:
>> 
>>> On Sat, Jun 6, 2009 at 8:03 AM, Vladimir
>>> Panteleev<thecybershadow@gmail.com> wrote:
>>>> // Works for DMD1/Phobos, DMD1/Tango and DMD2/Phobos
>>>> version(Tango) import tango.io.Console;
>>>> else           import std.stdio;
>>>> 
>>>> struct S
>>>> {
>>>>        ubyte[40_000] data;
>>>> }
>>>> 
>>>> void main()
>>>> {
>>>>        S[] a;
>>>>        a ~= S();
>>>> 
>>>>        // QUESTION: How much memory will this program consume upon reaching
>>>> this point?
>>>>        version(Tango) Cin.copyln();
>>>>        else           readln();
>>>> }
>>>> 
>>> 
>>> There seems to be something wrong with the newCapacity function that
>>> _d_arrayappendcT calls.  From an element size of 20000 (I halved it
>>> just to make the allocation faster) and an array length of 1, it
>>> somehow calculates the new size to be 266686600.  Hm.  That seems a
>>> bit off.
>>> 
>>> It seems this line:
>>> 
>>> long mult = 100 + (1000L * size) / log2plus1(newcap);
>>> 
>>> is to blame.  I don't think large value types were taken into account
>>> here.  The resulting multiplier is 1,333,433, which is hilariously
>>> large.
>> 
>> Bull's-eye. I think Mr. Dave Fladebo deserves a math lesson or something, and Mr. Walter Bright should review patches that go into the language's runtime more carefully. :P
>> 
>> Can we discuss a suitable replacement? I suggested in #d that we look at how other platforms do it. For example, Java's Vector just doubles its capacity by default ( http://java.sun.com/javase/6/docs/api/java/util/Vector.html ).
>> 
> 
> In my own array classes I'm using Python's: size += max(size>>3, 8);
> Using this, you won't end up wasting a lot of memory. Preallocating is always preferred anyway.
> 
> L.

Actually the original function proposed by Dave Fladebo is not so bad: it calculates the average storage requirement per bit (total_size/num_bits) and tries to allocate that much extra space.
The advantage is that it stays exponential-like, which means roughly O(log(n)) (maybe log(n)*log(log(n))) reallocations when continuously appending, while trying to waste less and less space the larger the array gets.
I did propose two "reasonable" possibilities, which I give again here in a cleaner form:
{{{
           const int b = 2;
           version(A){
               const a = 1000L;
           } else {
               const a = 100L;
           }

           static int log2plusB(size_t c)
           {
               int i = b;
               while(c >>= 1){
                   ++i;
               }
               return i;
           }

           version(A){
               long mult = 100 + a*b / log2plusB(newcap);
           } else {
               long mult = 100 + a*b / log2plusB(newlength);
           }

           // testing shows 1.02 for large arrays is about the point of diminishing return
           if (mult < 102)
               mult = 102;
           newext = cast(size_t)((newcap * mult) / 100);
           newext += size - (newext % size);
}}}
version A uses the total memory, the other the number of elements.

Increasing a makes the amount of reserved space larger (it is the maximum amount of extra reserved space, in %).
The value of a I chose is different from one version to the other because min(log2(newcap)) is ~12 while min(log2(newlength)) is 1.

Increasing b makes the decrease flatter, moving closer to treating all sizes with the same extra space; for small b the extra space falls off faster for larger arrays.

In general I think the functional form is sound, and it is superior to either a fixed "double the size" or an "add a constant size" policy. This approach should also be used in other places that need to reallocate periodically.

The question is which values of a and b are optimal, and whether it is better to use the total size or the number of elements.
Now if someone would do some meaningful benchmarks (on real applications), I would be very happy; otherwise I will just take one of my guesses...
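
To see the effect of the formula, here is a small standalone sketch (my own illustration, not the runtime's newCapacity) that evaluates the element-count variant above (the non-A branch: a = 100, b = 2) for a few lengths; it shows how the reserved fraction shrinks as the array grows.

{{{
import std.stdio;

// Evaluate the proposed growth multiplier for the non-A branch (a = 100,
// b = 2, based on the number of elements). Illustration only.
long growthPercent(size_t newlength)
{
    const long a = 100;
    const int  b = 2;

    // log2plusB(newlength): integer log2 of newlength, plus b
    int i = b;
    size_t c = newlength;
    while (c >>= 1)
        ++i;

    long mult = 100 + a * b / i;
    if (mult < 102)   // 1.02 is about the point of diminishing return
        mult = 102;
    return mult;
}

void main()
{
    // The reserved fraction shrinks as the array gets larger:
    foreach (n; [16, 256, 4_096, 1_000_000, 100_000_000])
        writefln("length %s -> grow to about %s%% of the requested capacity",
                 n, growthPercent(cast(size_t) n));
}
}}}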

Fawzi

June 07, 2009
On 2009-06-07 11:33:06 +0200, Fawzi Mohamed <fmohamed@mac.com> said:

Ehm, some small corrections to the constants, so that their meaning is better connected to what one expects in both versions:

> Actually the original function proposed by Dave Fladebo is not so bad: it calculates the average storage requirement per bit (total_size/num_bits) and tries to allocate that much extra space.
> The advantage is that it stays exponential-like, which means roughly O(log(n)) (maybe log(n)*log(log(n))) reallocations when continuously appending, while trying to waste less and less space the larger the array gets.
> I did propose two "reasonable" possibilities, which I give again here in a cleaner form:
{{{
           const int b = 2;
           const a = 100L;
           version(A){
               const minBits = 11; // normally page size is at least 2K
           } else {
               const minBits = 1;
           }

           static int log2plusB(size_t c)
           {
               int i = b;
               while(c >>= 1){
                   ++i;
               }
               return i;
           }

           version(A){
               long mult = 100 + a*(minBits+b) / log2plusB(newcap);
           } else {
               long mult = 100 + a*(minBits+b) / log2plusB(newlength);
           }

           // testing shows 1.02 for large arrays is about the point of diminishing return
           if (mult < 102)
               mult = 102;
           newext = cast(size_t)((newcap * mult) / 100);
           newext += size - (newext % size);
}}}
> version A uses the total memory, the other the number of elements.
> 
> Increasing a makes the amount of reserved space larger (it is the maximum amount of extra reserved space, in %).

The value of minBits I chose is different from one version to the other because min(log2(newcap)) is ~12 while min(log2(newlength)) is 1.

> Increasing b makes the decrease flatter, moving closer to treating all sizes with the same extra space; for small b the extra space falls off faster for larger arrays.
> 
> In general I think the functional form is sound, and it is superior to either a fixed "double the size" or an "add a constant size" policy. This approach should also be used in other places that need to reallocate periodically.
> 
> The question is which values of a and b are optimal, and whether it is better to use the total size or the number of elements.
> Now if someone would do some meaningful benchmarks (on real applications), I would be very happy; otherwise I will just take one of my guesses...
> 
> Fawzi


June 07, 2009
On 2009-06-07 10:47:47 +0200, Christian Kamm <kamm-incasoftware@removethis.de> said:

> bearophile wrote:
> 
>>> The following code even crashes LDC during the compilation, I'll ask in
>>> the LDC channel:<
>> 
>> The good ChristianK has already added it:
>> http://www.dsource.org/projects/ldc/ticket/320
> 
> And thankfully Frits van Bommel has already fixed it: it consumes about 40kb
> of heap memory at runtime now.
> 
> This seems to be because we don't use the _d_arrayappend functions at all
> but emit a _d_arraysetlength instead. What's the memory growth behavior of
> that function? Should this be considered a bug?

I would say that setlength should not allocate extra space, because one should trust that the user knows his needs. If the array grows while appending, on the other hand, it is reasonable to add extra space, because the user is probably appending without really knowing or caring about array resizes.
So it is reasonable (and probably good) to try to second-guess him and reserve extra space.
This way repeated appending to that array performs well.

Thus yes, I would say that it should be considered a bug (one that will degrade repeated appending performance).

The solution is to use a better newCapacity function, instead of the flawed one.
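
A tiny illustration of the distinction (my own example, not runtime code):

{{{
void main()
{
    // Explicit resize: the user states the exact length wanted, so the
    // runtime has no reason to reserve extra capacity here.
    int[] exact;
    exact.length = 1_000;

    // Incremental appends: the final size is unknown to the runtime, so
    // reserving some extra space (via a sane newCapacity) keeps the number
    // of reallocations low.
    int[] grown;
    foreach (i; 0 .. 1_000)
        grown ~= i;

    assert(exact.length == grown.length);
}
}}}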

Fawzi

June 07, 2009
On Sat, 06 Jun 2009 21:59:39 -0700, Sean Kelly wrote:

> Steve Schveighoffer wrote:
>> On Sat, 06 Jun 2009 12:03:03 -0700, Sean Kelly wrote:
>> 
>>> bearophile wrote:
>>>> Sean Kelly:
>>>>> Particularly in D2 where append
>>>>> operations on arrays are probably less common as a result of string
>>>>> being invariant.
>>>> Maybe they aren't very common because they are currently dead-slow. Appending to an immutable string is a common operation. But I guess array appenders will get more common...
>>> Yes but appending to an immutable string is never performed in place,
>>> which is the only time the extra space reserved by newCapacity
>>> matters.
>>>   I suspect the memory wasted by newCapacity is more of an issue than
>>> any time savings it provides.
>> 
>> What gave you that idea?
>> 
>> void main()
>> {
>>   auto str1 = "hello".idup;
>>   auto str2 = str1;
>>   str1 ~= "world";
>>   assert(str1.ptr == str2.ptr);
>> }
> 
> auto str1 = "hello".idup;
> auto str2 = str3 = str1;
> str2 ~= " world";
> str3 ~= " garbage";
> 
> Doesn't seem terribly safe to me.

Oh, I know.  It's a long-standing issue with immutability, but I think if appending gets fixed as Andrei suggests, this should be fixed as well.  I was just saying that your statement about immutable data never being appended in-place was false.

-Steve
June 07, 2009
On Sun, 07 Jun 2009 15:47:52 +0400, Steve Schveighoffer <schveiguy@yahoo.com> wrote:

> On Sat, 06 Jun 2009 21:59:39 -0700, Sean Kelly wrote:
>
>> Steve Schveighoffer wrote:
>>> On Sat, 06 Jun 2009 12:03:03 -0700, Sean Kelly wrote:
>>>
>>>> bearophile wrote:
>>>>> Sean Kelly:
>>>>>> Particularly in D2 where append
>>>>>> operations on arrays are probably less common as a result of string
>>>>>> being invariant.
>>>>> Maybe they aren't very common because they are currently dead-slow.
>>>>> Appending to an immutable string is a common operation. But I guess
>>>>> array appenders will get more common...
>>>> Yes but appending to an immutable string is never performed in place,
>>>> which is the only time the extra space reserved by newCapacity
>>>> matters.
>>>>   I suspect the memory wasted by newCapacity is more of an issue than
>>>> any time savings it provides.
>>>
>>> What gave you that idea?
>>>
>>> void main()
>>> {
>>>   auto str1 = "hello".idup;
>>>   auto str2 = str1;
>>>   str1 ~= "world";
>>>   assert(str1.ptr == str2.ptr);
>>> }
>>
>> auto str1 = "hello".idup;
>> auto str2 = str3 = str1;
>> str2 ~= " world";
>> str3 ~= " garbage";
>>
>> Doesn't seem terribly safe to me.
>
> Oh, I know.  It's a long-standing issue with immutability, but I think if
> appending gets fixed as Andrei suggests, this should be fixed as well.  I
> was just saying that your statement about immutable data never being
> appended in-place was false.
>
> -Steve

Your proof relies on buggy behavior.
June 07, 2009
On Mon, 08 Jun 2009 00:34:10 +0400, Denis Koroskin wrote:

> On Sun, 07 Jun 2009 15:47:52 +0400, Steve Schveighoffer <schveiguy@yahoo.com> wrote:
> 
>> On Sat, 06 Jun 2009 21:59:39 -0700, Sean Kelly wrote:
>>
>>> Steve Schveighoffer wrote:
>>>> On Sat, 06 Jun 2009 12:03:03 -0700, Sean Kelly wrote:
>>>>
>>>>> bearophile wrote:
>>>>>> Sean Kelly:
>>>>>>> Particularly in D2 where append
>>>>>>> operations on arrays are probably less common as a result of
>>>>>>> string being invariant.
>>>>>> Maybe they aren't very common because they are currently dead-slow. Appending to an immutable string is a common operation. But I guess array appenders will get more common...
>>>>> Yes but appending to an immutable string is never performed in
>>>>> place, which is the only time the extra space reserved by
>>>>> newCapacity matters.
>>>>>   I suspect the memory wasted by newCapacity is more of an issue
>>>>>   than
>>>>> any time savings it provides.
>>>>
>>>> What gave you that idea?
>>>>
>>>> void main()
>>>> {
>>>>   auto str1 = "hello".idup;
>>>>   auto str2 = str1;
>>>>   str1 ~= "world";
>>>>   assert(str1.ptr == str2.ptr);
>>>> }
>>>
>>> auto str1 = "hello".idup;
>>> auto str2 = str3 = str1;
>>> str2 ~= " world";
>>> str3 ~= " garbage";
>>>
>>> Doesn't seem terribly safe to me.
>>
>> Oh, I know.  It's a long-standing issue with immutability, but I think if appending gets fixed as Andrei suggests, this should be fixed as well.  I was just saying that your statement about immutable data never being appended in-place was false.
>>
>> -Steve
> 
> Your proof relies on buggy behavior.

By-design behavior.  See http://www.digitalmars.com/d/2.0/arrays.html#resize

Poorly designed, perhaps, but it's by design.

My point was that there's no special treatment of immutable data (as Sean suggested); it suffers from the same issues as mutable appending.

BTW, I'm not in favor of the current behavior, but I do think that if something is fixed for this array allocation issue, it should cover this problem as well.
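
For reference, here is the mutable counterpart of Sean's example (a sketch assuming the runtime behavior of the time, before any append/stomping protection); it shows the same aliasing hazard with no immutability involved:

import std.stdio;

void main()
{
    int[] a = [1, 2, 3];
    int[] b = a;   // b aliases a's allocation
    a ~= 4;        // may extend in place if the block has spare capacity
    b ~= 5;        // may then overwrite the 4 appended just above
    writefln("a = %s, b = %s, same block: %s", a, b, a.ptr is b.ptr);
}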

-Steve
June 07, 2009
Steve Schveighoffer wrote:
> On Mon, 08 Jun 2009 00:34:10 +0400, Denis Koroskin wrote:
> 
>> On Sun, 07 Jun 2009 15:47:52 +0400, Steve Schveighoffer <schveiguy@yahoo.com> wrote:
>>
>>> On Sat, 06 Jun 2009 21:59:39 -0700, Sean Kelly wrote:
>>>
>>>> Steve Schveighoffer wrote:
>>>>> On Sat, 06 Jun 2009 12:03:03 -0700, Sean Kelly wrote:
>>>>>
>>>>> void main()
>>>>> {
>>>>>   auto str1 = "hello".idup;
>>>>>   auto str2 = str1;
>>>>>   str1 ~= "world";
>>>>>   assert(str1.ptr == str2.ptr);
>>>>> }
>>>> auto str1 = "hello".idup;
>>>> auto str2 = str3 = str1;
>>>> str2 ~= " world";
>>>> str3 ~= " garbage";
>>>>
>>>> Doesn't seem terribly safe to me.
>>> Oh, I know.  It's a long-standing issue with immutability, but I think if appending gets fixed as Andrei suggests, this should be fixed as well.  I was just saying that your statement about immutable data never being appended in-place was false.
>>>
>>> -Steve
>> Your proof relies on buggy behavior.
> 
> By-design behavior.  See http://www.digitalmars.com/d/2.0/arrays.html#resize
> 
> Poorly designed, perhaps, but it's by design.
> 
> My point was that there's no special treatment of immutable data (as was suggested by Sean), it suffers from the same issues as mutable appending.
> 
> BTW, I'm not in favor of the current behavior, but I do think that if something is fixed for this array allocation issue, it should cover this problem as well.
> 
> -Steve

This problem was one of the main drivers behind the proposal to formally separate arrays and slices (the T[new] vs T[] part of the D 2.0 talk).

It's kinda fallen down on the todo list, but it's a pretty key usability problem, imho.

Later,
Brad