Jump to page: 1 24  
Page
Thread overview
Array append performance 2
Aug 30, 2008
bearophile
Aug 30, 2008
bearophile
Aug 31, 2008
dsimcha
Aug 31, 2008
Sergey Gromov
Aug 31, 2008
Sergey Gromov
Aug 31, 2008
dsimcha
Aug 31, 2008
bearophile
Aug 31, 2008
Sean Kelly
Aug 31, 2008
Fawzi Mohamed
Aug 31, 2008
bearophile
Aug 31, 2008
Fawzi Mohamed
Aug 31, 2008
bearophile
Sep 01, 2008
Denis Koroskin
Sep 01, 2008
bearophile
Sep 01, 2008
bearophile
Sep 01, 2008
bearophile
Sep 02, 2008
Benji Smith
Sep 02, 2008
bearophile
Sep 03, 2008
Dave
Sep 03, 2008
bearophile
Sep 02, 2008
BCS
Aug 31, 2008
Denis Koroskin
Aug 31, 2008
Christian Kamm
Aug 31, 2008
bearophile
Sep 01, 2008
downs
"Protocol Buffers" for Tango & Phobos ??
Sep 01, 2008
Nick B
Sep 01, 2008
Lars Ivar Igesund
Sep 01, 2008
Brian Price
Sep 01, 2008
Leandro Lucarella
Sep 02, 2008
Brian Price
Sep 03, 2008
Mitja Slenc
August 30, 2008
Until some more basic solution is found (and/or answers from Walter regarding how to solve this append performance problem and regarding the idea of adding a third capacity field to dynamical arrays), someone has suggested to create an array builder class, like the string builder of Java. So I have created one, the code is in the attach. Note that code is raw still, not commented, and it does the bare minimum (append one item and extract the array, I'd like to add generic extend methods, and tye possibility to change the capacity/length), its tests are messy still, etc, once I know it's correct I'll improve it and I'll add it to my libs (of course, I can offer a version of this code to Phobos/Tango too).

It's not easy for me to write such GC-based code, so my questions are:
Is ArrayBuilder!() correct in every situation?
Why are the performance of the benchmark3 so low (a bit worse than normal array append)? Can it be improved?


Extra note: while creating it I have found another bug in DMD I didn't know about:

import std.stdio: writefln;
void main() {
    alias int[2] T;
    writefln(T.init);
}

This program has to print [0,0] instead of 0.


Timings, on a Core Duo @ 2 GHz, 2 GB RAM:

benchmark 1, N=2_000_000:
  Array append: 0.160 s
  ArrayBuilder: 0.026 s
benchmark 1, N=20_000_000:
  Array append: 1.837 s
  ArrayBuilder: 0.325 s
benchmark 1, N=50_000_000:
  Array append: 4.489 s
  ArrayBuilder: 0.731 s
benchmark 1, N=200_000_000:
  Array append: 32.293 s
  ArrayBuilder: Out of memory

benchmark 2, N=200_000:
  Array append: 0.099 s
  ArrayBuilder: 0.004 s
benchmark 2, N=2_000_000:
  Array append: 6.896 s
  ArrayBuilder: 0.050 s

benchmark 3, N=200_000:
  Array append: 0.090 s
  ArrayBuilder: 0.076 s
benchmark 3, N=2_000_000:
  Array append: 1.014 s
  ArrayBuilder: 0.923 s (why is this so slow?)
benchmark 3, N=6_000_000:
  Array append: 5.295 s
  ArrayBuilder: 4.431 s

benchmark 4, N=500_000:
  Array append: 1.109 s
  ArrayBuilder: 0.414 s
benchmark 4, N=1_000_000:
  Array append: 3.103 s

Bye,
bearophile


August 30, 2008
IsReferenceType!() isn't fully correct because it sees fixed size arrays as reference types. I have posted a more correct (and cleaned, and single-template) version here:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=75813

Bye,
bearophile
August 31, 2008
Nice work on the reference template.  I had thought that the typeid(T).flags() would be inlined, then constant folded, but CTFE doesn't work on it, so I guess not.  Probably pretty negligible, but if I have a template laying around to do this at compile time, I may as well use it.

Also, to answer your question, you do need to call hasNoPointers() each time you
realloc.

import std.stdio, std.gc, std.process;

void main() {
    uint[] foo = (cast(uint[]) malloc(uint.sizeof))[0..1];
    hasNoPointers(foo.ptr);
    foo = cast(uint[]) realloc(foo.ptr, 100_000 * uint.sizeof)[0..100_000];
    //Commenting next line out causes severe mem leaks.
    hasNoPointers(foo.ptr);
    foreach(ref f; foo) {
        f = cast(uint) test();  //Create false pointers.
    }
    system("pause");  //So mem usage can be checked.
}

uint* test() {
    return (new uint[1_000]).ptr;
}

Oh, one more thing:  What's a static struct as opposed to a regular struct?  I've never seen it used before.  I had assumed that this means a struct composed only of static members, but apparently not.
August 31, 2008
== Quote from bearophile (bearophileHUGS@lycos.com)'s article
> IsReferenceType!() isn't fully correct because it sees fixed size arrays as
reference types. I have posted a more correct (and cleaned, and single-template)
version here:
>
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=75813
> Bye,
> bearophile

One small issue:  Your IsReference template doesn't work with invariant, and presumably const, types.  For example,

IsReferenceType!(typeof("Foo"[0]));  //Returns true.

The easiest way to fix this is to add a Mutable!():

static if (IsType!(Mutable!(Types[0]), bool, byte, ubyte, short, ushort, int, uint,
                           long, ulong, float, double, real, ifloat, idouble,
                           ireal, cfloat, cdouble, creal, char, dchar, wchar) )

Also, although it may be the least bad way to do this (I can't think of a better one), the process of elimination method this relies on seems kind of brittle because it would have to be modified for any new type that is added.  If anyone knows of a less kludge-y way, please post.
August 31, 2008
dsimcha:

>Nice work on the reference template.<

It comes from several expansion and compression cycles, has requires few hours to be written ;-)


>Also, to answer your question, you do need to call hasNoPointers() each time you realloc.<

Right, I have just read the D docs that say it.
But what's the rationale behind it? If I have a pointer and I want to realloc it, to make its space bigger, I never want to change the fact that the GC scans its pointers or not. Stating it once must suffice.


>Oh, one more thing:  What's a static struct as opposed to a regular struct?<

I have taken a look at the documentation, and I have performed some benchmarks, now I think it does nothing, I have removed that attribute.
I have flagged that struct fields as private because originally that was a class, but a struct is better and faster, so I have removed those "private" too.


>One small issue:  Your IsReference template doesn't work with invariant, and presumably const, types.<

Because I (as most people here, I presume) use D 1. Sorry for not stating it clearly.


> The easiest way to fix this is to add a Mutable!():
> static if (IsType!(Mutable!(Types[0]), bool, byte, [...]

Thank you, I have added a note into my libs, even if they are for D 1.

Hopefully some one else will tell me if the ArrayBuilder is correct (even if this group isn't D.learn).

Bye and thank you,
bearophile
August 31, 2008
dsimcha wrote:
> Nice work on the reference template.  I had thought that the typeid(T).flags()
> would be inlined, then constant folded, but CTFE doesn't work on it, so I guess
> not.  Probably pretty negligible, but if I have a template laying around to do
> this at compile time, I may as well use it.
> 
> Also, to answer your question, you do need to call hasNoPointers() each time you
> realloc.

Not with Tango.


Sean
August 31, 2008
On Sun, 31 Aug 2008 04:58:10 +0400, dsimcha <dsimcha@yahoo.com> wrote:

> Nice work on the reference template.  I had thought that the typeid(T).flags()
> would be inlined, then constant folded, but CTFE doesn't work on it, so I guess
> not.  Probably pretty negligible, but if I have a template laying around to do
> this at compile time, I may as well use it.
>
> Also, to answer your question, you do need to call hasNoPointers() each time you
> realloc.
>
> import std.stdio, std.gc, std.process;
>
> void main() {
>     uint[] foo = (cast(uint[]) malloc(uint.sizeof))[0..1];
>     hasNoPointers(foo.ptr);
>     foo = cast(uint[]) realloc(foo.ptr, 100_000 * uint.sizeof)[0..100_000];
>     //Commenting next line out causes severe mem leaks.
>     hasNoPointers(foo.ptr);
>     foreach(ref f; foo) {
>         f = cast(uint) test();  //Create false pointers.
>     }
>     system("pause");  //So mem usage can be checked.
> }
>
> uint* test() {
>     return (new uint[1_000]).ptr;
> }
>
> Oh, one more thing:  What's a static struct as opposed to a regular struct?  I've
> never seen it used before.  I had assumed that this means a struct composed only
> of static members, but apparently not.

Static doesn't apply to a struct, only to inner classes, IIRC.
August 31, 2008
On 2008-08-31 06:01:26 +0200, Sean Kelly <sean@invisibleduck.org> said:

> dsimcha wrote:
>> Nice work on the reference template.  I had thought that the typeid(T).flags()
>> would be inlined, then constant folded, but CTFE doesn't work on it, so I guess
>> not.  Probably pretty negligible, but if I have a template laying around to do
>> this at compile time, I may as well use it.
>> 
>> Also, to answer your question, you do need to call hasNoPointers() each time you
>> realloc.
> 
> Not with Tango.
> 
> 
> Sean

I would have done it quite differently: keep a linked list (if you want to improve of batches of 10 arrays), with a pointer to the end, append in O(1), and keep the total length.

only upon a call to toArray allocate an array of the requested length and fill it.

An array (even with tricks) is not a good structure to append to, so it seems a bad idea to me to use it.

Fawzi

August 31, 2008
> Extra note: while creating it I have found another bug in DMD I didn't know about:
> 
> import std.stdio: writefln;
> void main() {
>     alias int[2] T;
>     writefln(T.init);
> }
> 
> This program has to print [0,0] instead of 0.

Actually, I don't think the D spec requires typeof(T.init) == T. And T is a valid initializer for T[N]:

int[3] arr = 5; // sets all elements to 5

Christian
August 31, 2008
Christian Kamm:
> Actually, I don't think the D spec requires typeof(T.init) == T.

Maybe the D specs have to be fixed then.


> And T is a valid initializer for T[N]:

Right, but the following program:

import std.stdio: writefln;
void main() {
    alias int[2] T;
    writefln(T.init);

    int[2] T_init;
    writefln(T_init);
}

prints:
0
[0,0]

So if I have a dynamic array of static arrays (where T is the static array) I can't do this: array[0 .. $][] = T.init;

I have to do:

T T_init;
array[0 .. $][] = T_init;

While if T is a dynamic array I can do:

array[0 .. $] = T.init;

So even if they aren't bugs I don't like all such differences much...

Bye,
bearophile
« First   ‹ Prev
1 2 3 4