Thread overview
Java memory efficiency and column-oriented data
Feb 03, 2012
bearophile
Feb 03, 2012
Marco Leise
Feb 03, 2012
bearophile
Feb 03, 2012
Marco Leise
Feb 03, 2012
bcs
Feb 03, 2012
Timon Gehr
Feb 04, 2012
bearophile
February 03, 2012
Through Reddit I've found this good and long slides pack, it's about using Java data structures to increase memory efficiency of programs:

http://domino.research.ibm.com/comm/research_people.nsf/pages/sevitsky.pubs.html/$FILE/oopsla08%20memory-efficient%20java%20slides.pdf

Despite the D situation is different (there are structs as in C#), it will be good to have weak and soft references in Phobos, and to have better memory analysis tools outside Phobos.

The slides have reminded me my desire of a column-oriented "struct array" in Phobos (some time ago someone has written a minimal version for D1).

The usage is simple:


import std.stdio, std.conv;

struct Foo { // an example struct
    int x;
    float y;
    string s;

    this(int xx, float yy) {
        x = xx;
        y = yy;
        s = text(x);
    }

    float sum() {
        return x + y;
    }
}

void main() {
    auto a1 = new Foo[1000]; // normal not parallel array
    foreach (ref Foo f; a1)
        writeln(f.s, " ", f.sum());

    // default usage example of ParallelArray
    // 3 Foo fields stored as 3 separated arrays inside a2
    ParallelArray!Foo a2; // valid
    static assert(a2[0].sizeof == size_t.sizeof * 4); // 3 pointers + 1 length
    a2.length = 1000;
    foreach (ref Foo f; a2) // A f Foo is built on the fly
        writeln(f, " ", f.sum());
    a2[10] = Foo(1, 2, "1");
    foreach (x; a2.x_array) // x_array is a property slice
        writeln(x);
    foreach (y; a2.y_array)
        writeln(y);
    foreach (s; a2.s_array)
        writeln(s);

    // specialized usage example of ParallelArray
    // x,y fields stored as an array, s field as another array
    ParallelArray!(Foo, "x y # s") a3; // valid
    static assert(a3[0].sizeof == size_t.sizeof * 3); // 2 pointers + 1 length
    a3.length = 1000;
    foreach (ref Foo f; a3) // A f Foo is built on the fly
        writeln(f, " ", f.sum());
    a3[10] = Foo(1, 2, "1");
    foreach (xy; a3.x_y_array)
        writeln(xy.x, " ", xy.y);
    foreach (s; a3.s_array)
        writeln(s);

    // float z0 = a3.x_y_array[10].sum(); // invalid code
    ParallelArray!(Foo, "x # y # s") a4; // valid code
    // ParallelArray!(Foo, "x y # s x") a5; // invalid, dupe field x
    // ParallelArray!(Foo, "x # y") a6; // invalid, s field missing
    // so if you give a string with the field names, you need to
    // list them all, and only once each. Other designs are possible
    // but this is the simplest to use and implement.

    float z1 = a3[10].sum(); // a3[10] returns a Foo

    // a3(10) doesn't create a Foo, it just fetches what
    // .sum() needs, so it's faster if you have to call .sum()
    // on many records.
    // so the calls to sum() are implemented at compile-time
    float z2 = a3(10).sum();

    // To keep design simple. ParallelArray can't create 2D arrays
}


Do you like?
I have several usages of such struct in my code.

Bye,
bearophile
February 03, 2012
Am 03.02.2012, 01:21 Uhr, schrieb bearophile <bearophileHUGS@lycos.com>:

> Do you like?
> I have several usages of such struct in my code.
>
> Bye,
> bearophile

From the back of my head I remember thinking of this. I'm not sure if I really needed it, or just split up the struct in two parts.
February 03, 2012
Marco Leise:

>  From the back of my head I remember thinking of this. I'm not sure if I
> really needed it, or just split up the struct in two parts.

Splitting up a struct Foo is possible, but then when you have to pass it around you need to remember to pass all its parts. And things gets more complex if Foo has various methods that use mixed subsets of the fields.

Column-wide arrays allow you to not change the original Foo struct, to use Foo unchanged where you don't need a column-oriented data structure, and allow you most common usages. They allow you to split your Foo in different ways in different parts of your program, using different ParallelArray. They are supposed to give higher scan performance if in a part of a program you need only a subset of the fields. Iterating only on a subset of the fields of an array of records is a common enough task in my D code.

Bye,
bearophile
February 03, 2012
Am 03.02.2012, 03:38 Uhr, schrieb bearophile <bearophileHUGS@lycos.com>:

> Marco Leise:
>
>>  From the back of my head I remember thinking of this. I'm not sure if I
>> really needed it, or just split up the struct in two parts.
>
> Splitting up a struct Foo is possible, but then when you have to pass it around you need to remember to pass all its parts. And things gets more complex if Foo has various methods that use mixed subsets of the fields.
>
> Column-wide arrays allow you to not change the original Foo struct, to use Foo unchanged where you don't need a column-oriented data structure, and allow you most common usages. They allow you to split your Foo in different ways in different parts of your program, using different ParallelArray. They are supposed to give higher scan performance if in a part of a program you need only a subset of the fields. Iterating only on a subset of the fields of an array of records is a common enough task in my D code.
>
> Bye,
> bearophile

Can you post code, that proves the claim and some figures? Pictures say more than thousand words. As long as I think scanning performance will hardly be influenced this just complicates things like

	arr[i].x = 1;  // no return by ref possible, so doesn't do what you want

or

	arr = minimallyInitializedArray!(Foo[])(1_000_000);

:)

-- Marco
February 03, 2012
On 02/02/2012 04:21 PM, bearophile wrote:
> Through Reddit I've found this good and long slides pack, it's about using Java data structures to increase memory efficiency of programs:
>
> http://domino.research.ibm.com/comm/research_people.nsf/pages/sevitsky.pubs.html/$FILE/oopsla08%20memory-efficient%20java%20slides.pdf
>
> Despite the D situation is different (there are structs as in C#), it will be good to have weak and soft references in Phobos, and to have better memory analysis tools outside Phobos.
>
> The slides have reminded me my desire of a column-oriented "struct array" in Phobos (some time ago someone has written a minimal version for D1).
>
> The usage is simple:
>
>
> import std.stdio, std.conv;
>
> struct Foo { // an example struct
>      int x;
>      float y;
>      string s;
>
>      this(int xx, float yy) {
>          x = xx;
>          y = yy;
>          s = text(x);
>      }
>
>      float sum() {
>          return x + y;
>      }
> }
>
> void main() {
>      auto a1 = new Foo[1000]; // normal not parallel array
>      foreach (ref Foo f; a1)
>          writeln(f.s, " ", f.sum());
>
>      // default usage example of ParallelArray
>      // 3 Foo fields stored as 3 separated arrays inside a2
>      ParallelArray!Foo a2; // valid
>      static assert(a2[0].sizeof == size_t.sizeof * 4); // 3 pointers + 1 length
>      a2.length = 1000;
>      foreach (ref Foo f; a2) // A f Foo is built on the fly
>          writeln(f, " ", f.sum());
>      a2[10] = Foo(1, 2, "1");
>      foreach (x; a2.x_array) // x_array is a property slice

Ideally this shouldn't require the property. The "natural" or auto type for iterating a ParallelArray should be a proxy value that defines properties for all the members and looks them up on demand. It would just need two words, a pointer to the parent ParallelArray and an index into it.

>          writeln(x);
>      foreach (y; a2.y_array)
>          writeln(y);
>      foreach (s; a2.s_array)
>          writeln(s);
>
>      // specialized usage example of ParallelArray
>      // x,y fields stored as an array, s field as another array
>      ParallelArray!(Foo, "x y # s") a3; // valid
>      static assert(a3[0].sizeof == size_t.sizeof * 3); // 2 pointers + 1 length
>      a3.length = 1000;
>      foreach (ref Foo f; a3) // A f Foo is built on the fly
>          writeln(f, " ", f.sum());
>      a3[10] = Foo(1, 2, "1");
>      foreach (xy; a3.x_y_array)
>          writeln(xy.x, " ", xy.y);
>      foreach (s; a3.s_array)
>          writeln(s);
>
>      // float z0 = a3.x_y_array[10].sum(); // invalid code
>      ParallelArray!(Foo, "x # y # s") a4; // valid code
>      // ParallelArray!(Foo, "x y # s x") a5; // invalid, dupe field x
>      // ParallelArray!(Foo, "x # y") a6; // invalid, s field missing
>      // so if you give a string with the field names, you need to
>      // list them all, and only once each. Other designs are possible
>      // but this is the simplest to use and implement.
>
>      float z1 = a3[10].sum(); // a3[10] returns a Foo
>
>      // a3(10) doesn't create a Foo, it just fetches what
>      // .sum() needs, so it's faster if you have to call .sum()
>      // on many records.
>      // so the calls to sum() are implemented at compile-time
>      float z2 = a3(10).sum();
>
>      // To keep design simple. ParallelArray can't create 2D arrays
> }
>
>
> Do you like?
> I have several usages of such struct in my code.
>
> Bye,
> bearophile

February 03, 2012
On 02/03/2012 01:21 AM, bearophile wrote:
> Through Reddit I've found this good and long slides pack, it's about using Java data structures to increase memory efficiency of programs:
>
> http://domino.research.ibm.com/comm/research_people.nsf/pages/sevitsky.pubs.html/$FILE/oopsla08%20memory-efficient%20java%20slides.pdf
>
> Despite the D situation is different (there are structs as in C#), it will be good to have weak and soft references in Phobos, and to have better memory analysis tools outside Phobos.
>
> The slides have reminded me my desire of a column-oriented "struct array" in Phobos (some time ago someone has written a minimal version for D1).
>
> The usage is simple:
>
>
> import std.stdio, std.conv;
>
> struct Foo { // an example struct
>      int x;
>      float y;
>      string s;
>
>      this(int xx, float yy) {
>          x = xx;
>          y = yy;
>          s = text(x);
>      }
>
>      float sum() {
>          return x + y;
>      }
> }
>
> void main() {
>      auto a1 = new Foo[1000]; // normal not parallel array
>      foreach (ref Foo f; a1)
>          writeln(f.s, " ", f.sum());
>
>      // default usage example of ParallelArray
>      // 3 Foo fields stored as 3 separated arrays inside a2
>      ParallelArray!Foo a2; // valid
>      static assert(a2[0].sizeof == size_t.sizeof * 4); // 3 pointers + 1 length
>      a2.length = 1000;
>      foreach (ref Foo f; a2) // A f Foo is built on the fly
>          writeln(f, " ", f.sum());
>      a2[10] = Foo(1, 2, "1");
>      foreach (x; a2.x_array) // x_array is a property slice
>          writeln(x);
>      foreach (y; a2.y_array)
>          writeln(y);
>      foreach (s; a2.s_array)
>          writeln(s);
>
>      // specialized usage example of ParallelArray
>      // x,y fields stored as an array, s field as another array
>      ParallelArray!(Foo, "x y # s") a3; // valid
>      static assert(a3[0].sizeof == size_t.sizeof * 3); // 2 pointers + 1 length
>      a3.length = 1000;
>      foreach (ref Foo f; a3) // A f Foo is built on the fly
>          writeln(f, " ", f.sum());
>      a3[10] = Foo(1, 2, "1");
>      foreach (xy; a3.x_y_array)
>          writeln(xy.x, " ", xy.y);
>      foreach (s; a3.s_array)
>          writeln(s);
>
>      // float z0 = a3.x_y_array[10].sum(); // invalid code
>      ParallelArray!(Foo, "x # y # s") a4; // valid code
>      // ParallelArray!(Foo, "x y # s x") a5; // invalid, dupe field x
>      // ParallelArray!(Foo, "x # y") a6; // invalid, s field missing
>      // so if you give a string with the field names, you need to
>      // list them all, and only once each. Other designs are possible
>      // but this is the simplest to use and implement.
>
>      float z1 = a3[10].sum(); // a3[10] returns a Foo
>
>      // a3(10) doesn't create a Foo, it just fetches what
>      // .sum() needs, so it's faster if you have to call .sum()
>      // on many records.
>      // so the calls to sum() are implemented at compile-time
>      float z2 = a3(10).sum();

How would you implement such a thing? Introspection is not (yet) that powerful.

>
>      // To keep design simple. ParallelArray can't create 2D arrays
> }
>
>
> Do you like?
> I have several usages of such struct in my code.
>
> Bye,
> bearophile

I like it and I think it would be a valuable addition to Phobos, as well as a nice showcase for Ds templates and string mixins. You didn't specify what the type of eg. a3.x_y_array is, I think it should be Tuple!(int, "x", float, "y")[]. Furthermore, I'd rather have it named a3.x_y. There should also be a possibility to manually name the parts, as well as a way to specify 'anything that is left':

ParallelArray!(Foo, "pos: x y # rest: ...") pa;

static assert(is(typeof(pa.pos) == Tuple!(int, "x", float, "y")[]));

February 04, 2012
Timon Gehr:

> I like it and I think it would be a valuable addition to Phobos,

Marco Leise has asked for some benchmarks first, and I think he's right.


> as well as a nice showcase for Ds templates and string mixins.

Phobos is meant to be first of all _useful_ :-)


> You didn't specify what the type of eg. a3.x_y_array is, I think it should be Tuple!(int, "x", float, "y")[].

OK.


> Furthermore, I'd rather have it named a3.x_y.

The problem is when you ask for a single field then the name becomes something just like "x" that I think is misleading because it's not the original x, it's a tuple with a single item. I think a name that helps you tell apart the original 'x' field from this new field is better. That's why I have added a suffix.


> There should also be a possibility to manually name the parts,

I think this makes the splitting string a bit too much complex... This data structure is meant to be as much as possible plain, like a POD array. I'd like to not give the option to change the synthetic field names both to keep ParallelArray simpler, and to help its users to find it _easy_ to tell exactly what it spits out.


> as well as a way to specify 'anything that is left':
> 
> ParallelArray!(Foo, "pos: x y # rest: ...") pa;

This was my first design for ParallelArray, but I think it's not a good idea, and not I prefer the simpler design I've shown. Forcing a stricter management of the names in the optional string allows for a stronger static typing: if you don't do that if you later change the original Foo struct, adding a field, it goes in the "rest" part, and you don't know what's in "rest" any more.

But ParallelArray is meant to be a transparent data structure, so it asks you to list all the original fields once and only once (so if you add ore remove a field in Foo, the compiler asks you to modify all ParallelArray of your program defined on Foo that have a string too.  If you don't give a string,  like   ParallelArray!Foo  then it compiles even if you change Foo).


Anyway, if users later think it's really needed, then this ParallelArray feature is addable later with a syntax like:
ParallelArray!(Foo, "x y # ...") pa;
And the syntax for naming new fields is simple as you say:
ParallelArray!(Foo, "pos: x y # rest: ...") pa;
But I suggest to not put this in a first implementation of ParallelArray and keep things simpler first.

Thank you for your comments,
bearophile