Jump to page: 1 2
Thread overview
[Blog post] Why and when you should use SoA
Mar 25, 2016
maik klein
Mar 25, 2016
ZombineDev
Mar 25, 2016
ZombineDev
Mar 26, 2016
maik klein
Mar 27, 2016
ZombineDev
Mar 27, 2016
maik klein
Mar 26, 2016
Alex Parrill
Mar 27, 2016
maik klein
Mar 27, 2016
Alex Parrill
Mar 27, 2016
maik klein
Mar 27, 2016
Simen Kjaeraas
Mar 27, 2016
maik klein
March 25, 2016
Link to the blog post: https://maikklein.github.io/post/soa-d/
Link to the reddit discussion: https://www.reddit.com/r/programming/comments/4buivf/why_and_when_you_should_use_soa/
March 25, 2016
On Friday, 25 March 2016 at 01:07:16 UTC, maik klein wrote:
> Link to the blog post: https://maikklein.github.io/post/soa-d/
> Link to the reddit discussion: https://www.reddit.com/r/programming/comments/4buivf/why_and_when_you_should_use_soa/

Nice article. BTW, I would abstract the container concept from the SOA implementation, in order to allow for pluggable containers which probably simplify the code a little bit.
March 25, 2016
On Friday, 25 March 2016 at 21:22:51 UTC, ZombineDev wrote:
> On Friday, 25 March 2016 at 01:07:16 UTC, maik klein wrote:
>> Link to the blog post: https://maikklein.github.io/post/soa-d/
>> Link to the reddit discussion: https://www.reddit.com/r/programming/comments/4buivf/why_and_when_you_should_use_soa/
>
> Nice article. BTW, I would abstract the container concept from the SOA implementation, in order to allow for pluggable containers which probably simplify the code a little bit.

Another note - you can further simplify the implementation by using Fields!T and .tupleof. For example:

void insertBack(Fields!T fields)
// ....

void insertBack(T t)
{
    if(length == size) grow;

    foreach(idx, field; t.tupleof)
        containers[idx][length] = field;

    length++;
}
March 26, 2016
On Friday, 25 March 2016 at 21:41:13 UTC, ZombineDev wrote:
> On Friday, 25 March 2016 at 21:22:51 UTC, ZombineDev wrote:
>> On Friday, 25 March 2016 at 01:07:16 UTC, maik klein wrote:
>>> Link to the blog post: https://maikklein.github.io/post/soa-d/
>>> Link to the reddit discussion: https://www.reddit.com/r/programming/comments/4buivf/why_and_when_you_should_use_soa/
>>
>> Nice article. BTW, I would abstract the container concept from the SOA implementation, in order to allow for pluggable containers which probably simplify the code a little bit.
>
> Another note - you can further simplify the implementation by using Fields!T and .tupleof. For example:
>
> void insertBack(Fields!T fields)
> // ....
>
> void insertBack(T t)
> {
>     if(length == size) grow;
>
>     foreach(idx, field; t.tupleof)
>         containers[idx][length] = field;
>
>     length++;
> }

Thanks, yes that is simpler.

But I am not sure that I want to have pluggable containers in SOA, mostly because every field would have overhead from the container.

For example array has size, length etc as overhead, but it is also not that much and probably won't matter anyway.

But I also thought about it, maybe sometimes I want to use a map instead of an array for some fields. So I need to have a way of telling which field should get which container.

Maybe something like this:

SOA!(Foo, Array, HashMap, DList);

The current implementation is mostly for experimentation.
March 26, 2016
On Friday, 25 March 2016 at 01:07:16 UTC, maik klein wrote:
> Link to the blog post: https://maikklein.github.io/post/soa-d/
> Link to the reddit discussion: https://www.reddit.com/r/programming/comments/4buivf/why_and_when_you_should_use_soa/

I think structs-of-arrays are a lot more situational than you make them out to be.

You say, at the end of your article, that "SoA scales much better because you can partially access your data without needlessly loading unrelevant data into your cache". But most of the time, programs access struct fields close together in time (i.e. accessing one field of a struct usually means that you will access another field shortly). In that case, you've now split your data across multiple cache lines; not good.

Your ENetPeer example works against you here; the the packetThrottle* variables would be split up into different arrays, but they will likely be checked together when throttling packets. Though admittedly, it's easy to fix; put fields likely to be accessed together in their own struct.

The SoA approach also makes random access more inefficient and makes it harder for objects to have identity. Again, your ENetPeer example works against you; it's common for servers to need to send packets to individual clients rather than broadcasting them. With the SoA approach, you end up accessing a tiny part of multiple arrays, and load several cache lines containing data for ENetPeers that you don't care about (i.e. loading irrelevant data).

I think SoA can be faster if you are commonly iterating over a section of a dataset, but I don't think that's a common occurrence. I definitely think it's unwarranted to conclude that SoAs "scale much better" without noting when they scale better, especially without benchmarks.

I will admit, though, that the template for making the struct-of-arrays is a nice demonstration of D's templates.
March 27, 2016
On Saturday, 26 March 2016 at 23:31:23 UTC, Alex Parrill wrote:
> On Friday, 25 March 2016 at 01:07:16 UTC, maik klein wrote:
>> Link to the blog post: https://maikklein.github.io/post/soa-d/
>> Link to the reddit discussion: https://www.reddit.com/r/programming/comments/4buivf/why_and_when_you_should_use_soa/
>
> I think structs-of-arrays are a lot more situational than you make them out to be.
>
> You say, at the end of your article, that "SoA scales much better because you can partially access your data without needlessly loading unrelevant data into your cache". But most of the time, programs access struct fields close together in time (i.e. accessing one field of a struct usually means that you will access another field shortly). In that case, you've now split your data across multiple cache lines; not good.
>
> Your ENetPeer example works against you here; the the packetThrottle* variables would be split up into different arrays, but they will likely be checked together when throttling packets. Though admittedly, it's easy to fix; put fields likely to be accessed together in their own struct.
>
> The SoA approach also makes random access more inefficient and makes it harder for objects to have identity. Again, your ENetPeer example works against you; it's common for servers to need to send packets to individual clients rather than broadcasting them. With the SoA approach, you end up accessing a tiny part of multiple arrays, and load several cache lines containing data for ENetPeers that you don't care about (i.e. loading irrelevant data).
>
> I think SoA can be faster if you are commonly iterating over a section of a dataset, but I don't think that's a common occurrence. I definitely think it's unwarranted to conclude that SoAs "scale much better" without noting when they scale better, especially without benchmarks.
>
> I will admit, though, that the template for making the struct-of-arrays is a nice demonstration of D's templates.

The next blog post that I am writing will contain a few benchmarks for SoA vs AoS.


> But most of the time, programs access struct fields close together in time (i.e. accessing one field of a struct usually means that you will access another field shortly). In that case, you've now split your data across multiple cache lines; not good.

You can still group the data together if you always access it together.  What you wrote is actually not true for arrays, at least the way you wrote it.

Array!Foo arr

Iterating over 'arr', you will always load the complete Foo struct into memory, unless you hide stuff behind pointers.

> The SoA approach also makes random access more inefficient and makes it harder for objects to have identity.

No it actually makes it much better because you only have to load the relevant stuff into memory.

But you usually don't look at your objects in isolation.

AoS makes sense if you always care about all fields like for example Array!Vector3. You usually access all components of a vector.

What you lose is the general feel of oop.

Vector add(Vector a, Vector b);

Array!Vector vectors;

add(vectors[index1], vectors[index2]);

This really just won't work with SoA, especially if you want to mutate the data behind with a reference. For this you would just use AoS.

Btw I have done a lot of benchmarks and SoA in the worst case was always as fast as SoA.

But once you actually only access partial data, SoA can potentially be much faster.

This is what I mean with scaling

You start with

struct Test{
  int i;
  int j;
}
Array!Test tests;

and you have absolutely no performance problem for 'tests' because it is just so small.

But after a few years Test will have grown much bigger.

struct Test{
  int i;
  int j;
  int[100] junk;
}

If you use SoA you can always add stuff without any performance penalty, that is why I said that it "scales" better.

But as I have said in the blog post, you will not always replace AoS with SoA, but you should replace AoS with SoA where it makes sense.

> I think SoA can be faster if you are commonly iterating over a section of a dataset, but I don't think that's a common occurrence.

This happens in games very often when you use inheritance, your objects just will grow really big the more functionality you add.

Like for example you just want to move all objects based on velocity, so you just care about Position, Velocity. You don't have to load anything else into memory.

An entity component system really is just SoA at its core.
March 27, 2016
On Friday, 25 March 2016 at 01:07:16 UTC, maik klein wrote:
> Link to the blog post: https://maikklein.github.io/post/soa-d/
> Link to the reddit discussion: https://www.reddit.com/r/programming/comments/4buivf/why_and_when_you_should_use_soa/

Neat. I've actually thought about writing exactly this kind of template for the fun of it. Thank you for showing how it'd work.

Btw, your use of Tuple!ArrayTypes for the 'containers' field strikes me as unnecessary, as ArrayTypes on its own would cover all your use cases.

--
  Simen
March 27, 2016
On Sunday, 27 March 2016 at 01:39:44 UTC, Simen Kjaeraas wrote:
> On Friday, 25 March 2016 at 01:07:16 UTC, maik klein wrote:
>> Link to the blog post: https://maikklein.github.io/post/soa-d/
>> Link to the reddit discussion: https://www.reddit.com/r/programming/comments/4buivf/why_and_when_you_should_use_soa/
>
> Neat. I've actually thought about writing exactly this kind of template for the fun of it. Thank you for showing how it'd work.
>
> Btw, your use of Tuple!ArrayTypes for the 'containers' field strikes me as unnecessary, as ArrayTypes on its own would cover all your use cases.
>
> --
>   Simen

Yeah you are right, initially I thought I would use the a "named" tuple, like tuple(5, "field1", 1.0f, "field2"); but it was just unnecessary.

March 27, 2016
On Sunday, 27 March 2016 at 00:42:07 UTC, maik klein wrote:
>> I think SoA can be faster if you are commonly iterating over a section of a dataset, but I don't think that's a common occurrence.
>
> This happens in games very often when you use inheritance, your objects just will grow really big the more functionality you add.
>
> Like for example you just want to move all objects based on velocity, so you just care about Position, Velocity. You don't have to load anything else into memory.
>
> An entity component system really is just SoA at its core.

You can't have a struct-of-arrays with polymorphic data like game objects; the individual objects would have different properties and methods.

If you use a Unity-esque component system, you could potentially pool each object's component into an array... but then whatever component updates you're running likely touch most of the object state anyway (ex. the hypothetical PositionComponent would be updating both its position and velocity).

Also I forgot to mention: Your "Isn’t SoA premature optimization?" section is a textbook YAGNI violation. I might have to refactor my web app to support running across multiple servers and internationalization when it becomes the Next Big Thing, but it more than likely will not become the Next Big Thing, so it's not productive for me to add additional complexity to "make sure my code scales" (and yes, SoA does add complexity, even if you hide it with templates and methods).

Since AoS vs SoA is highly dependent on usage, I'd like to see some performance metrics with real-world access patterns instead of benchmarks that unrealistically only look at part of the data at a time, or use structs that are too small to matter. Of course, actually getting those metrics is the hard part...
March 27, 2016
On Sunday, 27 March 2016 at 02:20:09 UTC, Alex Parrill wrote:
> Also I forgot to mention: Your "Isn’t SoA premature optimization?" section is a textbook YAGNI violation. I might have to refactor my web app to support running across multiple servers and internationalization when it becomes the Next Big Thing, but it more than likely will not become the Next Big Thing, so it's not productive for me to add additional complexity to "make sure my code scales" (and yes, SoA does add complexity, even if you hide it with templates and methods).

Personally I don't think it adds any complexity but everyone has to decide that for him or herself. But it is quite annoying to refactor from AoS to SoA (at least with my implementation).

And you are right for webdev that probably doesn't matter as much but for games you hit that level pretty fast.

It is just too easy for game developers to push the limits. "Oh hey lets see how many AI's I can spawn". Maybe you can have 10 AI's or 100 AI's running around. Or maybe you have a gameserver for a completive game that runs at 100 updates per seconds, wouldn't it be nice to actually have it run at 500 on the same hardware?

Basically as a gamedev you are always resource bound.
« First   ‹ Prev
1 2