Thread overview | |||||||||
---|---|---|---|---|---|---|---|---|---|
|
January 25, 2003 Memory layout in D | ||||
---|---|---|---|---|
| ||||
Hi. By the way, the design of D is very nice! Definately real improvements over Java and C#. It's a pleasure to read about a language designed by an experienced programmer. Here's some thoughts about the way compilers lay out memory. If you're getting the idea from my posts so far that I'm a speed freak, you're right. If an extra 20% of speed doesn't excite you, think about skipping this letter. ... Compilers that are meant to generate fast code (like C) generally give the user control of exact memory layout. D goes further than C and C++ in providing allignment constructs. Compilers or interpreters meant to be easy to use and generate portable code (like Java) usually hide the memory layout from you. While this sounds like the way it should be, I've found that if you give the compiler the flexibility to lay out memory however it wants (with optional help from the user), you generally get faster code using less memory, which is also portable and still easy to use. For example, one optimization I've used the past is automatically reordering fields. Take this D code for example: struct Foo { byte a; int b; byte c, d, e, f; } Foo c[10000000]; Did you cringe? If not, definately go the next news topic. In the array 'c', I'd expect the field 'b' of 'Foo' to cross a word boundary, and require multiple machine code instructions to load or write. Further, the structure is 9 bytes long, so I'd expect the compiler to pad it to 12 or 16. Yuk. Ok, ok... any real programmer would fix the layout by hand. For the less experienced programmers, however, it's nice for the compiler to automatically generate: struct Foo { int b; byte a, c, d, e, f; byte padding[3]; } This is kind of minor stuff. However, imagine you wanted to push the limits of memory and speed. Suppose that the critical code in an application is: total = 0; for(i = 0; i < 10000000; i++) { total ^= c[i].d; } Please forgive the use of the large constant. It's just to illistrate that there are a LOT of Foo objects in memory stored mostly contiguously (as many good memory managers would do). The critical code here accesses only the 'd' field. Using a sub-set of fields is really common in inner loops. But, all these other fields are wasting space in the data cashe. If instead, the compiler generated: struct Foo { int b; byte a, c, e, f; } Foo c[10000000]; byte c_d[10000000]; ... total = 0; for(i = 0; i < 10000000; i++) { total ^= c_d[i]; } Then, the cache would fill up with the 'd' fields, and leave out the other stuff. I've measured this effect in the past, and it can easily be a 2x speed improvement of critical code when there are a lot of objects to be processed (many megabytes worth). Also, the wasted memory (padding) in the structure Foo is gone. We can simultaneously improve speed and memory. The compiler now has to keep track of multiple objects to represent the original structure, but it can do that if memory layout is abstract. The code generators we use where I work do some of this kind of optimization. Really. We'd never have the patience to pull out individual fields and manage their memory by hand. With our tools, we get portable, fast, and memory efficent code. The key is simply removing the notion memory layout from the language. The language gets simplified, but the compiler gets more complicated. The code gets faster and leaner, and inexperienced programmers have one less thing to worry about. Bill Cox |
January 25, 2003 Re: Memory layout in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Bill Cox | Hi.
Usually in these cases i'd overload a storage class and stored data class to store data in a "broken" manner. You're right, it'd better be done at compilation. But have you thought how complex it is to implement?
However, structs are data arrangement mechanisms. And as such (by philosophy) cannot be changed. This could be done with classes, for which storage is not reliably defined anyway. The requierement is then, that a class may not necessarily have a run-time representation. One can requiere, that only classes implementing interfaces have a run-time representation, and only to the extent necessary. (?)
BTW, in classes data should be sorted largest-first and aligned already.
-i.
Bill Cox wrote:
> Hi.
>
> By the way, the design of D is very nice! Definately real improvements over Java and C#. It's a pleasure to read about a language designed by an experienced programmer.
>
> Here's some thoughts about the way compilers lay out memory. If you're getting the idea from my posts so far that I'm a speed freak, you're right. If an extra 20% of speed doesn't excite you, think about skipping this letter.
>
> ...
>
> Compilers that are meant to generate fast code (like C) generally give the user control of exact memory layout. D goes further than C and C++ in providing allignment constructs.
>
> Compilers or interpreters meant to be easy to use and generate portable code (like Java) usually hide the memory layout from you.
>
> While this sounds like the way it should be, I've found that if you give the compiler the flexibility to lay out memory however it wants (with optional help from the user), you generally get faster code using less memory, which is also portable and still easy to use.
>
> For example, one optimization I've used the past is automatically reordering fields. Take this D code for example:
>
> struct Foo {
> byte a;
> int b;
> byte c, d, e, f;
> }
>
> Foo c[10000000];
>
> Did you cringe? If not, definately go the next news topic. In the array 'c', I'd expect the field 'b' of 'Foo' to cross a word boundary, and require multiple machine code instructions to load or write. Further, the structure is 9 bytes long, so I'd expect the compiler to pad it to 12 or 16. Yuk. Ok, ok... any real programmer would fix the layout by hand. For the less experienced programmers, however, it's nice for the compiler to automatically generate:
>
> struct Foo {
> int b;
> byte a, c, d, e, f;
> byte padding[3];
> }
>
> This is kind of minor stuff. However, imagine you wanted to push the limits of memory and speed. Suppose that the critical code in an application is:
>
> total = 0;
> for(i = 0; i < 10000000; i++) {
> total ^= c[i].d;
> }
>
> Please forgive the use of the large constant. It's just to illistrate that there are a LOT of Foo objects in memory stored mostly contiguously (as many good memory managers would do).
>
> The critical code here accesses only the 'd' field. Using a sub-set of fields is really common in inner loops. But, all these other fields are wasting space in the data cashe. If instead, the compiler generated:
>
> struct Foo {
> int b;
> byte a, c, e, f;
> }
>
> Foo c[10000000];
> byte c_d[10000000];
>
> ...
> total = 0;
> for(i = 0; i < 10000000; i++) {
> total ^= c_d[i];
> }
>
> Then, the cache would fill up with the 'd' fields, and leave out the other stuff. I've measured this effect in the past, and it can easily be a 2x speed improvement of critical code when there are a lot of objects to be processed (many megabytes worth).
>
> Also, the wasted memory (padding) in the structure Foo is gone. We can simultaneously improve speed and memory. The compiler now has to keep track of multiple objects to represent the original structure, but it can do that if memory layout is abstract.
>
> The code generators we use where I work do some of this kind of optimization. Really. We'd never have the patience to pull out individual fields and manage their memory by hand. With our tools, we get portable, fast, and memory efficent code.
>
> The key is simply removing the notion memory layout from the language. The language gets simplified, but the compiler gets more complicated. The code gets faster and leaner, and inexperienced programmers have one less thing to worry about.
>
> Bill Cox
>
|
January 26, 2003 Re: Memory layout in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Bill Cox | Some great ideas there. The idea in D is that with structs, the programmer controls the layout. With classes, the compiler controls it and lays it out for maximum performance, doing things like reordering as you suggest. "Bill Cox" <bill@viasic.com> wrote in message news:3E327627.6000109@viasic.com... > Hi. > > By the way, the design of D is very nice! Definately real improvements over Java and C#. It's a pleasure to read about a language designed by an experienced programmer. > > Here's some thoughts about the way compilers lay out memory. If you're getting the idea from my posts so far that I'm a speed freak, you're right. If an extra 20% of speed doesn't excite you, think about skipping this letter. > > ... > > Compilers that are meant to generate fast code (like C) generally give the user control of exact memory layout. D goes further than C and C++ in providing allignment constructs. > > Compilers or interpreters meant to be easy to use and generate portable code (like Java) usually hide the memory layout from you. > > While this sounds like the way it should be, I've found that if you give the compiler the flexibility to lay out memory however it wants (with optional help from the user), you generally get faster code using less memory, which is also portable and still easy to use. > > For example, one optimization I've used the past is automatically reordering fields. Take this D code for example: > > struct Foo { > byte a; > int b; > byte c, d, e, f; > } > > Foo c[10000000]; > > Did you cringe? If not, definately go the next news topic. In the array 'c', I'd expect the field 'b' of 'Foo' to cross a word boundary, and require multiple machine code instructions to load or write. Further, the structure is 9 bytes long, so I'd expect the compiler to pad it to 12 or 16. Yuk. Ok, ok... any real programmer would fix the layout by hand. For the less experienced programmers, however, it's nice for the compiler to automatically generate: > > struct Foo { > int b; > byte a, c, d, e, f; > byte padding[3]; > } > > This is kind of minor stuff. However, imagine you wanted to push the limits of memory and speed. Suppose that the critical code in an application is: > > total = 0; > for(i = 0; i < 10000000; i++) { > total ^= c[i].d; > } > > Please forgive the use of the large constant. It's just to illistrate that there are a LOT of Foo objects in memory stored mostly contiguously (as many good memory managers would do). > > The critical code here accesses only the 'd' field. Using a sub-set of fields is really common in inner loops. But, all these other fields are wasting space in the data cashe. If instead, the compiler generated: > > struct Foo { > int b; > byte a, c, e, f; > } > > Foo c[10000000]; > byte c_d[10000000]; > > ... > total = 0; > for(i = 0; i < 10000000; i++) { > total ^= c_d[i]; > } > > Then, the cache would fill up with the 'd' fields, and leave out the other stuff. I've measured this effect in the past, and it can easily be a 2x speed improvement of critical code when there are a lot of objects to be processed (many megabytes worth). > > Also, the wasted memory (padding) in the structure Foo is gone. We can simultaneously improve speed and memory. The compiler now has to keep track of multiple objects to represent the original structure, but it can do that if memory layout is abstract. > > The code generators we use where I work do some of this kind of optimization. Really. We'd never have the patience to pull out individual fields and manage their memory by hand. With our tools, we get portable, fast, and memory efficent code. > > The key is simply removing the notion memory layout from the language. The language gets simplified, but the compiler gets more complicated. The code gets faster and leaner, and inexperienced programmers have one less thing to worry about. > > Bill Cox > |
January 26, 2003 Re: Memory layout in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | Sounds good!
-- Bill Cox
Walter wrote:
> Some great ideas there. The idea in D is that with structs, the programmer
> controls the layout. With classes, the compiler controls it and lays it out
> for maximum performance, doing things like reordering as you suggest.
>
> "Bill Cox" <bill@viasic.com> wrote in message
> news:3E327627.6000109@viasic.com...
>
>>Hi.
>>
>>By the way, the design of D is very nice! Definately real improvements
>>over Java and C#. It's a pleasure to read about a language designed by
>>an experienced programmer.
>>
>>Here's some thoughts about the way compilers lay out memory. If you're
>>getting the idea from my posts so far that I'm a speed freak, you're
>>right. If an extra 20% of speed doesn't excite you, think about
>>skipping this letter.
>>
>>...
>>
>>Compilers that are meant to generate fast code (like C) generally give
>>the user control of exact memory layout. D goes further than C and C++
>>in providing allignment constructs.
>>
>>Compilers or interpreters meant to be easy to use and generate portable
>>code (like Java) usually hide the memory layout from you.
>>
>>While this sounds like the way it should be, I've found that if you give
>>the compiler the flexibility to lay out memory however it wants (with
>>optional help from the user), you generally get faster code using less
>>memory, which is also portable and still easy to use.
>>
>>For example, one optimization I've used the past is automatically
>>reordering fields. Take this D code for example:
>>
>>struct Foo {
>> byte a;
>> int b;
>> byte c, d, e, f;
>>}
>>
>>Foo c[10000000];
>>
>>Did you cringe? If not, definately go the next news topic. In the
>>array 'c', I'd expect the field 'b' of 'Foo' to cross a word boundary,
>>and require multiple machine code instructions to load or write.
>>Further, the structure is 9 bytes long, so I'd expect the compiler to
>>pad it to 12 or 16. Yuk. Ok, ok... any real programmer would fix the
>>layout by hand. For the less experienced programmers, however, it's
>>nice for the compiler to automatically generate:
>>
>>struct Foo {
>> int b;
>> byte a, c, d, e, f;
>> byte padding[3];
>>}
>>
>>This is kind of minor stuff. However, imagine you wanted to push the
>>limits of memory and speed. Suppose that the critical code in an
>>application is:
>>
>>total = 0;
>>for(i = 0; i < 10000000; i++) {
>> total ^= c[i].d;
>>}
>>
>>Please forgive the use of the large constant. It's just to illistrate
>>that there are a LOT of Foo objects in memory stored mostly contiguously
>>(as many good memory managers would do).
>>
>>The critical code here accesses only the 'd' field. Using a sub-set of
>>fields is really common in inner loops. But, all these other fields are
>>wasting space in the data cashe. If instead, the compiler generated:
>>
>>struct Foo {
>> int b;
>> byte a, c, e, f;
>>}
>>
>>Foo c[10000000];
>>byte c_d[10000000];
>>
>>...
>>total = 0;
>>for(i = 0; i < 10000000; i++) {
>> total ^= c_d[i];
>>}
>>
>>Then, the cache would fill up with the 'd' fields, and leave out the
>>other stuff. I've measured this effect in the past, and it can easily
>>be a 2x speed improvement of critical code when there are a lot of
>>objects to be processed (many megabytes worth).
>>
>>Also, the wasted memory (padding) in the structure Foo is gone. We can
>>simultaneously improve speed and memory. The compiler now has to keep
>>track of multiple objects to represent the original structure, but it
>>can do that if memory layout is abstract.
>>
>>The code generators we use where I work do some of this kind of
>>optimization. Really. We'd never have the patience to pull out
>>individual fields and manage their memory by hand. With our tools, we
>>get portable, fast, and memory efficent code.
>>
>>The key is simply removing the notion memory layout from the language.
>>The language gets simplified, but the compiler gets more complicated.
>>The code gets faster and leaner, and inexperienced programmers have one
>>less thing to worry about.
>>
>>Bill Cox
>>
>
>
>
|
January 27, 2003 Re: Memory layout in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | "Walter" <walter@digitalmars.com> wrote in message news:b0vmev$472$1@digitaldaemon.com... > Some great ideas there. The idea in D is that with structs, the programmer controls the layout. With classes, the compiler controls it and lays it out > for maximum performance, doing things like reordering as you suggest. > Following this line of thought, then - structs cannot have virtual methods - structs can use inheritance but cannot implement an interface - structs can have bitfields (and then perhaps classes should _not_ be allowed to have them; bitfields often imply low level control) Would it make sense to rename them to 'aggregate' or something, to distinguish them from the conventional C/C++ semantics? |
January 29, 2003 Re: Memory layout in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jeroen van Bemmel | "Jeroen van Bemmel" <anonymous@somewhere.com> wrote in message news:b12m0r$1lgg$1@digitaldaemon.com... > > "Walter" <walter@digitalmars.com> wrote in message news:b0vmev$472$1@digitaldaemon.com... > > Some great ideas there. The idea in D is that with structs, the programmer > > controls the layout. With classes, the compiler controls it and lays it > out > > for maximum performance, doing things like reordering as you suggest. > > > Following this line of thought, then > - structs cannot have virtual methods > - structs can use inheritance but cannot implement an interface > - structs can have bitfields (and then perhaps classes should _not_ be > allowed to have them; bitfields often imply low level control) > > Would it make sense to rename them to 'aggregate' or something, to distinguish them from the conventional C/C++ semantics? They do have conventional C semantics, in fact, they are deliberately set up to match the layout of what the host C/C++ compiler would do. |
January 30, 2003 Re: Memory layout in D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ilya Minkov | "Ilya Minkov" <midiclub@8ung.at> wrote in message news:b0u5s7$2dro$1@digitaldaemon.com... > > BTW, in classes data should be sorted largest-first It's not always good - some processors prefer opposite order. For example, Hitachi SuperH processors (quite popular architecture in the embedded systems, also used in WinCE machines) have memory load / store instructions with 4bit const. offset, scaled by the data size. Which means it can access with the single instruction quite limited memory range: 8bit : R_base + [0..15] 16bit : R_base + [0..30] 32bit : R_base + [0..60] In case if class members are sorted smallest-first, it's more likely to get any class member without additional code to calculate its address. |
Copyright © 1999-2021 by the D Language Foundation