classes, structs and allocation (page 3) - D Programming Language Discussion Forum


February 01, 2004

Re: classes, structs and allocation

Posted by Sean L. Palmer
in reply to Blas Rodriguez Somoza

My advice is the same as Matthew's: post the code that is doing the benchmarking, or at least be very detailed about the specific techniques used to do the measurements.

Until that happens, we can't comment on the "results" that were posted, as they are meaningless statistics by themselves.

Sean


Blas Rodriguez Somoza wrote:
| Sean L. Palmer wrote:
|
|| Good advice, Matthew.
||
|| Benchmarking is a very complicated endeavor, fraught with pitfalls
|| to trap the unwary into thinking they have some results, when
|| actually they've been measuring something completely different such
|| as how the order of running the tests affects the memory cache.
|| Profiling (finding the slow parts of an app) has many similar, but
|| different, issues.
||
|| The bottom line is, if you don't know how to profile or benchmark,
|| don't advertise that you do;  it's just misleading to yourself and
|| others.
||
|| Sean
||
|
| So far it seems nobody except Jon has given any advice about the
| results or argued, with reasons, that the results are wrong.
|
| Perhaps I have some strange habits, but when I post a question and
| data I expect some advice and not a flame war.
|
| At least in other projects on the net this is the usual way to work
| when someone asks a question and gives code and numbers about it.
|
| I know how to profile/benchmark and I have done it on several OSes and
| languages in my previous 22 years working in this business. (For
| instance, after my work on the Firebird JDBC driver it performs some
| orders of magnitude better.)
|
|
|| "Matthew" <matthew.hat@stlsoft.dot.org> wrote in message
|| news:bvg4kv$114g$1@digitaldaemon.com...
||
||| I'll have more to say when I've had a chance to dig into the code,
||| but the first thing I can say is that you need to make a choice as
||| to whether to measure (i) the cost of an app such as the one you
||| have, including startup and shutdown costs, and represent the
||| results as such, rather than as a comparison of memory allocation
||| times, or (ii) you need to ensure that you do "warm-up" loops so
||| that you're measuring the memory infrastructures as they are likely
||| to behave in a real system someway through its lifetime, rather
||| than just the performance of a newly initiated application.
|||
||| Other things:
|||
||| You appear to measure the elapsed time for a single execution with
||| the program. Do you then do several runs and take averages? Do you
||| discard a proportion of the lowest and highest times?
|||
||| If you're interested in measuring the total app time, then consider
||| using the ptime utility - from
||| http://synesis.com.au/r_systools.html - to control
||| execution, elide extreme cases and calculate averages.
|||
||| If you're interested purely in the memory time, then you should
||| include warmups in the app, and calculate averages of the resultant
||| normal-time loops.
|||
||| If you don't take these measures, then how do we know whether the
||| memory times you report reflect the fundamentals of the languages'
||| memory allocation, or the OS virtual memory? They may even be
||| reflective of the order in which you ran your test programs.
|||
||| Of course, you may have taken some/all of these measures, and just
||| not mentioned it in your post. If that's the case, can you let us
||| know how it was done?
|||
||| Cheers
|||
||| Matthew
|||
|||
||| "Blas Rodriguez Somoza" <blas@puertareal.com> wrote in message
||| news:bvcl8a$19hv$1@digitaldaemon.com...
|||
|||| Matthew wrote:
||||
||||| Please post your code
|||||
||||| "Blas Rodriguez Somoza" <blas@puertareal.com> wrote in message
||||| news:bvc49s$dej$1@digitaldaemon.com...
|||||
|||||
|||||| Hello
||||||
||||||   Nice language, I think it will have a bright future if its
|||||| performance is near C++.
||||||
||||||   I'm new to D and I'm trying to benchmark a small application (a
|||||| GLR parser) with C++, Java and D.
||||||
||||||   First of all, a question: is it possible to use bulk class
|||||| allocation? If I'm not wrong, MyClass[] mc = new MyClass[num]
|||||| allocates (in C terms) an array of pointers to classes instead of
|||||| an array of classes.
||||||
||||||   Since the application spends half of its time allocating
|||||| objects (in the Java version), I created a test to compare object
|||||| allocation performance between languages.
||||||
|||||| The test allocates 13 arrays with 64000 elements for each one of
|||||| four struct types; the allocated memory in Java is around 124
|||||| MB. The times taken to run the test are
||||||
||||||   C++ 109 ms (including initialization)
||||||   D   400 ms
||||||   Java 1050 ms (objects allocated one by one since there is no
||||||                   bulk allocation)
||||||
||||||   Is this a good result? I expected D times to be slightly over
|||||| the C++ ones, but 4x seems too much.
||||||
||||||   The code is available for anyone who wants to review it, but
|||||| it is a very simple one, only struct definitions and array
|||||| allocation.
||||||
|||||| Regards
|||||| Blas Rodriguez Somoza
|||||
|||||
|||||
|||| I made some more tests and found some interesting results.
||||
|||| Instead of allocating an 800000-element array, I tried allocating
|||| A.- in one array
|||| B.- in 8 arrays (100000)
|||| C.- in 13 arrays (12*64000 + 32000)
||||
|||| The resulting times (A/B/C) are:
||||
|||| C++  - 171 / 171 / 171  ms
|||| D    - 344 / 593 / 484  ms (arrays of pointers as **)
||||      - 547 / 1078/ 1000 ms (arrays of pointers as *[])
|||| Java - 1703 ms (only tested in one array)
||||
|||| It seems C++ gives constant performance for any array size,
|||| whereas D performs worse with more arrays and apparently with
|||| non-power-of-2 sized arrays.
||||
|||| Hope it helps.
||||
|||| Regards
|||| Blas Rodriguez Somoza
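
For readers who want to try the measurement discipline Matthew describes (untimed warm-up loops, several timed runs, extremes discarded before averaging), here is a minimal sketch in D. It is not the code from the thread: the `Node` type, the helper names `measureMs` and `trimmedMean`, and the run counts are all assumptions chosen for illustration, and it uses the current `std.datetime.stopwatch` module rather than anything available in 2004.

```d
import std.algorithm.sorting : sort;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical payload type; the thread's test used four struct types.
struct Node { int a, b; Node* next; }

/// Calls `work` `warmups` times untimed, then times it `runs` times.
/// Returns the per-run durations in milliseconds, sorted ascending.
double[] measureMs(void delegate() work, size_t warmups, size_t runs)
{
    foreach (_; 0 .. warmups)
        work();                      // untimed: lets the heap and caches settle

    auto samples = new double[runs];
    foreach (ref s; samples)
    {
        auto sw = StopWatch(AutoStart.yes);
        work();
        sw.stop();
        s = sw.peek().total!"usecs"() / 1000.0;
    }
    sort(samples);
    return samples;
}

/// Mean of the samples left after dropping `trim` values from each end.
double trimmedMean(const double[] sorted, size_t trim)
{
    assert(sorted.length > 2 * trim, "not enough samples to trim");
    double sum = 0;
    foreach (s; sorted[trim .. $ - trim])
        sum += s;
    return sum / (sorted.length - 2 * trim);
}

void main()
{
    Node[] sink;   // keeps each result reachable so the allocation is not elided
    auto samples = measureMs(() { sink = new Node[64_000]; }, 3, 20);
    writefln("min %.3f ms, max %.3f ms, trimmed mean %.3f ms",
             samples[0], samples[$ - 1], trimmedMean(samples, 2));
}
```

Reporting the minimum and maximum next to the trimmed mean keeps the discarded extremes visible, which matters if a garbage collection lands inside some runs and not others.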

February 01, 2004

Re: classes, structs and allocation

Posted by Sean L. Palmer
in reply to Burton Radons

Why, how are we to know he didn't just write really bad test code, or do something badly wrong in the profiling?  Are we supposed to all go out and write our own profile code just to make sure his assertion is valid?  We don't all have such copious amounts of spare time.

Sean

Burton Radons wrote:
| Blas Rodriguez Somoza wrote:
|
|| So far it seems nobody except Jon has given any advice about the
|| results or argued, with reasons, that the results are wrong.
|
| I agree, you're being mistreated.  Your tests show a significant
| difference.  It's like whining for a recount when the vote shows 80%
| to 20%.

February 02, 2004

Re: classes, structs and allocation

Posted by Manfred Nowak
in reply to Matthew

Matthew wrote:

[...]
> If you're interested in measuring the total app time, then consider using the ptime utility - from http://synesis.com.au/r_systools.html
[...]

Tried it on WIN98SE:
<protocol>
$ time awka_out

real    0m4.120s
user    0m0.000s
sys     0m0.000s

$ ptime awka_out
time: elapsed: 1841381146707466ms, kernel: 0ms, user: 0ms
</protocol>

Obviously wrong.

So long.

February 02, 2004

Re: classes, structs and allocation

Posted by Matthew
in reply to Manfred Nowak

he he. That's a feature cleverly disguised as a bug.

Win98 does not support the GetThreadTimes() and GetProcessTimes() functions.

Clearly I need to make the program handle this gracefully.

Thanks for the heads-up.

Cheers

Matthew



February 02, 2004

Re: classes, structs and allocation

Posted by Blas Rodriguez Somoza
in reply to Sean L. Palmer


Sean L. Palmer wrote:
> My advice is the same as Matthew's: post the code that is doing the
> benchmarking, or at least be very detailed about the specific techniques
> used to do the measurements.
> 
> Until that happens, we can't comment on the "results" that were posted, as
> they are meaningless statistics by themselves.
> 
> Sean
> 

The code is included in my second message in this thread.


February 02, 2004

Re: classes, structs and allocation

Posted by Blas Rodriguez Somoza
in reply to Burton Radons

Hello Burton

Burton Radons wrote:

> Blas Rodriguez Somoza wrote:
> 
> [snip]
> 
>> So far it seems nobody except Jon has given any advice about the results or argued, with reasons, that the results are wrong.
>>
>> Perhaps I have some strange habits, but when I post a question and data I expect some advice and not a flame war.
> 
> 
> I agree, you're being mistreated.  Your tests show a significant difference.  It's like whining for a recount when the vote shows 80% to 20%.
> 

Thanks

> I would expect that it is mostly in the garbage collector.  When making an allocation, if it runs out of memory it has allocated it will run a full garbage collection (the relevant code is in "/dmd/src/phobos/internal/gc/gcx.d").  I believe you said before that you were getting exponential increases in time as you added more allocations, and that would account for that.  I don't know much about the garbage collector so I can't speculate any further.
> 
> This could also partially be due to the contracts in Phobos, which often have heavy redundancy, such as how array.dup uses a memcmp to ensure that the result is correct.  I don't know if there are burdensome contracts involved with allocation.  One way to be sure this is not a factor is to recompile Phobos with the "-release" compiler switch provided, as well as your own code.
> 

I ran the test again taking those ideas into account.

I disabled the GC, but that doesn't seem to change the results.

Rebuilding Phobos seems to help a bit, but the change is below 10%.

Due to the limited timer resolution I can't measure the change exactly.


> The Java performance is actually really impressive.  I doubt DMD would be able to compare with it if it performed the same, rather than similar, operations.  In fact, I suspect Java might even out-perform your C++ compiler in a same-operation comparison.
> 
> [snip]
> 

I did the test: C++ allocating objects one by one, with initialization, takes approximately 840 ms, which is less than Java (1703 ms). But when Java is compiled to native code (with Excelsior JET) the time goes down to 480 ms.


Regards
Blas Rodriguez Somoza
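
A rough way to repeat the "disable the GC and compare" experiment in present-day D is sketched below. It relies on `core.memory.GC`, which belongs to the current runtime and is not the 2004 interface, and the `Item` class, the `timeAllocMs` helper and the object count are made up for illustration. Note that `GC.disable()` only suppresses automatic collections (the runtime may still collect when memory runs out); allocation itself still goes through the GC allocator, which could be one reason disabling it changed little.

```d
import core.memory : GC;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical element type standing in for the thread's test classes.
class Item { int a, b, c, d; }

/// Allocates `count` objects one by one and returns the elapsed milliseconds.
double timeAllocMs(size_t count)
{
    auto items = new Item[count];        // array of null references
    auto sw = StopWatch(AutoStart.yes);
    foreach (ref it; items)
        it = new Item;                   // one GC allocation per element
    sw.stop();
    return sw.peek().total!"usecs"() / 1000.0;
}

void main()
{
    enum count = 200_000;

    GC.collect();                        // start both runs from a collected heap
    immutable withGc = timeAllocMs(count);

    GC.collect();
    GC.disable();                        // suppress automatic collections
    scope (exit) GC.enable();
    immutable withoutGc = timeAllocMs(count);

    writefln("GC enabled: %.1f ms, GC disabled: %.1f ms", withGc, withoutGc);
}
```

Building with something like `dmd -O -release -inline` removes contract and assertion checks from the user code; a single pair of runs like this is still only a rough indication, for the reasons discussed earlier in the thread.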

February 06, 2004

Re: classes, structs and allocation

Posted by Manfred Nowak
in reply to Matthew

"Matthew" wrote
[...]
> Do you then do several runs and take averages? Do you discard a proportion of the lowest and highest times?

What mathematical model allows for deleting some of the real observations in taking execution times?

I know that in the social sciences they do exclude observations when there is a strong belief that the observation was introduced for the purpose of undermining the investigation.

But in execution timing there can be no such belief.

Moreover, in D there is a garbage collector. And this piece of code might decide to spring into action just before the "productive" code of the program wants to end, thereby introducing a prolonged execution time.

Excluding this observation is then "beautifying" the real outcome of the timing result.

If the usage of memory is at a critical point, then it might be that in 50% of the timings the GC runs and in the other 50% it does not, thereby giving you a polarized result. What is the meaning of an average then?

Because the timings should follow a Gaussian distribution, shouldn't one take enough observations to pass the normality test and only then conclude that there is an average?

Currently I do not know whether there is a test for the assumption that a distribution is built up of two or even more independent Gaussian distributions.

If the normality test fails, there still exists the possibility of establishing a span for the average by sorting and then deleting the upper and lower sixth of the observations. The minimum and maximum of the remaining set of observations are then lower and upper bounds for the average.

So long.
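
The last procedure (sort, delete the upper and lower sixth, take the minimum and maximum of what remains) can be sketched in D as follows. The function name `trimmedBounds` and the sample timings are invented for illustration; the sample is deliberately bimodal, as in the "GC runs in half of the timings" scenario.

```d
import std.algorithm.sorting : sort;
import std.stdio : writefln;
import std.typecons : Tuple;

/// Sorts a copy of the timings, drops the lowest and highest sixth, and
/// returns the smallest and largest remaining value.
Tuple!(double, "lower", double, "upper") trimmedBounds(const double[] timings)
{
    assert(timings.length > 0, "need at least one observation");
    auto sorted = timings.dup;
    sort(sorted);
    immutable cut = sorted.length / 6;   // observations dropped at each end
    auto kept = sorted[cut .. $ - cut];
    return typeof(return)(kept[0], kept[$ - 1]);
}

void main()
{
    // Made-up sample in ms: about half the runs include a collection.
    double[] ms = [100.0, 101.0, 99.0, 180.0, 102.0, 181.0,
                   179.0, 100.0, 182.0, 98.0, 180.0, 101.0];
    auto b = trimmedBounds(ms);
    writefln("average lies between %.0f and %.0f ms", b.lower, b.upper);
    // prints "average lies between 100 and 180 ms"
}
```

For this sample the arithmetic mean is about 133.6 ms, which falls inside the span even though no single run took anywhere near that long; that is the polarized-result problem in miniature.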

June 12, 2004

Re: classes, structs and allocation

Posted by Walter
in reply to Blas Rodriguez Somoza

What you're seeing is that the current D garbage collector's allocator could use some performance tuning. That will happen over time; right now the emphasis has been on getting it correct and reliable. There's no inherent reason why it should be any slower than C++. Note that the Java vendors have likely been tuning the Java GC for 10 years now.

Secondly, C's malloc() is available to all D programs. You can always attain exactly the allocation performance of C by using std.c.stdlib.malloc().
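
As a sketch of what that looks like in practice: in current D the C library bindings live in `core.stdc.stdlib` (the `std.c.stdlib` module named above is the 2004 spelling). The `Point` struct and the helpers `mallocPoints` and `freePoints` below are hypothetical names, and the sketch assumes a non-zero element count.

```d
import core.stdc.stdlib : free, malloc;
import std.stdio : writeln;

// Hypothetical plain struct with no GC pointers inside it.
struct Point { int x, y; }

/// Allocates `n` (assumed > 0) Points in one malloc call, outside the GC heap.
Point[] mallocPoints(size_t n)
{
    auto raw = cast(Point*) malloc(n * Point.sizeof);
    if (raw is null)
        assert(0, "malloc failed");
    Point[] points = raw[0 .. n];    // slice view over the malloc'd block
    points[] = Point.init;           // malloc does not initialise memory
    return points;
}

/// Releases a block obtained from mallocPoints.
void freePoints(Point[] points)
{
    free(points.ptr);
}

void main()
{
    auto points = mallocPoints(64_000);
    scope (exit) freePoints(points);

    points[0] = Point(1, 2);
    writeln(points[0], " of ", points.length);
}
```

The trade-off is manual lifetime management: the block must be freed exactly once, and if the structs ever hold references to GC-allocated objects, the collector has to be told about the block (for example with `GC.addRange` from `core.memory`).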

"Blas Rodriguez Somoza" <blas@puertareal.com> wrote in message news:bvc49s$dej$1@digitaldaemon.com...
> Hello
>
>     Nice language, I think it will have a bright future if its
> performance is near C++.
>
>     I'm new to D and I'm trying to benchmark a small application (a GLR
> parser) with C++, Java and D.
>
>     First of all, a question: is it possible to use bulk class
> allocation? If I'm not wrong, MyClass[] mc = new MyClass[num] allocates
> (in C terms) an array of pointers to classes instead of an array of classes.
>
>     Since the application spends half of its time allocating objects
> (in the Java version), I created a test to compare object allocation
> performance between languages.
>
> The test allocates 13 arrays with 64000 elements for each one of four struct types; the allocated memory in Java is around 124 MB. The times taken to run the test are
>
>     C++ 109 ms (including initialization)
>     D   400 ms
>     Java 1050 ms (objects allocated one by one since there is no bulk
>                     allocation)
>
>     Is this a good result? I expected D times to be slightly over the C++
> ones, but 4x seems too much.
>
>     The code is available for anyone who wants to review it, but it is a
> very simple one, only struct definitions and array allocation.
>
> Regards
> Blas Rodriguez Somoza
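
On the bulk-allocation question in the quoted post, the reading given there matches how D behaves: `new MyClass[num]` yields an array of null class references, while the same expression with a struct type yields one contiguous block of values. A small sketch, with `Obj` and `Rec` as made-up types:

```d
import std.stdio : writeln;

class Obj  { int value; }     // reference type
struct Rec { int value; }     // value type

void main()
{
    enum n = 4;

    // Array of n null references; no Obj instances exist yet.
    Obj[] objs = new Obj[n];
    assert(objs[0] is null);
    foreach (ref o; objs)
        o = new Obj;          // n further allocations, one per object
    assert(objs[0] !is null);

    // One contiguous block holding n default-initialised Rec values.
    Rec[] recs = new Rec[n];
    assert(recs[0].value == 0);
    recs[0].value = 42;       // usable immediately, no per-element new

    // Adjacent elements sit exactly Rec.sizeof bytes apart.
    assert(cast(size_t) &recs[1] - cast(size_t) &recs[0] == Rec.sizeof);

    writeln("class array: ", n, " references plus ", n, " separate objects");
    writeln("struct array: one block of ", n * Rec.sizeof, " bytes");
}
```

So bulk allocation in the C++ sense is available for structs; for classes each instance still needs its own `new`, unless the instances are placed manually into preallocated memory.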

June 12, 2004

Re: classes, structs and allocation

Posted by The Dr ... who?
in reply to Walter

Exactly. Get it right, then get it fast. As long as there aren't theoretical flaws in the design of the GC architecture, it seems premature to be worrying about its performance at the moment.



Copyright © 1999-2021 by the D Language Foundation