December 08, 2006
Walter Bright wrote:
> zz wrote:
> 
>> From my point of view, these tests are not really necessary on my side, since I still continue using D and I believe that someday the memory stuff will be optimized.
> 
> 
> One thing that is happening is that the C++ code allocates memory but never frees it. The same goes for the D code.

Not until it's out of scope (in this case main).

> What happens with a garbage collector is that it gets new chunks of memory from the operating system as it needs them. But before it does, it runs a collection cycle.

Would something like Doug Lea's dlmalloc make a difference in D? Someone suggested trying it, so I tested it using DMC on some C code, and there was a big difference between the two compiled versions (the numbers are below; I was surprised).

> So in the D version, it will probably be running several collection cycles (accomplishing nothing), while the C++ version does not. This will make the GC version slower.
> 
> To get better numbers, one can add calls to gc.disable() and gc.enable() to tell the GC that there is no point in running a collection cycle for this section of code.

The above is something I won't do: in DMC I can get better numbers just by changing my allocator, so how can I go about that in D?

Zz

-----------------------------------------------
- Default memory allocator
dmc -o+all test.c
Capacity C: 1048576  Count: 1000000

ContextSwitches - 19954
First level fills = 0
Second level fills = 0

ETime(   0:00:06.562 ) UTime(   0:00:06.015 ) KTime(   0:00:00.375 )
ITime(   0:00:03.828 )


-----------------------------------------------
dmc -DREPLACE_SYSTEM_ALLOCATOR -o+all test.c
Capacity C: 1048576  Count: 1000000

ContextSwitches - 3092
First level fills = 0
Second level fills = 0

ETime(   0:00:01.250 ) UTime(   0:00:00.968 ) KTime(   0:00:00.250 )
ITime(   0:00:00.796 )

Note: I used dlmalloc version pre-2.8.4 (Wed Aug  2 14:13:56 2006), which comes with nedmalloc.
The current release is 2.8.3, and ptmalloc3 comes with an earlier version than the one used above.
December 08, 2006
zz wrote:
> Walter Bright wrote:
>> zz wrote:
>>
>>> From my point of view, these tests are not really necessary on my side, since I still continue using D and I believe that someday the memory stuff will be optimized.
>>
>>
>> One thing that is happening is that the C++ code allocates memory but never frees it. The same goes for the D code.
> 
> Not until it's out of scope (in this case main).
> 

With that in mind, I wrapped your original code in a loop for 10 iterations and decreased the inner loop to 100000.

C++ (NedAlloc):
Total Element count: 1000000
5.438000

D:
Total Element count: 1000000
1.360000

Because it's not doing all of its allocation in one tight loop, maybe this better 'emulates' a 'typical' service or client application? The point is, have you and your coworker from the OP actually tried D in a couple of real-world applications?

Quote: "While he liked that language and said that he might actually use it to prototype ideas, he will not use it in production code due to the performance."

That sounds like me two years ago (and D hasn't gotten all that much faster since then) <g>

Don't take this as some sort of attack -- you bring up some very good points. And I agree that the GC performance should be looked at (and that great performance is critical), but I'm wondering if D is getting a fair shake in your shop?

Even if you had to spend 10% of development time hoisting some allocations out of loops in D, if you save 20% developing the first cut you'll end up better off ;)

>> What happens with a garbage collector is that it gets new chunks of memory from the operating system as it needs them. But before it does, it runs a collection cycle.
> 
> Would something like Doug Lea's dlmalloc make a difference in D? Someone suggested trying it, so I tested it using DMC on some C code, and there was a big difference between the two compiled versions (the numbers are below; I was surprised).
> 
>> So in the D version, it will probably be running several collection cycles (accomplishing nothing), while the C++ version does not. This will make the GC version slower.
>>
>> To get better numbers, one can add calls to gc.disable() and gc.enable() to tell the GC that there is no point in running a collection cycle for this section of code.
> 
> The above is something I won't do: in DMC I can get better numbers just by changing my allocator, so how can I go about that in D?
> 
> Zz
> 
> -----------------------------------------------
> - Default memory allocator
> dmc -o+all test.c
> Capacity C: 1048576  Count: 1000000
> 
> ContextSwitches - 19954
> First level fills = 0
> Second level fills = 0
> 
> ETime(   0:00:06.562 ) UTime(   0:00:06.015 ) KTime(   0:00:00.375 )
> ITime(   0:00:03.828 )
> 
> 
> -----------------------------------------------
> dmc -DREPLACE_SYSTEM_ALLOCATOR -o+all test.c
> Capacity C: 1048576  Count: 1000000
> 
> ContextSwitches - 3092
> First level fills = 0
> Second level fills = 0
> 
> ETime(   0:00:01.250 ) UTime(   0:00:00.968 ) KTime(   0:00:00.250 )
> ITime(   0:00:00.796 )
> 
> Note: I used dlmalloc version pre-2.8.4 (Wed Aug  2 14:13:56 2006), which comes with nedmalloc.
> The current release is 2.8.3, and ptmalloc3 comes with an earlier version than the one used above.
December 08, 2006
zz wrote:
> Walter Bright wrote:
>> zz wrote:
>>
>>> From my point of view, these tests are not really necessary on my side, since I still continue using D and I believe that someday the memory stuff will be optimized.
>>
>>
>> One thing that is happening is that the C++ code allocates memory but never frees it. The same goes for the D code.
> 
> Not until it's out of scope (in this case main).

True, but it doesn't free any memory inside the loop.

>> What happens with a garbage collector is that it gets new chunks of memory from the operating system as it needs them. But before it does, it runs a collection cycle.
> 
> Would something like Doug Lea's dlmalloc make a difference in D? Someone suggested trying it, so I tested it using DMC on some C code, and there was a big difference between the two compiled versions (the numbers are below; I was surprised).

You can call any C function from D, so if you want to explicitly manage memory using dlmalloc, that is certainly possible. D allows overriding new/delete on a per-class/per-struct basis, but memory allocated that way won't be garbage collected.
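As a minimal sketch of what this looks like, the D code below manages a buffer outside the GC heap by calling the C allocator directly. It assumes today's druntime module `core.stdc.stdlib`, and C's `malloc`/`free` stand in for dlmalloc; to use dlmalloc itself you would declare its entry points as `extern(C)` and link it in (the exact function names depend on how dlmalloc is built). The helpers `mallocInts` and `freeInts` are hypothetical names.

```d
// Sketch: explicit, non-GC memory from D via the C allocator.
// Assumption: C malloc/free stand in for dlmalloc here.
import core.stdc.stdlib : malloc, free;
import core.exception : onOutOfMemoryError;

/// Hypothetical helper: allocates `n` ints outside the GC heap.
/// The GC neither scans nor collects this memory; the caller must
/// release it with freeInts.
int[] mallocInts(size_t n)
{
    auto p = cast(int*) malloc(n * int.sizeof);
    if (p is null && n != 0)
        onOutOfMemoryError();
    return p[0 .. n];
}

/// Hypothetical helper: returns the memory to the C allocator.
void freeInts(int[] a)
{
    free(a.ptr);
}

unittest
{
    auto buf = mallocInts(1_000);
    scope(exit) freeInts(buf);   // freed deterministically when the block exits

    foreach (i, ref x; buf)
        x = cast(int) i;
    assert(buf[999] == 999);
}
```

The check can be run with `dmd -unittest -main -run file.d`. Because the GC does not scan this buffer, it should not hold the only reference to any GC-allocated object.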

>> So in the D version, it will probably be running several collection cycles (accomplishing nothing), while the C++ version does not. This will make the GC version slower.
>>
>> To get better numbers, one can add calls to gc.disable() and gc.enable() to tell the GC that there is no point in running a collection cycle for this section of code.
> 
> The above is something I won't do: in DMC I can get better numbers just by changing my allocator, so how can I go about that in D?

I'm not sure why you wouldn't want to do it; it is much more localized in effect than swapping out the global operators new/delete.
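For illustration, here is how that localized change might look. This minimal sketch assumes today's druntime API (`core.memory.GC`); the Phobos of the time exposed the same idea as `std.gc.disable()` and `std.gc.enable()`. `scope(exit)` keeps the re-enable next to the disable, so it also runs if an exception escapes. `Node` and `buildList` are hypothetical stand-ins for the benchmark's allocation loop.

```d
import core.memory : GC;

/// Hypothetical node type standing in for the benchmark's data.
class Node
{
    int value;
    Node next;
    this(int v, Node n) { value = v; next = n; }
}

/// Builds a linked list of `count` nodes with automatic collections
/// suppressed, so the GC grows its heap instead of scanning first.
/// Note: druntime may still collect if it deems it necessary
/// (for example when it runs out of memory).
Node buildList(int count)
{
    GC.disable();
    scope(exit) GC.enable();   // re-enabled even if an exception escapes

    Node head = null;
    foreach (i; 0 .. count)
        head = new Node(i, head);
    return head;
}

unittest
{
    auto list = buildList(1_000_000);
    assert(list.value == 999_999);
}
```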

The GC in D is pluggable, but nobody has written a different one to plug in yet.
December 08, 2006
Walter Bright wrote:
> zz wrote:
>> From my point of view, these tests are not really necessary on my side, since I still continue using D and I believe that someday the memory stuff will be optimized.
> 
> One thing that is happening is that the C++ code allocates memory but never frees it. The same goes for the D code.
> 
> What happens with a garbage collector is that it gets new chunks of memory from the operating system as it needs them. But before it does, it runs a collection cycle.
> 
> So in the D version, it will probably be running several collection cycles (accomplishing nothing), while the C++ version does not. This will make the GC version slower.
> 
> To get better numbers, one can add calls to gc.disable() and gc.enable() to tell the GC that there is no point in running a collection cycle for this section of code.

Has anyone looked into analyzing D code to automatically insert frees/deletes in pertinent places, so that the GC is needed less often? The good thing about this sort of analysis is that no harm is done if it doesn't find anything (the GC still guarantees the memory is eventually collected), but it could yield benefits in places where deterministic deletion is possible.

Cheers,

Reiner
December 08, 2006
Dave wrote:
> zz wrote:
> 
>> Walter Bright wrote:
>>
>>> zz wrote:
>>>
>>>> From my point of view, these tests are not really necessary on my side, since I still continue using D and I believe that someday the memory stuff will be optimized.
>>>
>>>
>>>
>>> One thing that is happening is that the C++ code allocates memory but never frees it. The same goes for the D code.
>>
>>
>> Not until it's out of scope (in this case main).
>>
> 
> With that in mind, I wrapped your original code in a loop for 10 iterations and decreased the inner loop to 100000.
> 
> C++ (NedAlloc):
> Total Element count: 1000000
> 5.438000
> 
> D:
> Total Element count: 1000000
> 1.360000
> 
Can't argue with that.
In this case D is faster; I get:
C++ NedAlloc = 2.859
D = 1.421

But if you keep the original count and add the outer loop of 10 iterations, D's performance becomes really bad:

C++ (NedMalloc) = 00:26.796
D = 02:23.375

> Quote: "While he liked that language and said that he might actually use it to prototype ideas, he will not use it in production code due to the performance."
> 
> That sounds like me two years ago (and D hasn't gotten all that much faster since then) <g>

Still uses D to prototype ideas, but this was someone else who is new.

> Don't take this as some sort of attack -- you bring up some very good points. And I agree that the GC performance should be looked at (and that great performance is critical), but I'm wondering if D is getting a fair shake in your shop?

On my side, I've used it for small things that are running at client sites. We handle a lot of data that comes from mainframes and AS/400, and if you need to process the text reports from those platforms, a lot of string processing has to be done. In one case I recall, we needed to convert over 1 GB of raw data into XML, and my boss asked me to see how we could get it done and who should write it. In one sitting I did a prototype in D, and that was what was sent to the client (I don't think it could have been done faster). For this case and some others we never looked at performance, since:
a) the client was happy just to get something.
b) most, but not all, large processing jobs are run overnight.

While only two of us use D from time to time, there is a lot of respect for D at work, but sometimes people put you to the test with one-to-one comparisons, and from those they might decide whether to use D or not.

> Even if you had to spend 10% of development time hoisting some allocations out of loops in D, if you save 20% developing the first cut you'll end up better off ;)

Yes, but this was a one-to-one test, more along the lines of "my gun is bigger than yours".

Zz
December 08, 2006
zz wrote:
> Dave wrote:
>> zz wrote:
>>
>>> Walter Bright wrote:
>>>
>>>> zz wrote:
>>>>
>>>>> From my point of view, these tests are not really necessary on my side, since I still continue using D and I believe that someday the memory stuff will be optimized.
>>>>
>>>>
>>>>
>>>> One thing that is happening is that the C++ code allocates memory but never frees it. The same goes for the D code.
>>>
>>>
>>> Not until it's out of scope (in this case main).
>>>
>>
>> With that in mind, I wrapped your original code in a loop for 10 iterations and decreased the inner loop to 100000.
>>
>> C++ (NedAlloc):
>> Total Element count: 1000000
>> 5.438000
>>
>> D:
>> Total Element count: 1000000
>> 1.360000
>>
> Can't argue with that.
> In this case D is faster; I get:
> C++ NedAlloc = 2.859
> D = 1.421
> 
> But if you keep the original count and add the outer loop of 10 iterations, D's performance becomes really bad:
> 
> C++ (NedMalloc) = 00:26.796
> D = 02:23.375

Try explicitly calling _gc.fullCollect() between iterations of the inner loop, and disabling the GC explicitly there.  Manipulation of the GC for performance-critical areas is an important and intentional feature.
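One way to read this advice, sketched in D under the assumption of today's druntime API: `GC.collect()` plays the role of the `fullCollect()` call mentioned above (the Phobos of the time had `std.gc.fullCollect()`), automatic collections are suppressed while each pass allocates, and a single full collection runs between passes, once the previous pass's objects have become garbage. `Item` and `runPasses` are hypothetical names; the real benchmark code is not shown in this thread.

```d
import core.memory : GC;

/// Hypothetical element type standing in for the benchmark's data.
class Item
{
    int id;
    this(int i) { id = i; }
}

/// Hypothetical benchmark shape: `outer` passes, each allocating `inner`
/// objects. Returns the total number of objects allocated.
size_t runPasses(int outer, int inner)
{
    size_t total = 0;
    foreach (_; 0 .. outer)
    {
        Item[] items;
        {
            GC.disable();            // no automatic collections during this pass
            scope(exit) GC.enable(); // runs when this inner block ends
            foreach (i; 0 .. inner)
                items ~= new Item(i);
        }
        total += items.length;
        items = null;                // drop this pass's only reference
        GC.collect();                // one explicit full collection per pass
    }
    return total;
}

unittest
{
    assert(runPasses(10, 10_000) == 100_000);
}
```

Whether this beats leaving the collector alone depends on how much live data each collection must scan, so it is worth timing against the numbers above.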


Sean
December 08, 2006
Sean Kelly wrote:
> zz wrote:
>> Dave wrote:
>>> zz wrote:
>>>
>>>> Walter Bright wrote:
>>>>
>>>>> zz wrote:
>>>>>
>>>>>> From my point of view, these tests are not really necessary on my side, since I still continue using D and I believe that someday the memory stuff will be optimized.
>>>>>
>>>>>
>>>>>
>>>>> One thing that is happening is that the C++ code allocates memory but never frees it. The same goes for the D code.
>>>>
>>>>
>>>> Not until it's out of scope (in this case main).
>>>>
>>>
>>> With that in mind, I wrapped your original code in a loop for 10 iterations and decreased the inner loop to 100000.
>>>
>>> C++ (NedAlloc):
>>> Total Element count: 1000000
>>> 5.438000
>>>
>>> D:
>>> Total Element count: 1000000
>>> 1.360000
>>>
>> Can't argue with that.
>> In this case D is faster; I get:
>> C++ NedAlloc = 2.859
>> D = 1.421
>>
>> But if you keep the original count and add the outer loop of 10 iterations, D's performance becomes really bad:
>>
>> C++ (NedMalloc) = 00:26.796
>> D = 02:23.375
> 
> Try explicitly calling _gc.fullCollect() between iterations of the inner loop, and disabling the GC explicitly there.  Manipulation of the GC for performance-critical areas is an important and intentional feature.
> 

With that in mind... What if some of the GC API were built into the language? This could be one of those areas that would set D apart as a lower-level, performance-oriented language. If a GC implementation didn't support something, it would be stubbed for portability. Likewise, anything not covered explicitly by a built-in could be covered by an import.

Specifically:

gcFullCollect()
gcGenCollect()
gcDisable()
gcEnable()

or some such.

?

> 
> Sean
December 08, 2006
Dave wrote:
> Sean Kelly wrote:
>> zz wrote:
>>> Dave wrote:
>>>> zz wrote:
>>>>
>>>>> Walter Bright wrote:
>>>>>
>>>>>> zz wrote:
>>>>>>
>>>>>>> From my point of view, these tests are not really necessary on my side, since I still continue using D and I believe that someday the memory stuff will be optimized.
>>>>>>
>>>>>>
>>>>>>
>>>>>> One thing that is happening is that the C++ code allocates memory but never frees it. The same goes for the D code.
>>>>>
>>>>>
>>>>> Not until it's out of scope (in this case main).
>>>>>
>>>>
>>>> With that in mind, I wrapped your original code in a loop for 10 iterations and decreased the inner loop to 100000.
>>>>
>>>> C++ (NedAlloc):
>>>> Total Element count: 1000000
>>>> 5.438000
>>>>
>>>> D:
>>>> Total Element count: 1000000
>>>> 1.360000
>>>>
>>> Can't argue with that.
>>> In this case D is faster; I get:
>>> C++ NedAlloc = 2.859
>>> D = 1.421
>>>
>>> But if you keep the original count and add the outer loop of 10 iterations, D's performance becomes really bad:
>>>
>>> C++ (NedMalloc) = 00:26.796
>>> D = 02:23.375
>>
>> Try explicitly calling _gc.fullCollect() between iterations of the inner loop, and disabling the GC explicitly there.  Manipulation of the GC for performance-critical areas is an important and intentional feature.
>>
> 
> With that in mind... What if some of the GC API were built into the language? This could be one of those areas that would set D apart as a lower-level, performance-oriented language. If a GC implementation didn't support something, it would be stubbed for portability. Likewise, anything not covered explicitly by a built-in could be covered by an import.
> 
> Specifically:
> 
> gcFullCollect()
> gcGenCollect()
> gcDisable()
> gcEnable()
> 
> or some such.

Phobos already has std.gc for exactly this purpose.  Or are you suggesting some of these features should be added as keywords, or perhaps automatically available, similar to the declaration of Object and Exception?


Sean
December 08, 2006
Sean Kelly wrote:
> Dave wrote:
>> Sean Kelly wrote:
>>> Try explicitly calling _gc.fullCollect() between iterations of the inner loop, and disabling the GC explicitly there.  Manipulation of the GC for performance-critical areas is an important and intentional feature.
>>>
>>
>> With that in mind... What if some of the GC API were built into the language? This could be one of those areas that would set D apart as a lower-level, performance-oriented language. If a GC implementation didn't support something, it would be stubbed for portability. Likewise, anything not covered explicitly by a built-in could be covered by an import.
>>
>> Specifically:
>>
>> gcFullCollect()
>> gcGenCollect()
>> gcDisable()
>> gcEnable()
>>
>> or some such.
> 
> Phobos already has std.gc for exactly this purpose.  Or are you suggesting some of these features should be added as keywords, or perhaps automatically available, similar to the declaration of Object and Exception?
> 

Yes, that way those optimization 'hints' would be codified into the language itself (like register in C or inline in C++).

The goal would be to lower the barrier as much as possible (i.e. to encourage their use) and give D something codified that most other languages don't have.

You know, taking this one step further for convenience and safety could give us:

Replace:
>> gcDisable()
>> gcEnable()

with a built-in for:
    std.gc.disable;
    scope(exit) std.gc.enable;

usage:

void foo()
{
    scopeGcDisable;
    for(...){ ... }
}
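Until something like `scopeGcDisable` exists as a built-in, a library type can approximate it with a destructor, since D runs struct destructors deterministically at scope exit. This is a minimal sketch assuming today's `core.memory.GC`; `GCPause` and its `start` function are hypothetical names, not part of Phobos or druntime. The guard relies on `GC.disable()`/`GC.enable()` being counted, which is how druntime documents them (each `disable` must be paired with one `enable`).

```d
import core.memory : GC;

/// Hypothetical scope guard approximating the proposed scopeGcDisable:
/// disables automatic collections when started and re-enables them
/// when the guard goes out of scope.
struct GCPause
{
    private bool active;

    @disable this(this);   // non-copyable, so GC.enable() runs exactly once

    static GCPause start()
    {
        GC.disable();
        GCPause g;
        g.active = true;
        return g;           // moved to the caller, not copied
    }

    ~this()
    {
        if (active)
            GC.enable();
    }
}

void foo()
{
    auto pause = GCPause.start();   // stands in for `scopeGcDisable;`
    foreach (i; 0 .. 100_000)
    {
        auto a = new int[](16);
        a[0] = i;
    }
}   // pause's destructor re-enables the GC here

unittest
{
    foo();
    {
        auto p = GCPause.start();
        auto x = new int[](8);
        assert(x.length == 8);
    }
}
```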

> 
> Sean
December 10, 2006
Pragma wrote:
> Something I ran into that the group might enjoy:
> 
> From: http://www.techreview.com/InfoTech/17831/
> 
> "Bjarne Stroustrup, the inventor of the C++ programming language, defends his legacy and examines what's wrong with most software code."
> 
> As always, Slashdot has some colorful coverage on this:
> 
> http://it.slashdot.org/it/06/12/05/0045234.shtml
> 
> "MIT's Technology Review has a Q&A with C++ inventor Bjarne Stroustrup. Highlights include Bjarne's answers on the trade-offs involved in the design of C++, and how they apply today, and his thoughts on the solution to the problems. From the interview: 'Software developers have become adept at the difficult art of building reasonably reliable systems out of unreliable parts. The snag is that often we do not know exactly how we did it.'"
> 

Here is the follow-up interview:

http://www.techreview.com/InfoTech/17868/page1/

http://developers.slashdot.org/article.pl?sid=06/12/09/220218