October 12, 2008
Christopher Wright wrote:
> Jarrett Billingsley wrote:
>> On Sun, Oct 12, 2008 at 4:48 AM, Walter Bright
>> <newshound1@digitalmars.com> wrote:
>>> Bill Baxter wrote:
>>>> Python's strategy of introducing new features is the best I've seen by
>>>> far.
>>>> And with that they have made big changes in the language without ever
>>>> creating any major split in the user base, as far as I can tell.
>>> Most of the split in the user base is a result of Tango being D1 only (which
>>> I hope will be resolved shortly).
>>
>> Then address the issues of stack vs. heap delegates.
> 
> That is not required for Tango to function properly with D2. It's a performance optimization only. A significant one, but no more.

That's not entirely true.

Tango makes some promises about heap allocations, suddenly the code has to be rewritten for these still to hold true. That's more than just an optimization.

IIRC there's also other issues (with const). Might be wrong here, I have little interest in the whole D2 thing, a D1.5 variant is much more appealing to me.
October 12, 2008
Frank Benoit wrote:
> Christopher Wright schrieb:
>> Jarrett Billingsley wrote:
>>> Then address the issues of stack vs. heap delegates.
>> That is not required for Tango to function properly with D2. It's a
>> performance optimization only. A significant one, but no more.
> 
> I disagree here.
> If you use lambdas for filters in inner loops, that is a main
> performance hit. It is more than just "optimization", it is a reason to
> change the design. It is a reason not to use D2. It is a D2 show-stopper.
> Even worse, no error message is showing you the problematic places, D2
> just makes your heap explode.
> 
> int findMyTypeXYZ( MyType[] a, int criteria ){
> 	int match = convCrit( criteria );
> 	return tango.core.Array.findIf( a, delegate bool(T e){ return e.attrib
> is match; });
> }
> 
> First this run without allocation, now it does allocate.
> Image how this behave if called very often :(
> And even worse, there is no way to manually delete the allocated stack
> frames. So there is no way around the GC runs.
> Finally I would end up in rewriting that code, not using a delegate
> based design. But IMHO this is one of the important D features.

Well, how much of a performance hit is it?

If the average application will run half as fast, then this is a significant issue that should be addressed soon. But it will not prevent me from using Tango with D2 in the interim.

If it's a 100x performance hit, then I wouldn't expect a port of Tango yet. But I suspect that it's more like a 10-15% performance hit at most. (I am talking about a typical application. I'm sure you could get a benchmark that runs much slower by choosing library calls that make extensive use of delegates and using those constantly.) I don't know; nobody has given any ballpark figures, much less benchmarks.

I'm going to try porting some of my code to D2/Tango. I'll post the results.
October 12, 2008
Christopher Wright schrieb:
> Frank Benoit wrote:
>> Christopher Wright schrieb:
>>> Jarrett Billingsley wrote:
>>>> Then address the issues of stack vs. heap delegates.
>>> That is not required for Tango to function properly with D2. It's a performance optimization only. A significant one, but no more.
>>
>> I disagree here.
>> If you use lambdas for filters in inner loops, that is a main
>> performance hit. It is more than just "optimization", it is a reason to
>> change the design. It is a reason not to use D2. It is a D2 show-stopper.
>> Even worse, no error message is showing you the problematic places, D2
>> just makes your heap explode.
>>
>> int findMyTypeXYZ( MyType[] a, int criteria ){
>>     int match = convCrit( criteria );
>>     return tango.core.Array.findIf( a, delegate bool(T e){ return
>> e.attrib
>> is match; });
>> }
>>
>> First this run without allocation, now it does allocate.
>> Image how this behave if called very often :(
>> And even worse, there is no way to manually delete the allocated stack
>> frames. So there is no way around the GC runs.
>> Finally I would end up in rewriting that code, not using a delegate
>> based design. But IMHO this is one of the important D features.
> 
> Well, how much of a performance hit is it?
> 
> If the average application will run half as fast, then this is a significant issue that should be addressed soon. But it will not prevent me from using Tango with D2 in the interim.
> 
> If it's a 100x performance hit, then I wouldn't expect a port of Tango yet. But I suspect that it's more like a 10-15% performance hit at most. (I am talking about a typical application. I'm sure you could get a benchmark that runs much slower by choosing library calls that make extensive use of delegates and using those constantly.) I don't know; nobody has given any ballpark figures, much less benchmarks.
> 
> I'm going to try porting some of my code to D2/Tango. I'll post the results.

How can you guess anything here?

Old behavior == just the direct call, no extra cost
New behavior == same direct call plus a heap allocation
                with unbound cost

There is nothing like a typical application. There is nothing like a typical calling rate to full closures. Either you use them or you don't. It depends on your code. So even if you port some code and do some tests, there is no significant result for my code.

There _can_ be 100x performance hit. Code that was allocation free is now no more. That is super bad.
October 12, 2008
Frank Benoit wrote:
> How can you guess anything here?
>
> Old behavior == just the direct call, no extra cost
> New behavior == same direct call plus a heap allocation
>                 with unbound cost

There are plenty of randomized algorithms with potentially unbounded cost that are still used. In practice, potentially unbounded things work in a reasonable amount of time or get replaced.

The major reason that a heap allocation incurs a potentially unbounded cost is that allocation can trigger a collection. And a collection is potentially unbounded in duration because of user-defined destructors. If you write no destructors, then the cost of garbage collection is bounded by the amount of allocated memory. If your destructors complete in constant time, same thing.

It's like saying "The cost of invoking a method is unbounded." Technically, this is true; but if you're trying to benchmark the runtime or the language, that truth isn't the one in which you are interested.

> There is nothing like a typical application.

This argument is a bit solipsist, and not remotely useful. You simply lack the appropriate data to determine which library types and functions are used with what frequency.

> There is nothing like a
> typical calling rate to full closures. Either you use them or you don't.

Again, you simply lack sufficient data. If an application uses closures, it might create one every minute or one every second or a thousand per second, on average. You can track this.

> It depends on your code. So even if you port some code and do some
> tests, there is no significant result for my code.
>
> There _can_ be 100x performance hit. Code that was allocation free is
> now no more. That is super bad.

Well then, how about a microbenchmark?

I tried creating a delegate in a tight loop and executing it -- 10 million iterations took 0.506 seconds using d1. Using closures in d2, it took 3.958 seconds. This is doing nothing but allocating and using closures.

Using more local variables would result in more allocation, but with 5 integers instead of 1, the time only increased to 5.484 seconds. That's still 1.8 million delegate allocations per second. (If you have a very large struct on the stack, though, this cost can increase significantly, but that's a potential performance issue in itself.)

Considering these results, if overzealous heap activity due to delegates is the main issue you have to worry about, your application must use a very large number of closures.

While this does violate Tango's contracts about heap allocation, once scope delegates are available, it won't take much work to replace "delegate" with "scope delegate" everywhere in Tango.


The benchmarks:
---
// d1
int count = 0;

void foo (void delegate(int) dg)
{
    dg (1);
    dg (2);
}

void main ()
{
    uint max = 10_000_000;
    for (int j = 0; j < max; j++)
    {
        foo (makeDelegate);
    }
}

int x = 2;
void delegate(int) makeDelegate ()
{
    return (int i)
        {
            count += i;
            count += x;
        };
}
---

---
// d2
int count = 0;

void foo (void delegate(int) dg)
{
    dg (1);
    dg (2);
}

void main ()
{
    uint max = 10_000_000;
    for (int j = 0; j < max; j++)
    {
        foo (makeDelegate);
    }
}

void delegate(int) makeDelegate ()
{
    int x = 2;
/* // Uncomment to see the effects of additional local variables.
    int a = 2 * x;
    int b = 2 + a;
    int c = (2 - 6 * a) + b;
    int d = x * (3 + c);
*/
    return (int i)
        {
            count += i;
            count += x;
/*
            count += a;
            count += b;
            count += c;
            count += d;
*/
        };
}
---

The test machine:
Dell Optiplex GX110
Processor: Pentium III clocked at 1GHz (997.473 MHz according to /proc/cpuinfo)
Memory: 128MB; unknown type
October 12, 2008
On a Core Duo, 2 GHz, lot of RAM:

Timigs, best of 3:
  n = 10_000_000:
    D1: 0.14 s  (5.9 X)
    D2: 0.83 s  (1.0 X)
  n = 100_000_000:
    D1: 1.18 s  (6.8 X)  (~1.0 MB RAM)
    D2: 8.04 s  (1.0 X)  (~2.0 MB RAM)

Exe sizes:
  D1: 167.964 bytes
  D2: 165.404 bytes

Bye,
bearophile
October 12, 2008
Christopher Wright schrieb:
> Frank Benoit wrote:
>> How can you guess anything here?
>> 
>> Old behavior == just the direct call, no extra cost New behavior ==
>>  same direct call plus a heap allocation with unbound cost
> 
> There are plenty of randomized algorithms with potentially unbounded cost that are still used. In practice, potentially unbounded things work in a reasonable amount of time or get replaced.
> 

You said it is just a thing of optimization.
That would be true if the runtime would have been increased by a defined
value, like double runtime. But it has changed from deterministic
behavior to undeterministic behavior.
=> that means you can no longer use that in code that is made for high
data rate or low latency. And optimizations will not change that.

There is also no sense in doing benchmarks, because you are
measuring the GC performance, nothing more. And I already know, that I
need the guarantee for no heap allocation.

My point is, the delegate are a D1 tool that can be used in high performance code. Shortly after the full closure feature was announced (Nov 07), it was shown that this brakes D1 runtime behavior. Still this is not fixed.

> While this does violate Tango's contracts about heap allocation, once
>  scope delegates are available, it won't take much work to replace
> "delegate" with "scope delegate" everywhere in Tango.

I understand this as a confirmation, that with actual D2 you cannot port
tango and expect it to work as it does with D1.
Even an optimization cannot help. IMO the only two possible solutions are
1. Redesign the code -or-
2. Wait for scoped delegate





October 12, 2008
On Sun, 12 Oct 2008 23:32:15 +0400, Frank Benoit <keinfarbton@googlemail.com> wrote:

> Christopher Wright schrieb:
>> Frank Benoit wrote:
>>> How can you guess anything here?
>>>
>>> Old behavior == just the direct call, no extra cost New behavior ==
>>>  same direct call plus a heap allocation with unbound cost
>>
>> There are plenty of randomized algorithms with potentially unbounded
>> cost that are still used. In practice, potentially unbounded things
>> work in a reasonable amount of time or get replaced.
>>
>
> You said it is just a thing of optimization.
> That would be true if the runtime would have been increased by a defined
> value, like double runtime. But it has changed from deterministic
> behavior to undeterministic behavior.
> => that means you can no longer use that in code that is made for high
> data rate or low latency. And optimizations will not change that.
>
> There is also no sense in doing benchmarks, because you are
> measuring the GC performance, nothing more. And I already know, that I
> need the guarantee for no heap allocation.
>
> My point is, the delegate are a D1 tool that can be used in high
> performance code. Shortly after the full closure feature was announced
> (Nov 07), it was shown that this brakes D1 runtime behavior. Still this
> is not fixed.
>
>> While this does violate Tango's contracts about heap allocation, once
>>  scope delegates are available, it won't take much work to replace
>> "delegate" with "scope delegate" everywhere in Tango.
>
> I understand this as a confirmation, that with actual D2 you cannot port
> tango and expect it to work as it does with D1.
> Even an optimization cannot help. IMO the only two possible solutions are
> 1. Redesign the code -or-
> 2. Wait for scoped delegate
>

Scoped delegates hopefully will be implemented some day, so I'd stick with #2 in a long run.
October 12, 2008
Frank Benoit wrote:
> Christopher Wright schrieb:
>> Frank Benoit wrote:
>>> How can you guess anything here?
>>>
>>> Old behavior == just the direct call, no extra cost New behavior ==
>>>  same direct call plus a heap allocation with unbound cost
>> There are plenty of randomized algorithms with potentially unbounded cost that are still used. In practice, potentially unbounded things work in a reasonable amount of time or get replaced.
>>
> 
> You said it is just a thing of optimization.
> That would be true if the runtime would have been increased by a defined
> value, like double runtime. But it has changed from deterministic
> behavior to undeterministic behavior.
> => that means you can no longer use that in code that is made for high
> data rate or low latency. And optimizations will not change that.
> 
> There is also no sense in doing benchmarks, because you are
> measuring the GC performance, nothing more. And I already know, that I
> need the guarantee for no heap allocation.
> 
> My point is, the delegate are a D1 tool that can be used in high
> performance code. Shortly after the full closure feature was announced
> (Nov 07), it was shown that this brakes D1 runtime behavior. Still this
> is not fixed.
> 
>> While this does violate Tango's contracts about heap allocation, once
>>  scope delegates are available, it won't take much work to replace "delegate" with "scope delegate" everywhere in Tango.
> 
> I understand this as a confirmation, that with actual D2 you cannot port
> tango and expect it to work as it does with D1.
> Even an optimization cannot help. IMO the only two possible solutions are
> 1. Redesign the code -or-
> 2. Wait for scoped delegate

Okay, and for the 98% of people who don't mind having some extra heap allocation, we have to wait.

Though this isn't terribly relevant -- I think that, if D2 were less of a moving target, Tango would already have been ported and released for D2.
October 14, 2008
Walter Bright, el 11 de octubre a las 02:47 me escribiste:
> Bill Baxter wrote:
> > I think there was some hope that making a really stable D1.0 would somehow make D1.0 an attractive choice for companies.  But come on. It was a stretch when D1 was just a niche language.  Now it's a niche language that's also obsolete.
> 
> People made it clear they were not going to use a language for production if it got new features every month. D 1.0 is a complete and very usable language, with the goal of being stable and bug free.
> 
> You can't have it both ways - you cannot have a stable language while constantly adding features.

Yes you can. Please, take a look to Python development model =)

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Es más probable que el tomate sea perita, a que la pera tomatito.
	-- Peperino Pómoro
1 2 3
Next ›   Last »