May 10, 2014
On 10.05.2014 08:27, Manu via Digitalmars-d wrote:
> On 10 May 2014 07:05, Wyatt via Digitalmars-d
> <digitalmars-d@puremagic.com> wrote:
>> On Friday, 9 May 2014 at 16:12:00 UTC, Manu via Digitalmars-d wrote:
>> ...
> The only option I know that works is Obj-C's solution, as demonstrated
> by a very successful embedded RTOS, and compared to competition, runs
> silky smooth. Indeed iOS makes it a specific design goal that it
> should always feel silky smooth, never stuttery, they consider it a
> point of quality, and I completely agree. I don't know what other
> horse to back?
> ...

The problem with comparing iOS to Android is that we aren't really comparing ARC with GC.

We are comparing a full OS, in which we don't know how much ARC is actually used versus plain malloc/new, with another OS that has a so-so VM implementation, relied mostly on software rendering until version 4.1, and only started caring about low-end devices in 4.4.

If we add Windows Phone to the mix, then we have a .NET stack similar to Android's (WP7 - GC/JIT), or, in its successor (WP8), native code generation for .NET (GC) with a COM model for the OS APIs (ARC).

Both versions of Windows Phone run smoother than many Android phones, even the WP7 ones.

I'm not saying you are wrong, just that we need to look at the whole stack when comparing mobile OSes, not just GC vs ARC.

--
Paulo


May 10, 2014
On 5/9/14, 11:27 PM, Manu via Digitalmars-d wrote:
> ARC overhead would have no meaningful impact on performance, GC may
> potentially freeze execution. I am certain I would never notice ARC
> overhead on a profiler, and if I did, there are very simple methods to
> shift it elsewhere in the few specific circumstances it emerges.

This is very, very, very wrong. -- Andrei

May 10, 2014
On Saturday, 10 May 2014 at 07:08:04 UTC, Andrei Alexandrescu wrote:
> On 5/9/14, 11:27 PM, Manu via Digitalmars-d wrote:
>> ARC overhead would have no meaningful impact on performance, GC may
>> potentially freeze execution. I am certain I would never notice ARC
>> overhead on a profiler, and if I did, there are very simple methods to
>> shift it elsewhere in the few specific circumstances it emerges.
>
> This is very, very, very wrong. -- Andrei

I've seen this discussion ("it's almost performance-free", "it's a performance killer") so many times, I can't even say who has the burden of proof anymore.
May 10, 2014
On Saturday, 10 May 2014 at 07:42:05 UTC, Francesco Cattoglio wrote:
> On Saturday, 10 May 2014 at 07:08:04 UTC, Andrei Alexandrescu wrote:
>> On 5/9/14, 11:27 PM, Manu via Digitalmars-d wrote:
>>> ARC overhead would have no meaningful impact on performance, GC may
>>> potentially freeze execution. I am certain I would never notice ARC
>>> overhead on a profiler, and if I did, there are very simple methods to
>>> shift it elsewhere in the few specific circumstances it emerges.
>>
>> This is very, very, very wrong. -- Andrei
>
> I've seen this discussion ("it's almost performance-free", "it's a performance killer") so many times, I can't even say who has the burden of proof anymore.

I wish that someone would take the time and implement ARC in D. That's the only way to prove anything. If you implement it and you can provide clear evidence for its advantages, then that just ends all discussions.
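Not full ARC, of course (no compiler-inserted or elided retain/release pairs, no cycle handling), but Phobos already has std.typecons.RefCounted, which shows what the deterministic-release half looks like at the library level. A minimal sketch:

import std.stdio : writeln;
import std.typecons : RefCounted;

struct Blob
{
    int[] data;
    ~this() { writeln("blob released deterministically"); }
}

void main()
{
    auto a = RefCounted!Blob([1, 2, 3]); // count == 1
    {
        auto b = a;                      // copy: postblit bumps the count to 2
    }                                    // b goes out of scope: back to 1
    writeln(a.data);                     // payload still alive (via alias this)
}                                        // a goes out of scope: count hits 0, ~this runs

A compiler-assisted ARC would essentially automate (and, where provable, elide) the bookkeeping that RefCounted does through its postblit and destructor, and would also have to answer the cycle question somehow.
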
May 10, 2014
On Saturday, 10 May 2014 at 06:53:07 UTC, Paulo Pinto wrote:
> On 10.05.2014 08:27, Manu via Digitalmars-d wrote:
>> On 10 May 2014 07:05, Wyatt via Digitalmars-d
>
> The problem with comparing iOS to Android is that we aren't really comparing ARC with GC.
>
> We are comparing a full OS, in which we don't know how much ARC is actually used versus plain malloc/new, with another OS that has a so-so VM implementation, relied mostly on software rendering until version 4.1, and only started caring about low-end devices in 4.4.
>
> If we add Windows Phone to the mix, then we have a .NET stack similar to Android's (WP7 - GC/JIT), or, in its successor (WP8), native code generation for .NET (GC) with a COM model for the OS APIs (ARC).
>
> Both versions of Windows Phone run smoother than many Android phones, even the WP7 ones.
>
> I'm not saying you are wrong, just that we need to look at the whole stack when comparing mobile OSes, not just GC vs ARC.
>
> --
> Paulo

+1
--
Paolo
May 10, 2014
On Saturday, 10 May 2014 at 08:18:30 UTC, w0rp wrote:

>>
>> I've seen this discussion ("it's almost performance-free", "it's a performance killer") so many times, I can't even say who has the burden of proof anymore.
>
> I wish that someone would take the time and implement ARC in D. That's the only way to prove anything. If you implement it and you can provide clear evidence for its advantages, then that just ends all discussions.

How hard would this be, exactly?

Perhaps then, and only then, could you make an apples-to-apples comparison?

Nick
May 10, 2014
On 10/05/2014 01:31, Francesco Cattoglio wrote:
> On Friday, 9 May 2014 at 21:05:18 UTC, Wyatt wrote:
>> But conversely, Manu, something has been bothering me: aren't you
>> restricted from using most libraries anyway, even in C++? "Decent" or
>> "acceptable" performance isn't anywhere near "maximum", so shouldn't
>> any library code that allocates in any language be equally suspect?
>> So from that standpoint, isn't any library you use in any language
>> going to _also_ be tuned for performance in the hot path?  Maybe I'm
>> barking up the wrong tree, but I don't recall seeing this point
>> addressed.
>>
>> More generally, I feel like we're collectively missing some important
>> context:  What are you _doing_ in your 16.6ms timeslice?  I know _I'd_
>> appreciate a real example of what you're dealing with without any
>> hyperbole.  What actually _must_ be done in that timeframe?  Why must
>> collection run inside that window?  What must be collected when it
>> runs in that situation?  (Serious questions.)
> I'll try to guess: if you want something running at 60 frames per
> second, 16.6ms is the time you have to do everything between frames.
> This means that in that timeframe you have to:
> - update your game state.
> - possibly process all network I/O.
> - prepare the rendering pipeline for the next frame.
>
> Updating the game state can imply making computations on lots of stuff:
> physics, animations, creation and deletion of entities and particles, AI
> logic... pick your poison. At every frame you will have a handful of
> objects being destroyed and a few resources that might go forgotten. One
> frame would probably only need very few objects collected. But given
> some time, the amount of junk can easily grow out of control. Your code
> will end up stuttering at some point (because of random collections at
> random times), and this can be really bad.

As far as I know, AAA game engines are reputed to do zero allocations during frame computation, but I think that's less the case nowadays because of the dynamism of scenes and the huge environments that are streamed.

I recently fixed a performance issue caused by a code design that forced walls to be destroyed and then recreated when the user moves them (I am working on an architecture application). gprof showed me that this took around 30% of the CPU time of a frame, while allocations/deallocations alone were only about 5%. Those 5% cover the destruction of objects in the architecture part and the 3D engine, and the same for construction. Construction also adds operations such as uploading new geometry, so I don't think the cost of new and delete themselves would have been significant without the work done by the constructors and destructors.
Reserving memory (malloc) isn't really the issue IMO, but the operations tied to constructing and destroying objects can be expensive.
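A sketch of the kind of reuse that sidesteps that cost (all names here are hypothetical, just to illustrate the pooling pattern): instead of destroying walls and constructing new ones when the user drags them, keep the old objects in a free list and reset them.

struct Wall
{
    float[] vertices;                      // placeholder for mesh/geometry data
    void reset() { vertices.length = 0; }  // cheap state reset, no full destructor
}

struct WallPool
{
    Wall*[] free;

    Wall* acquire()
    {
        if (free.length)
        {
            auto w = free[$ - 1];
            free = free[0 .. $ - 1];
            return w;                      // reused: no construction, no allocation
        }
        return new Wall;                   // only allocate when the pool is empty
    }

    void release(Wall* w)
    {
        w.reset();
        free ~= w;                         // keep it around instead of destroying it
    }
}
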
May 10, 2014
On 5/10/2014 6:05 AM, Wyatt wrote:
> On Friday, 9 May 2014 at 16:12:00 UTC, Manu via Digitalmars-d wrote:
>>
>> Let's also bear in mind that Java's GC is worlds ahead of D's.
>>
> Is Sun/Oracle's reference implementation actually any good?
>

Yes. Given all the man-hours that have gone into it over the years, it would be surprising if it weren't. Actually, though, there isn't really one collector that ships with the JRE anymore. There are different implementations, each using a different strategy, that can be selected at JVM startup. Furthermore, each collector is tunable, so if you aren't satisfied with the default collector out of the box, you can use the JVM's instrumentation to find the best match for your app's usage patterns. See [1] for some details.
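For example, with stock HotSpot the collector is chosen and tuned via command-line switches (the class name here is just a placeholder):

  java -XX:+UseConcMarkSweepGC MyApp                 # low-pause concurrent collector
  java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 MyApp    # G1 with a pause-time goal
  java -Xms256m -Xmx256m -verbose:gc MyApp           # fixed heap size, log each collection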

[1] http://www.infoq.com/articles/Java_Garbage_Collection_Distilled

May 10, 2014
On 10/05/2014 08:53, Paulo Pinto wrote:
> On 10.05.2014 08:27, Manu via Digitalmars-d wrote:
>> On 10 May 2014 07:05, Wyatt via Digitalmars-d
>> <digitalmars-d@puremagic.com> wrote:
>>> On Friday, 9 May 2014 at 16:12:00 UTC, Manu via Digitalmars-d wrote:
>>> ...
>> The only option I know that works is Obj-C's solution, as demonstrated
>> by a very successful embedded RTOS, and compared to competition, runs
>> silky smooth. Indeed iOS makes it a specific design goal that it
>> should always feel silky smooth, never stuttery, they consider it a
>> point of quality, and I completely agree. I don't know what other
>> horse to back?
>> ...
>
> The problem with comparing iOS to Android is that we aren't really comparing ARC with GC.
>
> We are comparing a full OS, in which we don't know how much ARC is actually used versus plain malloc/new, with another OS that has a so-so VM implementation, relied mostly on software rendering until version 4.1, and only started caring about low-end devices in 4.4.
>
> If we add Windows Phone to the mix, then we have a .NET stack similar to Android's (WP7 - GC/JIT), or, in its successor (WP8), native code generation for .NET (GC) with a COM model for the OS APIs (ARC).
>
> Both versions of Windows Phone run smoother than many Android phones, even the WP7 ones.
>
> I'm not saying you are wrong, just that we need to look at the whole stack when comparing mobile OSes, not just GC vs ARC.
>
> --
> Paulo
>
>
I don't know the WP8 models well, but this one must run smoothly:
http://www.nokia.com/fr-fr/mobiles/telephone-portable/lumia1320/fiche-technique/

Just like on Android phones, the battery is huge: 3400 mAh.
It's the same for the CPU: Qualcomm Snapdragon™ S4, dual-core 1.7 GHz.
Only the RAM seems reasonable: 1 GB.

And it's maybe easier for Microsoft to make a GC with good performance because they control everything in the OS. There is certainly some specific memory management in the kernel related to the GC, whereas Google "just" put small pieces together on top of a Linux kernel.

Yesterday I played an augmented-reality game on my Nexus 5; it took 2 hours and 20 minutes to completely drain the battery (2300 mAh). Sadly this game doesn't exist on iOS, but IMO an iPhone would do the same with a smaller battery (1440 mAh for the iPhone 5s).
May 10, 2014
On 10 May 2014 19:07, Xavier Bigand via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 10/05/2014 01:31, Francesco Cattoglio wrote:
>
>> On Friday, 9 May 2014 at 21:05:18 UTC, Wyatt wrote:
>>>
>>> But conversely, Manu, something has been bothering me: aren't you restricted from using most libraries anyway, even in C++? "Decent" or "acceptable" performance isn't anywhere near "maximum", so shouldn't any library code that allocates in any language be equally suspect? So from that standpoint, isn't any library you use in any language going to _also_ be tuned for performance in the hot path?  Maybe I'm barking up the wrong tree, but I don't recall seeing this point addressed.
>>>
>>> More generally, I feel like we're collectively missing some important context:  What are you _doing_ in your 16.6ms timeslice?  I know _I'd_ appreciate a real example of what you're dealing with without any hyperbole.  What actually _must_ be done in that timeframe?  Why must collection run inside that window?  What must be collected when it runs in that situation?  (Serious questions.)
>>
>> I'll try to guess: if you want something running at 60 frames per
>> second, 16.6ms is the time you have to do everything between frames.
>> This means that in that timeframe you have to:
>> - update your game state.
>> - possibly process all network I/O.
>> - prepare the rendering pipeline for the next frame.
>>
>> Updating the game state can imply making computations on lots of stuff: physics, animations, creation and deletion of entities and particles, AI logic... pick your poison. At every frame you will have a handful of objects being destroyed and a few resources that might go forgotten. One frame would probably only need very few objects collected. But given some time, the amount of junk can easily grow out of control. Your code will end up stuttering at some point (because of random collections at random times), and this can be really bad.
>
>
> As far as I know, AAA game engines are reputed to do zero allocations during frame computation, but I think that's less the case nowadays because of the dynamism of scenes and the huge environments that are streamed.

Running a game in a zero-alloc environment was a luxury that ended
about 10 years ago, for the reasons you describe.
Grand Theft Auto proved that you don't need loading screens, and now
not having them is a basic requirement. We also have a lot more
physics, environmental destruction, and other dynamic behaviour.

It was also something we achieved in the past with extreme complexity,
and it was the source of many (most?) bugs. Technically we still
allocated, but we had to micro-manage every single little detail, with
mini pools and regions for everything.
Processors are better now; in the low-frequency code, we can afford to
spend a tiny bit of time using a standard allocation model. The key is
the growing separation between low-frequency and high-frequency code.
On a 33 MHz PlayStation there wasn't really much difference between the
two worlds. Now there is, and we can afford to allow 'safety' into the
language at the cost of a little memory management. We just can't have
that cost include halting threads all the time for lengthy collection
passes.
I know this because we already use RC extensively in games anyway;
DirectX uses COM, and most resources use manual RC because we need
things to release eagerly and destructors to work properly. I don't
see how ARC would add any significant cost over the manual RC that has
been basically standard for many years. It would add simplicity and
safety, which are highly desirable.
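
For the record, the manual RC in question boils down to something like this (a hand-rolled sketch in D, names hypothetical; COM's AddRef/Release collapsed into a postblit and a destructor, which is roughly the bookkeeping ARC would generate for you):

struct Resource
{
    int refs;        // intrusive reference count
    // ... GPU handle, size, etc.
}

void destroyResource(Resource* r)
{
    // release the GPU object eagerly, right here; no collector involved
}

struct ResourceRef
{
    private Resource* res;

    this(Resource* r) { res = r; if (res) ++res.refs; }

    this(this)                        // postblit: copying the handle bumps the count
    {
        if (res) ++res.refs;
    }

    ~this()
    {
        if (res && --res.refs == 0)
            destroyResource(res);     // eager, deterministic release
    }
}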


> I recently fixed a performance issue caused by a code design that forced walls to be destroyed and then recreated when the user moves them (I am working on an architecture application). gprof showed me that this took around 30% of the CPU time of a frame, while allocations/deallocations alone were only about 5%. Those 5% cover the destruction of objects in the architecture part and the 3D engine, and the same for construction. Construction also adds operations such as uploading new geometry, so I don't think the cost of new and delete themselves would have been significant without the work done by the constructors and destructors. Reserving memory (malloc) isn't really the issue IMO, but the operations tied to constructing and destroying objects can be expensive.

It's better to have eager destructors (which empower you with the
ability to defer them if you need to) than to not have destructors at
all, which has been the center of the argument these last few days.
It's not hard to defer resource destruction to a background thread in
an eager-destruction model.
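
A minimal sketch of that deferral (hypothetical names; the destructor stays eager and deterministic, only the expensive release work is queued for a worker thread or the end of the frame):

import core.sync.mutex : Mutex;

__gshared uint[] pendingReleases;   // GPU handles waiting to be freed
__gshared Mutex releaseLock;

shared static this() { releaseLock = new Mutex; }

struct Texture
{
    uint handle;                    // e.g. a GL/D3D resource id (placeholder)

    ~this()                         // runs as soon as the last owner lets go
    {
        releaseLock.lock();
        scope (exit) releaseLock.unlock();
        pendingReleases ~= handle;  // defer the heavy work instead of doing it here
    }
}

// Drained by a background thread, or at a convenient point in the frame.
void drainPendingReleases()
{
    uint[] batch;
    releaseLock.lock();
    batch = pendingReleases;
    pendingReleases = null;
    releaseLock.unlock();

    foreach (h; batch)
    {
        // actually release the GPU resource behind h here
    }
}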