December 31, 2013
On Tuesday, 31 December 2013 at 17:52:56 UTC, Chris Cain wrote:
> Well, that's certainly a good point. There's _probably_ some extra optimizations that could be done with a compiler supported new. Maybe it could make some significantly faster code, but this assumes many things:
>
> 1. The compiler writer will actually do this analysis and write the optimization (my bets are that DMD will likely not do many of the things you suggest).

I think many optimizations become more valuable when you start doing whole program analysis.

> 2. The person writing the code is writing code that is allocating several times in a deeply nested loop.

The premise of efficient high level/generic programming is that the optimizer will undo naive code. Pseudo code example:

inline process(inarray, allocator){
   a = allocator.alloc(Array)
   a.init()
   for e in inarray { a.append(foo(e)) }
   return a
}

b = process(emptyarray,myallocator)
dosomething(b)
myallocator.free(b)

The optimizer should get rid of all of this. But since alloc() followed by free() most likely leads to side effects, it can't and you end up with:

b = myallocator.alloc(1000)
myallocator.free(b)

> 3. Despite the person making the obvious critical error of allocating several times in a deeply nested loop, he must not have made any other significant errors or those other errors must also be covered by optimizations

I disagree that inefficiencies due to high level programming are a mistake if the compiler has the opportunity to get rid of them. I wish D would target high level programming in the global scope and low level programming in limited local scopes. I think few applications need hand optimization globally, except perhaps raytracers and compilers.

> Manual optimization in this case isn't too unreasonable.

I think manual optimization in most cases should be provided by the programmer as compiler hints and constraints.

> Think of replacing library calls when it's noticed that it's an allocate function. It's pretty dirty and won't actually happen nor do I suggest it should happen, but it's actually still also _possible_.

Yes, why not? As long as the programmer has the means to control it. Why not let the compiler choose allocation strategies based on profiling, for instance?
December 31, 2013
On Tuesday, 31 December 2013 at 19:53:29 UTC, Ola Fosheim Grøstad wrote:
> On Tuesday, 31 December 2013 at 17:52:56 UTC, Chris Cain wrote:
>> 1. The compiler writer will actually do this analysis and write the optimization (my bets are that DMD will likely not do many of the things you suggest).
>
> I think many optimizations become more valuable when you start doing whole program analysis.

You're correct, but I think the value only comes if it's actually done, which was my point.

>> 2. The person writing the code is writing code that is allocating several times in a deeply nested loop.
>
> The premise of efficient high level/generic programming is that the optimizer will undo naive code.

Sure. My point was that it's only in a very precise situation that the optimization would actually work effectively enough to outweigh the advantages of using a library solution. If there were no tradeoffs to using a compiler-supported new, then even a tiny smidge of an optimization here and there would be perfectly reasonable. Unfortunately, that's not the case. The only times where I think the proposed optimization is significant enough to overcome the tradeoff are precisely the type of situation I described.

Note I'm _not_ arguing that performing optimizations is irrelevant. Like you said, "The premise of efficient high level/generic programming is that the optimizer will undo naive code." But that is _not_ the only facet that needs to be considered here. If it were, you'd be correct and we should recommend only using new. But since there are distinct advantages to a library solution and distinct disadvantages to the compiler solution, the fact that you _could_, with effort, make small optimizations on occasion just isn't enough to overturn the other tradeoffs you're making.

>> 3. Despite the person making the obvious critical error of allocating several times in a deeply nested loop, he must not have made any other significant errors or those other errors must also be covered by optimizations
>
> I disagree that inefficiencies due to high level programming are a mistake if the compiler has the opportunity to get rid of them. I wish D would target high level programming in the global scope and low level programming in limited local scopes. I think few applications need hand optimization globally, except perhaps raytracers and compilers.

You seem to be misunderstanding my point again. I'm _not_ suggesting D not optimize as much as possible, and I'm not suggesting everyone "hand optimize" everything. Following my previous conditions, this condition simply says that there must not be any other significant problems that would minimize the effect of your proposed optimization.

So, _if_ the optimization is put in place, and _if_ the code in question is deeply nested enough to take a significant amount of time, so that your proposed optimization has a chance to actually be useful, then we have to ask the question "are there any other major problems that are also taking up significant time?" If the answer is "yes, there are other major problems", then your proposed speed-up seems less likely to matter. That's where I was going.

>> Manual optimization in this case isn't too unreasonable.
>
> I think manual optimization in most cases should be provided by the programmer as compiler hints and constraints.

In some cases, yes. An "inline" hint, for instance, makes a ton of sense. Are you suggesting that there should be a hint provided to new? Something like: `Something thing = new (@stackallocate) Something(arg1,arg2);`? If so, it seems like a really roundabout way to do it when you could just do `Something thing = stackAlloc.make!Something(arg1,arg2);` I don't see how hints provided to new would be an advantage at all. All that means is that to add additional "hint" allocators, you'd have to dive into the compiler (and language spec), as opposed to easily writing your own as a library.
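
To make that contrast concrete, here is a rough sketch (not the actual std.allocator API; StackAllocator, allocate and make are names I'm making up for illustration) of why the library route is the extensible one: a new allocator and a generic make helper can live entirely in user code, with no compiler or language-spec changes.

import std.conv : emplace;

// A hypothetical fixed-size, stack-backed allocator written as plain library code.
struct StackAllocator(size_t capacity)
{
    private ubyte[capacity] store;
    private size_t used;

    // Hands out a raw slice of the internal buffer, or null when exhausted.
    // Alignment is ignored here for brevity.
    void[] allocate(size_t n)
    {
        if (used + n > capacity) return null;
        void[] mem = store[used .. used + n];
        used += n;
        return mem;
    }
}

// A generic make helper usable with any allocator that exposes allocate().
// (Handles struct types only, to keep the sketch short.)
T* make(T, Alloc, Args...)(ref Alloc alloc, Args args)
{
    auto mem = alloc.allocate(T.sizeof);
    return mem is null ? null : emplace(cast(T*) mem.ptr, args);
}

Usage would then look like `StackAllocator!1024 stackAlloc; auto thing = stackAlloc.make!Something(arg1, arg2);`, and adding another "hint" allocator is just another struct.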

>> Think of replacing library calls when it's noticed that it's an allocate function. It's pretty dirty and won't actually happen nor do I suggest it should happen, but it's actually still also _possible_.
>
> Yes, why not? As long as the programmer has the means to control it. Why not let the compiler choose allocation strategies based on profiling for instance?

Uhh... You don't see the problem with the compiler tying itself to the implementation of a library allocate function? Presumably such a thing would _only_ be done using the default library allocator since when the programmer says "use std.allocator.StackAllocator" he generally means it. And I find the whole idea of the compiler hard-coding "if the programmer uses std.allocator.DefaultAllocator.allocate then instead of emitting that function call do ..." to be more than a bit ugly. Possible, but horrific.
December 31, 2013
On Tuesday, 31 December 2013 at 20:29:34 UTC, Chris Cain wrote:
> On Tuesday, 31 December 2013 at 19:53:29 UTC, Ola Fosheim Grøstad wrote:
>> I think many optimizations become more valuable when you start doing whole program analysis.
>
> You're correct, but I think the value only comes if it's actually done, which was my point.

Well, there is a comment in the DMD source code that suggests it is being thought about, at least. :)

Anyway, I think threading, locking and memory management are areas that should not be controlled by black boxes. Partially for optimization, but also for partial correctness "proofs".

> But since there are distinct advantages to a library solution and distinct disadvantages to the compiler solution, the fact that you _could_, with effort, make small optimizations on occasion just isn't enough to overturn the other tradeoffs you're making.

The way I see it: programmers today avoid the heap and target the stack. They shouldn't have to. The compiler should handle that. C++ compilers do reasonably well on low-level basic blocks, but poorly at higher levels, so programmers are used to hand-optimizing for that situation.

I think that with better high-level analysis many allocations could be dissolved and replaced with passing values in registers, or simply reused with limited reinitialization, or moved so that the allocation takes place at the call site rather than in the repeatedly called function. Automagically!
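
To illustrate the kind of rewrite I mean (done by hand below; the names and numbers are made up), the second form is what one would hope the optimizer could derive from the first:

// Naive: the callee heap-allocates a scratch buffer on every call.
int[] squares(int n)
{
    auto buf = new int[](n);
    foreach (i; 0 .. n)
        buf[i] = i * i;
    return buf;
}

// Hoisted/reused: one allocation at the call site, reused across calls
// with re-initialization only.
void squaresInto(int[] buf)
{
    foreach (i, ref x; buf)
        x = cast(int)(i * i);
}

void caller()
{
    auto buf = new int[](1000);   // single allocation at the call site
    foreach (frame; 0 .. 120)
        squaresInto(buf);         // no per-call allocation
}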

> You seem to be misunderstanding my point again. I'm _not_ suggesting D not optimize as much as possible and I'm not suggesting everyone "hand optimize" everything. Following my

Well, I think C++-ish programmers today are hand-optimizing everything at the medium level, in a sense. Less so in Python (it is too slow to bother :-)


>> I think manual optimization in most cases should be provided by the programmer as compiler hints and constraints.
>
> In some cases, yes. An "inline" hint, for instance, makes a ton of sense. Are you suggesting that there should be a hint provided to new?

Actually I want meta-level constructs (a purely hypothetical sketch follows the list), like:
- this new will allocate at least 1000 objects that are 100-400 bytes,
- all new allocs marked by X beyond this point will be released by position Y
- this new should not be paged out if possible
- I don't mind if this new is mapped to disk
- this new is a cache object, destroy it whenever you feel like
- this new will never hit another thread
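
None of that exists in D today, of course. Purely as a hypothetical sketch, such constraints could be spelled as ordinary user-defined attributes that a future compiler or runtime might read; every name below is made up:

// Hypothetical hint types, declared as plain structs used as UDAs.
struct AllocHint { size_t count, minSize, maxSize; } // "at least 1000 objects of 100-400 bytes"
struct NoPaging {}                                   // "do not page this out if possible"
struct DiskBackedOk {}                               // "I don't mind if this is mapped to disk"
struct CacheObject {}                                // "destroy whenever you feel like it"
struct ThreadLocalOnly {}                            // "will never hit another thread"

struct Node { ubyte[256] payload; }

// A compiler today would simply ignore these; the point is only that the
// information could be attached declaratively at the allocation site.
@AllocHint(1000, 100, 400) @NoPaging @ThreadLocalOnly
Node[] nodes;

void setup()
{
    nodes = new Node[](1000);
}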

> Presumably such a thing would _only_ be done using the default library allocator since when the programmer says "use std.allocator.StackAllocator" he generally means it.

He shouldn't have to. You don't write your own paging or scheduler for the OS either. You complain until it works. :)
January 03, 2014
"Chris Cain" <clcain@uncg.edu> writes:

> On Monday, 30 December 2013 at 11:23:22 UTC, JN wrote:
>> The best you can do in those
>> languages usually is to just not allocate stuff during the game.
>
> Yeah. The techniques to accomplish this in GC-only languages surprisingly mirror some of the techniques where malloc is available, though. For instance, the object pool pattern has the object already allocated and what you do is just ask for an object from the pool and set it up for your needs. When you're done, you just give it back to the pool to be recycled. It's very similar to what you'd do in any other language, but a little more restricted (other languages, like D, might just treat the memory as untyped bytes and the "object pool" would be more flexible and could support any number of types of objects).

I find that even in C++ I need to create object pools to speed up our code.  Generally this is due to objects that have allocated memory in them, such as vectors.  For example:

(C++)
#include <vector>

class blah {
  std::vector<int> a, b, c, d;
};

I end up making the object declare a reset function so that it can be recycled without paying for the vector reallocations.
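
A rough D equivalent of that reset idea (just a sketch, with made-up names) keeps the arrays' allocated capacity and only empties them:

(D)
struct Blah
{
    int[] a, b, c, d;

    // Drop the contents but keep the already-allocated capacity, so the
    // next use appends without triggering new allocations.
    void reset()
    {
        a.length = 0; a.assumeSafeAppend();
        b.length = 0; b.assumeSafeAppend();
        c.length = 0; c.assumeSafeAppend();
        d.length = 0; d.assumeSafeAppend();
    }
}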

This is definitely a useful pattern to not have to rewrite.

Jerry
January 04, 2014
On 12/29/2013 02:47 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com>" wrote:
> On Sunday, 29 December 2013 at 20:36:27 UTC, Walter Bright wrote:
>> I'll reiterate that the GC will NEVER EVER pause your program unless
>> you are actually calling the GC to allocate memory. A loop that does
>> not GC allocate WILL NEVER PAUSE.
>
> That's fine, except when you have real-time threads.
>
I'm always astounded by how often real-time anything gets thrown around.

Yeah sure, some programs are under time constraints.  But from the way everyone freaked out you'd think *all* programs are real-time.

But in fact it's a very small subset.  Hell, it's small enough to be a *special case*.

> So unless you use non-temporal load/save in your GC traversal (e.g. on
> x86 you have SSE instructions that bypass the cache), your GC might
> trash the cache for other cores that run real-time threads which are
> initiated as call-backs from the OS.

Awesome.  So now to run a real-time application, you can't have any program that uses a GC running on the same machine.
Somehow I don't think that is grounded in reality.


> These callbacks might happen 120+ times per second and your runtime
> cannot control those, they have the highest user-level priority.
>
> Granted, the latest CPUs have a fair amount of level 3 cache, and the
> most expensive ones might have a big level 4 cache, but I still think it
> is a concern. Level 1 and 2 caches are small: 64KB/128KB.


January 04, 2014
On Saturday, 4 January 2014 at 07:00:28 UTC, 1100110 wrote:
> But in fact it's a very small subset.  Hell, it's small enough to be a *special case*.

No, real-time applications are not a very small subset. Hard real-time applications are a smaller subset, although most audio applications fall into this category because of their performance requirements.

It is, however, THE subset where you need the kind of low-level control that C/C++ provides and which D is supposed to provide. For non-real-time applications you can usually get acceptable performance with higher-level languages, if you pick the right language for the task.

> Awesome.  So now to run a real-time application, you can't have any program that uses a GC running on the same machine.

That depends on the characteristics of the GC (whether it is cache friendly, how many cache lines it touches per iteration) and on the real-time application. If you tune a tabular audio application to full load on a single core and to the L3 cache size, then a fast mark-sweep over a large dataset, frequent cache invalidation in the other threads, and high memory-bus activity on the remaining cores are most certainly not desirable.

But yes, running multiple high-load programs in parallel is indeed problematic for most high-load real-time applications (games, audio software). And to work around that you need lots of extra logic (such as running with approximations on missed frames where possible or doing buffering based on heuristics).
January 08, 2014
On Saturday, 28 December 2013 at 11:13:55 UTC, Barry L. wrote:
> Hello everyone, first post...
>
> Just saw this:  http://joeduffyblog.com/2013/12/27/csharp-for-systems-programming/
>
> D (and Rust) get a mention with this quote:  "There are other candidates early in their lives, too, most notably Rust and D. But hey, my team works at Microsoft, where there is ample C# talent and community just an arm’s length away."

Is there any conclusion about this?
There are 10 pages and most of the discussion is about the D GC…
January 08, 2014
On Wednesday, 8 January 2014 at 22:55:24 UTC, bioinfornatics wrote:
> On Saturday, 28 December 2013 at 11:13:55 UTC, Barry L. wrote:
>> Hello everyone, first post...
>>
>> Just saw this:  http://joeduffyblog.com/2013/12/27/csharp-for-systems-programming/
>>
>> D (and Rust) get a mention with this quote:  "There are other candidates early in their lives, too, most notably Rust and D. But hey, my team works at Microsoft, where there is ample C# talent and community just an arm’s length away."
>
> Is there any conclusion about this?
> There are 10 pages and most of the discussion is about the D GC…

Thank you.

Microsoft might put together a great language for system programming but if it is going to be used outside the Microsoft world, then LLVM will be essential.

GCC has previously been used by processor vendors in order to support languages like C/C++. LLVM is now gradually taking over that role, and I expect LLVM to become the compiler framework of choice. The same really applies to the D language: without LLVM, the D language will not live on.

I don't really know Microsoft's plan here, but I doubt that they will release the source and support LLVM, so I guess wide acceptance of this new language will be limited to Microsoft development only. Then we might see people make an LLVM implementation of M# by themselves; we'll see.
January 09, 2014
On Wednesday, 8 January 2014 at 22:55:24 UTC, bioinfornatics wrote:
>
> Is there any conclusion about this?
> There are 10 pages and most of the discussion is about the D GC…

It is concluded that C (and optionally C++, depending on the speaker) is inherently faster than anything else because C(++) is a "portable assembly language" and therefore it encourages writing fast software.

For example, most real C(++) programmers preallocate large blocks of memory for future use instead of allocating space for single variables like most programmers using the discussed language do. Cache locality gives huge speed gains to the former group, while the latter group gets diabetes because of syntactic sugar. It's also worth noting that C(++) programmers use memory more efficiently because they allocate and deallocate memory only when needed - memory is reclaimed by the OS as fast as possible. This can't be achieved by garbage collection, which frees memory in batches.

C(++) is preferred over assembly because it's just as fast or even faster than what you would write manually, yet it allows you to focus on algorithms and data structures instead of low-level details of the machine like cache locality and memory layout. Yet you still retain full control - by using inline asm you can regain some cycles wasted on adhering to calling conventions, etc. The C(++) macro language is superior to what NASM, FASM and others have to offer - it's much simpler to use than those and it serves as the preferred way of achieving robust compile-time polymorphism.

C(++) is designed to be a simple and fast language. It can be adopted easily on various architectures because of its many undefined behaviors (which leave wiggle-room for implementers) and its lack of a runtime library - you can just use OS calls!

There's no way this language can beat C(++), don't even try to fight with years of tradition. Simply join the cult!
January 09, 2014
On Thursday, 9 January 2014 at 00:20:36 UTC, QAston wrote:

>
> C(++) is designed to be simple [snip]

Good one! ;)