December 29, 2011
Agreed.

There are plenty of real-world, even 'common' examples where being unable to force a function to be inlined is a problem. The main one I've run into is being unable to inline functions that contain assembly, which makes it impossible to implement efficient SIMD operations.
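For comparison, GCC and Clang already let a function that contains inline assembly be force-inlined. The following is a minimal C sketch, assuming GCC-compatible extended asm on an x86/x86-64 target (it falls back to plain C elsewhere). The names asm_add and sum3 are hypothetical. When GCC cannot inline an always_inline function, it typically reports an error instead of silently emitting a call.

```c
/* Hypothetical helper: adds two ints. always_inline asks GCC/Clang to
 * inline it into every caller even though it contains inline asm. */
static inline __attribute__((always_inline)) int asm_add(int a, int b)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Extended asm: the compiler picks the registers for both operands,
     * so no fixed calling convention is imposed on the caller.
     * "cc" tells the compiler that the flags register is modified. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    /* Portable fallback for non-x86 targets. */
    return a + b;
#endif
}

/* Both calls are inlined here, so no call overhead remains. */
int sum3(int x, int y, int z)
{
    return asm_add(asm_add(x, y), z);
}
```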
December 29, 2011
On 29/12/11 11:19 AM, Vladimir Panteleev wrote:
> On Thursday, 29 December 2011 at 09:16:23 UTC, Walter Bright wrote:
>> Are you a ridiculous hacker? Inline x86 assembly that the compiler
>> actually understands in 32 AND 64 bit code, hex string literals like
>> x"DE ADB EEF" where spacing doesn't matter, the ability to set data
>> alignment cross-platform with type.alignof = 16, load your shellcode
>> verbatim into a string like so: auto str = import("shellcode.txt");
>
> I would like to talk about this for a bit. Personally, I think D's
> system programming abilities are only half-way there. Note that I am not
> talking about use cases in high-level application code, but rather
> low-level, widely-used framework code, where every bit of performance
> matters (for example: memory copy routines, string builders, garbage
> collectors).
>
> Inline assembler as part of the language is certainly neat, and in fact,
> coming from Delphi to C++, I was surprised to learn that C++
> implementations adopted different syntaxes for asm blocks. However,
> compared to some C++ compilers, it has severe limitations, and it is
> D's only tool in this area.
>
> For one thing, there is no way to force the compiler to inline a
> function (like __forceinline / __attribute__((always_inline))). This is
> fine for high-level code (where users are best left with PGO and "the
> compiler knows best"), but sucks if you need a guarantee that the
> function must be inlined. The guarantee isn't just about inlining
> heuristics, but also implementation capabilities. For example, some
> implementations might not be able to inline functions that use certain
> language features, and your code's performance could depend on such a
> short function being inlined. One example of this is inlining
> functions containing asm blocks - IIRC DMD does not support this. The
> compiler should fail the build if it can't inline a function tagged with
> @forceinline, instead of shrugging it off and failing silently, forcing
> users to check the disassembly every time.
>
> You may have noticed that GCC has some ridiculously complicated
> assembler facilities. However, they also open the way to the
> possibilities of writing optimal code - for example, creating custom
> calling conventions, or inlining assembler functions without restricting
> the caller's register allocation with a predetermined calling
> convention. In contrast, DMD is very conservative when it comes to
> mixing D and assembler. One time I found that putting an asm block in a
> function turned what were single instructions into blocks of 6
> instructions each.
>
> D's shortcomings in this area make it impossible to create language
> features that are on par with D's compiler built-ins. For example, I have
> tested three memcpy implementations recently, but none of them could
> beat DMD's standard array slice copy (even though in release mode it
> compiles to a simple memcpy call). Why? Because the overhead of using a
> custom memcpy routine negated its performance gains.
>
> This might have been alleviated with the presence of sane macros, but no
> such luck. String mixins are not the answer: trying to translate
> macro-heavy C code to D using string mixins is string escape hell, and
> we're back to the level of shell scripts.
>
> We've discussed this topic on IRC recently. From what I understood,
> Andrei thinks improvements in this area are not "impactful" enough,
> which I find worrisome.
>
> Personally, I don't think D qualifies as a true "system programming
> language" in light of the above. It's more of a compiled language with
> pointers and assembler. Before you disagree with any of the above, I'd
> like to invite you, for starters, to translate Daniel Vik's C memcpy
> implementation to D:
> http://www.danielvik.com/2010/02/fast-memcpy-in-c.html . It doesn't even
> use inline assembler or compiler intrinsics.

+1

Also: vector intrinsics.

Also: alignment specifications (not just member variables).

The lack of both these things is currently causing me much pain :-( Manually aligning things gets tiresome after a while.
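To illustrate both points, here is a C sketch. C11 alignas can be applied to an ordinary variable, not just a struct member, and align_up shows the manual pointer rounding that becomes tiresome when no such specifier exists. The names g_vec, align_up, aligned_view and is_aligned are all hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* C11: request 16-byte alignment on a plain (non-member) variable. */
static alignas(16) float g_vec[4];

/* The manual alternative: round an address up to the next multiple of
 * align, which must be a power of two. */
static inline uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical helper: returns a 16-byte-aligned pointer inside buf.
 * The caller must over-allocate buf by at least 15 bytes. */
float *aligned_view(unsigned char *buf)
{
    return (float *)align_up((uintptr_t)buf, 16);
}

/* Returns nonzero if p is a multiple of align. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

The manual route works, but every buffer needs extra slack plus a rounding step. The specifier lets the compiler handle both.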
December 29, 2011
Vladimir Panteleev:

> One example of this is inlining functions containing asm blocks - IIRC DMD does not support this. The compiler should fail the build if it can't inline a function tagged with @forceinline, instead of shrugging it off and failing silently, forcing users to check the disassembly every time.

Right.


> You may have noticed that GCC has some ridiculously complicated assembler facilities. However, they also open the way to the possibilities of writing optimal code - for example, creating custom calling conventions, or inlining assembler functions without restricting the caller's register allocation with a predetermined calling convention. In contrast, DMD is very conservative when it comes to mixing D and assembler. One time I found that putting an asm block in a function turned what were single instructions into blocks of 6 instructions each.

LDC has a way to inline functions containing asm, and it also has asm expressions. DMD should have both too. I have been saying this for two or three years.

Bye,
bearophile
December 29, 2011
On 2011-12-29 11:15, Caligo wrote:
>
>
> On Thu, Dec 29, 2011 at 3:16 AM, Walter Bright
> <newshound2@digitalmars.com> wrote:
>
>     http://pastebin.com/AtuzJqh0
>
>
> This is somewhat of a serious question:  If there is a God (I'm not
> saying there isn't, and I'm not saying there is), what language would he
> choose to create the universe?  It would be hard for us mortals to
> imagine, but would it more closely resemble a functional programming
> language or something else?  And what type of hardware would the code
> run on?  I mean, there are computations happening all around us, e.g.,
> when an apple falls or planets circle the sun, so what's performing all the
> computation?

Servers in the cloud of course :)

-- 
/Jacob Carlborg
December 29, 2011
Kapps Wrote:

> Agreed.
> 
> There are plenty of real-world, even 'common' examples where being unable to force a function to be inlined is a problem. The main one I've run into is being unable to inline functions that contain assembly, which makes it impossible to implement efficient SIMD operations.

The problem is not just inlining but also needless loads and stores at the beginnings and ends of asm blocks. For example in the following code:

void test(ref V a, ref V b)
{
    asm
    {
        movaps XMM0, a;
        addps  XMM0, b;
        movaps a, XMM0;
    }
    asm
    {
        movaps XMM0, a;
        addps  XMM0, b;
        movaps a, XMM0;
    }
}

compiles to:


   0:   55                      push   %rbp
   1:   48 8b ec                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   48 89 7d f0             mov    %rdi,-0x10(%rbp)
   c:   48 89 75 f8             mov    %rsi,-0x8(%rbp)
  10:   0f 28 45 f8             movaps -0x8(%rbp),%xmm0
  14:   0f 58 45 f0             addps  -0x10(%rbp),%xmm0
  18:   0f 29 45 f8             movaps %xmm0,-0x8(%rbp)
  1c:   0f 28 45 f8             movaps -0x8(%rbp),%xmm0
  20:   0f 58 45 f0             addps  -0x10(%rbp),%xmm0
  24:   0f 29 45 f8             movaps %xmm0,-0x8(%rbp)
  28:   48 8b e5                mov    %rbp,%rsp
  2b:   5d                      pop    %rbp
  2c:   c3                      retq

The needless loads and stores would make it impossible to write an efficient SIMD add function even if functions containing asm blocks could be inlined.
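For contrast, here is a C sketch of the same double add written with GCC-style extended asm. It assumes GCC or Clang on x86/x86-64 with SSE, and the names asm_addps and add_twice are hypothetical. The operands are declared as register constraints ("x" means any XMM register), not as memory references, so the compiler is free to keep the intermediate value in a register between the two asm statements. Whether it actually does so depends on the optimizer.

```c
#if defined(__x86_64__) || (defined(__i386__) && defined(__SSE__))
#include <xmmintrin.h>

/* Hypothetical helper: returns a + b on four packed floats.
 * "+x" / "x" let the compiler choose the XMM registers. */
static inline __m128 asm_addps(__m128 a, __m128 b)
{
    __asm__("addps %1, %0" : "+x"(a) : "x"(b));
    return a;
}

/* Mirrors the two asm blocks above: a += b, twice. */
void add_twice(float a[4], const float b[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    va = asm_addps(va, vb);
    va = asm_addps(va, vb);
    _mm_storeu_ps(a, va);
}
#else
/* Portable fallback for non-x86 targets, same evaluation order. */
void add_twice(float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++)
        a[i] = (a[i] + b[i]) + b[i];
}
#endif
```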
December 29, 2011
On 29.12.2011 12:19, Vladimir Panteleev wrote:
> On Thursday, 29 December 2011 at 09:16:23 UTC, Walter Bright wrote:
>> Are you a ridiculous hacker? Inline x86 assembly that the compiler
>> actually understands in 32 AND 64 bit code, hex string literals like
>> x"DE ADB EEF" where spacing doesn't matter, the ability to set data
>> alignment cross-platform with type.alignof = 16, load your shellcode
>> verbatim into a string like so: auto str = import("shellcode.txt");
>
> I would like to talk about this for a bit. Personally, I think D's
> system programming abilities are only half-way there. Note that I am not
> talking about use cases in high-level application code, but rather
> low-level, widely-used framework code, where every bit of performance
> matters (for example: memory copy routines, string builders, garbage
> collectors).
>
> Inline assembler as part of the language is certainly neat, and in fact,
> coming from Delphi to C++, I was surprised to learn that C++
> implementations adopted different syntaxes for asm blocks. However,
> compared to some C++ compilers, it has severe limitations, and it is
> D's only tool in this area.
>
> For one thing, there is no way to force the compiler to inline a
> function (like __forceinline / __attribute__((always_inline))).
[snip]
> Personally, I don't think D qualifies as a true "system programming
> language" in light of the above. It's more of a compiled language with
> pointers and assembler.

I don't think the situation is any different with DMC. I think that if D isn't a systems programming language, neither is C or C++ without vendor-specific extensions.

But it doesn't really matter -- the main conclusion is still correct: D is missing some features which could improve performance considerably.

> Before you disagree with any of the above, I'd like to invite you, for
> starters, to translate Daniel Vik's C memcpy implementation to D:
> http://www.danielvik.com/2010/02/fast-memcpy-in-c.html . It doesn't even
> use inline assembler or compiler intrinsics.

Note that the memcpy described there is _far_ from optimal. Memcpy is all about cache efficiency. DMD translates memcpy to the single instruction "rep movsd", which you'd think would be optimal, but you can actually beat it by a factor of four or more for long lengths.
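Vik's routine is not reproduced here. As a rough illustration of a common building block of fast C memcpy routines, copying one machine word at a time instead of one byte at a time, here is a simplified C sketch. It only takes the word path when source and destination share the same alignment offset, whereas the linked article also handles mismatched alignment. The names word_memcpy and word_t are hypothetical, and may_alias is a GCC/Clang extension used to keep the word-sized accesses legal under strict aliasing. As Don notes, a loop like this does nothing about cache behavior for long copies.

```c
#include <stddef.h>
#include <stdint.h>

/* Word type that may alias any object (GCC/Clang extension). */
typedef size_t __attribute__((may_alias)) word_t;

void *word_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t w = sizeof(word_t);

    if (((uintptr_t)d % w) == ((uintptr_t)s % w)) {
        /* Copy leading bytes until both pointers are word-aligned. */
        while (n > 0 && ((uintptr_t)d % w) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Bulk copy one machine word at a time. */
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;
        for (; n >= w; n -= w)
            *dw++ = *sw++;
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }
    /* Remaining tail, or the whole buffer if alignments differ. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```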
December 29, 2011
On Thursday, 29 December 2011 at 14:44:45 UTC, Don wrote:
>> http://www.danielvik.com/2010/02/fast-memcpy-in-c.html . It doesn't even
>> use inline assembler or compiler intrinsics.
>
> Note that the memcpy described there is _far_ from optimal. Memcpy is all about cache efficiency. DMD translates memcpy to the single instruction "rep movsd", which you'd think would be optimal, but you can actually beat it by a factor of four or more for long lengths.

I've never seen DMD emit rep movsd. Does rep movsd even make sense when the memory areas do not have the same alignment? The memcpy in snn.lib does contain a rep movsd instruction, but there is a lot of other code around it (including what looks like Duff's device).
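For reference, Duff's device is a C idiom that unrolls a copy loop eight times and uses a switch to jump into the middle of the loop to handle the remainder. Here is a minimal C sketch of a byte-copying variant. The name duff_copy is hypothetical, and this is not the code from snn.lib.

```c
#include <stddef.h>

/* Copies count bytes using Duff's device (8-way unrolled loop). */
void duff_copy(unsigned char *to, const unsigned char *from, size_t count)
{
    if (count == 0)
        return;                      /* the do-while would run at least once */
    size_t n = (count + 7) / 8;      /* number of passes through the loop */
    switch (count % 8) {             /* jump in to copy the remainder first */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```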
December 29, 2011
On 12/29/11 4:15 AM, Caligo wrote:
>
>
> On Thu, Dec 29, 2011 at 3:16 AM, Walter Bright
> <newshound2@digitalmars.com> wrote:
>
>     http://pastebin.com/AtuzJqh0
>
>
> This is somewhat of a serious question:  If there is a God (I'm not
> saying there isn't, and I'm not saying there is), what language would he
> choose to create the universe?  It would be hard for us mortals to
> imagine, but would it more closely resemble a functional programming
> language or something else?  And what type of hardware would the code
> run on?  I mean, there are computations happening all around us, e.g.,
> when an apple falls or planets circle the sun, so what's performing all the
> computation?

Obligatory: http://xkcd.com/224/

Andrei
December 29, 2011
On Thursday, 29 December 2011 at 14:44:45 UTC, Don wrote:
> I don't think the situation is any different with DMC. I think that if D isn't a systems programming language, neither is C or C++ without vendor-specific extensions.

You're right... I've never extensively used a C/C++ compiler without similar extensions, though. The fact that major vendors have each come up with their own extensions for many of the same features suggests that those features might have been better off standardized.
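Forced inlining is a concrete example of that fragmentation. Portable C code usually has to map one name onto each vendor's spelling. Here is a minimal sketch. FORCE_INLINE and clamp_int are hypothetical names, and the last branch is only a best-effort hint.

```c
/* Hypothetical portability macro: each vendor spells "force inline"
 * differently, so one name is mapped onto the available extension. */
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  /* no guarantee: plain inline is a hint */
#endif

static FORCE_INLINE int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}
```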
December 29, 2011
On 12/29/11 2:13 PM, a wrote:
> void test(ref V a, ref V b)
> {
>      asm
>      {
>          movaps XMM0, a;
>          addps  XMM0, b;
>          movaps a, XMM0;
>      }
>      asm
>      {
>          movaps XMM0, a;
>          addps  XMM0, b;
>          movaps a, XMM0;
>      }
> }
>
> […]
>
> The needless loads and stores would make it impossible to write an efficient SIMD add function even if functions containing asm blocks could be inlined.

Yes, this is indeed a problem, and as far as I'm aware, it is usually solved in the gamedev world by using the (SSE) intrinsics your favorite C++ compiler provides instead of resorting to inline asm.

David
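Here is a C sketch of the intrinsic-based approach described above. It uses the SSE intrinsics from xmmintrin.h, falls back to a plain loop on non-SSE targets, and the function name add4 is hypothetical. The compiler, not the programmer, allocates registers here, so once add4 is inlined it is free to keep values in XMM registers across consecutive calls.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>

/* a += b on four floats using SSE intrinsics. Unaligned loads/stores
 * are used so callers need not guarantee 16-byte alignment. */
void add4(float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(a, _mm_add_ps(va, vb));
}
#else
/* Portable fallback for targets without SSE. */
void add4(float *a, const float *b)
{
    for (int i = 0; i < 4; i++)
        a[i] += b[i];
}
#endif
```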