December 29, 2011
Re: System programming in D (Was: The God Language)
On 12/29/11 2:29 PM, Walter Bright wrote:
> On 12/29/2011 11:47 AM, Walter Bright wrote:
>> On 12/29/2011 3:19 AM, Vladimir Panteleev wrote:
>>> I'd like to invite you to translate Daniel Vik's C memcpy
>>> implementation to D:
>>> http://www.danielvik.com/2010/02/fast-memcpy-in-c.html
>>
>> Challenge accepted.
>
> Here's another version that uses string mixins to ensure inlining of the
> COPY functions. There are no call instructions in the generated code.
> This should be as good as the C version using the same code generator.
[snip]

In other news, TAB has died with Kim-Jong Il. Please stop using it.

Andrei
December 29, 2011
Re: System programming in D (Was: The God Language)
David Nadlinger Wrote:

> On 12/29/11 2:13 PM, a wrote:
> > void test(ref V a, ref V b)
> > {
> >      asm
> >      {
> >          movaps XMM0, a;
> >          addps  XMM0, b;
> >          movaps a, XMM0;
> >      }
> >      asm
> >      {
> >          movaps XMM0, a;
> >          addps  XMM0, b;
> >          movaps a, XMM0;
> >      }
> > }
> >
> > […]
> >
> > The needless loads and stores would make it impossible to write an efficient simd add function even if the functions containing asm blocks could be inlined.
> 
> Yes, this is indeed a problem, and as far as I'm aware, usually solved 
> in the gamedev world by using the (SSE) intrinsics your favorite C++ 
> compiler provides, instead of resorting to inline asm.
> 
> David

IIRC Walter doesn't want to add vector intrinsics, so it would be nice if the functions to do vector operations could be efficiently written using inline assembly. It would also be a more general solution than having intrinsics. Something like that is possible with gcc extended inline assembly. For example, this:

typedef float v4sf __attribute__((vector_size(16)));

void vadd(v4sf *a, v4sf *b)
{
   asm(
       "addps %1, %0" 
       : "=x" (*a) 
       : "x" (*b), "0" (*a)
       : );
}

void test(float * __restrict__ a, float * __restrict__ b)
{
   v4sf * va = (v4sf*) a;
   v4sf * vb = (v4sf*) b;
   vadd(va,vb);
   vadd(va,vb);
   vadd(va,vb);
   vadd(va,vb);
}

compiles to:

00000000004004c0 <test>:
 4004c0:       0f 28 0e                movaps (%rsi),%xmm1
 4004c3:       0f 28 07                movaps (%rdi),%xmm0
 4004c6:       0f 58 c1                addps  %xmm1,%xmm0
 4004c9:       0f 58 c1                addps  %xmm1,%xmm0
 4004cc:       0f 58 c1                addps  %xmm1,%xmm0
 4004cf:       0f 58 c1                addps  %xmm1,%xmm0
 4004d2:       0f 29 07                movaps %xmm0,(%rdi)

This should also be possible with GDC, but I couldn't figure out how to get something like __restrict__ (if you want to use vector types and gcc extended inline assembly with GDC, see http://www.digitalmars.com/d/archives/D/gnu/Support_for_gcc_vector_attributes_SIMD_builtins_3778.html and https://bitbucket.org/goshawk/gdc/wiki/UserDocumentation).
December 29, 2011
Re: System programming in D (Was: The God Language)
Walter Bright Wrote:

> On 12/29/2011 5:13 AM, a wrote:
> > The needless loads and stores would make it impossible to write an efficient
> > simd add function even if the functions containing asm blocks could be
> > inlined.
> 
> This does what you're asking for:
> 
> void test(ref float a, ref float b)
> {
>      asm
>      {
>          naked;
>          movaps  XMM0,[RSI];
>          addps   XMM0,[RDI];
>          movaps  [RSI],XMM0;
>          movaps  XMM0,[RSI];
>          addps   XMM0,[RDI];
>          movaps  [RSI],XMM0;
>          ret;
>      }
> }

What I want is to be able to write short functions using inline assembly and have them inlined and compiled even to a single instruction where possible. This can be done with gcc. See my post here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=153879
December 29, 2011
Re: System programming in D (Was: The God Language)
On 12/29/2011 2:52 PM, a wrote:
> What I want is to be able to write short functions using inline assembly and
> have them inlined and compiled even to a single instruction where possible.
> This can be done with gcc. See my post here:
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=153879

I understand. I just wished to make sure you knew about 'naked' and what good it 
was for.
December 29, 2011
Re: System programming in D (Was: The God Language)
On 12/29/2011 12:19 PM, Vladimir Panteleev wrote:
> Before you disagree with any of the above, first
> (for starters) I'd like to invite you to translate Daniel Vik's C memcpy
> implementation to D:
> http://www.danielvik.com/2010/02/fast-memcpy-in-c.html . It doesn't even
> use inline assembler or compiler intrinsics.

Ok, I have performed a direct translation (with all the preprocessor 
stuff replaced by string mixins). However, I think I could do a lot 
better starting from scratch in D. I have performed some basic testing 
with all the configuration options, and it seems to work correctly.

// File: memcpy.d direct translation of memcpy.c

/********************************************************************
 ** File:     memcpy.c
 **
 ** Copyright (C) 1999-2010 Daniel Vik
 **
 ** This software is provided 'as-is', without any express or implied
 ** warranty. In no event will the authors be held liable for any
 ** damages arising from the use of this software.
 ** Permission is granted to anyone to use this software for any
 ** purpose, including commercial applications, and to alter it and
 ** redistribute it freely, subject to the following restrictions:
 **
 ** 1. The origin of this software must not be misrepresented; you
 **    must not claim that you wrote the original software. If you
 **    use this software in a product, an acknowledgment in the
 **    product documentation would be appreciated but is not
 **    required.
 **
 ** 2. Altered source versions must be plainly marked as such, and
 **    must not be misrepresented as being the original software.
 **
 ** 3. This notice may not be removed or altered from any source
 **    distribution.
 **
 **
 ** Description: Implementation of the standard library function memcpy.
 **             This implementation of memcpy() is ANSI-C89 compatible.
 **
 **             The following configuration options can be set:
 **
 **           LITTLE_ENDIAN   - Uses processor with little endian
 **                             addressing. Default is big endian.
 **
 **           PRE_INC_PTRS    - Use pre increment of pointers.
 **                             Default is post increment of
 **                             pointers.
 **
 **           INDEXED_COPY    - Copying data using array indexing.
 **                             Using this option, disables the
 **                             PRE_INC_PTRS option.
 **
 **           MEMCPY_64BIT    - Compiles memcpy for 64 bit
 **                             architectures
 **
 **
 ** Best Settings:
 **
 ** Intel x86:  LITTLE_ENDIAN and INDEXED_COPY
 **
 *******************************************************************/


/********************************************************************
 ** Configuration definitions.
 *******************************************************************/

version = LITTLE_ENDIAN;
version = INDEXED_COPY;


/********************************************************************
 ** Includes for size_t definition
 *******************************************************************/

/********************************************************************
 ** Typedefs
 *******************************************************************/

version(MEMCPY_64BIT){
    // D predefines D_LP64 but has no D_LP32 identifier, so test for its absence.
    version(D_LP64){} else static assert(0, "not a 64 bit compile");
}
version(D_LP64){
    alias ulong              UIntN;
    enum TYPE_WIDTH =        8;
}else{
    alias uint               UIntN;
    enum TYPE_WIDTH =        4;
}


/********************************************************************
 ** Remove definitions when INDEXED_COPY is defined.
 *******************************************************************/

version(INDEXED_COPY){
    version(PRE_INC_PTRS)
        static assert(0, "cannot use INDEXED_COPY together with PRE_INC_PTRS!");
}

/********************************************************************
 ** The X template
 *******************************************************************/

string Ximpl(string x){
    import utf = std.utf;
    string r=`"`;
    for(typeof(x.length) i=0;i<x.length;r~=x[i..i+utf.stride(x,i)],i+=utf.stride(x,i)){
        if(x[i]=='@'&&x[i+1]=='('){
            auto start = ++i; int nest=1;
            while(nest){
                i+=utf.stride(x,i);
                if(x[i]=='(') nest++;
                else if(x[i]==')') nest--;
            }
            i++;
            r~=`"~`~x[start..i]~`~"`;
            if(i==x.length) break;
        }
        if(x[i]=='"'||x[i]=='\\'){r~="\\"; continue;}
    }
    return r~`"`;
}

template X(string x){
    enum X = Ximpl(x);
}


/********************************************************************
 ** Definitions for pre and post increment of pointers.
 *******************************************************************/

// uses *(*&x)++ and similar to work around a bug in the parser

version(PRE_INC_PTRS){
    string START_VAL(string x)           {return mixin(X!q{(*&@(x))--;});}
    string INC_VAL(string x)             {return mixin(X!q{*++(*&@(x))});}
    string CAST_TO_U8(string p, string o){
        return mixin(X!q{(cast(ubyte*)@(p) + @(o) + TYPE_WIDTH)});
    }
    enum WHILE_DEST_BREAK  =                     (TYPE_WIDTH - 1);
    enum PRE_LOOP_ADJUST   =                     q{- (TYPE_WIDTH - 1)};
    enum PRE_SWITCH_ADJUST =                     q{+ 1};
}else{
    string START_VAL(string x)           {return q{};}
    string INC_VAL(string x)             {return mixin(X!q{*(*&@(x))++});}
    string CAST_TO_U8(string p, string o){
        return mixin(X!q{(cast(ubyte*)@(p) + @(o))});
    }
    enum WHILE_DEST_BREAK  =                     0;
    enum PRE_LOOP_ADJUST   =                     q{};
    enum PRE_SWITCH_ADJUST =                     q{};
}




/********************************************************************
 ** Definitions for endians
 *******************************************************************/

version(LITTLE_ENDIAN){
    enum SHL = q{>>};
    enum SHR = q{<<};
}else{
    enum SHL = q{<<};
    enum SHR = q{>>};
}

/********************************************************************
 ** Macros for copying words of different alignment.
 ** Uses incrementing pointers.
 *******************************************************************/

string CP_INCR() {
    return mixin(X!q{
        @(INC_VAL(q{dstN})) = @(INC_VAL(q{srcN}));
    });
}

string CP_INCR_SH(string shl, string shr) {
    return mixin(X!q{
        dstWord   = srcWord @(SHL) @(shl);
        srcWord   = @(INC_VAL(q{srcN}));
        dstWord  |= srcWord @(SHR) @(shr);
        @(INC_VAL(q{dstN})) = dstWord;
    });
}



/********************************************************************
 ** Macros for copying words of different alignment.
 ** Uses array indexes.
 *******************************************************************/

string CP_INDEX(string idx) {
    return mixin(X!q{
        dstN[@(idx)] = srcN[@(idx)];
    });
}

string CP_INDEX_SH(string x, string shl, string shr) {
    return mixin(X!q{
        dstWord   = srcWord @(SHL) @(shl);
        srcWord   = srcN[@(x)];
        dstWord  |= srcWord @(SHR) @(shr);
        dstN[@(x)]= dstWord;
    });
}



/********************************************************************
 ** Macros for copying words of different alignment.
 ** Uses incrementing pointers or array indexes depending on
 ** configuration.
 *******************************************************************/

version(INDEXED_COPY){
    alias CP_INDEX CP;
    alias CP_INDEX_SH CP_SH;
    string INC_INDEX(string p, string o){
        return mixin(X!q{
            ((@(p)) += (@(o)));
        });
    }
}else{
    string CP(string idx) {return mixin(X!q{@(CP_INCR())});}
    string CP_SH(string idx, string shl, string shr){
        return mixin(X!q{
            @(CP_INCR_SH(mixin(X!q{@(shl)}), mixin(X!q{@(shr)})));
        });
    }
    string INC_INDEX(string p, string o){return q{};}
}


string COPY_REMAINING(string count) {
    return mixin(X!q{
        @(START_VAL(q{dst8}));
        @(START_VAL(q{src8}));

        switch (@(count)) {
        case 7: @(INC_VAL(q{dst8})) = @(INC_VAL(q{src8}));
        case 6: @(INC_VAL(q{dst8})) = @(INC_VAL(q{src8}));
        case 5: @(INC_VAL(q{dst8})) = @(INC_VAL(q{src8}));
        case 4: @(INC_VAL(q{dst8})) = @(INC_VAL(q{src8}));
        case 3: @(INC_VAL(q{dst8})) = @(INC_VAL(q{src8}));
        case 2: @(INC_VAL(q{dst8})) = @(INC_VAL(q{src8}));
        case 1: @(INC_VAL(q{dst8})) = @(INC_VAL(q{src8}));
        case 0:
        default: break;
        }
    });
}

string COPY_NO_SHIFT() {
    return mixin(X!q{
        UIntN* dstN = cast(UIntN*)(dst8 @(PRE_LOOP_ADJUST));
        UIntN* srcN = cast(UIntN*)(src8 @(PRE_LOOP_ADJUST));
        size_t length = count / TYPE_WIDTH;

        while (length & 7) {
            @(CP_INCR());
            length--;
        }

        length /= 8;

        while (length--) {
            @(CP(q{0}));
            @(CP(q{1}));
            @(CP(q{2}));
            @(CP(q{3}));
            @(CP(q{4}));
            @(CP(q{5}));
            @(CP(q{6}));
            @(CP(q{7}));

            @(INC_INDEX(q{dstN}, q{8}));
            @(INC_INDEX(q{srcN}, q{8}));
        }

        src8 = @(CAST_TO_U8(q{srcN}, q{0}));
        dst8 = @(CAST_TO_U8(q{dstN}, q{0}));

        @(COPY_REMAINING(q{count & (TYPE_WIDTH - 1)}));

        return dest;
    });
}



string COPY_SHIFT(string shift) {
    return mixin(X!q{
        UIntN* dstN  = cast(UIntN*)(((cast(UIntN)dst8) @(PRE_LOOP_ADJUST)) &
                                    ~(TYPE_WIDTH - 1));
        UIntN* srcN  = cast(UIntN*)(((cast(UIntN)src8) @(PRE_LOOP_ADJUST)) &
                                    ~(TYPE_WIDTH - 1));
        size_t length  = count / TYPE_WIDTH;
        UIntN srcWord = @(INC_VAL(q{srcN}));
        UIntN dstWord;

        while (length & 7) {
            @(CP_INCR_SH(mixin(X!q{8 * @(shift)}), mixin(X!q{8 * (TYPE_WIDTH - @(shift))})));
            length--;
        }

        length /= 8;

        while (length--) {
            @(CP_SH(q{0}, mixin(X!q{8 * @(shift)}), mixin(X!q{8 * (TYPE_WIDTH - @(shift))})));
            @(CP_SH(q{1}, mixin(X!q{8 * @(shift)}), mixin(X!q{8 * (TYPE_WIDTH - @(shift))})));
            @(CP_SH(q{2}, mixin(X!q{8 * @(shift)}), mixin(X!q{8 * (TYPE_WIDTH - @(shift))})));
            @(CP_SH(q{3}, mixin(X!q{8 * @(shift)}), mixin(X!q{8 * (TYPE_WIDTH - @(shift))})));
            @(CP_SH(q{4}, mixin(X!q{8 * @(shift)}), mixin(X!q{8 * (TYPE_WIDTH - @(shift))})));
            @(CP_SH(q{5}, mixin(X!q{8 * @(shift)}), mixin(X!q{8 * (TYPE_WIDTH - @(shift))})));
            @(CP_SH(q{6}, mixin(X!q{8 * @(shift)}), mixin(X!q{8 * (TYPE_WIDTH - @(shift))})));
            @(CP_SH(q{7}, mixin(X!q{8 * @(shift)}), mixin(X!q{8 * (TYPE_WIDTH - @(shift))})));

            @(INC_INDEX(q{dstN}, q{8}));
            @(INC_INDEX(q{srcN}, q{8}));
        }

        src8 = @(CAST_TO_U8(q{srcN}, mixin(X!q{(@(shift) - TYPE_WIDTH)})));
        dst8 = @(CAST_TO_U8(q{dstN}, q{0}));

        @(COPY_REMAINING(q{count & (TYPE_WIDTH - 1)}));

        return dest;
    });
}


/********************************************************************
 **
 ** void *memcpy(void *dest, const void *src, size_t count)
 **
 ** Args:     dest        - pointer to destination buffer
 **           src         - pointer to source buffer
 **           count       - number of bytes to copy
 **
 ** Return:   A pointer to destination buffer
 **
 ** Purpose:  Copies count bytes from src to dest.
 **           No overlap check is performed.
 **
 *******************************************************************/

void *memcpy(void *dest, const void *src, size_t count)
{
    ubyte* dst8 = cast(ubyte*)dest;
    ubyte* src8 = cast(ubyte*)src;
    if (count < 8) {
        mixin(COPY_REMAINING(q{count}));
        return dest;
    }

    mixin(START_VAL(q{dst8}));
    mixin(START_VAL(q{src8}));

    while ((cast(UIntN)dst8 & (TYPE_WIDTH - 1)) != WHILE_DEST_BREAK) {
        mixin(INC_VAL(q{dst8})) = mixin(INC_VAL(q{src8}));
        count--;
    }
    switch ((mixin(`(cast(UIntN)src8)`~ PRE_SWITCH_ADJUST)) & (TYPE_WIDTH - 1)) {
    // { } required to work around DMD bug
    case 0: {mixin(COPY_NO_SHIFT());} break;
    case 1: {mixin(COPY_SHIFT(q{1}));}   break;
    case 2: {mixin(COPY_SHIFT(q{2}));}   break;
    case 3: {mixin(COPY_SHIFT(q{3}));}   break;
static if(TYPE_WIDTH > 4){ // was TYPE_WIDTH >= 4. bug in original code.
    case 4: {mixin(COPY_SHIFT(q{4}));}   break;
    case 5: {mixin(COPY_SHIFT(q{5}));}   break;
    case 6: {mixin(COPY_SHIFT(q{6}));}   break;
    case 7: {mixin(COPY_SHIFT(q{7}));}   break;
}
    default: assert(0);
    }
}


void main(){
    int[13] x = [1,2,3,4,5,6,7,8,9,0,1,2,3];
    int[13] y;
    memcpy(y.ptr, x.ptr, x.sizeof);
    import std.stdio;   writeln(y);
}
December 30, 2011
Re: System programming in D (Was: The God Language)
On Thursday, 29 December 2011 at 19:47:39 UTC, Walter Bright 
wrote:
> On 12/29/2011 3:19 AM, Vladimir Panteleev wrote:
>> I'd like to invite you to translate Daniel Vik's C memcpy 
>> implementation to D:
>> http://www.danielvik.com/2010/02/fast-memcpy-in-c.html
>
> Challenge accepted.

Ah, a direct translation using functions! This is probably the 
most elegant approach, however - as I'm sure you've noticed - the 
programmer has no control over what gets inlined.

> Examining the assembler output, it inlines everything except 
> COPY_SHIFT, COPY_NO_SHIFT, and COPY_REMAINING. The inliner in 
> dmd could definitely be improved, but that is not a problem 
> with the language, but the implementation.

This is the problem with heuristic inlining: while great by 
itself, in a position such as this the programmer is left with no 
choice but to examine the assembler output to make sure the 
compiler does what the programmer wants it to do. Such behavior 
can change from one implementation to another, and even from one 
compiler version to another. (After all, I don't think that we 
can guarantee that what's inlined today, will be inlined 
tomorrow.)

> Continuing in that vein, please note that neither C nor C++ 
> require inlining of any sort. The "inline" keyword is merely a 
> hint to the compiler. What inlining takes place is completely 
> implementation defined, not language defined.

I think we can agree that the C inline hint is of limited use. 
However, major C compiler vendors implement an extension to force 
inlining. Generally, I would say that common vendor extensions 
seen in other languages are an opportunity for D to avoid a 
similar mess: such extensions would not have to be required to be 
implemented, but when they are, they would use the same syntax 
across implementations.
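
A minimal sketch of the kind of vendor extension meant here, assuming GCC/Clang's always_inline attribute and MSVC's __forceinline. FORCE_INLINE and merge_shifted are hypothetical names made up for this illustration, not taken from the thread or from Daniel Vik's code; the helper only mimics the little-endian shift-and-merge step of a misaligned word copy.

```c
#include <stdint.h>

/* FORCE_INLINE is a hypothetical wrapper name chosen for this sketch.
   always_inline (GCC/Clang) and __forceinline (MSVC) are vendor
   extensions, not part of standard C. */
#if defined(_MSC_VER)
#  define FORCE_INLINE static __forceinline
#elif defined(__GNUC__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define FORCE_INLINE static inline  /* falls back to the plain hint */
#endif

/* Little-endian shift-and-merge of two adjacent 32-bit words.
   shift_bytes must be 1, 2 or 3; 0 would shift by the full word width. */
FORCE_INLINE uint32_t merge_shifted(uint32_t prev, uint32_t next,
                                    unsigned shift_bytes)
{
    return (prev >> (8u * shift_bytes)) | (next << (8u * (4u - shift_bytes)));
}
```

The spelling differs per vendor, which is why a wrapper macro is needed at all; a single agreed-upon syntax is what the paragraph above asks D implementations to share.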

> I wish to note that the D version semantically accomplishes the 
> same thing as the C version without using mixins or CTFE - it's 
> all straightforward code, without the abusive preprocessor 
> tricks.

I don't think there's much value in that statement. After all, 
except for a few occasional templates (which weren't strictly 
necessary), your translation uses few D-specific features. If you 
were to leave yourself at the mercy of a C compiler's optimizer, 
your rewrite would merely be a testament against C macros, not 
the power of D.

However, the most important part is: this translation is 
incorrect. C macros in the original code provide a guarantee that 
the code is inlined. D cannot make such guarantees - even your 
amended version is tuned to one specific implementation (and 
possibly, only a specific range of versions of it).
December 30, 2011
Re: System programming in D (Was: The God Language)
On Thursday, 29 December 2011 at 20:58:59 UTC, Timon Gehr wrote:
> I don't think you should use DMD to benchmark the D language.

You're missing my point. We can't count on optimizers in all 
implementations being perfect. I am suggesting language 
features which could provide guarantees to the programmer 
regarding how the code will be compiled. If an implementation 
cannot satisfy them, the programmer should be told so, so he 
could try something else - rather than having to sift through 
disassembler listings or use a profiler.
December 30, 2011
Re: System programming in D (Was: The God Language)
On Thursday, 29 December 2011 at 23:47:08 UTC, Timon Gehr wrote:
> ** The X template

Good work, but I'm not sure if inventing a DSL to make up for the 
problems in D string mixins that C macros don't have qualifies as 
"doing it right".
December 30, 2011
Re: System programming in D (Was: The God Language)
On 12/29/2011 9:51 PM, Vladimir Panteleev wrote:
> Ah, a direct translation using functions! This is probably the most elegant
> approach, however - as I'm sure you've noticed - the programmer has no control
> over what gets inlined.

The programmer also has no control over which variables go into which registers. 
(Early C compilers did provide this.)


> I think we can agree that the C inline hint is of limited use. However, major C
> compiler vendors implement an extension to force inlining.

I know.


> I don't think there's much value in that statement. After all, except for a few
> occasional templates (which weren't strictly necessary), your translation uses
> few D-specific features. If you were to leave yourself at the mercy of a C
> compiler's optimizer, your rewrite would merely be a testament against C macros,
> not the power of D.

I think this criticism is off target, because the C example was almost entirely 
macros - and macros that were used in the service of evading C language 
limitations. The point wasn't to use clever D features, the challenge was to 
demonstrate you can get the same results in D as in C.


> However, the most important part is: this translation is incorrect. C macros in
> the original code provide a guarantee that the code is inlined. D cannot make
> such guarantees - even your amended version is tuned to one specific
> implementation (and possibly, only a specific range of versions of it).

I also think this is off target, because a C compiler really doesn't guarantee 
**** about efficiency, it only guarantees that it will work "as if" it was 
executed on some idealized abstract machine. Even dividing code up into 
functions is completely arbitrary, and open to wildly different strategies that 
are perfectly legal to any C compiler. A C compiler doesn't have to enregister 
anything in variables, either, and that has far more of a performance impact 
than inlining.

There is a very wide range of code generation techniques that compilers employ. 
All of them, to verify that they are being applied, require inspection of the 
assembler output. Many argue that the compiler should tell you about inlining - 
but what about all those others? I think the focus on inlining (as opposed to 
other possible optimizations) is out of proportion, likely exacerbated by dmd 
needing to do a better job of it.

I completely agree that DMD's inliner is underpowered and needs improvement. I 
am less sure that this demonstrates that the language needs changes.

Functions below a certain size should be inlined if possible. Those above that 
size do not benefit perceptibly from inlining. Where that certain size exactly 
is, who knows, but I doubt that functions near that size will benefit much from 
user intervention.
December 30, 2011
Re: System programming in D (Was: The God Language)
On Friday, 30 December 2011 at 06:53:06 UTC, Walter Bright wrote:
> I think this criticism is off target, because the C example was 
> almost entirely macros - and macros that were used in the 
> service of evading C language limitations. The point wasn't to 
> use clever D features, the challenge was to demonstrate you can 
> get the same results in D as in C.

...

> I also think this is off target, because a C compiler really 
> doesn't guarantee **** about efficiency, it only guarantees 
> that it will work "as if" it was executed on some idealized 
> abstract machine. Even dividing code up into functions is 
> completely arbitrary, and open to wildly different strategies 
> that are perfectly legal to any C compiler. A C compiler 
> doesn't have to enregister anything in variables, either, and 
> that has far more of a performance impact than inlining.

Even though the core languages (C and D) are not specific to 
any one platform, writing fast code has never been about 
targeting abstract idealized virtual machines. Some assumptions 
need to be made. Most assumptions that the C memcpy code makes 
can be expected to generally be true across major C compilers 
(e.g. macros are at least as fast as regular functions). However, 
your D port makes some rather fragile assumptions regarding the 
compiler implementation.

Let's eliminate the language distinction, and consider two memcpy 
versions - one using macros, the other using functions (not even 
with "inline"). Would you say that the second is generally as 
fast as the first? I'm being intentionally vague: saying that 
their performance is "about the same" rests on MUCH more 
fragile assumptions.
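
To make the comparison concrete, here is a minimal sketch of the two styles in C. COPY_WORD, copy_word, copy4_macro and copy4_func are hypothetical names invented for this illustration; they are not from the original memcpy.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Macro version: the preprocessor pastes the copy statement into the
   caller, so there is no call for the optimizer to remove. */
#define COPY_WORD(dst, src, i) ((dst)[(i)] = (src)[(i)])

/* Function version (deliberately without "inline"): whether these calls
   disappear is left to the compiler. */
static void copy_word(uint32_t *dst, const uint32_t *src, size_t i)
{
    dst[i] = src[i];
}

/* Copies four words using the macro. */
void copy4_macro(uint32_t *dst, const uint32_t *src)
{
    COPY_WORD(dst, src, 0);
    COPY_WORD(dst, src, 1);
    COPY_WORD(dst, src, 2);
    COPY_WORD(dst, src, 3);
}

/* Copies four words using the helper function. */
void copy4_func(uint32_t *dst, const uint32_t *src)
{
    copy_word(dst, src, 0);
    copy_word(dst, src, 1);
    copy_word(dst, src, 2);
    copy_word(dst, src, 3);
}
```

Both functions produce the same result. Whether copy4_func costs the same as copy4_macro depends on the compiler and its settings, and can only be confirmed by inspecting the generated assembler.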

The fact that major compiler vendors implement language 
extensions to facilitate writing optimized code shows that there 
is a demand for it. Even compilers that are great at optimization 
(GCC, LLVM) have such intrinsics.

I'm not necessarily advocating changing the core language (e.g. 
new @attributes, things that would need to go into TDPLv2). 
However, what I think would greatly improve the situation is to 
have DigitalMars provide recommendations for 
implementation-specific extensions that provide more control with 
regards to how the code is compiled (pragma names, keywords 
starting with __, etc.). Once they're defined, pull requests to 
add them to DMD will follow.

> Functions below a certain size should be inlined if possible. 
> Those above that size do not benefit perceptibly from inlining. 
> Where that certain size exactly is, who knows, but I doubt that 
> functions near that size will benefit much from user 
> intervention.

I agree, but this wasn't so much about heuristics as about compiler 
capabilities (e.g. inlining assembler functions).