April 13, 2016
On 4/13/2016 3:58 AM, Marco Leise wrote:
> How about this style as an alternative?:
>
> immutable bool mmx;
> immutable bool hasPopcnt;
>
> shared static this()
> {
>      import gcc.builtins;
>      mmx       = __builtin_cpu_supports("mmx"   ) > 0;
>      hasPopcnt = __builtin_cpu_supports("popcnt") > 0;
> }
>

Please do not invent an alternative interface, use the one in core.cpuid:

   http://dlang.org/phobos/core_cpuid.html#.mmx
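
For reference, a minimal sketch of what using that interface looks like. It relies only on the mmx and hasPopcnt properties of core.cpuid; the printed labels are purely illustrative.

```d
import core.cpuid : hasPopcnt, mmx;
import std.stdio : writeln;

void main()
{
    // Both are @property functions returning bool, so they read like
    // plain variables at the call site.
    writeln("MMX:    ", mmx);
    writeln("POPCNT: ", hasPopcnt);
}
```
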
April 13, 2016
Am Wed, 13 Apr 2016 04:14:48 -0700
schrieb Walter Bright <newshound2@digitalmars.com>:

> On 4/13/2016 3:58 AM, Marco Leise wrote:
> > How about this style as an alternative?:
> >
> > immutable bool mmx;
> > immutable bool hasPopcnt;
> >
> > shared static this()
> > {
> >      import gcc.builtins;
> >      mmx       = __builtin_cpu_supports("mmx"   ) > 0;
> >      hasPopcnt = __builtin_cpu_supports("popcnt") > 0;
> > }
> > 
> 
> Please do not invent an alternative interface, use the one in core.cpuid:
> 
>     http://dlang.org/phobos/core_cpuid.html#.mmx

Yes, they are all @property, and substituting direct access to the globals would work around GDC's lack of cross-module inlining. Otherwise these feature checks, which might be used in hot code, are more costly than they should be. I hate it when things get in the way of efficiency. :)

-- 
Marco

April 13, 2016
On 4/13/2016 5:47 AM, Marco Leise wrote:
> Yes, they are all @property, and substituting direct
> access to the globals would work around GDC's lack of
> cross-module inlining. Otherwise these feature checks, which
> might be used in hot code, are more costly than they should be.
> I hate it when things get in the way of efficiency. :)


It doesn't need to be efficient, because such checks should be done at a higher level in the program's logic, not in low-level code. Even so, the program could cache the result of the call.
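
A rough sketch of what that could look like, under some assumptions: the names cpuHasPopcnt, countBits, countBitsFast and countBitsPortable are made up for illustration, and countBitsFast is only a stand-in for a kernel that was really written for CPUs with POPCNT. The point is that the check happens once per call on a whole array, and the core.cpuid result is cached in a module-level immutable at startup.

```d
import core.bitop : popcnt;
import core.cpuid : hasPopcnt;

// Hypothetical cached flag, initialized once at program startup.
immutable bool cpuHasPopcnt;

shared static this()
{
    cpuHasPopcnt = hasPopcnt;
}

// Portable kernel: clears the lowest set bit on each iteration.
ulong countBitsPortable(const(ulong)[] data)
{
    ulong total = 0;
    foreach (word; data)
    {
        for (ulong w = word; w != 0; w &= w - 1)
            ++total;
    }
    return total;
}

// Stand-in for a POPCNT-specific kernel. Whether core.bitop.popcnt
// compiles to a single instruction depends on the compiler and flags.
ulong countBitsFast(const(ulong)[] data)
{
    ulong total = 0;
    foreach (word; data)
        total += popcnt(word);
    return total;
}

// The feature check is made once per call, not once per element.
ulong countBits(const(ulong)[] data)
{
    return cpuHasPopcnt ? countBitsFast(data) : countBitsPortable(data);
}

void main()
{
    import std.stdio : writeln;

    ulong[] data = [0b1011, 0xFF, 0];
    writeln(countBits(data)); // 3 + 8 + 0 set bits
}
```
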
April 14, 2016
On 13 April 2016 at 13:14, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 4/13/2016 3:58 AM, Marco Leise wrote:
>>
>> How about this style as an alternative?:
>>
>> immutable bool mmx;
>> immutable bool hasPopcnt;
>>
>> shared static this()
>> {
>>      import gcc.builtins;
>>      mmx       = __builtin_cpu_supports("mmx"   ) > 0;
>>      hasPopcnt = __builtin_cpu_supports("popcnt") > 0;
>> }
>>
>
> Please do not invent an alternative interface, use the one in core.cpuid:
>
>    http://dlang.org/phobos/core_cpuid.html#.mmx

An alternative interface needs to be invented anyway for other CPUs.
April 14, 2016
On 4/14/2016 1:21 AM, Iain Buclaw via Digitalmars-d wrote:
> An alternative interface needs to be invented anyway for other CPUs.

That would be fine. But there is no reason to redo core.cpuid for x86 machines.
April 15, 2016
On Sunday, 3 April 2016 at 07:39:00 UTC, Manu wrote:
> On 3 April 2016 at 16:14, 9il via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>
>> Is it possible to introduce compile-time information about the target platform? I am working on a from-scratch BLAS implementation, and there is no hope of creating something usable without compile-time information about the target.
>>
>> Best regards,
>> Ilya
>
> My SIMD implementation has been blocked on that for years too.
> I need to know the SIMD level flags passed to the compiler at least, and DMD needs to introduce the concept.

https://github.com/ldc-developers/ldc/pull/1434
April 15, 2016
On Tuesday, 12 April 2016 at 10:55:18 UTC, xenon325 wrote:
>
> Have you seen GCC's function multiversioning [1]?
>

I've been thinking about the gcc multiversioning since you mentioned it previously.

I keep thinking about how the optimal algorithm for something like matrix multiplication depends on the size of the matrices.

For instance, you might do something for very small matrices that just relies on one processor, then you add in SIMD as the size grows, then you add in multiple CPUs, then you add in the GPU (or maybe you add before CPUs), then you add in multiple computers.

I don't know how some of those choices would get made at compile time for dynamic arrays; that would need some kind of run-time approach. At least for static arrays, you could write multiple versions of the function and then use template constraints to select which one gets called. Some tuning would be necessary.
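
Something like the following is what I have in mind for the static array case. It is only a sketch: smallLimit is a made-up crossover point that would have to be tuned, and the run-time-loop version is a placeholder for whatever SIMD or blocked kernel you would really want for larger sizes.

```d
import std.stdio : writeln;

// Hypothetical crossover point; a real library would tune it per target.
enum size_t smallLimit = 4;

// Small matrices: every loop is unrolled at compile time.
double[N][N] multiply(size_t N)(const double[N][N] a, const double[N][N] b)
if (N <= smallLimit)
{
    double[N][N] c;
    static foreach (i; 0 .. N)
    {
        static foreach (j; 0 .. N)
        {
            c[i][j] = 0;
            static foreach (k; 0 .. N)
                c[i][j] += a[i][k] * b[k][j];
        }
    }
    return c;
}

// Larger matrices: ordinary run-time loops, standing in for a tuned kernel.
double[N][N] multiply(size_t N)(const double[N][N] a, const double[N][N] b)
if (N > smallLimit)
{
    double[N][N] c;
    foreach (i; 0 .. N)
    {
        foreach (j; 0 .. N)
        {
            double sum = 0;
            foreach (k; 0 .. N)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }
    return c;
}

void main()
{
    double[2][2] a = [[1.0, 2.0], [3.0, 4.0]];
    double[2][2] swap = [[0.0, 1.0], [1.0, 0.0]];
    // N = 2 satisfies the first constraint, so the unrolled version is used.
    writeln(multiply(a, swap));
}
```

The constraint is evaluated per instantiation, so the choice costs nothing at run time, but it only works because N is part of the type.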
April 16, 2016
Am Fri, 15 Apr 2016 18:54:12 +0000
schrieb jmh530 <john.michael.hall@gmail.com>:

> On Tuesday, 12 April 2016 at 10:55:18 UTC, xenon325 wrote:
> >
> > Have you seen GCC's function multiversioning [1]?
> > 
> 
> I've been thinking about the gcc multiversioning since you mentioned it previously.
> 
> I keep thinking about how the optimal algorithm for something like matrix multiplication depends on the size of the matrices.
> 
> For instance, you might do something for very small matrices that just relies on one processor, then you add in SIMD as the size grows, then you add in multiple CPUs, then you add in the GPU (or maybe you add before CPUs), then you add in multiple computers.

GCC only has one architecture as a target at a time. As long
as this is so, there is little point in contemplating how it
handles multiple architectures and network traffic. :)
CPUs run the bulk of code, from booting through kernel and
drivers to applications and there will always be something
that can be optimized if it is statically known that a certain
instruction set is supported. To pick up your matrices
example, imagine OpenGL code that has some 4x4 matrices that
are in no direct relation to each other. The GPU is only good
at bulk processing, and it doesn't apply here. So you need the
general purpose processor and benefit from the knowledge that
some SSE level is supported. In general, when you have to make
many quick decisions on small amounts of data the GPU or
networking are out of the question.

-- 
Marco

April 16, 2016
Am Tue, 12 Apr 2016 23:22:37 -0700
schrieb Walter Bright <newshound2@digitalmars.com>:

> >            "mulq %[y]"
> >            : "=a" tmp.lo, "=d" tmp.hi : "a" x, [y] "rm" y;
> 
> I don't see anything elegant about those lines, starting with the fact that "mulq" is not in any of the AMD or Intel CPU manuals. The assembler should notice that 'y' is a ulong and select the 64 bit version of the MUL opcode automatically.
> 
> I can see nothing to recommend the:
> 
>      "=a" tmp.lo
> 
> syntax. How about something comprehensible like "tmp.lo = EAX"? I bet people could even figure that out without consulting stackoverflow! :-)
> 
> I have no idea what:
> 
>     "a" x
> 
> and:
> 
>      [y] "rm" y
> 
> mean, nor why the ":" appears sometimes and the "," other times.

Tell me again, what's more elegant!

        uint* pnb = cast(uint*)cf.processorNameBuffer.ptr;
        version(GNU)
        {
            asm { "cpuid" : "=a" pnb[0], "=b" pnb[1], "=c" pnb[ 2], "=d" pnb[ 3] : "a" 0x8000_0002; }
            asm { "cpuid" : "=a" pnb[4], "=b" pnb[5], "=c" pnb[ 6], "=d" pnb[ 7] : "a" 0x8000_0003; }
            asm { "cpuid" : "=a" pnb[8], "=b" pnb[9], "=c" pnb[10], "=d" pnb[11] : "a" 0x8000_0004; }
        }
        else version(D_InlineAsm_X86)
        {
            asm pure nothrow @nogc {
                push ESI;
                mov ESI, pnb;
                mov EAX, 0x8000_0002;
                cpuid;
                mov [ESI], EAX;
                mov [ESI+4], EBX;
                mov [ESI+8], ECX;
                mov [ESI+12], EDX;
                mov EAX, 0x8000_0003;
                cpuid;
                mov [ESI+16], EAX;
                mov [ESI+20], EBX;
                mov [ESI+24], ECX;
                mov [ESI+28], EDX;
                mov EAX, 0x8000_0004;
                cpuid;
                mov [ESI+32], EAX;
                mov [ESI+36], EBX;
                mov [ESI+40], ECX;
                mov [ESI+44], EDX;
                pop ESI;
            }
        }
        else version(D_InlineAsm_X86_64)
        {
            asm pure nothrow @nogc {
                push RSI;
                mov RSI, pnb;
                mov EAX, 0x8000_0002;
                cpuid;
                mov [RSI], EAX;
                mov [RSI+4], EBX;
                mov [RSI+8], ECX;
                mov [RSI+12], EDX;
                mov EAX, 0x8000_0003;
                cpuid;
                mov [RSI+16], EAX;
                mov [RSI+20], EBX;
                mov [RSI+24], ECX;
                mov [RSI+28], EDX;
                mov EAX, 0x8000_0004;
                cpuid;
                mov [RSI+32], EAX;
                mov [RSI+36], EBX;
                mov [RSI+40], ECX;
                mov [RSI+44], EDX;
                pop RSI;
            }
        }

-- 
Marco

April 16, 2016
On 4/16/2016 2:40 PM, Marco Leise wrote:
> Tell me again, what's more elegant!

If I wanted to write in assembler, I wouldn't write in a high level language, especially a weird one like the GNU version.