April 06, 2016
On Tuesday, 5 April 2016 at 21:41:46 UTC, Johan Engelen wrote:
> On Tuesday, 5 April 2016 at 21:29:41 UTC, Walter Bright wrote:
>> 
>> I want to make it clear that dmd does not generate AFX specific code, has no switch to enable AFX code generation and has no basis for setting predefined version identifiers for it.
>
> How about adding a "__target(...)" compile-time function, that would return false if the compiler doesn't know?
>
> __target("broadwell")  --> true means: target cpu is broadwell, false means compiler doesn't know or target cpu is not broadwell.
>
> Would that work for all?

Yes, something like that is what I am looking for.
Two nitpicks:
1. __target("broadwell") is not a good API. Something like this would be more efficient:
enum target = __target();
// .. use target
2. Is it possible to reflect additional settings about instruction set? Maybe "broadwell,-avx"?
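
A minimal sketch of what the value-returning form could look like from the user's side. No D compiler provides `__target`, so the `TargetInfo` struct, its fields, and the hard-coded value below are all assumptions that stand in for whatever the compiler would return:

```d
import std.algorithm.searching : canFind;

// Hypothetical shape of the value a compiler-provided __target() might return.
struct TargetInfo
{
    string cpu;        // e.g. "broadwell", or "" if the compiler does not know
    string[] features; // e.g. ["sse2", "avx"]

    bool has(string feature) const
    {
        return features.canFind(feature);
    }
}

// Stand-in for the proposed `enum target = __target();` (value is made up).
enum target = TargetInfo("broadwell", ["sse2", "sse4.1", "avx", "avx2"]);

// One query result drives every later compile-time decision.
static if (target.has("avx2"))
    enum vectorWidthBytes = 32;
else static if (target.has("sse2"))
    enum vectorWidthBytes = 16;
else
    enum vectorWidthBytes = 0; // no SIMD assumed

void main()
{
    import std.stdio : writeln;
    writeln(target.cpu, ": ", vectorWidthBytes, " byte vectors");
}
```

With a struct like this, a setting such as "broadwell,-avx" would simply show up as a CPU name plus a feature list with "avx" removed.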
April 06, 2016
On Wednesday, 6 April 2016 at 06:11:15 UTC, 9il wrote:
>
> Yes, only a few of us would use this feature directly; however, many of us would use it under the hood in the BLAS/SIMD-oriented part of Phobos.

Especially since everyone says to use LDC for the fastest code anyway...
April 06, 2016
On 5 April 2016 at 20:30, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 4/5/2016 2:39 AM, 9il wrote:
>>
>> On Tuesday, 5 April 2016 at 08:34:32 UTC, Walter Bright wrote:
>>>
>>> On 4/4/2016 11:10 PM, 9il wrote:
>>> I still don't understand why you cannot just set '-version=xxx' on the
>>> command
>>> line and then switch off that version in your custom code.
>>
>>
>> I can do it, however I would like to get this information from compiler. Why?
>>
>> 1. This would help to eliminate configuration bugs.
>> 2. This would reduce work for users and simplify the user experience.
>> 3. This is possible and not very hard to implement if I am not wrong.
>
>
>
> Where does the compiler get the information that it should compile for, say, AFX?

I would add that GDC and LDC have such compiler flags and it's
possible that they could pass the state of those flags through as
versions, but all compilers need to agree on the set of versions that
will be defined for this purpose. If DMD users express them as
-version=[STANDARD_VERSION_NAME], that's fine, I guess, but a proper
flag would help avoid the situation where people get the version names
wrong, and it feels a little bit more deliberate.
Setting a version this way might lead them to presume that it's just
an arbitrary setting by the author of the build script, and not
actually an agreed standard name that GDC and LDC also produce from
their compiler flags.

But at very least, the important detail is that the version ID's are standardised and shared among all compilers.
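
A rough sketch of how code could consume such identifiers. The names SIMD_AVX2, SIMD_AVX and SIMD_SSE41 are hypothetical; no such standard set exists yet, which is the point being argued. With DMD they would be passed by hand, e.g. `dmd -version=SIMD_AVX app.d`:

```d
// Hypothetical identifiers; a real scheme would need all compilers to agree on them.
version (SIMD_AVX2)
    enum baseline = "avx2";
else version (SIMD_AVX)
    enum baseline = "avx";
else version (SIMD_SSE41)
    enum baseline = "sse4.1";
else
    enum baseline = "sse2"; // assumed fallback when nothing is specified

void main()
{
    import std.stdio : writeln;
    writeln("baseline: ", baseline);
}
```

Note that a misspelled name on the command line (say `-version=SIMD_AVX_2`) compiles without complaint and silently lands in the fallback branch, which is the kind of configuration bug a proper flag would avoid.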
April 06, 2016
On 6 April 2016 at 07:41, Johan Engelen via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Tuesday, 5 April 2016 at 21:29:41 UTC, Walter Bright wrote:
>>
>>
>> I want to make it clear that dmd does not generate AFX specific code, has no switch to enable AFX code generation and has no basis for setting predefined version identifiers for it.
>
>
> How about adding a "__target(...)" compile-time function, that would return false if the compiler doesn't know?
>
> __target("broadwell")  --> true means: target cpu is broadwell, false means compiler doesn't know or target cpu is not broadwell.
>
> Would that work for all?

With respect to SIMD, knowing a processor model like 'broadwell' is not helpful, since we really want to know 'sse4'. If we only know the processor model, then we need to keep a compile-time table in our code somewhere of every possible cpu ever known and its associated feature set. Knowing the feature we're interested in is what we need.
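
To make the table concrete, here is a small sketch of what user code would have to carry if only the model name were available. The function names are made up and the table is deliberately incomplete; it is an illustration, not a reference list:

```d
// Hypothetical lookup from CPU model name to feature names.
// Every new processor would need a new entry here.
string[] featuresOf(string cpu)
{
    switch (cpu)
    {
        case "nehalem":
            return ["sse2", "sse4.1", "sse4.2"];
        case "haswell":
        case "broadwell":
            return ["sse2", "sse4.1", "sse4.2", "avx", "avx2"];
        default:
            return null; // unknown model: nothing can be assumed
    }
}

bool cpuHas(string cpu, string feature)
{
    import std.algorithm.searching : canFind;
    return featuresOf(cpu).canFind(feature);
}

// Both checks run at compile time through CTFE.
static assert(cpuHas("broadwell", "sse4.1"));
static assert(!cpuHas("some-unlisted-chip", "sse2")); // falls through the table

void main()
{
    import std.stdio : writeln;
    writeln(featuresOf("broadwell"));
}
```

A feature query would skip the table entirely: the code asks for 'sse4.1' and never needs to know which model it is running on.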
April 06, 2016
On Wednesday, 6 April 2016 at 12:40:04 UTC, Manu wrote:
> On 6 April 2016 at 07:41, Johan Engelen via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> [...]
>
> With respect to SIMD, knowing a processor model like 'broadwell' is not helpful, since we really want to know 'sse4'. If we only know the processor model, then we need to keep a compile-time table in our code somewhere of every possible cpu ever known and its associated feature set. Knowing the feature we're interested in is what we need.

Yes, however this can be implemented in a special Phobos module. So compilers would need less work. --Ilya
April 06, 2016
On Wednesday, 6 April 2016 at 13:26:51 UTC, 9il wrote:
> On Wednesday, 6 April 2016 at 12:40:04 UTC, Manu wrote:
>> On 6 April 2016 at 07:41, Johan Engelen via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>> [...]
>>
>> With respect to SIMD, knowing a processor model like 'broadwell' is not helpful, since we really want to know 'sse4'. If we only know the processor model, then we need to keep a compile-time table in our code somewhere of every possible cpu ever known and its associated feature set. Knowing the feature we're interested in is what we need.
>
> Yes, however this can be implemented in a special Phobos module. So compilers would need less work. --Ilya

After browsing through some LLVM code, I think it is actually very easy for LDC to also tell you which features (sse2, avx, etc.) a target supports.

Probably the most difficult part is defining an API. Ilya made a start here:
http://forum.dlang.org/post/eodutgruoofruperrgif@forum.dlang.org
(but he doesn't like his earlier API "bool a = __target("broadwell")" any more ;-P , I also think enum cpu = __target(); would be nicer)

April 06, 2016
On Wednesday, 6 April 2016 at 14:31:58 UTC, Johan Engelen wrote:
> Probably the most difficult part is defining an API. Ilya made a start here:
> http://forum.dlang.org/post/eodutgruoofruperrgif@forum.dlang.org
> (but he doesn't like his earlier API "bool a = __target("broadwell")" any more ;-P , I also think enum cpu = __target(); would be nicer)

Ahaha))  --Ilya
April 06, 2016
On 4/6/2016 5:36 AM, Manu via Digitalmars-d wrote:
> But at very least, the important detail is that the version ID's are
> standardised and shared among all compilers.

It's a reasonable suggestion; some points:

1. This has been characterized as a blocker; it is not, as it does not impede writing code that takes advantage of various SIMD code generation at compile time.

2. I'm not sure these global settings are the best approach, especially if one is writing applications that dynamically adjust based on the CPU the user is running on. The main trouble comes about when different modules are compiled with different settings. What happens with template code generation, when the templates are pulled from different modules? What happens when COMDAT functions are generated? (The linker picks one arbitrarily and discards the others.) Which settings wind up in the executable will not be easily predictable.

I suspect that using a pragma would be a much better approach:

   pragma(SIMD, AFX)
   {
	... code ...
   }

Doing it on the command line is certainly the traditional way, but it strikes me as being bug-prone and as unhygienic and obsolete as the C preprocessor is (for similar reasons).
April 07, 2016
On 6 April 2016 at 23:26, 9il via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Wednesday, 6 April 2016 at 12:40:04 UTC, Manu wrote:
>>
>> On 6 April 2016 at 07:41, Johan Engelen via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>
>>> [...]
>>
>>
>> With respect to SIMD, knowing a processor model like 'broadwell' is not helpful, since we really want to know 'sse4'. If we only know the processor model, then we need to keep a compile-time table in our code somewhere of every possible cpu ever known and its associated feature set. Knowing the feature we're interested in is what we need.
>
>
> Yes, however this can be implemented in a special Phobos module. So compilers would need less work. --Ilya

Sure, but it's an ongoing maintenance task, constantly requiring
population with metadata for new processors that become available.
Remember, most processors are ARM processors, and there are like 20
manufacturers of ARM chips, and many of those come in a series of
minor variations with/without sub-features present, and in a lot of
cases, each permutation of features attached to a random manufacturer's
ARM chip 'X' doesn't actually have a name to describe it. It's also
completely impractical to declare a particular ARM chip by name when
compiling for ARM. The mapping is sloppy even between Intel and AMD,
let alone the myriad of ARM chips available.
TL;DR, defining architectures with an Intel-centric naming convention
is a very bad idea.
April 07, 2016
On 7 April 2016 at 10:42, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 4/6/2016 5:36 AM, Manu via Digitalmars-d wrote:
>>
>> But at very least, the important detail is that the version ID's are standardised and shared among all compilers.
>
>
> It's a reasonable suggestion; some points:
>
> 1. This has been characterized as a blocker; it is not, as it does not impede writing code that takes advantage of various SIMD code generation at compile time.

It's sufficiently blocking that I have not felt like working any
further without this feature present. I can't feel like it 'works' or
it's 'done', until I can demonstrate this functionality.
Perhaps we can call it a psychological blocker, and I am personally
highly susceptible to those.

> 2. I'm not sure these global settings are the best approach, especially if one is writing applications that dynamically adjust based on the CPU the user is running on.

They are necessary to provide a baseline. It is typical when building
code that you specify a min-spec. This is what's used by default
throughout the application.
Runtime selection is not practical in a broad sense. Emitting small
fragments of SIMD here and there will probably take a loss if they are
all surrounded by a runtime selector. SIMD is all about pipelining,
and runtime branches on SIMD version are the antithesis of good SIMD
usage; they can't be applied for small-scale deployment.
In my experience, runtime selection is desirable for large scale
instantiations at an outer level of the work loop. I've tried to
design this intent in my library, by making each simd API capable of
receiving SIMD version information via template arg, and within the
library, the version is always passed through to dependent calls.
The idea is, if you follow this pattern, propagating a SIMD version
template arg through to your outer function, then you can instantiate
your higher-level work function for any number of SIMD feature
combinations you feel are appropriate.
Naturally, this process requires a default, otherwise this usage
baggage will cloud the API everywhere (rather than in the few cases
where a developer specifically wants to make use of it), and many
developers in 2015 feel SSE2 is a weak default. I would choose SSE4.1
in my applications, xbox developers would choose AVX1, it's very
application/target-audience specific, but SSE2 is the only reasonable
selection if we are not to accept a hint from the command line.
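
The pattern described above could be sketched roughly as follows. This is not the actual library: `SIMDVer`, `baseline`, `dot`, `lengthSquared` and `process` are made-up names, the kernels are scalar placeholders, and the SSE2 default is an assumption standing in for a compiler-supplied baseline. Only `core.cpuid` is a real druntime module:

```d
import core.cpuid : avx, sse41;

enum SIMDVer { sse2, sse41, avx }

// Assumed default; ideally this would come from the compiler or command line.
enum SIMDVer baseline = SIMDVer.sse2;

// Leaf routine. A real library would select intrinsics with
// `static if (Ver >= SIMDVer.avx)` and so on; here every version
// runs the same scalar loop, so only the plumbing is shown.
float dot(SIMDVer Ver = baseline)(const(float)[] a, const(float)[] b)
{
    assert(a.length == b.length);
    float acc = 0;
    foreach (i; 0 .. a.length)
        acc += a[i] * b[i];
    return acc;
}

// Higher-level routine: forwards Ver to everything it calls.
float lengthSquared(SIMDVer Ver = baseline)(const(float)[] v)
{
    return dot!Ver(v, v);
}

// Outer work function: the only place that branches at run time.
float process(const(float)[] v)
{
    if (avx)
        return lengthSquared!(SIMDVer.avx)(v);
    if (sse41)
        return lengthSquared!(SIMDVer.sse41)(v);
    return lengthSquared(v); // baseline instantiation
}

void main()
{
    import std.stdio : writeln;
    writeln(process([1f, 2f, 3f]));
}
```

Callers who never think about SIMD versions just write `lengthSquared(v)` and get the baseline; the runtime check happens once in `process`, outside the inner loop.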

> The main trouble comes about when different modules are
> compiled with different settings. What happens with template code
> generation, when the templates are pulled from different modules? What
> happens when COMDAT functions are generated? (The linker picks one
> arbitrarily and discards the others.) Which settings wind up in the
> executable will not be easily predictable.

In my library design, the baseline simd version (expected from the
compiler) is mangled into the symbols, just as in the case a user
overrides it when instantiating a code path that may be selected on
runtime branch.
I had imagined this would solve such link related symbol selection
problems. Can you think of cases where this is insufficient?
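
A small sketch of the mangling point, using made-up names (`SIMDVer`, `kernel`): because the version is a template value argument, each instantiation gets a distinct mangled symbol, so the linker has no same-named COMDAT copies to choose between.

```d
enum SIMDVer { sse2, sse41, avx }

// Hypothetical kernel; the body is a scalar placeholder.
void kernel(SIMDVer Ver)(float[] data)
{
    foreach (ref x; data)
        x *= 2;
}

// The template argument is part of the symbol name, so these differ.
static assert(kernel!(SIMDVer.sse2).mangleof != kernel!(SIMDVer.avx).mangleof);

void main()
{
    import std.stdio : writeln;
    writeln(kernel!(SIMDVer.sse2).mangleof);
    writeln(kernel!(SIMDVer.avx).mangleof);
}
```

This only covers code that actually takes the version parameter; ordinary non-template functions compiled under different global settings would still share one name.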


> I suspect that using a pragma would be a much better approach:
>
>    pragma(SIMD, AFX)
>    {
>         ... code ...
>    }
>
> Doing it on the command line is certainly the traditional way, but it strikes me as being bug-prone and as unhygienic and obsolete as the C preprocessor is (for similar reasons).

I've done it with a template arg because it can be manually
propagated, and users can extrapolate the pattern into their outer
work functions, which can then easily have multiple versions
instantiated for runtime selection.
I think it's also important to mangle it into the symbol name for the
reasons I mention above.