October 08, 2016
On Monday, 26 September 2016 at 18:43:38 UTC, Ilya Yaroshenko wrote:
> 4. Generic unaligned load/store (like LDC loadUnaligned and storeUnaligned)

See https://github.com/MartinNowak/druntime/blob/23373260e65af5edea989b61d6660832fedbec15/src/core/internal/arrayop.d#L78.
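
For readers unfamiliar with the distinction, here is a minimal sketch of what "aligned" versus "unaligned" load/store means. It is portable C++ written for illustration, not the druntime code linked above. `Float4`, `isAligned16` and `loadAligned` are hypothetical names; `loadUnaligned` and `storeUnaligned` only borrow the LDC names from the quote and are not the LDC or druntime implementations. The assumption is a 16-byte vector, as with SSE registers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 4-lane float vector, aligned like an SSE register.
struct alignas(16) Float4 { float lane[4]; };

// True if p sits on a 16-byte boundary.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Unaligned load: valid for any address. memcpy makes no alignment
// assumption, so this is well defined for every p.
inline Float4 loadUnaligned(const float* p) {
    Float4 v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Unaligned store: the mirror image of loadUnaligned.
inline void storeUnaligned(float* p, const Float4& v) {
    std::memcpy(p, &v, sizeof v);
}

// Aligned load: the caller promises a 16-byte boundary. This sketch only
// checks the promise; a real implementation would use it to emit an
// aligned move instruction instead.
inline Float4 loadAligned(const float* p) {
    assert(isAligned16(p));
    return loadUnaligned(p);
}
```

With `alignas(16) float buf[8]`, `loadAligned(buf)` satisfies the contract, while `buf + 1` is only 4-byte aligned and has to go through `loadUnaligned`.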
October 08, 2016
On Monday, 26 September 2016 at 20:11:19 UTC, Ilya Yaroshenko wrote:
> Yes, the same is true for Mir too. A precompiled library based on top of Mir GLAS can be used with DMD.

Is this feasible, i.e. is there a finite number of kernels that we can precompile and use?
I thought the kernels were fully template-generated, but your idea suggests something different.
October 08, 2016
On Saturday, 8 October 2016 at 17:26:17 UTC, Martin Nowak wrote:
> On Monday, 26 September 2016 at 18:43:38 UTC, Ilya Yaroshenko wrote:
>> 4. Generic unaligned load/store (like LDC loadUnaligned and storeUnaligned)
>
> See https://github.com/MartinNowak/druntime/blob/23373260e65af5edea989b61d6660832fedbec15/src/core/internal/arrayop.d#L78.

Could you please give an example of how it works for the user?
I mean aligned vs. unaligned.

Is this always an inlined intrinsic? I mean, does this function have no machine code of its own in the object file or library, i.e. is it always inlined into the calling function's body, even in a debug compilation?

October 08, 2016
On Saturday, 8 October 2016 at 17:28:14 UTC, Martin Nowak wrote:
> On Monday, 26 September 2016 at 20:11:19 UTC, Ilya Yaroshenko wrote:
>> Yes, the same is true for Mir too. A precompiled library based on top of Mir GLAS can be used with DMD.
>
> Is this feasible, i.e. is there a finite number of kernels that we can precompile and use?
> I thought the kernels were fully template-generated, but your idea suggests something different.

Mir is a generic library, and it will have a few dub configurations. For example, one configuration provides a precompiled extern(C) BLAS API for common types. This is useful for D as well as for other languages like C or Julia. In addition, it allows shipping a few precompiled versions optimized for different CPUs: x87, SSE2, AVX (Intel), AVX (AMD), AVX2 (Intel), AVX2 (AMD), etc. The proper configuration may be chosen at run time or at compile time.
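
To make the run-time vs. compile-time choice concrete, here is a minimal sketch in C++. It is not Mir code. `CpuLevel`, `DotKernel`, `dotGeneric`, `dotUnrolled4`, `selectKernel`, `compileTimeLevel` and `dot` are all hypothetical names, and the "optimized" variant is a plain unrolled loop standing in for a real CPU-specific kernel. How the CPU level is detected at run time is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical feature levels; the list in the post is longer.
enum class CpuLevel { Generic, SSE2, AVX, AVX2 };

// Common signature shared by every precompiled variant.
using DotKernel = double (*)(const double*, const double*, std::size_t);

// Baseline variant: plain scalar loop.
double dotGeneric(const double* x, const double* y, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += x[i] * y[i];
    return sum;
}

// Stand-in for a CPU-specific variant: four independent accumulators,
// the shape a vectorized kernel would typically have.
double dotUnrolled4(const double* x, const double* y, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i] * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; ++i) s0 += x[i] * y[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}

// Run-time choice: map a detected CPU level to a precompiled kernel.
DotKernel selectKernel(CpuLevel level) {
    switch (level) {
        case CpuLevel::AVX:
        case CpuLevel::AVX2:
            return &dotUnrolled4;
        case CpuLevel::Generic:
        case CpuLevel::SSE2:
            return &dotGeneric;
    }
    return &dotGeneric;
}

// Compile-time choice: derive the level from the compiler's target macros
// (these particular macros are an assumption about the toolchain).
constexpr CpuLevel compileTimeLevel() {
#if defined(__AVX2__)
    return CpuLevel::AVX2;
#elif defined(__AVX__)
    return CpuLevel::AVX;
#elif defined(__SSE2__)
    return CpuLevel::SSE2;
#else
    return CpuLevel::Generic;
#endif
}

// Single entry point that hides the dispatch from the caller.
double dot(CpuLevel level, const double* x, const double* y, std::size_t n) {
    DotKernel kernel = selectKernel(level);
    assert(kernel != nullptr);
    return kernel(x, y, n);
}
```

A caller would use either `dot(detectedLevel, x, y, n)` with a level obtained at startup, or `dot(compileTimeLevel(), x, y, n)` when the target CPU is fixed at build time. Every variant has the same signature, so the set of kernels shipped in the library stays finite.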
October 08, 2016
On 10/8/16 1:22 PM, Martin Nowak wrote:
> Integrating this with a pre-compiled ldc library is a fantastic idea OTOH.
> If we can make this work, it will be much less effort and yield the
> fastest implementation. Also would speed up the development cycle a bit
> b/c the kernels don't need to be recompiled/optimized.

You mean dmd/ldc/etc interop at binary level? Yes, that would be pretty rad indeed! -- Andrei

October 08, 2016
On 10/8/16 2:49 PM, Andrei Alexandrescu wrote:
> On 10/8/16 1:22 PM, Martin Nowak wrote:
>> Integrating this with a pre-compiled ldc library is a fantastic idea
>> OTOH.
>> If we can make this work, it will be much less effort and yield the
>> fastest implementation. Also would speed up the development cycle a bit
>> b/c the kernels don't need to be recompiled/optimized.
>
> You mean dmd/ldc/etc interop at binary level? Yes, that would be pretty
> rad indeed! -- Andrei

(after thinking a bit more) ... but Mir seems to rely in good part on templates, which makes pre-compiled libraries less effective. -- Andrei

October 08, 2016
On 10/8/2016 10:26 AM, Martin Nowak wrote:
>
> See
> https://github.com/MartinNowak/druntime/blob/23373260e65af5edea989b61d6660832fedbec15/src/core/internal/arrayop.d#L78.
>

Further information should be posted here:

https://issues.dlang.org/show_bug.cgi?id=16558
October 08, 2016
On 9/28/2016 2:48 AM, Ilya Yaroshenko wrote:
> On Wednesday, 28 September 2016 at 09:41:02 UTC, Jacob Carlborg wrote:
>> On 2016-09-28 11:06, Ilya Yaroshenko wrote:
>>
>>> Done. The full list of DMD performance issues related to Mir can be found here:
>>> https://github.com/libmir/mir/wiki/Compiler-and-druntime-bugs#dmd-performance-issues
>>>
>>
>> It would be better to use the tag field in Bugzilla instead of putting "[Mir]"
>> in the title.
>
> There are both tags and "[Mir]"

I added the 'performance' keyword to each of the issues.
October 10, 2016
On Saturday, 8 October 2016 at 18:53:32 UTC, Andrei Alexandrescu wrote:
>> You mean dmd/ldc/etc interop at binary level? Yes, that would be pretty

It should already work, but of course it isn't well tested.

> (after thinking a bit more) ... but Mir seems to rely in good part on templates, which makes pre-compiled libraries less effective. -- Andrei

Exactly, this is what I was wondering. Maybe it uses a finite set of precompilable kernels?
October 10, 2016
On Monday, 10 October 2016 at 05:20:56 UTC, Martin Nowak wrote:
>> (after thinking a bit more) ... but Mir seems to rely in good part on templates, which makes pre-compiled libraries less effective. -- Andrei
>
> Exactly, this is what I was wondering. Maybe it uses a finite set of precompilable kernels?

Well, at least for gemm, pointers to kernels are used, but they are still templated at the moment.
https://github.com/libmir/mir/blob/f7a904161df7af0a8443a0237a958460432f980c/source/mir/glas/internal/gemm.d#L97
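
To illustrate the tension between templated kernels and a precompiled library, here is a minimal sketch in C++ rather than D. It is not the Mir gemm code linked above. `axpyKernel`, `Kernel` and `runKernel` are hypothetical names, and a simple axpy loop stands in for a real gemm kernel. The idea shown is that a templated kernel can still be called through a pointer, and that a library can ship machine code for it only for the finite set of types it explicitly instantiates.

```cpp
#include <cassert>
#include <cstddef>

// Templated kernel: y[i] += alpha * x[i]. No machine code exists for it
// until it is instantiated for a concrete type.
template <typename T>
void axpyKernel(T alpha, const T* x, T* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}

// Pointer-to-kernel type, parameterized on the element type.
template <typename T>
using Kernel = void (*)(T, const T*, T*, std::size_t);

// Explicit instantiations: these force code for exactly these two types
// into the object file, which is what a precompiled library can ship.
template void axpyKernel<float>(float, const float*, float*, std::size_t);
template void axpyKernel<double>(double, const double*, double*, std::size_t);

// Driver that only sees a kernel pointer, not the template itself.
template <typename T>
void runKernel(Kernel<T> kernel, T alpha, const T* x, T* y, std::size_t n) {
    assert(kernel != nullptr);
    kernel(alpha, x, y, n);
}
```

For example, `runKernel<double>(&axpyKernel<double>, 2.0, x, y, n)` works against the precompiled instantiation, whereas any element type outside the instantiated set would need the template source at the call site.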