June 18, 2018 Re: Encouraging preliminary results implementing memcpy in D

Posted in reply to David Nadlinger

On Sunday, 17 June 2018 at 17:00:00 UTC, David Nadlinger wrote:
> On Wednesday, 13 June 2018 at 06:46:43 UTC, Mike Franklin wrote:
>> https://github.com/JinShil/memcpyD
>>
>> […]
>>
>> Feedback, advice, and pull requests to improve the implementation are most welcome.
>
> The memcpyD implementation is buggy; it assumes that all arguments are aligned to their size. This isn't necessarily true. For example, `ubyte[1024].alignof == 1`, and struct alignment can also be set explicitly using align(N).

Yes, I'm already aware of that. My plan is to create optimized implementations for aligned data, and then handle unaligned data as compositions of the various aligned implementations. For example, a 3-byte copy would be a short copy plus a byte copy. That may not be appropriate for all cases. I'll have to measure and adapt.

> On x86, you can get away with this in a lot of cases even though it's undefined behaviour [1], but this is not necessarily the case for SSE/AVX instructions. In fact, that's probably a pretty good guess as to where those weird crashes you mentioned come from.

Thanks! I think you're right.

> For loading into vector registers, you can use core.simd.loadUnaligned instead (ldc.simd.loadUnaligned for LDC – LDC's druntime has not been updated yet after {load, store}Unaligned were added upstream as well).
Unfortunately the code gen is quite a bit worse:

Exhibit A: https://run.dlang.io/is/jIuHRG

    *(cast(void16*)(&s2)) = *(cast(const void16*)(&s1));

    _Dmain:
            push    RBP
            mov     RBP,RSP
            sub     RSP,020h
            lea     RAX,-020h[RBP]
            xor     ECX,ECX
            mov     [RAX],RCX
            mov     8[RAX],RCX
            lea     RDX,-010h[RBP]
            mov     [RDX],RCX
            mov     8[RDX],RCX
            movdqa  XMM0,-020h[RBP]
            movdqa  -010h[RBP],XMM0
            xor     EAX,EAX
            leave
            ret
            add     [RAX],AL
    .text._Dmain    ends

Exhibit B: https://run.dlang.io/is/PLRfhW

    storeUnaligned(cast(void16*)(&s2), loadUnaligned(cast(const void16*)(&s1)));

    _Dmain:
            push    RBP
            mov     RBP,RSP
            sub     RSP,050h
            lea     RAX,-050h[RBP]
            xor     ECX,ECX
            mov     [RAX],RCX
            mov     8[RAX],RCX
            lea     RDX,-040h[RBP]
            mov     [RDX],RCX
            mov     8[RDX],RCX
            mov     -030h[RBP],RDX
            mov     -010h[RBP],RAX
            movdqu  XMM0,[RAX]
            movdqa  -020h[RBP],XMM0
            movdqa  XMM1,-020h[RBP]
            movdqu  [RDX],XMM1
            xor     EAX,EAX
            leave
            ret
            add     [RAX],AL
    .text._Dmain    ends

If the code gen were better, that would definitely be the way to go: unaligned and aligned copies could share the same implementation. Maybe I can fix the DMD code gen, or implement a `copyUnaligned` intrinsic.

Also, there don't seem to be any equivalent 32-byte implementations in `core.simd`. Is that just because no one's bothered to implement them yet? And with AVX512, we should probably have 64-byte implementations as well.

Mike
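
As an illustration of the two points raised in this post (types whose alignment is smaller than their size, and building an odd-sized copy out of smaller copies), here is a minimal sketch. The names `isAligned`, `copy3`, and `Aligned16` are hypothetical and do not come from memcpyD; falling back to byte copies when a pointer is not 2-byte aligned is an assumption of this sketch, not the project's actual strategy.

```d
// Illustrative sketch only; isAligned, copy3 and Aligned16 are hypothetical names.

// A static array takes the alignment of its element type, not of its size.
static assert(ubyte[1024].alignof == 1);

// Alignment can also be requested explicitly with align(N).
align(16) struct Aligned16 { ubyte[16] data; }
static assert(Aligned16.alignof == 16);

// True if ptr is a multiple of `alignment`, which must be a power of two.
bool isAligned(const(void)* ptr, size_t alignment)
{
    return (cast(size_t) ptr & (alignment - 1)) == 0;
}

// Copy exactly 3 bytes: one 2-byte copy plus one 1-byte copy when both
// pointers are 2-byte aligned, otherwise three single-byte copies.
void copy3(ubyte* dst, const(ubyte)* src)
{
    if (isAligned(dst, 2) && isAligned(src, 2))
    {
        *cast(ushort*) dst = *cast(const(ushort)*) src; // short copy
        dst[2] = src[2];                                // byte copy
    }
    else
    {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
    }
}
```

Whether the alignment check pays for itself on such a small copy is exactly the kind of thing that would need to be measured.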

June 18, 2018 Re: Encouraging preliminary results implementing memcpy in D

Posted in reply to Mike Franklin

On Monday, 18 June 2018 at 02:31:25 UTC, Mike Franklin wrote:
> Unfortunately the code gen is quite a bit worse:
Scratch that. When compiling with -O, it seems to do the right thing.
Mike
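
For reference, the unaligned 16-byte copy from Exhibit B can be wrapped in a small helper, sketched below. The name `copy16` is hypothetical and is not part of memcpyD or druntime. The sketch assumes that `core.simd.loadUnaligned` and `storeUnaligned` are available whenever `version (D_SIMD)` is defined, and it adds a plain byte loop as a fallback for compilers or targets where it is not.

```d
// Illustrative sketch only; copy16 is a hypothetical name.
// Copies 16 bytes without requiring either pointer to be 16-byte aligned.
void copy16(void* dst, const(void)* src)
{
    version (D_SIMD)
    {
        import core.simd : void16, loadUnaligned, storeUnaligned;

        // Same call shape as Exhibit B: unaligned load, then unaligned store.
        storeUnaligned(cast(void16*) dst, loadUnaligned(cast(const void16*) src));
    }
    else
    {
        // Assumed fallback for builds without D_SIMD: copy byte by byte.
        foreach (i; 0 .. 16)
            (cast(ubyte*) dst)[i] = (cast(const(ubyte)*) src)[i];
    }
}
```

As noted above, the generated code for this pattern is only worth comparing when optimizations (-O) are enabled.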
Copyright © 1999-2021 by the D Language Foundation