Thread overview
What LDC flags should be used to get the fastest executable on Windows?
Mar 06, 2021  Preetpal
Mar 06, 2021  Imperatorn
Mar 06, 2021  Preetpal
Mar 06, 2021  Preetpal
Mar 06, 2021  Dennis
Mar 06, 2021  Preetpal
Mar 06, 2021  Dennis
Mar 06, 2021  Preetpal
Mar 06, 2021  Imperatorn
Mar 06, 2021  Imperatorn
Mar 06, 2021  Preetpal
Mar 11, 2021  Lorenso
Mar 12, 2021  Imperatorn
March 06, 2021
I was wondering how to get the fastest binary using the LDC compiler.

I am currently using the following command to compile my program:
ldc2 -O3 -release -mcpu=native -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto -m64 -betterC -static main.d

Might the -defaultlib flag be redundant if the -betterC flag is being used?

This is probably pretty silly but the program I am trying to optimize is this one (https://gist.github.com/preetpalS/d2482d6ec91eb8147e6cff43ab197ed5). The program runs on each keystroke so I would like to optimize it.
March 06, 2021
On Saturday, 6 March 2021 at 06:29:11 UTC, Preetpal wrote:
> I was wondering how to get the fastest binary using the LDC compiler.
>
> I am currently using the following command to compile my program:
> ldc2 -O3 -release -mcpu=native -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto -m64 -betterC -static main.d
>
> Might the -defaultlib flag be redundant if the -betterC flag is being used?
>
> This is probably pretty silly but the program I am trying to optimize is this one (https://gist.github.com/preetpalS/d2482d6ec91eb8147e6cff43ab197ed5). The program runs on each keystroke so I would like to optimize it.

There's not much going on in the code there. Where are you experiencing problems? Have you profiled it?
March 06, 2021
On Saturday, 6 March 2021 at 09:07:05 UTC, Imperatorn wrote:
> On Saturday, 6 March 2021 at 06:29:11 UTC, Preetpal wrote:
>> I was wondering how to get the fastest binary using the LDC compiler.
>>
>> I am currently using the following command to compile my program:
>> ldc2 -O3 -release -mcpu=native -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto -m64 -betterC -static main.d
>>
>> Might the -defaultlib flag be redundant if the -betterC flag is being used?
>>
>> This is probably pretty silly but the program I am trying to optimize is this one (https://gist.github.com/preetpalS/d2482d6ec91eb8147e6cff43ab197ed5). The program runs on each keystroke so I would like to optimize it.
>
> There's not much going on in the code there. Where are you experiencing problems? Have you profiled it?

There is an issue with the code where the running program very infrequently misses the release of the SHIFT key. As this program keeps track of which modifier keys are currently pressed (it is supposed to keep track of all key presses and key releases), this can cause the wrong keyboard shortcut to be triggered.

This could be a performance related problem as the hook that this program installs in Windows (WH_KEYBOARD_LL) supposedly has a default timeout of 300 milliseconds (https://www.autohotkey.com/docs/commands/_IfTimeout.htm).

As it is a small program, I re-implemented it in C (https://gist.github.com/preetpalS/81405cd78ade738034cfa6d49e2a4202) to see if it could reduce the problem I was seeing. Based on my observations, it did reduce the problem but did not eliminate it. This led me to believe that the issue I was seeing in the D version was performance-related.

I added additional compiler flags to LDC2 and recompiled the program and I do think there has been an improvement (instances of shift key releases not being registered have been reduced based on my observations).
- Before I was compiling with: ldc2 -O3 -release -m64 -betterC -static main.d
- Currently I am compiling with: ldc2 -O3 -release -mcpu=native -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto -m64 -betterC -static main.d

I have not profiled the code yet, as I am unsure how to do so. I was thinking of adding a global counter (an int or similar) that is incremented whenever the hook fires, plus an error count that is incremented whenever a key press or release event for a modifier key does not change the tracked state. If the program really did have performance issues, though, this additional code would probably change its performance characteristics anyway.
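
A rough sketch of that instrumentation idea, reduced to a single modifier so it runs anywhere. The names `hookCalls`, `noChangeEvents`, and `onShiftEvent` are hypothetical and not taken from the gist; the real hook would call something like this from inside its callback.

```d
// Hypothetical instrumentation sketch; names are illustrative, not from the gist.
__gshared ulong hookCalls;      // incremented every time the hook fires
__gshared ulong noChangeEvents; // events that left the tracked state unchanged
__gshared bool shiftPressed;    // tracked state of the Shift key

/// Records one Shift event; `pressed` is true for key-down, false for key-up.
void onShiftEvent(bool pressed)
{
    ++hookCalls;
    if (shiftPressed == pressed)
        ++noChangeEvents; // e.g. a release arrived while the key was not tracked as down
    shiftPressed = pressed;
}

void main()
{
    import std.stdio : writeln;

    onShiftEvent(true);  // press
    onShiftEvent(false); // release
    onShiftEvent(false); // second release without a press in between: suspicious
    writeln("hook calls: ", hookCalls, ", no-change events: ", noChangeEvents);
}
```

Held keys auto-repeat their key-down event, so repeated presses would also be counted as no-change events; unexpected releases are probably the more meaningful signal.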
March 06, 2021
On Saturday, 6 March 2021 at 10:51:35 UTC, Preetpal wrote:
> On Saturday, 6 March 2021 at 09:07:05 UTC, Imperatorn wrote:
>> There's not much going on in the code there. Where are you experiencing problems? Have you profiled it?
>
> As it is a small program, I re-implemented it in C (https://gist.github.com/preetpalS/81405cd78ade738034cfa6d49e2a4202) to see if it could reduce the problem I was seeing. Based on my observations, it did reduce the problem but did not eliminate it. This led me to believe that the issue I was seeing in the D version was performance-related.

I just really want to be sure that the D version of the program can match the C version of the program.
March 06, 2021
On Saturday, 6 March 2021 at 10:51:35 UTC, Preetpal wrote:
> On Saturday, 6 March 2021 at 09:07:05 UTC, Imperatorn wrote:
>> [...]
>
> There is an issue with the code where the running program very infrequently misses the release of the SHIFT key. As this program keeps track of which modifier keys are currently pressed (it is supposed to keep track of all key presses and key releases), this can cause the wrong keyboard shortcut to be triggered.
>
> [...]

I see, back in the day when I did stuff like this, I used GetAsyncKeyState instead. Have you tried that approach?
March 06, 2021
On Saturday, 6 March 2021 at 11:34:33 UTC, Imperatorn wrote:
> On Saturday, 6 March 2021 at 10:51:35 UTC, Preetpal wrote:
>> On Saturday, 6 March 2021 at 09:07:05 UTC, Imperatorn wrote:
>>> [...]
>>
>> There is an issue with the code where the running program very infrequently misses the release of the SHIFT key. As this program keeps track of which modifier keys are currently pressed (it is supposed to keep track of all key presses and key releases), this can cause the wrong keyboard shortcut to be triggered.
>>
>> [...]
>
> I see, back in the day when I did stuff like this, I used GetAsyncKeyState instead. Have you tried that approach?

Or maybe it was GetKeyboardState, don't remember, was about 17 years ago 😁
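
For reference, polling along those lines might look roughly like the sketch below. `GetAsyncKeyState` and `VK_SHIFT` are real Win32 names, imported here from druntime's `core.sys.windows` bindings; `isDown` and `shiftDownNow` are hypothetical helper names. The Windows-only part is guarded with `version (Windows)`, so the bit test itself can be tried anywhere.

```d
/// True if the most significant bit of a GetAsyncKeyState result is set,
/// which is how the API reports that the key is currently down.
bool isDown(short state)
{
    return (state & 0x8000) != 0;
}

version (Windows)
{
    // Assumes user32 is available at link time.
    import core.sys.windows.windows : GetAsyncKeyState, VK_SHIFT;

    /// Hypothetical helper: asks Windows whether Shift is down right now.
    bool shiftDownNow()
    {
        return isDown(GetAsyncKeyState(VK_SHIFT));
    }
}

void main()
{
    import std.stdio : writeln;

    writeln("high bit set counts as down: ", isDown(cast(short) 0x8000));
    version (Windows)
        writeln("Shift down right now: ", shiftDownNow());
}
```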
March 06, 2021
On Saturday, 6 March 2021 at 10:57:31 UTC, Preetpal wrote:
> On Saturday, 6 March 2021 at 10:51:35 UTC, Preetpal wrote:
>> On Saturday, 6 March 2021 at 09:07:05 UTC, Imperatorn wrote:
>>> There's not much going on in the code there. Where are you experiencing problems? Have you profiled it?
>>
>> As it is a small program, I re-implemented it in C (https://gist.github.com/preetpalS/81405cd78ade738034cfa6d49e2a4202) to see if it could reduce the problem I was seeing. Based on my observations, it did reduce the problem but did not eliminate it. This led me to believe that the issue I was seeing in the D version was performance-related.
>
> I just really want to be sure that the D version of the program can match the C version of the program.

I very much doubt this is performance related.
Your program doesn't do any heavy computations itself, it just calls into the Windows API, which is dynamically linked, so link-time optimization makes no difference. Optimization flags like -O3 just shave off nanoseconds from the calling code.

A notable difference with your C version is that you use global variables, which in D are thread-local by default. To match C, make your declarations __gshared and see if it helps:

```
__gshared bool altPressed = false;
__gshared bool controlPressed = false;
__gshared bool shiftPressed = false;
__gshared bool winkeyPressed = false;
```
March 06, 2021
On Saturday, 6 March 2021 at 11:50:05 UTC, Dennis wrote:
> On Saturday, 6 March 2021 at 10:57:31 UTC, Preetpal wrote:
>> On Saturday, 6 March 2021 at 10:51:35 UTC, Preetpal wrote:
>>> On Saturday, 6 March 2021 at 09:07:05 UTC, Imperatorn wrote:
>>>> There's not much going on in the code there. Where are you experiencing problems? Have you profiled it?
>>>
>>> As it is a small program, I re-implemented it in C (https://gist.github.com/preetpalS/81405cd78ade738034cfa6d49e2a4202) to see if it could reduce the problem I was seeing. Based on my observations, it did reduce the problem but did not eliminate it. This led me to believe that the issue I was seeing in the D version was performance-related.
>>
>> I just really want to be sure that the D version of the program can match the C version of the program.
>
> I very much doubt this is performance related.
> Your program doesn't do any heavy computations itself, it just calls into the Windows API, which is dynamically linked, so link-time optimization makes no difference. Optimization flags like -O3 just shave off nanoseconds from the calling code.
>
> A notable difference with your C version is that you use global variables, which in D are thread-local by default. To match C, make your declarations __gshared and see if it helps:
>
> ```
> __gshared bool altPressed = false;
> __gshared bool controlPressed = false;
> __gshared bool shiftPressed = false;
> __gshared bool winkeyPressed = false;
> ```

I am kind of skeptical that this problem is performance-related as well. However, the problem occurred less often while I was using the C version of the program than with the D version compiled with less aggressive optimization flags, which suggests that this is a performance problem. Additionally, after compiling the D version of the program with more aggressive optimization flags, I noticed the problem occurring less frequently. It could be a coincidence, but either way I would like to optimize this program.

Well I created the following test program on https://godbolt.org to see what kind of difference __gshared makes:

```
__gshared bool test = false;

extern (C) int main(string[] args) {
    testString(args[0]);
    return 0;
}

void testString(string input) {
    if (input.length > 5) {
        test = true;
    }
}
```

Using __gshared reduces the number of instructions the compiler generates, so it should make the program faster. The godbolt website currently does not support compiling D on Windows; if it did, I could compare the generated code between the two versions of the program right now (the C program can be compiled there). Thanks for the suggestion about using __gshared. Using -O3 also seems to reduce the number of instructions generated in the small test program.

Without __gshared: https://godbolt.org/z/dKTGc6

With __gshared: https://godbolt.org/z/8hxP1q
March 06, 2021
On Saturday, 6 March 2021 at 13:45:42 UTC, Preetpal wrote:
> I am kind of skeptical that this problem is performance-related as well. However, the problem occurred less often while I was using the C version of the program than with the D version compiled with less aggressive optimization flags, which suggests that this is a performance problem.

Have you compared your C version with and without optimization flags?
If that makes a difference, I think it's a race condition. Though optimization flags might change the timing and mask the issue, they don't really solve the problem.
A performance problem would be if your hook function actually took nearly 300 ms to complete, but that doesn't seem to be the case here.
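
One way to check that would be to time the callback body directly. A minimal sketch, assuming a hypothetical `timeCall` helper and a trivial stand-in for the hook body; it uses druntime and Phobos, so it would not build with -betterC as written.

```d
import core.time : Duration, MonoTime, msecs;

/// Runs `work` once and returns how long it took on the monotonic clock.
Duration timeCall(scope void delegate() work)
{
    auto start = MonoTime.currTime;
    work();
    return MonoTime.currTime - start;
}

void main()
{
    import std.stdio : writeln;

    // Stand-in for the body of the hook callback (hypothetical workload).
    bool shiftPressed;
    auto elapsed = timeCall({ shiftPressed = !shiftPressed; });

    // 300 ms is the timeout figure quoted earlier in the thread.
    writeln("callback took ", elapsed, "; under 300 ms: ", elapsed < 300.msecs);
}
```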

> Well I created the following test program on https://godbolt.org to see what kind of difference __gshared makes:

I didn't suggest __gshared because it is faster; I suggested it because I don't know which thread calls your callback function. If SetWindowsHookEx makes the calls to the callback from different threads, then different instances of your variables will be read / written to. So maybe Thread A sets shiftPressed = true, and Thread B sets shiftPressed = false, and then Thread A still reads shiftPressed = true when executing detectAndActivateShortcut.
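
A small sketch of that difference, using a plain worker thread rather than a Windows hook. The names `tlsFlag`, `sharedFlag`, and `setFlagsFromWorker` are made up for illustration, and the example needs druntime, so it is not -betterC code.

```d
import core.thread : Thread;

bool tlsFlag;              // module-level variables are thread-local by default in D
__gshared bool sharedFlag; // a single process-wide instance, like a C global

/// Sets both flags from a second thread and waits for it to finish.
void setFlagsFromWorker()
{
    auto worker = new Thread({
        tlsFlag = true;    // writes the worker thread's own copy
        sharedFlag = true; // writes the one shared copy
    });
    worker.start();
    worker.join();
}

void main()
{
    import std.stdio : writeln;

    setFlagsFromWorker();
    writeln("tlsFlag seen by main thread:    ", tlsFlag);    // false
    writeln("sharedFlag seen by main thread: ", sharedFlag); // true
}
```

The main thread never sees the worker's write to `tlsFlag`, which is the kind of mismatch that would matter if the callback ran on more than one thread.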
March 06, 2021
On Saturday, 6 March 2021 at 14:56:00 UTC, Dennis wrote:
> On Saturday, 6 March 2021 at 13:45:42 UTC, Preetpal wrote:
>> I am kind of skeptical that this problem is performance-related as well. However, the problem occurred less often while I was using the C version of the program than with the D version compiled with less aggressive optimization flags, which suggests that this is a performance problem.
>
> Have you compared your C version with and without optimization flags?
> If that makes a difference, I think it's a race condition. Though optimization flags might change the timing and mask the issue, they don't really solve the problem.
> A performance problem would be if your hook function actually took nearly 300 ms to complete, but that doesn't seem to be the case here.

Well in this instance, the callback is not called from different threads so you wouldn't have to worry about race conditions. According to the documentation (https://docs.microsoft.com/en-us/windows/win32/winmsg/lowlevelkeyboardproc): "This hook is called in the context of the thread that installed it. The call is made by sending a message to the thread that installed the hook."

In this instance, I think I jumped to conclusions too early. There is indeed a bug in the program I wrote, and by exploiting it I can trigger the error on command. The bug is in the conditional in my installed hook that deals with the type of message received:

else if(wParam == WM_KEYUP || wParam == WM_SYSKEYDOWN)

The second part of the || test should be "wParam == WM_SYSKEYUP" instead of "wParam == WM_SYSKEYDOWN". According to the documentation (https://docs.microsoft.com/en-us/windows/win32/inputdev/wm-syskeyup), WM_SYSKEYUP is sent when a key is released while the ALT key is held. So whenever any modifier was released while the ALT key was held, the program missed the release of that modifier key and its internal state was corrupted. How often I encountered the issue was probably purely a function of the order in which I released keys (and of which programs I was using, for example Emacs), not of how often the hook timed out from being slow.
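
The effect can be reproduced without Windows in a small sketch. The message and key values below are copied from the Windows headers; `Modifiers`, `setKey`, `track`, and `replay` are hypothetical names, and the key-down branch is an assumption about how the surrounding code in the gist looks.

```d
// Values copied from the Windows headers so the sketch runs on any platform.
enum : uint
{
    WM_KEYDOWN    = 0x0100,
    WM_KEYUP      = 0x0101,
    WM_SYSKEYDOWN = 0x0104,
    WM_SYSKEYUP   = 0x0105,
    VK_LSHIFT     = 0xA0, // left Shift
    VK_LMENU      = 0xA4, // left Alt
}

struct Modifiers { bool alt, shift; }

void setKey(ref Modifiers m, uint vkCode, bool pressed)
{
    if (vkCode == VK_LMENU) m.alt = pressed;
    else if (vkCode == VK_LSHIFT) m.shift = pressed;
}

/// Updates `m` for one hook event; `useFix` selects the corrected release test.
void track(ref Modifiers m, uint wParam, uint vkCode, bool useFix)
{
    if (wParam == WM_KEYDOWN || wParam == WM_SYSKEYDOWN) // assumed key-down branch
        setKey(m, vkCode, true);
    else if (wParam == WM_KEYUP || wParam == (useFix ? WM_SYSKEYUP : WM_SYSKEYDOWN))
        setKey(m, vkCode, false);
}

/// Replays Alt down, Shift down, Shift up, all while Alt stays held.
Modifiers replay(bool useFix)
{
    Modifiers m;
    track(m, WM_SYSKEYDOWN, VK_LMENU, useFix);
    track(m, WM_SYSKEYDOWN, VK_LSHIFT, useFix);
    track(m, WM_SYSKEYUP, VK_LSHIFT, useFix);
    return m;
}

void main()
{
    import std.stdio : writeln;

    writeln("buggy test, shift still tracked as down: ", replay(false).shift); // true
    writeln("fixed test, shift still tracked as down: ", replay(true).shift);  // false
}
```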


>> Well I created the following test program on https://godbolt.org to see what kind of difference __gshared makes:
>
> I didn't suggest __gshared because it is faster; I suggested it because I don't know which thread calls your callback function. If SetWindowsHookEx makes the calls to the callback from different threads, then different instances of your variables will be read / written to. So maybe Thread A sets shiftPressed = true, and Thread B sets shiftPressed = false, and then Thread A still reads shiftPressed = true when executing detectAndActivateShortcut.

Another conclusion I reached too early was that __gshared is "faster" than thread-local storage, based on the number of instructions the compiler generated. I don't know the actual time cost of reading or writing global variables with or without the __gshared annotation. I have added __gshared to the global variables in my D version of the program so that it more closely matches the C version.

I have updated both versions of the program with the fix.