February 14
On 14/02/2024 5:49 PM, Walter Bright wrote:
> On 2/12/2024 7:31 PM, Richard (Rikki) Andrew Cattermole wrote:
>> dmd having bad codegen here isn't a surprise, that is to be expected.
> 
> I'm used to people saying that DMD doesn't do data flow analysis. It does. In fact, it is based on my OptimumC, which was the first C compiler on DOS to do DFA back in the 1980s.
> 
> This isn't a case of buggy DFA. It's a case of doing DFA correctly.
> 
> The issue is pointer aliasing. A pointer can point to anything, including const data. Therefore, storing through a pointer can alter any value that is reachable via a pointer. Therefore, storing through a pointer invalidates any cached value already read.
> 
> This is what you're seeing.
> 
> C99 tried to address this with __restrict, but few people use it or understand it. D didn't bother with it because people will inevitably misuse __restrict and get their data mysteriously corrupted.

I'm aware that what dmd is doing is right. But it isn't using SIMD either, so there is some performance left on the table.

Backends today are very different from those of 15 years ago in terms of what they can do, let alone one that last saw significant active development 30 years ago.

I suggest that you have a chat with Bruce about this during the next meetup. We've been trying to explain to you just how backends have changed over the years from a user's perspective. The ones today can only be described with one word: magical.

But I will say this: 20 years ago you needed inline assembly, 10 years ago you needed intrinsics, today you may not need to do anything special in D, although you will in C++.
February 14
I'll give an example for something I wrote up in a reply for a Reddit post:

Using ldc I was able to get this code to do 8.8 million vectors in around 1 second.

```d
module strategy;

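int run(byte[] a, byte[] b) {
    // Preconditions: equal lengths, and a length that is a multiple of 64,
    // because the loop below consumes 64 elements per iteration.
    assert(a.length == b.length);
    assert(a.length % 64 == 0);

    int sum = 0;

    for(size_t i; i < a.length; i += 64) {
        // Unrolled at compile time into 64 multiply-adds.
        static foreach(j; 0 .. 64) {
            sum += cast(short)a[i + j] * cast(short)b[i + j];
        }
    }

    return sum;
}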
```

To achieve the same thing using Go, they had to write assembly (see DotVNNI example).

https://sourcegraph.com/blog/slow-to-simd

LDC is able to get to almost the maximum a CPU can do with code that looks almost naive. DMD cannot compete with this, nor should it aim to.

Here is what actual naive code looks like for this problem:

```d
module strategy;

int run(byte[] a, byte[] b) {
    assert(a.length == b.length);

    int sum = 0;

    foreach(i; 0 .. a.length) {
        sum += cast(short)a[i] * cast(short)b[i];
    }

    return sum;
}
```

7 million vectors per second.

They had to write assembly to get that speed. I got it, without doing anything special...
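
The unrolled version above reads 64 elements per iteration, so it only works when the length is a multiple of 64. As a minimal sketch of one way to lift that restriction (the names `runBlocked` and `runNaive` are hypothetical, and this variant has not been benchmarked), the unrolled loop can cover the largest multiple of 64 and a scalar loop can handle the remainder:

```d
// Hypothetical helper: unrolled blocks of 64 plus a scalar tail.
int runBlocked(const(byte)[] a, const(byte)[] b) {
    assert(a.length == b.length);

    int sum = 0;
    // Largest multiple of 64 that fits in the input.
    const size_t blocked = a.length - a.length % 64;

    size_t i = 0;
    for (; i < blocked; i += 64) {
        // Unrolled at compile time, as in the version above.
        static foreach (j; 0 .. 64) {
            sum += cast(short)a[i + j] * cast(short)b[i + j];
        }
    }

    // Scalar tail for the remaining 0 .. 63 elements.
    for (; i < a.length; ++i) {
        sum += cast(short)a[i] * cast(short)b[i];
    }

    return sum;
}

// Reference implementation used only to cross-check the result.
int runNaive(const(byte)[] a, const(byte)[] b) {
    assert(a.length == b.length);

    int sum = 0;
    foreach (i; 0 .. a.length) {
        sum += cast(short)a[i] * cast(short)b[i];
    }
    return sum;
}

void main() {
    // 100 elements: one unrolled block of 64 plus a tail of 36.
    auto a = new byte[](100);
    auto b = new byte[](100);
    foreach (k; 0 .. 100) {
        a[k] = cast(byte)(k - 50);
        b[k] = 3;
    }
    assert(runBlocked(a, b) == runNaive(a, b));
    assert(runBlocked(a, b) == -150); // 3 * (4950 - 5000)
}
```

Whether the optimizer still vectorizes the blocked loop as well as the original is an assumption that would need to be checked in the generated assembly.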
February 14
On 13.02.24 04:31, Richard (Rikki) Andrew Cattermole wrote:
> dmd having bad codegen here isn't a surprise, that is to be expected.
> 
> Now for ldc:
> 
> ```d
> void fillBP(immutable(uint*) value, uint* dest) {
>       dest[0] = *value;
>       dest[1] = *value;
>       dest[2] = *value;
>       dest[3] = *value;
> }
> ```
> 
> I expected that to not do the extra loads, but it did.
> 
> ```d
> void fillBP(immutable(uint*) value, uint* dest) {
>      dest[0 .. 4][] = *value;
> }
> ```
> 
> And that certainly should not be doing it either.
> Even if it wasn't immutable.
> 
> For your code, because it is not immutable and therefore can be changed externally on another thread, the fact that the compiler has to do the loads is correct. This isn't a bug.

An unsynchronized read of a location that is modified by another thread is a race condition and a race condition is UB, so this scenario is not an optimization blocker.

I think with this specific code reading back the pointers is unnecessary, because assigning `*value` to itself does not produce an observable difference in behavior given that it is also read and assigned to a different location at least once. (Unless LDC defines the behavior of unaligned reads/writes?)

Of course, if the code instead assigned `*value+1` to the entries of `dest`, the reads would become necessary.
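
As a minimal sketch of that case (the function names `fillPlusOneReload` and `fillPlusOneCached` are hypothetical), here `value` aliases `dest[0]`, and reading once versus re-reading gives observably different results:

```d
// Re-reads *value before every store, like the original fillBP.
void fillPlusOneReload(uint* value, uint* dest) {
    dest[0] = *value + 1;
    dest[1] = *value + 1;
    dest[2] = *value + 1;
    dest[3] = *value + 1;
}

// Reads *value once and reuses the result.
void fillPlusOneCached(uint* value, uint* dest) {
    const tmp = *value + 1;
    dest[0] = tmp;
    dest[1] = tmp;
    dest[2] = tmp;
    dest[3] = tmp;
}

void main() {
    // value points at dest[0], so the first store changes *value.
    uint[4] a = [10, 0, 0, 0];
    fillPlusOneReload(a.ptr, a.ptr);
    uint[4] expectedReload = [11, 12, 12, 12];
    assert(a == expectedReload);

    uint[4] b = [10, 0, 0, 0];
    fillPlusOneCached(b.ptr, b.ptr);
    uint[4] expectedCached = [11, 11, 11, 11];
    assert(b == expectedCached);
}
```

With the plain `*value` stores from the original, the first store writes back the value that was just read, so the two strategies cannot be told apart this way.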

In general, I think LDC and GDC are pretty conservative about optimizing based on UB, which I think has benefits for security.
February 14
On 14.02.24 05:49, Walter Bright wrote:
> On 2/12/2024 7:31 PM, Richard (Rikki) Andrew Cattermole wrote:
>> dmd having bad codegen here isn't a surprise, that is to be expected.
> 
> I'm used to people saying that DMD doesn't do data flow analysis. It does. In fact, it is based on my OptimumC, which was the first C compiler on DOS to do DFA back in the 1980s.
> 
> This isn't a case of buggy DFA. It's a case of doing DFA correctly.
> 
> The issue is pointer aliasing. A pointer can point to anything, including const data. Therefore, storing through a pointer can alter any value that is reachable via a pointer. Therefore, storing through a pointer invalidates any cached value already read.
> 
> This is what you're seeing.
> ...

Well, my understanding is that storing a value that was read from a `uint*` cannot invalidate a value read from the same `uint*` unless unaligned reads/writes are allowed or the pointer can point to itself.

Are unaligned reads/writes allowed in D? Can a `uint*` point to itself? Is there something else I missed?

How can you make the following assertion fail by filling in the `...` holes, without invoking UB?

```d
void fillBP1(uint* value, uint* dest)pure{
    dest[0] = *value;
    dest[1] = *value;
    dest[2] = *value;
    dest[3] = *value;
}

void fillBP2(uint* value, uint* dest)pure{
    auto tmp = *value;
    dest[0] = tmp;
    dest[1] = tmp;
    dest[2] = tmp;
    dest[3] = tmp;
}

uint[4] distinguish(alias f)()pure{
    ...; // TODO
    uint[4] dest = ...; // TODO
    uint* value = ...;  // TODO
    f(value,dest.ptr);
    return dest;
}

void main(){
    import std.stdio;
    assert(distinguish!fillBP1()==distinguish!fillBP2());
    writeln(distinguish!fillBP1);
    writeln(distinguish!fillBP2);
}
```

> C99 tried to address this with __restrict, but few people use it or understand it. D didn't bother with it because people will inevitably misuse __restrict and get their data mysteriously corrupted.
> 

My understanding is that the case in the OP does not require `__restrict`. Even if `value` points to any of the 4 fields of `dest`, it is okay to read it only once. Of course, maybe the data flow analysis is more conservative than that, but I wouldn't necessarily point to this case as one where data flow analysis is working "correctly".
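
As a minimal sketch of that claim (it only covers the four aligned aliasing cases, not unaligned pointers, so it does not answer the challenge above), the check below points `value` at each element of `dest` in turn and compares the re-reading and read-once versions:

```d
// Same bodies as fillBP1 and fillBP2 above.
void fillBP1(uint* value, uint* dest) pure {
    dest[0] = *value;
    dest[1] = *value;
    dest[2] = *value;
    dest[3] = *value;
}

void fillBP2(uint* value, uint* dest) pure {
    auto tmp = *value;
    dest[0] = tmp;
    dest[1] = tmp;
    dest[2] = tmp;
    dest[3] = tmp;
}

void main() {
    foreach (k; 0 .. 4) {
        uint[4] a = [1, 2, 3, 4];
        uint[4] b = a;
        const expected = a[k];

        // value aliases dest[k] in both calls.
        fillBP1(&a[k], a.ptr);
        fillBP2(&b[k], b.ptr);

        // Both versions agree, and every slot holds the original dest[k].
        assert(a == b);
        foreach (x; a) {
            assert(x == expected);
        }
    }
}
```

Any store that lands on `*value` writes the value that was just read from it, which is why the extra loads are not observable here.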
February 14
On 13.02.24 07:02, Bruce Carneal wrote:
> 
> To reuse the value the compiler would have to prove that the memory locations do not overlap.

Not really, it only has to show that the value in the memory location has not changed since the last read.
February 14
On Wednesday, 14 February 2024 at 11:46:24 UTC, Timon Gehr wrote:
> On 13.02.24 07:02, Bruce Carneal wrote:
>> 
>> To reuse the value the compiler would have to prove that the memory locations do not overlap.
>
> Not really, it only has to show that the value in the memory location has not changed since the last read.

True. I jumped to the general case, where I'd seen suboptimal codegen. I should have commented on both the given special, stationary case and the more general one.


February 14

On Wednesday, 14 February 2024 at 04:49:28 UTC, Walter Bright wrote:
> This isn't a case of buggy DFA. It's a case of doing DFA correctly.
>
> The issue is pointer aliasing. A pointer can point to anything, including const data. Therefore, storing through a pointer can alter any value that is reachable via a pointer. Therefore, storing through a pointer invalidates any cached value already read.
>
> This is what you're seeing.
>
> C99 tried to address this with __restrict, but few people use it or understand it. D didn't bother with it because people will inevitably misuse __restrict and get their data mysteriously corrupted.

This, here, is the answer.

Moral of the story, use (gdc, ldc) vendor attributes if you really care.

```d
import core.attribute;

void fillBP(@restrict uint* dest, uint* value)
{
    //...
}
```
February 16

On Tuesday, 13 February 2024 at 13:30:11 UTC, Johan wrote:
> I hope someone can find the link to some DConf talk (me or Andrei) or forum post where I talk about why LDC assumes that `immutable(uint*)` points to *mutable* (nota bene) data. The reason is the mutable thread synchronization field in immutable class variable storage (`__monitor`), combined with casting an immutable class to an array of immutable bytes.
>
> Side-effects in-between `immutable(uint*)` lookup could run into a synchronization event on the immutable data (i.e. mutating it).

Shouldn't it be okay for `immutable(uint*)` to not access `__monitor`? I think, `__monitor` should work outside of type safety, like reference counting, then immutability won't matter.

```d
shared(Object.Monitor*) getMonitor(immutable Object o)
{
    size_t addr=cast(size_t)cast(void*)&o.__monitor;
    return cast(shared(Object.Monitor*))(addr-20+20);
}
```

Is it legal for the optimizer to cast away the restrict qualifier like this?

February 16
On 16/02/2024 10:34 PM, Kagamin wrote:
> On Tuesday, 13 February 2024 at 13:30:11 UTC, Johan wrote:
>> I hope someone can find the link to some DConf talk (me or Andrei) or forum post where I talk about why LDC assumes that `immutable(uint*)` points to *mutable* (nota bene) data. The reason is the mutable thread synchronization field in immutable class variable storage (`__monitor`), combined with casting an immutable class to an array of immutable bytes.
>>
>> Side-effects in-between `immutable(uint*)` lookup could run into a synchronization event on the immutable data (i.e. mutating it).
>
> Shouldn't it be okay for `immutable(uint*)` to not access `__monitor`? I think, `__monitor` should work outside of type safety, like reference counting, then immutability won't matter.

That covers immutable, but not immutable that has become const.

The only way I can see it working reliably is to simply turn off the mutex when it is immutable.
February 16
On Friday, 16 February 2024 at 09:56:50 UTC, Richard (Rikki) Andrew Cattermole wrote:
> That covers immutable, but not immutable that has become const.

That's fine, const can't be restrict qualified unconditionally, but immutable can.