September 14
A DFA that only works with a forward pass will generate a lot of complaints as people get baffled by the situations where it fails.

The forward-pass attribute inference that D has generates complaints and bug reports when it fails, because it does not handle cycles.
September 14
On 9/13/2025 8:42 PM, Richard (Rikki) Andrew Cattermole wrote:
> The linked code is not static if, its a regular if.

I know. I was trying to clarify the difference between regular if and static if.
September 15
On 15/09/2025 7:55 AM, Walter Bright wrote:
> A DFA that only works with a forward pass will generate a lot of complaints as people get baffled by the situations where it fails.
> 
> The forward-pass attribute inference that D has generates complaints and bug reports when it fails, because it does not handle cycles.

Ah, I see where your thinking is coming from.

Okay, consider the following reports. Does any report (or lack thereof) suggest people are going to be baffled by it?

Relevant files (start.d for reports source code, out.txt for full log):

https://gist.github.com/rikkimax/652f96a33934bd8ae0a68fb849821385#file-start-d

```
start.d
testdfa\start.d(236): Error: Variable `ptr` was required to be non-null and has become null
    foreach (i; 0 .. 2)
    ^
testdfa\start.d(257): Error: Variable `ptr` was required to be non-null and has become null
    foreach (i; 0 .. 2)
    ^
testdfa\start.d(308): Error: Dereference on null variable `ptr`
        int v = *ptr; // error
                ^
testdfa\start.d(330): Error: Dereference on null variable `ptr`
        int v = *ptr; // error
                ^
testdfa\start.d(613): Error: Assert can be proven to be false
    assert(val); // Error: val is 0
           ^
testdfa\start.d(619): Error: Assert can be proven to be false
    assert(stack.length == 1); // Error: stack is null
                        ^
testdfa\start.d(673): Error: Dereference on null variable `ptr`
        return *ptr; // error could be null
               ^
testdfa\start.d(703): Error: Assert can be proven to be false
        assert(ptr !is null); // error
                   ^
extracted.d
```

Worth looking at extracted.d and the full log.
6 days ago
On 14/09/2025 7:58 AM, kdevel wrote:
> On Saturday, 13 September 2025 at 15:43:27 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> [...]
>>> I thought that the intent of the original code
>>>
>>> ```d
>>> int* ptr;
>>>
>>> void func(bool b, scope int* p) @safe {
>>>    assert(!b);
>>>
>>>    if (b) {
>>>      ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
>>>    }
>>> }
>>> ```
>>>
>>> was to show that there is some code-block
>>>
>>> ```d
>>>      ptr = p; // Error: scope variable `p` assigned to global variable `ptr`
>>> ```
>>>
>>> which is never executed regardless of the value of `b`.
>>
>> Half right.
>>
>> There is a code block that will never be executed; it is because variable `b`'s state is known to prevent that code block from executing.
> 
> Then it is dead code that should be manually removed? Isn't that the problem here? The compiler correctly complains about invalid code; this is not a false positive, even if that code is never executed.
> 
> If I want code not to be executed I comment it out or delete it.

In practice, when this is applied, people don't complain about it.

https://github.com/dlang/dmd/issues/21859

Compiles with full D, but not with -betterC (which does the extra check without the culling; that is a bug).

```d
void func() nothrow {
    if (0) {
     	 throw new Exception("");  // Error: cannot use `throw` statements with -betterC
    }
}
```

When both DFA and control flow graph (CFG) processing are working correctly, it's a well-loved compiler feature. You don't know it's there unless something has gone wrong, like in the above code.

When I learned how nothrow inference worked, I was really impressed with how it was applying CFG analysis. I did not expect the frontend to be doing it.
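As a rough illustration of that culling (a toy Python sketch with hypothetical mini-AST types, not dmd's actual implementation or data structures): statically-false branches are dropped before the `throw` check runs, so the function above still infers as nothrow.

```python
# Toy sketch: cull branches with a statically-false condition before
# checking for `throw` statements, so `nothrow` inference succeeds on
# code like `if (0) { throw ...; }`. The AST types here are invented
# for illustration only.

from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Throw:
    pass

@dataclass
class If:
    cond: Union[int, str]                # int literal, or "?" for unknown
    body: List["Stmt"] = field(default_factory=list)

Stmt = Union[Throw, If]

def cull(stmts):
    """Drop statements guarded by a constant-false condition."""
    out = []
    for s in stmts:
        if isinstance(s, If):
            if s.cond == 0:
                continue                 # statically dead: drop the branch
            s = If(s.cond, cull(s.body))
        out.append(s)
    return out

def can_throw(stmts):
    """After culling, check whether any reachable statement throws."""
    for s in stmts:
        if isinstance(s, Throw):
            return True
        if isinstance(s, If) and can_throw(s.body):
            return True
    return False

# `if (0) { throw ...; }` is culled, so the function infers as nothrow:
func = [If(0, [Throw()])]
print(can_throw(cull(func)))             # False -> nothrow
```

The same walk with an unknown condition (`If("?", ...)`) keeps the branch and the function would not infer as nothrow.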

6 days ago

I wasn't entirely sure which post to respond to because I needed Rikki's examples, but this response is for Walter.

On Monday, 15 September 2025 at 01:26:31 UTC, Richard (Rikki) Andrew Cattermole wrote:

> start.d
> testdfa\start.d(236): Error: Variable `ptr` was required to be non-null and has become null
>     foreach (i; 0 .. 2)
>     ^
> testdfa\start.d(330): Error: Dereference on null variable `ptr`
>         int v = *ptr; // error
>                 ^
> extracted.d

C# has both of the above errors (as warnings) enabled by default.

Walter, the twin assertions that the above errors will be confusing to programmers, and that they will slow down the compiler, are entirely false. Millions of programmers see very similar messages in C# compiler output every day; they will know exactly what is meant.

Furthermore, C# is renowned for its speed. I just built a 250kLOC project in less than one second.

Here are examples of C#'s DFA in action:

Main.xaml.cs(530,22,530,24): warning CS0168: The variable 'ex' is declared but never used

Customers.cs(26,37,26,53): warning CS0649: Field 'Customers.customersService' is never assigned to, and will always have its default value null

DataUploadController.cs(41,46,41,54): warning CS0169: The field 'DataUploadController.resolver' is never used

The VB.NET compiler also has DFA and is equally fast (they share codebases):

WorkOrderDirectory.vb(26,13): warning BC42024: Unused local variable: 'J'.

The above are all real world examples from the code base that I just rebuilt.

There is no rule that states that DFA must be slow, as there exist examples disproving the assertion.

5 days ago

On Saturday, 13 September 2025 at 16:12:06 UTC, Richard (Rikki) Andrew Cattermole wrote:

> The fact that D is not architected to be able to stop these false positives is an intended bug in our design. Not all PL's work like this; the ML family certainly don't. They tie very advanced analysis engines including DFA into their type system.

So the answer to my question is: this is not motivated by reported issues from D users, but by other languages doing this. If the only example for D you can give is an "unrealistic" one, can you please give a realistic example from another language then?

> Just because people don't understand that this is possible, doesn't mean it hasn't been an issue. I see it as a case that people don't know what they don't know. So they don't complain.
> This is a hole that the C family relies upon having due to historical reasons, and so it hasn't reached common knowledge.

I understand DFA is an interesting research topic, and if your claim was that it might lead to interesting discoveries or new paradigms, I fully agree. But when you say it's necessary to fix DIP1000 specifically, I need some evidence to believe it. There have been over 100 DIP1000-related bug reports with simple non-DFA solutions, and I'm skeptical that people 'don't know' to report DFA-related issues. Several users reported issues about Value Range Propagation (VRP) not working across statements. Even when they don't explicitly know about DFA, they still experience a problem and can intuit that a solution is available.

5 days ago

On Monday, 15 September 2025 at 23:02:25 UTC, Adam Wilson wrote:

> Here is examples of C#'s DFA in action:
>
> (...)
>
> There is no rule that states that DFA must be slow as there exist examples disproving the assertion.

When Walter talks about DFA, he refers to the specific algorithm he learned in the 1980s and implemented in his backend optimizer, which solves data-flow equations in a loop until it converges on a fixed point. This algorithm is superlinear, so it is considered 'too slow' for the frontend, which aims to analyse n statements in O(n) time.
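The fixed-point iteration described above can be sketched roughly like this (a toy nullness analysis in Python over a hand-built CFG with a back edge; the lattice and transfer functions are illustrative assumptions, not Walter's actual implementation):

```python
# Toy fixed-point data-flow analysis: propagate nullness facts around a
# CFG containing a cycle, iterating until the solution stops changing.

# Two-point lattice: joining differing facts yields "maybe".
def join(a, b):
    return a if a == b else "maybe"

# CFG: block -> (transfer function, successor blocks).
# Block 1 is a loop head with a back edge from block 2.
cfg = {
    0: (lambda f: "nonnull", [1]),   # ptr = &x;
    1: (lambda f: f,         [2, 3]),# loop head
    2: (lambda f: "maybe",   [1]),   # loop body may null the pointer
    3: (lambda f: f,         []),    # exit block: fact at *ptr
}

IN = {b: "nonnull" for b in cfg}     # optimistic initial facts

changed = True
while changed:                       # iterate to a fixed point
    changed = False
    for b, (transfer, succs) in cfg.items():
        out = transfer(IN[b])
        for s in succs:
            merged = join(IN[s], out)
            if merged != IN[s]:
                IN[s] = merged
                changed = True

print(IN[3])   # "maybe" -- a single forward pass would never revisit the
               # back edge and would wrongly conclude "nonnull" at the exit
```

The back edge is exactly what the single-forward-pass inference misses: the second and third sweeps of the loop are what carry the "maybe" fact from the loop body back to the loop head and on to the exit.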

What Rikki implements is a 4th semantic pass, which analyzes the code after the existing 3 linear semantic passes have completed. I don't know exactly how it works, but from his communication I gather it's a subset of 'full' DFA which trades a bit of accuracy for compile speed.

The C# examples of unused symbols are, from my brief experimentation on https://dotnetfiddle.net, even simpler than that. They don't take control flow into account at all:

```C#
public class Program
{
	static int y;
	
	public static void Main()
	{
		int x = 0;
		if (y < 0)
			if (y > 0) // this branch is dead code
				x++; // but C# doesn't mark this line as unreachable, or x as unused
	}
}
```

That's not to say it's not useful; I think dmd should compute this information similarly and expose it through an LSP interface. However, this can be implemented in the existing linear semantic3 pass with a simple usage counter/boolean per declaration.
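The usage-counter idea can be sketched in a few lines (illustrative Python; the statement tuples are a stand-in for walking dmd's AST, not real compiler code):

```python
# Sketch of unused-variable detection with a per-declaration counter:
# one linear pass over statements, no control-flow graph required.

statements = [
    ("decl", "x"), ("decl", "y"),
    ("read", "y"),                 # y is used; x never is
]

reads = {}
for kind, name in statements:
    if kind == "decl":
        reads.setdefault(name, 0)  # register the declaration
    elif kind == "read":
        reads[name] = reads.get(name, 0) + 1

for name, n in reads.items():
    if n == 0:
        print(f"warning: variable '{name}' is declared but never used")
```

As with the C# warnings quoted earlier, this flags `x` without knowing anything about reachability.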

These 3 things have all been labeled 'DFA' so far, which makes the discussion really confusing, so it's best to clarify exactly which DFA flavor you're talking about.

5 days ago
On 16/09/2025 9:59 PM, Dennis wrote:
> On Monday, 15 September 2025 at 23:02:25 UTC, Adam Wilson wrote:
>> Here is examples of C#'s DFA in action:
>>
>> (...)
>>
>> There is no rule that states that DFA must be slow as there exist examples disproving the assertion.
> 
> When Walter talks about DFA, he refers to the specific algorithm he learned in the 1980s and implemented in his backend optimizer, which solves data-flow equations in a loop until it converges on a fixed point. This algorithm is superlinear, so considered 'too slow' for the frontend which aims to analyse n statements in O(n) time.
> 
> What Rikki implements is a 4th semantic pass, which analyzes the code after the existing 3 linear semantic passes have completed. I don't know exactly how it works, but from his communication I gather it's a subset of 'full' DFA which trades a bit of accuracy for compile speed.

You haven't got my stuff entirely right.

The fast DFA, which is what I'm working on right now, runs in semantic 3.

It's very similar to DIP1000 in that it does a single forward pass with the statement and expression walkers.

The slow DFA, which I want to work on eventually, is indeed more like the classical DFAs of the '80s and would run in semantic 4. Since it is a known quantity, and it isn't the default experience that all these features hinge on having a solution for, it isn't a priority.

Like classical DFAs, the fast DFA does indeed use lattices and join and meet operators. But I have extra information available, like depth checks, write counts and scope modelling, that classical DFAs do not have.

The fast DFA does something quite interesting: it isn't modelling what is modellable; it instead models what it can't model, and what is left is what is modellable. You need to have an unknown state, with ways of putting variables into it. This allows me to cull as many false and perceived-false positives as possible.
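A minimal sketch of that "unknown state" idea (illustrative Python; the `join` function and fact names are assumptions for exposition, not Rikki's actual engine):

```python
# Sketch: anything the engine can't model, and any conflicting merge,
# collapses to "unknown". Only facts that survive as precise are ever
# eligible for a report, which trades completeness for avoiding false
# (and perceived-false) positives.

def join(a, b):
    """Merge the facts arriving from two control-flow paths."""
    if a == b:
        return a          # both paths agree: keep the precise fact
    return "unknown"      # disagreement or unmodellable: give up quietly

def reportable(fact):
    """Only precise facts may back an error report."""
    return fact != "unknown"

# Both branches leave the pointer non-null: the fact stays precise.
print(reportable(join("nonnull", "nonnull")))  # True
# Branches disagree: the fact is unknown, so no report is raised.
print(reportable(join("nonnull", "null")))     # False
```

Note that "unknown" is absorbing: once a variable falls into it, every later join keeps it there, which is what makes the culling safe.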

Data-flow analysis is a derivative of proof assistants. They work on properties, and sometimes, as in the case of VRP, they integrate values into their calculations.

If it weren't for my zero false & perceived-false positives rule, the fast DFA engine would be almost as good as a full DFA engine. It's missing a complete interprocedural analysis, and single-function cyclicity handling.

> The C# examples of unused symbols are, from my brief experimentation on https://dotnetfiddle.net, even simpler than that. They don't take control flow into account at all:
> 
> ```C#
> public class Program
> {
>      static int y;
> 
>      public static void Main()
>      {
>          int x = 0;
>          if (y < 0)
>              if (y > 0) // this branch is dead code
>                  x++; // but C# doesn't mark this line as unreachable, or x as unused
>      }
> }
> ```

It does actually.

Godbolt has a control flow graph display for C#.

https://csharp.godbolt.org/z/nMjPqYens

```
Program:Main() (FullOpts):
       xor      edi, edi
       tail.jmp [System.Console:WriteLine(int)]
```

It silently culled the branches. Exactly what DFA should do in this situation.

> That's not to say it's not useful, I think dmd should compute this information similarly and expose it through an LSP interface. However, this can be implemented in the existing linear semantic3 pass with a simple usage counter/boolean per declaration.

And that would be a rewrite, and I don't mean just DIP1000.

Doing this stuff as part of the type system isn't the normal way to handle it, and for good reason. Attaching a DFA engine directly into the type system today to resolve this is, quite frankly, too late. From what I've seen, that would be a bad direction to go in, and I have no desire to watch it being tried.

5 days ago

On Tuesday, 16 September 2025 at 10:47:03 UTC, Richard (Rikki) Andrew Cattermole wrote:

> It does actually.
>
> Godbolt has a control flow graph display for C#.
>
> https://csharp.godbolt.org/z/nMjPqYens
>
> ```
> Program:Main() (FullOpts):
>        xor      edi, edi
>        tail.jmp [System.Console:WriteLine(int)]
> ```
>
> It silently culled the branches. Exactly what DFA should do in this situation.

I was referring to C#'s unused variable warnings that Adam brought up, not the code generator.

> And that would be a rewrite and I don't mean just DIP1000.
>
> Doing this stuff as part of the type system isn't the normal way to handle it, and for good reason. Attaching a DFA engine today directly into the type system to resolve this is quite frankly too late. From what I've seen that would be a bad direction to go in and have no desire to watch that being tried.

I don't understand this paragraph at all, you're referring to 5 different things (rewrite, DIP1000, too late, type system, things you've seen) and I don't know the context/relevance of any of those.

5 days ago
On 16/09/2025 11:20 PM, Dennis wrote:
>     And that would be a rewrite and I don't mean just DIP1000.
> 
>     Doing this stuff as part of the type system isn't the normal way to
>     handle it, and for good reason. Attaching a DFA engine today
>     directly into the type system to resolve this is quite frankly too
>     late. From what I've seen that would be a bad direction to go in and
>     have no desire to watch that being tried.
> 
> I don't understand this paragraph at all, you're referring to 5 different things (rewrite, DIP1000, too late, type system, things you've seen) and I don't know the context/relevance of any of those.

Hmm, I think I see a missing piece of the puzzle.

A type system is the types and declarations and how they get processed in a language.

When a DFA is attached to a type system properly, you are not calling functions like `runSemantic` on AST nodes; that is handled by the worklist algorithm. This is an important distinction between what we have and what the more advanced type systems with a DFA attached to them have.
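A worklist scheduler of this kind can be sketched like this (illustrative Python; the dependency map is hypothetical and this is neither SDC's nor dmd's code):

```python
# Sketch: instead of analyzing declarations in a fixed order, blocked
# declarations are re-queued until their dependencies resolve, so
# forward references and awkward orderings sort themselves out.

from collections import deque

# decl -> set of decls it depends on (note `a` forward-references `c`)
deps = {"a": {"c"}, "b": set(), "c": {"b"}}

resolved = set()
work = deque(deps)                   # initial worklist: every declaration
order = []

while work:
    d = work.popleft()
    if deps[d] - resolved:           # some dependency isn't resolved yet:
        work.append(d)               # re-queue it and try again later
        continue
    resolved.add(d)                  # all dependencies ready: analyze it
    order.append(d)

print(order)                         # ['b', 'c', 'a']
```

A real scheduler also has to detect genuine cycles (this toy loop would spin forever on one), which is exactly where the ordering issues mentioned below come from.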

Take a look at all of the issues we have with resolving templates. These stem from not having a DFA-directed type system. It's very easy to exceed what the compiler can instantiate with cycles and other ordering issues. We've all run into them, and they are rarely fixable.

And no, this isn't a new thing. This is from the '70s, just a different branch of DFA and related analysis engines. I doubt Walter would have learned this branch, and he may not have been exposed to it; it's mostly academic (for a reason, up until the '90s). The papers he has sent me are all backend-related, not from these other branches.

I do want to say that I have been knowingly conflating DFA and other analysis engines; this is fully intended. Their interplay can be pretty heavy, and one would feed into the other here. Lines get blurred in the implementation, and their solutions can be based upon the same theory & algorithms.

Now compare us to the ML family, with signatures (type classes), modules and their variation of meta-programming. They can resolve stuff we wouldn't dream of doing. We have to employ CTFE just to come close to what they can do, and even then there is stuff we will simply never be able to do due to ordering issues.

This would be a good time to define what lint-level means for a type system. In essence, it puts anything that the type system knows about into a read-only mode: you can't change it in a given lint-level engine.

This makes DIP1000 part of the type system, not lint-level, due to it messing with storage classes.

What analysis dmd does do, like VRP and CFG, is impressive. Not because it is the most advanced thing in the world (not even close), but because our type system wasn't meant to support it. This isn't a strength of its design.

A good thing to note here is how SDC uses a worklist algorithm to process its type system. https://github.com/snazzy-d/sdc/blob/master/src/d/semantic/scheduler.d

SDC will have the ability to solve problems that dmd cannot. Not because the language prevents it, but because dmd's type system implementation does not allow for it, having a much lower ceiling in its capability.

To give dmd the kind of capability where a DFA engine dictates analysis of the type system, or feeds a significant amount of information throughout a function, is a rewrite. Simply because the existing architecture wasn't written with that in mind: it was never meant to have templates like we have today, nor analysis like VRP.

The literature is massive on these topics, and gets quite broad. It can be argued differently than what I am.

Some people who read this may conclude that dmd needs a rewrite, and to that end I know the SDC folks would love some help. However, my takeaway from a pragmatic perspective is to recognize what we can't do, to help determine what we can do. D is a production language, not an academic or toy one. If we have to give up strong guarantees by default for DFA features, then so be it; I'd rather have some than none.