A Long Term Vision for the D programming language
D -- The best programming language!
I imagine a DConf where you guys yell at me, not because we disagree,
but because I'm old and forgot my hearing aid.
This is what I think needs to be done to get us there.
GitHub / Organisation
GitHub has >10^7 accounts; D's bugzilla has what, 10^3?
No matter what feature GitHub is missing, there is no reason not to migrate to
GitHub.
The casual D user, when he finds a bug, will never report it if he has to
create a special account on our bugzilla.
GitHub has an okay API; I bet we can replicate 99% of the missing features
with very little code executed by some bots.
Additionally, we are terrible at longer-term planning and management.
In pretty much all software projects, you can find milestones, epics, roadmaps.
GitHub has those features, GitHub is where our code lives, so why does our
planning not live there as well?
I fully understand that D is a community project and that we can not tell the
bulk of the contributors to work on issue X or milestone Y, but we could ask
them nicely.
And if we follow our own project planning, they might just follow along as well.
Currently, I don't know where D is heading.
And if I don't know, how should the average JS developer know?
Not by reading a few hundred forum posts, but by looking at the project
planning tools he/she is used to.
D needs more people; removing the unnecessary barrier to entry that our
bugzilla represents should be a no-brainer.
The role of the language/std leadership, as I see it, is keeping on top of the
issues, PRs, milestones, etc.
Setting priorities, motivating people with good libraries on code.dlang.org
to get them into phobos.
And of course, laying out new directions and goals for the
language and library.
Not short term but long term, e.g. ~5 years.
Only after that work is done comes the developing.
Having more development time left would be the measure of success for the
leadership side.
The D Compiler
Long term goal
My desktop computer has 64GB of RAM, my laptop has 16GB, so why is it that all
D compilers work like it's 1975, when lexer, parser, ... were separate programs?
Having played a bit with web languages like svelte and elm, I'm disappointed
when going back to D.
An incremental compile, with reggae, for my work project takes about seven
seconds.
Elm basically had the unittests running by the time the keyUp event reached my
editor.
Svelte was equally fast, but instead of re-running the tests the updated
webpage was already loaded.
I know that D and those two languages aim for different platforms, but the
premise should be clear.
Why redo work if I have enough memory to store it all many times over?
For example, if I have a function
T addTwo(T)(T a, T b) {
    return a + b;
}
and a test
unittest {
    auto rslt = addTwo(1, 2);
    assert(rslt == 3);
}
and change a + b to a * b, only the unittest calling it should be
re-compiled and executed.
Additionally, in most modern languages, most editors/IDEs can show me what
the type of rslt is, plus many more convenience features.
The compiler at some point knew what the type of rslt was, but it forgets it
as soon as the compilation is done.
No editor can benefit from this information that the compiler had.
The worst thing though: when I compile the next time and no dependency leading
to rslt has changed, the compiler computes it all over again.
What a waste of time.
Enough talking about how bad the current state is, let's see how much greener
the grass could be.
Imagine, a compiler daemon that you start once per project/program that keeps
track of all files related to this project.
When a file changes, it lexes and parses the file, and stores the information
it can reuse.
As it has the old parse tree of the previous version of the file, it should be
able to figure out which declarations have changed.
At some point even dmd must know what the dependencies between declarations in
different modules are, or what types template parameters have been evaluated to.
If that information is stored, building an lsp (language server protocol)
interface that each lsp client can talk to, to get this information is the
easy part.
When all the dependencies are tracked, the above example of the minimal
unittest re-run should be possible.
With a well-defined dependency graph, effective multi-threading should be
possible as well.
Why do I have to choose which backend I want to use before I start
the compiler?
I would imagine, if the compiler daemon didn't find any errors in my code, I
should be able to tell it: use llvm to build me an x86 executable.
When I next ask for an executable built with the gcc backend, only the parts
that change because of version blocks should be rebuilt.
There is no reason to re-lex, or re-parse, re-anything any already opened file.
Even better, when working on the unittests why create any executable at all.
Why not create whatever bytecode any reasonable VM requires and pass it.
Companies run on lua, why can't my unittests?
There are embedded devices that run a python VM as their execution environment.
Compiling unittests to machine-code shouldn't be a thing.
WASM needs to be first class citizen.
I want to compile D to javascript, nice looking javascript.
Now for the really big guns.
When the compiler daemon is basically the glue that holds compiler library
functions together, we could create the equivalent of database migrations for
breaking changes.
As an example, let's say autodecoding should be removed.
We would write a program that would, as one part of it, find all instances
of a foreach over a string s
and replace it with s.byDchar.
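A sketch of the before and after such a migration would produce — byDchar is real and lives in std.utf; the counting functions are just for illustration:

```d
import std.utf : byDchar;

// Before the migration: foreach over a string auto-decodes implicitly.
size_t countBefore(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

// After the migration tool has rewritten it: the decoding is explicit.
size_t countAfter(string s)
{
    size_t n;
    foreach (dchar c; s.byDchar)
        ++n;
    return n;
}
```

Both functions behave identically; the point of the migration is that the second one no longer relies on implicit autodecoding.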
For all breaking changes between versions, we supply code migrations.
If we are really eager to please, we write a script that applies those to all
packages on code.dlang.org and creates PRs on GitHub where possible.
No more sed scripts, no more 2to3.py scripts, proper compiler library support
for code migrations.
Just imagine the productivity gains for your private code bases when you have
to do refactoring.
Refactoring your D program by writing a D program against the D
compiler library.
To add one more level of meta, this could be leveraged to do refactoring on the
compiler library source itself.
The member id of the class TemplateInstance should be called identifier?
No problem, let's write a small migration script.
When phobos' canFind becomes isFindable, just write a small D program and run
it on the compiler codebase.
The documentation/spec of the language leaves things to be desired; we can
spend a huge amount of man power on it, but keeping the spec correct and up to
date is a tedious, thankless task.
And to be frank, we don't have the numbers, just take a look at the photo of
the C++ standard committee meeting, and of the last physical dconf.
But why work hard when we can work smart.
Why can't we use __traits(allMembers, ...) to iterate the AST classes and
generate the grammar from that?
You changed the grammar, fair enough; just re-run the AST-classes-to-ddoc tool,
done.
I know the current AST classes do not correctly reflect the language
grammar, but maybe that is something worth fixing.
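A minimal sketch of what such a generator could look like, with invented AST classes standing in for the compiler's real hierarchy:

```d
module grammargen;

// Invented stand-ins for the compiler's AST classes.
class ASTNode {}
class AddExpr : ASTNode { ASTNode lhs; ASTNode rhs; }

/// Collects the child-node fields of an AST class, i.e. the right-hand
/// side of the grammar rule the class represents.
string[] ruleChildren(T : ASTNode)()
{
    string[] children;
    static foreach (field; __traits(allMembers, T))
        static if (is(typeof(__traits(getMember, T, field)) : ASTNode))
            children ~= field;
    return children;
}
```

A real tool would also need the alternatives and token members, but the core idea — the node classes are the grammar, so derive the spec from them — fits in a page of introspection code.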
Also, there are hundreds of small D files that are used as test cases for the
compiler; why aren't they part of the spec?
Just to state the obvious, this would require the compiler library to
understand dub.(json|sdl) files, but some of that work is already being worked
on ;-)
Error messages
We need really good error messages.
After playing around with elm, coming back to D is really hard.
In comparison, D might as well just use the system speaker to emit a beep
every time it finds an error.
phobos
Batteries included, all of them, even the small flat strange ones.
Serialization
That means that phobos needs to have support for json-schema, yaml, ini, toml,
sdl, jsonnet.
Given a json-schema file named schema.json, we need to be able to write
mixin(generateJsonSchemaASTandParser(import("schema.json")))
and get a parser and AST hierarchy based on schema.json.
json-schema is also sometimes used for yaml; that should be supported as well.
Some of the other formats support similar schema specifications as well.
Given a hierarchy of classes/structs, phobos also needs a method to build
parsers for those file formats.
Yes that means serialization should be a part of phobos.
Ideally, we find an abstract DSL, a set of UDAs, that can be reused for all of
the formats, but the more important step is to have them in phobos.
Perfection being the enemy of the good and all.
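A sketch of what such a format-agnostic UDA set might look like; the UDA names @Name and @Ignore, the Person type, and the serializedKey helper are all invented for illustration (hasUDA and getUDAs are real std.traits API):

```d
import std.traits : getUDAs, hasUDA;

struct Name { string name; } // serialize a field under a different key
struct Ignore {}             // never serialize this field

struct Person
{
    @Name("full_name") string name;
    int age;
    @Ignore string password;
}

/// The key under which `field` of `T` appears in any output format.
template serializedKey(T, string field)
{
    static if (hasUDA!(__traits(getMember, T, field), Name))
        enum serializedKey =
            getUDAs!(__traits(getMember, T, field), Name)[0].name;
    else
        enum serializedKey = field;
}
```

Each concrete serializer — json, yaml, toml, sdl — would interpret the same UDAs, so a type is annotated once and works with every format.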
Event loop
phobos needs to have support for an event loop.
The compiler daemon library thing needs that, and that thing should be a heavy
user of phobos; dogfooding, right?
io_uring seems to be the fast modern system on linux > 5.2; obviously Windows
and MacOSX need to be supported as well.
But again, if the windows event loop is 5x slower than linux, so be it.
It is much more important, that there is no friction to get started.
The average javascript dev looking for a statically typed language will
likely be blown away by the performance nonetheless.
I'm not saying, merge vibe-core, but I'm saying take a really close look at
vibe-core, and grill Sönke for a couple of hours.
At least with io_uring, this event loop should scale mostly linearly in
performance with the number of threads, given enough CPU cores.
HTTP
Yes, 1, 2, and 3.
Interop
I'm not sure if this is the right place to talk about this, but I didn't find
any better place, so here I go.
autowrap ^1 already allows trivial interaction with python and excel.
This and support for C#, WASM, haskell, golang, and rust should be part of
phobos/D.
If a project demands to get some toml output out of a golang call, passing it
to haskell because there is an algorithm you want to reuse, followed by
a call to scikit-learn, and finally passing it to C#, D should be the obvious
choice.
Error Messages
The error messages in phobos are sometimes not great.
That is not good.
When you come from another language that is not C++ and try to get started
with ranges, good error messages in phobos are important.
One obvious example is how we constrain template function similar to this:
auto someFun(R)(R r) if (isInputRange!R) {
    ...
}
you get stuff like
a.d(8): Error: template `a.someFun` cannot deduce function from argument types `!()(int)`
a.d(3): Candidate is: `someFun(R)(R r)`
with `R = int`
must satisfy the following constraint:
` isInputRange!R`
This looks helpful, but it is not as good as it could be.
If you don't know what an InputRange is, this does not help you;
you have to go to the documentation.
This could be made a lot easier by a small refactor.
auto someFun(R)(R r) {
    static assert(isInputRange!R, inputRangeErrorFormatter!R);
    ...
}
The function inputRangeErrorFormatter would create a string that shows
which of the required features of an InputRange are not fulfilled by R.
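The inputRangeErrorFormatter named above does not exist in phobos; a minimal sketch of it could test each InputRange primitive separately and report only the missing ones:

```d
import std.range.primitives; // gives arrays .front/.popFront/.empty via UFCS

/// Builds a message listing every InputRange primitive that R lacks.
template inputRangeErrorFormatter(R)
{
    enum inputRangeErrorFormatter = () {
        string msg;
        static if (!is(typeof((R r) => r.front)))
            msg ~= R.stringof ~ " has no .front\n";
        static if (!is(typeof((R r) => r.popFront())))
            msg ~= R.stringof ~ " has no .popFront()\n";
        static if (!is(typeof((R r) => cast(bool) r.empty)))
            msg ~= R.stringof ~ " has no .empty\n";
        return msg.length ? msg : R.stringof ~ " is a valid InputRange";
    }();
}
```

With something like this, the static assert points at the exact missing primitive instead of dumping the whole failed constraint on the user.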
Especially when overload resolution is done by template constraints, the
error messages get difficult to understand fast.
Just look at:
a.d(3): Candidates are: `someFun(R)(R r)`
with `R = int`
must satisfy the following constraint:
` isInputRange!R`
a.d(7): `someFun(R)(R r)`
with `R = int`
must satisfy the following constraint:
` isRandomAccessRange!R`
This can be fixed quite easily as well:
private auto someFunIR(R)(R r) { ... }
private auto someFunRAR(R)(R r) { ... }
auto someFun(R)(R r) {
    // check the more specific constraint first, as every
    // RandomAccessRange is also an InputRange
    static if (isRandomAccessRange!R) {
        return someFunRAR(r);
    } else static if (isInputRange!R) {
        return someFunIR(r);
    } else {
        static assert(false, "R should either be an "
            ~ "InputRange, but " ~ inputRangeErrorFormatter!R
            ~ "\nor R should be a RandomAccessRange, but "
            ~ randomAccessRangeErrorFormatter!R
            ~ "\ntherefore you can not call " ~ __FUNCTION__);
    }
}
Synchronization
This section is needed to be read with the section about shared in
The Language part of this text.
When we have an event loop that also works with threads, communication has to
happen somehow.
Mutexes do not scale, because they are just too hard to use correctly.
As an exercise, name the three necessary requirements for a deadlock.
Wrong, there are four:
- Mutual exclusion
- Hold and wait
- No preemption
- Circular wait
phobos must have message passing that works with threads and the event-loop.
Two kinds of mailboxes are to be supported, 1-to-1 and 1-to-N, where N is a
defined number of receivers, such that the next sender is blocked until all N
have read.
Both types support multiple senders and predefined mailbox queue sizes.
Making this @safe, and not just @trusted, will likely require some copying.
That is fine; when copying is eating your multi-threading gains,
multi-threading was not the solution to your problem, IMO.
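For the 1-to-1 case, std.concurrency is an existing starting point — spawn, send, and receiveOnly are real phobos API; the blocking 1-to-N mailbox would be new. A sketch of a message round trip between two threads:

```d
import std.concurrency : receiveOnly, send, spawn, thisTid, Tid;

// The worker blocks on its mailbox, doubles the int it gets, and replies.
void worker(Tid owner)
{
    auto n = receiveOnly!int();
    owner.send(n * 2);
}

int roundTrip(int value)
{
    auto tid = spawn(&worker, thisTid);
    tid.send(value);
    return receiveOnly!int(); // blocks until the worker replies
}
```

The missing piece is wiring this mailbox model into the event loop, so fibers and threads can use the same send/receive vocabulary.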
Message passing and the SumType are likely a nice way to emulate the Ada
rendezvous concept.
The Language
Get your tomatoes and eggs ready.
GC
The GC is here to stay; you don't do manual memory management (MMM) in a
compiler daemon that tracks dependencies.
I don't care how smart you are, you are not that smart.
D is not going to run the ECU of the next Boeing airplane, rust will succeed C
there.
Rust will succeed C and C++ everywhere, but who cares JS runs the rest.
How many OS kernels have you written, and how many data transformations have
you written?
So either fight a war that is over and lost, for a niche field anyway, or
actually have some wins and run the world.
Mixing MMM, RC, and GC, is also too complicated IMO.
The whole lifetime tracking requirements make my head spin.
That being said, I think there is a place to reuse the gained knowledge.
In my day job I have a lot of code that results in a call to std.array.array
allocating an array of some T which by the end of the function gets
transformed into something else that is then returned.
The array never leaves the scope of the function.
Given lifetime analysis, the compiler could insert GC.free calls.
Think automatic T t to scope T t transformation.
At least for the code I have been writing for the last two years, this should
release quite a bit of memory back to the GC, without the GC ever having to
mark and sweep.
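The pattern described above, written out by hand — core.memory.GC.free is real API; with lifetime analysis the compiler could insert the free (or the scope) itself:

```d
import core.memory : GC;
import std.algorithm.iteration : map, sum;
import std.array : array;
import std.range : iota;

long sumOfSquares(int n)
{
    // std.array.array allocates on the GC heap ...
    auto tmp = iota(n).map!(i => cast(long) i * i).array;
    // ... but tmp never leaves this scope, so the memory can be handed
    // back eagerly instead of waiting for a mark-and-sweep.
    scope (exit) GC.free(tmp.ptr);
    return tmp.sum;
}
```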
We want the JS developer; if we have to teach them to use MMM and/or RC, we
might as well not try.
I don't even want to think about memory I want to get some work done.
I don't want to get more work by thinking about memory.
I want to get my project running and iterate on that.
To summarize, GC and GC only.
shared
As said in the phobos section about synchronisation, this is an important
building block.
As shared is basically broken, maybe painting a holistic picture of where we
want D's multi-threading/fiber programming to go is better than taking a look
at shared on its own.
For me, this would mean sharing data between threads and/or fibers should be
as easy and error-free as letting the GC handle memory.
That means race conditions need to be very difficult to produce, the same as
deadlocks.
This, to me, implies message passing or Ada rendezvous, and not taking locks
to work on shared data.
betterC
betterC is, at best, a waste by-product; if we have to use betterC to write
something for WASM, or anything significant, we might as well start learning
rust right now.
autodecoding
Having been saved by it a couple of times, and using a non-US keyboard
every day, I still think it is not a terrible idea, but this battle is
lost and I'm already full of tomatoes by this point.
Meaning, autodecoding will have to go.
At the same time we have to update std.uni and std.utf.
The majority of developers and users of software speak languages that do not
fit into ASCII.
When a project requires text processing, your first thought must be D, not
perl.
std.uni and std.utf have to be a superset of the std.uni and std.utf of the top
20 languages.
properties
Let's keep it simple, and consistent.
You add parenthesis to call a function.
You can not call a property function with parenthesis.
You can not take the address of a property function.
@safe pure @nogc UDA
Consistency is king:
@safe -> safe
@trusted -> trusted
@system -> system
@nogc -> nogc
Long story short, language attributes do not start with a @, user defined
attributes (UDAs) do.
string interpolation
I had this in the phobos section when I started writing this.
String interpolation is not what you want. I know it is what you want right
now, because you think it fixes your problem, but it does not.
String interpolation is like shoe laces: you want them, but you are walking on
lava, so untied shoes are not actually your problem.
For work, I have D code that generates about 10k lines of typescript, and the
places where string interpolation would have helped were trivial to do in
std.format.
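For reference, the std.format equivalent of a typical interpolation site — format is existing phobos API; the greet function is just an example:

```d
import std.format : format;

// Instead of an interpolated `Hello ${name}, you are ${age}` ...
string greet(string name, int age)
{
    return format("Hello %s, you are %s", name, age);
}
```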
IMO, the better solution would be something like vibe's diet, mustache, or
handlebars, but without requiring a build step like diet does.
Whitespace control and Nullable support are a big part of this too.
ImportC
ImportC must have a preprocessor, or it is DOA.
Shelling out to gcc or clang to preprocess makes the build system horrible,
which in turn will make the compiler library daemon thing difficult to build.
This is also important for the language interop, as I imagine that most
interop will go through a layer of C.
When ImportC can consume openssl 1.0.2s or so, it is good enough.
Having used openssl recently, my eyes can not un-see the
terribleness that is using openssl from C.
Specification
This was already partially discussed in the long term goals: the language
needs better documentation, or better yet, a spec.
The cool thing is, it doesn't need to be an ISO spec, aka a pdf.
It could very well be a long .d file with lots of comments and unittests.
Frankly, I think that would be much more useful anyway.
Instead of giving a few select, unmaintained examples of a language feature,
show the tests the compiler runs.
Actually, having looked at some of the tests myself to figure out how stuff
should behave, I would imagine other people would benefit as well.
When the compiler fails to execute the spec, either the spec is wrong or the
compiler has a bug.
Two birds with one stone, right? right!
Android/iOS
Obviously, D needs to run on those platforms.
Both platforms have APIs; using them must be as easy as dub add android@12.0.1.
The gtkd people basically wrote a small program to create a D interface to gtk
from the gtk documentation.
I bet a round of drinks at the next physical dconf that this is possible for
android and ios as well.
The dart language people shall come to fear our binding generation
capabilities.
On Versioning
D3 will never happen; it sounds too much like what we got when we moved from D1
to D2.
The D2 version number 2.098.X does not make sense.
D 2.099 plus std v2 would also be terrible.
By the time I have explained to somebody new why D is at version 2.099, with
phobos having parts in version v2 in addition to std.experimental, which
was pretty much DOA, the person has installed, compiled, and run "hello world"
in rust.
I talked to Andrei about this, as it seemed that we were firmly set in our
corners of the argument.
Andrei mentioned the C++ approach, which has been really successful.
Good ideas are there to steal, so let's do what C++ does.
Let's call the next version D 23, the one after that maybe D 25.
Backwards compatibility is not a given.
But we ship, let's say, the three latest D versions with the
current release.
D X is implemented in D X-1.
This would mean that the three old D versions would still need to be able to
create working binaries ~10 years down the road.
I would say the older versions should only get patches that keep them able to
do so.
If they come with a bug, and we have moved on to a new D version, this bug
will exist forever in that D version.
Leadership
I'm writing this section as one of the last.
This is maybe one of the most important parts, but also the hardest
to validate.
When reading the forum or the GitHub PRs, I get the feeling that people think
that D is a consensus-driven meritocracy.
That is not the case, and that is okay.
The impression that it is, however, is very dangerous, as it sets people up to
be continuously disappointed.
Just look for all the posts where people complain that Walter does not change
his mind.
To me these posts show this disconnect: people expect Walter to change his
mind because, at least to their mind, their idea is better than what Walter
thinks.
But he doesn't have to agree, because he is the benevolent dictator for life.
Who is right or wrong is irrelevant; the impression of the level of influence
is not.
Being a bit dramatic: giving people false hope that then gets disappointed
will drive them away from D.
A simple solution, IMO, is to take a clear stance on issues.
Direct, simple language.
A leadership person saying yes xor no to thing X.
When new information comes up that warrants a reversal of such a statement,
leadership would lay out how decision (yes|no) on X was changed by new
information Y.
I see the DIP process as troublesome, as it gives the impression of a say in
what D will become.
Maybe renaming D Improvement Proposals into
D Improvement Suggestions would be an option, while simultaneously increasing
the amount of work that should go into writing a DIS.
I find that especially the given Rationales of most existing DIPs are way too
short to weigh the pros and cons of an improvement.
Just have a look at the quality of the C++ proposals.
The DIS' should aim for that.
Or at least have a matrix of how the improvement interacts with each of the D
features, and an analysis of how it actually makes D better in real-world
terms (code.dlang.org).
This would be another nice usage for the compiler library daemon thing.
Always asking: just because we could, should we?
But I believe the formal steps of the DIP process can be avoided if the
direction the language should develop in is clearly marked by leadership.
There is no need to discuss the shared atomics DIP, if leadership dictates
that message passing is the selected mechanism for thread communication and
only that.
Sure, you can still argue for shared atomics, but you have no reason to be
disappointed when nobody takes you seriously, as you already knew where the
journey is going.
The practical way forward
This year (2021), move from bugzilla to github.
A nice Christmas present to show that we mean business.
D 23:
- remove auto-decoding
- safe by default
- attribute consistency
- ImportC preprocessor
- remove std.experimental
D 25:
- All but the compiler daemon library thing
D 27:
- Compiler daemon thing.
The work on the compiler daemon thing will have to start before 2025.
The motto
I'm serious about the motto at the top.
When people start complaining that their language is better, it's free
marketing for D.
Closing
If D continues the way it does, it will soon be irrelevant.
And I don't want that; I want to be yelled at at dconf 2071.
D's powerful templates, ctfe, and ranges made heads turn, but the other
languages have caught up.
Let us really innovate, so that D not only becomes the Voldemort language for
C++, but for all other languages as well, because D is the best language.