July 16
Hi everyone,

At the end of May I started working on my GSoC project, Header Generation for C/C++.

Introduction
------------

In recent years, the D programming language has gained more and more
attention and existing C and C++ codebases are starting to incrementally integrate D
components.

In order to be able to use D components, a C or C++ interface to them must be
provided; in C and C++, this is done through header files. Currently, this process is entirely
manual, with the responsibility of writing a header file falling on the shoulders of the
programmer. The larger the D portion of a codebase is, the more tedious the task
becomes. The best example is the DMD frontend, which amounts to roughly 310000
lines of code and whose C++ header files, used by other backend
implementations (gdc, ldc), are managed manually. This is a repetitive, time-consuming,
and rather boring task: the perfect job for a machine.

Project goal
------------

The deliverable of the project is a tool that automatically generates C and C++
header files from D module files. This can be achieved either by a library solution using
DMD as a Library, or by adding this feature in the DMD frontend through a compiler
switch.

The advantage of using DMD as a Library is that it wouldn't increase the
complexity of the compiler frontend codebase. The disadvantage is that the user would be
required to install a third-party tool. In contrast, adding the feature to the
frontend would result in smoother integration with all the backends that use the DMD
frontend.

We have decided to go with the compiler switch approach.

One major milestone (and success marker) for the project is to automatically generate the
DMD frontend headers required by GDC/LDC.

Implementation strategy
-----------------------

The feature will require the implementation of a `Visitor` class that will traverse
the `AST` produced by the parsing phase of the D code. For each top-level `Dsymbol`
(variable, function, struct, class etc.) the corresponding C++ declaration will be written to
the header file.

The visitor will override the visiting methods of two types of nodes:
* Traversal nodes - these nodes simply implement the `AST` traversal logic:
`ModuleDeclaration`, `ScopeDeclaration`, etc.
* Output nodes - these nodes will implement the actual header generation logic:
`FuncDeclaration`, `StructDeclaration`, `VarDeclaration`, etc.

The header file will consist of the `public extern (C++)` and `public extern (C)`
declarations/definitions from the D modules.
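
To make the traversal/output split concrete, here is a minimal C++ sketch of the idea on a toy AST. Everything in it (`Node`, `ModuleDecl`, `VarDecl`, `FuncDecl`, `HeaderVisitor`, `generateHeader`) is a hypothetical stand-in invented for illustration, not DMD's real classes or the code from the PR, and the real visitor is written in D.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for AST nodes; none of these are DMD's real classes.
enum class Linkage { D, C, Cpp };

struct Visitor;

struct Node {
    virtual ~Node() = default;
    virtual void accept(Visitor& v) const = 0;
};

struct VarDecl : Node {
    std::string type, name;
    Linkage linkage;
    VarDecl(std::string t, std::string n, Linkage l)
        : type(std::move(t)), name(std::move(n)), linkage(l) {}
    void accept(Visitor& v) const override;
};

struct FuncDecl : Node {
    std::string returnType, name, params;
    Linkage linkage;
    FuncDecl(std::string r, std::string n, std::string p, Linkage l)
        : returnType(std::move(r)), name(std::move(n)),
          params(std::move(p)), linkage(l) {}
    void accept(Visitor& v) const override;
};

struct ModuleDecl : Node {
    std::vector<std::unique_ptr<Node>> members;
    void accept(Visitor& v) const override;
};

struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(const ModuleDecl&) = 0;
    virtual void visit(const VarDecl&) = 0;
    virtual void visit(const FuncDecl&) = 0;
};

void VarDecl::accept(Visitor& v) const { v.visit(*this); }
void FuncDecl::accept(Visitor& v) const { v.visit(*this); }
void ModuleDecl::accept(Visitor& v) const { v.visit(*this); }

struct HeaderVisitor : Visitor {
    std::ostringstream out;

    // Only extern (C) and extern (C++) symbols end up in the header.
    static bool exported(Linkage l) {
        return l == Linkage::C || l == Linkage::Cpp;
    }

    // Traversal node: emits nothing itself, only walks its members.
    void visit(const ModuleDecl& m) override {
        for (const auto& member : m.members) member->accept(*this);
    }

    // Output nodes: write the C++ counterpart of the declaration.
    void visit(const VarDecl& v) override {
        if (exported(v.linkage))
            out << "extern " << v.type << ' ' << v.name << ";\n";
    }
    void visit(const FuncDecl& f) override {
        if (exported(f.linkage))
            out << f.returnType << ' ' << f.name << '(' << f.params << ");\n";
    }
};

// Runs the visitor over one module and returns the generated header text.
std::string generateHeader(const ModuleDecl& m) {
    HeaderVisitor v;
    m.accept(v);
    return v.out.str();
}
```

Calling `generateHeader` on a module that holds one `extern (C)` function, one D-linkage variable and one `extern (C++)` variable would emit only the two exported declarations and skip the D-only one.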

Project status
--------------

I started work [0] by reviving DMD's PR 8591 [1], rebasing it and converting it into
a compiler switch.

The next step was to add a bunch of tests for the existing code, which revealed the following issues:
* StructDeclaration:
  - align different than 1 does nothing; we should support align(n), where `n` in [1, 2, 4, 8, 16]
  - align(n): inside struct definition doesn’t add alignment, but breaks generation of default ctors
  - default ctors should be generated only if struct has no ctors
  - if a struct has ctors defined, only default ctor (S() { … }) should be generated to init members to default values, and the defined ctors must be declared
  - if a struct member has a void initializer (`member = void`), the code segfaults
  - a struct should only define ctors if it’s `extern (C++)`

  As you can see, a bunch of the issues above are related to auto-generated ctor definitions.
  You might wonder "But why are there any definitions?"; the default ctors are there because D initializes
  member fields with a default value, while C and C++ do not, and this might break existing GDC/LDC behaviour.
  Ideally, we wouldn't generate any definitions, and if we can confirm the ctor definitions aren't needed, we'll remove them.

* ClassDeclaration:
  - align(n) does nothing. You can use align on classes in C++, though it is generally regarded as bad practice and should be avoided

* FuncDeclaration:
  - default arguments can be any valid D code, including a lambda function or a complex expression; we don't want to go down the path of generating C or C++ code, so for now default arguments get ignored.

* TemplateDeclaration:
  - templates imply code generation, so for now we don't support them
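
To illustrate the default ctor and default argument points above: D gives every field a default value (`int.init` is 0, `float.init` is NaN), while C++ leaves plain members uninitialized. Below is a rough sketch of what a generated header could contain for a small struct and function. The D declarations in the comments and the names `S` and `scale` are made up for this example; this is not actual output of the PR.

```cpp
#include <cmath>
#include <limits>

// Hypothetical D source (not taken from the PR):
//   extern (C++) struct S { int a; float f; int b = 42; }
struct S
{
    int a;
    float f;
    int b;

    // Mirrors D's default initialization: int.init == 0, float.init == NaN,
    // and the explicit initializer `b = 42`.
    S() : a(0), f(std::numeric_limits<float>::quiet_NaN()), b(42) {}
};

// Hypothetical D source: extern (C++) void scale(S* s, int factor = 2);
// The default argument is dropped, so C++ callers must pass both arguments.
void scale(S* s, int factor);
```

Without the generated constructor, a default-constructed `S` on the C++ side would hold indeterminate values where the D side expects 0, NaN and 42.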

After writing the tests and understanding what the issues are, I got more comfortable with the codebase and moved on to the next (current) step: generating the DMD frontend header files from DMD's `*.d` frontend modules.

This took quite some time and sweat to get going: the major pain point here is templates.
There is `dmd/root/array.d` which has a templated `Array(T)` that is used throughout the codebase.
Since we don't support templates, we decided to keep the manual management of the `dmd/root/*.h` headers, but things aren't that simple.

The issue: while we don't explicitly pass in any of the `dmd/root/*.d` modules, some of them are processed during the semantic analysis phase, which emits the definitions of some `struct`s and `enum`s from `dmd/root/*.d` into the generated frontend header. When the generated header is used together with the manually managed `dmd/root/*.h` headers, the C++ compiler reports a `struct`/`enum` redefinition error.

I kept scratching my head at how to avoid this, and in the end I went with explicitly ignoring anything that comes from a `dmd/root/*.d` module. Ideally, this special casing shouldn't be needed, and it should go away if we can add support for some simple D -> C++ templates.

So now, the current state of affairs is that the code in the PR [0] can link with and pass the `cxx-unittests`.

How to use it
-------------

The code in the current PR [0] generates a `C++` header file from a list of `.d` modules passed on the command line.

The simplest form of the CLI switch is `dmd -HC a.d b.d`

This will visit the ASTs of modules `a` and `b` and write a single header file to `stdout`.

With the `-HCf=<file-name>` switch, the result is written to the specified file instead. Adding `-HCd=<path>` places that file in the specified directory.

So running
`dmd -HCf=ab.h -HCd=mypath/ a.d b.d` writes the generated header to `mypath/ab.h`, relative to the current directory.

If you have some spare time and curiosity I would appreciate your `test drive` and bug reports :)

This month
----------

I'll be working on generating the frontend headers, cleaning up the code, fixing issues, and addressing PR comments.

Closing note
------------

I deeply apologize for this long overdue post.

Looking forward to your replies,
Edi

[0] - https://github.com/dlang/dmd/pull/9971
[1] - https://github.com/dlang/dmd/pull/8591
July 17
So currently there is no way to restrict it to just extern(C)?
I ask this because -HC makes me think C not C++ headers.
July 16
On Tuesday, 16 July 2019 at 13:16:50 UTC, Eduard Staniloiu wrote:

> So running
> `dmd -HCf=ab.h -HCd=mypath/ a.d b.d` writes the generated header to `mypath/ab.h`, relative to the current directory.

Will it be possible to extend this for other languages? That would be a killer application. For instance

dmd -HRuby=ab.rb -HCd=mypath/ a.d b.d

goes a step further, converting what would have been the generated C header to the file ab.rb containing (taken from https://github.com/ffi/ffi/wiki/Examples)

module Ab # Ruby module names must start with an uppercase letter
  extend FFI::Library
  ffi_lib "path/to/ab.so"
  attach_function :calculate_something, [:int, :float], :double
  attach_function :error_code, [], :int # note empty array for functions taking zero arguments
  attach_function :create_object, [:string], :pointer
  attach_function :calculate_something_else, [:double, :pointer], :double
  attach_function :free_object, [:pointer], :void
end

Then the Ruby script would call ab.rb as

require 'ffi'
require 'ab'

c = Ab.calculate_something(42, 98.6) # note FFI handles literals just fine
if ( (errcode = Ab.error_code()) != 0)
  puts "error calculating something: #{errcode}"
  exit 1
end

objptr = Ab.create_object("my object") # note FFI handles string literals as well
d = Ab.calculate_something_else(c, objptr)
Ab.free_object(objptr)

puts "calculated #{d}"
July 16
On Tue, Jul 16, 2019 at 6:20 AM Eduard Staniloiu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
> [...]

I'm super-excited to give this a spin. Thanks for having a go at it!! One hopefully trivial request, I wouldn't want the compiler to emit a single header; I would want it to emit one .h per .d file into the output directory.
July 17
On Wednesday, 17 July 2019 at 04:05:57 UTC, Manu wrote:
> One hopefully trivial request, I wouldn't want the compiler to emit a single header; I would want it to emit one .h per .d file into the output directory.

Forward declarations make that a PITA.


July 16
On Tue, Jul 16, 2019 at 9:40 PM Nicholas Wilson via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
> On Wednesday, 17 July 2019 at 04:05:57 UTC, Manu wrote:
> > One hopefully trivial request, I wouldn't want the compiler to emit a single header; I would want it to emit one .h per .d file into the output directory.
>
> Forward declarations make that a PITA.

Hmmm...
July 17
On Tuesday, 16 July 2019 at 13:46:56 UTC, rikki cattermole wrote:
> So currently there is no way to restrict it to just extern(C)?
> I ask this because -HC makes me think C not C++ headers.

Currently the outputted headers are C++ headers, but we were thinking of wrapping `extern (C++)` definitions inside an `#ifdef __cplusplus` block, and prefixing any `extern (C)` definitions with the following `EXTERNC` macro

```
#ifdef __cplusplus
#define EXTERNC extern(C)
#else
#define EXTERNC
#endif
```

This way, the generated header could be used in both C and C++.

What do you think?
July 17
On Tuesday, 16 July 2019 at 19:10:41 UTC, bachmeier wrote:
> On Tuesday, 16 July 2019 at 13:16:50 UTC, Eduard Staniloiu wrote:
>
>> So running
>> `dmd -HCf=ab.h -HCd=mypath/ a.d b.d` writes the generated header to `mypath/ab.h`, relative to the current directory.
>
> Will it be possible to extend this for other languages? That would be a killer application. For instance
>
> dmd -HRuby=ab.rb -HCd=mypath/ a.d b.d
>
> goes a step further, converting what would have been the generated C header to the file ab.rb containing (taken from https://github.com/ffi/ffi/wiki/Examples)
>
> [ ... ]

This would best be done as a separate RubyVisitor that visits the AST nodes and
writes the expected FFI interface. If I didn't misunderstand the FFI readme, it only works with C interfaces, which should remove a great deal of the complexity.

From the example it looks like FFI is meant to work only with opaque pointers, which means that you would only be interested in declaring `struct`s, defining `enum`s and declaring functions.

The memory management bindings might be trickier, or they might be easy; I can't say, as I know neither Ruby nor FFI.

That being said, I believe it should be done as a separate visitor so it doesn't add more complexity to the C/C++ one.

This would be an interesting project :)
July 17
On Wednesday, 17 July 2019 at 04:43:52 UTC, Manu wrote:
> On Tue, Jul 16, 2019 at 9:40 PM Nicholas Wilson via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>
>> On Wednesday, 17 July 2019 at 04:05:57 UTC, Manu wrote:
>> > One hopefully trivial request, I wouldn't want the compiler to emit a single header; I would want it to emit one .h per .d file into the output directory.
>>
>> Forward declarations make that a PITA.
>
> Hmmm...

I'm glad to hear that you are excited and eager to give it a spin :D

As Nicholas pointed out, forward declarations make it a pain.
Say you have the following example modules, `a.d` and `b.d`
```
// a.d
module a;

import b;

extern (C++) struct A
{
  TestEnum e;
}

// b.d
module b;

enum TestEnum
{
  aa,
  bb
}
```

For `a.d` you'll get
```
// a.h
#pragma once

enum TestEnum
{
    TESTENUMaa = 0,
    TESTENUMbb = 1
};

struct A
{
    TestEnum e;
};
```

For `b.d` you'll get
```
// b.h
#pragma once

enum TestEnum
{
    TESTENUMaa = 0,
    TESTENUMbb = 1
};
```

When you bring the two headers together, you'll get a redefinition error for the enum.

One way to handle this would be to wrap the definitions inside an `#ifndef` / `#define` block.

Another would be to check which module the forward declaration comes from and replace it with an `#include "b.h"`, but I don't know how complicated this would be, as not all forward declarations come from a different module.
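
As a rough sketch of the first option, each emitted definition could get its own guard, so whichever header is included first defines the type and the later copy is skipped. The guard name `GENERATED_TESTENUM_DEFINED` is an assumption made up for this example, and both "headers" are pasted into one file here to mimic including `b.h` followed by `a.h`.

```cpp
// --- contents of a hypothetical b.h ---
#ifndef GENERATED_TESTENUM_DEFINED   // hypothetical per-definition guard
#define GENERATED_TESTENUM_DEFINED
enum TestEnum
{
    TESTENUMaa = 0,
    TESTENUMbb = 1
};
#endif

// --- contents of a hypothetical a.h ---
#ifndef GENERATED_TESTENUM_DEFINED   // already defined above, so this copy is skipped
#define GENERATED_TESTENUM_DEFINED
enum TestEnum
{
    TESTENUMaa = 0,
    TESTENUMbb = 1
};
#endif

struct A
{
    TestEnum e;
};
```

The downside is that every header still carries a full copy of each definition, so the copies have to stay in sync; replacing them with an `#include` avoids that, at the cost of tracking where each symbol comes from.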
July 17
On Wednesday, 17 July 2019 at 11:05:21 UTC, Eduard Staniloiu wrote:
> On Tuesday, 16 July 2019 at 13:46:56 UTC, rikki cattermole wrote:
>> So currently there is no way to restrict it to just extern(C)?
>> I ask this because -HC makes me think C not C++ headers.
>
> Currently the outputted headers are C++ headers, but we were thinking of wrapping `extern (C++)` definitions inside an `#ifdef __cplusplus` block, and prefixing any `extern (C)` definitions with the following `EXTERNC` macro
>
> ```
> #ifdef __cplusplus
> #define EXTERNC extern(C)
> #else
> #define EXTERNC
> #endif
> ```
>
> This way, the generated header could be used in both C and C++.
>
> What do you think?

It should be pretty trivial to just check the linkage of the symbols and only output extern(C) symbols.

Also you've clearly been doing too much D programming! (which is probably a good thing) That would be:
#define EXTERNC extern "C" {

and you need to macro the closing brace as well.
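
Something along these lines, as a sketch: the macro names `EXTERNC_BEGIN` / `EXTERNC_END` and the function `d_add` are placeholders made up for this example, not what the PR emits.

```cpp
#ifdef __cplusplus
#define EXTERNC_BEGIN extern "C" {
#define EXTERNC_END }
#else
#define EXTERNC_BEGIN
#define EXTERNC_END
#endif

EXTERNC_BEGIN
/* Hypothetical function standing in for an `extern (C)` D symbol. */
int d_add(int a, int b);
EXTERNC_END

/* In a real project the definition would come from the D object file.
   It is defined here only so the sketch links on its own; it keeps the
   C linkage of the declaration above. */
int d_add(int a, int b) { return a + b; }
```

With this shape the same header compiles as C (both macros expand to nothing) and as C++ (the declarations get C linkage).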