Things that make writing a clean binding system more difficult - D Programming Language Discussion Forum

July 28, 2016

Things that make writing a clean binding system more difficult

Posted by Ethan Watson

As mentioned in the D blog the other day, the binding system as used by Remedy will both be open sourced and effectively completely rewritten from when we shipped Quantum Break. As I'm still deep within that rewrite, a bunch of things are still fresh in my mind that aren't that great when it comes to D and doing such a system.

These are things I also expect other programmers to come across in one way or another, since they seem like simple ways to do things but getting them to behave requires non-trivial workarounds.

I also assume "lodge a bug" will be the response to these. But there are some cases where I think documentation or easily-googleable articles will be required instead/as well. And in the case of one of these things, it's liable to start a long circular conversation chain.

====================
1) Declaring a function pointer with a ref return value can't be done without workarounds.

Try compiling this:

ref int function( int, int ) functionPointer;

It won't let you, because only parameters and for loop symbols can be ref types. Despite the fact that I intend the function pointer to be of a kind that returns a ref int, I can't declare that easily. Easy, declare an alias, right?

alias RefFunctionPointer = ref int function( int, int );

Alright, cool, that works. But thanks to the binding system making heavy use of function pointers via compile-time generated code, that means we then have to come up with a unique name for every function pointer symbol we'll need. Eep.

Rather, I have to do something like this:

template RefFunctionPointer( Params... ) if( Params.length > 1 )
{
  ref Params[ 0 ] dodgyFunction( Params[ 1 .. $ ] );
  alias RefFunctionPointer = typeof( &dodgyFunction );
}
RefFunctionPointer!( int, int, int ) functionPointer;

Alternatively, this can be done by generating a mixin string for the alias inside the template, which removes the need for a dummy function to get the type from. Either way, it gets rid of the unique name requirement but now we have template expansion in the mix. Which is something I'll get to in a second...

Needless to say, this is something I wasted a lot of time on three years ago when I was getting the bindings up to speed originally. Turns out it's not any better in DMD 2.071.
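
A minimal sketch of the two-step alias approach, for reference. The names storage and pick are made up for illustration and are not part of the binding system; the only point is that the alias can spell out the ref return and the resulting pointer can be assigned through.

```d
import std.traits : functionAttributes, FunctionAttribute;

// Hypothetical backing storage, only here so there is something to reference.
int[ 4 ] storage;

// Hypothetical function returning a reference into that storage.
ref int pick( int a, int b )
{
  return storage[ ( a + b ) % storage.length ];
}

// Step one: the alias is allowed to spell out the ref return.
alias RefFunctionPointer = ref int function( int, int );

// The alias really does describe a ref-returning function pointer.
static assert( functionAttributes!RefFunctionPointer & FunctionAttribute.ref_ );

void main()
{
  // Step two: declare the variable through the alias.
  RefFunctionPointer functionPointer = &pick;

  // The call yields an lvalue, so it can be assigned to.
  functionPointer( 1, 2 ) = 42;
  assert( storage[ 3 ] == 42 );
}
```

The cost described above still applies: every distinct signature needs its own alias name unless a template generates it.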

====================
2) Expansion of code (static foreach, templates) is slow to the point where string mixins are a legitimate compile-time optimisation

Take an example of whittling down a tuple/variable argument list. Doing it recursively would look something like this:

template SomeEliminator( Symbols... )
{
  static if( Symbols.length >= 1 )
  {
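    static if( SomeCondition!( Symbols[ 0 ] ) )
    {
      // Keep the head, then recurse on the tail.
      alias SomeEliminator = TypeTuple!( Symbols[ 0 ], SomeEliminator!( Symbols[ 1 .. $ ] ) );
    }
    else
    {
      // Drop the head, then recurse on the tail.
      alias SomeEliminator = SomeEliminator!( Symbols[ 1 .. $ ] );
    }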
  }
  else
  {
    alias SomeEliminator = TypeTuple!( );
  }
}

Okay, that works, but the template expansion is a killer on compile-time performance. It's legitimately far quicker on the compiler to do this:

template SomeEliminator( Symbols... )
{
  string SymbolSelector()
  {
    import std.array : join; // Phobos helper that joins the pieces with a separator
    string[] strOutputs;
    foreach( iIndex, Symbol; Symbols )
    {
      static if( SomeCondition!( Symbol ) )
      {
        strOutputs ~= "Symbols[ " ~ iIndex.stringof ~ " ]";
      }
    }
    return strOutputs.join( ", " );
  }
  mixin( "alias SomeEliminator = TypeTuple!( " ~ SymbolSelector() ~ " );" );
}

With just a small codebase that I'm working on here, it chops seconds off the compile time. Of course, maybe there's something I'm missing here about variable parameter parsing and doing it without a mixin is quite possible and just as quick as the mixin, but that would make it the third method I know of to achieve the same effect. The idiomatic way of doing this without mixins should at least be defined, and optimised at the compiler level so that people don't get punished for writing natural D code.
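
A self-contained sketch of both forms side by side, so they can be checked against each other. The assumptions: std.traits.isIntegral stands in for SomeCondition, AliasSeq (the newer Phobos name for TypeTuple) is used, and the names KeepIntegralRecursive and KeepIntegralMixin are invented for the example. It only shows that the two forms produce the same list; it makes no claim about their relative compile times.

```d
import std.meta : AliasSeq;
import std.traits : isIntegral;

// Recursive form: keep the head when the predicate holds, then recurse on the tail.
template KeepIntegralRecursive( Symbols... )
{
  static if( Symbols.length >= 1 )
  {
    static if( isIntegral!( Symbols[ 0 ] ) )
      alias KeepIntegralRecursive = AliasSeq!( Symbols[ 0 ], KeepIntegralRecursive!( Symbols[ 1 .. $ ] ) );
    else
      alias KeepIntegralRecursive = KeepIntegralRecursive!( Symbols[ 1 .. $ ] );
  }
  else
  {
    alias KeepIntegralRecursive = AliasSeq!();
  }
}

// String-mixin form: one CTFE call builds the index list, one mixin declares the alias.
template KeepIntegralMixin( Symbols... )
{
  string selector()
  {
    import std.conv : to;
    string result;
    foreach( index, Symbol; Symbols )
    {
      static if( isIntegral!Symbol )
      {
        if( result.length ) result ~= ", ";
        result ~= "Symbols[ " ~ index.to!string ~ " ]";
      }
    }
    return result;
  }
  mixin( "alias KeepIntegralMixin = AliasSeq!( " ~ selector() ~ " );" );
}

// Both forms keep int and long and drop float and string.
static assert( is( KeepIntegralRecursive!( int, float, long, string ) == AliasSeq!( int, long ) ) );
static assert( is( KeepIntegralMixin!( int, float, long, string ) == AliasSeq!( int, long ) ) );

void main() {}
```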

Then there was this one that I came across:

outofswitch: switch( symbolName )
{
  foreach( Variable; VariablesOf!( SearchType ) )
  {
    case Variable.Name:
      doSomething!( Variable.Type )();
      break outofswitch;
  }
  default:
    writeln( symbolName, " was not found!" );
    break;
}

This caused compile time to blow way out. How far out? By rewriting it like this, I cut compile times in half (at that point, from 10 seconds to 5):

switch( symbolName )
{
  mixin( generateSwitchFor!( SearchType )() );
  default:
    writeln( symbolName, " was not found!" );
    break;
}

Now, I love mixins, both template form and string form. The binding system uses them extensively. But mixins like this are effectively a hack. Having to break out a mixin because a seemingly simple piece of code doubled my compile time is not good.
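
For readers who have not seen the string-mixin form of a switch, here is a rough sketch of what a generateSwitchFor could look like. It is not the actual binding code: the SearchType contents, the typeNameOf wrapper and the use of std.traits.FieldNameTuple are all assumptions made so the example stands on its own.

```d
import std.traits : FieldNameTuple;

// Hypothetical type to search; stands in for the real SearchType.
struct SearchType
{
  int health;
  float speed;
}

// Builds the text of one case statement per field of T.
string generateSwitchFor( T )()
{
  string code;
  foreach( name; FieldNameTuple!T )
  {
    code ~= "case \"" ~ name ~ "\": return typeof( T." ~ name ~ " ).stringof;\n";
  }
  return code;
}

// Looks a field up by name at run time and reports its type name.
string typeNameOf( T )( string symbolName )
{
  switch( symbolName )
  {
    mixin( generateSwitchFor!T() );
    default:
      return "not found";
  }
}

void main()
{
  assert( typeNameOf!SearchType( "health" ) == "int" );
  assert( typeNameOf!SearchType( "speed" ) == "float" );
  assert( typeNameOf!SearchType( "armour" ) == "not found" );
}
```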

====================
3) __ctfe is not a CTFE symbol.

This one bit me when I was trying to be efficient for runtime usage while allowing a function to also be usable at compile time.

int[] someArray;
static if( !__ctfe )
{
  someArray.reserve( someAmount );
}

Reserve not working in compile time? Eh, I can live with that. __ctfe not being a symbol I can static if with? Well, sure, I suppose that would work if the compiler wouldn't even try compiling the code inside the __ctfe block. But it doesn't do that. It does symbol resolution, and then your code doesn't run at compile time. It's at that point where I ask why have the __ctfe symbol if you can only use it effectively at runtime? Doesn't that only make it half useful?

I understand this is a longstanding complaint too, so this serves as a reminder.
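
For contrast, the form that is commonly used is a plain run-time if on __ctfe: the branch still has to compile, but the CTFE interpreter never executes it. A small sketch, with makeSquares being a made-up function for the example:

```d
// Hypothetical function that should work both at run time and in CTFE.
int[] makeSquares( size_t count )
{
  int[] result;
  if( !__ctfe )
  {
    // Only executed at run time; skipped when evaluated at compile time.
    result.reserve( count );
  }
  foreach( i; 0 .. count )
  {
    result ~= cast( int )( i * i );
  }
  return result;
}

// Forces compile-time evaluation of the same function.
enum compileTimeSquares = makeSquares( 4 );
static assert( compileTimeSquares == [ 0, 1, 4, 9 ] );

void main()
{
  // Same function, evaluated at run time.
  assert( makeSquares( 4 ) == [ 0, 1, 4, 9 ] );
}
```

This does not address the complaint above, since the guarded code is still subject to symbol resolution; it only shows the usage that does work today.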

====================
4) Forward declaring a function prototype means I can never declare that function elsewhere (say, for example, with a mixin)

The binding system works something like this:

* Declare a function, mark it with a @BindImport UDA.
* Compile time code scans over objects and symbols looking for functions tagged @BindImport.
* Generate __gshared function pointers that match the signature (and rewrite parameters to pass this in where applicable).
* Generate function definitions that call the function pointers (with this if it's a method), allowing a programmer to just call the function declaration like it was any old ordinary piece of D code.

That fourth part is where it all falls over.

We shipped Quantum Break by defining your imports with a preceding underscore (ie @BindImport int _doSomeAction();) and generated function definitions with the exact same signature minus the underscore. The new way I'm doing it is to define all these functions in a sub-struct so that all I need to rewrite is the parameters.

All this because I cannot later define a forward-declared function.
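
A rough sketch of the underscore scheme described above, reduced to a single function. Everything here is an assumption for illustration: the GenerateBinding mixin template, the Pointer suffix and nativeDoSomeAction are invented names, the UDA scan is left out, and the real system also rewrites parameters to pass this.

```d
import std.traits : Parameters, ReturnType;

// Hypothetical marker UDA, as in the description above.
struct BindImport {}

// The programmer writes a body-less declaration with a leading underscore.
@BindImport int _doSomeAction( int value );

// Hypothetical generator: emits a __gshared function pointer and a wrapper
// with the same signature, named after the declaration minus the underscore.
mixin template GenerateBinding( alias decl )
{
  enum bindingName = __traits( identifier, decl )[ 1 .. $ ];
  mixin( "__gshared typeof( &decl ) " ~ bindingName ~ "Pointer;" );
  mixin( "ReturnType!decl " ~ bindingName ~ "( Parameters!decl args )"
       ~ " { return " ~ bindingName ~ "Pointer( args ); }" );
}

mixin GenerateBinding!_doSomeAction;

// Stand-in for the function the host application would bind at startup.
int nativeDoSomeAction( int value )
{
  return value * 2;
}

void main()
{
  doSomeActionPointer = &nativeDoSomeAction;

  // Callers use the generated wrapper like any ordinary D function.
  assert( doSomeAction( 21 ) == 42 );
}
```

The underscore is what the sketch cannot get rid of: the wrapper needs a different name from the declaration it was generated from.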

.di files are both not a language feature (documentation notes it is explicitly a compiler feature), and don't even match what I need here as they're generated from complete code with no possibility of using them in more of a .cpp/.h paradigm. So they're out of the question. *Unless* they're upgraded to a language feature and allow me to define full class declarations with later implementations of some/all functions.

This also isn't the only use case I have. I'm a game engine programmer. We write a lot of abstracted interfaces with platform specific implementations. I know, I know, version(X){} your code, right? But that's not how everyone works. Some implementations really do require their own file for maintenance and legal purposes. But for an example outside of gaming? Take a look at core.atomic. Two implementations in the same file *AND* a separate one for documentation purposes. LDC's core.atomic also has an LLVM definition in there. And if someone writes native ARM support for DMD, that'll be even more implementations in the same file. Take note of the duplicated enums and deprecations between definitions, the alternative to which is to put a version block inside every function that requires special behaviour. Either way is not clean, I tells ya.

I'm sure there are many cases where declaration and later definition is also a perfectly valid programming pattern, and I don't see at all how these use cases can conflict with D's forward referencing, since it doesn't change referencing rules at all; it only changes the definition rules.

July 28, 2016

Re: Things that make writing a clean binding system more difficult

Posted by Walter Bright
in reply to Ethan Watson

On 7/28/2016 1:33 AM, Ethan Watson wrote:
> 1) Declaring a function pointer with a ref return value can't be done without
> workarounds.
>
> Try compiling this:
>
> ref int function( int, int ) functionPointer;
>
> It won't let you, because only parameters and for loop symbols can be ref types.
> Despite the fact that I intend the function pointer to be of a kind that returns
> a ref int, I can't declare that easily. Easy, declare an alias, right?
>
> alias RefFunctionPointer = ref int function( int, int );

C/C++ have essentially the same problem, if you want to declare a function pointer parameter that has different linkage.

The trouble is there's an ambiguity in the grammar. I don't really have anything better than the two step process you outlined.


> ====================
> 4) Forward declaring a function prototype means I can never declare that
> function elsewhere (say, for example, with a mixin)

Do you mean:

  void foo();
  void foo() { }

?

July 28, 2016

Re: Things that make writing a clean binding system more difficult

Posted by Ethan Watson
in reply to Walter Bright

On Thursday, 28 July 2016 at 08:49:35 UTC, Walter Bright wrote:
> Do you mean:
>
>   void foo();
>   void foo() { }
>
> ?

Exactly this. I've been unable to get it to work.

July 28, 2016

Re: Things that make writing a clean binding system more difficult

Posted by Walter Bright
in reply to Ethan Watson

On 7/28/2016 1:54 AM, Ethan Watson wrote:
> On Thursday, 28 July 2016 at 08:49:35 UTC, Walter Bright wrote:
>> Do you mean:
>>
>>   void foo();
>>   void foo() { }
>>
>> ?
>
> Exactly this. I've been unable to get it to work.

https://issues.dlang.org/show_bug.cgi?id=16329

The reason it's an enhancement request rather than a bug is that with D's support for forward referenced functions, having to declare them first followed later by a definition was deemed unnecessary.

This pattern is, of course, necessary in C/C++ because they do not allow forward referenced declarations outside of aggregates.
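
A tiny illustration of the forward referencing mentioned here, with made-up names: both the function and the struct are used before their definitions appear, and no prior declaration is needed.

```d
void main()
{
  // answer and Later are both used before their definitions appear.
  Later value = Later( answer() );
  assert( value.number == 42 );
}

// Defined after its first use; no earlier prototype is required.
int answer()
{
  return 42;
}

// Likewise defined after its first use.
struct Later
{
  int number;
}
```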

July 28, 2016

Re: Things that make writing a clean binding system more difficult

Posted by Walter Bright
in reply to Ethan Watson

On 7/28/2016 1:33 AM, Ethan Watson wrote:
> I also assume "lodge a bug" will be the response to these.

Indeed. That's the process.


> 2) Expansion of code (static foreach, templates) is slow to the point where
> string mixins are a legitimate compile-time optimisation

https://issues.dlang.org/show_bug.cgi?id=16330

July 28, 2016

Re: Things that make writing a clean binding system more difficult

Posted by Kagamin
in reply to Ethan Watson

On Thursday, 28 July 2016 at 08:33:22 UTC, Ethan Watson wrote:
> This also isn't the only use case I have. I'm a game engine programmer. We write a lot of abstracted interfaces with platform specific implementations. I know, I know, version(X){} your code, right? But that's not how everyone works. Some implementations really do require their own file for maintenance and legal purposes.

The usual idea for PAL structure is to put implementations in separate folders:
src/lin/pal/utils.d - module pal.utils;
src/win/pal/utils.d - module pal.utils;
Then you can just import pal.utils; and invoke the compiler with -Isrc/lin option.

July 28, 2016

Re: Things that make writing a clean binding system more difficult

Posted by Jonathan M Davis
in reply to Walter Bright

On Thursday, July 28, 2016 01:49:35 Walter Bright via Digitalmars-d wrote:
> On 7/28/2016 1:33 AM, Ethan Watson wrote:
> > 1) Declaring a function pointer with a ref return value can't be done without workarounds.
> >
> > Try compiling this:
> >
> > ref int function( int, int ) functionPointer;
> >
> > It won't let you, because only parameters and for loop symbols can be ref types. Despite the fact that I intend the function pointer to be of a kind that returns a ref int, I can't declare that easily. Easy, declare an alias, right?
> >
> > alias RefFunctionPointer = ref int function( int, int );
>
> C/C++ have essentially the same problem, if you want to declare a function pointer parameter that has different linkage.
>
> The trouble is there's an ambiguity in the grammar. I don't really have anything better than the two step process you outlined.

Well, if we decided to make parens with ref legal, then we could make it work. e.g.

ref(int) function(int, int) functionPointer;

Now, I don't know of any other case where you'd actually use parens with ref if it were legal, but it would solve this particular case if we wanted to provide a way around the ambiguity.

- Jonathan M Davis

July 28, 2016

Re: Things that make writing a clean binding system more difficult

Posted by jmh530
in reply to Jonathan M Davis

On Thursday, 28 July 2016 at 20:16:11 UTC, Jonathan M Davis wrote:
>
> Well, if we decided to make parens with ref legal, then we could make it work. e.g.
>
> ref(int) function(int, int) functionPointer;
>
> Now, I don't know of any other case where you'd actually use parens with ref if it were legal, but it would solve this particular case if we wanted to provide a way around the ambiguity.
>
> - Jonathan M Davis

On a somewhat related tangent, I was looking for the history of why ref was included in the language. My recollection of Andrei's book is that it just takes ref as a given, instead of pass by reference or address in C++, rather than say why that decision was made. I found that ref was added in D 1.011 as a replacement for inout. Looking through some of the D 1.0 documentation, I see that
"C++ does not distinguish between in, out and ref (i.e. inout) parameters."
but not much else.

July 28, 2016

Re: Things that make writing a clean binding system more difficult

Posted by Steven Schveighoffer
in reply to Jonathan M Davis

On 7/28/16 4:16 PM, Jonathan M Davis via Digitalmars-d wrote:
> On Thursday, July 28, 2016 01:49:35 Walter Bright via Digitalmars-d wrote:
>> On 7/28/2016 1:33 AM, Ethan Watson wrote:
>>> 1) Declaring a function pointer with a ref return value can't be done
>>> without workarounds.
>>>
>>> Try compiling this:
>>>
>>> ref int function( int, int ) functionPointer;
>>>
>>> It won't let you, because only parameters and for loop symbols can be ref
>>> types. Despite the fact that I intend the function pointer to be of a
>>> kind that returns a ref int, I can't declare that easily. Easy, declare
>>> an alias, right?
>>>
>>> alias RefFunctionPointer = ref int function( int, int );
>>
>> C/C++ have essentially the same problem, if you want to declare a function
>> pointer parameter that has different linkage.
>>
>> The trouble is there's an ambiguity in the grammar. I don't really have
>> anything better than the two step process you outlined.
>
> Well, if we decided to make parens with ref legal, then we could make it
> work. e.g.
>
> ref(int) function(int, int) functionPointer;
>
> Now, I don't know of any other case where you'd actually use parens with ref
> if it were legal, but it would solve this particular case if we wanted to
> provide a way around the ambiguity.

No, because that implies a type-modifier. ref does not modify the type at all, it just specifies the storage class.

-Steve
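
A small sketch of the storage-class point, using a made-up bump function: inside the function the parameter's type is still plain int, even though changes to it are visible to the caller.

```d
void bump( ref int value )
{
  // ref is not part of the parameter's type: it is still plain int.
  static assert( is( typeof( value ) == int ) );
  ++value;
}

void main()
{
  int counter = 1;
  bump( counter );

  // The caller's variable was modified through the reference.
  assert( counter == 2 );
}
```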

July 29, 2016

Re: Things that make writing a clean binding system more difficult

Posted by Timon Gehr
in reply to Walter Bright

On 28.07.2016 10:49, Walter Bright wrote:
> On 7/28/2016 1:33 AM, Ethan Watson wrote:
>> 1) Declaring a function pointer with a ref return value can't be done
>> without
>> workarounds.
>>
>> Try compiling this:
>>
>> ref int function( int, int ) functionPointer;
>>
>> It won't let you, because only parameters and for loop symbols can be
>> ref types.
>> Despite the fact that I intend the function pointer to be of a kind
>> that returns
>> a ref int, I can't declare that easily. Easy, declare an alias, right?
>>
>> alias RefFunctionPointer = ref int function( int, int );
>
> C/C++ have essentially the same problem, if you want to declare a
> function pointer parameter that has different linkage.
>
> The trouble is there's an ambiguity in the grammar. I don't really have
> anything better than the two step process you outlined.
> ...

My parser accepts the following:

int function(int,int)ref functionPointer;

I wasn't really aware that this was illegal in DMD. (Other function attributes, such as pure, are accepted.)

In fact, even the following is disallowed:
int foo(int)ref{}


Should I file an enhancement request?

Copyright © 1999-2021 by the D Language Foundation