Interesting Research Paper on Constructors in OO Languages (page 3) - D Programming Language Discussion Forum

July 17, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by H. S. Teoh
in reply to Regan Heath

On Wed, Jul 17, 2013 at 11:00:38AM +0100, Regan Heath wrote:
> On Tue, 16 Jul 2013 23:01:57 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
> >On Tue, Jul 16, 2013 at 06:17:48PM +0100, Regan Heath wrote:
[...]
> >>>>class Foo
> >>>>{
> >>>>  string name;
> >>>>  int age;
> >>>>
> >>>>  invariant
> >>>>  {
> >>>>    assert(name != null);
> >>>>    assert(age > 0);
> >>>>  }
> >>>>
> >>>>  property string Name...
> >>>>  property int Age...
> >>>>}
> >>>>
> >>>>void main()
> >>>>{
> >>>>  Foo f = new Foo() {
> >>>>    Name = "test",    // calls property Name setter
> >>>>    Age = 12          // calls property Age setter
> >>>>  };
> >>>>}
> >
> >Maybe I'm missing something obvious, but isn't this essentially the same thing as having named ctor parameters?
> 
> Yes, if we're comparing this to ctors with named parameters.  I wasn't doing that however, I was asking this Q:
> 
> "Or, perhaps another way to ask a similar Q is.. can the compiler statically verify that a create-set-call style object has been initialised, or rather that an attempt has at least been made to initialise all the required parts."
> 
> Emphasis on "create-set-call" :)  The weakness to create-set-call style is the desire for a valid object as soon as an attempt can be made to use it.  Which implies the need for some sort of enforcement of initialisation and as I mentioned in my first post the issue of preventing this initialisation being spread out, or intermingled with others and thus making the semantics of it harder to see.

Ah, I see. So basically, you need some kind of enforcement of a two-state object, pre-initialization and post-initialization. Basically, the ctor is empty, so you allocate the object first, then set some values into it, then it "officially" becomes a full-fledged instance of the class. To prevent problems with consistency, a sharp transition between setting values and using the object is enforced. Am I right?

I guess my point was that if we boil this down to the essentials, it's basically the same idea as a builder pattern, just implemented slightly differently. In the builder pattern, a separate object (or struct, or whatever) is used to encapsulate the state of the object that we'd like it to be in, which we then pass to the ctor to create the object in that state. The idea is the same, though: set up a bunch of values representing the desired initial state of the object, then, to borrow Perl's terminology, "bless" it into a full-fledged class instance.

> My idea here attempted to solve those issues with create-set-call only.

Fair enough. I guess my approach was from the angle of trying to address the problem from the confines of the current language. So, same idea, different implementation. :)

[...]
> >>The idea was to /use/ the code in the invariant to determine which member fields should be set during the initialisation statement and then statically verify that a call was made to some member function to set them.  The actual values set aren't important, just that some attempt has been made to set them.  That's about the limit of what I think you could do statically, in the general case.
> >[...]
> >
> >This still doesn't address the issue of ctor argument proliferation, though
> 
> It wasn't supposed to :)  create-set-call ctors have no arguments.

True. But if the ctor call requires a code block that initializes mandatory initial values, then isn't it essentially the same thing as ctors that have arguments? If the class hierarchy is deep, and base classes have mandatory fields to be set, then you still have the same problem, just in a different manifestation.

> >if each level of the class hierarchy adds 1-2 additional parameters, you still need to write tons of boilerplate in your derived classes to percolate those additional parameters up the inheritance tree.
> 
> In the create-set-call style additional required 'arguments' would appear as setter member functions whose underlying data member is verified in the invariant and would therefore be enforced by the syntax I detailed.

What happens when base classes also have required setter member functions that you must call?

> >Now imagine if at some point you need to change some base class ctor parameters. Now instead of making a single change to the base class, you have to update every single derived class to make the same change to every ctor, so that the new version of the parameter (or new parameter) is properly percolated up the inheritance tree.
> 
> This is one reason why create-set-call might be desirable, no ctor arguments, no problem.

Right.

> So, to take my idea a little further - WRT class inheritance.  The compiler, for a derived class, would need to inspect the invariants of all classes involved (these are and-ed already), inspect the constructors of the derived classes (for calls to initialise members), and the initialisation block I described and verify statically that an attempt was made to initialise all the members which appear in all the invariants.

I see. So basically the user still has to set up all required values before you can use the object, the advantage being that you don't have to manually percolate these values up the inheritance tree in the ctors.

It seems to be essentially the same thing as my approach, just implemented differently. :) In my approach, ctor arguments are encapsulated inside a struct, currently called Args by convention. So if you have, say, a class hierarchy where class B inherits from class A, and A.this() has 5 parameters and B.this() adds another 5 parameters, then B.Args would have 10 fields. To create an instance of B, the user would do this:

	B.Args args;
	args.field1 = 10;
	args.field2 = 20;
	...
	auto obj = new B(args);

So in a sense, this isn't that much different from your approach, in that the user sets a bunch of values desired for the initial state of the object, then gets a full-fledged object out of it at the end.

In my case, all ctors in the class hierarchy would take a single struct argument encapsulating all ctor arguments for that class (including arguments to its respective base class ctors, etc.). So ctors would look like this:

	class B : A {
		struct Args { ... }
		this(Args args) {
			super(...);
			... // set up object based on values in args
		}
	}

The trick here, then, is that call to super(...). The naïve way of doing this is to (manually) include base class ctor arguments as part of B.Args, then in B's ctor, we collect those arguments together in A.Args, and hand that over to A's ctor. But we can do better. Since A.Args is already defined, there's no need to duplicate all those fields in B.Args; we can simply do this:

	class B : A {
		struct Args {
			A.Args baseClassArgs;
			... // fields specific to B
		}
		this(Args args) {
			super(args.baseClassArgs);
			...
		}
	}

This is ugly, though, 'cos now user code has to know about B.Args.baseClassArgs:

	B.Args args;
	args.baseClassArgs.baseClassParm1 = 123;
	args.derivedClassParm1 = 234;
	...
	auto obj = new B(args);

So the next step is to use alias this to make .baseClassArgs transparent to user code:

	class B : A {
		struct Args {
			A.Args baseClassArgs;
			alias baseClassArgs this; // <--- N.B.
			... // fields specific to B
		}
		this(Args args) {
			// Nice side-effect of alias this: we can pass
			// args to super without needing to explicitly
			// name .baseClassArgs.
			super(args);
			...
		}
	}

	// Now user code doesn't need to know about .baseClassArgs:
	B.Args args;
	args.baseClassParm1 = 123;
	args.derivedClassParm1 = 234;
	...
	auto obj = new B(args);

This is starting to look pretty good. Now the next step is, having to type A.Args baseClassArgs each time is a lot of boilerplate, and could be error-prone. For example, if we accidentally wrote C.Args instead of A.Args:

	class B : A {
		struct Args {
			C.Args baseClassArgs; // <--- oops!
			alias baseClassArgs this;
			...
		}
		...
	}

So the next step is to make the type of baseClassArgs automatically inferred, so that no matter how we move B around in the class hierarchy, it will always be correct:

	class B : A {
		struct Args {
			typeof(super).Args baseClassArgs; // ah, much better!
			alias baseClassArgs this;
			...
		}
		this(Args args) {
			super(args);
			...
		}
	}

This is good, because now, the declaration of B.Args is independent of whatever base class B has. Similarly, thanks to the alias this introduced earlier, the call to super(...) is always written super(args), without any explicit reference to the specific base class. DRY is good. Of course, this is still a lot of boilerplate: you have to keep typing out the first 3 lines of the declaration of Args, in every derived class. But now that we've made this declaration independent of an explicit base class name, we can factor it into a mixin:

	mixin template CtorArgs(string fields) {
		struct Args {
			typeof(super).Args baseClassArgs;
			alias baseClassArgs this;
			mixin(fields);
		}
	}

	class B : A {
		mixin CtorArgs!(q{
			int derivedParm1;
			int derivedParm2;
			...
		});
		this(Args args) {
			super(args);
			...
		}
	}

Now we can simply use CtorArgs!(...) in each derived class to automatically declare the Args struct correctly. The boilerplate is now minimal. Things continue to work even if we move B around in the class hierarchy. Say we want to derive B from C instead of A; then we'd simply write:

	class B : C {	// <-- this is the only line that's different!
		mixin CtorArgs!(q{
			int derivedParm1;
			int derivedParm2;
			...
		});
		this(Args args) {
			super(args);
			...
		}
	}

Finally, we add a little detail to our mixin so that we can use it for the root of the class hierarchy as well. Right now, we still have to explicitly declare A.Args (assuming A is the root of our hierarchy), which is bad, because you may accidentally call it something that doesn't match what CtorArgs expects. We'd like to be able to consistently use CtorArgs even in the root base class, so that if we ever need to re-root the hierarchy, things will continue to Just Work. So we revise CtorArgs thus:

	mixin template CtorArgs(string fields) {
		struct Args {
			static if (!is(typeof(super)==Object)) {
				typeof(super).Args baseClassArgs;
				alias baseClassArgs this;
			}
			mixin(fields);
		}
	}

Basically, the static if just omits the whole baseClassArgs and alias this deal ('cos the root of the hierarchy has no superclass that also has an Args struct). So now we can write:

	class A {
		mixin CtorArgs!(q{ /* ctor fields here */ });
		...
	}

And if we ever re-root the hierarchy, we can simply write:

	class A : B {	// <--- this is the only line that changes
		mixin CtorArgs!(q{ /* ctor fields here */ });
		...
	}
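Putting the steps above together, a self-contained version might look like the sketch below. It is only a sketch under stated assumptions: the field names (name, age, school) are made up for illustration, and the helper alias BaseArgs is not part of the code above. It resolves typeof(super) at class scope, inside the mixin but outside the nested struct, as a conservative choice, since inside the struct the enclosing aggregate is Args rather than the class.

```d
import std.stdio : writeln;

mixin template CtorArgs(string fields)
{
    static if (is(typeof(super) == Object))
    {
        // Root of the hierarchy: there is no base-class Args to embed.
        struct Args
        {
            mixin(fields);
        }
    }
    else
    {
        // Assumption: BaseArgs is a helper name introduced for this sketch.
        // It is evaluated at class scope, so typeof(super) is the base class.
        alias BaseArgs = typeof(super).Args;

        struct Args
        {
            BaseArgs baseClassArgs;
            alias baseClassArgs this; // makes base fields visible directly
            mixin(fields);
        }
    }
}

class A
{
    mixin CtorArgs!(q{
        string name;
        int age;
    });

    string name;
    int age;

    this(Args args)
    {
        name = args.name;
        age = args.age;
    }
}

class B : A
{
    mixin CtorArgs!(q{
        string school;
    });

    string school;

    this(Args args)
    {
        super(args); // alias this converts B.Args to A.Args
        school = args.school;
    }
}

void main()
{
    B.Args args;
    args.name = "test1";  // field declared in A.Args
    args.age = 12;        // field declared in A.Args
    args.school = "D Burg High School"; // field declared in B.Args
    auto obj = new B(args);
    writeln(obj.name, " ", obj.age, " ", obj.school);
}
```

User code only ever touches B.Args, and changing B's base class means changing only the class declaration line, as long as the new base also uses CtorArgs.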

> >I think my approach of using builder structs with a parallel inheritance tree is still better
> 
> It may be, it certainly looked quite neat but I haven't had a detailed look at it TBH.  I think you've misunderstood my idea however, or rather, the issues it was intended to solve :)  Perhaps my idea is too limiting for you?  I could certainly understand that point of view.

Well, I think our approaches are essentially the same thing, just implemented differently. :)

One thing about your implementation that I found limiting was that you *have* to declare all required fields on-the-spot before the compiler will let your 'new' call pass, so if you have to create 5 similar instances of the class, you have to copy-n-paste most of the set-method calls:

	auto obj1 = new C() {
		name = "test1",
		age = 12,
		school = "D Burg High School"
	};

	auto obj2 = new C() {
		name = "test2",
		age = 12,
		school = "D Burg High School"
	};

	auto obj3 = new C() {
		name = "test3",
		age = 12,
		school = "D Burg High School"
	};

	auto obj4 = new C() {
		name = "test4",
		age = 12,
		school = "D Burg High School"
	};

	auto obj5 = new C() {
		name = "test5",
		age = 12,
		school = "D Burg High School"
	};

Whereas using my approach, you can simply reuse the Args struct several times:

	C.Args args;
	args.name = "test1";
	args.age = 12;
	args.school = "D Burg High School";
	auto obj1 = new C(args);

	args.name = "test2";
	auto obj2 = new C(args);

	args.name = "test3";
	auto obj3 = new C(args);

	... // etc.

You can also have different functions set up different parts of C.Args:

	C createObject(C.Args args) {
		// N.B. only need to set a subset of fields
		args.school = "D Burg High School";
		return new C(args);
	}

	void main() {
		C.Args args;
		args.name = "test1";
		args.age = 12;		// partially setup Args
		auto obj = createObject(args); // createObject fills out rest of the fields.
		...

		args.name = "test2";	// modify a few parameters
		auto obj2 = createObject(args); // createObject doesn't need to know about this change
	}

This is nice if there are a lot of parameters and you don't want to collect the setting up of all of them in one place.
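For reference, the reuse and partial-setup snippets above can be combined into one compilable program. This is a minimal sketch: the class C, with a hand-written Args struct and three plain fields, is assumed here purely for illustration.

```d
import std.stdio : writeln;

class C
{
    // Hand-written Args struct; the field names are illustrative only.
    struct Args
    {
        string name;
        int age;
        string school;
    }

    string name;
    int age;
    string school;

    this(Args args)
    {
        name = args.name;
        age = args.age;
        school = args.school;
    }
}

// Fills in only the field it knows about. Args is passed by value,
// so the caller's copy is left unchanged.
C createObject(C.Args args)
{
    args.school = "D Burg High School";
    return new C(args);
}

void main()
{
    C.Args args;
    args.name = "test1";
    args.age = 12;                  // partially set up Args
    auto obj1 = createObject(args); // createObject fills in the rest

    args.name = "test2";            // change one parameter and reuse
    auto obj2 = createObject(args);

    writeln(obj1.name, " ", obj1.age, " ", obj1.school);
    writeln(obj2.name, " ", obj2.age, " ", obj2.school);
}
```

Because Args is a value type, each call works on its own copy, so one partially filled struct can safely seed many objects.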

> I think another interesting idea is using the builder pattern with create-set-call objects.
> 
> For example, a builder template class could inspect the object for UDA's indicating a data member which is required during initialisation.  It would contain a bool[] to flag each member as not/initialised and expose a setMember() method which would call the underlying object setMember() and return a reference to itself.
> 
> At some point, these setMember() method would want to return another
> template class which contained just a build() member.  I'm not sure
> how/if this is possible in D.
[...]

Hmm, this is an interesting idea indeed. I think it may be possible to implement in the current language. It would solve the problem of mandatory fields, which is currently a main weakness of my approach (the user can neglect to set up a field in Args, and there's no way to enforce that those fields *must* be set -- you could provide sane defaults in the declaration of Args, but if some fields have no sane default value, then you're out of luck). One approach is to use Nullable for mandatory fields (or equivalently, use bool[] as you suggest), then the ctors will throw an exception if a required field hasn't been set yet. Which isn't a bad solution, since ctors in theory *should* vet their input values before creating an instance of the class anyway. But it does require some amount of boilerplate.
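The hand-written form of the Nullable approach could look roughly like this. It is a sketch only: the class C and its fields are assumed for illustration, and the check uses std.exception.enforce, which throws a plain Exception when its condition is false.

```d
import std.exception : enforce;
import std.stdio : writeln;
import std.typecons : Nullable;

class C
{
    struct Args
    {
        Nullable!string name; // mandatory: no sane default exists
        int age = 12;         // optional: has a default
    }

    string name;
    int age;

    this(Args args)
    {
        // The boilerplate in question: one check per mandatory field.
        enforce(!args.name.isNull, "C.Args.name must be set");
        name = args.name.get;
        age = args.age;
    }
}

void main()
{
    C.Args args;
    try
    {
        auto bad = new C(args); // name was never set
    }
    catch (Exception e)
    {
        writeln("rejected: ", e.msg);
    }

    args.name = "test1";
    auto ok = new C(args);
    writeln(ok.name, " ", ok.age);
}
```

The check happens at run time, so a forgotten field is caught only when the ctor actually runs.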

Maybe we can make use of UDAs to indicate which fields are mandatory, then have a template (or mixin template) that uses compile-time reflection to generate the code that verifies that these fields have indeed been set. Maybe something like:

	struct RequiredAttr {}

	// Warning: have not tried to compile this yet
	void checkCtorArgs(Args)(Args args) {
		import std.traits : hasUDA;
		// Unrolled at compile time, once per member name of Args
		foreach (field; __traits(allMembers, Args)) {
			// (Ugh, __traits syntax is so ugly)
			static if (hasUDA!(__traits(getMember, Args, field),
				RequiredAttr))
			{
				// Required fields are assumed to be Nullable
				if (__traits(getMember, args, field).isNull)
					throw new Exception("...");
			}
		}
	}

	class B : A {
		mixin CtorArgs!(q{
			int myfield1;	// this one is optional
			@(RequiredAttr) Nullable!int myfield2; // this one is mandatory
		});
		this(Args args) {
			checkCtorArgs(args);
				// throws if any mandatory fields aren't set
			...
		}
	}

Just a rough idea, haven't actually tried to compile this code yet.

On second thoughts, maybe we could just check for an instantiation of Nullable instead of using a UDA, since if you forget to use a nullable value (like int instead of Nullable!int), this code wouldn't work.

Or maybe enhance the CtorArgs template to automatically substitute Nullable!T when it sees a field of type T that's marked with @(RequiredAttr). Or maybe your bool[] idea is better, since it avoids the dependency on Nullable.

In any case, this is an interesting direction to look into.

T

-- 
Тише едешь, дальше будешь.

July 17, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by w0rp
in reply to H. S. Teoh

I always just avoided confusion by limiting myself to a maximum
of 5 arguments for any function or constructor, maybe with a soft
limit of 3. Preferring composition over inheritance helps too.

July 17, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by H. S. Teoh
in reply to w0rp

On Wed, Jul 17, 2013 at 11:19:21PM +0200, w0rp wrote:
> I always just avoided confusion by limiting myself to a maximum of 5 arguments for any function or constructor, maybe with a soft limit of 3. Preferring composition over inheritance helps too.

My original motivation for trying to tackle this problem was when I was experimenting with maze generation algorithms. I had a base class representing all maze generators, and various derived classes representing specific algorithms. Some of these algorithms have quite a large number of configurable parameters, and the algorithms themselves have different flavors, so some classes that already have many parameters would have derived classes that introduce a few more.

Encapsulating all of these parameters inside structs was the only sane way I could think of to manage the large sets of parameters involved.

Also, I agree that 3-5 parameters per function/ctor is about the max for a clean interface -- any more than that and it's a sign that you aren't organizing your code properly.  But in the case of ctors, it's not so much the 3-5 parameters required for the class itself that's the problem, but the fact that these parameters *accumulate* in all derived classes. If you have a 4-level class hierarchy and each level adds 5 more parameters, that's 20 parameters in total, which is clearly unmanageable.
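To make the accumulation concrete, here is a small sketch of what plain ctor parameters look like in such a hierarchy. Every class and parameter name below is hypothetical and chosen only for illustration; it is not the actual maze generator code.

```d
import std.stdio : writeln;

// Hypothetical generator hierarchy; all names are made up.
class MazeGen
{
    int width, height;

    this(int width, int height)
    {
        this.width = width;
        this.height = height;
    }
}

class GrowingTreeGen : MazeGen
{
    uint seed;

    // Must repeat the base parameters just to forward them.
    this(int width, int height, uint seed)
    {
        super(width, height);
        this.seed = seed;
    }
}

class BiasedGrowingTreeGen : GrowingTreeGen
{
    double bias;

    // Repeats everything above again, plus one new parameter.
    this(int width, int height, uint seed, double bias)
    {
        super(width, height, seed);
        this.bias = bias;
    }
}

void main()
{
    auto gen = new BiasedGrowingTreeGen(40, 20, 42u, 0.75);
    writeln(gen.width, "x", gen.height, " seed=", gen.seed);
}
```

Even with only one or two new parameters per level, the most derived ctor has to list all of them, and a change to MazeGen's parameters ripples through every class below it.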


T

-- 
Designer clothes: how to cover less by paying more.

July 17, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by eles
in reply to H. S. Teoh

On Wednesday, 17 July 2013 at 21:42:16 UTC, H. S. Teoh wrote:
> On Wed, Jul 17, 2013 at 11:19:21PM +0200, w0rp wrote:

This is how it is done in Ecere SDK and the eC language:

"However, constructors particularly do not play a role as
important as in C++, for example. Neither constructors nor destructors
can take in any parameters, and only a single one of each can be
defined within a class."

"Instead, members can be directly assigned a
value through the instantiation syntax initializers (either through the
data members, or the properties which we will describe in next
chapter)."

"They cannot be specified a return type either. A constructor
should never fail, but returning false (they have an implicit bool
return type) will result in the object instantiated to be null."

July 17, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by eles
in reply to eles

On Wednesday, 17 July 2013 at 21:59:14 UTC, eles wrote:
> On Wednesday, 17 July 2013 at 21:42:16 UTC, H. S. Teoh wrote:
>> On Wed, Jul 17, 2013 at 11:19:21PM +0200, w0rp wrote:
>
> This is how it is done in Ecere SDK and the eC language:

Example:

import "ecere"
class Form1 : Window
{
   text = "Form1";
   background = activeBorder;
   borderStyle = sizable;
   hasMaximize = true;
   hasMinimize = true;
   hasClose = true;
   clientSize = { 400, 300 };
}
Form1 form1 {};

Basically, you assign the needed fields first, then call a unique constructor on that skeleton.

July 17, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by Ali Çehreli
in reply to deadalnix

On 07/16/2013 01:30 AM, deadalnix wrote:
> My policy is to require the bare minimum to construct a valid object, in
> order to avoid initialization hell.

+0.33

>
> Not knowing what/when to initialize things is really painful as well. It
> also introduces sequential coupling, and wrongly initialized objects tend
> to explode far away from their construction point.

+0.33

>
> What goes in this category? Any state that can't have any default value
> that makes sense, as well as any state that is expensive to initialize.

+0.33

And to complete: +0.01 :p

Ali

July 18, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by Regan Heath
in reply to H. S. Teoh

On Wed, 17 Jul 2013 18:58:53 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
> On Wed, Jul 17, 2013 at 11:00:38AM +0100, Regan Heath wrote:
>> Emphasis on "create-set-call" :)  The weakness to create-set-call
>> style is the desire for a valid object as soon as an attempt can be
>> made to use it.  Which implies the need for some sort of enforcement
>> of initialisation and as I mentioned in my first post the issue of
>> preventing this initialisation being spread out, or intermingled with
>> others and thus making the semantics of it harder to see.
>
> Ah, I see. So basically, you need some kind of enforcement of a
> two-state object, pre-initialization and post-initialization. Basically,
> the ctor is empty, so you allocate the object first, then set some
> values into it, then it "officially" becomes a full-fledged instance of
> the class. To prevent problems with consistency, a sharp transition
> between setting values and using the object is enforced. Am I right?

Yes, that's basically it.

> I guess my point was that if we boil this down to the essentials, it's
> basically the same idea as a builder pattern, just implemented slightly
> differently. In the builder pattern, a separate object (or struct, or
> whatever) is used to encapsulate the state of the object that we'd like
> it to be in, which we then pass to the ctor to create the object in that
> state. The idea is the same, though: set up a bunch of values
> representing the desired initial state of the object, then, to borrow
> Perl's terminology, "bless" it into a full-fledged class instance.

It achieves the same ends, but does it differently.  My idea requires compiler support (which makes it unlikely to happen) and doesn't require separate objects (which I think is a big plus).

>> So, to take my idea a little further - WRT class inheritance.  The
>> compiler, for a derived class, would need to inspect the invariants
>> of all classes involved (these are and-ed already), inspect the
>> constructors of the derived classes (for calls to initialise
>> members), and the initialisation block I described and verify
>> statically that an attempt was made to initialise all the members
>> which appear in all the invariants.
>
> I see. So basically the user still has to set up all required values
> before you can use the object, the advantage being that you don't have
> to manually percolate these values up the inheritance tree in the ctors.

Exactly.

> It seems to be essentially the same thing as my approach, just
> implemented differently. :)[...]

Thanks for the description of your idea.

As I understand it, in your approach all the mandatory parameters for all classes in the hierarchy are /always/ passed to the final child constructor.  In my idea a constructor in the hierarchy could choose to set some of the mandatory members of its parents, and the compiler would detect that and would not require the initialisation block to contain those members.

Also, in your approach there isn't currently any enforcement that the user sets all the mandatory parameters of Args, and this is kinda the main issue my idea solves.

> One thing about your implementation that I found limiting was that you
> *have* to declare all required fields on-the-spot before the compiler
> will let your 'new' call pass, so if you have to create 5 similar
> instances of the class, you have to copy-n-paste most of the set-method
> calls:
>
> 	auto obj1 = new C() {
> 		name = "test1",
> 		age = 12,
> 		school = "D Burg High School"
> 	};
>
> [...]
>
> Whereas using my approach, you can simply reuse the Args struct several
> times:
>
> 	C.Args args;
> 	args.name = "test1";
> 	args.age = 12;
> 	args.school = "D Burg High School";
> 	auto obj1 = new C(args);
>
> 	args.name = "test2";
> 	auto obj2 = new C(args);
>
> 	args.name = "test3";
> 	auto obj3 = new C(args);
>
> 	... // etc.

Or.. you use a mixin, or better still you add a copy-constructor or .dup method to your class to duplicate it :)

> You can also have different functions setup different parts of C.Args:
>
> 	C createObject(C.Args args) {
> 		// N.B. only need to set a subset of fields
> 		args.school = "D Burg High School";
> 		return new C(args);
> 	}
>
> 	void main() {
> 		C.Args args;
> 		args.name = "test1";
> 		args.age = 12;		// partially setup Args
> 		auto obj = createObject(args); // createObject fills out rest of the fields.
> 		...
>
> 		args.name = "test2";	// modify a few parameters
> 		auto obj2 = createObject(args); // createObject doesn't need to know about this change
> 	}
>
> This is nice if there are a lot of parameters and you don't want to
> collect the setting up of all of them in one place.

In my case you can call different functions in the initialisation block, e.g.

void defineObject(C c)
{
  c.school = "...";
}

C c = new C() {
  defineObject()
}

:)

>> I think another interesting idea is using the builder pattern with
>> create-set-call objects.
>>
>> For example, a builder template class could inspect the object for
>> UDA's indicating a data member which is required during
>> initialisation.  It would contain a bool[] to flag each member as
>> not/initialised and expose a setMember() method which would call the
>> underlying object setMember() and return a reference to itself.
>>
>> At some point, these setMember() method would want to return another
>> template class which contained just a build() member.  I'm not sure
>> how/if this is possible in D.
> [...]
>
> Hmm, this is an interesting idea indeed. I think it may be possible to
> implement in the current language.

The issue I think is the step where you want to mutate the return type from the type with setX members to the type with build().
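One way this return-type switch might be expressed in current D is to record which required members have been set as template parameters of the builder, so that every setter returns a differently typed builder and build() only exists on the fully set type. The following is a rough sketch under stated assumptions: Person, PersonBuilder and personBuilder are hypothetical names, and it uses compile-time flags in place of the UDA inspection and bool[] described above.

```d
import std.stdio : writeln;

// Hypothetical create-set-call style class with two required members.
class Person
{
    string name;
    int age;
}

// The two flags record, in the type itself, which members have been set.
struct PersonBuilder(bool hasName = false, bool hasAge = false)
{
    private Person obj;

    // Each setter returns a builder of a different type.
    PersonBuilder!(true, hasAge) setName(string name)
    {
        obj.name = name;
        return PersonBuilder!(true, hasAge)(obj);
    }

    PersonBuilder!(hasName, true) setAge(int age)
    {
        obj.age = age;
        return PersonBuilder!(hasName, true)(obj);
    }

    // build() is only declared once every required member has been set.
    static if (hasName && hasAge)
    {
        Person build()
        {
            return obj;
        }
    }
}

// Entry point: a builder with nothing set yet.
PersonBuilder!() personBuilder()
{
    return PersonBuilder!()(new Person);
}

void main()
{
    auto p = personBuilder().setName("test").setAge(12).build();
    writeln(p.name, " ", p.age);

    // Skipping a required setter is rejected at compile time.
    static assert(!__traits(compiles,
        personBuilder().setName("test").build()));
}
```

The setters can be called in any order, and a missing one surfaces as a "no build member" compile error instead of a run-time check. The cost is one template instance per combination of flags.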

> Maybe we can make use of UDAs to indicate which fields are mandatory

That was what I was thinking.

> [...]
> Just a rough idea, haven't actually tried to compile this code yet.

Worth a go, it doesn't require compiler support like my idea so it's far more likely you'll get something at the end of it.. I can just sit on my hands and/or try to promote my idea.

I still prefer my idea :P.  I think it's cleaner and simpler, this is in part because it requires compiler support and that hides the gory details, but also because create-set-call is a simpler style in itself.  Provided the weaknesses of create-set-call can be addressed I might be tempted to use that style.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

July 18, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by H. S. Teoh
in reply to Regan Heath

On Thu, Jul 18, 2013 at 10:13:58AM +0100, Regan Heath wrote:
> On Wed, 17 Jul 2013 18:58:53 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
[...]
> >I guess my point was that if we boil this down to the essentials, it's basically the same idea as a builder pattern, just implemented slightly differently. In the builder pattern, a separate object (or struct, or whatever) is used to encapsulate the state of the object that we'd like it to be in, which we then pass to the ctor to create the object in that state. The idea is the same, though: set up a bunch of values representing the desired initial state of the object, then, to borrow Perl's terminology, "bless" it into a full-fledged class instance.
> 
> It achieves the same ends, but does it differently.  My idea requires
> compiler support (which makes it unlikely to happen) and doesn't
> require separate objects (which I think is a big plus).

Why would requiring separate objects be a problem?

[...]
> Thanks for the description of your idea.
> 
> As I understand it, in your approach all the mandatory parameters for all classes in the hierarchy are /always/ passed to the final child constructor.  In my idea a constructor in the hierarchy could choose to set some of the mandatory members of its parents, and the compiler would detect that and would not require the initialisation block to contain those members.

In my case, the derived class ctor could manually set some of the fields in Args before handing to the superclass. Of course, it's not as ideal, since if user code already sets said fields, then they get silently overridden.

> Also, in your approach there isn't currently any enforcement that the user sets all the mandatory parameters of Args, and this is kinda the main issue my idea solves.

True. One workaround is to use Nullable and check that in the ctor. But I suppose it's not as great as a compile-time check.

> >One thing about your implementation that I found limiting was that you *have* to declare all required fields on-the-spot before the compiler will let your 'new' call pass, so if you have to create 5 similar instances of the class, you have to copy-n-paste most of the set-method calls:
> >
> >	auto obj1 = new C() {
> >		name = "test1",
> >		age = 12,
> >		school = "D Burg High School"
> >	};
> >
> >[...]
> >
> >Whereas using my approach, you can simply reuse the Args struct several times:
> >
> >	C.Args args;
> >	args.name = "test1";
> >	args.age = 12;
> >	args.school = "D Burg High School";
> >	auto obj1 = new C(args);
> >
> >	args.name = "test2";
> >	auto obj2 = new C(args);
> >
> >	args.name = "test3";
> >	auto obj3 = new C(args);
> >
> >	... // etc.
> 
> Or.. you use a mixin, or better still you add a copy-constructor or .dup method to your class to duplicate it :)

But then you end up with the problem of needing to call set methods after the .dup, which may complicate things if the set methods need to do non-trivial initialization of internal structures (caches or internal representations, etc.). Whereas if you hadn't needed to .dup, you could have gotten by without writing any set methods for your class, but now you have to.

[...]
> In my case you can call different functions in the initialisation block, e.g.
> 
> void defineObject(C c)
> {
>   c.school = "...";
> }
> 
> C c = new C() {
>   defineObject()
> }
> 
> :)

So the compiler has to recursively traverse function calls in the initialization block in order to check that all required fields are set? That could entail some implementation issues, if said function calls can be arbitrarily complex. (If you have complex control logic in said functions, the compiler can't in general determine whether or not some paths that may contain assignment statements to the object's fields will be taken, since that would be equivalent to the halting problem. Worse, the compiler would have to track aliases of the object being set, in order to know which assignment statements are setting fields in the object, and which are just computations on the side.)

Furthermore, what if defineObject tries to do something with C other than setting up fields? The object would be in an illegal state since it hasn't been fully constructed yet.

> >>I think another interesting idea is using the builder pattern with create-set-call objects.
> >>
> >>For example, a builder template class could inspect the object for UDA's indicating a data member which is required during initialisation.  It would contain a bool[] to flag each member as not/initialised and expose a setMember() method which would call the underlying object setMember() and return a reference to itself.
> >>
> >>At some point, these setMember() methods would want to return another
> >>template class which contained just a build() member.  I'm not sure
> >>how/if this is possible in D.
> >[...]
> >
> >Hmm, this is an interesting idea indeed. I think it may be possible to implement in the current language.
> 
> The issue I think is the step where you want to mutate the return type from the type with setX members to the type with build().

I'm not sure I understand that sentence. Could you rephrase it?

> >Maybe we can make use of UDAs to indicate which fields are mandatory
> 
> That was what I was thinking.
> 
> >[...]
> >Just a rough idea, haven't actually tried to compile this code yet.
> 
> Worth a go, it doesn't require compiler support like my idea so it's far more likely you'll get something at the end of it.. I can just sit on my hands and/or try to promote my idea.
> 
> I still prefer my idea :P.  I think it's cleaner and simpler, this is in part because it requires compiler support and that hides the gory details, but also because create-set-call is a simpler style in itself.  Provided the weaknesses of create-set-call can be addressed I might be tempted to use that style.
[...]

One thing I like about your idea is that you can reuse the same chunk of memory that the eventual object is going to sit in. With my approach, the ctors still have to copy the struct fields into the object fields, so there is some overhead there. (Having said that though, that overhead shouldn't be anything worse than the ctor-with-arguments calls it replaces; you're basically just abstracting away the ctor parameters on the stack into a struct. In machine code it's pretty much equivalent.)

Requiring compiler support, though, as you said, makes your idea less likely to actually happen. I still see it as essentially equivalent to my approach; the syntax is different and the usage pattern differs, but at the end of the day, it amounts to the same thing: basically your objects have two phases, a post-creation, pre-usage stage where you set things up, and a post-setup stage where you actually start using it.

Anyway, now that I'm thinking about this problem again, I'd like to take a step back and consider if any other good approaches may exist to tackle this issue. I'm thinking of the general case where the initialization of an object may be arbitrarily complex, such that neither a struct of ctor arguments nor an initialization block may be sufficient.

The problem with the struct approach is, what if you need a complex setup process, say constructing a graph with complex interconnections between nodes? In order to express such a thing, you have to essentially already create the object before you can pass the struct to the ctor, which kinda defeats the purpose. Similarly, your approach of an initialization block suffers from the limitation that the initialization is confined to that block, and you can't allow arbitrary code in that block (otherwise you could end up using an object that hasn't been fully constructed yet -- like the defineObject problem I pointed out above).

Keeping in mind the create-set-call pattern and Perl's approach of "blessing" an object into a full-fledged class instance, I wonder if a more radical approach might be to have the language acknowledge that objects have two phases, a preinitialized state, and a fully-initialized state. These two would have distinct types *in the type system*, such that you cannot, for example, call post-init methods on a pre-initialization object, and you can't call an init method on a post-initialization object. The ctor would be the unique transition point which takes a preinitialized object, verifies compliance with class invariants, and returns a post-initialization object.

In pseudo-code, this might look something like this:

	class MyClass {
	public:
		@preinit void setName(string name);
		@preinit void setAge(int age);

		this() {
			if (!validateFields())
				throw new Exception(...);
		}

		// The following are "normal" methods that cannot be
		// called in a preinit state.
		void computeStatistics();
		void dotDotDotMagic();
	}

	void main() {
		auto obj = new MyClass();
		static assert(is(typeof(obj) == MyClass.preinit));
		/* MyClass.preinit is a special type indicating that the
		 * object isn't fully initialized yet */

		// Compile error: cannot call non-@preinit method on
		// @preinit object.
		//obj.computeStatistics();

		obj.setName(...);	// OK
		obj.setAge(...);	// OK

		// Transition object to full-fledged state
		obj.this();		// not sure about this syntax yet

		static assert(is(typeof(obj) == MyClass));
		/* Now obj is a full-fledged member of the class */

		// Compile error: can't call @preinit method on
		// non-preinit object
		//obj.setName(...);

		obj.computeStatistics();	// OK
	}

MyClass.preinit would be a separate type in the type system, so that you can pass it around without any risk that someone will try to perform illegal operations on it before it's fully initialized:

	void doSetup(MyClass.preinit obj) {
		obj.setName(...);		// OK
		//obj.computeStatistics();	// compile error
	}
	void main() {
		auto obj = new MyClass();
		doSetup(obj);		// OK
		obj.this();		// "promote" to full-fledged object

		// Illegal: can't implicitly convert MyClass into
		// MyClass.preinit.
		//doSetup(obj);

		obj.computeStatistics(); // OK
	}

Maybe "obj.this()" is not a good syntax, perhaps "obj.promote()"?

In any case, this is a rather radical idea which requires language support; I'm not sure how practical it is. :)
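
For comparison, here is a rough library-only approximation that should work in the current language. It is only a sketch of the two-phase idea, not the proposed feature: a hypothetical wrapper struct PreInit plays the role of MyClass.preinit, and promote() stands in for obj.this(). The names PreInit, create, target, promote and ageInMonths are all made up for illustration, and the "invariant" checked is just an assumed one (non-empty name, positive age).

```d
import std.exception : enforce;

class MyClass
{
    private string name;
    private int age;

    // Private ctor: code outside this module can only obtain a MyClass
    // through PreInit.promote().
    private this() {}

    // "Normal" methods, only reachable on a fully-initialized object.
    void computeStatistics() {}
    int ageInMonths() const { return age * 12; }
}

// Hypothetical stand-in for the proposed MyClass.preinit type.
struct PreInit
{
    private MyClass obj;

    static PreInit create() { return PreInit(new MyClass); }

    // Refuse to touch an object that has already been promoted.
    private MyClass target()
    {
        enforce(obj !is null, "object was already promoted");
        return obj;
    }

    void setName(string name) { target().name = name; }
    void setAge(int age) { target().age = age; }

    // The unique transition point: validate, then hand out the
    // fully-initialized object and invalidate the pre-init handle.
    MyClass promote()
    {
        auto result = target();
        enforce(result.name.length > 0 && result.age > 0,
                "MyClass is not fully initialized");
        obj = null;
        return result;
    }
}

void doSetup(ref PreInit p)
{
    p.setName("test");
    // p.computeStatistics();  // compile error: no such member on PreInit
}
```

The part this cannot reproduce is the static invalidation of the old handle: calling a setter after promote() is only caught at runtime here, whereas the language-level version would reject it at compile time.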

T

-- 
"Uhh, I'm still not here." -- KD, while "away" on ICQ.

July 19, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by Regan Heath
in reply to H. S. Teoh

On Thu, 18 Jul 2013 19:00:44 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Thu, Jul 18, 2013 at 10:13:58AM +0100, Regan Heath wrote:
>> On Wed, 17 Jul 2013 18:58:53 +0100, H. S. Teoh
>> <hsteoh@quickfur.ath.cx> wrote:
> [...]
>> >I guess my point was that if we boil this down to the essentials,
>> >it's basically the same idea as a builder pattern, just implemented
>> >slightly differently. In the builder pattern, a separate object (or
>> >struct, or whatever) is used to encapsulate the state of the object
>> >that we'd like it to be in, which we then pass to the ctor to create
>> >the object in that state. The idea is the same, though: set up a
>> >bunch of values representing the desired initial state of the object,
>> >then, to borrow Perl's terminology, "bless" it into a full-fledged
>> >class instance.
>>
>> It achieves the same ends, but does it differently.  My idea requires
>> compiler support (which makes it unlikely to happen) and doesn't
>> require separate objects (which I think is a big plus).
>
> Why would requiring separate objects be a problem?

It's not a problem, it's just better not to, if at all possible. K.I.S.S. :)

> In my case, the derived class ctor could manually set some of the fields
> in Args before handing to the superclass. Of course, it's not as ideal,
> since if user code already sets said fields, then they get silently
> overridden.

That's the problem I was imagining.

>> Also, in your approach there isn't currently any enforcement that
>> the user sets all the mandatory parameters of Args, and this is
>> kinda the main issue my idea solves.
>
> True. One workaround is to use Nullable and check that in the ctor. But
> I suppose it's not as great as a compile-time check.

Yeah, I was angling for a static/compile time check, if at all possible.
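
To make the Nullable workaround concrete, here is a minimal sketch. The layout of C and C.Args is assumed (the field names come from the example quoted further down), and the default for school is invented for illustration. Note that the check only fires when the ctor runs, which is exactly the limitation being discussed.

```d
import std.typecons : Nullable;
import std.exception : enforce;

class C
{
    // Hypothetical argument struct: mandatory fields are Nullable so
    // that "never set" is distinguishable from any real value.
    static struct Args
    {
        Nullable!string name;
        Nullable!int age;
        string school = "unknown"; // optional field with a default
    }

    string name;
    int age;
    string school;

    this(Args args)
    {
        // Runtime check only: a missing field is reported when the
        // ctor runs, not when the code is compiled.
        enforce(!args.name.isNull, "name is mandatory");
        enforce(!args.age.isNull, "age is mandatory");
        name = args.name.get;
        age = args.age.get;
        school = args.school;
    }
}
```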

>> >Whereas using my approach, you can simply reuse the Args struct
>> >several times:
>> >
>> >	C.Args args;
>> >	args.name = "test1";
>> >	args.age = 12;
>> >	args.school = "D Burg High School";
>> >	auto obj1 = new C(args);
>> >
>> >	args.name = "test2";
>> >	auto obj2 = new C(args);
>> >
>> >	args.name = "test3";
>> >	auto obj3 = new C(args);
>> >
>> >	... // etc.
>>
>> Or.. you use a mixin, or better still you add a copy-constructor or
>> .dup method to your class to duplicate it :)
>
> But then you end up with the problem of needing to call set methods
> after the .dup

Which is no different to setting args.name beforehand; it's the same number of assignments.  In the example above it's N+1 assignments: N for the args or dup'ed members, and 1 more for 'name' before or after the construction.
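
A quick sketch of the .dup variant, to show the assignment count. It assumes C has plain public fields and a hand-written shallow dup(); the helper makeThree is hypothetical and only there to mirror the Args example.

```d
class C
{
    string name;
    int age;
    string school;

    // Hypothetical .dup: shallow copy of every field.
    C dup()
    {
        auto c = new C;
        c.name = name;
        c.age = age;
        c.school = school;
        return c;
    }
}

// Mirrors the Args example: set everything once, then one extra
// assignment per additional object.
C[] makeThree()
{
    auto first = new C;
    first.name = "test1";
    first.age = 12;
    first.school = "D Burg High School";

    auto second = first.dup();
    second.name = "test2";

    auto third = first.dup();
    third.name = "test3";

    return [first, second, third];
}
```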

> which may complicate things if the set methods need to
> do non-trivial initialization of internal structures (caches or internal
> representations, etc.).

Ahh, yes, and in this case you'd want to use the idea below, where you call a method to set the common parts and manually set the differences.

> Whereas if you hadn't needed to .dup, you could
> have gotten by without writing any set methods for your class, but now
> you have to.

create-set-call <- 'set' is kinda an integral part of the whole thing :P

> [...]
>> In my case you can call different functions in the initialisation
>> block, e.g.
>>
>> void defineObject(C c)
>> {
>>   c.school = "...";
>> }
>>
>> C c = new C() {
>>   defineObject()
>> }
>>
>> :)
>
> So the compiler has to recursively traverse function calls in the
> initialization block in order to check that all required fields are set?

Yes.  This was an off the cuff idea, but it /is/ a natural extension of the idea for the compiler to traverse the setters called inside the initialisation block, and ctors in the hierarchy, etc.

> That could entail some implementation issues, if said function
> calls can be arbitrarily complex. (If you have complex control logic in
> said functions, the compiler can't in general determine whether or not
> some paths containing assignment statements to the object's fields
> will be taken, since that would be equivalent to the halting problem.

All true.  The compiler has a couple of options to (re)solve these issues:
1. It could simply baulk at the complexity and error.
2. It could take the safe route and assume those member assignments it cannot verify are uninitialised, forcing manual init.

In fact, erroring at complexity might make for better code in many ways.  You would have to perform your complex initialisation beforehand, store the result in a variable, and then construct/initblock your object.

It does limit your choice of style, but create-set-call already does that .. and I'm not immediately against style limitations assuming they actually result in better code.

> Worse, the compiler would have to track aliases of the object being set,
> in order to know which assignment statements are setting fields in the
> object, and which are just computations on the side.)

No, aliasing would simply be ignored.  In fact, calling a setter on another object in an initblock should probably be an error.  Part of the whole "don't mix initialisation" goal I started with.  It does require strict properties.

> Furthermore, what if defineObject tries to do something with C other
> than setting up fields? The object would be in an illegal state since it
> hasn't been fully constructed yet.

That's an error.  This is why in my initial post I stated that we'd need explicit/well defined properties.  All you would be allowed to call in an initialisation block, on the object being initialised, are setter properties.. and possibly methods or free functions which only call setter properties.

>> >>I think another interesting idea is using the builder pattern with
>> >>create-set-call objects.
>> >>
>> >>For example, a builder template class could inspect the object for
>> >>UDA's indicating a data member which is required during
>> >>initialisation.  It would contain a bool[] to flag each member as
>> >>not/initialised and expose a setMember() method which would call the
>> >>underlying object setMember() and return a reference to itself.
>> >>
>> >>At some point, these setMember() methods would want to return another
>> >>template class which contained just a build() member.  I'm not sure
>> >>how/if this is possible in D.
>> >[...]
>> >
>> >Hmm, this is an interesting idea indeed. I think it may be possible to
>> >implement in the current language.
>>
>> The issue I think is the step where you want to mutate the return
>> type from the type with setX members to the type with build().
>
> I'm not sure I understand that sentence. Could you rephrase it?

I am imagining using a template to create a type which wraps the original object.  The created type would expose setter properties for all the mandatory members, and nothing else.  The user would call these setters, using UFCS/chain style.  However, only after all the mandatory properties have been set do we want to expose an additional member called build(), which returns the constructed/initialised object.

So, an example:

class Foo {...}

auto f = Builder!(Foo)().setName("Regan").setAge(33).build();

The type of the object returned from the Builder!(Foo) is our first created type, which exposes setName() and setAge(), however the type returned from setAge (or whichever member assignment is done last) is the second created type, which either has all the set.. members plus build() or only build().  The build() method returns a Foo.

So, the type of 'f' above is Foo.

The goal here is to make build() statically available when Foo is completely initialised and not before.  Of course we could simplify all this by making it available immediately and throwing if some members are uninitialised - but that is a runtime check and I was angling for a compile time one.

If you wanted to enforce a specific init ordering you could even produce a separate type containing only the next member to init, and from each setter return the next type in sequence - like a type state machine :p

The template bloat however..
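
Here is a rough sketch of how the "mutating return type" part might look, just to show it seems possible in D as it stands. It is hand-written for a Foo with two mandatory members rather than driven by UDAs, and it stores the values and constructs the Foo in build() instead of forwarding to the object's own setters, so treat it as an assumption-laden simplification. The bool template parameters hasName/hasAge are made up for illustration.

```d
class Foo
{
    string name;
    int age;
}

// Hypothetical builder: the two bool template parameters record, at
// compile time, which mandatory members have been set so far.
struct Builder(T, bool hasName = false, bool hasAge = false)
{
    private string name;
    private int age;

    // Each setter returns a *different* instantiation of the template,
    // carrying the values collected so far.
    Builder!(T, true, hasAge) setName(string n)
    {
        return Builder!(T, true, hasAge)(n, age);
    }

    Builder!(T, hasName, true) setAge(int a)
    {
        return Builder!(T, hasName, true)(name, a);
    }

    // build() only exists once every mandatory member has been set.
    static if (hasName && hasAge)
    {
        T build()
        {
            auto obj = new T;
            obj.name = name;
            obj.age = age;
            return obj;
        }
    }
}
```

With this, Builder!(Foo)().setName("Regan").setAge(33).build() compiles and yields a Foo, while leaving out either setter makes the build() call a compile error. Each combination of set members is its own instantiation, which is where the bloat comes from.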

> The problem with the struct approach is, what if you need a complex
> setup process, say constructing a graph with complex interconnections
> between nodes? In order to express such a thing, you have to essentially
> already create the object before you can pass the struct to the ctor,
> which kinda defeats the purpose. Similarly, your approach of an
> initialization block suffers from the limitation that the initialization
> is confined to that block, and you can't allow arbitrary code in that
> block (otherwise you could end up using an object that hasn't been fully
> constructed yet -- like the defineObject problem I pointed out above).

Yes, neither idea works for all possible use-cases.  Yours is naturally broader and less limiting because I was starting from a limited create-set-call style and imposing further limitation on how it can be used.

> Keeping in mind the create-set-call pattern and Perl's approach of
> "blessing" an object into a full-fledged class instance, I wonder if a
> more radical approach might be to have the language acknowledge that
> objects have two phases, a preinitialized state, and a fully-initialized
> state. These two would have distinct types *in the type system*, such
> that you cannot, for example, call post-init methods on a
> pre-initialization object, and you can't call an init method on a
> post-initialization object.

That is essentially the same idea as the builder template solution I talk about above :)

> The ctor would be the unique transition
> point which takes a preinitialized object, verifies compliance with
> class invariants, and returns a post-initialization object.

AKA build() above :)

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/


Copyright © 1999-2021 by the D Language Foundation