Interesting Research Paper on Constructors in OO Languages - D Programming Language Discussion Forum


Thread overview

Interesting Research Paper on Constructors in OO Languages
Jul 15, 2013 Meta
Jul 15, 2013 H. S. Teoh
Jul 16, 2013 Meta
Jul 16, 2013 H. S. Teoh
Jul 16, 2013 Dicebot
Jul 16, 2013 H. S. Teoh
Jul 16, 2013 H. S. Teoh
Jul 16, 2013 Jacob Carlborg
Jul 16, 2013 deadalnix
Jul 17, 2013 Ali Çehreli
Jul 16, 2013 Regan Heath
Jul 16, 2013 Craig Dillabaugh
Jul 16, 2013 Dicebot
Jul 16, 2013 Wyatt
Jul 16, 2013 Craig Dillabaugh
Jul 16, 2013 Craig Dillabaugh
Jul 16, 2013 Regan Heath
Jul 16, 2013 H. S. Teoh
Jul 17, 2013 Regan Heath
Jul 17, 2013 H. S. Teoh
Jul 17, 2013 w0rp
Jul 17, 2013 H. S. Teoh
Jul 17, 2013 eles
Jul 17, 2013 eles
Jul 18, 2013 Regan Heath
Jul 18, 2013 H. S. Teoh
Jul 19, 2013 Regan Heath
Jul 16, 2013 Jérôme M. Berger
Jul 17, 2013 Regan Heath

July 15, 2013

Interesting Research Paper on Constructors in OO Languages

Posted by Meta

I saw an interesting post on Hacker News about constructors in OO languages. Apparently they are a real stumbling block for some programmers, which was quite a surprise to me. I think this might be relevant to a discussion about named parameters and whether we should ditch constructors for another kind of construct.

Link to the newsgroup post, the link to the paper is near the top:
http://erlang.org/pipermail/erlang-questions/2012-March/065519.html

July 15, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by H. S. Teoh
in reply to Meta

On Mon, Jul 15, 2013 at 09:06:38PM +0200, Meta wrote:
> I saw an interesting post on Hacker News about constructors in OO languages. Apparently they are a real stumbling block for some programmers, which was quite a surprise to me. I think this might be relevant to a discussion about named parameters and whether we should ditch constructors for another kind of construct.
> 
> Link to the newsgroup post, the link to the paper is near the top: http://erlang.org/pipermail/erlang-questions/2012-March/065519.html

Thanks for the link; this touches on one of my pet peeves about OO libraries: constructors.

I consider myself to be a "systematic" programmer (according to the definition in the paper); I can work equally well with ctors with arguments vs. create-set-call objects. But I find that mandatory ctors with arguments are a pain to work with, *both* to write and to use.

On the usability side, there's the mental workload of having to remember which order the arguments appear in (or look it up in the IDE, or whatever -- the point is that I can't just type the ctor call straight from my head). Then there's the problem of needing to create objects required by the ctor before you can call the ctor. In some cases, this can be inconvenient -- I always have to remember to set up and create other objects before I can create this one, because its ctor requires said objects as arguments. Then there's the lack of flexibility: no matter what you do, it seems that anything that requires more than a single ctor argument inevitably becomes either (1) too complex, requiring too many arguments, and therefore very difficult to use, or (2) too simplistic, and therefore unable to do some things that I may want to do (e.g. some fields are default-initialized with no way to specify their initial values, 'cos otherwise the ctor would have too many arguments). It seems almost impossible to come up with an ideal ctor except in trivial cases where it requires only 1 argument or is a default ctor.

On the writability side, one of my pet peeves is base class ctors that require multiple arguments. Every level of inheritance inevitably adds more arguments each time, and by the time you're 5-6 levels down the class hierarchy, your ctor calls just have an unmanageable number of parameters. Not to mention the violation of DRY by requiring much redundant typing just to pass arguments from the inherited class' ctor up the class hierarchy. Tons of bugs to be had everywhere, given the amount of repeated typing needed.
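To make the writability complaint concrete, here is a minimal C++ sketch of argument forwarding across three levels. The classes Shape, Circle and LabeledCircle and all of their parameters are hypothetical names chosen for illustration; they are not taken from the code under discussion.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical three-level hierarchy: every level has to re-declare and
// forward the ctor arguments of all the levels above it.
class Shape {
public:
    Shape(int x, int y) : x_(x), y_(y) {}
    int x() const { return x_; }
    int y() const { return y_; }
private:
    int x_, y_;
};

class Circle : public Shape {
public:
    // x and y appear here only so that they can be passed up to Shape.
    Circle(int x, int y, int radius) : Shape(x, y), radius_(radius) {
        assert(radius >= 0);
    }
    int radius() const { return radius_; }
private:
    int radius_;
};

class LabeledCircle : public Circle {
public:
    // Three inherited parameters plus one new one. Accidentally writing
    // Circle(y, x, radius) here would still compile.
    LabeledCircle(int x, int y, int radius, std::string label)
        : Circle(x, y, radius), label_(std::move(label)) {}
    const std::string& label() const { return label_; }
private:
    std::string label_;
};
```

Adding a parameter to Shape in this sketch would mean touching both derived ctors as well, which is the DRY violation described above.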

In the simplest cases, of course, these aren't big issues, but this kind of ctor design is clearly not scalable.

OTOH, the create-set-call pattern isn't a panacea either. One of the biggest problems with this pattern is that you can't guarantee your objects are in a consistent state at all times. This is very bad, because all your methods will have to check whether some value has been set yet before they use it. This adds a lot of complexity that could've been avoided had everything been set at ctor-time. This also makes class invariants needlessly complex. Moreover, I've seen many classes in this category exhibit undefined behaviour if you call a value-setting method after you start using the object. Too many classes falsely assume that you will always call set methods and then "use" methods in that order. If you call a set method after calling a "use" method, you're quite likely to run into bugs in the class, e.g. part of the object's state doesn't reflect the new value you set, because the "use" methods were written with the assumption that when they were called the first time, the values you set earlier won't change thereafter.
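One way such a set-after-use bug can arise is a "use" method that caches a derived value on its first call. The following C++ sketch is an assumed illustration only; Rectangle and its members are hypothetical names, not code from any real library.

```cpp
#include <cassert>

// Hypothetical create-set-call class that silently assumes every setter
// is called before the first "use" method.
class Rectangle {
public:
    void setWidth(int w)  { width_ = w; }
    void setHeight(int h) { height_ = h; }

    // "Use" method: computes the area on the first call and caches it.
    int area() {
        if (!cached_) {
            area_ = width_ * height_;
            cached_ = true;
        }
        return area_;
    }
private:
    int width_ = 0, height_ = 0;
    int area_ = 0;
    bool cached_ = false;  // never reset by the setters: this is the bug
};

// Shows the stale state: the second area() call ignores the new width.
inline void demoStaleCache() {
    Rectangle r;
    r.setWidth(2);
    r.setHeight(3);
    assert(r.area() == 6);
    r.setWidth(10);         // set after use
    assert(r.area() == 6);  // still the cached value, not 30
}
```

Fixing this particular sketch means invalidating the cache in every setter, which is exactly the kind of extra bookkeeping the pattern tends to require.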

I've always found Perl's approach a more balanced way to tackle this problem (even though Perl's OO system as a whole suffers from other, shall we say, idiosyncrasies). In Perl, objects start out as arbitrary key-value pairs, and nothing differentiates them from a regular AA until you call the 'bless' built-in function on them, at which point they become "officially" a member of some particular class. This neatly sidesteps the whole ctor mess: you can initialize the initial AA with whatever values you want, in whatever order you want. When you have finally "kicked it into shape", as the cited paper puts it, you "promote" that set of key-value pairs into an "official" member of the class, and thereafter, you can't simply modify fields anymore except through class methods. This means you now have the possibility of enforcing invariants on the object without crippling the flexibility of constructing it. (Well, OK, in Perl, this last bit isn't necessarily true, but in an ideal implementation of this initialize-bless-use approach, the object's fields would become non-public after being blessed and could only be updated by "official" object methods.)

In the spirit of this approach, I've written some C++ code in the past that looked something like this:

	class BaseClass {
	public:
		// Encapsulate ctor arguments
		struct Args {
			int baseparm1, baseparm2;
		};
		BaseClass(Args args) {
			// initialize object based on fields in
			// BaseClass::Args.
		}
	};

	class MyClass : public BaseClass {
	public:
		// Encapsulate ctor arguments
		struct Args : BaseClass::Args {
			int parm1, parm2;
		};

		MyClass(Args args) : BaseClass(args) {
			// initialize object based on fields in args
		}
	};

Basically, the Args structs let the user set up whatever values they want to, in whatever order they wish, then they are "blessed" into real class instances by the ctor. Encapsulating ctor arguments in these structs alleviates the problem of proliferating ctor arguments as the class hierarchy grows: each derived class simply hands off the Args struct (which is itself in a hierarchy that parallels that of the classes) to the base class ctor. Every ctor in the class hierarchy needs only a single (polymorphic) argument.

This approach also localizes the changes required when you modify base class arguments -- in the old way of having multiple ctor arguments, adding or changing arguments to the base class ctor requires you to update every single derived class ctor accordingly -- very bad. But here, adding a new field to BaseClass::Args requires zero changes to all derived classes, which is a Good Thing(tm).

In some cases, if the class is relatively simple, the private members of the class can simply be themselves an instance of the Args struct, so the ctor could be nothing more than just:

	MyClass(Args args) : BaseClass(args), myArgs(args) {}

which gets rid of that silly baroque dance of naming ctor arguments as _a, _b, _c, then writing in the ctor body a=_a, b=_b, c=_c (which can be rather error prone if you mistype a _ somewhere or forget to assign one of the members). Since the private copy of Args is not accessible from outside, class methods can use the values freely without having to worry about inconsistent states -- the ctor can check class invariants before creating the class object, ensuring that the internal copy of Args is in a consistent state.

The Args structs themselves, of course, can have ctors that set up sane default values for each field, so that lazy users can simply call:

	MyClass *obj = new MyClass(MyClass::Args());

and get a working, consistent class object with default settings. This way of setting default values also lets the user only change fields that they don't want to use default values for, rather than be constricted by the order of ctor default arguments: if you're unlucky enough to need a non-default value in a later parameter, you're forced to repeat the default values for everything that comes before it.
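The pieces described above (defaults in Args, a private copy of Args inside the class, and an invariant check in the ctor) can be put together roughly as follows. This is a sketch under assumptions: Widget, its fields and the helper makeDefaultButTall are hypothetical, and C++11 default member initializers stand in for the Args ctor mentioned in the text.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

class Widget {
public:
    // Every field has a sane default, so callers override only what they need.
    struct Args {
        int width = 640;
        int height = 480;
        std::string title = "untitled";
    };

    // The ctor "blesses" the Args: it keeps a private copy and rejects
    // inconsistent values before any Widget object can exist.
    explicit Widget(Args args) : args_(std::move(args)) {
        if (args_.width <= 0 || args_.height <= 0)
            throw std::invalid_argument("Widget: dimensions must be positive");
    }

    int area() const { return args_.width * args_.height; }
    const std::string& title() const { return args_.title; }

private:
    Args args_;  // not reachable from outside, so it stays consistent
};

// Usage: change one late field without repeating the earlier defaults.
inline Widget makeDefaultButTall() {
    Widget::Args args;   // all defaults
    args.height = 1024;  // override only this field
    Widget w(args);
    assert(w.area() == 640 * 1024);
    return w;
}
```

Because the check runs in the ctor, a failed check throws before the object is constructed, so no half-initialized Widget is ever observable.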

In D, this approach isn't quite as nice, because D structs don't have inheritance, so you can't simply pass Args from derived class to base class. You'd have to explicitly do something like:

	class BaseClass {
	public:
		struct Args { ...  }
		this(Args args) { ... }
	}

	class MyClass : BaseClass {
	public:
		struct Args {
			BaseClass.Args base;	// <-- explicit inclusion of BaseClass.Args
			...
		}
		this(Args args) {
			super(args.base);	// <-- more verbose than just super(args);
			...
		}
	}

Initializing the args also isn't as nice, since user code will have to know exactly which fields are in .base and which aren't. You can't just write, like in C++:

	// C++
	MyClass::Args args;
	args.basefield1 = 123;
	args.field2 = 321;

you'd have to write, in D:

	// D
	MyClass.Args args;
	args.base.basefield1 = 123;
	args.field2 = 321;

which isn't as nice in terms of encapsulation, since ideally user code shouldn't need to care about the exact boundaries between base class and derived class.

I haven't really thought about how this might be made nicer in D, though.

T

-- 
I am Ohm of Borg. Resistance is voltage over current.

July 16, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by Meta
in reply to H. S. Teoh

On Monday, 15 July 2013 at 22:29:14 UTC, H. S. Teoh wrote:
> I consider myself to be a "systematic" programmer (according to the
> definition in the paper); I can work equally well with ctors with
> arguments vs. create-set-call objects. But I find that mandatory ctors
> with arguments are a pain to work with, *both* to write and to use.

I also find constructors with multiple arguments a pain to use. They get difficult to maintain as your project grows. One of my pet projects has a very shallow class hierarchy, but the constructors of each object down the tree have many arguments, with descendants adding on even more. It gets to be a real headache when you have more than 3 constructors per class to deal with base class overloads, multiple arguments, etc.

> On the usability side, there's the mental workload of having to remember
> which order the arguments appear in (or look it up in the IDE, or
> whatever -- the point is that I can't just type the ctor call straight
> from my head). Then there's the problem of needing to create objects
> required by the ctor before you can call the ctor. In some cases, this
> can be inconvenient -- I always have to remember to setup and create
> other objects before I can create this one, because its ctor requires
> said objects as arguments. Then there's the lack of flexibility: no
> matter what you do, it seems that anything that requires more than a
> single ctor argument inevitably becomes either (1) too complex,
> requiring too many arguments, and therefore very difficult to use, or
> (2) too simplistic, and therefore unable to do some things that I may
> want to do (e.g. some fields are default-initialized with no way to
> specify the initial values of the fields, 'cos otherwise the ctor would
> have too many arguments). No matter what you do, it seems almost
> impossible to come up with an ideal ctor except in trivial cases where
> it requires only 1 argument or is a default ctor.

Having to create other objects to pass to a constructor is particularly painful. You'd better pray that they have trivial constructors, or else things can get hairy really fast. Multiple nested constructors can also create a large amount of code bloat. Once the constructor grows large enough, I generally put each argument on its own line to ensure that it's clear what I'm calling it with. This has the unfortunate side effect of making the call span multiple lines. In my opinion, a constructor requiring more than 10 lines is an unsightly abomination.

> On the writability side, one of my pet peeves is base class ctors that
> require multiple arguments. Every level of inheritance inevitably adds
> more arguments each time, and by the time you're 5-6 levels down the
> class hierarchy, your ctor calls just have an unmanageable number of
> parameters. Not to mention the violation of DRY by requiring much
> redundant typing just to pass arguments from the inherited class' ctor
> up the class hierarchy. Tons of bugs to be had everywhere, given the
> amount of repeated typing needed.
>
> In the simplest cases, of course, these aren't big issues, but this kind
> of ctor design is clearly not scalable.
>
> OTOH, the create-set-call pattern isn't panacea either. One of the
> biggest problems with this pattern is that you can't guarantee your
> objects are in a consistent state at all times. This is very bad,
> because all your methods will have to check if some value has been set
> yet, before it uses it. This adds a lot of complexity that could've been
> avoided had everything been set at ctor-time. This also makes class
> invariants needlessly complex. Moreover, I've seen many classes in this
> category exhibit undefined behaviour if you call a value-setting method
> after you start using the object. Too many classes falsely assume that
> you will always call set methods and then "use" methods in that order.
> If you call a set method after calling a "use" method, you're quite
> likely to run into bugs in the class, e.g. part of the object's state
> doesn't reflect the new value you set, because the "use" methods were
> written with the assumption that when they were called the first time,
> the values you set earlier won't change thereafter.

I've found that a good way to keep constructors manageable is to use the builder pattern. Create a builder object that has its fields set by the programmer, which is then passed to the 'real' object for construction. You can provide default arguments, optional arguments, etc. Combine this with a fluent interface and I think it looks a lot better. Of course, this has the disadvantage of requiring a *lot* of boilerplate, but I think this could be okay in D, as a builder class is exactly the kind of thing that can be automatically generated.
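A builder with chained setters might look roughly like this in C++. It is only a sketch: Connection, its Builder and the option names host, port and useTls are hypothetical, picked to show defaults, optional arguments and method chaining.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Connection {
public:
    class Builder {
    public:
        // Each setter returns *this so that calls can be chained.
        Builder& host(std::string h) { host_ = std::move(h); return *this; }
        Builder& port(int p)         { port_ = p; return *this; }
        Builder& useTls(bool t)      { tls_ = t; return *this; }
        Connection build() const;    // defined below, once Connection is complete
    private:
        friend class Connection;     // lets the Connection ctor read the fields
        std::string host_ = "localhost";  // defaults for optional settings
        int port_ = 80;
        bool tls_ = false;
    };

    const std::string& host() const { return host_; }
    int port() const { return port_; }
    bool tls() const { return tls_; }

private:
    // Only the builder can construct a Connection.
    explicit Connection(const Builder& b)
        : host_(b.host_), port_(b.port_), tls_(b.tls_) {}
    std::string host_;
    int port_;
    bool tls_;
};

inline Connection Connection::Builder::build() const {
    return Connection(*this);
}

// Usage: name only the settings that differ from the defaults.
inline Connection makeExample() {
    Connection c = Connection::Builder().host("example.org").useTls(true).build();
    assert(c.port() == 80);  // untouched setting keeps its default
    return c;
}
```

The boilerplate cost is visible even at this size: three fields are declared twice and copied once by hand.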

> I've always found Perl's approach a more balanced way to tackle this
> problem (even though Perl's OO system as a whole suffers from other,
> shall we say, idiosyncrasies). In Perl, objects start out as arbitrary
> key-value pairs, and nothing differentiates them from a regular AA until
> you call the 'bless' built-in function on them, at which point they
> become "officially" a member of some particular class. This neatly
> sidesteps the whole ctor mess: you can initialize the initial AA with
> whatever values you want, in whatever order you want. When you finally
> "kicked it into shape", as the cited paper puts it, you "promote" that
> set of key-value pairs into an "official" member of the class, and
> thereafter, you can't simply modify fields anymore except through class
> methods. This means you now have the possibility of enforcing invariants
> on the object without crippling the flexibility of constructing it.
> (Well, OK, in Perl, this last bit isn't necessarily true, but in an
> ideal implementation of this initialize-bless-use approach, the object's
> fields would become non-public after being blessed and can only be
> updated by "official" object methods.)
>
> In the spirit of this approach, I've written some C++ code in the past
> that looked something like this:
>
> 	class BaseClass {
> 	public:
> 		// Encapsulate ctor arguments
> 		struct Args {
> 			int baseparm1, baseparm2;
> 		};
> 		BaseClass(Args args) {
> 			// initialize object based on fields in
> 			// BaseClass::Args.
> 		}
> 	};
>
> 	class MyClass : public BaseClass {
> 	public:
> 		// Encapsulate ctor arguments
> 		struct Args : BaseClass::Args {
> 			int parm1, parm2;
> 		};
>
> 		MyClass(Args args) : BaseClass(args) {
> 			// initialize object based on fields in args
> 		}
> 	};
>
> Basically, the Args structs let the user set up whatever values they
> want to, in whatever order they wish, then they are "blessed" into real
> class instances by the ctor. Encapsulating ctor arguments in these
> structs alleviates the problem of proliferating ctor arguments as the
> class hierarchy grows: each derived class simply hands off the Args
> struct (which is itself in a hierarchy that parallels that of the
> classes) to the base class ctor. All ctors in the class hierarchy needs
> only a single (polymorphic) argument.
>
> This approach also localizes the changes required when you modify base
> class arguments -- in the old way of having multiple ctor arguments,
> adding or changing arguments to the base class ctor requires you to
> update every single derived class ctor accordingly -- very bad. But
> here, adding a new field to BaseClass::Args requires zero changes to all
> derived classes, which is a Good Thing(tm).
>
> In some cases, if the class in relatively simple, the private members of
> the class can simply be themselves an instance of the Args struct, so
> the ctor could be nothing more than just:
>
> 	MyClass(Args args) : BaseClass(args), myArgs(args) {}
>
> which gets rid of that silly baroque dance of naming ctor arguments as
> _a, _b, _c, then writing in the ctor body a=_a, b=_b, c=_c (which can be
> rather error prone if you mistype a _ somewhere or forget to assign one
> of the members). Since the private copy of Args is not accessible from
> outside, class methods can use the values freely without having to worry
> about inconsistent states -- the ctor can check class invariants before
> creating the class object, ensuring that the internal copy of Args is in
> a consistent state.
>
> The Args structs themselves, of course, can have ctors that setup sane
> default values for each field, so that lazy users can simply call:
>
> 	MyClass *obj = new MyClass(MyClass::Args());
>
> and get a working, consistent class object with default settings. This
> way of setting default values also lets the user only change fields that
> they don't want to use default values for, rather than be constricted by
> the order of ctor default arguments: if you're unlucky enough to need a
> non-default value in a later parameter, you're forced to repeat the
> default values for everything that comes before it.
>
> In D, this approach isn't quite as nice, because D structs don't have
> inheritance, so you can't simply pass Args from derived class to base
> class. You'd have to explicitly do something like:
>
> 	class BaseClass {
> 	public:
> 		struct Args { ...  }
> 		this(Args args) { ... }
> 	}
>
> 	class MyClass {
> 	public:
> 		struct Args {
> 			BaseClass.Args base;	// <-- explicit inclusion of BaseClass.Args
> 			...
> 		}
> 		this(Args args) {
> 			super(args.base);	// <-- more verbose than just super(args);
> 			...
> 		}
> 	}
>
> Initializing the args also isn't as nice, since user code will have to
> know exactly which fields are in .base and which aren't. You can't just
> write, like in C++:
>
> 	// C++
> 	MyClass::Args args;
> 	args.basefield1 = 123;
> 	args.field2 = 321;
>
> you'd have to write, in D:
>
> 	// D
> 	MyClass.Args args;
> 	args.base.basefield1 = 123;
> 	args.field2 = 321;
>
> which isn't as nice in terms of encapsulation, since ideally user code
> shouldn't need to care about the exact boundaries between base class and
> derived class.
>
> I haven't really thought about how this might be made nicer in D,
> though.
>
>
> T

See above, this is basically the builder pattern. It's a neat trick, giving your args objects a class hierarchy of their own. I think that one drawback of that, however, is that now you have to maintain *two* class hierarchies. Have you found this to be a problem in practice?

As an aside, you could probably simulate the inheritance of the args objects in D either with alias this or even opDispatch. Still, this means that you need to nest the structs within each other, and this could get silly after 2-3 "generations" of args objects.

July 16, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by Jacob Carlborg
in reply to H. S. Teoh

On 2013-07-16 00:27, H. S. Teoh wrote:

> In the spirit of this approach, I've written some C++ code in the past
> that looked something like this:
>
> 	class BaseClass {
> 	public:
> 		// Encapsulate ctor arguments
> 		struct Args {
> 			int baseparm1, baseparm2;
> 		};
> 		BaseClass(Args args) {
> 			// initialize object based on fields in
> 			// BaseClass::Args.
> 		}
> 	};
>
> 	class MyClass : public BaseClass {
> 	public:
> 		// Encapsulate ctor arguments
> 		struct Args : BaseClass::Args {
> 			int parm1, parm2;
> 		};
>
> 		MyClass(Args args) : BaseClass(args) {
> 			// initialize object based on fields in args
> 		}
> 	};
>
> Basically, the Args structs let the user set up whatever values they
> want to, in whatever order they wish, then they are "blessed" into real
> class instances by the ctor. Encapsulating ctor arguments in these
> structs alleviates the problem of proliferating ctor arguments as the
> class hierarchy grows: each derived class simply hands off the Args
> struct (which is itself in a hierarchy that parallels that of the
> classes) to the base class ctor. All ctors in the class hierarchy needs
> only a single (polymorphic) argument.

That's actually quite clever.

> In D, this approach isn't quite as nice, because D structs don't have
> inheritance, so you can't simply pass Args from derived class to base
> class. You'd have to explicitly do something like:
>
> 	class BaseClass {
> 	public:
> 		struct Args { ...  }
> 		this(Args args) { ... }
> 	}
>
> 	class MyClass {
> 	public:
> 		struct Args {
> 			BaseClass.Args base;	// <-- explicit inclusion of BaseClass.Args
> 			...
> 		}
> 		this(Args args) {
> 			super(args.base);	// <-- more verbose than just super(args);
> 			...
> 		}
> 	}
>
> Initializing the args also isn't as nice, since user code will have to
> know exactly which fields are in .base and which aren't. You can't just
> write, like in C++:
>
> 	// C++
> 	MyClass::Args args;
> 	args.basefield1 = 123;
> 	args.field2 = 321;
>
> you'd have to write, in D:
>
> 	// D
> 	MyClass.Args args;
> 	args.base.basefield1 = 123;
> 	args.field2 = 321;
>
> which isn't as nice in terms of encapsulation, since ideally user code
> shouldn't need to care about the exact boundaries between base class and
> derived class.
>
> I haven't really thought about how this might be made nicer in D,
> though.

On the other hand D supports the following syntax:

MyClass.Args args = { field1: 1, field2: 2 };

Unfortunately that syntax doesn't work for function calls.

-- 
/Jacob Carlborg

July 16, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by H. S. Teoh
in reply to Meta

On Tue, Jul 16, 2013 at 03:54:27AM +0200, Meta wrote:
> On Monday, 15 July 2013 at 22:29:14 UTC, H. S. Teoh wrote:
> >I consider myself to be a "systematic" programmer (according to the definition in the paper); I can work equally well with ctors with arguments vs. create-set-call objects. But I find that mandatory ctors with arguments are a pain to work with, *both* to write and to use.
> 
> I also find constructors with multiple arguments a pain to use. They get difficult to maintain as your project grows. One of my pet projects has a very shallow class hierarchy, but the constructors of each object down the tree have many arguments, with descendants adding on even more. It gets to be a real headache when you have more than 3 constructors per class to deal with base class overloads, multiple arguments, etc.

Yeah, when every level of the hierarchy introduces 2-3 new overloads of the ctor, you get an exponential explosion of derived class ctors if you want to account for all possibilities. Most of the time, you just end up oversimplifying 'cos anything else is simply unmanageable.

[...]
> Having to create other objects to pass to a constructor is particularly painful. You'd better pray that they have trivial constructors, or else things can get hairy really fast. Multiple nested constructors can also create a large amount of code bloat. Once the constructor grows large enough, I generally put each argument on its own line to ensure that it's clear what I'm calling it with. This has the unfortunate side effect of making the call span multiple lines. In my opinion, a constructor requiring more than 10 lines is an unsightly abomination.

I usually bail out way before then. :) A 10-line ctor call is just unpalatable.

[...]
> I've found that a good way to keep constructors manageable is to use the builder pattern. Create a builder object that has its fields set by the programmer, which is then passed to the 'real' object for construction. You can provide default arguments, optional arguments, etc. Combine this with a fluent interface and I think it looks a lot better. Of course, this has the disadvantage of requiring a *lot* of boilerplate, but I think this could be okay in D, as a builder class is exactly the kind of thing that can be automatically generated.

In my C++ version of this, you could even just reuse the builder object directly, since it's just a struct containing ctor arguments. But yeah, there's some amount of boilerplate necessary.

[...]
> >In the spirit of this approach, I've written some C++ code in the past that looked something like this:
> >
> >	class BaseClass {
> >	public:
> >		// Encapsulate ctor arguments
> >		struct Args {
> >			int baseparm1, baseparm2;
> >		};
> >		BaseClass(Args args) {
> >			// initialize object based on fields in
> >			// BaseClass::Args.
> >		}
> >	};
> >
> >	class MyClass : public BaseClass {
> >	public:
> >		// Encapsulate ctor arguments
> >		struct Args : BaseClass::Args {
> >			int parm1, parm2;
> >		};
> >
> >		MyClass(Args args) : BaseClass(args) {
> >			// initialize object based on fields in args
> >		}
> >	};
[...]
> See above, this is basically the builder pattern. It's a neat trick, giving your args objects a class hierarchy of their own. I think that one drawback of that, however, is that now you have to maintain *two* class hierarchies. Have you found this to be a problem in practice?

Well, there *is* a certain amount of boilerplate, to be sure, so it isn't a perfect solution. But nesting the structs inside the class they correspond with helps to prevent mismatches between the two hierarchies. It also allows reusing the name "Args" so that you don't have to invent a whole new set of names just for these builders. Minimizing these differences makes it less likely to make a mistake and inherit Args from the wrong base class, for example.

In fact, now that I think of this, in D this could actually work out even better, since you could just write:

	class MyClass : BaseClass {
	public:
		class Args : typeof(super).Args {
			int parm1 = 1;
			int parm2 = 2;
		}

		this(Args args) {
			super(args);
			...
		}
	}

The compile-time introspection allows you to just write "class Args : typeof(super).Args" consistently for all such builders, so you never have to worry about inventing new names or mismatches in the two hierarchies. The "typeof(super).Args" will automatically pick up the correct base class Args to inherit from, even if you shuffle the classes around the hierarchy. Furthermore, since the declaration is exactly identical across the board (except for the actual fields), you could just factor this into a mixin and thereby minimize the boilerplate.

The only major disadvantage of this D version is that you can't use structs: the Args objects have to be classes allocated on the GC heap, so you may end up generating lots of GC garbage. If only D structs had inheritance, this would've been a much cleaner solution.

> As an aside, you could probably simulate the inheritance of the args objects in D either with alias this or even opDispatch. Still, this means that you need to nest the structs within each other, and this could get silly after 2-3 "generations" of args objects.

Hmm. This is a good idea! And with a mixin, this may not turn out so bad after all. Maybe start with something like this:

	class BaseClass {
	public:
		struct Args {
			int baseparm1 = 1;
			int baseparm2 = 2;
			...
		}
		this(Args args) { ... }	// needed so super(args) below has a ctor to call
	}

	class MyClass : BaseClass {
	public:
		struct Args {
			typeof(super).Args base;
			alias base this;

			int parm1 = 1;
			int parm2 = 2;
			...
		}
		this(Args args) {
			super(args);	// works 'cos of alias this
		}
	}

	void main() {
		MyClass.Args args;
		args.baseparm1 = 2;	// works 'cos of alias this
		args.parm1 = 3;
		auto obj = new MyClass(args);
	}

Using alias this, we have the nice effect that user code no longer needs to refer to the .base member of the structs, and indeed, doesn't need to know about it. So this is effectively like struct inheritance... heh, cool. Just discovered a new trick in D: struct inheritance using alias this. :)

The boilerplate can be put into a mixin, say something like this:

	mixin template BuilderArgs(string fields) {
		struct Args {
			typeof(super).Args base;
			alias base this;
			mixin(fields);
		}
	};

	class MyClass : BaseClass {
	public:
		// Hmm, doesn't look too bad!
		mixin BuilderArgs!(q{
			int parm1 = 1;
			int parm2 = 2;
		});
		this(Args args) {
			super(args);
			...
		}
	}

	class AnotherClass : BaseClass {
	public:
		// N.B. Looks exactly the same as MyClass.Args except
		// for the fields! The template automatically picks up
		// the right base class Args to "inherit" from.
		mixin BuilderArgs!(q{
			string anotherparm1 = "abc";
			string anotherparm2 = "def";
		});
		this(Args args) {
			super(args);
			...
		}
	}

Not bad at all!  Though, I haven't actually tested any of this code, so I've no idea if it will actually work yet. But it certainly looks promising! I'll give it a spin tomorrow morning (way past my bedtime now).

T

-- 
Meat: euphemism for dead animal. -- Flora

July 16, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by deadalnix
in reply to H. S. Teoh

My policy is to require the bare minimum to construct a valid object, in order to avoid initialization hell.

Not knowing what to initialize, or when, is really painful as well. It also introduces sequential coupling, and wrongly initialized objects tend to explode far away from their construction point.

What goes in this category? Any state that can't have a default value that makes sense, as well as any state that is expensive to initialize.
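As a rough C++ illustration of this policy, the sketch below requires only the one piece of state that has no sensible default and leaves the rest optional. Logger, its path argument and the verbose flag are hypothetical names, and the expensive-to-initialize case is not shown.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Logger {
public:
    // The output path has no sensible default, so the ctor demands it.
    explicit Logger(std::string path) : path_(std::move(path)) {
        assert(!path_.empty());
    }

    // State with a sensible default stays optional and can be set later
    // without ever leaving the object invalid.
    void setVerbose(bool v) { verbose_ = v; }

    const std::string& path() const { return path_; }
    bool verbose() const { return verbose_; }

private:
    std::string path_;
    bool verbose_ = false;
};
```

With this split there is no call order to get wrong: a Logger is usable as soon as its ctor returns.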

July 16, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by Dicebot
in reply to H. S. Teoh

On Tuesday, 16 July 2013 at 08:19:10 UTC, H. S. Teoh wrote:
> Just discovered a new trick in D: struct inheritance using alias
> this. :)

Wasn't this stated in TDPL as one of the primary design rationales behind "alias this"? :)

July 16, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by Regan Heath
in reply to Meta

On Mon, 15 Jul 2013 20:06:38 +0100, Meta <jared771@gmail.com> wrote:

> I saw an interesting post on Hacker News about constructors in OO languages. Apparently they are a real stumbling block for some programmers, which was quite a surprise to me. I think this might be relevant to a discussion about named parameters and whether we should ditch constructors for another kind of construct.
>
> Link to the newsgroup post, the link to the paper is near the top:
> http://erlang.org/pipermail/erlang-questions/2012-March/065519.html

First thought: constructors with positional arguments are no different from methods or functions with positional arguments when it comes to remembering the arguments.  The difficulties are the same in both cases: you need to remember them, look them up, or get help from IntelliSense.

I think the point about constructed objects being in valid states is the important one.  If the object requires N arguments which cannot be sensibly defaulted, then IMO they /have/ to be specified at construction, and should not be delayed as in the create-set-call style mentioned.

Granted, a create-set-call style object could throw detailed, useful messages when used before initialisation, but that's a runtime check, so IMO not a great solution to the issue.

I also find compelling the point that a create-set-call style object with N required set calls could be initialised in N! ways, and that each of these orderings has effectively the same semantic meaning, so it becomes a lot harder to see what is really happening.  Add to that the possibility that someone could interleave the initialisation of another object into the first and... well... shudder.
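
To put a number on that: with only three required setters there are already 3! = 6 equally valid orderings. A quick sketch that counts them (the setter names are invented for illustration):

```d
import std.algorithm.sorting : nextPermutation;
import std.stdio : writeln;

// Counts every ordering of the given calls. The input must start out sorted
// so that nextPermutation visits all permutations before returning false.
size_t countOrderings(string[] calls) {
    size_t n = 0;
    do {
        ++n;
    } while (nextPermutation(calls));
    return n;
}

void main() {
    auto calls = ["setAge", "setEmail", "setName"];   // already sorted
    assert(countOrderings(calls.dup) == 6);           // 3! = 6
    writeln(countOrderings(calls.dup), " orderings");
}
```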

So, given the desire to have objects constructed in a valid state, and given the restriction that this may require N arguments which cannot be defaulted, how do you alleviate the problem of having to remember the parameters required and the ordering of those parameters?

Named parameters only help up to a point.  As with ordered parameters, you still need to remember which parameters are required; all that has changed is that instead of remembering their order you have to remember their names.  So, IMO this doesn't really solve the problem at all.

A lot can be done with sufficiently clever intellisense in either case (ordered/named parameters), but is there anything which can be done without it using just a text editor and compiler?

Or, perhaps another way to ask a similar question: can the compiler statically verify that a create-set-call style object has been initialised, or rather that an attempt has at least been made to initialise all the required parts?

We have class invariants; these define the things which must be initialised to reach a valid state.  If we had compiler-recognisable properties as well, then we could have an initialise construct like this:

class Foo
{
  string name;
  int age;

  invariant
  {
    assert(name != null);
    assert(age > 0);
  }

  property string Name...
  property int Age...
}

void main()
{
  Foo f = new Foo() {
    Name = "test",    // calls property Name setter
    Age = 12          // calls property Age setter
  };
}

The compiler could statically verify that the variables tested in the invariant (name, age) were set (by setter properties) inside the initialise construct {} following the new Foo().

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

July 16, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by Craig Dillabaugh
in reply to Regan Heath

On Tuesday, 16 July 2013 at 09:47:35 UTC, Regan Heath wrote:

clip

>
> We have class invariants; these define the things which must be initialised to reach a valid state.  If we had compiler-recognisable properties as well, then we could have an initialise construct like this:
>
> class Foo
> {
>   string name;
>   int age;
>
>   invariant
>   {
>     assert(name != null);
>     assert(age > 0);
>   }
>
>   property string Name...
>   property int Age...
> }
>
> void main()
> {
>   Foo f = new Foo() {
>     Name = "test",    // calls property Name setter
>     Age = 12          // calls property Age setter
>   };
> }
>
> The compiler could statically verify that the variables tested in the invariant (name, age) were set (by setter properties) inside the initialise construct {} following the new Foo().
>
> R

How do you envision this working where Name or Age must be set to
a value not known at compile time?

July 16, 2013

Re: Interesting Research Paper on Constructors in OO Languages

Posted by Dicebot
in reply to Craig Dillabaugh

On Tuesday, 16 July 2013 at 13:35:00 UTC, Craig Dillabaugh wrote:
> How do you envision this working where Name or Age must be set to
> a value not known at compile time?

Contracts are run-time entities (omitted in release builds, AFAIR).
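
A small sketch of what that means in practice, reusing the Foo class from the quoted post but with an ordinary constructor, since the proposed initialise construct does not exist. It assumes a build without -release, where invariants are compiled in; the command-line handling is only there to obtain a value that is unknown at compile time.

```d
import core.exception : AssertError;
import std.conv : to;
import std.exception : assertThrown;

class Foo {
    private string name;
    private int age;

    invariant() {
        assert(name.length > 0);
        assert(age > 0);
    }

    this(string name, int age) {
        this.name = name;
        this.age = age;
    }   // the invariant is checked here, when the constructor completes
}

void main(string[] args) {
    // Value known only at run time: taken from the command line if given.
    int age = args.length > 1 ? args[1].to!int : 12;
    auto f = new Foo("test", age);

    // An invalid value is caught only when the program runs,
    // and only in a build that keeps contracts enabled.
    assertThrown!AssertError(new Foo("test", 0));
}
```

Nothing here is verified statically: the compiler accepts both constructor calls, and the second one fails only when the invariant runs.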

