Thread overview | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
April 13, 2003 Commentary on D (or "Using Sather as a Model") | ||||
---|---|---|---|---|
| ||||
To the D designers, I have been looking at D for a while now. As I have been considering a similar venture for almost eight years now (ever since I started working with C++), I thought I might make some commentary. Surely the group receives a lot of similar language suggestions, so for that reason I will try be as clear in my points as possible, including ramifications. Always in my head I have had the concept of a theoretical language called Dali (Salvatore) which has the majority of its goals common with D. Over time, investigating such languages as Java, Prolog, Haskell, and Eiffel, I have found many features which are desirable. (Desirability being a function of how useful these features are to me personally, and how I see these features becoming mainstream in toolkits like STL, and boost.) My initial complaint with D is one that I have seen voiced before; I won't dwell on it: D doesn't present anything new. I group it somewhere between Java/C# and C++ in the space of languages. I have a feeling it will remain an isolated niche as long as it remains in this space. However I still have a dream of Dali becoming a reality. (The difference is that D has an implementation seems to be actively fishing for ideas.) I will praise the addition of contracts and versioning to the language as these are excellent paradigms, although I would extend the latter. So, on to my ideas. None of them are really novel, being inspired by other languages, and I will try to present them in some sort of order. AN EXAMPLE: ITERATORS / CURSORS There is explicit mention of this on the "Future" listing on the D home page, and I think this is my most critical point. In C++ (and apparently in D as well) a complex container which provides an iterator interface will do so via a cursor object (iterator) which has an explicit parallel implementation of the class which it is intended to abstract iteration for. Languages like Lisp, Ruby, and Sather provide better encapsulation of looping control structures. This is implemented via continuations (a feature, also, of stackless python by the way). To provide an example, I will take a binary tree (in Dali). class BinaryTree { private: BinaryTree left, right; int data; public: // a basic iterator over all elements in the tree int elt!() { if (void(self)) quit; // if (!this) terminate iteration yield data; // provide current data to the client loop yield left.elt!(); // pursue left branch loop yield right.elt!(); // pursue right branch } // ... other implementation details }; A client of this code would use it as follows: aMethod() { BinaryTree tree; // .. some operations to populate the tree loop // the "quit" in the elt!() iterator terminates the loop stdout << tree.elt!() << "\n"; } Iterators function by maintaining their execution state, and on subsequent calls resume where they left off; this is in contrast to 'return' which marks the end of the routine's life. This allows such statements as: sum i = 0; loop sum += range!(1, 10); LEARNING FROM SATHER Some of the astute may notice a similarity to Sather in my example. This is because I quickly found that Dali is essentially Sather with a C/C++ inspired syntax. This is an important thing to note. Languages like Python largely became popular due to their C-like syntax. Eiffel and Sather, are restrained due to their PASCAL-ish (academic) syntax. To drive this point home, I will provide an example class from the Sather tutorial: class POINT is attr x, y:INT; create(xvalue, yvalue:INT):POINT is res:POINT := new; res.x := xvalue; res.y := yvalue; return res; end; end; In Dali: class Point { private: int x, y; public: Point(int xvalue, int yvalue) { Point res = new Point; // or, with type inferencing: "res @= new;" res.x = xvalue; res.y = yvalue; return res; } } This example is relatively simple, but to most the C/C++/C#/D/Java/perl/etc. users out there, the Dali example is much more immediately accessible. This has huge implications on the acceptance of a language as programmers can immediately apply their existing semantic usage without drastic changes in syntax. Languages like Eiffel and Sather will likely forever remain niche languages precisely because their syntax is not mainstream. But they simply provide many language features that C++ developers spend huge amounts of time implementing in a variety of ways. Lets take a quick look at the feature set of Sather and you will see the parallels with goals of D. (These are taken almost verbatim from http://www.gnu.org/software/sather.) - as efficient as C, C++, or Fortran - as elegant as and safer than Eiffel - support higher-order functions and iteration abstraction similar to Common Lisp, Scheme, CLU, or Smalltalk - garbage collection - statically-checked strong typing - multiple inheritance - separate implementation and type inheritance - parameterized classes (generic programming) - dynamic dispatch - exception handling - assertions, preconditions, postconditions, and class invariants - compiles to C and links efficiently with C object files I would add to this list: - semantics for distributed programming (an extension) - semantics for parallel programming (also an extension) - out/inout argument types - lazy evaluation (the 'once' keyword) - basic types are objects That's a lot to take in. Some things such as multiple inheritance are implemented a lot differently than one might expect. (There is a big difference between type inheritance and "code inclusion" in the Sather world.) Every feature that could fit under the heading of "functional language features" are things which C++ programmers are striving hard to achieve. Indeed, it is clear that functional programming techniques are becoming much more mainstream. Without support for this in the language you will find the same effort being spent on D. IN CLOSING Now, I find myself asking whether it is better to provide further examples of Sather with C syntax or to refer the reader to Sather documentation. Certainly the Sather documentation provides many examples which I consider of use, though I don't want waste my efforts. So I will try to wind down here, and see if there are any receptive parties to these thoughts. I feel it is notable that I could find no reference to Sather or Eiffel, nor to other, more mainstream languages such as Lisp. I would expect that a new language effort would consciously evaluate such programming models. As I see it, the only design consideration has been to make a better C++ (just like Java before it). The "D vs. Other Languages" page seems to reflect this. (In fact, some C++ problems remain; e.g., templates will still be used as a meta-programming model within D, though this requires exquisite care to use.) I think it would be an interesting exercise for the language implementors to take a close look at languages like Sather and perhaps revise the language comparison web page. D already has a following and may not want to cover such ground. To reiterate a bit, I mention most of this as part of a selfish desire to see the language I want to use. ABOUT DALI I have had a simple Dali->Sather compiler working for myself, but Sather has changed ownership to GNU, and I am still waiting to see what its future will be under new maintainer-ship. Due to its dubious status, and the fact that Dali is just a pet project, I wouldn't expect to see anything come from this. Although, if there are people who would be interested in starting such a project I would certainly like to participate. There are other goals of Dali which I have not had time to realize. I would like to see a replaceable grammar system. My inspiration for this is Haskell, and the ultimate desire is to be able to utilize Prolog from within the context of a C++ variant. Another goal is to realize the applicability of Aspect-Oriented Programming (AOP) without dependence upon unnatural semantics like AspectJ. These should, in my mind, be language features expressive in a manner less arcane than policy classes (Andrei Alexandrescu). ABOUT MYSELF You've not heard of me before, so I'll give a little idea of who it is making these outrageous statements. I'm a C++ programmer who got his professional start working at IBM on distributed management systems (at Tivoli); i.e., I wrote a lot of C++ in a CORBA environment. Since then I have moved on to writing mediation systems for the telco environment, a context where distributed programming, code reliability, and transactional throughput are key. As a hobby I have studied theoretical linguistics, and this influences much of my inclinations. Regards, --thomas |
April 14, 2003 Re: Commentary on D (or "Using Sather as a Model") | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thomas D. Marsh | Hi, Thomas.
It sounds like your intuition lines up well with mine. I've posted on this group in the past supporting Sather-like iterators and multiple-inheritance.
The main point of this reply is to second your call for a strong look at Sather.
That said, I'll re-state my position on specifics of iterators and code-inclusion.
As for multiple inheritance, I love the Sather code inclusion model. It is more powerful than multiple inheritance in many very important situations. In particular, reusing code from a module that has multiple recursive classes is practically impossible in C++ derived languages, including D. Sather allows me to exactly specify the mapping between classes, methods, and fields. That mapping in Sather is in one place, unlike Ocaml, where it gets spread all over the place. I can inherit from the same complex module multiple times, and resolve the mappings easily. I can even eliminate unwanted code. Also, this feature adds exactly 0 overhead for using it (no VFT pointers, for example). That's a good fit for D. Also, it's very easy to implement in a compiler. That's also a good fit for D. Personally, I think Sather's include construct is a great invention, but the world hasn't had quite enough time learn about it.
For anyone reading this that has no idea what power Sather's include construct can bring you, I recomend looking into it. It has amazing
potential for enabling code-reuse. As as a simple example, try inheriting graph functionality from a graph module using the include construct. As an even simpler example, try inheriting linked list functionality from a module of two classes. If you can do that, now try something harder, like reusing a simulating annealing algorithm on new data structures. Sather can do that. The potential for building a reusable code base of common algorithms is much more real in Sather than it ever was in C++.
As for powerful iterators, I think for D, the continuation construct (or multiple threads) is not efficient enough. However, you can still do Sather-like iterators very efficiently (0 overhead in most cases), if you limit iterators to one per loop. I recomended this some time ago in a thread titled something like "cool iterators".
There actually has been a ton of discussion about language features on this group, including many features from all those languages you mentioned. There is also a lot of disagreement, so I don't think there has been a common desire to include features of those languages.
Bill
Thomas D. Marsh wrote:
> To the D designers,
>
> I have been looking at D for a while now. As I have been considering a similar venture for almost eight years now (ever since I started working with C++), I thought I might make some commentary. Surely the group receives a lot of similar language suggestions, so for that reason I will try be as clear in my points as possible, including ramifications.
>
> Always in my head I have had the concept of a theoretical language called Dali (Salvatore) which has the majority of its goals common with D. Over time, investigating such languages as Java, Prolog, Haskell, and Eiffel, I have found many features which are desirable. (Desirability being a function of how useful these features are to me personally, and how I see these features becoming mainstream in toolkits like STL, and boost.)
>
> My initial complaint with D is one that I have seen voiced before; I won't dwell on it: D doesn't present anything new. I group it somewhere between Java/C# and C++ in the space of languages. I have a feeling it will remain an isolated niche as long as it remains in this space. However I still have a dream of Dali becoming a reality. (The difference is that D has an implementation seems to be actively fishing for ideas.)
>
> I will praise the addition of contracts and versioning to the language as these are excellent paradigms, although I would extend the latter.
>
> So, on to my ideas. None of them are really novel, being inspired by other languages, and I will try to present them in some sort of order.
>
> AN EXAMPLE: ITERATORS / CURSORS
>
> There is explicit mention of this on the "Future" listing on the D home page, and I think this is my most critical point. In C++ (and apparently in D as well) a complex container which provides an iterator interface will do so via a cursor object (iterator) which has an explicit parallel implementation of the class which it is intended to abstract iteration for. Languages like Lisp, Ruby, and Sather provide better encapsulation of looping control structures. This is implemented via continuations (a feature, also, of stackless python by the way).
>
> To provide an example, I will take a binary tree (in Dali).
>
> class BinaryTree
> {
> private:
> BinaryTree left, right;
> int data;
> public:
> // a basic iterator over all elements in the tree
> int elt!()
> {
> if (void(self)) quit; // if (!this) terminate iteration
> yield data; // provide current data to the client
> loop
> yield left.elt!(); // pursue left branch
> loop
> yield right.elt!(); // pursue right branch
> }
> // ... other implementation details
> };
>
>
> A client of this code would use it as follows:
>
> aMethod()
> {
> BinaryTree tree;
>
> // .. some operations to populate the tree
>
> loop
> // the "quit" in the elt!() iterator terminates the loop
> stdout << tree.elt!() << "\n";
> }
>
> Iterators function by maintaining their execution state, and on subsequent calls resume where they left off; this is in contrast to 'return' which marks the end of the routine's life. This allows such statements as:
>
> sum i = 0;
> loop sum += range!(1, 10);
>
>
> LEARNING FROM SATHER
>
> Some of the astute may notice a similarity to Sather in my example. This is because I quickly found that Dali is essentially Sather with a C/C++ inspired syntax. This is an important thing to note. Languages like Python largely became popular due to their C-like syntax. Eiffel and Sather, are restrained due to their PASCAL-ish (academic) syntax. To drive this point home, I will provide an example class from the Sather tutorial:
>
> class POINT is
> attr x, y:INT;
>
> create(xvalue, yvalue:INT):POINT is
> res:POINT := new;
> res.x := xvalue;
> res.y := yvalue;
> return res;
> end;
> end;
>
> In Dali:
>
> class Point
> {
> private:
> int x, y;
> public:
> Point(int xvalue, int yvalue)
> {
> Point res = new Point;
> // or, with type inferencing: "res @= new;"
> res.x = xvalue;
> res.y = yvalue;
> return res;
> }
> }
>
> This example is relatively simple, but to most the C/C++/C#/D/Java/perl/etc. users out there, the Dali example is much more immediately accessible. This has huge implications on the acceptance of a language as programmers can immediately apply their existing semantic usage without drastic changes in syntax.
>
> Languages like Eiffel and Sather will likely forever remain niche languages precisely because their syntax is not mainstream. But they simply provide many language features that C++ developers spend huge amounts of time implementing in a variety of ways.
>
> Lets take a quick look at the feature set of Sather and you will see the parallels with goals of D. (These are taken almost verbatim from http://www.gnu.org/software/sather.)
>
> - as efficient as C, C++, or Fortran
> - as elegant as and safer than Eiffel
> - support higher-order functions and iteration abstraction similar
> to Common Lisp, Scheme, CLU, or Smalltalk
> - garbage collection
> - statically-checked strong typing
> - multiple inheritance
> - separate implementation and type inheritance
> - parameterized classes (generic programming)
> - dynamic dispatch
> - exception handling
> - assertions, preconditions, postconditions, and class invariants
> - compiles to C and links efficiently with C object files
>
> I would add to this list:
>
> - semantics for distributed programming (an extension)
> - semantics for parallel programming (also an extension)
> - out/inout argument types
> - lazy evaluation (the 'once' keyword)
> - basic types are objects
>
> That's a lot to take in. Some things such as multiple inheritance are implemented a lot differently than one might expect. (There is a big difference between type inheritance and "code inclusion" in the Sather world.)
>
> Every feature that could fit under the heading of "functional language features" are things which C++ programmers are striving hard to achieve. Indeed, it is clear that functional programming techniques are becoming much more mainstream. Without support for this in the language you will find the same effort being spent on D.
>
>
> IN CLOSING
>
> Now, I find myself asking whether it is better to provide further examples of Sather with C syntax or to refer the reader to Sather documentation. Certainly the Sather documentation provides many examples which I consider of use, though I don't want waste my efforts. So I will try to wind down here, and see if there are any receptive parties to these thoughts.
>
> I feel it is notable that I could find no reference to Sather or Eiffel, nor to other, more mainstream languages such as Lisp. I would expect that a new language effort would consciously evaluate such programming models. As I see it, the only design consideration has been to make a better C++ (just like Java before it). The "D vs. Other Languages" page seems to reflect this. (In fact, some C++ problems remain; e.g., templates will still be used as a meta-programming model within D, though this requires exquisite care to use.) I think it would be an interesting exercise for the language implementors to take a close look at languages like Sather and perhaps revise the language comparison web page.
>
> D already has a following and may not want to cover such ground. To reiterate a bit, I mention most of this as part of a selfish desire to see the language I want to use.
>
>
> ABOUT DALI
>
> I have had a simple Dali->Sather compiler working for myself, but Sather has changed ownership to GNU, and I am still waiting to see what its future will be under new maintainer-ship. Due to its dubious status, and the fact that Dali is just a pet project, I wouldn't expect to see anything come from this. Although, if there are people who would be interested in starting such a project I would certainly like to participate.
>
> There are other goals of Dali which I have not had time to realize. I would like to see a replaceable grammar system. My inspiration for this is Haskell, and the ultimate desire is to be able to utilize Prolog from within the context of a C++ variant.
>
> Another goal is to realize the applicability of Aspect-Oriented Programming (AOP) without dependence upon unnatural semantics like AspectJ. These should, in my mind, be language features expressive in a manner less arcane than policy classes (Andrei Alexandrescu).
>
>
> ABOUT MYSELF
>
> You've not heard of me before, so I'll give a little idea of who it is making these outrageous statements. I'm a C++ programmer who got his professional start working at IBM on distributed management systems (at Tivoli); i.e., I wrote a lot of C++ in a CORBA environment. Since then I have moved on to writing mediation systems for the telco environment, a context where distributed programming, code reliability, and transactional throughput are key. As a hobby I have studied theoretical linguistics, and this influences much of my inclinations.
>
>
> Regards,
>
> --thomas
>
>
|
April 15, 2003 Re: Commentary on D (or "Using Sather as a Model") | ||||
---|---|---|---|---|
| ||||
Posted in reply to Bill Cox | Hi Bill, Your previous posts are no longer available on the Digital Mars NNTP server, so thank you for reposting the content. One thing I don't understand is your statement that iterators would be simpler if there is only one allowed per loop. On the contrary, I don't think they are so complicated (implementationally - perhaps they are comprehensionally) that they can not be implemented via standard mechanisms that the compiler would use when emitting control flow structures in assembly. Iterators are actually just a specific application of co-routines, and as such have a long history in computing (Knuth, et al). Keld Helsgaun has written a small portable library (1999) called "COROUTINE". It can be found at the following location: http://www.dat.ruc.dk/~keld/research/COROUTINE/ I suggest taking a look at this now as I will use his terminology from here. Notable is that his library is using setjmp/longjmp, but is not without pitfalls. I will take a look at another approach (used in Sather) further down, but I think this lays a good foundation for now. Lets say that an iterator can be defined in D as follows: iterator int adder() { int i = 0; for(;;) { yield i; i++; } } In the COROUTINE library I might implement this as: class adder : public Coroutine { private: int i; public: int getValue() { Call(this); // invokes the virt Routine() return i; } void Routine() { i = 0; for (;;) { Detach(); i++; } } }; Now we have a simple class which exactly emulates this iterator. To use this in C++: int main() { // initialize the COROUTINE library; we should specify // a stack size to use.. InitSequencing(); // allocate our adder. adder *a = new adder; for (;;) std::cout << adder->getValue() << std::endl; } This will print out an endless stream of digits: 0 1 2 ... Notable is that the adder::Routine method is picking up where it left off due to the setjmp/longjmp magic spawned from the Call(Coroutine*) function. There are several drawbacks to this model. As far as C++ goes, you will always be calling the virtual method which has its overhead. Definitely not the stuff for a low level iterator. Secondly, you need to specify the stack size (or use the default), which leads to wastage on over-estimates and runtime errors on under-estimates. Now that we see this sort of stuff is possible without resorting to threads we can consider a more traditional C implementation. It may be useful to refer to Simon Tatham's article: http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html It is a messy implementation, but it works. It is also closer to how the Sather compiler implements it's iterators. If we look at his implementation model for our adder: int adder() { static int i, state = 0; switch (state) { case 0: // start of function i = 0; state = 1; return i; case 1: i++; return i; } } Okay, that's pretty simple, but as Simon says, those statics are basically a hack. He provides a reentrant version of his macros for manipulating these structures, but what you get to at the end of the day, when you implement it cleanly, is exactly what Sather implements, a frame structure. typedef struct adder_frame_struct { int state; int i; } *adder_frame; void adder(adder_frame frame) { switch (frame->state) { case 0: goto state0; case 1: goto state1; } state0: { frame->i = 0; for (;;) { frame->i++; frame->state = 1; return; state1: ; } } } You can now test this as such: int main() { struct adder_frame_struct f; for (;;) { adder(&f); printf("%d\n", f.i); } } It has the benefits of being non-wasteful, efficient, reentrant, and allows you to use multiple iterators within the same loop context. Why isn't it more popular? Well, basically, without native language support for it, you have to jump through hoops to make these things work. You can see from the explicit states that have to be maintained as well as the plethora of frame structures for different local variables (not to mention those parameters which are evaluated once, or each time through iteration) that things can indeed get quite hairy when coding these by hand. Maybe in C++ this can be done with meta-template expressions; I will look into it (I just thought of it now). But in the meantime, I think when examining the C implementation, there is a clear correlation between the proposed syntax for D and the implementation in the assembly emitter. I would like to close with a reminder of why iterators are so important. Sather users found that their code translated from C++ to Sather immediately saw a performance boost. Much of this can be attributed to the implementation of iterators in the language. C++ iterators (we should call them "cursors" or "riders") must be initialized, and often have to do a lot of work that the class they model already implements. This overhead adds up when you, e.g., have a big project using these cursors throughout the implementation. In addition, the implementation of tree structures and the like have to maintain their own stacks in order to compensate for their meta-statefulness. Iterators are a strange concept to most at first site, but they are an architectural solution that brings great rewards. Yours, --thomas Bill Cox wrote: > Hi, Thomas. > > It sounds like your intuition lines up well with mine. I've posted on this group in the past supporting Sather-like iterators and multiple-inheritance. > > The main point of this reply is to second your call for a strong look at Sather. > > That said, I'll re-state my position on specifics of iterators and code-inclusion. > > As for multiple inheritance, I love the Sather code inclusion model. It is more powerful than multiple inheritance in many very important situations. In particular, reusing code from a module that has multiple recursive classes is practically impossible in C++ derived languages, including D. Sather allows me to exactly specify the mapping between classes, methods, and fields. That mapping in Sather is in one place, unlike Ocaml, where it gets spread all over the place. I can inherit from the same complex module multiple times, and resolve the mappings easily. I can even eliminate unwanted code. Also, this feature adds exactly 0 overhead for using it (no VFT pointers, for example). That's a good fit for D. Also, it's very easy to implement in a compiler. That's also a good fit for D. Personally, I think Sather's include construct is a great invention, but the world hasn't had quite enough time learn about it. > > For anyone reading this that has no idea what power Sather's include construct can bring you, I recomend looking into it. It has amazing potential for enabling code-reuse. As as a simple example, try inheriting graph functionality from a graph module using the include construct. As an even simpler example, try inheriting linked list functionality from a module of two classes. If you can do that, now try something harder, like reusing a simulating annealing algorithm on new data structures. Sather can do that. The potential for building a reusable code base of common algorithms is much more real in Sather than it ever was in C++. > > As for powerful iterators, I think for D, the continuation construct (or multiple threads) is not efficient enough. However, you can still do Sather-like iterators very efficiently (0 overhead in most cases), if you limit iterators to one per loop. I recomended this some time ago in a thread titled something like "cool iterators". > > There actually has been a ton of discussion about language features on this group, including many features from all those languages you mentioned. There is also a lot of disagreement, so I don't think there has been a common desire to include features of those languages. > > Bill -- Many are called, few are chosen. Fewer still get to do the choosing. |
April 16, 2003 Re: Commentary on D (or "Using Sather as a Model") | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thomas D. Marsh | Hi, Thomas.
Thanks for the detailed description of the Sather implementation of iterators. It's very cool. Given the efficiency, I'd support the more general version.
I'd still like to see the compiler optimize away the overhead in simple cases (which are the most common). For example, most simple foreach style loops have a single iterator, and a single body of statements. It is often possible to in-line the iterator function, and replace the yield statement with the body of the loop. Then, there's pretty much no overhead at all.
The more I learn about Sather, the more impressed I've become.
Bill
Thomas D. Marsh wrote:
> Hi Bill,
>
> Your previous posts are no longer available on the Digital Mars NNTP server,
> so thank you for reposting the content. One thing I don't understand is
> your statement that iterators would be simpler if there is only one allowed
> per loop. On the contrary, I don't think they are so complicated
> (implementationally - perhaps they are comprehensionally) that they can not
> be implemented via standard mechanisms that the compiler would use when
> emitting control flow structures in assembly.
>
> Iterators are actually just a specific application of co-routines, and as
> such have a long history in computing (Knuth, et al). Keld Helsgaun has
> written a small portable library (1999) called "COROUTINE". It can be found
> at the following location:
>
> http://www.dat.ruc.dk/~keld/research/COROUTINE/
>
> I suggest taking a look at this now as I will use his terminology from here.
> Notable is that his library is using setjmp/longjmp, but is not without
> pitfalls. I will take a look at another approach (used in Sather) further
> down, but I think this lays a good foundation for now.
>
> Lets say that an iterator can be defined in D as follows:
>
> iterator int adder()
> {
> int i = 0;
> for(;;)
> {
> yield i;
> i++;
> }
> }
>
> In the COROUTINE library I might implement this as:
>
> class adder : public Coroutine
> {
> private:
> int i;
> public:
> int getValue()
> {
> Call(this); // invokes the virt Routine()
> return i;
> }
>
> void Routine()
> {
>
> i = 0;
> for (;;)
> {
> Detach();
> i++;
> }
> }
> };
>
> Now we have a simple class which exactly emulates this iterator. To use this
> in C++:
>
> int main()
> {
> // initialize the COROUTINE library; we should specify
> // a stack size to use..
>
> InitSequencing();
>
> // allocate our adder.
> adder *a = new adder;
>
> for (;;)
> std::cout << adder->getValue() << std::endl;
> }
>
> This will print out an endless stream of digits:
>
> 0
> 1
> 2
> ...
>
> Notable is that the adder::Routine method is picking up where it left off
> due to the setjmp/longjmp magic spawned from the Call(Coroutine*) function.
>
> There are several drawbacks to this model. As far as C++ goes, you will
> always be calling the virtual method which has its overhead. Definitely not
> the stuff for a low level iterator. Secondly, you need to specify the stack
> size (or use the default), which leads to wastage on over-estimates and
> runtime errors on under-estimates.
>
> Now that we see this sort of stuff is possible without resorting to threads
> we can consider a more traditional C implementation. It may be useful to
> refer to Simon Tatham's article:
>
> http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
>
> It is a messy implementation, but it works. It is also closer to how the
> Sather compiler implements it's iterators. If we look at his implementation
> model for our adder:
>
> int adder()
> {
> static int i, state = 0;
> switch (state)
> {
> case 0: // start of function
> i = 0;
> state = 1;
> return i;
> case 1:
> i++;
> return i;
> }
> }
>
> Okay, that's pretty simple, but as Simon says, those statics are basically a
> hack. He provides a reentrant version of his macros for manipulating these
> structures, but what you get to at the end of the day, when you implement
> it cleanly, is exactly what Sather implements, a frame structure.
>
> typedef struct adder_frame_struct
> {
> int state;
> int i;
> } *adder_frame;
>
> void adder(adder_frame frame)
> {
> switch (frame->state)
> {
> case 0: goto state0;
> case 1: goto state1;
> }
>
> state0:
> {
> frame->i = 0;
> for (;;)
> {
> frame->i++;
> frame->state = 1;
> return;
> state1: ;
> }
> }
> }
>
>
> You can now test this as such:
>
> int main()
> {
> struct adder_frame_struct f;
>
> for (;;)
> {
> adder(&f);
> printf("%d\n", f.i);
> }
> }
>
>
> It has the benefits of being non-wasteful, efficient, reentrant, and allows
> you to use multiple iterators within the same loop context. Why isn't it
> more popular? Well, basically, without native language support for it, you
> have to jump through hoops to make these things work. You can see from the
> explicit states that have to be maintained as well as the plethora of frame
> structures for different local variables (not to mention those parameters
> which are evaluated once, or each time through iteration) that things can
> indeed get quite hairy when coding these by hand.
>
> Maybe in C++ this can be done with meta-template expressions; I will look
> into it (I just thought of it now). But in the meantime, I think when
> examining the C implementation, there is a clear correlation between the
> proposed syntax for D and the implementation in the assembly emitter.
>
> I would like to close with a reminder of why iterators are so important.
> Sather users found that their code translated from C++ to Sather
> immediately saw a performance boost. Much of this can be attributed to the
> implementation of iterators in the language. C++ iterators (we should call
> them "cursors" or "riders") must be initialized, and often have to do a lot
> of work that the class they model already implements. This overhead adds up
> when you, e.g., have a big project using these cursors throughout the
> implementation. In addition, the implementation of tree structures and the
> like have to maintain their own stacks in order to compensate for their
> meta-statefulness.
>
> Iterators are a strange concept to most at first site, but they are an
> architectural solution that brings great rewards.
>
> Yours,
>
> --thomas
>
>
> Bill Cox wrote:
>
>
>>Hi, Thomas.
>>
>>It sounds like your intuition lines up well with mine. I've posted on
>>this group in the past supporting Sather-like iterators and
>>multiple-inheritance.
>>
>>The main point of this reply is to second your call for a strong look at
>>Sather.
>>
>>That said, I'll re-state my position on specifics of iterators and
>>code-inclusion.
>>
>>As for multiple inheritance, I love the Sather code inclusion model. It
>>is more powerful than multiple inheritance in many very important
>>situations. In particular, reusing code from a module that has multiple
>>recursive classes is practically impossible in C++ derived languages,
>>including D. Sather allows me to exactly specify the mapping between
>>classes, methods, and fields. That mapping in Sather is in one place,
>>unlike Ocaml, where it gets spread all over the place. I can inherit
>>from the same complex module multiple times, and resolve the mappings
>>easily. I can even eliminate unwanted code. Also, this feature adds
>>exactly 0 overhead for using it (no VFT pointers, for example). That's
>>a good fit for D. Also, it's very easy to implement in a compiler.
>>That's also a good fit for D. Personally, I think Sather's include
>>construct is a great invention, but the world hasn't had quite enough
>>time learn about it.
>>
>>For anyone reading this that has no idea what power Sather's include
>>construct can bring you, I recomend looking into it. It has amazing
>>potential for enabling code-reuse. As as a simple example, try
>>inheriting graph functionality from a graph module using the include
>>construct. As an even simpler example, try inheriting linked list
>>functionality from a module of two classes. If you can do that, now try
>>something harder, like reusing a simulating annealing algorithm on new
>>data structures. Sather can do that. The potential for building a
>>reusable code base of common algorithms is much more real in Sather than
>>it ever was in C++.
>>
>>As for powerful iterators, I think for D, the continuation construct (or
>>multiple threads) is not efficient enough. However, you can still do
>>Sather-like iterators very efficiently (0 overhead in most cases), if
>>you limit iterators to one per loop. I recomended this some time ago in
>>a thread titled something like "cool iterators".
>>
>>There actually has been a ton of discussion about language features on
>>this group, including many features from all those languages you
>>mentioned. There is also a lot of disagreement, so I don't think there
>>has been a common desire to include features of those languages.
>>
>>Bill
>
>
|
April 17, 2003 Re: Commentary on D (or "Using Sather as a Model") | ||||
---|---|---|---|---|
| ||||
Posted in reply to Bill Cox | COROUTINES are essentially a Windows Fiber. A fiber is like a thread but without potential for concurrent execution. However a coroutine could be implemented in a way that allowed more than one processor to be involved automatically, but the only communication possible is via function calls at which point execution is synchronized for the tranfer. This whole idea is very similar to a method call. Think of a function as an object that lives on the stack and can receive, and in turn send, messages to other objects that live in the stack. When it gets messy is when the objects are destroyed explicitly. Yield is a very primitive control transfer instruction. It's even lower tech than goto. Does that mean we shouldn't use it? Sean "Bill Cox" <bill@viasic.com> wrote in message news:3E9D8DD5.7010908@viasic.com... > Hi, Thomas. > > Thanks for the detailed description of the Sather implementation of iterators. It's very cool. Given the efficiency, I'd support the more general version. > > I'd still like to see the compiler optimize away the overhead in simple cases (which are the most common). For example, most simple foreach style loops have a single iterator, and a single body of statements. It is often possible to in-line the iterator function, and replace the yield statement with the body of the loop. Then, there's pretty much no overhead at all. > > The more I learn about Sather, the more impressed I've become. > > Bill > > Thomas D. Marsh wrote: > > Hi Bill, > > > > Your previous posts are no longer available on the Digital Mars NNTP server, > > so thank you for reposting the content. One thing I don't understand is your statement that iterators would be simpler if there is only one allowed > > per loop. On the contrary, I don't think they are so complicated (implementationally - perhaps they are comprehensionally) that they can not > > be implemented via standard mechanisms that the compiler would use when emitting control flow structures in assembly. > > > > Iterators are actually just a specific application of co-routines, and as > > such have a long history in computing (Knuth, et al). Keld Helsgaun has > > written a small portable library (1999) called "COROUTINE". It can be found > > at the following location: > > > > http://www.dat.ruc.dk/~keld/research/COROUTINE/ > > > > I suggest taking a look at this now as I will use his terminology from here. > > Notable is that his library is using setjmp/longjmp, but is not without pitfalls. I will take a look at another approach (used in Sather) further > > down, but I think this lays a good foundation for now. > > > > Lets say that an iterator can be defined in D as follows: > > > > iterator int adder() > > { > > int i = 0; > > for(;;) > > { > > yield i; > > i++; > > } > > } > > > > In the COROUTINE library I might implement this as: > > > > class adder : public Coroutine > > { > > private: > > int i; > > public: > > int getValue() > > { > > Call(this); // invokes the virt Routine() > > return i; > > } > > > > void Routine() > > { > > > > i = 0; > > for (;;) > > { > > Detach(); > > i++; > > } > > } > > }; > > > > Now we have a simple class which exactly emulates this iterator. To use this > > in C++: > > > > int main() > > { > > // initialize the COROUTINE library; we should specify > > // a stack size to use.. > > > > InitSequencing(); > > > > // allocate our adder. > > adder *a = new adder; > > > > for (;;) > > std::cout << adder->getValue() << std::endl; > > } > > > > This will print out an endless stream of digits: > > > > 0 > > 1 > > 2 > > ... > > > > Notable is that the adder::Routine method is picking up where it left off > > due to the setjmp/longjmp magic spawned from the Call(Coroutine*) function. > > > > There are several drawbacks to this model. As far as C++ goes, you will always be calling the virtual method which has its overhead. Definitely not > > the stuff for a low level iterator. Secondly, you need to specify the stack > > size (or use the default), which leads to wastage on over-estimates and runtime errors on under-estimates. > > > > Now that we see this sort of stuff is possible without resorting to threads > > we can consider a more traditional C implementation. It may be useful to refer to Simon Tatham's article: > > > > http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html > > > > It is a messy implementation, but it works. It is also closer to how the Sather compiler implements it's iterators. If we look at his implementation > > model for our adder: > > > > int adder() > > { > > static int i, state = 0; > > switch (state) > > { > > case 0: // start of function > > i = 0; > > state = 1; > > return i; > > case 1: > > i++; > > return i; > > } > > } > > > > Okay, that's pretty simple, but as Simon says, those statics are basically a > > hack. He provides a reentrant version of his macros for manipulating these > > structures, but what you get to at the end of the day, when you implement > > it cleanly, is exactly what Sather implements, a frame structure. > > > > typedef struct adder_frame_struct > > { > > int state; > > int i; > > } *adder_frame; > > > > void adder(adder_frame frame) > > { > > switch (frame->state) > > { > > case 0: goto state0; > > case 1: goto state1; > > } > > > > state0: > > { > > frame->i = 0; > > for (;;) > > { > > frame->i++; > > frame->state = 1; > > return; > > state1: ; > > } > > } > > } > > > > > > You can now test this as such: > > > > int main() > > { > > struct adder_frame_struct f; > > > > for (;;) > > { > > adder(&f); > > printf("%d\n", f.i); > > } > > } > > > > > > It has the benefits of being non-wasteful, efficient, reentrant, and allows > > you to use multiple iterators within the same loop context. Why isn't it more popular? Well, basically, without native language support for it, you > > have to jump through hoops to make these things work. You can see from the > > explicit states that have to be maintained as well as the plethora of frame > > structures for different local variables (not to mention those parameters > > which are evaluated once, or each time through iteration) that things can > > indeed get quite hairy when coding these by hand. > > > > Maybe in C++ this can be done with meta-template expressions; I will look > > into it (I just thought of it now). But in the meantime, I think when examining the C implementation, there is a clear correlation between the proposed syntax for D and the implementation in the assembly emitter. > > > > I would like to close with a reminder of why iterators are so important. Sather users found that their code translated from C++ to Sather immediately saw a performance boost. Much of this can be attributed to the > > implementation of iterators in the language. C++ iterators (we should call > > them "cursors" or "riders") must be initialized, and often have to do a lot > > of work that the class they model already implements. This overhead adds up > > when you, e.g., have a big project using these cursors throughout the implementation. In addition, the implementation of tree structures and the > > like have to maintain their own stacks in order to compensate for their meta-statefulness. > > > > Iterators are a strange concept to most at first site, but they are an architectural solution that brings great rewards. > > > > Yours, > > > > --thomas > > > > > > Bill Cox wrote: > > > > > >>Hi, Thomas. > >> > >>It sounds like your intuition lines up well with mine. I've posted on this group in the past supporting Sather-like iterators and multiple-inheritance. > >> > >>The main point of this reply is to second your call for a strong look at Sather. > >> > >>That said, I'll re-state my position on specifics of iterators and code-inclusion. > >> > >>As for multiple inheritance, I love the Sather code inclusion model. It is more powerful than multiple inheritance in many very important situations. In particular, reusing code from a module that has multiple recursive classes is practically impossible in C++ derived languages, including D. Sather allows me to exactly specify the mapping between classes, methods, and fields. That mapping in Sather is in one place, unlike Ocaml, where it gets spread all over the place. I can inherit from the same complex module multiple times, and resolve the mappings easily. I can even eliminate unwanted code. Also, this feature adds exactly 0 overhead for using it (no VFT pointers, for example). That's a good fit for D. Also, it's very easy to implement in a compiler. That's also a good fit for D. Personally, I think Sather's include construct is a great invention, but the world hasn't had quite enough time learn about it. > >> > >>For anyone reading this that has no idea what power Sather's include construct can bring you, I recomend looking into it. It has amazing potential for enabling code-reuse. As as a simple example, try inheriting graph functionality from a graph module using the include construct. As an even simpler example, try inheriting linked list functionality from a module of two classes. If you can do that, now try something harder, like reusing a simulating annealing algorithm on new data structures. Sather can do that. The potential for building a reusable code base of common algorithms is much more real in Sather than it ever was in C++. > >> > >>As for powerful iterators, I think for D, the continuation construct (or multiple threads) is not efficient enough. However, you can still do Sather-like iterators very efficiently (0 overhead in most cases), if you limit iterators to one per loop. I recomended this some time ago in a thread titled something like "cool iterators". > >> > >>There actually has been a ton of discussion about language features on this group, including many features from all those languages you mentioned. There is also a lot of disagreement, so I don't think there has been a common desire to include features of those languages. > >> > >>Bill > > > > > |
April 17, 2003 Re: Commentary on D (or "Using Sather as a Model") | ||||
---|---|---|---|---|
| ||||
Posted in reply to Bill Cox | Hi Bill, According to the Sather FAQ and my tests with the compiler, it is indeed possible to inline the simple cases. As per the FAQ: "Iterators are as efficient as standard loops when they are built-in or inlined (they are converted into standard C loops). In other cases they are probably significantly better than the "iterator objects" i.e. cursors, that might otherwise be required. Simple iterators are inlined. In order to be inlined, an iterator must have at most one yield, and the yield must not be in a conditional statement. i.e. the iterator definition must be of the form: iter_def!( ) is iter_init; loop ... code before the yield yield (only one, and not in an if statement) ... code after the yield end; other_code; end; Indeed, the following Dali code: class IteratorTest { iterator int adder() { int i = 0; loop { yield i; i++; } } iterator int doubler() { loop yield adder() * 2; } main() { loop stdout << doubler() << "\n"; } } produces the following (cleaned up) code: void sather_main(MAIN self) { INT i =0; BOOL multiplierb_if f = TRUE; BOOL adderb_if_first = TRUE; INT adder_tmp; INT mul_tmp1; INT mul_tmp2; /* ... other locals here */ while (1) { if (adderb_if_first) adderb_if_first = FALSE; else { adder_tmp = i+1; i = adder_tmp; } mul_tmp1 = i; mul_tmp2 = mul_tmp1*2; /* on to operations to output the integer */ } after_loop: ; } The only notable thing about this rendering is the excessive use of temporaries that should be optimized away (leading to bloated stacks for each routine as the programs get larger). Of course this simply means a little more work for the compiler writer, but completely within the domain of standard compiler optimization techniques. I'm not sure what you mean by the "more general version", but if you are referring to the COROUTINE library, then you start to get into the Simula world which provides the same semantics as the COROUTINE library in the language itself (e.g., Detach, Resume(obj), Call(obj).) I think this is beyond the scope of D as regards language primitives, and people who need this functionality can implement their own COROUTINE library in D. --thomas Bill Cox wrote: > Hi, Thomas. > > Thanks for the detailed description of the Sather implementation of iterators. It's very cool. Given the efficiency, I'd support the more general version. > > I'd still like to see the compiler optimize away the overhead in simple cases (which are the most common). For example, most simple foreach style loops have a single iterator, and a single body of statements. It is often possible to in-line the iterator function, and replace the yield statement with the body of the loop. Then, there's pretty much no overhead at all. > > The more I learn about Sather, the more impressed I've become. > > Bill -- If a 6600 used paper tape instead of core memory, it would use up tape at about 30 miles/second. -- Grishman, Assembly Language Programming |
April 17, 2003 Re: Commentary on D (or "Using Sather as a Model") | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sean L. Palmer | Hi Sean, Although they can be useful as a comparison, or an introduction, Windows fibers are essentially support for the UNIX style of program managed threads (i.e., non-kernel level), and hence are distinct from coroutines. The latter predating the notion of threads (or indeed any concurrency whatsoever). Knuth has a whole section devoted to them (The Art of Computer Programming, Vol. I., 3rd edition, Section 1.4.2, p. 193-200.) A coroutine is like a routine (function), except that a routine always starts at its beginning, whereas a coroutine picks up where it left off. That means that a yield is equivalent to a return, with the caveat that the yielding function must store the point to jump to on re-entry. Coroutines are the stuff of assembly language, requiring slightly different use of jumping than normal routines. The can, however, be emulated in C with switch statements. In the case of our yielding iterators, which are intended to be used by many pieces of client code (perhaps even concurrently) this means that the entire stack of the iterator must actually be allocated and maintained by the caller. Where to store this information, stack or heap, is up to the compiler, or possibly the programmer. But the point is that the caller is the one that maintains this storage. This has the advantage that it can even be allocated on the heap and garbage collected. At the user's/language-designer's option, there could even be room for explicit deletions, but they are not necessarily messy. Lets take an example (I love examples): iterator int range(once int start, once int upto) { int i = start; loop while(i <= max) { yield i; i++; } } This could be called as such: int main() { loop printf("%d\n", range(10, 20)); } The key thing is that the "loop" keyword indicates a looping construct that will not end until an iterator involved calls a "quit", or reaches the end of its execution. Important is that no synchronization is involved because at each point in time the range iterator provides on demand the next value. You can unroll the loop and it starts to look like the hand coded while loop. To address the point you made about explit deletion, we could allow iterators to be used as described above, or perhaps as such: int main() { iterator iter = new range(10, 20); loop printf("%d\n", iter()); delete iter; } Okay, the "new <iterator_name>(<args>)" syntax is probably not the most clear, but this model also gives us some interesting possibilities. Trying to emulate this in C, we end up with a struct to hold the execution state of the iterator, the "start", and "upto" arguments, and all the locals (in this case, just "i"). Therefore: typedef struct range_frame_struct { int state; int start; int upto; int i; } *range_frame; The range iterator itself has no stack variables (okay, I've put one in, but it is just to handle the invalid iterator case) and looks like the following code: int range(range_frame frame) { int dummy = 0; switch (frame->state) { case 0: frame->i = frame->start; frame->state = 1; while (frame->i < frame->upto) { return frame->i; case 1: frame->i++; } frame->state = -1; return frame->i; case -1: default: return dummy; } } Our invocation ("range(10,20)") actually will involve the following steps in the main routine: - allocate the frame - initialize the frame state to 0 - set the arguments in the frame Hence: struct range_frame_struct r; r.state = 0; r.start = 10; r.upto = 20; while (r.state != -1) printf("%d\n", range(&r)); In the version where we explicitly allocated the iterator and wish to delete it, the implementation is obvious in C: struct range_frame_struct *r = (struct range_frame_struct *) GC_ALLOC(sizeof(struct range_frame_struct)); Now, in the case that the iterator calls other objects, we must keep in mind that all of the iterator's local variables are not declared physically on its stack. This would include the pointer to self/this in the case that the iterator is defined within a class. So, there are certainly issues with explicit deletion of, for example, elements in an array, while at the same time iterating over elements. You can shoot yourself in foot this way just as with C++ STL iterators. The difference being that the iterators are quite simple and require less time to debug. We have discussed trivial iterators in this thread, but they're true expressive power becomes clear when used in combination with complex structures. I refer you to this great paper written by the Sather guys: http://www.icsi.berkeley.edu/~sather/Publications/toplas.html Note, that the even have an elegant implementation of the Sieve of Eratosthenes. Could that code sample for D on the web pages be alternately implemented similarly to the listing I have at the end of this post I wonder? Regards, --thoams P.S. The Dali implementation of The Sieve of Eratosthenes. As the author states, this is more an example of the expressive powers of iterators rather than an efficient implementation. iterator bool sieve(int aprime) { int d = aprime; yield true; loop yield (((aprime % d) == 0) ? false : sieve(aprime)); } iterator int primes { int r = 2; loop { if (sieve(r)) yield r; r++; } } int main() { loop printf("%d\n", primes()); } Sean L. Palmer wrote: > COROUTINES are essentially a Windows Fiber. A fiber is like a thread but without potential for concurrent execution. However a coroutine could be implemented in a way that allowed more than one processor to be involved automatically, but the only communication possible is via function calls at which point execution is synchronized for the transfer. > > This whole idea is very similar to a method call. Think of a function as an object that lives on the stack and can receive, and in turn send, messages to other objects that live in the stack. When it gets messy is when the objects are destroyed explicitly. > > Yield is a very primitive control transfer instruction. It's even lower tech than goto. Does that mean we shouldn't use it? > > Sean -- BE ALOOF! (There has been a recent population explosion of lerts.) |
April 17, 2003 Re: Commentary on D (or "Using Sather as a Model") | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thomas D. Marsh | First mention of Sather on D news was from yours truly, http://www.digitalmars.com/drn-bin/wwwnews?D/10827 Sather might be getting more attention than it deserves, but the interest is gratifying anyway. Coroutines are not the most general concept, but continuations. My earlier posts about the subject are linked below. They are pertinent to D as a low-level systems language. http://www.digitalmars.com/drn-bin/wwwnews?D/11434 http://www.digitalmars.com/drn-bin/wwwnews?D/11439 "Microkernel operating systems should expose continuations, not threads, as the low-level interface to the kernel. Threads should be provided as a library in user-space layered on top of the microkernel. Richard Draves' thesis demonstrated that using continuations as a programming technique within the MACH 3.0 kernel simplified the code and vastly improved performance." http://web.access.net.au/felixadv/files/output/book/c2344.html "Continuations ... are a primitive notion that can be used to model any flow of control, including long-distance transfers such as raising exceptions." Sather's "include" construct is purely syntactical. Develop what opinions you will from that fact. Mark |
April 18, 2003 Re: Commentary on D (or "Using Sather as a Model") | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mark Evans | I'm sorry but I posted about it before... http://www.digitalmars.com/drn-bin/wwwnews?D/10010 And some links at... http://www.digitalmars.com/drn-bin/wwwnews?D/10086 I'm pretty sure that there's older posts mentioning Sather. Perhaps Nobert Nemec, was one of the Sather maintainers, posted something before, or maybe I've mentioned it. "Mark Evans" <Mark_member@pathlink.com> escreveu na mensagem news:b7msc6$117j$1@digitaldaemon.com... > First mention of Sather on D news was from yours truly, http://www.digitalmars.com/drn-bin/wwwnews?D/10827 > > Sather might be getting more attention than it deserves, but the interest is > gratifying anyway. > > Coroutines are not the most general concept, but continuations. My earlier > posts about the subject are linked below. They are pertinent to D as a low-level systems language. > > http://www.digitalmars.com/drn-bin/wwwnews?D/11434 > http://www.digitalmars.com/drn-bin/wwwnews?D/11439 > "Microkernel operating systems should expose continuations, not threads, as the > low-level interface to the kernel. Threads should be provided as a library in > user-space layered on top of the microkernel. Richard Draves' thesis demonstrated that using continuations as a programming technique within the MACH > 3.0 kernel simplified the code and vastly improved performance." > > http://web.access.net.au/felixadv/files/output/book/c2344.html "Continuations ... are a primitive notion that can be used to model any flow of > control, including long-distance transfers such as raising exceptions." > > > Sather's "include" construct is purely syntactical. Develop what opinions you > will from that fact. > > Mark > > --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.471 / Virus Database: 269 - Release Date: 10/4/2003 |
April 18, 2003 Re: Commentary on D (or "Using Sather as a Model") | ||||
---|---|---|---|---|
| ||||
Posted in reply to Daniel Yokomiso | Fair enough. We were withing days of each other and Sather was the main topic of my post, not just a passing reference. Mark |
Copyright © 1999-2021 by the D Language Foundation