Thread overview
Re: RFC on range design for D2
- Sep 09, 2008: bearophile
- Sep 09, 2008: superdan
- Sep 10, 2008: Walter Bright
- Sep 10, 2008: Walter Bright
- Sep 10, 2008: Benji Smith
- Sep 09, 2008: Benji Smith
- Sep 09, 2008: Benji Smith
- Sep 09, 2008: Benji Smith
- Sep 10, 2008: bearophile
- Sep 10, 2008: bearophile
- Sep 10, 2008: bearophile
- Sep 10, 2008: superdan
- Sep 10, 2008: bearophile
- Sep 10, 2008: bearophile
- Sep 10, 2008: Sean Kelly
- Sep 10, 2008: Sean Kelly
- Jan 08, 2009: dsimcha
- Jan 08, 2009: bearophile
- Jan 09, 2009: jq
September 09, 2008
I don't have enough experience to have a clear view of the whole subject, so I'll write only some general comments:

1) Over time I have seen that everyone who likes/loves language X wants D to become like X. This is true for Andrei too: you clearly like C++ a lot. The worst fault of C++ is excessive complexity; that's the single fault that's killing it, and that's what pushed people to invent Java, and so on. Walter has clearly tried hard to make D a less complex language. This means that D is sometimes less powerful than C++, and every smart programmer can see that it's less powerful, but having 15% less power is a good deal if you have 1/3 of the complexity. As you may guess I don't like C++ (even if I like its efficiency, I'm not going to use it for real coding), so I don't want D to become just a resyntaxed C++. Note that originally D was more like a statically compiled Java, and while that's not perfect, it's probably better than a resyntaxed C++. So I suggest less power, if that greatly decreases what the programmer has to keep in mind while programming. If you don't believe me, take a look at what D's blurb says:

>D is a systems programming language. Its focus is on combining the power and high performance of C and C++ with the programmer productivity of modern languages like Ruby and Python. Special attention is given to the needs of quality assurance, documentation, management, portability and reliability.<

As you see it's a matter of balance.

2) Keep in mind that not all D programmers are lovers of C++, and they don't all come from C++; some of them are actually lovers of other languages, like Java, C#, Python, Ruby, Lisp, etc. And as you may expect, they may want to push D to look more like Lisp, Python, C#, etc.

3) opApply has some problems, but it isn't infamous.

4) Syntax matters, and function/method/attribute names too. So they have to be chosen with care. I suggest using single words when possible, and considering "alllowercase" names too (I know this contrasts with the D style guide).

4b) Copying function/method/attribute names from C++ isn't bad if people think such names are good. Inventing different names just to be different is not good.

5) The source code of the current algorithm module of D2 is already very complex to follow; it smells of over-generalization here and there. Sometimes it's better to reduce the generality of things, even if that reduces their power a little, in order to reduce complexity. Tango's code isn't perfect either, but it often looks more human. While you have created the algorithm module, I too have created something similar, but based on different grounds.

6) Built-in data types are important; they aren't meant to replace a good standard library, where you can find more specialized and more efficient data structures. The built-in data types are meant to:
- offer a very handy syntax, easy to use and remember, short to type too.
- They have to be efficient in a very wide variety of situations, so they must avoid really bad performance in a large number of situations, while it's okay for them not to be the fastest in any one situation.
- They are useful, for example, when you have little data, in the 90% of code where max performance isn't so important. In the other situations you are expected to import things like an IntrusiveRedBlackHashMap from the std lib.

7) Take a look at the lazy "views" of keys/values/items of Python 3, and how they fit into your view of such ranges. Java too has something similar. (I can give a summary if you want. But in a few words: if you have an associative array (dict) d, then d.keys() doesn't return an array but a "view", a lazy iterable object that can be sliced lazily, iterated, etc. This is far more efficient than the .keys/.values of the current D implementation.)

8) Lots of functions/things in my mostly-functional library are designed to be lazy, that is, they generate items on the fly. At the moment D largely lacks such a view of the world. Again, take a good look at how Python 3 has shifted from eager to lazy in much of its style. How do such things fit into your range view? I think they can fit well, but the D language has to shift its philosophy a little to support such lazy generators/iterators more often, commonly in the built-ins too.

9) I had a long discussion in the main D newsgroup; a partial conclusion was that a struct of 2 items may not be enough for the current implementation of dynamic arrays, because without a capacity field, appending is dead slow. I presume this is orthogonal to your range proposal, but I mention it here because you keep talking about dynamic arrays as 2-pointer structs, and that may change in the future.

Bye,
bearophile
September 09, 2008
bearophile wrote:
> I don't have enough experience to have a clear view of the whole
> subject, so I'll write only some general comments:

Uh-oh. That doesn't quite bode well :o).

> 1) In time I have seen that everyone that likes/loves language X
> wants D to become like X. This is true for Andrei too, you clearly
> like C++ a lot.

Stop right there. This is just a presupposition. I think I could say I *know* C++ a lot, which is quite different. But then I know a few other languages quite well, along with the advantages and disadvantages that made them famous.

Maybe I could say I do like the STL. Not the quirks it takes to implement it in C++, but for the fact that it brings clarity and organization in defining fundamental data structures and algorithms. For my money, other collection/algorithms designs don't hold a candle to STL's.

> The worst fault of C++ is excessive complexity;
> that's the single fault that's killing it, and that's what pushed
> people to invent Java, and so on. Walter has clearly tried hard to
> make D a less complex language. This means that D is sometimes less
> powerful than C++, and every smart programmer can see that it's less
> powerful, but having 15% less power is a good deal if you have 1/3 of
> the complexity. As you may guess I don't like C++ (even if I like its
> efficiency, I'm not going to use it for real coding), so I don't want
> D to become just a resyntaxed C++. Note that originally D was more
> like a statically compiled Java, and while that's not perfect, it's
> probably better than a resyntaxed C++. So I suggest less power, if
> that greatly decreases what the programmer has to keep in mind
> while programming. If you don't believe me, take a look at what D's
> blurb says:
> 
>> D is a systems programming language. Its focus is on combining the
>> power and high performance of C and C++ with the programmer
>> productivity of modern languages like Ruby and Python. Special
>> attention is given to the needs of quality assurance,
>> documentation, management, portability and reliability.<
> 
> As you see it's a matter of balance.

I know it's a matter of balance. I am not sure what gave you the idea that I didn't.

> 2) Keep in mind that not all D programmers are lovers of C++, and they
> don't all come from C++; some of them are actually lovers of other
> languages, like Java, C#, Python, Ruby, Lisp, etc. And as you may
> expect, they may want to push D to look more like Lisp, Python, C#, etc.

I like many things in all of the languages above. Don't forget it was me who opened Walter to Lisp :o).

> 3) opApply has some problems, but it isn't infamous.

opApply has two problems:

1. It can't save the state of the iteration for later resumption.

2. It uses repeated function calls through a pointer, which is measurably slow.

Both disadvantages are major. The first fosters container design without true iteration. That's just bad design. (Look at built-in hashes.) The second is even worse because it makes foreach a toy useful when you read lines from the console, but not in inner loops. Which is kind of ironic because they are inner *loops*.

I think many would agree that foreach minus the disadvantages would be better.
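To make the two disadvantages concrete, here is a minimal D sketch contrasting the styles. The container types are invented for illustration, and the range member names isEmpty/next follow this proposal's naming (D later settled on empty/front/popFront):

```d
import std.stdio;

// Callback-based iteration: the container drives the loop via opApply.
// Iteration state lives on the call stack, so it can't be saved and
// resumed, and every element costs an indirect call through a delegate.
struct ListOpApply {
    int[] data;
    int opApply(int delegate(ref int) dg) {
        foreach (ref x; data)
            if (auto r = dg(x)) return r;
        return 0;
    }
}

// Range-style iteration: the state is an ordinary struct value that can
// be stored, copied, and resumed later, with no delegate call per step.
struct ListRange {
    int[] data;
    bool isEmpty() { return data.length == 0; }
    int next() { auto x = data[0]; data = data[1 .. $]; return x; }
}

void main() {
    foreach (x; ListOpApply([1, 2, 3]))
        write(x, " ");
    writeln();
    auto r = ListRange([1, 2, 3]);
    while (!r.isEmpty())
        write(r.next(), " ");
    writeln();
}
```

Both loops print the same elements; the difference is only in who holds the iteration state.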

> 4) Syntax matters, and function/method/attribute names too. So they
> have to be chosen with care. I suggest using single words when
> possible, and considering "alllowercase" names too (I know this
> contrasts with the D style guide).

It's very easy to give very general advice (which, to be honest, your post is replete with). It would help if you followed up with a few concrete points.

> 4b) Copying function/method/attribute names from C++ isn't bad if
> people think such names are good. Inventing different names just to
> be different is not good.
> 
> 5) The source code of the current algorithm module of D2 is already
> very complex to follow; it smells of over-generalization here and
> there. Sometimes it's better to reduce the generality of things, even
> if that reduces their power a little, in order to reduce complexity.
> Tango's code isn't perfect either, but it often looks more human.
> While you have created the algorithm module, I too have created
> something similar, but based on different grounds.

I am sure you like your library more than mine, because it's your design and your realization.

The code in std.algorithm is complex because it implements algorithms that are complex. I know that if someone looked over partition, rotate, topN, or sort without knowing how they work, they wouldn't have an easy time picking them up. That is fine. The purpose of std.algorithm's implementation is not to be a tutorial on algorithms.

On occasion std.algorithm does a few flip-flops, but always for a good reason. For example it takes some effort to allow multiple functions in reduce. Consider:

double[] a = [ 3.0, 4, 7, 11, 3, 2, 5 ];
// Compute minimum and maximum in one pass
auto r = reduce!(min, max)(double.max, -double.max, a);
// The type of r is Tuple!(double, double)
assert(r._0 == 2);  // minimum
assert(r._1 == 11); // maximum

A simpler reduce would have only allowed one function, so I would have had to write:

auto m = reduce!(min)(double.max, a);
auto M = reduce!(max)(-double.max, a);

On the face of it, this looks reasonable. After all, come on, why can't one write two lines instead of one? However, min and max are such simple functions that the cost of computing them is drowned out by the cost of looping alone, so the two-line version traverses the data twice for nearly twice the cost. On a large array that becomes onerous, which is why, without multi-function reduce, I had to write the loop by hand.
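For reference, the hand-written one-pass loop looks something like the following sketch (this is an illustration, not the code that was actually measured):

```d
import std.stdio;

void main() {
    double[] a = [3.0, 4, 7, 11, 3, 2, 5];
    // One traversal of the data, tracking both extrema at once: this is
    // what reduce!(min, max) generates, and what a single-function
    // reduce would force you to write by hand for large inputs.
    double lo = double.max, hi = -double.max;
    foreach (x; a) {
        if (x < lo) lo = x;
        if (x > hi) hi = x;
    }
    writeln(lo, " ", hi); // prints "2 11"
}
```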

How do I know that? Because I measured. Why did I measure? Because I care. Why do I care? Because the typical runtime of my programs measure in hours of sheer computation, and because things like reduce and others in std.algorithm are in the core loops. And I am not alone.

If I can help it, I don't want to write an algorithms library for a systems-level programming language with fundamental limitations that make it unsuitable for efficient computation. Costly abstractions are a dime a dozen. It's efficient abstractions that are hard to come by.

If you believe I am wasting time with cutesies in std.algorithm, please let me know of the places and of the suggested improvements.

> 6) Built-in data types are important; they aren't meant to replace a
> good standard library, where you can find more specialized and more
> efficient data structures. The built-in data types are meant to:
> - offer a very handy syntax, easy to use and remember, short to type
> too.
> - They have to be efficient in a very wide variety of situations, so
> they must avoid really bad performance in a large number of
> situations, while it's okay for them not to be the fastest in any one
> situation.
> - They are useful, for example, when you have little data, in the 90%
> of code where max performance isn't so important. In the other
> situations you are expected to import things like an
> IntrusiveRedBlackHashMap from the std lib.

I agree.

> 7) Take a look at the lazy "views" of keys/values/items of Python 3,
> and how they fit into your view of such ranges. Java too has something
> similar. (I can give a summary if you want. But in a few words: if you
> have an associative array (dict) d, then d.keys() doesn't return an
> array but a "view", a lazy iterable object that can be sliced lazily,
> iterated, etc. This is far more efficient than the .keys/.values of
> the current D implementation.)

To quote a classic, Lisp has had them all for 50 years. Well-defined ranges are the perfect stepping stone towards lazy computation. In particular input ranges are really generators. I plan to implement things like lazyMap and lazyReduce, and also generators that are so popular in functional languages.
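As a sketch of how such a generator could sit on top of this range design: lazyMap below is hypothetical (it is not the std.algorithm API), and the isEmpty/next names follow this proposal's naming rather than the final one.

```d
import std.stdio;

// A hypothetical lazy map as an input range: each element is computed
// on demand as the range is consumed, never stored in an intermediate
// array. This is the Python 3 "view" style expressed as a D range.
struct LazyMap(alias fun, R) {
    R source;
    bool isEmpty() { return source.length == 0; }
    auto next() {
        auto result = fun(source[0]);
        source = source[1 .. $];
        return result;
    }
}

auto lazyMap(alias fun, R)(R r) { return LazyMap!(fun, R)(r); }

void main() {
    auto squares = lazyMap!(x => x * x)([1, 2, 3, 4]);
    while (!squares.isEmpty())
        write(squares.next(), " "); // prints "1 4 9 16 "
    writeln();
}
```

Nothing is computed until next() is called, so chaining such ranges composes work without allocating temporaries.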

> 8) Lots of functions/things in my mostly-functional library are
> designed to be lazy, that is, they generate items on the fly. At the
> moment D largely lacks such a view of the world. Again, take a good
> look at how Python 3 has shifted from eager to lazy in much of its
> style. How do such things fit into your range view? I think they can
> fit well, but the D language has to shift its philosophy a little to
> support such lazy generators/iterators more often, commonly in the
> built-ins too.

I agree that D could use more lazy iteration instead of eager computation.

> 9) I had a long discussion in the main D newsgroup; a partial
> conclusion was that a struct of 2 items may not be enough for the
> current implementation of dynamic arrays, because without a capacity
> field, appending is dead slow. I presume this is orthogonal to your
> range proposal, but I mention it here because you keep talking about
> dynamic arrays as 2-pointer structs, and that may change in the
> future.

I saw that discussion. In my opinion that discussion clarifies why slices shouldn't be conflated with full-fledged containers. Handling storage strategy should not be the job of a slice. That's why there's a need for true containers (including straight block arrays as a particular case). Ranges are perfect for their internal implementation and for iterating them.


Andrei
September 09, 2008
Andrei Alexandrescu Wrote:

> bearophile wrote:
> > I don't have enough experience to have a clear view of the whole subject, so I'll write only some general comments:
> 
> Uh-oh. That doesn't quite bode well :o).

u could've stopped readin'. general comments without experience are oxpoop. my mailman could give'em.
September 09, 2008
bearophile wrote:
> 6) Built-in data types are important; they aren't meant to replace a good standard library, where you can find more specialized and more efficient data structures. The built-in data types are meant to:
> - offer a very handy syntax, easy to use and remember, short to type too.
> - They have to be efficient in a very wide variety of situations, so they must avoid really bad performance in a large number of situations, while it's okay for them not to be the fastest in any one situation.
> - They are useful, for example, when you have little data, in the 90% of code where max performance isn't so important. In the other situations you are expected to import things like an IntrusiveRedBlackHashMap from the std lib.

I'd also add this:

A built-in data-type can't implement an interface, so there should be one and only one obvious implementation of its interface. For example, it would be a mistake to have a "File" as a built-in type, because a "File" really ought to implement an "InputStream" interface, so that people can write stream-consumer code generically.

It's one of the reasons I think dynamic arrays and associative arrays belong in the standard library (as List and Map interfaces) rather than as built-in types.

On a separate note...

I'm also a little skeptical about the range proposal (though I'll have to give it more thought, and maybe play with an implementation, before I can come to any firm conclusion). The proposal seems to cover the standard cases of bounded and unbounded forward iteration, reverse iteration, etc.

But I can think of a few examples where the concept of "iteration" needs to be even more generalized.

For example, I worked on a project last year (in Java) that used a hidden Markov model to navigate through the nodes of a graph, driving a Monte Carlo simulation.

I designed the MarkovModel<T> class to implement the Collection<T> interface so that I could use foreach iteration to crawl through the nodes of the graph. It basically looked like this:

    MarkovModel<SimulationState> model = ...;

    for (SimulationState state : model) {
       state.doStuff();
    }

The cool thing about this is that the simulation would continue running until it reached a natural termination point (a simulation state with no outbound state transition probabilities).

Depending upon the particulars of the model, it might never end. And although the ordering of the nodes was significant, it was never deterministic. Iterating through the states of a Markov model is essentially a probabilistically guided random walk through the elements of the collection.

For me, Java iterators were a very natural design choice, since the iterator is such a dirt-simple interface (with just a "hasNext" and a "next" method).

How would the D range proposal address the sorts of problems that require non-deterministic iteration? To me, the metaphor of a "range" is nice, but it doesn't cover all the cases I'd like to see in a general-purpose iteration metaphor.

Thanks!

--benji
September 09, 2008
bearophile wrote:
> 5) The source code of the current algorithm module of D2 is already very complex to follow; it smells of over-generalization here and there. Sometimes it's better to reduce the generality of things, even if that reduces their power a little, in order to reduce complexity. Tango's code isn't perfect either, but it often looks more human. While you have created the algorithm module, I too have created something similar, but based on different grounds.

Along these same lines, while D is still young, the documentation is often thin, and code examples are scarce.

I've been doing a lot of programming lately with Tango, and despite the growing documentation, I've had to refer directly to the Tango source code on more occasions than I can count. Luckily, the Tango sources are well-written and pretty easy to read and understand, and I've had very few problems figuring out how to use the library.

I hope the range implementation makes readability a high priority.

--benji
September 09, 2008
Benji Smith wrote:
> bearophile wrote:
>> 5) The source code of the current algorithm module of D2 is already very complex to follow; it smells of over-generalization here and there. Sometimes it's better to reduce the generality of things, even if that reduces their power a little, in order to reduce complexity. Tango's code isn't perfect either, but it often looks more human. While you have created the algorithm module, I too have created something similar, but based on different grounds.
> 
> Along these same lines, while D is still young, the documentation is often thin, and code examples are scarce.
> 
> I've been doing a lot of programming lately with Tango, and despite the growing documentation, I've had to refer directly to the Tango source code on more occasions than I can count. Luckily, the Tango sources are well-written and pretty easy to read and understand, and I've had very few problems figuring out how to use the library.
> 
> I hope the range implementation makes readability a high priority.

This is a valid concern. The sample ranges I have coded so far look deceptively simple, and that's a good sign. It only takes a dozen lines to write an interesting range. (This is very unlike STL iterators.)

Andrei
September 09, 2008
Benji Smith wrote:
> bearophile wrote:
>> 6) Built-in data types are important; they aren't meant to replace a good standard library, where you can find more specialized and more efficient data structures. The built-in data types are meant to:
>> - offer a very handy syntax, easy to use and remember, short to type too.
>> - They have to be efficient in a very wide variety of situations, so they must avoid really bad performance in a large number of situations, while it's okay for them not to be the fastest in any one situation.
>> - They are useful, for example, when you have little data, in the 90% of code where max performance isn't so important. In the other situations you are expected to import things like an IntrusiveRedBlackHashMap from the std lib.
> 
> I'd also add this:
> 
> A built-in data-type can't implement an interface, so there should be one and only one obvious implementation of its interface. For example, it would be a mistake to have a "File" as a built-in type, because a "File" really ought to implement an "InputStream" interface, so that people can write stream-consumer code generically.
> 
> It's one of the reasons I think dynamic arrays and associate arrays belong in the standard library (as List and Map interfaces) rather than as built-in types.
> 
> On a separate note...
> 
> I'm also a little skeptical about the range proposal (though I'll have to give it more thought, and maybe play with an implementation, before I can come to any firm conclusion). The proposal seems to cover the standard cases of bounded and unbounded forward iteration, reverse iteration, etc.
> 
> But I can think of a few examples where the concept of "iteration" needs to be even more generalized.
> 
> For example, I worked on a project last year (in Java) that used a hidden Markov model to navigate through the nodes of a graph, driving a Monte Carlo simulation.
> 
> I designed the MarkovModel<T> class to implement the Collection<T> interface so that I could use foreach iteration to crawl through the nodes of the graph. It basically looked like this:
> 
>     MarkovModel<SimulationState> model = ...;
> 
>     for (SimulationState state : model) {
>        state.doStuff();
>     }
> 
> The cool thing about this is that the simulation would continue running until it reached a natural termination point (a simulation state with no outbound state transition probabilities).
> 
> Depending upon the particulars of the model, it might never end. And although the ordering of the nodes was significant, it was never deterministic. Iterating through the states of a Markov model is essentially a probabilistically guided random walk through the elements of the collection.
> 
> For me, Java iterators were a very natural design choice, since the iterator is such a dirt-simple interface (with just a "hasNext" and a "next" method).
> 
> How would the D range proposal address the sorts of problems that require non-deterministic iteration? To me, the metaphor of a "range" is nice, but it doesn't cover all the cases I'd like to see in a general-purpose iteration metaphor.

Hmm, HMMs :o). If you could do it with Java's hasNext and next, you can do it with D's isEmpty and next. There's no difference.
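A sketch of what such a non-deterministic iteration could look like with isEmpty/next; the random-walk model and every name below are invented for illustration, not part of the actual proposal:

```d
import std.random;
import std.stdio;

// Iteration driven by random state transitions, in the spirit of the
// Markov-model walk: the walk yields states until it happens to reach
// an absorbing (terminating) state, so both its length and the order
// of elements are non-deterministic.
struct RandomWalk {
    int state;
    enum int absorbing = -1;
    bool isEmpty() { return state == absorbing; }
    int next() {
        auto current = state;
        state += uniform(0, 2) ? 1 : -1;  // random step right or left
        if (state < 0 || state > 5)
            state = absorbing;            // walked off either edge: stop
        return current;
    }
}

void main() {
    auto walk = RandomWalk(2);
    while (!walk.isEmpty())
        write(walk.next(), " ");
    writeln();
}
```

Nothing in the isEmpty/next contract requires the sequence to be deterministic or finite; the consumer simply loops until isEmpty happens to become true.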


Andrei
September 09, 2008
Andrei Alexandrescu wrote:
> Hmm, HMMs :o). If you could do it with Java's hasNext and next, you can do it with D's isEmpty and next. There's no difference.
> 
> 
> Andrei

Oh. Okay. Good to know :)

I guess all the talk about "ranges" has me visualizing contiguous items, sequential ordering, and determinism. It's a good word for a well-defined set of items with a true beginning and end, but I wonder whether "cursor" might be a better word than "range", especially for input consumption and non-deterministic iteration.

One thing I definitely like better about the new range proposal is the notion that the container is not responsible for providing iteration logic or, more importantly, maintaining the state of any iteration. I think it'll be a welcome change.

Nice work, and I'm looking forward to working with the new stuff :)

--benji
September 10, 2008
superdan wrote:
> u could've stopped readin'. general comments without experience are
> oxpoop. my mailman could give'em.

The thing about iterators and collections is that they look so simple, but getting the right design is fiendishly difficult.
September 10, 2008
Walter Bright wrote:
> superdan wrote:
>> u could've stopped readin'. general comments without experience are
>> oxpoop. my mailman could give'em.
> 
> The thing about iterators and collections is that they look so simple, but getting the right design is fiendishly difficult.

And when a correct design is devised, the mark of genius is everyone will think it in retrospect to be simple and obvious!