Thread overview
Yet another strike against the current AA implementation
Apr 26, 2009
dsimcha
Apr 26, 2009
Michel Fortin
Apr 26, 2009
dsimcha
Apr 26, 2009
Michel Fortin
Iterables (Was: Yet another strike against the current AA implementation)
Apr 26, 2009
dsimcha
Apr 27, 2009
Christopher Wright
Apr 27, 2009
dsimcha
Apr 27, 2009
Robert Fraser
Apr 27, 2009
dsimcha
Apr 27, 2009
Michel Fortin
Apr 27, 2009
Daniel Keep
Apr 27, 2009
bearophile
Apr 27, 2009
Frits van Bommel
Apr 27, 2009
bearophile
Apr 27, 2009
bearophile
Apr 27, 2009
Frits van Bommel
Apr 28, 2009
Michel Fortin
Apr 28, 2009
Daniel Keep
Apr 28, 2009
Michel Fortin
Apr 28, 2009
downs
Apr 28, 2009
grauzone
Apr 28, 2009
Joel C. Salomon
Apr 28, 2009
downs
Apr 29, 2009
Georg Wrede
Apr 29, 2009
Georg Wrede
Apr 29, 2009
Joel C. Salomon
Apr 29, 2009
bearophile
Apr 29, 2009
Daniel Keep
Apr 29, 2009
Leandro Lucarella
Apr 29, 2009
Sean Kelly
Apr 29, 2009
Georg Wrede
Apr 29, 2009
Georg Wrede
Apr 29, 2009
Don
Apr 30, 2009
Georg Wrede
Apr 30, 2009
Georg Wrede
Apr 30, 2009
Georg Wrede
Apr 30, 2009
bearophile
Apr 30, 2009
Georg Wrede
Apr 29, 2009
Sean Kelly
Apr 29, 2009
Georg Wrede
Apr 29, 2009
Walter Bright
Phobos2: zip, integral ranges, map, Any, All, Map
Apr 28, 2009
bearophile
Apr 28, 2009
dsimcha
Apr 28, 2009
bearophile
Apr 26, 2009
Steve Teale
Apr 26, 2009
dsimcha
Apr 26, 2009
Leandro Lucarella
Apr 27, 2009
Georg Wrede
Apr 27, 2009
Derek Parnell
Apr 27, 2009
bearophile
Apr 27, 2009
Simen Kjaeraas
Apr 27, 2009
Christopher Wright
Apr 27, 2009
Denis Koroskin
April 26, 2009
So I tried to create a struct to allow iteration over the keys or values of a builtin associative array using a lazy forward range.  This would allow one to pass the keys or values of an AA to any function that expects a regular old forward range w/o any heap activity, eager copying, etc. and with O(1) auxiliary memory usage.  Now that ranges are the lingua franca of iteration in D, this seems like a pretty necessary feature.

The problem is that D's current AAs are based on a hash table with collision resolution done using binary trees.  I guess this decision was made to allow for O(log N) worst case performance even when there are ridiculous amounts of hash collisions.  However, this is a very bad decision because, in exchange for more bulletproof performance in corner cases, you lose:

1.  Every element is now (void*).sizeof bytes larger because you have to store
a left and right pointer.
2.  You can't iterate over a binary tree in O(1) auxiliary space, at least not
in any way I'm aware of.  This means that a lazy range to iterate over the
keys or values is relatively useless because it would generate heap activity
anyhow to store the stack or queue necessary to iterate over the tree, so you
may as well just use the current .keys or .values properties, which just
return an array.  (Because of the way that ranges are designed, you can't just
use the call stack, unless you use fibers or something, which is obviously not
a win from an efficiency point of view.)

Both of these are important in the more common, general case of sane hashing performance.
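
For concreteness, here is a sketch of the kind of wrapper described above, written against a hypothetical chained hash table whose buckets are singly-linked lists; the Node type and bucket array are illustrative stand-ins, not the actual druntime AA layout. With list buckets, the range needs only a bucket index and a node pointer, which is exactly the O(1) auxiliary state that tree buckets make impossible:

// Sketch: a lazy forward range over the keys of a chained hash table
// with singly-linked buckets (hypothetical layout, not the real AA).
struct AAKeyRange(K, V)
{
    static struct Node { K key; V value; Node* next; }

    private Node*[] buckets;  // the table's bucket array
    private size_t  idx;      // index of the bucket being walked
    private Node*   cur;      // current node within that bucket

    this(Node*[] buckets)
    {
        this.buckets = buckets;
        advance();
    }

    bool empty() { return cur is null; }
    K front()    { return cur.key; }

    void popFront()
    {
        cur = cur.next;
        advance();
    }

    // Skip over empty buckets; no heap activity, O(1) extra state.
    private void advance()
    {
        while (cur is null && idx < buckets.length)
            cur = buckets[idx++];
    }
}

Since the struct is only a few words, copying it to save an iteration point is free, which is what makes it a forward range rather than merely an input range.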
April 26, 2009
On 2009-04-26 11:21:54 -0400, dsimcha <dsimcha@yahoo.com> said:

> 2.  You can't iterate over a binary tree in O(1) auxiliary space, at least not
> in any way I'm aware of.  This means that a lazy range to iterate over the
> keys or values is relatively useless because it would generate heap activity
> anyhow to store the stack or queue necessary to iterate over the tree, so you
> may as well just use the current .keys or .values properties, which just
> return an array.  (Because of the way that ranges are designed, you can't just
> use the call stack, unless you use fibers or something, which is obviously not
> a win from an efficiency point of view.)

Hum, are you saying opApply is superior when it comes to iterating in a tree? It seems that with opApply you could use the stack by recursively calling opApply with the given delegate on each one of the branches.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
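
A minimal sketch of the technique Michel describes, with TreeNode standing in as an illustrative per-bucket tree (not the actual runtime code); the recursion keeps all traversal state on the call stack, so no heap allocation is needed:

class TreeNode(K)
{
    K key;
    TreeNode left, right;

    // In-order traversal: opApply recurses into the branches, so the
    // bookkeeping lives on the call stack.
    int opApply(int delegate(ref K) dg)
    {
        if (left !is null)
            if (auto r = left.opApply(dg)) return r;

        if (auto r = dg(key)) return r;

        if (right !is null)
            if (auto r = right.opApply(dg)) return r;

        return 0;
    }
}

A plain foreach (k; root) { ... } then drives the recursion directly.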

April 26, 2009
== Quote from Michel Fortin (michel.fortin@michelf.com)'s article
> On 2009-04-26 11:21:54 -0400, dsimcha <dsimcha@yahoo.com> said:
> > 2.  You can't iterate over a binary tree in O(1) auxiliary space, at least not in any way I'm aware of.  This means that a lazy range to iterate over the keys or values is relatively useless because it would generate heap activity anyhow to store the stack or queue necessary to iterate over the tree, so you may as well just use the current .keys or .values properties, which just return an array.  (Because of the way that ranges are designed, you can't just use the call stack, unless you use fibers or something, which is obviously not a win from an efficiency point of view.)
> Hum, are you saying opApply is superior when it comes to iterating in a tree? It seems that with opApply you could use the stack by recursively calling opApply with the given delegate on each one of the branches.

Exactly.  On the other hand, you lose a lot of flexibility with opApply.  If you tried to port most of std.range to operate on the opApply interface instead of the forward range interface, I doubt you'd get very far.

IMHO, though, opApply should *not* be deprecated.  opApply and ranges attempt to solve similar problems, but not the same problem.  Ranges attempt to solve the problem of iterating over some object with maximum flexibility from the point of view of the user of the object.  opApply attempts to solve the problem of iterating with maximum flexibility from the point of view of the implementer of the object.  In some cases, like the one you just described, the latter can be better.  However, it might make sense to document and give examples of this tradeoff somewhere.
April 26, 2009
dsimcha Wrote:

> So I tried to create a struct to allow iteration over the keys or values of a builtin associative array using a lazy forward range.  This would allow one to pass the keys or values of an AA to any function that expects a regular old forward range w/o any heap activity, eager copying, etc. and with O(1) auxiliary memory usage.  Now that ranges are the lingua franca of iteration in D, this seems like a pretty necessary feature.
> 
> The problem is that D's current AAs are based on a hash table with collision resolution done using binary trees.  I guess this decision was made to allow for O(log N) worst case performance even when there are ridiculous amounts of hash collisions.  However, this is a very bad decision because, in exchange for more bulletproof performance in corner cases, you lose:
> 
> 1.  Every element is now (void*).sizeof bytes larger because you have to store
> a left and right pointer.
> 2.  You can't iterate over a binary tree in O(1) auxiliary space, at least not
> in any way I'm aware of.  This means that a lazy range to iterate over the
> keys or values is relatively useless because it would generate heap activity
> anyhow to store the stack or queue necessary to iterate over the tree, so you
> may as well just use the current .keys or .values properties, which just
> return an array.  (Because of the way that ranges are designed, you can't just
> use the call stack, unless you use fibers or something, which is obviously not
> a win from an efficiency point of view.)
> 
> Both of these are important in the more common, general case of sane hashing performance.

Ranges, ranges! That's all I hear these days, and it looks to me like the continuing advance of D toward being a complete meta-language.

Where do I see ranges described in terms that an old hand can understand?

I'm constantly having to roll my own in many areas when I see how the meta stuff is implemented - like x ~= c to add a character to the end of an array - reallocation every time?

I thought D was supposed to be a systems programming language, not something that was guaranteed to win the universe's obfuscated code competition.

I've been trying to keep my projects up to date and compilable with D2.0xx, but I think I'm going to give up on that and rewrite them for whatever the current version of D1 is.

I seriously think that the crew who are driving the development should realize that not all computer programmers are going to have an IQ of 140, and that any practical language should be comprehensible to a majority of its users.

Maybe the problem I'm complaining about is just a lack of documentation. Generating said documentation from comments really does not hack it - comments are always skimped, and usually lie.

Before something like Ranges is implemented, there should be some sort of RFC process where they are properly described, rather than a reliance on D users having read every thread of the newsgroup and remembered it all.


April 26, 2009
== Quote from Steve Teale (steve.teale@britseyeview.com)'s article
> dsimcha Wrote:
> > So I tried to create a struct to allow iteration over the keys or values of a builtin associative array using a lazy forward range.  This would allow one to pass the keys or values of an AA to any function that expects a regular old forward range w/o any heap activity, eager copying, etc. and with O(1) auxiliary memory usage.  Now that ranges are the lingua franca of iteration in D, this seems like a pretty necessary feature.
> >
> > The problem is that D's current AAs are based on a hash table with collision resolution done using binary trees.  I guess this decision was made to allow for O(log N) worst case performance even when there are ridiculous amounts of hash collisions.  However, this is a very bad decision because, in exchange for more bulletproof performance in corner cases, you lose:
> >
> > 1.  Every element is now (void*).sizeof bytes larger because you have to store
> > a left and right pointer.
> > 2.  You can't iterate over a binary tree in O(1) auxiliary space, at least not
> > in any way I'm aware of.  This means that a lazy range to iterate over the
> > keys or values is relatively useless because it would generate heap activity
> > anyhow to store the stack or queue necessary to iterate over the tree, so you
> > may as well just use the current .keys or .values properties, which just
> > return an array.  (Because of the way that ranges are designed, you can't just
> > use the call stack, unless you use fibers or something, which is obviously not
> > a win from an efficiency point of view.)
> >
> > Both of these are important in the more common, general case of sane hashing performance.
> Ranges, ranges! That's all I hear these days, and it looks to me like the continuing advance of D toward being a complete meta-language.
> Where do I see ranges described in terms that an old hand can understand?
> I'm constantly having to roll my own in many areas when I see how the meta stuff is implemented - like x ~= c to add a character to the end of an array - reallocation every time?
> I thought D was supposed to be a systems programming language, not something that was guaranteed to win the universe obfuscated code competition.
> I've been trying to keep my projects up to date and compilable with D2.0xx, but I think I'm going to give up on that and rewrite them for whatever the current version of D1 is.
> I seriously think that the crew who are driving the development should realize that not all computer programmers are going to have an IQ of 140, and that any practical language should be comprehensible to a majority of its users.
> Maybe the problem I'm complaining about is just a lack of documentation. Generating said from comments really does not hack it - comments are always skimped, and usually lie.
> Before something like Ranges are implemented, there should be some sort of RFC process where they are properly described rather than a reliance on D users to have read every thread of the newsgroup, and remembered it all.

But there was an RFC, and it was even called that.  It just happened to take place on this newsgroup.  Also, keep in mind that D2 is still in alpha.  Worrying about how to explain this stuff to people who aren't heavily involved with D at this point would slow down the pace of evolution and generate more confusion.  The time to do this is when the dust has settled a little.
April 26, 2009
On 2009-04-26 11:46:51 -0400, dsimcha <dsimcha@yahoo.com> said:

> == Quote from Michel Fortin (michel.fortin@michelf.com)'s article
>> Hum, are you saying opApply is superior when it comes to iterating in a
>> tree? It seems that with opApply you could use the stack by recursively
>> calling opApply with the given delegate on each one of the branches.
> 
> Exactly.  On the other hand, you lose a lot of flexibility with opApply.  If you
> tried to port most of std.range to operate on the opApply interface instead of the
> forward range interface, I doubt you'd get very far.
> 
> IMHO, though, opApply should *not* be deprecated.  opApply and ranges attempt to
> solve similar problems, but not the same problem.  Ranges attempt to solve the
> problem of iterating over some object with maximum flexibility from the point of
> view of the user of the object.  opApply attempts to solve the problem of
> iterating with maximum flexibility from the point of view of the implementer of
> the object.  In some cases, like the one you just described, the latter can be
> better.

Indeed. I certainly agree that both ranges and opApply have their place.

So what the implementer can do with opApply is write an optimized iteration algorithm for use with foreach, which may mean that when both opApply and ranges are available for generating foreach's code, the compiler should favor opApply. Currently, I believe it's the reverse.


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

April 26, 2009
== Quote from Michel Fortin (michel.fortin@michelf.com)'s article
> Indeed. I certainly agree that both ranges and opApply have their place. So what the implementer can do with opApply is write an optimized iteration algorithm for use with foreach. Which may mean that when both opApply and ranges are available for generating foreach's code, the compiler should favor opApply. Currently, I believe it's the reverse.

You probably have a point.  If the range interface is "good enough" for the implementer of the object to write an optimal implementation without defining opApply, then the implementer will probably not define opApply.  On the other hand, it could make sense to define a range interface for when the user of the object needs that flexibility and an opApply interface that is more efficient internally in the same object.

What I think it really boils down to, and what should be emphasized in the documentation, is control of the stack.  If the implementer of the object needs control of the stack during iteration, use opApply to get this control. Otherwise, use ranges to allow the user more control over the iteration process.
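
In code, that combination might look something like the following sketch (Bag and its members are invented for the example; the point is only the shape of the two interfaces):

// One type exposing both a range, for callers that need flexibility,
// and opApply, where the implementer keeps full control of the
// iteration (and, for tree-like storage, of the stack).
struct Bag(T)
{
    private T[] items;   // stand-in storage; a real container might be a tree

    // Range-style access: a slice of T already behaves as a forward range.
    T[] opSlice() { return items; }

    // opApply: the implementer drives the loop.
    int opApply(int delegate(ref T) dg)
    {
        foreach (ref item; items)
            if (auto r = dg(item)) return r;
        return 0;
    }
}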

However, what I care more about, upon thinking more about it, is that the concept of an iterable gets defined "officially" in Phobos and in the documentation.  An iterable is any object of type T such that the following code works:

foreach(elem; T.init) { /* do stuff. */ }

This can be considered a superset of input ranges, since all input ranges are iterables but not all iterables are input ranges.  Of course, stuff that works like:

foreach(key, value; T.init) { /* do stuff. */ }

uses opApply but would not be considered an iterable because it iterates more than one item.  Therefore, strictly speaking, the model is broken, but these kinds of oddball situations are few enough and far enough between that I think they can be ignored, at least until the issue of making ranges do this kind of thing too is addressed.

The idea, then, is that if all you need is an iterable for some generic function in Phobos or in any 3rd party library, make the constraint be that it only requires an iterable.  To encourage this, an isIterable template should be included in Phobos, and std.range.ElementType should be generalized to return the element type for all iterables, not just ranges.
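
By way of illustration, an isIterable along those lines could be built on the usual does-this-compile test; the following is a sketch of the proposal, not an existing Phobos definition (Tens is an invented example type):

// Hypothetical isIterable: true for any T where foreach over T.init
// compiles -- ranges, arrays, and opApply-only types all qualify.
template isIterable(T)
{
    enum bool isIterable = is(typeof({ foreach (elem; T.init) {} }()));
}

// An opApply-only type passes the test just like an array does:
struct Tens
{
    int opApply(int delegate(ref int) dg)
    {
        for (int i = 10; i <= 30; i += 10)
            if (auto r = dg(i)) return r;
        return 0;
    }
}

static assert(isIterable!(int[]));
static assert(isIterable!(Tens));
static assert(!isIterable!(int));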
April 26, 2009
Steve Teale, el 26 de abril a las 14:39 me escribiste:
> Before something like Ranges are implemented, there should be some sort of RFC process where they are properly described rather than a reliance on D users to have read every thread of the newsgroup, and remembered it all.

A PEP[1]-like process would be great.

[1] http://www.python.org/dev/peps/

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
<o_O> parakenotengobarraespaciadora
<o_O> aver
<o_O> estoyarreglandolabarraporkeserompiounapatita
April 27, 2009
dsimcha wrote:
> == Quote from Michel Fortin (michel.fortin@michelf.com)'s article
>> On 2009-04-26 11:21:54 -0400, dsimcha <dsimcha@yahoo.com> said:
>>> 2.  You can't iterate over a binary tree in O(1) auxiliary space, at least not
>>> in any way I'm aware of.  This means that a lazy range to iterate over the
>>> keys or values is relatively useless because it would generate heap activity
>>> anyhow to store the stack or queue necessary to iterate over the tree, so you
>>> may as well just use the current .keys or .values properties, which just
>>> return an array.  (Because of the way that ranges are designed, you can't just
>>> use the call stack, unless you use fibers or something, which is obviously not
>>> a win from an efficiency point of view.)
>> Hum, are you saying opApply is superior when it comes to iterating in a
>> tree? It seems that with opApply you could use the stack by recursively
>> calling opApply with the given delegate on each one of the branches.
> 
> Exactly.  On the other hand, you lose a lot of flexibility with opApply.  If you
> tried to port most of std.range to operate on the opApply interface instead of the
> forward range interface, I doubt you'd get very far.
> 
> IMHO, though, opApply should *not* be deprecated.  opApply and ranges attempt to
> solve similar problems, but not the same problem.  Ranges attempt to solve the
> problem of iterating over some object with maximum flexibility from the point of
> view of the user of the object.  opApply attempts to solve the problem of
> iterating with maximum flexibility from the point of view of the implementer of
> the object.  In some cases, like the one you just described, the latter can be
> better.  However, it might make sense to document and give examples of this
> tradeoff somewhere.

1. Depth of iteration is low, so a short vector optimization will solve the problem most of the time without allocating extra memory.

2. I haven't measured, but the cost of the indirect call is large enough to make me suspect that opApply is not as efficient as it's cracked up to be, even when compared with an iterator.


Andrei
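
As an illustration of point 1 above (invented names, not a proposed druntime implementation): a key range over a tree can carry a small fixed-size traversal stack inline, sized well beyond the depth any remotely balanced tree will reach, so in practice it never allocates; a real version would spill to the heap if the depth ever exceeded the buffer.

struct TreeKeyRange(K)
{
    static struct Node { K key; Node* left, right; }

    private Node*[64] stack;   // the "short vector": lives inside the range
    private size_t    len;     // number of live entries

    this(Node* root) { pushLeftSpine(root); }

    bool empty() { return len == 0; }
    K front()    { return stack[len - 1].key; }

    void popFront()
    {
        auto n = stack[--len];
        pushLeftSpine(n.right);      // descend to the in-order successor
    }

    private void pushLeftSpine(Node* n)
    {
        for (; n !is null; n = n.left)
            stack[len++] = n;
    }
}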
April 27, 2009
Steve Teale wrote:
> Before something like Ranges are implemented, there should be some
> sort of RFC process where they are properly described rather than a
> reliance on D users to have read every thread of the newsgroup, and
> remembered it all.

There was. Incidentally, it was called "RFC on range design for D2".

I understand your frustration, but if you think for a minute, you realize your comments are uncalled for. We have discussed ranges at length and the community has had a long time to establish semantics and even names. I have (early, often, and repeatedly) warned the community that there will be a Phobos update that is bound to break a lot of code. I actively tried to massage several breaking changes into one single release so as to sweeten the pill. That all took a lot of time and thought from several people (Walter, Sean, and myself) who have better things to do. Now it would be great if you put yourself in our shoes, read your comments again, and tell me how they reflect on the writer.

You mention you want to see ranges described in terms that an old hand can understand. That's great. But that one good request was marred by comments that show you are more interested in reaffirming your prejudice than in overcoming it.


Andrei