| Thread overview | ||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
March 04, 2010 Container hierarchy vs. container types | ||||
|---|---|---|---|---|
| ||||
Hi everyone, I'm on a trip to Romania visiting relatives. Being disconnected for a while, I thought some about container types and started writing some code. It didn't take long to figure that a classic class hierarchy with inheritance etc. would contain prohibitively many types (with awkward names too), or would lose a lot of expressiveness. There are just so many types of containers, and they have so different characteristics. My second shot was with a capability-based architecture suggested (in an unrelated context) by Scott Meyers (http://www.artima.com/cppsource/codefeatures.html - good article, recommended regardless). Some capabilities I came up with were: enum ContCaps : uint { hasLength = 1, hasAssignableLength = 2, hasPushBack = 4, hasRemoveBack = 8, hasPushFront = 16, hasRemoveFront = 32, linear = 64, hasFront = 128, hasBack = 256, randomAccess = 512, } A container would look like this: interface Container(T, uint caps = 0) ... The trick is that any container could contain more or less any free combination of capabilities. There may be a few combinations that don't quite make sense, but most do. The capability-based design prescribes that a container with N capabilities inherits each and every container with N-1 capabilities obtained by chopping away one capability in turn. For example, a container with Container!(int, 19) would inherit Container!(int, 18), Container!(int, 17), and Container!(int, 3). That was a bit difficult to implement, but I pulled it off with a string mixin. The problem is that the number of effective base interfaces grows exponentially with the number of capabilities. (That's a problem also experienced by Scott.) If I instantiated a Container with all capabilities, the compiler would hang forever. If I had 6-7 capabilities, the code would compile after a while but would generate interfaces that have size 140KB-260KB excluding payload, which is prohibitive. The joys of inheritance hierarchies! The design doesn't even address an important aspect that I'm still mulling over - how do I model whether the container exposes lvalues or is completely encapsulated? For example a trie is unable to expose the lvalues of the arrays it holds because it distributes storage. But wait, there's more. Containers must expose ranges which face similar issues. Here are my incomplete range capabilities: enum RangeCaps { rvalueElem = 1, assignableElem = 2, swappableElem = 4, infinite = 8, noLength = 16, } These capabilities are orthogonal on the fundamental range kinds (input, forward, double-ended, random). So we're looking at not one thick hierarchy, but two! I decided to let no assumption unchallenged and got back to the question: do we really need a container _hierarchy_? How much joy and usefulness do people derive from dynamically changing an array with a list under the same interface? Or a rb-tree vs. a binary tree vs. a left-leaning binary tree? As far as I can tell such changes of containers are a design-time decision, and extremely rarely (never as far as I remember) a runtime decision to hide behind a binary interface. I'm starting to think that it's quite the other way around, and a Sapir-Whorf kind of thing: The fact that Java lacks symbol aliasing capability conditions the thinking that a hierarchy is the right thing to do, whereas the true solution is to design a flat federation of container types with name- and semantics-compatible primitives, to then use them directly or via design-time aliases. Obviously I might be wrong in any number of ways. Switching containers dynamically might be very useful. The precision, flexibility, and beauty I'm aiming for may be useless. Container hierarchies may have some other awesomeness I'm failing to see. But for the time being I'm very seriously considering a simple design: * Each specific container is an unconnected class (or struct, haven't decided yet - probably structs to accommodate reference-counted containers); * Naming matches dictate semantic matches. There should be no false friends - if a container defines removeFront, that should mean the same thing as for other containers (i.e. O(1) removal of the first element). This design follows and enriches STL's design even after we have, as a community, seen many hierarchy-based designs, and even though it looks like hierarchies have "won". The main improvements I want to bring to STL's design are (a) a better categorization of containers, (b) of course replace iterators with ranges, (c) either eliminate allocators or make them work. Needless to say, I'd be very curious to hear other opinions in this matter. Please speak up - the future of D depends, in small part, on this too. Andrei | ||||
March 05, 2010 Re: Container hierarchy vs. container types | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Andrei Alexandrescu: > As far as I can tell such changes of containers are a design-time decision, and extremely rarely (never as far as I remember) a runtime decision to hide behind a binary interface. In your post there's lot of stuff to think about. I can see you are trying to innovate, so here's a comment about data structures of the future. In Gcc 4.5 they have added a profile mode: http://gcc.gnu.org/onlinedocs/libstdc++/manual/profile_mode.html It gives you simple suggestions, based on simple statistics collected at run-time, about data structures choice. Today data structures are mostly chosen at compile time by the programmer, but once the runtime has those statistics collected at run-time, you can think of feeding that data into the program itself at runtime, so data structures can adapt themselves. This can be useful for programs that run on servers, for minutes, hours or days, where they have to face different workloads as time passes. Bye, bearophile | |||
March 05, 2010 Re: Container hierarchy vs. container types | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bearophile | bearophile wrote:
> Andrei Alexandrescu:
>> As far as I can tell such changes of containers are a design-time decision, and extremely rarely (never as far as I remember) a runtime decision to hide behind a binary interface.
>
> In your post there's lot of stuff to think about. I can see you are trying to innovate, so here's a comment about data structures of the future. In Gcc 4.5 they have added a profile mode:
>
> http://gcc.gnu.org/onlinedocs/libstdc++/manual/profile_mode.html
> It gives you simple suggestions, based on simple statistics collected at run-time, about data structures choice.
>
> Today data structures are mostly chosen at compile time by the programmer, but once the runtime has those statistics collected at run-time, you can think of feeding that data into the program itself at runtime, so data structures can adapt themselves. This can be useful for programs that run on servers, for minutes, hours or days, where they have to face different workloads as time passes.
>
> Bye,
> bearophile
Good point, perfect. I agree with that, and I meant to massage that point within the message. A structure that's an e.g. set that adapts its strategy dynamically depending on load is a perfect example. I still don't think that's the job of a hierarchy - it's the encapsulation of an algebraic type together with a strategy for choosing the type currently used.
Apologies for the dense message. I wanted to concentrate as much info in as little time/space as possible.
Andrei
| |||
March 05, 2010 Re: Container hierarchy vs. container types | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Thu, 04 Mar 2010 18:22:55 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote: [snip] I like this idea. I've gotten used to using template constraints and ranges when coding which matches this strategy naturally. A lot of the capabilities seem to match ranges, but why did you use different names for some of them? removeBack : popBack removeFront: popFront pushBack : put pushFront : putFront? | |||
March 05, 2010 Re: Container hierarchy vs. container types | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | /*Me ends lurking*/ As far as allocators go, you may want to take a look at how eastl handles them: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html#eastl_allocator /*Resumes Lurking*/ Andrei Alexandrescu Wrote: > bearophile wrote: > > Andrei Alexandrescu: > >> As far as I can tell such changes of containers are a design-time decision, and extremely rarely (never as far as I remember) a runtime decision to hide behind a binary interface. > > > > In your post there's lot of stuff to think about. I can see you are trying to innovate, so here's a comment about data structures of the future. In Gcc 4.5 they have added a profile mode: > > > > http://gcc.gnu.org/onlinedocs/libstdc++/manual/profile_mode.html > > It gives you simple suggestions, based on simple statistics collected at run-time, about data structures choice. > > > > Today data structures are mostly chosen at compile time by the programmer, but once the runtime has those statistics collected at run-time, you can think of feeding that data into the program itself at runtime, so data structures can adapt themselves. This can be useful for programs that run on servers, for minutes, hours or days, where they have to face different workloads as time passes. > > > > Bye, > > bearophile > > Good point, perfect. I agree with that, and I meant to massage that point within the message. A structure that's an e.g. set that adapts its strategy dynamically depending on load is a perfect example. I still don't think that's the job of a hierarchy - it's the encapsulation of an algebraic type together with a strategy for choosing the type currently used. > > Apologies for the dense message. I wanted to concentrate as much info in as little time/space as possible. > > > Andrei | |||
March 05, 2010 Re: Container hierarchy vs. container types | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Andrei Alexandrescu Wrote: > I decided to let no assumption unchallenged and got back to the question: do we really need a container _hierarchy_? How much joy and usefulness do people derive from dynamically changing an array with a list under the same interface? Or a rb-tree vs. a binary tree vs. a left-leaning binary tree? > As far as I can tell such changes of containers are a design-time decision, and extremely rarely (never as far as I remember) a runtime decision to hide behind a binary interface. I'm starting to think that it's quite the other way around, and a Sapir-Whorf kind of thing: The fact that Java lacks symbol aliasing capability conditions the thinking that a hierarchy is the right thing to do, whereas the true solution is to design a flat federation of container types with name- and semantics-compatible primitives, to then use them directly or via design-time aliases. I've rarely found it helpful that an ArrayList is a List, or that HashMap is a Map. If an API I'm passing to takes the base class, it could easily have been done as a template. And dynamic switching hasn't ever been useful to me. As long as the API is the same, recompiling will work and the containers will continue to behave well. If dynamic switching is really necessary for an application, it can be done by wrapping the container with a class hierarchy as needed. > Obviously I might be wrong in any number of ways. Switching containers dynamically might be very useful. The precision, flexibility, and beauty I'm aiming for may be useless. Container hierarchies may have some other awesomeness I'm failing to see. But for the time being I'm very seriously considering a simple design: > > * Each specific container is an unconnected class (or struct, haven't decided yet - probably structs to accommodate reference-counted containers); Also, structs would reduce the indirections required to access the data, when placing them inside other aggregates as well as reduce memory allocation traffic. I'd rather not pay for 2 allocations to create an instance of: class C { vector!(int) v; } > * Naming matches dictate semantic matches. There should be no false friends - if a container defines removeFront, that should mean the same thing as for other containers (i.e. O(1) removal of the first element). This I would hesitate a bit on. I recently changed STL maps for custom hashmaps and not changing operator[] for different code is useful. I'm changing the computational complexity of an operation but conceptually it's the same kind of access. I'm doing so because I know that for my application, the tradeoffs elsewhere are acceptable. > This design follows and enriches STL's design even after we have, as a community, seen many hierarchy-based designs, and even though it looks like hierarchies have "won". The main improvements I want to bring to STL's design are (a) a better categorization of containers, (b) of course replace iterators with ranges, (c) either eliminate allocators or make them work. > > Needless to say, I'd be very curious to hear other opinions in this matter. Please speak up - the future of D depends, in small part, on this too. The concept of allocators is nice in principal, though I've never had to put them into action. A couple of places I could see having used STL allocators: wrapping JNI memory or GC memory in an STL container. Neither is particularly necessary in D. Another place I could envision an allocator is if I load a large static data structure with mmap and want to wrap an STL structure around it. I've wanted to do this in particular with strings. This is where C++ allocators fall down at least. Trying to mix custom allocated data structures with standard ones is going to be painful. Jerry | |||
March 05, 2010 Re: Container hierarchy vs. container types | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | I'd definitely go with the unconnected class solution. Sometimes it's useful to be able to have a class hierarchy where you don't have to care whether about things like whether a Map is a HashMap or a SortedMap, but it frequently comes back to bite you - especially when they require that different operations be implemented by the elements they hold. It can easily lead to an inefficient use of containers - especially after changing one type for another. Additionally, from a performance perspective, by avoiding the class hierarchy, it's possible for them to be final and allow inlining and such. So, there are performance gains by avoiding the class hiercharchy. Also, I recall Scott Meyers talking in Effective STL how containers aren't truly interchangeable and how you can get yourself into trouble when you try and swap one out for another. You really should be aware of what container type your using and what it's strengths and weaknesses are. What it generally comes down to is that you want containers to have the same functions when they do the same thing, so that it's possible to swap one for another when you want to. The same API is generally enough for that. The fact that D has auto makes it that much easier. Being able to swap out vector and list or ArrayList and LinkedList can be useful in some circumstances, but having a List which could be either (or yet another unknown type) is not all that useful, generally speaking. And, as I said before, it makes it really easy to use such containers in inefficient ways since the performance of each operation of each operation can differ significantly depending on the type. It also leads to using only the functions which are the lowest common denominator, which means that the containers are not used to their full potential and are probably being used less efficiently. If you have a set of unconnected container classes and someone wants a hierarchy of containers for their project, it would be quite simple to set up a set of wrapper classes for the containers that they want to be in a hierarchy. On the other hand, you can't take a class hierarchy and use a wrapper class to turn it into a flat set of unconnected containers. I say that it makes sense for functions with the same behavior to have the same names, so that you can effectively duck type the containers, but I don't see much benefit in putting them in a hierarchy where you supposedly don't have to care about what the actual container is that you're using as long as it implements a particular interface. In any case, I'm glad to see that you're finally working on the container problem. I think that it's easily the thing that phobos lacks most, and it would definitely be easier to get people to use D, if there were containers. Most programmers I know would probably not touch D as long as there are no standard containers. It'll be good once we have them. And hopefully, they'll be awesome. - Jonathan M Davis | |||
March 05, 2010 Re: Container hierarchy vs. container types | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article > Hi everyone, > I'm on a trip to Romania visiting relatives. Being disconnected for a > while, I thought some about container types and started writing some code. > It didn't take long to figure that a classic class hierarchy with > inheritance etc. would contain prohibitively many types (with awkward > names too), or would lose a lot of expressiveness. There are just so > many types of containers, and they have so different characteristics. > My second shot was with a capability-based architecture suggested (in an > unrelated context) by Scott Meyers > (http://www.artima.com/cppsource/codefeatures.html - good article, > recommended regardless). Some capabilities I came up with were: > enum ContCaps : uint > { > hasLength = 1, > hasAssignableLength = 2, > hasPushBack = 4, > hasRemoveBack = 8, > hasPushFront = 16, > hasRemoveFront = 32, > linear = 64, > hasFront = 128, > hasBack = 256, > randomAccess = 512, > } This is why I hate nominative typing (http://en.wikipedia.org/wiki/Nominative_type_system). | |||
March 05, 2010 Re: Container hierarchy vs. container types | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Andrei Alexandrescu wrote:
> Apologies for the dense message. I wanted to concentrate as much info in as little time/space as possible.
Actually, thank you for making a concentrated message.
| |||
March 05, 2010 Re: Container hierarchy vs. container types | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Andrei Alexandrescu wrote:
> I decided to let no assumption unchallenged and got back to the question: do we really need a container _hierarchy_? How much joy and usefulness do people derive from dynamically changing an array with a list under the same interface? Or a rb-tree vs. a binary tree vs. a left-leaning binary tree?
I've never, in 35 years of programming, ever had a need to change the container type at runtime. I don't think the thought has ever even crossed my mind. I appreciate bearophile's point of runtime profiling being used to change the container, but I suggest that such containers be internally self-modifying, rather than externally using user-visible virtual functions.
I more and more like the idea of having a set of names defined by convention that have conventional behaviors, in other words, duck typing. It's infinitely extendable and flexible without needing to modify the core library in any way. It also fits right in with D's template constraints.
The bit flag approach is not pretty. Furthermore, to add a capability means one has to redo that enum in the core library, which is not scalable and people don't like making their own custom core libraries. Even if they did, it makes for mutually incompatible third party libraries. Yuk.
| |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply