May 23, 2012
On Wed, 23 May 2012 17:54:42 -0400, Roman D. Boiko <rb@d-coding.com> wrote:

> On Wednesday, 23 May 2012 at 19:51:02 UTC, Steven Schveighoffer wrote:
>> If you need a sorted tree structure that supports sharing immutable state, you likely have to find a different algorithm than redblack, since adding nodes can modify the tree structure via rotates.
>>
>> -Steve
>
> I don't need to invent here, and it is definitely feasible. Okasaki provided efficient algorithm for inserting nodes, and, IIRC, rotations (not for deleting, though). But OCaml doesn't map to D easily (neither does Haskel, I think).

I looked up how it could be done, the one thing I was not sure of was the parent node.  If you can have multiple references to the same subtree, how do you keep track of the parents.  But if you only are concerned about the ancestry of a single node, you can dynamically determine the line in O(lgn) time, and use that for the running of the redblack algorithm.

I'm not sure how efficient it would be.

-Steve
May 23, 2012
On Wednesday, 23 May 2012 at 22:25:35 UTC, Steven Schveighoffer wrote:
> I looked up how it could be done, the one thing I was not sure of was the parent node.  If you can have multiple references to the same subtree, how do you keep track of the parents.  But if you only are concerned about the ancestry of a single node, you can dynamically determine the line in O(lgn) time, and use that for the running of the redblack algorithm.
>
> I'm not sure how efficient it would be.
>
> -Steve

My problem is the following:

n elements in some given order, each has an associated value; I need to retrieve elements for which the sum of values associated with previous elements lies in a given ragne; this should work efficiently when elements are added or removed, and history should be preserved.

Storing elements as tree leafs and sums of children values in other nodes gives me this easily.

Lookup price is 0(log N) number comparisons, which should be very fast. Insert / delete / rotate are also logarithmic, but I don't know the constant factors.

No need to retrieve parents.

So far it seems like RBT is perfect for me (if cache friendly implementation will be possible. Hopefully it will, because nodes are small and I can preallocate memory for them).
May 23, 2012
Roman D. Boiko:

> The end goal is to have something as easy to work with as collections of Scala or F# (I don't know Haskell), and, of course, efficient.

If you plan in creating a start of purely functional collections library then I suggest you to also take a look at Clojure and various Haskell collections. Clojure is very based on persistent purely functional data structures, used in modern ways. And in the Haskell tribe there are some very smart persons that have worked on those things for fifteen years. So spending one extra week reading and looking at those two worlds may help you a lot later, and give you many good/better ideas.

Bye,
bearophile
May 24, 2012
Andrei Alexandrescu:

> Yah, FP doesn't like arrays and immutable containers essentially eschew arrays altogether.

In the Chapel/X10 worlds I have read about various strategies to use mutable arrays too in parallel/concurrent situations. Probably some of those ideas will be quite useful in D too.


> (The most surprising thing to me is that the FP community
> generally fails to acknowledge that as a problem.)

Again and again I have seen how modern CPUs like linear scans on (small) arrays of values. Sometimes the performance is surprising, and beats almost everything else, including binary searches on the same small array.

This sometimes means that using a single core on such mutable arrays from C/C++/D leads to faster code (and far less electricity consumed) than using 4-8 cores on less optimal data structures (like structures with many pointers).

This is also why your idea of "small string optimization" was probably leading to significantly higher performance, and why I think the built-in D AAs will need a "small associative array" optimization (that even Python dicts have, for a number of items less than 9), BigInts will need a small value (no heap) optimization, and so on for other Phobos data structures (including lists!). Often such optimizations means just storing the few values in a small array instead of a fancier data structure (like a list), so adding such optimizations is usually not hard.

Bye,
bearophile
May 24, 2012
On Wednesday, 23 May 2012 at 23:51:45 UTC, bearophile wrote:
> Roman D. Boiko:
>
>> The end goal is to have something as easy to work with as collections of Scala or F# (I don't know Haskell), and, of course, efficient.
>
> If you plan in creating a start of purely functional collections library then I suggest you to also take a look at Clojure and various Haskell collections. Clojure is very based on persistent purely functional data structures, used in modern ways. And in the Haskell tribe there are some very smart persons that have worked on those things for fifteen years. So spending one extra week reading and looking at those two worlds may help you a lot later, and give you many good/better ideas.
>
> Bye,
> bearophile

Thanks, I will!
May 24, 2012
On Wednesday, May 23, 2012 23:58:42 Roman D. Boiko wrote:
> On Wednesday, 23 May 2012 at 21:12:04 UTC, Jonathan M Davis wrote:
> > In my personal experience, that's not true at all. I've seen
> > _huge_
> > performance problems in Haskell when using a map. There may be
> > cases where an
> > immutable container may not pose large performance problems,
> > but I would be
> > _very_ wary of using immutable containers, and _very_ careful
> > with them when I
> > did.
> > 
> > - Jonathan M Davis
> 
> I expect problems with eliminating tail calls (especially for mutually recursive functions), but cannot find any other reason, theoretical or implementation, that would necessarily make it execute slowly in D. I think that cache friendliness is possible to achieve, at least in my use cases. What else could go wrong?

As part of my thesis work, I wrote a program which was counting possible tokens in code (as part of an AI attempting to learn about a programming language), which required a map of tokens to the number of times that they'd been seen. Because I had previously been doing stuff in Haskell for my thesis, I continued to use Haskell for that portion, and it was a _huge_ mistake, because the performance was _terrible_ - as in it could only process a few files in a day kind of terrible. I ultimately had to switch over to another language to get it to work (it ended up being Java, but I could have used a variety of other languages).

Maybe if I had simply been adding elements to the map, it wouldn't have been anywhere near as bad, but since I had to mutate them (and both the map and its elments were technically immutable), that become _insanely_ expensive.

So, it's quite possible that you have a use case where using immutable containers works really well and isn't a problem, but in what I've dealt with personally, I've found it to be a huge performance problem and so am very leery of immutable containers.

- Jonathan M Davis
May 24, 2012
On Thursday, 24 May 2012 at 01:00:52 UTC, Jonathan M Davis wrote:
> As part of my thesis work, I wrote a program which was counting possible
> tokens in code (as part of an AI attempting to learn about a programming
> language), which required a map of tokens to the number of times that they'd
> been seen. Because I had previously been doing stuff in Haskell for my thesis,
> I continued to use Haskell for that portion, and it was a _huge_ mistake,
> because the performance was _terrible_ - as in it could only process a few
> files in a day kind of terrible. I ultimately had to switch over to another
> language to get it to work (it ended up being Java, but I could have used a
> variety of other languages).
>
> Maybe if I had simply been adding elements to the map, it wouldn't have been
> anywhere near as bad, but since I had to mutate them (and both the map and its
> elments were technically immutable), that become _insanely_ expensive.
>
> So, it's quite possible that you have a use case where using immutable
> containers works really well and isn't a problem, but in what I've dealt with
> personally, I've found it to be a huge performance problem and so am very
> leery of immutable containers.
>
> - Jonathan M Davis

Is source code available?
May 24, 2012
On Wednesday, 23 May 2012 at 21:12:04 UTC, Jonathan M Davis wrote:
> On Wednesday, May 23, 2012 20:51:31 Roman D. Boiko wrote:
>> On Wednesday, 23 May 2012 at 18:39:02 UTC, Stewart Gordon wrote:
>> > On 23/05/2012 16:05, Roman D. Boiko wrote:
>> > <snip>
>> > 
>> >> I need some immutable collections for my D Compiler Tools
>> >> project, namely linked list,
>> >> red-black tree, and possibly some others.
>> > 
>> > <snip>
>> > 
>> > What's the point of an immutable red-black tree?
>> > 
>> > The whole point of a red-black tree is to facilitate fast
>> > dynamic addition, removal and lookup of elements.
>> > 
>> > When the tree is immutable, only lookup speed is of any real
>> > relevance, so you might as well create a perfectly balanced
>> > binary tree. Or even better, an array.
>> > 
>> > Stewart.
>> 
>> Immutable collections in most cases have the same performance
>> characteristics as mutable ones. Space consumption may be higher,
>> but not necessarily.
>> 
>> Instead of mutating a collection, a new one is returned each time
>> you add or remove an element. It shares most nodes with a
>> previous one.
>
> In my personal experience, that's not true at all. I've seen _huge_
> performance problems in Haskell when using a map. There may be cases where an
> immutable container may not pose large performance problems, but I would be
> _very_ wary of using immutable containers, and _very_ careful with them when I
> did.
>
> - Jonathan M Davis


While I am an Haskell newbie, I think most of those problems are
related to how hard it is for many developers to learn how to properly
optimize Haskell code.

The straightforward way to do some things is not the most performant, and
some optimizations require fine tuning of strict/lazy semantics on the
data structures manipulations.

--
Paulo

May 24, 2012
On Wednesday, 23 May 2012 at 22:39:33 UTC, Roman D. Boiko wrote:
> On Wednesday, 23 May 2012 at 22:25:35 UTC, Steven Schveighoffer wrote:
>> I looked up how it could be done, the one thing I was not sure of was the parent node.  If you can have multiple references to the same subtree, how do you keep track of the parents.  But if you only are concerned about the ancestry of a single node, you can dynamically determine the line in O(lgn) time, and use that for the running of the redblack algorithm.
>>
>> I'm not sure how efficient it would be.
>>
>> -Steve
>
> My problem is the following:
>
> n elements in some given order, each has an associated value; I need to retrieve elements for which the sum of values associated with previous elements lies in a given ragne; this should work efficiently when elements are added or removed, and history should be preserved.
>
> Storing elements as tree leafs and sums of children values in other nodes gives me this easily.
>
> Lookup price is 0(log N) number comparisons, which should be very fast. Insert / delete / rotate are also logarithmic, but I don't know the constant factors.
>
> No need to retrieve parents.
>
> So far it seems like RBT is perfect for me (if cache friendly implementation will be possible. Hopefully it will, because nodes are small and I can preallocate memory for them).

I am not sure if I entirely understand your problem, but would a
persistent search tree work? See:

http://www.sciencedirect.com/science/article/pii/0022000089900342

I have a small write up on them from a course I took years ago:

http://cg.scs.carleton.ca/~cdillaba/comp5008/sarnak.html

Regards,
Craig

 -Steve

May 24, 2012
On Thursday, 24 May 2012 at 13:15:30 UTC, Craig Dillabaugh wrote:
> I am not sure if I entirely understand your problem, but would a
> persistent search tree work? See:
>
> http://www.sciencedirect.com/science/article/pii/0022000089900342
>
> I have a small write up on them from a course I took years ago:
>
> http://cg.scs.carleton.ca/~cdillaba/comp5008/sarnak.html
Yes, it would. Actually, what I described is how to use RBT as
search tree. RBT is selected because it bounds number of levels
to the optimum, and thus gives log N worst case for all
operations.

I'll take a look at the links, thanks!
1 2 3
Next ›   Last »