June 05, 2012
On Tuesday, 5 June 2012 at 04:42:07 UTC, Dmitry Olshansky wrote:
> On 05.06.2012 2:28, Roman D. Boiko wrote:
>> My example concern was about fundamental problem of range APIs for
>> immutable data structures, which is not possible to emulate: popFront is
>> mutating by design.
>
> Keep in mind Ranges are temporary object most of the time. They are grease for wheels of algorithms. Given data structure S, it's  range is R(element of S). Thus for immutable data structure range will be mutable entity of immutable element type.
>
> Interesting example is immutable strings, that still have ranges over them, that even return dchar not an immutable(char).

I'm currently trying this approach (instead of working with immutable data structures directly, which would require recursion everywhere) in my experimental functional data structures, and it looks promising. :)
June 05, 2012
On Monday, 4 June 2012 at 22:22:34 UTC, Roman D. Boiko wrote:
> On Monday, 4 June 2012 at 22:06:49 UTC, Dmitry Olshansky wrote:
>> On 05.06.2012 1:56, Roman D. Boiko wrote:
>>> so range API doesn't fit that (but could with a tweak - returning tail instead of mutating in popFront). If trie API will have similar problems, then I need to invent my own. I understand that immutability is not your priority for GSoC, though.
>>
>> Well I might remove obstacles, if you outline your design more clearly.
> OK, thanks! I'll go through your code first to understand it better. But even before that I need to finish an important support request from my past customer...
Basically, the most important aspect of immutability is returning a new instance of data structure on each insert / delete / update, and keeping the old one unchanged, instead of performing update in-place. This may not fit your most common use cases, though. Would be great to enable such semantics via policies, but without deep analysis I can't come up with a good API / design for that (without overcomplicating it). Probably keeping mutable and immutable APIs separate is the best choice. Will return to this problem once I get a bit of free time.
June 05, 2012
On Tuesday, 5 June 2012 at 05:28:48 UTC, Roman D. Boiko wrote:
> ... without deep analysis I can't come up with a good API / design for that (without overcomplicating it). Probably keeping mutable and immutable APIs separate is the best choice. Will return to this problem once I get a bit of free time.
Simplest and possibly the best approach is to provide an immutable wrapper over mutable implementation, but that may be difficult to make efficient given the need to support insert / delete as common operations.

June 05, 2012
On 05.06.2012 9:33, Roman D. Boiko wrote:
> On Tuesday, 5 June 2012 at 05:28:48 UTC, Roman D. Boiko wrote:
>> ... without deep analysis I can't come up with a good API / design for
>> that (without overcomplicating it). Probably keeping mutable and
>> immutable APIs separate is the best choice. Will return to this
>> problem once I get a bit of free time.
> Simplest and possibly the best approach is to provide an immutable
> wrapper over mutable implementation, but that may be difficult to make
> efficient given the need to support insert / delete as common operations.
>

I suspect I would have to add another policy like Persistent that will preallocate some slack space implicitly so that some pages can be shallowly copied on each assign(a-la COW) and immutable parts still reused.
Another simpler way would be to use an separate field "pointer-to-actual base" for each level, thus allowing it to redirect it away to new (modified) copy. Still looks like policy as it _maybe_  slightly slower.

Anyway as a start this should work:

auto modifyDupGlobalImmutableTrie(Trie(immutable(T), ...) t
	, scope delegate(Trie(immutable(T), ...) )pure dg) __pure__
{
	auto copy = t.dup;//this would be one day a shallow copy
	with(copy)
	{	
		dg(copy);
	}
	return copy;
}

//later on
{
...
immutable newTrie = modifyDupGlobalImmutableTrie(yourTrie);
...
}

-- 
Dmitry Olshansky
June 05, 2012
On Tuesday, 5 June 2012 at 12:57:10 UTC, Dmitry Olshansky wrote:
> On 05.06.2012 9:33, Roman D. Boiko wrote:
>> On Tuesday, 5 June 2012 at 05:28:48 UTC, Roman D. Boiko wrote:
>>> ... without deep analysis I can't come up with a good API / design for
>>> that (without overcomplicating it). Probably keeping mutable and
>>> immutable APIs separate is the best choice. Will return to this
>>> problem once I get a bit of free time.
>> Simplest and possibly the best approach is to provide an immutable
>> wrapper over mutable implementation, but that may be difficult to make
>> efficient given the need to support insert / delete as common operations.
>>
>
> I suspect I would have to add another policy like Persistent that will preallocate some slack space implicitly so that some pages can be shallowly copied on each assign(a-la COW) and immutable parts still reused.
> Another simpler way would be to use an separate field "pointer-to-actual base" for each level, thus allowing it to redirect it away to new (modified) copy. Still looks like policy as it _maybe_  slightly slower.
>
> Anyway as a start this should work:
>
> auto modifyDupGlobalImmutableTrie(Trie(immutable(T), ...) t
> 	, scope delegate(Trie(immutable(T), ...) )pure dg) __pure__
> {
> 	auto copy = t.dup;//this would be one day a shallow copy
> 	with(copy)
> 	{	
> 		dg(copy);
> 	}
> 	return copy;
> }
>
> //later on
> {
> ...
> immutable newTrie = modifyDupGlobalImmutableTrie(yourTrie);
> ...
> }

Yes,something like that should work. I finished support request and will investigate this and your std.uni. maybe it is better to avoid immutability... or do bulk ins /del before  copy.
June 11, 2012
On Tuesday, 5 June 2012 at 13:27:12 UTC, Roman D. Boiko wrote:
> maybe it is better to avoid immutability... or do bulk ins /del before  copy.
Would it be difficult to introduce two optimizations:
* have strings of all lengths above configurable threshold go to the same bucket (is it reasonable?)
* have ability to store a checkpoint, then do a series of modifications, and store another checkpoint but reuse data which have not been affected.

For the second optimization a possible design would be to store internal state as snapshot + diffs, and apply diffs when creating another snapshot. Diff format should allow efficiently performing trie interface operations.


June 11, 2012
On 11.06.2012 16:11, Roman D. Boiko wrote:
> On Tuesday, 5 June 2012 at 13:27:12 UTC, Roman D. Boiko wrote:
>> maybe it is better to avoid immutability... or do bulk ins /del before
>> copy.
> Would it be difficult to introduce two optimizations:
> * have strings of all lengths above configurable threshold go to the
> same bucket (is it reasonable?)

The sample I listed did just that: for length >= 64 it goes to bucket of 0-length strings. After all it's your prefix function, with any distribution you like ;)

> * have ability to store a checkpoint, then do a series of modifications,
> and store another checkpoint but reuse data which have not been affected.
>
yea, that's a rough sketch of what I want to support.

> For the second optimization a possible design would be to store internal
> state as snapshot + diffs, and apply diffs when creating another
> snapshot. Diff format should allow efficiently performing trie interface
> operations.

It may be. Diff could be a bunch of pages that are XOR-ed on top of snapshot. Dunno if it's worthwhile trick yet.


-- 
Dmitry Olshansky
June 11, 2012
On Monday, 11 June 2012 at 14:48:27 UTC, Dmitry Olshansky wrote:
> On 11.06.2012 16:11, Roman D. Boiko wrote:
>> On Tuesday, 5 June 2012 at 13:27:12 UTC, Roman D. Boiko wrote:
>>> maybe it is better to avoid immutability... or do bulk ins /del before
>>> copy.
>> Would it be difficult to introduce two optimizations:
>> * have strings of all lengths above configurable threshold go to the same bucket (is it reasonable?)
>
> The sample I listed did just that: for length >= 64 it goes to bucket of 0-length strings. After all it's your prefix function, with any distribution you like ;)
Great

>> * have ability to store a checkpoint, then do a series of modifications,
>> and store another checkpoint but reuse data which have not been affected.
>>
> yea, that's a rough sketch of what I want to support.
That's wonderful :)

>> For the second optimization a possible design would be to store internal state as snapshot + diffs, and apply diffs when creating another snapshot. Diff format should allow efficiently performing trie interface operations.
>
> It may be. Diff could be a bunch of pages that are XOR-ed on top of snapshot. Dunno if it's worthwhile trick yet.
It should be, provided that a single insert / delete doesn't affect many pages and size of page is reasonably small. Only need to create pages corresponding to those which are affected; and there is no need to track separate diffs between snapshots, so they can be combined.

Another option might be creating a separate trie for insertions and tracking deletions somehow, provided that tries can be merged efficiently.

Actually, in my case, deletions could be deferred and performed in bulk. OTOH, I will need to count how many times a string is inserted minus number of times it has been deleted. Alternatively, I could just check from time to time which strings are not needed any more (micro-GC :) ).

There are many possible data structures, but the one you mentioned seems to be the most reasonable.
June 11, 2012
On 11.06.2012 20:43, Roman D. Boiko wrote:
[snip]
>
> Actually, in my case, deletions could be deferred and performed in bulk.
> OTOH, I will need to count how many times a string is inserted minus
> number of times it has been deleted. Alternatively, I could just check
> from time to time which strings are not needed any more (micro-GC :) ).
>
> There are many possible data structures, but the one you mentioned seems
> to be the most reasonable.

I meant an operation pseudo-XOR x P^ y where x is part of snapshot and y is part of diff page.

x P^ y == x      when y == T.init
x P^ y == y      when y != T.init

-- 
Dmitry Olshansky
June 11, 2012
On Monday, 11 June 2012 at 18:16:45 UTC, Dmitry Olshansky wrote:
> On 11.06.2012 20:43, Roman D. Boiko wrote:
> [snip]
>>
>> Actually, in my case, deletions could be deferred and performed in bulk.
>> OTOH, I will need to count how many times a string is inserted minus
>> number of times it has been deleted. Alternatively, I could just check
>> from time to time which strings are not needed any more (micro-GC :) ).
>>
>> There are many possible data structures, but the one you mentioned seems
>> to be the most reasonable.
>
> I meant an operation pseudo-XOR x P^ y where x is part of snapshot and y is part of diff page.
>
> x P^ y == x      when y == T.init
> x P^ y == y      when y != T.init
I understood, just mentioned some alternatives (maybe not reasonable given your trie implementation).