June 04, 2012
On Monday, 4 June 2012 at 20:40:03 UTC, Dmitry Olshansky wrote:
> [snip]
> Sorry my Trie implementation focused on "constructe once - read everywhere" case.
My cases will likely have quite low amortized number of reads per insert / delete.

>> How do you handle situations when not existent, etc., is needed?
>>
> Easy say you have 2-level 4 entry each level (for sake of simplicity), say the final value is int.
>
> Then in highly redundant or just when there is little amount of values in initial data set (both cases are equivalent, thus tries are easily invertible btw):
>
> LVL 0: [0, 1, 0, 0]
> This first one always occupies full size (it's asserted that there is no index >= 4)
> LVL 1: [0, 0, 0, 0] [1, 0, 2, 3]
> Note again - only full pages, no cheating and half-pages, but we can save on the amount of them (note almost obligatory zero page)
>
> So it contains only 1, 2 and 3 at indexes of 4, 6, 7 respectively, T.init is our way of not EXISTS ... yes, I think that should be user definable.
> This way there is no checks anywhere, only shift, add dereference, shift, add, dereference...
Smart

>> For example, one by
>> one would allow ignoring key encoding (and thus using multiple encodings
>> simultaneously just as easily as single).
>
> It's just as easy with the whole thing. Treat it as bytes ;)
Except when equivalent keys in different encodings should be treated as equal. But now I can see that my counter-example is only partially valid - walklength could be used instead of length (more expensive, though), and dchars everywhere.

>> I guess making my own mistakes is necessary anyway.
>
> It could enlightening just don't give up too quickly and don't jump to conclusions. In fact try to be sympathetic with "loosing party", like in ... "hm this way is much slower, so bad - I have to optimize it somehow". In other words make sure you squeezed all you can from "slow" method.
This deserves quoting somewhere :) Thanks a lot and have a good night! (It's late in Russia, isn't it?)
June 04, 2012
On Monday, 4 June 2012 at 21:07:02 UTC, Roman D. Boiko wrote:
>>> For example, one by
>>> one would allow ignoring key encoding (and thus using multiple encodings
>>> simultaneously just as easily as single).
>>
>> It's just as easy with the whole thing. Treat it as bytes ;)
> Except when equivalent keys in different encodings should be treated as equal. But now I can see that my counter-example is only partially valid - walklength could be used instead of length (more expensive, though), and dchars everywhere.
Another counter-example is searching for strings with specified prefix. One-by-one fits better here. I didn't understand whether such use cases are supported at both API and implementation levels.
June 04, 2012
On 05.06.2012 1:16, Roman D. Boiko wrote:
> On Monday, 4 June 2012 at 21:07:02 UTC, Roman D. Boiko wrote:
>>>> For example, one by
>>>> one would allow ignoring key encoding (and thus using multiple
>>>> encodings
>>>> simultaneously just as easily as single).
>>>
>>> It's just as easy with the whole thing. Treat it as bytes ;)
>> Except when equivalent keys in different encodings should be treated
>> as equal.

I believe that once encoidng is established your compiler tool should use the best code for that encoding. And that means templates,  tailored per encoding in my book.
If anything I plan to use Tries on strings without decoding codepoint, just using length of it (as first stage, might need some tweak).

>> But now I can see that my counter-example is only partially
>> valid - walklength could be used instead of length (more expensive,
>> though), and dchars everywhere.
> Another counter-example is searching for strings with specified prefix.
> One-by-one fits better here. I didn't understand whether such use cases
> are supported at both API and implementation levels.

They are not... *yawn*. Okay, I'll make it support InputRange of typeof(Key.init[0]) along with specific key type iff key type is RandomAccessRange :) It will not work however with SetAsSlot and MapAsSlot.

And before you run away with that horrible idea of ever decoding UTF in lexer... Just don't do that. Trust me, it's not as small a price as it seems at first. At least keep it only at prototype stage as it simplifies things.

-- 
Dmitry Olshansky
June 04, 2012
On Monday, 4 June 2012 at 21:39:50 UTC, Dmitry Olshansky wrote:
> And before you run away with that horrible idea of ever decoding UTF in lexer... Just don't do that. Trust me, it's not as small a price as it seems at first. At least keep it only at prototype stage as it simplifies things.
I didn't plan to convert input into some other encoding. But missed the idea that it is possible to create finite automata as a template and avoid decoding altogether. IIRC, I rejected this approach when decided to convert everything into UTF-8 long ago, and didn't reconsider after discarding that idea after your previous suggestion to avoid converting. Thus your idea was used only partially, and now I wonder how did I not discover this myself! :)
June 04, 2012
On Monday, 4 June 2012 at 21:39:50 UTC, Dmitry Olshansky wrote:
> On 05.06.2012 1:16, Roman D. Boiko wrote:
> I believe that once encoidng is established your compiler tool should use the best code for that encoding. And that means templates,  tailored per encoding in my book.
> If anything I plan to use Tries on strings without decoding codepoint, just using length of it (as first stage, might need some tweak).
Will it be difficult to adapt your API for immutable tries? E.g., it is not possible to implement immutable sequence (linked list) as a range, so range API doesn't fit that (but could with a tweak - returning tail instead of mutating in popFront). If trie API will have similar problems, then I need to invent my own. I understand that immutability is not your priority for GSoC, though.
June 04, 2012
On 05.06.2012 1:56, Roman D. Boiko wrote:
> On Monday, 4 June 2012 at 21:39:50 UTC, Dmitry Olshansky wrote:
>> On 05.06.2012 1:16, Roman D. Boiko wrote:
>> I believe that once encoidng is established your compiler tool should
>> use the best code for that encoding. And that means templates,
>> tailored per encoding in my book.
>> If anything I plan to use Tries on strings without decoding codepoint,
>> just using length of it (as first stage, might need some tweak).
> Will it be difficult to adapt your API for immutable tries? E.g., it is
> not possible to implement immutable sequence (linked list) as a range,

Linked list? I'm horrified. Though I'd need some info on where and why you'd need that ;)
Despite some claims to the contrary small arrays (like e-hm 10_000 elements) are faster in nearly all operations possible.

> so range API doesn't fit that (but could with a tweak - returning tail
> instead of mutating in popFront). If trie API will have similar
> problems, then I need to invent my own. I understand that immutability
> is not your priority for GSoC, though.

Well I might remove obstacles, if you outline your design more clearly.

-- 
Dmitry Olshansky
June 04, 2012
On 05.06.2012 2:06, Dmitry Olshansky wrote:
> On 05.06.2012 1:56, Roman D. Boiko wrote:
>> On Monday, 4 June 2012 at 21:39:50 UTC, Dmitry Olshansky wrote:
>>> On 05.06.2012 1:16, Roman D. Boiko wrote:
>>> I believe that once encoidng is established your compiler tool should
>>> use the best code for that encoding. And that means templates,
>>> tailored per encoding in my book.
>>> If anything I plan to use Tries on strings without decoding codepoint,
>>> just using length of it (as first stage, might need some tweak).
>> Will it be difficult to adapt your API for immutable tries? E.g., it is
>> not possible to implement immutable sequence (linked list) as a range,
>
> Linked list? I'm horrified. Though I'd need some info on where and why
> you'd need that ;)
> Despite some claims to the contrary small arrays (like e-hm 10_000
> elements) are faster in nearly all operations possible.

And you can fake immutability, buy always picking up an unused slot btw. No need to go beyond logical immutability.


-- 
Dmitry Olshansky
June 04, 2012
On Monday, 4 June 2012 at 22:06:49 UTC, Dmitry Olshansky wrote:
> On 05.06.2012 1:56, Roman D. Boiko wrote:
>> Will it be difficult to adapt your API for immutable tries? E.g., it is
>> not possible to implement immutable sequence (linked list) as a range,
>
> Linked list? I'm horrified. Though I'd need some info on where and why you'd need that ;)
> Despite some claims to the contrary small arrays (like e-hm 10_000 elements) are faster in nearly all operations possible.
That was for illustration only, as the most trivial example of immutable data structure (and probably the most widely used structure in functional programming).

>> so range API doesn't fit that (but could with a tweak - returning tail
>> instead of mutating in popFront). If trie API will have similar
>> problems, then I need to invent my own. I understand that immutability
>> is not your priority for GSoC, though.
>
> Well I might remove obstacles, if you outline your design more clearly.
OK, thanks! I'll go through your code first to understand it better. But even before that I need to finish an important support request from my past customer...
June 04, 2012
On Monday, 4 June 2012 at 22:21:42 UTC, Dmitry Olshansky wrote:
> And you can fake immutability, buy always picking up an unused slot btw. No need to go beyond logical immutability.
That's applicable in some cases, not in general. But I agree that often it is possible to optimize if use cases are known.

My example concern was about fundamental problem of range APIs for immutable data structures, which is not possible to emulate: popFront is mutating by design.
June 05, 2012
On 05.06.2012 2:28, Roman D. Boiko wrote:
> On Monday, 4 June 2012 at 22:21:42 UTC, Dmitry Olshansky wrote:
>> And you can fake immutability, buy always picking up an unused slot
>> btw. No need to go beyond logical immutability.
> That's applicable in some cases, not in general. But I agree that often
> it is possible to optimize if use cases are known.
>
> My example concern was about fundamental problem of range APIs for
> immutable data structures, which is not possible to emulate: popFront is
> mutating by design.

Keep in mind Ranges are temporary object most of the time. They are grease for wheels of algorithms. Given data structure S, it's  range is R(element of S). Thus for immutable data structure range will be mutable entity of immutable element type.

Interesting example is immutable strings, that still have ranges over them, that even return dchar not an immutable(char).

-- 
Dmitry Olshansky