View mode: basic / threaded / horizontal-split · Log in · Help
June 04, 2012
Re: Making generalized Trie type in D
On Monday, 4 June 2012 at 20:40:03 UTC, Dmitry Olshansky wrote:
> [snip]
> Sorry my Trie implementation focused on "constructe once - read 
> everywhere" case.
My cases will likely have quite low amortized number of reads per 
insert / delete.

>> How do you handle situations when not existent, etc., is 
>> needed?
>>
> Easy say you have 2-level 4 entry each level (for sake of 
> simplicity), say the final value is int.
>
> Then in highly redundant or just when there is little amount of 
> values in initial data set (both cases are equivalent, thus 
> tries are easily invertible btw):
>
> LVL 0: [0, 1, 0, 0]
> This first one always occupies full size (it's asserted that 
> there is no index >= 4)
> LVL 1: [0, 0, 0, 0] [1, 0, 2, 3]
> Note again - only full pages, no cheating and half-pages, but 
> we can save on the amount of them (note almost obligatory zero 
> page)
>
> So it contains only 1, 2 and 3 at indexes of 4, 6, 7 
> respectively, T.init is our way of not EXISTS ... yes, I think 
> that should be user definable.
> This way there is no checks anywhere, only shift, add 
> dereference, shift, add, dereference...
Smart

>> For example, one by
>> one would allow ignoring key encoding (and thus using multiple 
>> encodings
>> simultaneously just as easily as single).
>
> It's just as easy with the whole thing. Treat it as bytes ;)
Except when equivalent keys in different encodings should be 
treated as equal. But now I can see that my counter-example is 
only partially valid - walklength could be used instead of length 
(more expensive, though), and dchars everywhere.

>> I guess making my own mistakes is necessary anyway.
>
> It could enlightening just don't give up too quickly and don't 
> jump to conclusions. In fact try to be sympathetic with 
> "loosing party", like in ... "hm this way is much slower, so 
> bad - I have to optimize it somehow". In other words make sure 
> you squeezed all you can from "slow" method.
This deserves quoting somewhere :) Thanks a lot and have a good 
night! (It's late in Russia, isn't it?)
June 04, 2012
Re: Making generalized Trie type in D
On Monday, 4 June 2012 at 21:07:02 UTC, Roman D. Boiko wrote:
>>> For example, one by
>>> one would allow ignoring key encoding (and thus using 
>>> multiple encodings
>>> simultaneously just as easily as single).
>>
>> It's just as easy with the whole thing. Treat it as bytes ;)
> Except when equivalent keys in different encodings should be 
> treated as equal. But now I can see that my counter-example is 
> only partially valid - walklength could be used instead of 
> length (more expensive, though), and dchars everywhere.
Another counter-example is searching for strings with specified 
prefix. One-by-one fits better here. I didn't understand whether 
such use cases are supported at both API and implementation 
levels.
June 04, 2012
Re: Making generalized Trie type in D
On 05.06.2012 1:16, Roman D. Boiko wrote:
> On Monday, 4 June 2012 at 21:07:02 UTC, Roman D. Boiko wrote:
>>>> For example, one by
>>>> one would allow ignoring key encoding (and thus using multiple
>>>> encodings
>>>> simultaneously just as easily as single).
>>>
>>> It's just as easy with the whole thing. Treat it as bytes ;)
>> Except when equivalent keys in different encodings should be treated
>> as equal.

I believe that once encoidng is established your compiler tool should 
use the best code for that encoding. And that means templates,  tailored 
per encoding in my book.
If anything I plan to use Tries on strings without decoding codepoint, 
just using length of it (as first stage, might need some tweak).

>> But now I can see that my counter-example is only partially
>> valid - walklength could be used instead of length (more expensive,
>> though), and dchars everywhere.
> Another counter-example is searching for strings with specified prefix.
> One-by-one fits better here. I didn't understand whether such use cases
> are supported at both API and implementation levels.

They are not... *yawn*. Okay, I'll make it support InputRange of 
typeof(Key.init[0]) along with specific key type iff key type is 
RandomAccessRange :) It will not work however with SetAsSlot and MapAsSlot.

And before you run away with that horrible idea of ever decoding UTF in 
lexer... Just don't do that. Trust me, it's not as small a price as it 
seems at first. At least keep it only at prototype stage as it 
simplifies things.

-- 
Dmitry Olshansky
June 04, 2012
Re: Making generalized Trie type in D
On Monday, 4 June 2012 at 21:39:50 UTC, Dmitry Olshansky wrote:
> And before you run away with that horrible idea of ever 
> decoding UTF in lexer... Just don't do that. Trust me, it's not 
> as small a price as it seems at first. At least keep it only at 
> prototype stage as it simplifies things.
I didn't plan to convert input into some other encoding. But 
missed the idea that it is possible to create finite automata as 
a template and avoid decoding altogether. IIRC, I rejected this 
approach when decided to convert everything into UTF-8 long ago, 
and didn't reconsider after discarding that idea after your 
previous suggestion to avoid converting. Thus your idea was used 
only partially, and now I wonder how did I not discover this 
myself! :)
June 04, 2012
Re: Making generalized Trie type in D
On Monday, 4 June 2012 at 21:39:50 UTC, Dmitry Olshansky wrote:
> On 05.06.2012 1:16, Roman D. Boiko wrote:
> I believe that once encoidng is established your compiler tool 
> should use the best code for that encoding. And that means 
> templates,  tailored per encoding in my book.
> If anything I plan to use Tries on strings without decoding 
> codepoint, just using length of it (as first stage, might need 
> some tweak).
Will it be difficult to adapt your API for immutable tries? E.g., 
it is not possible to implement immutable sequence (linked list) 
as a range, so range API doesn't fit that (but could with a tweak 
- returning tail instead of mutating in popFront). If trie API 
will have similar problems, then I need to invent my own. I 
understand that immutability is not your priority for GSoC, 
though.
June 04, 2012
Re: Making generalized Trie type in D
On 05.06.2012 1:56, Roman D. Boiko wrote:
> On Monday, 4 June 2012 at 21:39:50 UTC, Dmitry Olshansky wrote:
>> On 05.06.2012 1:16, Roman D. Boiko wrote:
>> I believe that once encoidng is established your compiler tool should
>> use the best code for that encoding. And that means templates,
>> tailored per encoding in my book.
>> If anything I plan to use Tries on strings without decoding codepoint,
>> just using length of it (as first stage, might need some tweak).
> Will it be difficult to adapt your API for immutable tries? E.g., it is
> not possible to implement immutable sequence (linked list) as a range,

Linked list? I'm horrified. Though I'd need some info on where and why 
you'd need that ;)
Despite some claims to the contrary small arrays (like e-hm 10_000 
elements) are faster in nearly all operations possible.

> so range API doesn't fit that (but could with a tweak - returning tail
> instead of mutating in popFront). If trie API will have similar
> problems, then I need to invent my own. I understand that immutability
> is not your priority for GSoC, though.

Well I might remove obstacles, if you outline your design more clearly.

-- 
Dmitry Olshansky
June 04, 2012
Re: Making generalized Trie type in D
On 05.06.2012 2:06, Dmitry Olshansky wrote:
> On 05.06.2012 1:56, Roman D. Boiko wrote:
>> On Monday, 4 June 2012 at 21:39:50 UTC, Dmitry Olshansky wrote:
>>> On 05.06.2012 1:16, Roman D. Boiko wrote:
>>> I believe that once encoidng is established your compiler tool should
>>> use the best code for that encoding. And that means templates,
>>> tailored per encoding in my book.
>>> If anything I plan to use Tries on strings without decoding codepoint,
>>> just using length of it (as first stage, might need some tweak).
>> Will it be difficult to adapt your API for immutable tries? E.g., it is
>> not possible to implement immutable sequence (linked list) as a range,
>
> Linked list? I'm horrified. Though I'd need some info on where and why
> you'd need that ;)
> Despite some claims to the contrary small arrays (like e-hm 10_000
> elements) are faster in nearly all operations possible.

And you can fake immutability, buy always picking up an unused slot btw. 
No need to go beyond logical immutability.


-- 
Dmitry Olshansky
June 04, 2012
Re: Making generalized Trie type in D
On Monday, 4 June 2012 at 22:06:49 UTC, Dmitry Olshansky wrote:
> On 05.06.2012 1:56, Roman D. Boiko wrote:
>> Will it be difficult to adapt your API for immutable tries? 
>> E.g., it is
>> not possible to implement immutable sequence (linked list) as 
>> a range,
>
> Linked list? I'm horrified. Though I'd need some info on where 
> and why you'd need that ;)
> Despite some claims to the contrary small arrays (like e-hm 
> 10_000 elements) are faster in nearly all operations possible.
That was for illustration only, as the most trivial example of 
immutable data structure (and probably the most widely used 
structure in functional programming).

>> so range API doesn't fit that (but could with a tweak - 
>> returning tail
>> instead of mutating in popFront). If trie API will have similar
>> problems, then I need to invent my own. I understand that 
>> immutability
>> is not your priority for GSoC, though.
>
> Well I might remove obstacles, if you outline your design more 
> clearly.
OK, thanks! I'll go through your code first to understand it 
better. But even before that I need to finish an important 
support request from my past customer...
June 04, 2012
Re: Making generalized Trie type in D
On Monday, 4 June 2012 at 22:21:42 UTC, Dmitry Olshansky wrote:
> And you can fake immutability, buy always picking up an unused 
> slot btw. No need to go beyond logical immutability.
That's applicable in some cases, not in general. But I agree that 
often it is possible to optimize if use cases are known.

My example concern was about fundamental problem of range APIs 
for immutable data structures, which is not possible to emulate: 
popFront is mutating by design.
June 05, 2012
Re: Making generalized Trie type in D
On 05.06.2012 2:28, Roman D. Boiko wrote:
> On Monday, 4 June 2012 at 22:21:42 UTC, Dmitry Olshansky wrote:
>> And you can fake immutability, buy always picking up an unused slot
>> btw. No need to go beyond logical immutability.
> That's applicable in some cases, not in general. But I agree that often
> it is possible to optimize if use cases are known.
>
> My example concern was about fundamental problem of range APIs for
> immutable data structures, which is not possible to emulate: popFront is
> mutating by design.

Keep in mind Ranges are temporary object most of the time. They are 
grease for wheels of algorithms. Given data structure S, it's  range is 
R(element of S). Thus for immutable data structure range will be mutable 
entity of immutable element type.

Interesting example is immutable strings, that still have ranges over 
them, that even return dchar not an immutable(char).

-- 
Dmitry Olshansky
1 2 3 4
Top | Discussion index | About this forum | D home