Thread overview | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
September 30, 2013 algorithm API design question | ||||
---|---|---|---|---|
| ||||
I need a function that finds a run of length k of elements equal to x in a range r, and I presume such a simple yet nontrivial algorithm (a dozen-liner) should be part of std.algorithm. This raises an interesting question - what form should the API have. I see three options: 1. The existing find(r1, r2) figures out a way to dynamically check that r2 is a run of identical elements and tailor the argument accordingly. For example, during Boyer-Moore initialization that test comes cheap. 2. We should statically specialize find(r1, r2) for the case r2 is an instance of Repeat. The specialization runs the tailored algorithm. The user is supposed to call e.g. find(r, repeat(x, k)) to benefit of the specialized algorithm. 3. We should introduce a new function called e.g. findRun(r, x, k). Each option has advantages and disadvantages. What do you all think is the best API? Andrei |
September 30, 2013 Re: algorithm API design question | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 09/30/2013 04:35 AM, Andrei Alexandrescu wrote: > I need a function that finds a run of length k of elements equal to x in > a range r, and I presume such a simple yet nontrivial algorithm (a > dozen-liner) should be part of std.algorithm. > > This raises an interesting question - what form should the API have. I > see three options: > > 1. The existing find(r1, r2) figures out a way to dynamically check that > r2 is a run of identical elements and tailor the argument accordingly. > For example, during Boyer-Moore initialization that test comes cheap. > This kind of thing tends to not pay off in the average case. > 2. We should statically specialize find(r1, r2) for the case r2 is an > instance of Repeat. The specialization runs the tailored algorithm. The > user is supposed to call e.g. find(r, repeat(x, k)) to benefit of the > specialized algorithm. > If the specialized algorithm runs faster, this should be done anyway. What is the optimization that would the specialized algorithm run faster though? > 3. We should introduce a new function called e.g. findRun(r, x, k). > :( > Each option has advantages and disadvantages. What do you all think is > the best API? > > > Andrei We already have 2. |
September 30, 2013 Re: algorithm API design question | ||||
---|---|---|---|---|
| ||||
Posted in reply to Timon Gehr | On Monday, 30 September 2013 at 09:15:22 UTC, Timon Gehr wrote:
> On 09/30/2013 04:35 AM, Andrei Alexandrescu wrote:
>> 2. We should statically specialize find(r1, r2) for the case r2 is an
>> instance of Repeat. The specialization runs the tailored algorithm. The
>> user is supposed to call e.g. find(r, repeat(x, k)) to benefit of the
>> specialized algorithm.
>>
>
> If the specialized algorithm runs faster, this should be done anyway.
> What is the optimization that would the specialized algorithm run faster though?
This.
The faster algorithm is to presumably jump along for random access ranges. e.g. if you are searching for xxxxxxxxxx then check r[9], if isn't x, check r[19] and so on. If it is x then check the preceding 10 elements.
|
September 30, 2013 Re: algorithm API design question | ||||
---|---|---|---|---|
| ||||
Posted in reply to Peter Alexander | On 09/30/2013 11:59 AM, Peter Alexander wrote:
> On Monday, 30 September 2013 at 09:15:22 UTC, Timon Gehr wrote:
>> On 09/30/2013 04:35 AM, Andrei Alexandrescu wrote:
>>> 2. We should statically specialize find(r1, r2) for the case r2 is an
>>> instance of Repeat. The specialization runs the tailored algorithm. The
>>> user is supposed to call e.g. find(r, repeat(x, k)) to benefit of the
>>> specialized algorithm.
>>>
>>
>> If the specialized algorithm runs faster, this should be done anyway.
>> What is the optimization that would the specialized algorithm run
>> faster though?
>
> This.
>
> The faster algorithm is to presumably jump along for random access
> ranges. e.g. if you are searching for xxxxxxxxxx then check r[9], if
> isn't x, check r[19] and so on. If it is x then check the preceding 10
> elements.
Boyer-Moore will already do this (and more). I guess you can be sure to get rid of the helper tables by specialising it manually. (But there is no fundamental obstacle that would prevent the compiler to perform this partial evaluation automatically).
|
September 30, 2013 Re: algorithm API design question | ||||
---|---|---|---|---|
| ||||
Posted in reply to Timon Gehr | On 9/30/13 2:15 AM, Timon Gehr wrote:
> What is the optimization that would the specialized algorithm run faster
> though?
Upon mismatch, restart search right after the mismatch point.
Andrei
|
September 30, 2013 Re: algorithm API design question | ||||
---|---|---|---|---|
| ||||
Posted in reply to Timon Gehr | On 9/30/13 5:04 AM, Timon Gehr wrote:
> On 09/30/2013 11:59 AM, Peter Alexander wrote:
>> On Monday, 30 September 2013 at 09:15:22 UTC, Timon Gehr wrote:
>>> On 09/30/2013 04:35 AM, Andrei Alexandrescu wrote:
>>>> 2. We should statically specialize find(r1, r2) for the case r2 is an
>>>> instance of Repeat. The specialization runs the tailored algorithm. The
>>>> user is supposed to call e.g. find(r, repeat(x, k)) to benefit of the
>>>> specialized algorithm.
>>>>
>>>
>>> If the specialized algorithm runs faster, this should be done anyway.
>>> What is the optimization that would the specialized algorithm run
>>> faster though?
>>
>> This.
>>
>> The faster algorithm is to presumably jump along for random access
>> ranges. e.g. if you are searching for xxxxxxxxxx then check r[9], if
>> isn't x, check r[19] and so on. If it is x then check the preceding 10
>> elements.
>
> Boyer-Moore will already do this (and more). I guess you can be sure to
> get rid of the helper tables by specialising it manually. (But there is
> no fundamental obstacle that would prevent the compiler to perform this
> partial evaluation automatically).
B-M has significant preprocessing costs, so it's not appropriate for all cases. A function specialized in finding runs would be optimal without preprocessing. Another way of looking at it: it's wasteful to pass repeat(x, k) into Boyer-Moore to have it rediscover what the caller knew already statically.
Andrei
|
September 30, 2013 Re: algorithm API design question | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 09/30/2013 05:45 PM, Andrei Alexandrescu wrote:
> On 9/30/13 5:04 AM, Timon Gehr wrote:
>> On 09/30/2013 11:59 AM, Peter Alexander wrote:
>>> On Monday, 30 September 2013 at 09:15:22 UTC, Timon Gehr wrote:
>>>> On 09/30/2013 04:35 AM, Andrei Alexandrescu wrote:
>>>>> 2. We should statically specialize find(r1, r2) for the case r2 is an
>>>>> instance of Repeat. The specialization runs the tailored algorithm.
>>>>> The
>>>>> user is supposed to call e.g. find(r, repeat(x, k)) to benefit of the
>>>>> specialized algorithm.
>>>>>
>>>>
>>>> If the specialized algorithm runs faster, this should be done anyway.
>>>> What is the optimization that would the specialized algorithm run
>>>> faster though?
>>>
>>> This.
>>>
>>> The faster algorithm is to presumably jump along for random access
>>> ranges. e.g. if you are searching for xxxxxxxxxx then check r[9], if
>>> isn't x, check r[19] and so on. If it is x then check the preceding 10
>>> elements.
>>
>> Boyer-Moore will already do this (and more). I guess you can be sure to
>> get rid of the helper tables by specialising it manually. (But there is
>> no fundamental obstacle that would prevent the compiler to perform this
>> partial evaluation automatically).
>
> B-M has significant preprocessing costs, so it's not appropriate for all
> cases. A function specialized in finding runs would be optimal without
> preprocessing. Another way of looking at it: it's wasteful to pass
> repeat(x, k) into Boyer-Moore to have it rediscover what the caller knew
> already statically.
>
>
> Andrei
>
(My point was that the compiler knows it statically too, and the set of program transformations necessary to get rid of it in a standard Boyer-Moore implementation should be simple enough. Eg, any occurrence of needle[j]!=needle[i] will be known to be false. Probably current compilers will not inline array lookups and/or get rid of the allocation though.)
|
September 30, 2013 Re: algorithm API design question | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 09/30/2013 05:45 PM, Andrei Alexandrescu wrote:
>>
>
> B-M has significant preprocessing costs, so it's not appropriate for all
> cases. A function specialized in finding runs would be optimal without
> preprocessing.
> ...
I guess another point is that Boyer-Moore requires more capabilities of the input type.
|
Copyright © 1999-2021 by the D Language Foundation