View mode: basic / threaded / horizontal-split · Log in · Help
January 13, 2013
Bidirectional range dilemma
I'm repeatedly hitting a range API problem in my combinatorics 
library. Many of the ranges require (potentially sizeable) 
internal buffers to maintain their current state (permutations, 
set partitions, power sets etc.) Also, many of these ranges are 
bidirectional. What this means is they have to store *two* 
buffers, one for the front, and one for the back.

This is a problem because 90% of the time you'll only iterate 
forward, and of the remaining 10%, 9% is just iterating 
backwards, and maybe 1% of the time you want to iterate from both 
the front and back in the same iteration. This means that 99% of 
the time, having two buffers is just wasted memory. Of course, 
I'm plucking these numbers from the air, but you get the idea.

To make matters worse, thanks to the lack of logical const 
there's no way the buffers can be lazily allocated on first use.

The only (ugly) solution I can come up with is to create three 
different ranges, e.g.

ForwardPermutations
ReversePermutations
BidirectionalPermutations

(or equivalently, one range with an iteration policy).

I'd rather not do that. Any ideas on how to resolve this?
January 13, 2013
Re: Bidirectional range dilemma
Am 13.01.2013 13:05, schrieb Peter Alexander:
> To make matters worse, thanks to the lack of logical const there's no
> way the buffers can be lazily allocated on first use.

can't be Rebindable used for that?
January 13, 2013
Re: Bidirectional range dilemma
Peter Alexander:

> (or equivalently, one range with an iteration policy).

If no other better solution is found, one range with an iteration 
policy is acceptable, in my opinion.

(By the way, what's the API of your functions? In similar 
generators I usually add a boolean doCopy template argument, that 
defaults to true. If it's true, the generator yields different 
buffers, otherwise it yields always the same modified buffer. 
This allows to have both convenience & safety on default, and 
speed on request).

Bye,
bearophile
January 13, 2013
Re: Bidirectional range dilemma
On Sunday, 13 January 2013 at 12:34:38 UTC, bearophile wrote:
> (By the way, what's the API of your functions? In similar 
> generators I usually add a boolean doCopy template argument, 
> that defaults to true. If it's true, the generator yields 
> different buffers, otherwise it yields always the same modified 
> buffer. This allows to have both convenience & safety on 
> default, and speed on request).

Interesting.

I do empathize with having convenience and safety by default and 
speed on demand, but in this case I worry about the performance 
lost by not having speed as the default, especially since I 
imagine that most of the time the safety is unneeded.

If speed is the default then the template parameter is unneeded 
since you can just do permutations(r).map!array(). This is what I 
currently do.
January 13, 2013
Re: Bidirectional range dilemma
> Peter Alexander:

> especially since I imagine that most of the time the safety is 
> unneeded.

The lazyness of D ranges is useful, but unlike Haskell, in D you 
can't always use lazyness, sometimes you need eagerness, because 
D is strongly array-based.

A newbie programmer, or a new user of the permutations functions 
that has not studied its API, will find permutations() more handy 
and safe if it allocates a new buffer for each permutation.

You remember the discussions about File.byLine that is not 
allocating one array for each line on default, but it yields 
char[] so when you convert it to a string you usually copy the 
buffer. So its unsafety is limited. This is not true for 
permutations().

I have found that in small script-like programs I sometimes 
(about half of the time?) prefer an eager generation of 
combinations or permutations, because it's more handy. You don't 
always need max speed in the code.

Safety on default and speed on request is one of the tenets of D 
language. And it's a good thing.

Not allocating a buffer for each permutation is premature 
optimization.

In past I have had several bugs caused by generators that don't 
allocate a buffer each time. When you fix some of such bugs, you 
learn to appreciate safety on default. So my current generators 
of permutations, combinations, permutations by swapping, 
derangements, lexicographic generator, etc are safe on default. 
It makes my life simpler.

Bye,
bearophile
January 13, 2013
Re: Bidirectional range dilemma
On Sunday, 13 January 2013 at 14:01:28 UTC, bearophile wrote:
> Safety on default and speed on request is one of the tenets of 
> D language. And it's a good thing.
>
> Not allocating a buffer for each permutation is premature 
> optimization.

You may have convinced me. I'll need to think more about it.

FWIW, it is not a premature optimisation. On a simple benchmark I 
did, adding .dup to front() increased runtime by 3.5x on DMD with 
all optimisations. 3.5 is the difference between C and Javascript 
in the Computer Language Benchmarks Game.

http://benchmarksgame.alioth.debian.org/u32/which-programs-are-fastest.php

I do care about safety, but I also believe that performance is 
critically important to D's success. Performance on demands is 
good in theory, but if I'm writing high performance code then I 
want performance by default, and I don't want to have to fill my 
code with annotations and special flags/options.

I think we need a D mission statement that we can refer to, to 
settle these disputes. How much performance loss is acceptable in 
the name of safety by default?
January 13, 2013
Re: Bidirectional range dilemma
On Sunday, 13 January 2013 at 15:49:13 UTC, Peter Alexander wrote:
> On Sunday, 13 January 2013 at 14:01:28 UTC, bearophile wrote:
>> Safety on default and speed on request is one of the tenets of 
>> D language. And it's a good thing.
>>
>> Not allocating a buffer for each permutation is premature 
>> optimization.
>
> You may have convinced me. I'll need to think more about it.
>
> FWIW, it is not a premature optimisation. On a simple benchmark 
> I did, adding .dup to front() increased runtime by 3.5x on DMD 
> with all optimisations. 3.5 is the difference between C and 
> Javascript in the Computer Language Benchmarks Game.
>
> http://benchmarksgame.alioth.debian.org/u32/which-programs-are-fastest.php
>
> I do care about safety, but I also believe that performance is 
> critically important to D's success. Performance on demands is 
> good in theory, but if I'm writing high performance code then I 
> want performance by default, and I don't want to have to fill 
> my code with annotations and special flags/options.
>
> I think we need a D mission statement that we can refer to, to 
> settle these disputes. How much performance loss is acceptable 
> in the name of safety by default?

You spend most of the time in the same piece of code. That one 
must be annotated and everything, but most code could be written 
in javascript running in a VM written in PHP that it would change 
much.

Plus, you probably here are benchmarking mostly the GC, which is 
known to be inefficient in D.
January 13, 2013
Re: Bidirectional range dilemma
Peter Alexander:

> FWIW, it is not a premature optimisation. On a simple benchmark 
> I did, adding .dup to front() increased runtime by 3.5x on DMD 
> with all optimisations.

Yes, around 3X is compatible with my timings.

I think with a generational GC (you can test it in Java), or with 
some kind of arena allocator, that performance loss will decrease 
significantly (but then you need to add a memory allocator 
template argument, that I presume defaults to the normal GC 
allocator).


> I do care about safety, but I also believe that performance is 
> critically important to D's success.

I care a lot for performance (where performance is needed), but I 
care even more for correctness :-) A fast but buggy program is 
often useless and sometimes it's even dangerous.


> Performance on demands is good in theory, but if I'm writing 
> high performance code then I want performance by default, and I 
> don't want to have to fill my code with annotations and special 
> flags/options.

Compare:

foreach (p; permutations(items)) {...

Vs:

foreach (p; permutations!false(items)) {...

Or even:

foreach (p; permutations!0(items)) {...

Or even:

alias fastPermutations = permutations!false;
foreach (p; fastPermutations(items)) {...



> I think we need a D mission statement that we can refer to, to 
> settle these disputes. How much performance loss is acceptable 
> in the name of safety by default?

I don't think a "mission statement" is enough to settle all such 
questions, in life there is too much variety. You have to decide 
on specific cases, on the base of few general rules, like the D 
Zen rule I have written in the precedent answer.

Bye,
bearophile
January 13, 2013
Re: Bidirectional range dilemma
> This is a problem because 90% of the time you'll only iterate 
> forward, and of the remaining 10%, 9% is just iterating 
> backwards, and maybe 1% of the time you want to iterate from 
> both the front and back in the same iteration. This means that 
> 99% of the time, having two buffers is just wasted memory. Of 
> course, I'm plucking these numbers from the air, but you get 
> the idea.

You might just suggest these numbers like that, but in my 
experience they seem accurate.

> To make matters worse, thanks to the lack of logical const 
> there's no way the buffers can be lazily allocated on first use.
>
> The only (ugly) solution I can come up with is to create three 
> different ranges, e.g.
>
> ForwardPermutations
> ReversePermutations
> BidirectionalPermutations

That. I am not sure why you find it so ugly however :(. When the 
most capable/general range yields a noticeable cost on 
performance/space consumption (which seems to be the case here), 
I think it better to empower the user with choices rather than 
try to make decisions for him/her.

Documentating how bidirectionalPerms modifies perf/space might 
just be what you need to make peace with yourself. In the event a 
user REALLY needs the bidirectional functionality, he/she will 
most likely read that documentation and  comprehend why it wasn't 
the default in the first place and I am sure they will be 
thankful to know why.
January 13, 2013
Re: Bidirectional range dilemma
Personally,

I would have the policy implemented:
permutations( Policy p = Policy.forward )() {
...
}

And maybe extend it with aliases:

alias fPermutations permutations!( Policy.forward );
alias rPermutations permutations!( Policy.backward );
alias biPermutations permutations!( Policy.bidirectional );

Cheers!
« First   ‹ Prev
1 2
Top | Discussion index | About this forum | D home