Jump to page: 1 2
Thread overview
Bidirectional range dilemma
Jan 13, 2013
Peter Alexander
Jan 13, 2013
David
Jan 13, 2013
bearophile
Jan 13, 2013
Peter Alexander
Jan 13, 2013
bearophile
Jan 13, 2013
Peter Alexander
Jan 13, 2013
deadalnix
Jan 13, 2013
bearophile
Jan 13, 2013
Phil Lavoie
Jan 13, 2013
Phil Lavoie
Jan 13, 2013
Phil Lavoie
Jan 13, 2013
monarch_dodra
Jan 13, 2013
bearophile
Jan 14, 2013
Phil Lavoie
January 13, 2013
I'm repeatedly hitting a range API problem in my combinatorics library. Many of the ranges require (potentially sizeable) internal buffers to maintain their current state (permutations, set partitions, power sets etc.) Also, many of these ranges are bidirectional. What this means is they have to store *two* buffers, one for the front, and one for the back.

This is a problem because 90% of the time you'll only iterate forward, and of the remaining 10%, 9% is just iterating backwards, and maybe 1% of the time you want to iterate from both the front and back in the same iteration. This means that 99% of the time, having two buffers is just wasted memory. Of course, I'm plucking these numbers from the air, but you get the idea.

To make matters worse, thanks to the lack of logical const there's no way the buffers can be lazily allocated on first use.

The only (ugly) solution I can come up with is to create three different ranges, e.g.

ForwardPermutations
ReversePermutations
BidirectionalPermutations

(or equivalently, one range with an iteration policy).

I'd rather not do that. Any ideas on how to resolve this?
January 13, 2013
Am 13.01.2013 13:05, schrieb Peter Alexander:
> To make matters worse, thanks to the lack of logical const there's no way the buffers can be lazily allocated on first use.

can't be Rebindable used for that?
January 13, 2013
Peter Alexander:

> (or equivalently, one range with an iteration policy).

If no other better solution is found, one range with an iteration policy is acceptable, in my opinion.

(By the way, what's the API of your functions? In similar generators I usually add a boolean doCopy template argument, that defaults to true. If it's true, the generator yields different buffers, otherwise it yields always the same modified buffer. This allows to have both convenience & safety on default, and speed on request).

Bye,
bearophile
January 13, 2013
On Sunday, 13 January 2013 at 12:34:38 UTC, bearophile wrote:
> (By the way, what's the API of your functions? In similar generators I usually add a boolean doCopy template argument, that defaults to true. If it's true, the generator yields different buffers, otherwise it yields always the same modified buffer. This allows to have both convenience & safety on default, and speed on request).

Interesting.

I do empathize with having convenience and safety by default and speed on demand, but in this case I worry about the performance lost by not having speed as the default, especially since I imagine that most of the time the safety is unneeded.

If speed is the default then the template parameter is unneeded since you can just do permutations(r).map!array(). This is what I currently do.
January 13, 2013
> Peter Alexander:

> especially since I imagine that most of the time the safety is unneeded.

The lazyness of D ranges is useful, but unlike Haskell, in D you can't always use lazyness, sometimes you need eagerness, because D is strongly array-based.

A newbie programmer, or a new user of the permutations functions that has not studied its API, will find permutations() more handy and safe if it allocates a new buffer for each permutation.

You remember the discussions about File.byLine that is not allocating one array for each line on default, but it yields char[] so when you convert it to a string you usually copy the buffer. So its unsafety is limited. This is not true for permutations().

I have found that in small script-like programs I sometimes (about half of the time?) prefer an eager generation of combinations or permutations, because it's more handy. You don't always need max speed in the code.

Safety on default and speed on request is one of the tenets of D language. And it's a good thing.

Not allocating a buffer for each permutation is premature optimization.

In past I have had several bugs caused by generators that don't allocate a buffer each time. When you fix some of such bugs, you learn to appreciate safety on default. So my current generators of permutations, combinations, permutations by swapping, derangements, lexicographic generator, etc are safe on default. It makes my life simpler.

Bye,
bearophile
January 13, 2013
On Sunday, 13 January 2013 at 14:01:28 UTC, bearophile wrote:
> Safety on default and speed on request is one of the tenets of D language. And it's a good thing.
>
> Not allocating a buffer for each permutation is premature optimization.

You may have convinced me. I'll need to think more about it.

FWIW, it is not a premature optimisation. On a simple benchmark I did, adding .dup to front() increased runtime by 3.5x on DMD with all optimisations. 3.5 is the difference between C and Javascript in the Computer Language Benchmarks Game.

http://benchmarksgame.alioth.debian.org/u32/which-programs-are-fastest.php

I do care about safety, but I also believe that performance is critically important to D's success. Performance on demands is good in theory, but if I'm writing high performance code then I want performance by default, and I don't want to have to fill my code with annotations and special flags/options.

I think we need a D mission statement that we can refer to, to settle these disputes. How much performance loss is acceptable in the name of safety by default?
January 13, 2013
On Sunday, 13 January 2013 at 15:49:13 UTC, Peter Alexander wrote:
> On Sunday, 13 January 2013 at 14:01:28 UTC, bearophile wrote:
>> Safety on default and speed on request is one of the tenets of D language. And it's a good thing.
>>
>> Not allocating a buffer for each permutation is premature optimization.
>
> You may have convinced me. I'll need to think more about it.
>
> FWIW, it is not a premature optimisation. On a simple benchmark I did, adding .dup to front() increased runtime by 3.5x on DMD with all optimisations. 3.5 is the difference between C and Javascript in the Computer Language Benchmarks Game.
>
> http://benchmarksgame.alioth.debian.org/u32/which-programs-are-fastest.php
>
> I do care about safety, but I also believe that performance is critically important to D's success. Performance on demands is good in theory, but if I'm writing high performance code then I want performance by default, and I don't want to have to fill my code with annotations and special flags/options.
>
> I think we need a D mission statement that we can refer to, to settle these disputes. How much performance loss is acceptable in the name of safety by default?

You spend most of the time in the same piece of code. That one must be annotated and everything, but most code could be written in javascript running in a VM written in PHP that it would change much.

Plus, you probably here are benchmarking mostly the GC, which is known to be inefficient in D.
January 13, 2013
Peter Alexander:

> FWIW, it is not a premature optimisation. On a simple benchmark I did, adding .dup to front() increased runtime by 3.5x on DMD with all optimisations.

Yes, around 3X is compatible with my timings.

I think with a generational GC (you can test it in Java), or with some kind of arena allocator, that performance loss will decrease significantly (but then you need to add a memory allocator template argument, that I presume defaults to the normal GC allocator).


> I do care about safety, but I also believe that performance is critically important to D's success.

I care a lot for performance (where performance is needed), but I care even more for correctness :-) A fast but buggy program is often useless and sometimes it's even dangerous.


> Performance on demands is good in theory, but if I'm writing high performance code then I want performance by default, and I don't want to have to fill my code with annotations and special flags/options.

Compare:

foreach (p; permutations(items)) {...

Vs:

foreach (p; permutations!false(items)) {...

Or even:

foreach (p; permutations!0(items)) {...

Or even:

alias fastPermutations = permutations!false;
foreach (p; fastPermutations(items)) {...



> I think we need a D mission statement that we can refer to, to settle these disputes. How much performance loss is acceptable in the name of safety by default?

I don't think a "mission statement" is enough to settle all such questions, in life there is too much variety. You have to decide on specific cases, on the base of few general rules, like the D Zen rule I have written in the precedent answer.

Bye,
bearophile
January 13, 2013
> This is a problem because 90% of the time you'll only iterate forward, and of the remaining 10%, 9% is just iterating backwards, and maybe 1% of the time you want to iterate from both the front and back in the same iteration. This means that 99% of the time, having two buffers is just wasted memory. Of course, I'm plucking these numbers from the air, but you get the idea.

You might just suggest these numbers like that, but in my experience they seem accurate.

> To make matters worse, thanks to the lack of logical const there's no way the buffers can be lazily allocated on first use.
>
> The only (ugly) solution I can come up with is to create three different ranges, e.g.
>
> ForwardPermutations
> ReversePermutations
> BidirectionalPermutations

That. I am not sure why you find it so ugly however :(. When the most capable/general range yields a noticeable cost on performance/space consumption (which seems to be the case here), I think it better to empower the user with choices rather than try to make decisions for him/her.

Documentating how bidirectionalPerms modifies perf/space might just be what you need to make peace with yourself. In the event a user REALLY needs the bidirectional functionality, he/she will most likely read that documentation and  comprehend why it wasn't the default in the first place and I am sure they will be thankful to know why.
January 13, 2013
Personally,

I would have the policy implemented:
permutations( Policy p = Policy.forward )() {
...
}

And maybe extend it with aliases:

alias fPermutations permutations!( Policy.forward );
alias rPermutations permutations!( Policy.backward );
alias biPermutations permutations!( Policy.bidirectional );

Cheers!

« First   ‹ Prev
1 2