October 07, 2010
Tomek Sowiński <just@ask.me> wrote:

> Simen kjaeraas wrote:
>
>> Tomek Sowiński <just@ask.me> wrote:
>>> Even if the attribute properties could see the arguments, how to deal
>>> with things like
>>> lhs.length + rhs.length? It has to be inspectable at compile-time. One
>>> idea is to store the
>>> expression's abstract syntax tree (we want AST macros in D3 anyway)...
>>> But I got a feeling
>>> we're heading for an overkill :)
>>
>> Just as soon as we solve the halting problem, eh?
>
> What's the halting problem?

http://en.wikipedia.org/wiki/Halting_problem

Basically, inspecting the AST to determine an expression's complexity may
itself have infinite (or at least arbitrarily high) complexity. It is likely
possible in some situations, but in the general case, not so much.

Also, consider the while loop. It may have an arbitrarily complex
termination condition, making it hard or impossible to determine the loop's
complexity. Here is an example, made deliberately opaque by omitting the
source of foo:

// The body of foo is deliberately unavailable, so its cost (and even
// whether it ever returns true) cannot be analyzed here.
extern bool foo();

void bar(bool function() fp) {
    // Termination depends entirely on fp's behavior, which is opaque.
    while (true) {
        if (fp()) {
            break;
        }
    }
}

void main() {
    bar(&foo);  // &foo is a bool function(), matching bar's parameter
}

-- 
Simen
October 08, 2010
On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> Except that when you're dealing with generic code which has to deal with
> multiple container types (like std.algorithm), you _need_ certain complexity
> guarantees about an operation since it could happen on any container that it's

Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.
October 08, 2010
On 10/08/2010 03:18 AM, Juanjo Alvarez wrote:
> On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis
> <jmdavisProg@gmx.com> wrote:
>> Except that when you're dealing with generic code which has to deal with
>> multiple container types (like std.algorithm), you _need_ certain complexity
>> guarantees about an operation since it could happen on any container that it's
>
> Then, don't use it in std.algorithm or any other code that needs
> guaranteed complexity, just like now. I don't see the problem with a
> generic "in" operator, nobody would be forced to use it.

What do you suggest for fast lookup in a container?
October 08, 2010
Jonathan M Davis wrote:
> On Thursday, October 07, 2010 14:44:50 Stanislav Blinov wrote:

>> But that is not a matter of library interface, is it? It's a matter of
>> algorithm/container choice. It's not the push_back that was slow in the
>> end, it was std::vector (yes, that's arguable, but the point is that
>> container defines rules for its methods, not vice-versa).
> 
> Except that when you're dealing with generic code which has to deal with multiple container types (like std.algorithm), you _need_ certain complexity guarantees about an operation since it could happen on any container that it's given. Using operator in in an algorithm could be perfectly reasonable if it had O(1) or O(log n) complexity, but completely unreasonable if it had O(n) complexity. So, the algorithm is reasonable with some containers and not others, whereas if in were restricted to O(1) or O(log n), it could be used by the algorithm without worrying that some container would make it an order of complexity greater, or worse.
> 
> If it were strictly a matter of writing code which directly used a particular algorithm and container type, then you could know what the complexities of the algorithm and operations on the container are, but once you're dealing with generic code, operations need to have complexity guarantees regardless of their container, or the complexity of the algorithm will vary with the container type. And if that algorithm is used in yet another algorithm, it gets that much worse. It quickly becomes easy to bury the place where your code jumps an order of complexity, because the container you selected is an order of complexity worse than other containers on an operation that some algorithm uses, buried in code somewhere.
> 
> - Jonathan M Davis

Yet still, generality ends at some point. You can't devise every possible algorithm for every possible type and expect it to have set-in-stone complexity independent of those types.

Take std.range.popFrontN(). It's generic, and it's used in other algorithms. It has O(1) complexity for ranges that support slicing, and O(n) for those that do not. Yet those figures don't take into account the complexity of the slicing operation itself, or of stepping through the range.
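
For illustration, here is a minimal sketch of that kind of capability-based dispatch (a simplification in the spirit of std.range.popFrontN, not the actual Phobos source, and renamed to avoid clashing with it):

import std.algorithm : min;
import std.range;

size_t popFrontUpTo(Range)(ref Range r, size_t n)
    if (isInputRange!Range)
{
    static if (hasSlicing!Range && hasLength!Range)
    {
        // O(1): slicing skips the first n elements in one step
        n = min(n, r.length);
        r = r[n .. r.length];
        return n;
    }
    else
    {
        // O(n): no slicing, so step through one element at a time
        size_t i;
        for (; i < n && !r.empty; ++i)
            r.popFront();
        return i;
    }
}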

Or std.algorithm.find(). It's basically O(n), but then again, when using it, one should also consider the complexity of the predicate used.
The same thing happened in the case Andrei described: the algorithm was O(n) judging from the description (performing n insertions into a container), but the container itself blew that up into quadratic time because of its own insertion algorithm.
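
For example (a sketch; expensiveCheck is a made-up predicate): find makes one predicate call per element scanned, so the total cost is O(n) times the predicate's own cost.

import std.algorithm : find;

// Hypothetical predicate: suppose each call performs O(p) work
bool expensiveCheck(int x)
{
    // ... imagine O(p) work here ...
    return x > 42;
}

void example(int[] haystack)
{
    // find scans linearly, so the total complexity is O(n * p), not O(n)
    auto hit = find!expensiveCheck(haystack);
}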

What I mean is that you'll always have algorithms that perform differently for different containers, and you'll always have to choose the containers that best fit your needs. Generic code is not only about efficiency: at the very least, you expect it to work for different types.
Replacing vector with list in Andrei's case would probably solve the problem, at the cost of losing random access (together with contiguous storage). That means it would also require changing code if random access was in use somewhere. Having a generic random-access function at(Container, index), with complexity that varies with the container used (e.g. O(1) for arrays, O(n) for lists), would save you that maintenance.
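
A sketch of that hypothetical at() in range terms (the name and signature are illustrative, not an existing Phobos function):

import std.range;

// O(1) for random-access ranges (e.g. arrays), O(n) otherwise (e.g. lists)
ElementType!Range at(Range)(Range r, size_t index)
    if (isInputRange!Range)
{
    static if (isRandomAccessRange!Range)
    {
        return r[index];  // direct indexing: O(1)
    }
    else
    {
        // walk forward element by element: O(n)
        foreach (i; 0 .. index)
            r.popFront();
        return r.front;
    }
}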
October 08, 2010
2010/10/7 Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>:
> In the end I figured there was only _one_
> quadratic operation - appending to a vector<size_t> that held document
> lengths. That wasn't even the bulk of the data and it was the last thing I
> looked at! Yet it made the run time impossible to endure.

From SGI's STL vector:

void push_back(const _Tp& __x) {
    if (_M_finish != _M_end_of_storage) {
        construct(_M_finish, __x);
        ++_M_finish;
    }
    else
        _M_insert_aux(end(), __x);
}

How is this quadratic, let alone linear?
October 08, 2010
On Friday 08 October 2010 03:06:01 Stanislav Blinov wrote:
> Yet still, generality ends at some point. You can't devise every possible algorithm for every possible type and expect it to have set-in-stone complexity independent of those types.
> 
> Take std.range.popFrontN(). It's generic, and it's used in other
> algorithms. It has O(1) complexity for ranges that support slicing,
> and O(n) for those that do not. Yet those figures don't take into
> account the complexity of the slicing operation itself, or of
> stepping through the range.
> 
> Or std.algorithm.find(). It's basically O(n), but then again, when using
> it, one should also consider the complexity of the predicate used.
> The same thing happened in the case Andrei described: the algorithm was
> O(n) judging from the description (performing n insertions into a
> container), but the container itself blew that up into quadratic time
> because of its own insertion algorithm.
> 
> What I mean is that you'll always have algorithms that perform differently for different containers, and you'll always have to choose the containers that best fit your needs. Generic code is not only about efficiency: at the very least, you expect it to work for different types. Replacing vector with list in Andrei's case would probably solve the problem, at the cost of losing random access (together with contiguous storage). That means it would also require changing code if random access was in use somewhere. Having a generic random-access function at(Container, index), with complexity that varies with the container used (e.g. O(1) for arrays, O(n) for lists), would save you that maintenance.

All true. However, the point is that operations need to have a known complexity. If in is known to be O(1), algorithms can safely use it, knowing that it's O(1). Any container that can't do it in O(1) doesn't implement in. You could, on the other hand, define in as O(n) (which doesn't stop containers from implementing it as O(1), since that's better than O(n)), and then any algorithm that uses it has to assume that it could be O(n) and therefore may need to avoid it. The real question is what complexity we want to define it as. At the moment, the only containers to use it are the built-in associative arrays, and for them, it's O(1). Since find() is already going to be O(n), it seems to me that in should be better than O(n) - be it O(1) or O(log n) - and Andrei appears to agree, but others don't, hence this long discussion...
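
As a sketch of what an O(log n) in can look like on a user-defined container (SortedSet is a made-up type; the guarantee comes from binary search over sorted storage):

import std.range : assumeSorted;

struct SortedSet(T)
{
    T[] data;  // invariant: kept sorted at all times

    // 'key in set' lowers to opBinaryRight!"in": O(log n) by binary search
    bool opBinaryRight(string op : "in")(T key)
    {
        return assumeSorted(data).contains(key);
    }
}

unittest
{
    auto s = SortedSet!int([1, 3, 5, 8]);
    assert(5 in s);     // binary search, O(log n)
    assert(!(4 in s));
}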

- Jonathan M Davis
October 08, 2010
Pelle Wrote:

> On 10/08/2010 03:18 AM, Juanjo Alvarez wrote:
> > On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> >> Except that when you're dealing with generic code which has to deal with
> >> multiple container types (like std.algorithm), you _need_ certain complexity
> >> guarantees about an operation since it could happen on any container that it's
> >
> > Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.
> 
> What do you suggest for fast lookup in a container?

What is being used now? How can having "in" and not using it (just like now) in functions requiring guaranteed complexity be worse than not having it?

The only drawback I can see to having an "in" operator on all containers is that some programmers would not read the documentation and would use it expecting it to be fast. But that also happens with many other language constructs, and some programmers will write crappy algorithms anyway.
October 08, 2010
On Thu, 07 Oct 2010 19:07:55 -0400, Rainer Deyke <rainerd@eldwood.com> wrote:

> On 10/7/2010 14:33, Steven Schveighoffer wrote:
>> On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd@eldwood.com>
>> wrote:
>>> I can't say I've ever cared about the big-O complexity of an operation.
>>
>> Then you don't understand how important it is.
>
> Let me rephrase that.  I care about performance.  Big-O complexity can
> obviously have a significant effect on performance, so I do care about
> it, but only to the extent that it affects performance.  Low big-O
> complexity is a means to an end, not a goal in and of itself.  If 'n' is
> low enough, then an O(2**n) algorithm may well be faster than an O(1)
> algorithm.

You'd be surprised at how important it is.  I remember competing on TopCoder back when they had only just added C++ as a possible language, and wondering "how are all the Java guys going to even compete with C++?"  But big-O complexity was always way, way more important than the performance difference between native and JVM'd code.

The thing about big-O complexity is that it gives you a good idea of how well your library will perform under defined circumstances.  Libraries must be completely aware of it and rigid about it, or else you get situations where things are nice and easy syntax-wise but perform horribly when actually used.  What you end up with when libraries don't care about it is "mysterious" slowdowns, or people who just start blaming the language for being slow.

Imagine if a database backend developer said "you know, who cares about big-O complexity, I think linear performance is good enough, most people have small databases anyway".  Who would ever use that backend?  Phobos needs to strive for the best performance in the same way.

>
> I also believe that, in the absence of a sophisticated system that
> actually verifies performance guarantees, the language and standard
> library should trust the programmer to know what he is doing.  The
> standard library should only provide transitive performance guarantees,
> e.g. this algorithm calls function 'f' 'n' times, so the algorithm's
> performance is O(n * complexity(f)).  If 'f' runs in constant time, the
> algorithm runs in linear time.  If 'f' runs in exponential time, the
> algorithm still runs.
>
>> big O complexity is very important when you are writing libraries.  Not
>> so much when you are writing applications -- if you can live with it in
>> your application, then fine.  But Phobos should not have these problems
>> for people who *do* care.
>>
>> What I'd suggest is to write your own function that uses in when
>> possible and find when not possible.  Then use that in your code.
>
> The issue is that algorithms may use 'in' internally, so I may have to
> rewrite large parts of Phobos.  (And the issue isn't about 'in'
> specifically, but complexity guarantees in general.)

You have two options: define opIn however you want if you don't care about complexity guarantees (not recommended), or define a wrapper function that uses the best available option, which your code can then use.

Despite Phobos defining opIn to require lg(n) or better complexity, it does not restrict you from defining opIn however you want (even on arrays if you wish).  I personally find the tools Phobos provides completely adequate for the tasks you would need to do.
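
For example, a minimal sketch of such a wrapper (the name contains and its fallback are mine, not Phobos API; it assumes the container exposes a range via [] as the std.container types do):

import std.algorithm : canFind;

// Use the container's own 'in' when it exists (presumably lg(n) or
// better, per the convention above), else fall back to a linear scan.
bool contains(C, K)(C container, K key)
{
    static if (is(typeof(key in container)))
        return cast(bool)(key in container);  // AA 'in' yields a pointer
    else
        return canFind(container[], key);     // O(n) fallback
}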

-Steve
October 08, 2010
On Thu, 07 Oct 2010 21:18:56 -0400, Juanjo Alvarez <fake@fakeemail.com> wrote:

> On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
>> Except that when you're dealing with generic code which has to deal with
>> multiple container types (like std.algorithm), you _need_ certain complexity
>> guarantees about an operation since it could happen on any container that it's
>
> Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.

That kind of "documentation" is useless: it doesn't prevent use, and the person who accidentally uses it won't feel at fault.  When I call

sort(x);

and it performs horribly, am I going to blame x or sort?  Certainly, I'll never think it's my own fault :)

-Steve
October 08, 2010
On 10/8/10 5:24 CDT, Torarin wrote:
> 2010/10/7 Andrei Alexandrescu<SeeWebsiteForEmail@erdani.org>:
>> In the end I figured there was only _one_
>> quadratic operation - appending to a vector<size_t>  that held document
>> lengths. That wasn't even the bulk of the data and it was the last thing I
>> looked at! Yet it made the run time impossible to endure.
>
> From SGI's STL vector:
>
> void push_back(const _Tp&  __x) {
>      if (_M_finish != _M_end_of_storage) {
>        construct(_M_finish, __x);
>        ++_M_finish;
>      }
>      else
>        _M_insert_aux(end(), __x);
>    }
>
> How is this quadratic, let alone linear?

My code wasn't using push_back.

Andrei