December 04, 2015
On Friday, 4 December 2015 at 09:17:07 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:
>> and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.
>
> I am sorry to say this, but hard performance requirements require O(1) on everything.

And even then you cannot assume that it is real-time, as the following is O(1):

if (full_moon()) sleep(1000);

So you can have an O(1) algorithm that occasionally triggers an expensive constant-time/memory path that causes big problems in a running system. You need a hard upper bound measured in cycles.

December 04, 2015
On Friday, 4 December 2015 at 09:17:07 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:
>> and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.
>
> I am sorry to say this, but hard performance requirements require O(1) on everything.
>
> Big-Oh tells you essentially very little if it is more than O(1). Quick sort and insertion sort are O(N^2), bucket sort is O(N). That does not make bucket sort faster than quick sort or even insertion sort in a specific case, as its cost is dominated by the number of possible values.

Well, I agree, but I didn't say hard real-time, only performance requirements that are hard to achieve because the system is large, and because it would cost even more if the system were larger (or slower).
December 04, 2015
On Friday, 4 December 2015 at 09:33:42 UTC, ZombineDev wrote:
> Well, I agree, but I didn't say hard real-time, only performance requirements that are hard to achieve because the system is large, and because it would cost even more if the system were larger (or slower).

Yes, if you are writing for large scalable systems then it is interesting to know what the ceiling is, but then you probably also want something that is no worse than O(log N). Say for a chat service you'd know that the more expensive to develop solution can scale to 10 million users (O(1)), and the less expensive to develop solution will become very expensive to run when you hit 1 million users (O(log N)).

So Big-Oh can be useful when you work with very large sizes and large variations, but in most cases I think it can be misleading without a good description of the actual implementation.

Another very useful measure in real-time work is the amount of variation between repeated calls. Say if there is a guarantee of <50% variation, you can measure the actual running time and add 50% headroom to the hardware requirements. Some algorithms have fixed-size tables they need to clear out every once in a while, and that can be very bad.
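
As a rough illustration of the kind of measurement I mean (a D sketch only; the operation lookup and the 50% margin are just placeholders):

import core.time : Duration;
import std.algorithm.comparison : max, min;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

void lookup() { /* stand-in for the operation under test */ }

void main()
{
    Duration best = Duration.max;
    Duration worst = Duration.zero;
    foreach (i; 0 .. 1_000)
    {
        auto sw = StopWatch(AutoStart.yes);
        lookup();
        sw.stop();
        best = min(best, sw.peek);
        worst = max(worst, sw.peek);
    }
    // With a "<50% variation" guarantee, provision for the observed
    // worst case plus 50% headroom.
    writefln("best=%s worst=%s provision=%s", best, worst, worst * 3 / 2);
}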

December 04, 2015
On Friday, 4 December 2015 at 03:37:10 UTC, Andrei Alexandrescu wrote:
> On 12/3/15 10:29 PM, Jack Stouffer wrote:
>> On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:
>>> On 12/03/2015 09:10 PM, Idan Arye wrote:
>>>> The complexities of the operations is a property of the data structure
>>>> being used. If each collection type will have it's own set of method
>>>> names based on the complexity of operations on it, we won't be able to
>>>> have templated functions that operate on any kind of collection(or at
>>>> the very least, these functions will be really tedious to code).
>>>
>>> Your premise is right but you reach the negation of the correct
>>> conclusion. -- Andrei
>>
>> How so? If a singly linked list and a doubly linked list have two
>> different method names for the same operation, then they cannot be
>> easily templated.
>
> Took me a while to figure. There's a hierarchy of operations, e.g. if a collection implements insert, it automatically implements linearInsert. And so on. The collections framework provides these defaults, so client code that needs quick insert uses insert, whereas code that's fine with a linear upper bound uses linearInsert and captures both.

"I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?"

In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should always be available.
December 04, 2015
On Friday, 4 December 2015 at 09:51:05 UTC, tn wrote:
> "I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?"
>
> In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should always be available.

In the current std.container, linearInsert aliases to the sublinear insert function when it's provided. If you don't care about running time, just use linearInsert.
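
To illustrate the pattern (a simplified sketch, not the actual std.container code): a container that can do better than linear simply aliases linearInsert to its faster insert, so generic code written against linearInsert automatically picks up the cheaper operation when it exists.

struct List(T)
{
    void insert(T value) { /* O(1): link at the front */ }
    alias linearInsert = insert; // constant time also satisfies the linear bound
}

struct Array(T)
{
    void linearInsert(T value) { /* O(n): shift elements, then place value */ }
}

// Generic code that only needs a linear upper bound compiles with both.
void addAll(C, T)(ref C c, T[] values)
{
    foreach (v; values)
        c.linearInsert(v);
}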

December 04, 2015
On Friday, 4 December 2015 at 09:50:15 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 4 December 2015 at 09:33:42 UTC, ZombineDev wrote:
>> Well, I agree, but I didn't say hard real-time, only performance requirements that are hard to achieve because the system is large, and because it would cost even more if the system were larger (or slower).
>
> Yes, if you are writing for large scalable systems then it is interesting to know what the ceiling is, but then you probably also want something that is no worse than O(log N). Say for a chat service you'd know that the more expensive to develop solution can scale to 10 million users (O(1)), and the less expensive to develop solution will become very expensive to run when you hit 1 million users (O(log N)).
>
> So Big-Oh can be useful when you work with very large sizes and large variations, but in most cases I think it can be misleading without a good description of the actual implementation.
>
> Another very useful measure in real-time work is the amount of variation between repeated calls. Say if there is a guarantee of <50% variation, you can measure the actual running time and add 50% headroom to the hardware requirements. Some algorithms have fixed-size tables they need to clear out every once in a while, and that can be very bad.

I strongly agree with what you said earlier, that typical complexity analysis is too naive a model for real-world systems. Very often (e.g. due to the nature of real hardware) your time may be dominated by constants, or a function which looks like O(1) is sometimes O(n) due to hidden complexity. The fact that constants matter is especially true when you aim for O(1). And yes, variation of execution time is also very important.

However, here we're talking about an API for containers and algorithms, and I think that putting the complexity in the function signature is better than only using abstract data structures, because it makes you more aware of the consequences of the data structures and algorithms you choose.
If we can figure out a way to convey even more information, it may be even better.
December 04, 2015
On Friday, 4 December 2015 at 10:35:33 UTC, ZombineDev wrote:
> However, here we're talking about an API for containers and algorithms, and I think that putting the complexity in the function signature is better than only using abstract data structures, because it makes you more aware of the consequences of the data structures and algorithms you choose.
> If we can figure out a way to convey even more information, it may be even better.

Well, if the algorithm can choose between containers it might be interesting to query the characteristics of a set of operations on each container. I'm not sure the method name is the best way to do it, although I guess having a generic find(x) and find_instantly(x) would be OK if the latter version basically means less than 200 cycles in all cases. If it means less than 2000 cycles, I think it is not so useful.

Maybe it is possible to do the inference as static analysis, and in addition make it possible for libraries to express various guarantees. Since it is a worst-case measure, it would be acceptable for the analysis to fall back to "unbounded" when it fails.
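
For instance, containers could publish such guarantees as compile-time members that algorithms query (a sketch; the Complexity enum and the insertComplexity member are made up, not an existing API):

enum Complexity { constant, logarithmic, linear, unbounded }

struct DList(T)
{
    enum insertComplexity = Complexity.constant;
    void insert(T value) { /* ... */ }
}

struct SList(T)
{
    enum insertComplexity = Complexity.linear;
    void insert(T value) { /* ... */ }
}

// Fall back to "unbounded" when a container publishes nothing.
template insertComplexityOf(C)
{
    static if (is(typeof(C.insertComplexity) == Complexity))
        enum insertComplexityOf = C.insertComplexity;
    else
        enum insertComplexityOf = Complexity.unbounded;
}

static assert(insertComplexityOf!(DList!int) == Complexity.constant);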

December 04, 2015
On 12/03/2015 10:46 PM, Walter Bright wrote:
> On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:
>> Now this primitive may have three complexities:
>>
>> * linear in the length of r (e.g. c is a singly-linked list)
>>
>> * linear in the number of elements after r in the collection (e.g. c
>> is an array)
>>
>> * constant (c is a doubly-linked list)
>>
>> These complexities must be reflected in the name of the primitives. Or
>> perhaps
>> it's okay to conflate a couple of them. I know names are something
>> we're awfully
>> good at discussing :o). Destroy!
>
> Are you sure there are only 3 complexities? What if it's a
> self-balancing tree?
>
> I'm also not sure it is a good idea to put the complexity in the name of
> the primitive. Do any other algorithm libraries do that?

Looks exaggerated, innit? The fact of the matter is people choose collections based on the complexity of their operations all the time. "I need to insert and remove at the front, so I'll use a list here." Or: "I need random access, I'll use a vector" etc.

Not designing nomenclature for complexity leads to the usual design disasters such as providing a list with the same indexing operator as an array.

Turning that on its head, if we want to allow people to design collection-independent code and mix and match collections based only on their APIs, those APIs must reflect the complexity of operations.

Collection-independent code is big in my design btw.

One idea is to have the very name reflect the operation. I'm thinking "insert" to mean "scoot over elements at the tail and insert the new element in the gap" and "link" to mean "create a new node and link it within this collection without scooting over anything". Then the relative costs will be implied.

Another idea is to only classify operations as "quick" (expected O(1) up to O(log n)) and "not quick" (everything worse). Then we only have two categories, and "quick" will be in the name of the respective methods.
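
A rough sketch of how the two-category idea could look in D (quickInsert and hasQuickInsert are hypothetical names, only to show the shape):

struct HashSet(T)
{
    void quickInsert(T value) { /* expected O(1) */ }
}

struct SortedArray(T)
{
    void insert(T value) { /* O(n): keep the array sorted */ }
}

enum hasQuickInsert(C, T) = is(typeof(C.init.quickInsert(T.init)));

// An algorithm that needs the cheap operation constrains on it;
// one that tolerates anything just uses insert.
void fill(C, T)(ref C c, T[] values)
    if (hasQuickInsert!(C, T))
{
    foreach (v; values)
        c.quickInsert(v);
}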


Andrei

December 04, 2015
On 12/04/2015 01:05 AM, Tofu Ninja wrote:
> Also maybe a simpler idea would just be to annotate the operations
> with their complexity using UDAs.

Great idea, will keep it in mind. Thanks! -- Andrei
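
A sketch of what such UDA annotations might look like (the BigO attribute is made up here, not an existing Phobos type):

import std.traits : getUDAs;

struct BigO { string complexity; }

struct Array(T)
{
    @BigO("O(n)") void insert(size_t i, T value) { /* shift, then place */ }
    @BigO("O(1)") void append(T value) { /* ... */ }
}

// Client code (or documentation tooling) can read the annotation at compile time.
enum insertCost = getUDAs!(Array!int.insert, BigO)[0].complexity; // "O(n)"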
December 04, 2015
On 12/04/2015 01:05 AM, Tofu Ninja wrote:
> When would a generic algorithm even need to know the complexity of the
> container?

Only always :o). -- Andrei