December 03, 2015
On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:
> Now this primitive may have three complexities:
>
> * linear in the length of r (e.g. c is a singly-linked list)
>
> * linear in the number of elements after r in the collection (e.g. c is an array)
>
> * constant (c is a doubly-linked list)
>
> These complexities must be reflected in the name of the primitives. Or perhaps
> it's okay to conflate a couple of them. I know names are something we're awfully
> good at discussing :o). Destroy!

Are you sure there are only 3 complexities? What if it's a self-balancing tree?

I'm also not sure it is a good idea to put the complexity in the name of the primitive. Do any other algorithm libraries do that?
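
(For reference, the three cases Andrei describes look something like this toy sketch; hypothetical types, not the actual std.container API. The singly-linked case, linear in the prefix before r, is analogous and omitted.)

```d
import std.stdio;

// Toy array: removal must shift everything after the removed slot,
// so it is linear in the number of elements *after* r.
struct ToyArray
{
    int[] data;
    void linearRemove(size_t i)
    {
        foreach (j; i .. data.length - 1)
            data[j] = data[j + 1];
        data.length -= 1;
    }
}

// Toy doubly-linked list: a known node can be unlinked in O(1),
// so the unqualified name `remove` is justified.
struct ToyDList
{
    static struct Node { int value; Node* prev, next; }
    Node* head;

    Node* insertFront(int v)
    {
        auto n = new Node(v, null, head);
        if (head) head.prev = n;
        head = n;
        return n;
    }

    void remove(Node* n)
    {
        if (n.prev) n.prev.next = n.next; else head = n.next;
        if (n.next) n.next.prev = n.prev;
    }
}

void main()
{
    auto a = ToyArray([1, 2, 3, 4]);
    a.linearRemove(1);            // O(n): the name admits it
    writeln(a.data);              // [1, 3, 4]

    ToyDList d;
    auto two = d.insertFront(2);
    d.insertFront(1);
    d.remove(two);                // O(1): plain `remove`
    writeln(d.head.value);        // 1
}
```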

December 04, 2015
On Friday, 4 December 2015 at 03:37:10 UTC, Andrei Alexandrescu wrote:
> On 12/3/15 10:29 PM, Jack Stouffer wrote:
>> On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:
>>> On 12/03/2015 09:10 PM, Idan Arye wrote:
>>>> The complexities of the operations are a property of the data structure
>>>> being used. If each collection type has its own set of method
>>>> names based on the complexity of operations on it, we won't be able to
>>>> have templated functions that operate on any kind of collection (or at
>>>> the very least, these functions will be really tedious to code).
>>>
>>> Your premise is right but you reach the negation of the correct
>>> conclusion. -- Andrei
>>
>> How so? If a singly linked list and a doubly linked list have two
>> different method names for the same operation, then they cannot be
>> easily templated.
>
> Took me a while to figure. There's a hierarchy of operations, e.g. if a collection implements insert, it automatically implements linearInsert. And so on. The collections framework provides these defaults, so client code that needs quick insert uses insert, whereas code that's fine with a linear upper bound uses linearInsert and captures both.
>
> Another way to look at it: in the STL, container-independent code is near impossible because different containers use the same signature for operations that are different (either wrt iterator invalidation or complexity). My design avoids that by giving distinct operations distinct names.
>
>
> Andrei

This sounds really complicated to use. What are the benefits?
When would a generic algorithm even need to know the complexity of the container?
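
If I understand it right, the proposed defaults would look something like this (my rough guess, not any actual API):

```d
import std.stdio;
import std.traits : hasMember;

struct FastSet   // pretend: O(log n) insert
{
    int[] items;
    void insert(int v) { items ~= v; } // stand-in for a real tree insert
}

struct SlowBag   // only a linear insert is possible
{
    int[] items;
    void linearInsert(int v) { items ~= v; }
}

// The framework-provided default: anything with a fast `insert` gets
// `linearInsert` for free (a stronger bound satisfies a weaker one).
void linearInsert(C)(ref C c, int v)
    if (hasMember!(C, "insert"))
{
    c.insert(v);
}

void fill(C)(ref C c)   // generic code: only needs the linear bound
{
    foreach (v; [3, 1, 2])
        c.linearInsert(v);
}

void main()
{
    FastSet f; SlowBag s;
    fill(f);   // uses the free-function default via UFCS
    fill(s);   // uses SlowBag's own member
    writeln(f.items, " ", s.items);
}
```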

Also, maybe a simpler idea would just be to annotate the operations with their complexity using UDAs. That way, things that really care about the complexity can get at it, and those that don't can ignore it. It has the benefit of being self-documenting as well.
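
Something like this rough sketch, with a hypothetical `Complexity` UDA:

```d
import std.stdio;
import std.traits : getUDAs;

// Hypothetical UDA carrying the documented complexity as a string.
struct Complexity { string bigO; }

struct FastList
{
    @Complexity("O(1)")
    void insert(int v) { /* ... */ }
}

struct SlowList
{
    @Complexity("O(n)")
    void insert(int v) { /* ... */ }
}

void main()
{
    // Code that cares can read the annotation at compile time...
    enum c = getUDAs!(SlowList.insert, Complexity)[0];
    static assert(c.bigO == "O(n)");
    writeln(c.bigO);
    // ...and code that doesn't care just calls insert and ignores it.
}
```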
December 04, 2015
On Friday, 4 December 2015 at 03:46:39 UTC, Walter Bright wrote:
> On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:
>> Now this primitive may have three complexities:
>>
>> * linear in the length of r (e.g. c is a singly-linked list)
>>
>> * linear in the number of elements after r in the collection (e.g. c is an array)
>>
>> * constant (c is a doubly-linked list)
>>
>> These complexities must be reflected in the name of the primitives. Or perhaps
>> it's okay to conflate a couple of them. I know names are something we're awfully
>> good at discussing :o). Destroy!
>
> Are you sure there are only 3 complexities? What if it's a self-balancing tree?
>
> I'm also not sure it is a good idea to put the complexity in the name of the primitive. Do any other algorithm libraries do that?

I agree -- putting the complexity (or rather, the running time) in the primitive's name seems like a bad idea.

I believe that stability is a more important "feature" of a sorting algorithm than its running time, because you can rely on it in your own algorithms (I use it for implementing delta compression) and it affects the output.

Running time is still a good guarantee to have, but whether `sort` runs in O(n^2) or O(n log n) is not going to affect what your output looks like!
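
For example, with std.algorithm's `sort`, stability changes the output while the complexity bound does not:

```d
import std.algorithm : sort, SwapStrategy;
import std.stdio;
import std.typecons : tuple;

void main()
{
    // Records that compare equal on the key: stability decides
    // their relative order in the output.
    auto recs = [tuple("b", 1), tuple("a", 2), tuple("a", 1)];

    auto stable = recs.dup;
    sort!((x, y) => x[0] < y[0], SwapStrategy.stable)(stable);
    writeln(stable);   // ("a", 2) is guaranteed to precede ("a", 1)

    auto unstable = recs.dup;
    sort!((x, y) => x[0] < y[0])(unstable);
    writeln(unstable); // order of the two "a" records is unspecified
}
```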
December 04, 2015
On Friday, 4 December 2015 at 01:55:15 UTC, Timon Gehr wrote:
> awfully popular outside of actual complexity theory papers. I don't really understand what calling it 'complexity' actually buys though.

It will make static arrays look very good. All operations are O(1).
December 04, 2015
On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:
> Also, maybe a simpler idea would just be to annotate the operations with their complexity using UDAs. That way, things that really care about the complexity can get at it, and those that don't can ignore it. It has the benefit of being self-documenting as well.

Yes, but can you export it reliably in generically composed functions?

Say you have a loop of N iterations calling a function that involves M operations: you want to export N*M, but then the caller needs to know what N and M refer to, so you need some standard way of getting at that (e.g. N could be the number of edges and M the number of nodes, or something like that).
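
A very rough sketch of what "exporting" a composed cost could look like; the mechanics are trivial, agreeing on what the symbols denote is the hard part:

```d
import std.stdio;

// Purely illustrative: a cost is just a symbolic expression, and a
// loop wrapper multiplies its body's cost by the iteration count.
struct Cost { string expr; }

Cost loopOver(Cost bodyCost, string iterations)
{
    return Cost(iterations ~ " * (" ~ bodyCost.expr ~ ")");
}

void main()
{
    auto inner = Cost("M");             // one call costs M operations
    auto total = loopOver(inner, "N");  // N iterations of that call
    writeln(total.expr);                // "N * (M)" -- but M of *what*?
}
```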

December 04, 2015
On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:
> When would a generic algorithm even need to know the complexity of the container?

A generic algorithm that silently becomes quadratic or worse when given the "wrong" inputs is a hidden danger. This can easily happen when composing linear algorithms. This naming scheme, also employed by std.container, prevents that.
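
E.g. the classic accident, sketched with a hypothetical container:

```d
import std.stdio;

// Hypothetical container whose only insert is O(n).
struct FrontArray
{
    int[] data;
    void insert(int v) { data = v ~ data; } // shifts every element
}

// Generic code that *looks* linear: n calls to `insert`. With the
// container above each call is O(n), so the whole thing is O(n^2).
// A name like `linearInsert` would expose the cost at the call site.
void buildFrom(C)(ref C c, int[] src)
{
    foreach (v; src)
        c.insert(v);
}

void main()
{
    FrontArray a;
    buildFrom(a, [1, 2, 3, 4]);
    writeln(a.data); // [4, 3, 2, 1]
}
```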

> Also, maybe a simpler idea would just be to annotate the operations with their complexity using UDAs. That way, things that really care about the complexity can get at it, and those that don't can ignore it. It has the benefit of being self-documenting as well.

If you don't care about the complexity, why are you using collections? Just use T[] and T[U]. Surely choosing a data structure is all about complexity (barring some details like optimizing for cache locality or small data structures).


December 04, 2015
On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:
> On Friday, 4 December 2015 at 03:37:10 UTC, Andrei Alexandrescu wrote:
>> On 12/3/15 10:29 PM, Jack Stouffer wrote:
>>> On Friday, 4 December 2015 at 02:21:12 UTC, Andrei Alexandrescu wrote:
>>>> On 12/03/2015 09:10 PM, Idan Arye wrote:
>>>>> The complexities of the operations are a property of the data structure
>>>>> being used. If each collection type has its own set of method
>>>>> names based on the complexity of operations on it, we won't be able to
>>>>> have templated functions that operate on any kind of collection (or at
>>>>> the very least, these functions will be really tedious to code).
>>>>
>>>> Your premise is right but you reach the negation of the correct
>>>> conclusion. -- Andrei
>>>
>>> How so? If a singly linked list and a doubly linked list have two
>>> different method names for the same operation, then they cannot be
>>> easily templated.
>>
>> Took me a while to figure. There's a hierarchy of operations, e.g. if a collection implements insert, it automatically implements linearInsert. And so on. The collections framework provides these defaults, so client code that needs quick insert uses insert, whereas code that's fine with a linear upper bound uses linearInsert and captures both.
>>
>> Another way to look at it: in the STL, container-independent code is near impossible because different containers use the same signature for operations that are different (either wrt iterator invalidation or complexity). My design avoids that by giving distinct operations distinct names.
>>
>>
>> Andrei
>
> This sounds really complicated to use. What are the benefits?
> When would a generic algorithm even need to know the complexity of the container?
>
> Also, maybe a simpler idea would just be to annotate the operations with their complexity using UDAs. That way, things that really care about the complexity can get at it, and those that don't can ignore it. It has the benefit of being self-documenting as well.

Ranges are a good example - they provide only the operations they can efficiently implement. For example, forward ranges could provide random access, but only at the cost of linear running time.

std.container provides the `elem in container` operation only for containers that can implement it in logarithmic time or faster.

The fact that you can't use opIn with DList or Array is very good, because it prevents you from writing inefficient generic code. Your algorithm will manifest that it can only work with containers that provide efficient operations, and because of that you can be sure what its time complexity will be. For example:
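
```d
import std.algorithm.searching : canFind;
import std.container.rbtree : redBlackTree;

void main()
{
    auto t = redBlackTree(3, 1, 4);
    assert(1 in t); // allowed: membership test is O(log n)

    int[] a = [3, 1, 4];
    // `1 in a` does not even compile: a linear search has to be
    // spelled out, so the cost stays visible at the call site.
    assert(a.canFind(1));
}
```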

You should choose a specific data structure only because you can efficiently implement the algorithm you have in mind with it.

One of the worst examples of the opposite is the .Count extension method in .NET (which happens to have the same name as the .Count property of ICollection, one of the most used (often implicitly) interfaces), and which has linear running time. The most horrible thing I have seen!
December 04, 2015
On Friday, 4 December 2015 at 07:50:19 UTC, Minas Mina wrote:
> On Friday, 4 December 2015 at 03:46:39 UTC, Walter Bright wrote:
>> On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:
>>> Now this primitive may have three complexities:
>>>
>>> * linear in the length of r (e.g. c is a singly-linked list)
>>>
>>> * linear in the number of elements after r in the collection (e.g. c is an array)
>>>
>>> * constant (c is a doubly-linked list)
>>>
>>> These complexities must be reflected in the name of the primitives. Or perhaps
>>> it's okay to conflate a couple of them. I know names are something we're awfully
>>> good at discussing :o). Destroy!
>>
>> Are you sure there are only 3 complexities? What if it's a self-balancing tree?
>>
>> I'm also not sure it is a good idea to put the complexity in the name of the primitive. Do any other algorithm libraries do that?
>
> I agree -- putting the complexity (or rather, the running time) in the primitive's name seems like a bad idea.
>
> I believe that stability is a more important "feature" of a sorting algorithm than its running time, because you can rely on it in your own algorithms (I use it for implementing delta compression) and it affects the output.
>
> Running time is still a good guarantee to have, but whether `sort` runs in O(n^2) or O(n log n) is not going to affect what your output looks like!

That's wrong. Imagine a sort method that calls out to an API server to sort the numbers. Given a good internet connection, this implementation detail will not affect the correctness of the result, but I'm sure no sane person would want to use such a method. When the only job of a function is to implement an algorithm, the only things you should care about are its time and space complexity (correctness is a given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.
December 04, 2015
On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:
> and space complexity (correctness is a given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.

I am sorry to say this, but hard performance requirements require O(1) on everything.

Big-O tells you essentially very little if it is more than O(1). Quicksort and insertion sort are O(N^2) in the worst case; bucket sort is O(N). That does not make bucket sort faster than quicksort, or even insertion sort, in a specific case, because bucket sort's cost is dominated by the number of possible values.
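
Concretely (a toy sketch, with counting sort standing in for bucket sort):

```d
import std.stdio;

enum K = 1_000_000; // size of the value domain

// O(N + K): asymptotically "better", but the K term dominates small N.
int[] countingSort(int[] a)
{
    auto counts = new int[K];
    foreach (v; a) ++counts[v];
    int[] result;
    foreach (value, count; counts)
        foreach (_; 0 .. count)
            result ~= cast(int) value;
    return result;
}

// O(N^2) worst case, but with tiny constants for tiny N.
int[] insertionSort(int[] a)
{
    auto r = a.dup;
    foreach (i; 1 .. r.length)
    {
        auto x = r[i];
        size_t j = i;
        for (; j > 0 && r[j - 1] > x; --j)
            r[j] = r[j - 1];
        r[j] = x;
    }
    return r;
}

void main()
{
    auto a = [42, 7, 999_999, 3];
    // Identical output; wildly different amounts of work for N = 4.
    writeln(insertionSort(a));
    writeln(countingSort(a));
}
```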


December 04, 2015
On Friday, 4 December 2015 at 01:37:33 UTC, Jack Stouffer wrote:
> On Friday, 4 December 2015 at 01:27:42 UTC, Andrei Alexandrescu wrote:
>> These complexities must be reflected in the name of the primitives.
> I don't see why. IMO, names should convey what the function does, not how it does it.

I agree. It sounds like a bad idea.

And who knows, maybe someone will discover a more efficient way to implement "insertAfter", or a better implementation of the collection itself... Should we change its name then?