December 04, 2015
On 12/04/2015 02:50 AM, Minas Mina wrote:
> I agree -- putting the complexity (eh, running time) in the primitive's
> name seems like a bad idea.
>
> I believe that stability is a more important "feature" of a sorting
> algorithm than its running time, because you can rely on it in your own
> algorithms (I use it when implementing delta compression), and it
> affects the output.
>
> Running time is still a good guarantee to have, but whether `sort` runs
> in O(n^2) or O(n log n) does not affect what your output looks like!

Sort is almost always assumed to take n log n time. The problem is when you combine simpler operations, e.g. inserting elements at the front in a loop.
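The composition trap Andrei mentions can be made concrete. The thread is about D's std.container, but the effect is language-agnostic; here is a minimal Python sketch: inserting at the front of an array-backed list does O(n) work per call, so the loop is O(n^2) overall, while a deque's `appendleft` is O(1) and keeps the loop linear.

```python
from collections import deque

def build_front_list(items):
    """Insert each item at the front of a list: every insert shifts
    all existing elements, so the whole loop is O(n^2)."""
    result = []
    for x in items:
        result.insert(0, x)  # O(n) shift per call
    return result

def build_front_deque(items):
    """Same logic with a deque: appendleft is O(1), so the loop is O(n)."""
    result = deque()
    for x in items:
        result.appendleft(x)
    return result

assert build_front_list([1, 2, 3]) == [3, 2, 1]
assert list(build_front_deque([1, 2, 3])) == [3, 2, 1]
```

Both produce identical output; only the hidden cost differs, which is exactly why the output alone cannot tell you the complexity.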

Andrei

December 04, 2015
On 12/04/2015 03:41 AM, Jakob Ovrum wrote:
> On Friday, 4 December 2015 at 06:05:55 UTC, Tofu Ninja wrote:
>> When would a generic algorithm even need to know the complexity of the
>> container?
>
> A generic algorithm that uses exponential time when given the "wrong"
> inputs is a hidden danger. This can easily happen when composing with
> linear algorithms. This naming scheme, also employed by std.container,
> prevents this.
>
> >> Also maybe a simpler idea would just be to annotate the operations
> >> with their complexity via UDAs. That way things that really care
> >> about the complexity can get it, and those that don't can ignore it. It
> >> has the benefit of being self-documenting as well.
>
> If you don't care about the complexity, why are you using collections?
> Just use T[] and T[U]. Surely choosing a data structure is all about
> complexity (barring some details like optimizing for cache locality or
> small data structures).

WORD! -- Andrei

December 04, 2015
On 12/04/2015 04:27 AM, Andrea Fontana wrote:
> And who knows, maybe someone will discover a more efficient way to
> implement "insertAfter" or the collection itself... We should change its
> name?

Of course. The old name will still work because, remember, there's a hierarchy of operations; those that promise less automatically forward to those that promise more. -- Andrei
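The forwarding Andrei describes can be sketched roughly. This is a hypothetical Python analog, not D's actual mechanism (std.container uses `alias` for this): a method that promises less (`linearInsert`, at most O(n)) forwards to one that promises more (an O(1) `insert`) whenever the container provides it, so old call sites keep working when a faster implementation appears.

```python
class ArrayList:
    """Hypothetical container whose front insert really is linear."""
    def __init__(self):
        self.data = []
    def linearInsert(self, x):
        self.data.insert(0, x)  # O(n): the name is honest about the cost

class LinkedList:
    """Hypothetical container with O(1) front insert."""
    def __init__(self):
        self.data = []
    def insert(self, x):        # promises O(1)
        self.data.insert(0, x)  # stand-in for a real O(1) prepend
    def linearInsert(self, x):
        # the weaker promise forwards to the stronger one
        self.insert(x)

# Generic code can always call linearInsert and automatically gets
# the best implementation each container offers.
for cls in (ArrayList, LinkedList):
    c = cls()
    for x in (1, 2, 3):
        c.linearInsert(x)
    assert c.data == [3, 2, 1]
```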

December 04, 2015
On 12/04/2015 04:51 AM, tn wrote:
> "I just want to insert an element. I don't care how long it takes. Why
> do I need to specify that it should be linear?"

Oh but you do care.

> In my opinion, there should be "constantInsert", "linearInsert", etc.
> for those who care about the complexity, and "insert" should be always
> available.

What would be the complexity assumption of "insert"?


Andrei
December 04, 2015
On Friday, 4 December 2015 at 14:08:08 UTC, Andrei Alexandrescu wrote:
> On 12/04/2015 04:51 AM, tn wrote:
>> "I just want to insert an element. I don't care how long it takes. Why
>> do I need to specify that it should be linear?"
>
> Oh but you do care.

I don't, if I have a small container of constant size (or bounded by a small constant). Say, less than 10 elements.

Or perhaps I just want to quickly make some prototype.

And if the other part of my algorithm is exponential or even superexponential, then it is going to dominate anyway. Or should there also be baseTwoExponentialInsert etc. for the case in which I don't care as long as the complexity is at most O(2^n)?

>> In my opinion, there should be "constantInsert", "linearInsert", etc.
>> for those who care about the complexity, and "insert" should be always
>> available.
>
> What would be the complexity assumption of "insert"?

None. So in a sense that would be an alias of whateverInsert or idontcareInsert or infinityInsert.

December 04, 2015
On Friday, 4 December 2015 at 09:57:48 UTC, Jakob Ovrum wrote:
> On Friday, 4 December 2015 at 09:51:05 UTC, tn wrote:
>> "I just want to insert an element. I don't care how long it takes. Why do I need to specify that it should be linear?"
>>
>> In my opinion, there should be "constantInsert", "linearInsert", etc. for those who care about the complexity, and "insert" should be always available.
>
> In the current std.container, linearInsert aliases to the sublinear insert function when it's provided. If you don't care about running time, just use linearInsert.

I understand this.

However, my point is that it seems backwards if
1) when I don't care about the complexity, I need to specify one (e.g. use linearInsert), but
2) when I do care and want constant complexity, I don't specify one (e.g. use insert instead of constantInsert).

In addition, if a new container is added that only provides O(n log n) insert, then my "don't care" code that uses linearInsert does not support the new container without changes.

December 04, 2015
On Friday, 4 December 2015 at 13:58:50 UTC, Andrei Alexandrescu wrote:
> On 12/03/2015 10:46 PM, Walter Bright wrote:
>> On 12/3/2015 5:27 PM, Andrei Alexandrescu wrote:
>>> Now this primitive may have three complexities:
>>>
>
> Looks exaggerated, innit? The fact of the matter is people choose collections based on the complexity of their operations all the time. "I need to insert and remove at the front, so I'll use a list here." Or: "I need random access, I'll use a vector" etc.
>
>
> Andrei

My personal preference would be not to have the complexity in the names, as I prefer shorter/concise names. Typically when I am writing code using containers of any sort, I will check the documentation to determine the cost of the operations I need and base my choice on that. I would think (hope) most others do this too. However, I don't have a strong objection to what is being proposed.

Would you have an insertLogarithmic (or insertInverseAckermann :o) too?



December 04, 2015
On Friday, 4 December 2015 at 14:51:40 UTC, CraigDillabaugh wrote:
> My personal preference would be not to have the complexity in the names, as I prefer shorter/concise names. Typically when I am writing code using containers of any sort, I will check the documentation to determine the cost of the operations I need and base my choice on that. I would think (hope) most others do this too. However, I don't have a strong objection to what is being proposed.

std.container puts linear and/or stable in some of its names and then creates an alias to whichever one makes sense as the default where the alias doesn't have linear or stable in the name. e.g. linearRemove becomes remove via an alias.

But actually using something like stableLinearRemove in code (or stable.linear.remove) becomes hideous _very_ quickly. No one is going to like it, and it really only benefits generic code, which most container code really isn't (and given how different containers tend to be in the complexity of their operations, I question that it even makes sense to use containers generically in many cases if you actually care about efficiency). Usually the genericity is achieved via iterators or ranges, not via the code that actually operates on the container.

So, while I applaud Andrei's attempt to make it so that containers can be used generically where appropriate (and having a consistent API across containers is a big win even if they're never used generically), I do fear that attempts to put the complexity and stability of the functions into the API are going to make it so unwieldy that it's useless (or at least, very un-user-friendly). Obviously, we'll have to wait and see what he comes up with, but what almost everyone is going to want is remove and insert and the like, not stable.linear.insert or stableLinearInsert or any variant on that.

- Jonathan M Davis
December 04, 2015
On Friday, 4 December 2015 at 13:59:41 UTC, Andrei Alexandrescu wrote:
> On 12/04/2015 01:05 AM, Tofu Ninja wrote:
>> Also maybe a simpler idea would just be to annotate the operations
>> with their complexity via UDAs.
>
> Great idea, will keep it in mind. Thanks! -- Andrei

Yeah. If the primary reason to have stable and linear and whatnot in the names is so that introspection can be used on them, then UDAs will work fine for that. Where it's stickier is if you're just calling them in generic code, since then you have no protection against the complexity changing when the container being used changes. But pretty much no one is going to be wanting to use stuff like stableLinearInsert or stable.linear.insert in their code instead of insert. So, while having the complexity be part of the API has some serious advantages, it's not user-friendly for normal use, whereas the UDAs put you somewhere in between not having the complexity in the API at all and forcing everyone to type out the complexity every time they use a function.
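The UDA approach has a rough analog in other languages, which makes the trade-off easy to see. A minimal Python sketch (a decorator standing in for a D UDA; the attribute name `complexity` is an invention for illustration): the metadata is there for introspection, while the call site stays a plain `insert`.

```python
def complexity(big_o):
    """Attach a complexity annotation without changing the call name."""
    def wrap(fn):
        fn.complexity = big_o
        return fn
    return wrap

class FrontList:
    """Hypothetical container whose front insert is linear."""
    def __init__(self):
        self.data = []

    @complexity("O(n)")
    def insert(self, x):
        self.data.insert(0, x)

# Code that cares can introspect the annotation; code that doesn't
# just calls insert and never sees the complexity in the name.
lst = FrontList()
lst.insert(42)
assert lst.data == [42]
assert FrontList.insert.complexity == "O(n)"
```

As Jonathan notes, this gives introspection the information without burdening every call site, but it offers no protection when generic code silently swaps in a container whose `insert` carries a worse annotation.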

- Jonathan M Davis
December 04, 2015
On Friday, 4 December 2015 at 09:17:07 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 4 December 2015 at 09:09:35 UTC, ZombineDev wrote:
>> and space complexity (the correctness is given). Otherwise, designing large systems becomes impossible, because all large systems have hard performance requirements.
>
> I am sorry to say this, but hard performance requirements require O(1) on everything.

Only when dealing with an arbitrary number of elements. O(n^2) could be fine if you know for a fact that you're always dealing with a very small number of elements. And some algorithms with worse complexity are actually faster than algorithms with lower complexity when dealing with a small number of elements - particularly since with a small number of elements, the coefficients can matter a lot more than n. So, really, to know which algorithm is appropriate, you need to know how it's actually going to be used in the program in question and how different algorithms perform there. O(1) is definitely better, but it's not necessarily required.

But yes, if you're dealing with an arbitrarily large number of elements, anything worse than O(1) isn't going to cut it if you have hard performance requirements.

> Big-Oh tells you essentially very little if it is more than O(1). Quick sort and insertion sort are O(N^2), bucket sort is O(N). That does not make bucket sort faster than quick sort or even insertion sort on a specific case as it is dominated by the number of possible values.

Yeah. Really, Big-Oh tells you something about the worst case with an arbitrary number of elements. What actually happens in the program depends heavily on the number of elements which are actually involved. With large numbers of elements, Big-Oh starts mattering, but if the number of elements isn't large, then coefficients and minor implementation details of an algorithm start mattering a lot more than its conceptual worst case.
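The small-n point can be made concrete with insertion sort, the usual example: O(n^2) in the worst case, yet its low constant factors and O(n) behavior on nearly-sorted input are why production sort routines commonly fall back to it below a small cutoff. A minimal sketch:

```python
def insertion_sort(a):
    """O(n^2) worst case, but tiny constant factors make it fast for
    small n, and it runs in O(n) on already-sorted input."""
    a = list(a)  # work on a copy
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        # shift larger elements right to make room for x
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

assert insertion_sort([3, 1, 2]) == [1, 2, 3]
assert insertion_sort([5, 4, 3, 2, 1]) == [1, 2, 3, 4, 5]
```

For a handful of elements, this will typically beat an asymptotically better algorithm whose per-element overhead is higher, which is exactly the coefficients-versus-Big-Oh trade-off described above.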

Still, in general, it's better to be aware of algorithmic complexity and favor algorithms with lower complexity until you need to optimize the code based on your specific use case and determine that a higher complexity algorithm actually performs better in that specific use case.

- Jonathan M Davis