We need better documentation for functions with ranges and templates (page 5) - D Programming Language Discussion Forum

December 15, 2015

Re: We need better documentation for functions with ranges and templates

Posted by Adrian Matoga
in reply to Chris Wright

On Tuesday, 15 December 2015 at 01:10:01 UTC, Chris Wright wrote:
> This reminds me of the Tango strategy for this kind of thing.
>
> tango.core.Array was arranged like this:
>
> version(TangoDoc) {
>   /** Documentation comment. */
>   bool isSameLangth(Range1, Range2)(Range1 r1, Range2 r2) {
>     return true;
>   }
> } else {
>   bool isSameLength(Range1, Range2)(Range1 r1, Range2 r2)
>     if (
>       isInputRange!Range1 &&
>       isInputRange!Range2 &&
>       !isInfinite!Range1 &&
>       !isInfinite!Range2) {
>     // actual implementation
>   }
> }

Fantastic example of why this strategy should just be banned. You need to duplicate the signature in the source, and you are free to make any mistake that won't be caught by the compiler, such as the typo in the word 'Length' here. A beginner then copies the signature, fills in the arguments, and spends minutes deciphering error messages or even grepping the Phobos source to find out that the function name should be spelled 'isSameLength'. The typo is quite easy to overlook, especially when you copy-paste from the official documentation, which you expect to be correct.
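
For contrast, here is a minimal sketch of the single-signature alternative: the documentation comment sits on the real, constrained declaration, so the documented name is the one the compiler checks. The function body is an illustrative assumption, not Tango's or Phobos' actual implementation.

```d
import std.range.primitives : empty, isInfinite, isInputRange, popFront;

/**
 * Checks whether two finite input ranges contain the same number of elements.
 *
 * Params:
 *   r1 = first finite input range
 *   r2 = second finite input range
 * Returns: `true` if both ranges run out of elements at the same time.
 */
bool isSameLength(Range1, Range2)(Range1 r1, Range2 r2)
if (isInputRange!Range1 && isInputRange!Range2 &&
    !isInfinite!Range1 && !isInfinite!Range2)
{
    // Advance both ranges in lock step until at least one is exhausted.
    while (!r1.empty && !r2.empty)
    {
        r1.popFront();
        r2.popFront();
    }
    return r1.empty && r2.empty;
}

void main()
{
    assert(isSameLength([1, 2, 3], [4, 5, 6]));
    assert(!isSameLength([1, 2], [1, 2, 3]));
}
```

With only one signature in the source, a misspelling such as 'isSameLangth' would show up in the docs and at the call site alike, instead of only in the docs.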

December 15, 2015

Re: We need better documentation for functions with ranges and templates

Posted by ZombineDev
in reply to dnewbie

On Monday, 14 December 2015 at 19:56:29 UTC, dnewbie wrote:
> On Monday, 14 December 2015 at 19:04:46 UTC, bachmeier wrote:
>> It's unanimous, at least among the three of us posting in this Reddit thread:
>> ...
>
> Take for example C# Docs: https://msdn.microsoft.com/en-us/library/system.collections.arraylist.addrange.aspx
>
> Syntax C#:
>
> public virtual void AddRange(
> 	ICollection c
> )
>
> Parameters:
>     c
>     Type: System.Collections.ICollection
>     The ICollection whose elements should be added to the end of the ArrayList. The collection itself cannot be null, but it can contain elements that are null.
>
> Clean, simple and instructive!
>

You are really comparing apples to oranges. You should look at signatures of the LINQ functions:

https://msdn.microsoft.com/en-us/library/bb535047(v=vs.100).aspx
https://msdn.microsoft.com/en-us/library/bb534732(v=vs.100).aspx
https://msdn.microsoft.com/en-us/library/hh229621(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh229412(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh211747(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh211886(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh229310(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh229364(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh229473(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh212066(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh244290(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh229392(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh229708(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh229926(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh229008(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh229778(v=vs.103).aspx
https://msdn.microsoft.com/en-us/library/hh229729(v=vs.103).aspx

And then you have one of the most used methods - .Count:
https://msdn.microsoft.com/en-us/library/bb338038(v=vs.100).aspx

Where presumably, for the sake of simplicity, the docs don't even bother to mention that it is almost always O(n), because none of the Enumerable extension methods preserve the underlying ICollection interface.

December 15, 2015

Re: We need better documentation for functions with ranges and templates

Posted by ZombineDev
in reply to ZombineDev

On Tuesday, 15 December 2015 at 09:57:00 UTC, ZombineDev wrote:
>
> And then you have one of the most used methods - .Count:
> https://msdn.microsoft.com/en-us/library/bb338038(v=vs.100).aspx
>
> Where presumably, for the sake of simplicity, the docs don't even bother to mention that it is almost always O(n), because none of the Enumerable extension methods preserve the underlying ICollection interface.

I honestly think that the documentation of walkLength in Phobos is much better, though an example would be nice:
https://dlang.org/phobos/std_range_primitives.html#.walkLength
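
The kind of example such documentation could carry might look like the following sketch. The helper name `countEvensBelow` is made up for illustration; `walkLength`, `hasLength`, `iota` and `filter` are the Phobos functions.

```d
import std.algorithm.iteration : filter;
import std.range : iota;
import std.range.primitives : hasLength, walkLength;

/// Counts the even numbers in [0, n) by walking a lazily filtered range.
size_t countEvensBelow(int n)
{
    auto evens = iota(n).filter!(x => x % 2 == 0);
    // A filtered range has no `length`, so walkLength has to iterate.
    static assert(!hasLength!(typeof(evens)));
    return evens.walkLength;
}

void main()
{
    assert(countEvensBelow(10) == 5); // 0, 2, 4, 6, 8

    // With the optional upTo argument, at most upTo elements are walked.
    assert(iota(10).filter!(x => x % 2 == 0).walkLength(3) == 3);

    // Arrays have `length`, so it is simply returned.
    int[] arr = [1, 2, 3];
    assert(arr.walkLength == 3);
}
```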

December 15, 2015

Re: We need better documentation for functions with ranges and templates

Posted by rumbu
in reply to ZombineDev

On Tuesday, 15 December 2015 at 09:57:00 UTC, ZombineDev wrote:

>
> And then you have one of the most used methods - .Count:
> https://msdn.microsoft.com/en-us/library/bb338038(v=vs.100).aspx
>
> Where presumably, for the sake of simplicity, the docs don't even bother to mention that it is almost always O(n), because none of the Enumerable extension methods preserve the underlying ICollection interface.

Looking at the .net source code, the Count extension method is also doing a best effort "count" by querying the ICollection interface.

    public static int Count<TSource>(this IEnumerable<TSource> source)
    {
      if (source == null)
        throw Error.ArgumentNull("source");
      ICollection<TSource> collection1 = source as ICollection<TSource>;
      if (collection1 != null)
        return collection1.Count;
      ICollection collection2 = source as ICollection;
      if (collection2 != null)
        return collection2.Count;
      int num = 0;
      using (IEnumerator<TSource> enumerator = source.GetEnumerator())
      {
        while (enumerator.MoveNext())
          checked { ++num; }
      }
      return num;
    }

The Remarks section clearly states the same thing:

"If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count."

And personally, I found the MS remark more compact and more user friendly than:

"This is a best-effort implementation of length for any kind of range.
If hasLength!Range, simply returns range.length without checking upTo (when specified). Otherwise, walks the range through its length and returns the number of elements seen. Performs Ο(n) evaluations of range.empty and range.popFront(), where n is the effective length of range."

Not everybody is licensed in computational complexity theory to understand what O(n) means.

December 15, 2015

Re: We need better documentation for functions with ranges and templates

Posted by ZombineDev
in reply to rumbu

On Tuesday, 15 December 2015 at 11:26:04 UTC, rumbu wrote:
>
> Looking at the .net source code, the Count extension method is also doing a best effort "count" by querying the ICollection interface.

Yes, I looked at the source code before writing this, so I knew exactly how it worked. In short: terrible, because it relies only on OOP. But that's not the point. Why should anyone need to look at the source code to see what this function does? I thought this is what the docs were supposed to tell us.

>
> public static int Count<TSource>(this IEnumerable<TSource> [...]
>
> The Remarks section clearly states the same thing:
>
> "If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count."
>
>
> And personally, I found the MS remark more compact and more user friendly than:
> [...]

If you look at the table at the beginning of the page (https://dlang.org/phobos/std_range_primitives.html) you can clearly see a nice, concise description of the function. Even if you don't know complexity theory, there's the word "Compute", which should give you an idea that the function performs some non-trivial amount of work. Unlike:

> Returns the number of elements in a sequence.

Which implies that it only returns a number - almost like an ordinary getter property. I am scared to think that if C# had had extension properties back then, it might have been implemented as one.

> Not everybody is licensed in computational complexity theory to understand what O(n) means.

LOL. Personally, I would never want to use any software written by a programmer who can't tell the difference.

Well ok, let's consider a novice programmer who hasn't yet studied complexity theory.

Option A: They look at the documentation and see there's some strange O(n) thing that they don't know. They look it up on Google and find the wonderful world of complexity theory. They become more educated and are grateful to the people who wrote the documentation for describing the requirements of the function more accurately. That way they can easily decide how using such a function would impact the performance of their system.

Option B: They look at the documentation and see that there's some strange O(n) thing that they don't know. They decide that it's extremely inhumane for the docs to expect such significant knowledge from the reader and they decide to quit. Such novices who do not want to learn are better off choosing a different profession than inflicting their poorly written software on the world.

December 15, 2015

Re: We need better documentation for functions with ranges and templates

Posted by Andrei Alexandrescu
in reply to Jakob Ovrum

On 12/14/15 10:51 PM, Jakob Ovrum wrote:
> On Tuesday, 15 December 2015 at 03:47:30 UTC, Andrei Alexandrescu wrote:
>> We use this pattern in only a couple of places in Phobos, but I think
>> we should generally improve the language to use less, not more, of it.
>>
>> BTW I think all overloads of a given function should be under the same
>> DDOC entry with nice descriptions of what cases they apply to. The
>> situation right now with many functions having separately-documented
>> overloads with "Jump to: 2" etc. is undesirable.
>>
>>
>> Andrei
>
> One possible trick is to use multiple `Params:` sections. Optional
> parameters can be described as such in the parameter description to
> reduce the number of `Params:` sections needed.
>
> Another thing we should do is simplify our overload sets/template
> constraints. For example, `find` has two overloads for needle search
> which can be collapsed into one. They have different template
> constraints - but only because of practical limitations in constraining
> all the needles properly, which should be remedied with improvements to
> std.traits and std.meta.

Yah, these are sensible ideas. Please add them to https://issues.dlang.org/show_bug.cgi?id=13676. Thanks! -- Andrei
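
As an illustration of keeping overloads under one DDOC entry, Ddoc's `ditto` comment attaches a declaration to the documentation of the one before it. The sketch below uses a made-up function, `larger`, purely as an example; it does not show the multiple `Params:` sections trick, whose rendering is not covered here.

```d
/**
 * Returns the larger of two values.
 *
 * The `int` overload compares integers; the `double` overload compares
 * floating-point values.
 *
 * Params:
 *   a = first value
 *   b = second value
 */
int larger(int a, int b) { return a > b ? a : b; }

/// ditto
double larger(double a, double b) { return a > b ? a : b; }

void main()
{
    assert(larger(1, 2) == 2);        // picks the int overload
    assert(larger(1.5, 0.5) == 1.5);  // picks the double overload
}
```

Both declarations are then listed together above a single description, which says in prose which case each overload covers.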

December 15, 2015

Re: We need better documentation for functions with ranges and templates

Posted by Ola Fosheim Grøstad
in reply to ZombineDev

On Tuesday, 15 December 2015 at 12:28:02 UTC, ZombineDev wrote:
>> Not everybody is licensed in computational complexity theory to understand what O(n) means.
>
> LOL. Personally, I would never want to use any software written by a programmer who can't tell the difference.
>
> Well ok, let's consider a novice programmer who hasn't yet studied complexity theory.

Most experienced programmers have a _very_ poor understanding of complexity theory, the associated notation and applicability.

A little bit of sloppy O(1), O(log N) and O(N) is ok, but for anything more than that it becomes more confusing than useful. E.g. the effects are not necessarily measurable for your program. To understand the effects you need more accurate descriptions than O(N^2).

December 15, 2015

Re: We need better documentation for functions with ranges and templates

Posted by ZombineDev
in reply to Ola Fosheim Grøstad

On Tuesday, 15 December 2015 at 13:47:20 UTC, Ola Fosheim Grøstad wrote:
> On Tuesday, 15 December 2015 at 12:28:02 UTC, ZombineDev wrote:
>>> Not everybody is licensed in computational complexity theory to understand what O(n) means.
>>
>> LOL. Personally, I would never want to use any software written by a programmer who can't tell the difference.
>>
>> Well ok, let's consider a novice programmer who hasn't yet studied complexity theory.
>
> Most experienced programmers have a _very_ poor understanding of complexity theory, the associated notation and applicability.
>
> A little bit of sloppy O(1), O(log N) and O(N) is ok, but for anything more than that it becomes more confusing than useful. E.g. the effects are not necessarily measurable for your program. To understand the effects you need more accurate descriptions than O(N^2).

I never said that you need to be an expert, but at least you should be able to tell the difference between O(n) and O(1) like in this particular case. This is very basic stuff.
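
To make the O(n) versus O(1) distinction concrete, here is a small sketch that counts how many times a range is advanced. `Countdown` and `popsNeeded` are hypothetical names invented for this illustration; `walkLength`, `hasLength` and `isInputRange` come from Phobos.

```d
import std.range.primitives : hasLength, isInputRange, walkLength;

/// Hypothetical input range that counts how often it is advanced.
struct Countdown
{
    int remaining;
    static size_t pops; // total popFront calls, for illustration only

    @property bool empty() const { return remaining == 0; }
    @property int front() const { return remaining; }
    void popFront() { --remaining; ++pops; }
}

static assert(isInputRange!Countdown);
static assert(!hasLength!Countdown); // no `length`: must be walked, O(n)
static assert(hasLength!(int[]));    // arrays expose `length`: O(1)

/// Returns how many popFront calls walkLength needed for n elements.
size_t popsNeeded(int n)
{
    Countdown.pops = 0;
    const len = Countdown(n).walkLength;
    assert(len == n);
    return Countdown.pops;
}

void main()
{
    // One popFront per element: the work grows linearly with the length.
    assert(popsNeeded(10) == 10);
    assert(popsNeeded(1000) == 1000);
}
```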

December 15, 2015

Re: We need better documentation for functions with ranges and templates

Posted by rumbu
in reply to ZombineDev

On Tuesday, 15 December 2015 at 12:28:02 UTC, ZombineDev wrote:
> [...]

We are talking about better documentation, not about C# vs D performance; we already know the winner. Since C# is an OOP-only language, there is only one way to do reflection: using OOP (deliberately ignoring the fact that NGen will reduce this call to a simple memory read in the case of arrays).

Your assertion:

> the docs don't even bother to mention that it is almost always O(n), because none of the Enumerable extension methods preserve the underlying ICollection interface

was false, and you don't need to look at the source code to find out; the Remarks section is self-explanatory:

"If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count."

This is *good* documentation:
- "Count" is a better name than "walkLength"; every other programming language uses concepts similar to count, cnt, length, len.
- You don't need to understand computer science terms to find out what a function does.
- If you are really interested in more than finding out the number of elements, there is a performance hint in the Remarks section.
- Links are provided to concepts: even the return type (int) has a link.
- It clearly states what happens if the range is not defined.
- It clearly states what happens if the range contains more than int.max elements.

By contrast, the D documentation introduces a bunch of non-linked concepts, but it does tell me that it's possible to perform O(n) evaluations:
- isInputRange
- isInfinite
- hasLength
- empty
- popFront

The D docs give no indication of what happens if the range is undefined. In fact, the behavior is inconsistent:
- it will return 0 in case of null arrays;
- it will throw AccessViolation for null ranges (or probably segfault on Linux).

There is no indication of what happens if the range contains more than size_t.max elements:
- integer overflow.
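
The null-array case can be checked with a short sketch; `lengthOfNullArray` is a made-up helper name. The null-range and overflow cases are not demonstrated here.

```d
import std.range.primitives : walkLength;

/// Applies walkLength to a null slice and returns the result.
size_t lengthOfNullArray()
{
    int[] a = null;      // a null slice is an empty slice with length 0
    return a.walkLength; // arrays have `length`, so nothing is walked
}

void main()
{
    assert(lengthOfNullArray() == 0);
}
```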

Like someone said: D has genius programmers, but the worst marketers.

December 15, 2015

Re: We need better documentation for functions with ranges and templates

Posted by Andrei Alexandrescu
in reply to rumbu

On 12/15/15 9:03 AM, rumbu wrote:
> [...]

This is a great collection of clear points to improve. Thanks!

Andrei
