Thread overview
Algorithms in the std lib
Feb 17, 2009
bearophile
Feb 17, 2009
bearophile
Feb 17, 2009
Daniel de Kok
Feb 17, 2009
bearophile
Feb 17, 2009
bearophile
Feb 17, 2009
Daniel de Kok
February 17, 2009
Seeing how the Phobos is being worked on, I think it's a good moment to think about other things that it can eventually contain.
In all the languages that I use I slowly create a lib of algorithms and data structures and similar stuff.

Every programmer has special needs that require external modules, a standard lib can't contain everything. But often you can find two levels of needs: a more general one and a more specialized one. For example in the std lib of Python you can find lazy generators of permutations and combinators. They are often useful, but they can't be enough for people that work a lot with with combinatorics. Such people have to write or download a module with much more specific stuff.

So I think the std lib of D may enjoy some more algorithms, to fulfill better the "general needs".
Some of the following algorithms may be already present in Phobos and/or Tango, in such case please ignore them.

The following stuff is already implemented in my dlibs, and I use it:

bisect:
  Return the index where to insert item x in an array a, assuming a is sorted.

isInSorted:
  Given a sorted array, an item, and an optional key callable (like the key of bisect()), used a binary search and answers true if the item is present, false otherwise.

binomial(n, k)
factorial
combinations/xcombinations
permutations/xpermutations
subsets/xsubsets
xlexicalPermutations
apply (see lisp)
mapApply(map an apply on an iterable)

convexHull2D
polygonCentroid
RGB2HSV
HSBV2RGB
floodFill

all/any
  (all or any of the items of an iterable are true or is true a given predicate on them)

nextPow2
isPow2
powMod

reverseBits
  Quickly reverse the bits in a ubyte/uint.


partition/xpartition
  splits an iterable in sub-iterables of fixed length

gcd

xprimes (yields primes lazily at hyper speed)

nrank
median/medianObj

thousands(n)  (1000000 => "1_000_000")
joinArray
xsplit
numSorted (smart sort foo1x comes before foo10x)

allEqual
filterAA (associative arrays)
get a.get(2, "def") ==> "def" (for associative arrays)

rotateLeft/rotateRight (for arrays)
AAintersection

maxs
  Given an iterable 'iter', return a new array of the equally maximum items of 'iter'.
mins

xgroupBy/xgroupBync
  This is really useful: it takes an iterable and an optional predicate. This iterable object yields consecutive key(item) and group arrays from the given iterable that have the same result or are all true or all false (it can also yield just the group arrays, or it can yield the index too).

pop (removes last of an array/AA and returns it)
prepend (for arrays, adds at the start)

There is a refined function to flatten nested iterables.

----------------

The following stuff is present in my Python "bag of tricks", I'll probably want to translate them for D1 too:

baseConvert (base[1,64] => base[1, 64])
To convert numbers to strings from/to any base.

K-means

IntervalMap (map of intervals, a kind of associative data structure)

toposort (topological sort, already in Graph below, but also as a single external function for simpler usages).

Odict (associative array where items are also ordered according to the insertion order, they are in a double linked list)

mean
mode
stdev

digitalLine
  Brensenham line algorithm, returns an iterator of (x,y) coords (not useful for the screen)
digitalCircle
segmentSegmentIntersection

triangleContains (predicate, is point into triangle?)

A directed Graph data structure, the class contains most common graph algorithms.
http://sourceforge.net/projects/pynetwork/
(I have already partially implemented this in D1).

-----------------

You may want to add some things to this list, or to remove things you think aren't useful enough for lot of people.

Bye,
bearophile

IntervalMap (map of intervals, a kind of associative data structure)

toposort (topological sort, already in Graph below, but also as a single external function for simpler usages).

Odict (associative array where items are also ordered according to the insertion order, they are in a double linked list)

mean
mode
stdev

digitalLine
  Brensenham line algorithm, returns an iterator of (x,y) coords (not useful for the screen)
digitalCircle
segmentSegmentIntersection

triangleContains (predicate, is point into triangle?)

A directed Graph data structure, the class contains most common graph algorithms.
http://sourceforge.net/projects/pynetwork/
(I have already partially implemented this in D1).

-----------------

You may want to add some things to this list, or to remove things you think aren't useful enough for lot of people.Soggetto: Algorithms in the std lib

Seeing how the Phobos is being worked on, I think it's a good moment to think about other things that it can eventually contain.
In all the languages that I use I slowly create a lib of algorithms and data structures and similar stuff.

Every programmer has special needs that require external modules, a standard lib can't contain everything. But often you can find two levels of needs: a more general one and a more specialized one. For example in the std lib of Python you can find lazy generators of permutations and combinators. They are often useful, but they can't be enough for people that work a lot with with combinatorics. Such people have to write or download a module with much more specific stuff.

So I think the std lib of D may enjoy some more algorithms, to fulfill better the "general needs".
Some of the following algorithms may be already present in Phobos and/or Tango, in such case please ignore them.

The following stuff is already implemented in my dlibs, and I use it:

bisect:
  Return the index where to insert item x in an array a, assuming a is sorted.

isInSorted:
  Given a sorted array, an item, and an optional key callable (like the key of bisect()), used a binary search and answers true if the item is present, false otherwise.

binomial(n, k)
factorial
combinations/xcombinations
permutations/xpermutations
subsets/xsubsets
xlexicalPermutations
apply (see lisp)
mapApply(map an apply on an iterable)

convexHull2D
polygonCentroid
RGB2HSV
HSBV2RGB
floodFill

all/any
  (all or any of the items of an iterable are true or is true a given predicate on them)

nextPow2
isPow2
powMod

reverseBits
  Quickly reverse the bits in a ubyte/uint.


partition/xpartition
  splits an iterable in sub-iterables of fixed length

gcd

xprimes (yields primes lazily at hyper speed)

nrank
median/medianObj

thousands(n)  (1000000 => "1_000_000")
joinArray
xsplit
numSorted (smart sort foo1x comes before foo10x)

allEqual
filterAA (associative arrays)
get a.get(2, "def") ==> "def" (for associative arrays)

rotateLeft/rotateRight (for arrays)
AAintersection

maxs
  Given an iterable 'iter', return a new array of the equally maximum items of 'iter'.
mins

xgroupBy/xgroupBync
  This is really useful: it takes an iterable and an optional predicate. This iterable object yields consecutive key(item) and group arrays from the given iterable that have the same result or are all true or all false (it can also yield just the group arrays, or it can yield the index too).

pop (removes last of an array/AA and returns it)
prepend (for arrays, adds at the start)

There is a refined function to flatten nested iterables.

----------------

The following stuff is present in my Python "bag of tricks", I'll write some of this for D1 too:

baseConvert (base[1,64] => base[1, 64])
To convert numbers to strings from/to any base.

K-means

IntervalMap (map of intervals, a kind of associative data structure)

toposort (topological sort, already in Graph below, but also as a single external function for simpler usages).

Odict (associative array where items are also ordered according to the insertion order, they are in a double linked list)

mean
mode
stdev

digitalLine
  Brensenham line algorithm, returns an iterator of (x,y) coords (not useful for the screen)
digitalCircle
segmentSegmentIntersection

triangleContains (predicate, is point into triangle?)

A directed Graph data structure, the class contains most common graph algorithms.
http://sourceforge.net/projects/pynetwork/
(I have already partially implemented this in D1).

-----------------

You may want to add some things to this list, or to remove things you think aren't useful enough for lot of people.

Bye,
bearophile
February 17, 2009
bearophile wrote:
> Seeing how the Phobos is being worked on, I think it's a good moment to think about other things that it can eventually contain.
> In all the languages that I use I slowly create a lib of algorithms and data structures and similar stuff.
[snip double pasted post]

Ok, I can implement all that... but twice?!? :o)

Andrei
February 17, 2009
Andrei Alexandrescu:
> Ok, I can implement all that... but twice?!? :o)

Sorry for the double posting.
And regarding the implementation, first of all it can be seen if you and other people think it's generally useful stuff. Later I'll offer some/most implementations :-)

Bye,
bearophile
February 17, 2009
On Tue, Feb 17, 2009 at 5:50 PM, bearophile <bearophileHUGS@lycos.com> wrote:
> Andrei Alexandrescu:
>> Ok, I can implement all that... but twice?!? :o)
>
> Sorry for the double posting.
> And regarding the implementation, first of all it can be seen if you and other people think it's generally
> useful stuff. Later I'll offer some/most implementations :-)

How about suffix array construction? It's a very underused useful data structure where you compose a parallel array to data array that is sorted by suffix. This is often used in natural language processing to be able to quickly count the frequency of arbitrary long n-grams without having to store counts of n-grams in a large hash map or traversing the data array in linear order.

This is a probably pretty lame implementation that I made as one of my first tiny D programs (I have only started to learn D in tiny bits):

http://static.danieldk.org/snippets/d/sarr.d

Take care,
Daniel

(P.s. there are some algorithms that can construct suffix arrays faster than a regular sort, but they usually have a lot of requirements, like perfect hashing of all the elements.)
February 17, 2009
Daniel de Kok:
> How about suffix array construction?

I agree that it's useful to have in a std lib (I have had to implement it some times, but not in D yet).
(In those dlibs there are other things like Kd-trees and Range Minimum Query algorithms and data structures, that despite being nice and all are probably not useful to enough people).

Bye,
bearophile
February 17, 2009
Daniel de Kok:
> http://static.danieldk.org/snippets/d/sarr.d

That code needs much more unittests, that test all corner cases you can think of too.

Bye,
bearophile
February 17, 2009
Daniel de Kok wrote:
> On Tue, Feb 17, 2009 at 5:50 PM, bearophile <bearophileHUGS@lycos.com> wrote:
>> Andrei Alexandrescu:
>>> Ok, I can implement all that... but twice?!? :o)
>> Sorry for the double posting.
>> And regarding the implementation, first of all it can be seen if you and other people think it's generally
>> useful stuff. Later I'll offer some/most implementations :-)
> 
> How about suffix array construction? It's a very underused useful data
> structure where you compose a parallel array to data array that is
> sorted by suffix. This is often used in natural language processing to
> be able to quickly count the frequency of arbitrary long n-grams
> without having to store counts of n-grams in a large hash map or
> traversing the data array in linear order.
> 
> This is a probably pretty lame implementation that I made as one of my
> first tiny D programs (I have only started to learn D in tiny bits):
> 
> http://static.danieldk.org/snippets/d/sarr.d
> 
> Take care,
> Daniel
> 
> (P.s. there are some algorithms that can construct suffix arrays
> faster than a regular sort, but they usually have a lot of
> requirements, like perfect hashing of all the elements.)

Yah; in fact I already have a Trie implementation (prefix tree) hanging on around std. I'm just afraid to bite the bullet in choosing a container overall design. I'd need more experience, and currently bugs in dmd's copy constructors prevent most experimentation with value types.


Andrei
February 17, 2009
On Tue, Feb 17, 2009 at 7:05 PM, bearophile <bearophileHUGS@lycos.com> wrote:
> Daniel de Kok:
>> http://static.danieldk.org/snippets/d/sarr.d
>
> That code needs much more unittests, that test all corner cases you can think of too.

Indeed, thanks for the suggestion!