Jump to page: 1 2 3
Thread overview
Library Development: What to finish/flesh out?
Mar 17, 2011
dsimcha
Mar 17, 2011
Tomek Sowiński
Mar 17, 2011
Jason E. Aten
Mar 17, 2011
spir
Mar 17, 2011
dsimcha
Mar 18, 2011
spir
Mar 18, 2011
Walter Bright
Mar 18, 2011
Don
Mar 24, 2011
Sean Kelly
Mar 24, 2011
dsimcha
Mar 24, 2011
dsimcha
Mar 25, 2011
Sean Kelly
Mar 25, 2011
dsimcha
Mar 25, 2011
Denis Koroskin
Mar 26, 2011
dsimcha
TempAlloc (Was: Library Development: What to finish/flesh out?)
Mar 27, 2011
dsimcha
Mar 28, 2011
Masahiro Nakagawa
Mar 28, 2011
dsimcha
Mar 18, 2011
filgood
Mar 19, 2011
Caligo
Mar 26, 2011
Jonathan M Davis
Mar 26, 2011
Johan Granberg
Mar 26, 2011
dsimcha
Mar 26, 2011
spir
March 17, 2011
I've accumulated a bunch of little libraries via various evening and weekend hacking projects over the past year or so, in various states of completion. Most are things I'm at least half-considering for Phobos, though some belong as third-party libs.  I definitely don't have time to finish/flesh out all of them anytime soon, so I've decided to ask the community what to prioritize. Below is a summary of everything I've been working on, with its current level of completion.  Please let me know the following:

1.  A relative ordering of how useful you think these libraries would be to the community.

2.  In absolute terms, would you find this useful?

3.  For the Phobos candidates, whether they're general enough to belong in the **standard** library.

List in order from most to least finished:

1.  Rational:  A library for handling rational numbers exactly.  Templated on integer type, can use BigInts for guaranteed accuracy, or fixed-width integers for more speed where the denominator and numerator will be small.  Completion state:  Mostly finished.  Just need to fix a litte bit rot and submit for review.  (Phobos candidate)

2.  RandAA:  A hash table implementation with deterministic memory management, based on randomized probing.  Main advantage over builtin AAs is that it plays much nicer with the GC and multithreaded programs.  Lookup times are also expected O(1) no matter how many collisions exist in modulus hash space, as long as there are few collisions in full 32- or 64-bit hash space.  Completion state:  Mostly finished.  Just needs a little doc improvement, a few benchmarks and submission for review.  (Phobos candidate)

3.  TempAlloc:  A memory allocator based on a thread-local segmented stack, useful for allocating large temporary buffers in things like numerics code. Also comes with a hash table, hash set and AVL tree optimized for this allocation scheme.  The advantages over plain old stack allocation are that it's independent of function calls (meaning you can return pointers to TempAlloc-allocated memory from a function, etc.) and it's segmented, meaning you can allocate huge buffers w/o risking stack overflow.  Its main weakness is that this stack is not scanned by the GC, meaning that you can't store the only reference to a GC-allocated piece of memory here.  However, in practice large arrays of primitives are an extremely common case in performance-critical code.  I find this module immensely useful in dstats and Lars Kyllingstad uses it in SciD.  Getting it into Phobos would make it easy for other scientific/numerics code to use it.  Completion state:  Working and used.  Needs a litte cleanup and documentation.  (Phobos candidate)

4.  Streaming CSV Parser:  Parses CSV files as they're read in, a few convenience functions for extracting columns into structs.  If Phobos every gets SQLite support I'll probably add sugar for turning a CSV file into an SQLite database, too.  Completion state:  Prototype working, needs testing, cleanup and documentation.  (Phobos candidate)

5.  Matrix operations:  SciD improvements that allow you to write matrix operations that look like normal math/MATLAB and optimizes them via expression templates so that a minimal number of temporary matrices are created. Uses/will use BLAS for multiplication.  Completion state:  Addition implemented.  Multiplication not.

6.  Machine learning:  Decision trees, KNN, Random Forest, Logistic Regression, SVM, Naive Bayes, etc.  This would be a dstats module.  Completion state:  Decision trees prototyped, logistic regression working.

7.  std.mixins:  Mixins for commonly needed boilerplate code.  I stopped working on this when Andrei suggested that making a collection of mixins into a module is a bad idea.  I've thought about it some more and I respectfully disagree.  std.mixins would be a one-stop shop for pretty much any boilerplate you need to inject, and most of this code doesn't fit in any other obvious place.  Completion state:  A few things (struct comparison, simple class constructors, Singleton pattern) prototyped.  (Phobos candidate)

8.  GZip support in std.file:  I'll leave the stream stuff for someone else,
but just simple stuff like read(), write(), append() IMHO belongs in std.file.
 Completion state:  Not started, but this is the easiest of the bunch to
implement.  (Phobos candidate)
March 17, 2011
dsimcha napisał:

> I've accumulated a bunch of little libraries via various evening and weekend hacking projects over the past year or so, in various states of completion. Most are things I'm at least half-considering for Phobos, though some belong as third-party libs.  I definitely don't have time to finish/flesh out all of them anytime soon, so I've decided to ask the community what to prioritize. Below is a summary of everything I've been working on, with its current level of completion.  Please let me know the following:
> 
> 1.  A relative ordering of how useful you think these libraries would be to the community.
> 
> 2.  In absolute terms, would you find this useful?
> 
> 3.  For the Phobos candidates, whether they're general enough to belong in the **standard** library.
> 
> List in order from most to least finished:
> 
> 1.  Rational:  A library for handling rational numbers exactly.  Templated on integer type, can use BigInts for guaranteed accuracy, or fixed-width integers for more speed where the denominator and numerator will be small.  Completion state:  Mostly finished.  Just need to fix a litte bit rot and submit for review.  (Phobos candidate)

I'd find it useful. As for its presence in Phobos, I'm uncertain if it's in enough demand.

> 2.  RandAA:  A hash table implementation with deterministic memory management, based on randomized probing.  Main advantage over builtin AAs is that it plays much nicer with the GC and multithreaded programs.  Lookup times are also expected O(1) no matter how many collisions exist in modulus hash space, as long as there are few collisions in full 32- or 64-bit hash space.  Completion state:  Mostly finished.  Just needs a little doc improvement, a few benchmarks and submission for review.  (Phobos candidate)

Useful for me and in Phobos.

> 3.  TempAlloc:  A memory allocator based on a thread-local segmented stack, useful for allocating large temporary buffers in things like numerics code. Also comes with a hash table, hash set and AVL tree optimized for this allocation scheme.  The advantages over plain old stack allocation are that it's independent of function calls (meaning you can return pointers to TempAlloc-allocated memory from a function, etc.) and it's segmented, meaning you can allocate huge buffers w/o risking stack overflow.  Its main weakness is that this stack is not scanned by the GC, meaning that you can't store the only reference to a GC-allocated piece of memory here.  However, in practice large arrays of primitives are an extremely common case in performance-critical code.  I find this module immensely useful in dstats and Lars Kyllingstad uses it in SciD.  Getting it into Phobos would make it easy for other scientific/numerics code to use it.  Completion state:  Working and used.  Needs a litte cleanup and documentation.  (Phobos candidate)

Useful for me, don't know if for everyone else.

> 4.  Streaming CSV Parser:  Parses CSV files as they're read in, a few convenience functions for extracting columns into structs.  If Phobos every gets SQLite support I'll probably add sugar for turning a CSV file into an SQLite database, too.  Completion state:  Prototype working, needs testing, cleanup and documentation.  (Phobos candidate)

You mean a lazy slurp? It'd be useful for everyone.

> 5.  Matrix operations:  SciD improvements that allow you to write matrix operations that look like normal math/MATLAB and optimizes them via expression templates so that a minimal number of temporary matrices are created. Uses/will use BLAS for multiplication.  Completion state:  Addition implemented.  Multiplication not.

It is worth considering standardizing at least matrix expressions in Phobos. The motivation is analogous to ranges -- to run an algorithm from lib A on a matrix container from lib B. C++ would be green with envy.

I'd be glad to be part of the effort once I'm done with xml.

> 6.  Machine learning:  Decision trees, KNN, Random Forest, Logistic Regression, SVM, Naive Bayes, etc.  This would be a dstats module.  Completion state:  Decision trees prototyped, logistic regression working.

I'd find it useful, I think anyone who's into this would too.

> 7.  std.mixins:  Mixins for commonly needed boilerplate code.  I stopped working on this when Andrei suggested that making a collection of mixins into a module is a bad idea.  I've thought about it some more and I respectfully disagree.  std.mixins would be a one-stop shop for pretty much any boilerplate you need to inject, and most of this code doesn't fit in any other obvious place.  Completion state:  A few things (struct comparison, simple class constructors, Singleton pattern) prototyped.  (Phobos candidate)

I'm afraid I also think functionality should be categorized by the purpose it serves rather than implementation technique.

> 8.  GZip support in std.file:  I'll leave the stream stuff for someone else,
> but just simple stuff like read(), write(), append() IMHO belongs in std.file.
>  Completion state:  Not started, but this is the easiest of the bunch to
> implement.  (Phobos candidate)

I don't know really...

-- 
Tomek

March 17, 2011
On Thu, 17 Mar 2011 15:33:10 +0000, dsimcha wrote:
> 5.  Matrix operations:  SciD improvements that allow you to write matrix operations that look like normal math/MATLAB and optimizes them via expression templates so that a minimal number of temporary matrices are created. Uses/will use BLAS for multiplication.  Completion state: Addition implemented.  Multiplication not.

Nice matrix ops get my vote for what I would find most useful. Having matrices with m.rownames and m.colnames (similar to R's rownames(m) and colnames(m) for a matrix m), would be great too.
March 17, 2011
I'd have much use for both below.

On 03/17/2011 04:33 PM, dsimcha wrote:
> 1.  Rational:  A library for handling rational numbers exactly.  Templated on
> integer type, can use BigInts for guaranteed accuracy, or fixed-width integers
> for more speed where the denominator and numerator will be small.  Completion
> state:  Mostly finished.  Just need to fix a litte bit rot and submit for
> review.  (Phobos candidate)

For decimal exactitude, what about plain fixed point (with decimal factor and binary mantissa)?

> 2.  RandAA:  A hash table implementation with deterministic memory management,
> based on randomized probing.  Main advantage over builtin AAs is that it plays
> much nicer with the GC and multithreaded programs.  Lookup times are also
> expected O(1) no matter how many collisions exist in modulus hash space, as
> long as there are few collisions in full 32- or 64-bit hash space.  Completion
> state:  Mostly finished.  Just needs a little doc improvement, a few
> benchmarks and submission for review.  (Phobos candidate)

How complicated would it be to add (optional) support for keeping insertion order (for iteration only)? Thought at a // array with pointers to the cells holding key/value pairs.

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

March 17, 2011
On 3/17/2011 6:18 PM, spir wrote:
> I'd have much use for both below.
>
> On 03/17/2011 04:33 PM, dsimcha wrote:
>> 1. Rational: A library for handling rational numbers exactly.
>> Templated on
>> integer type, can use BigInts for guaranteed accuracy, or fixed-width
>> integers
>> for more speed where the denominator and numerator will be small.
>> Completion
>> state: Mostly finished. Just need to fix a litte bit rot and submit for
>> review. (Phobos candidate)
>
> For decimal exactitude, what about plain fixed point (with decimal
> factor and binary mantissa)?
>

I wouldn't mind having this, but I see it as completely orthogonal to rational numbers and don't have any near-term intentions of implementing it.

>> 2. RandAA: A hash table implementation with deterministic memory
>> management,
>> based on randomized probing. Main advantage over builtin AAs is that
>> it plays
>> much nicer with the GC and multithreaded programs. Lookup times are also
>> expected O(1) no matter how many collisions exist in modulus hash
>> space, as
>> long as there are few collisions in full 32- or 64-bit hash space.
>> Completion
>> state: Mostly finished. Just needs a little doc improvement, a few
>> benchmarks and submission for review. (Phobos candidate)
>
> How complicated would it be to add (optional) support for keeping
> insertion order (for iteration only)? Thought at a // array with
> pointers to the cells holding key/value pairs.

It's a good idea, but IMHO something like this should be templated on the type of the associative array and work with builtin AAs, etc., too.  It should be a decorator or something:

/**
Wraps any type that conforms to the duck interface of an associative
array to preserve ordering for iteration.
*/
struct OrderedAA(AA) {
    alias typeof(AA.init.keys.front) K;
    alias typeof(AA.init.values.front) V;
    K[] order;
    AA aa;

    void opIndexAssign(V val, K key) {
        if(!(key in aa)) {
            order ~= key;
        }

        aa[key] = val;
    }

    // opApply, etc.
}

March 18, 2011
On 3/17/2011 8:33 AM, dsimcha wrote:
> 8.  GZip support in std.file:  I'll leave the stream stuff for someone else,
> but just simple stuff like read(), write(), append() IMHO belongs in std.file.
>   Completion state:  Not started, but this is the easiest of the bunch to
> implement.  (Phobos candidate)

I'd definitely like to see gzip support in Phobos. But it shouldn't be in std.file, as there are many compression schemes in use besides gzip, and such should be composable with the file interface using ranges.

In fact, gzip has nothing necessarily to do with files. It compresses/decompresses a stream (a range in D). Whether that is a file or something else is quite irrelevant to gzip. DMD, for example, uses a simple compressor for long symbol names.

I suggest making a package for compressors, and then have gzip be a module within that, as in:

   std.compressor.gzip
March 18, 2011
dsimcha wrote:
> I've accumulated a bunch of little libraries via various evening and weekend
> hacking projects over the past year or so, in various states of completion.
> Most are things I'm at least half-considering for Phobos, though some belong
> as third-party libs.  I definitely don't have time to finish/flesh out all of
> them anytime soon, so I've decided to ask the community what to prioritize.
> Below is a summary of everything I've been working on, with its current level
> of completion.  Please let me know the following:

> 3.  TempAlloc:  A memory allocator based on a thread-local segmented stack,
> useful for allocating large temporary buffers in things like numerics code.
> Also comes with a hash table, hash set and AVL tree optimized for this
> allocation scheme.  The advantages over plain old stack allocation are that
> it's independent of function calls (meaning you can return pointers to
> TempAlloc-allocated memory from a function, etc.) and it's segmented, meaning
> you can allocate huge buffers w/o risking stack overflow.  Its main weakness
> is that this stack is not scanned by the GC, meaning that you can't store the
> only reference to a GC-allocated piece of memory here.  However, in practice
> large arrays of primitives are an extremely common case in
> performance-critical code.  I find this module immensely useful in dstats and
> Lars Kyllingstad uses it in SciD.  Getting it into Phobos would make it easy
> for other scientific/numerics code to use it.  Completion state:  Working and
> used.  Needs a litte cleanup and documentation.  (Phobos candidate)

This is #1. Far and away. Belongs in druntime.
I would use it instantly in BigInt.
March 18, 2011
On Thu, 17 Mar 2011 15:33:10 +0000, dsimcha wrote:

> I've accumulated a bunch of little libraries via various evening and weekend hacking projects over the past year or so, in various states of completion. Most are things I'm at least half-considering for Phobos, though some belong as third-party libs.  I definitely don't have time to finish/flesh out all of them anytime soon, so I've decided to ask the community what to prioritize. Below is a summary of everything I've been working on, with its current level of completion.  Please let me know the following:
> 
> 1.  A relative ordering of how useful you think these libraries would be to the community.

In order:

 1. TempAlloc
 2. Matrix ops (I'm biased here, of course...)
 3. RandAA
 4. CSV parser
 5. Rational

That said, it does make sense to start with the things which require the least amount of work.  If it would take you half an hour to complete the rationals lib, for instance, that may be a good starting point.

Regarding std.mixins, I have to agree with Andrei and the others that code should be organised by functionality and not implementation method.

Having thought some more about it, I also think std.file is not the right place for GZip support.  Phobos needs an std.compression package/module, and a method for bulk reading/writing of gzip files may well be a good start.

Finally, I know next to nothing about machine learning, so I won't express any opinion about it.


> 2.  In absolute terms, would you find this useful?

Absolutely!


> 3.  For the Phobos candidates, whether they're general enough to belong in the **standard** library.

I agree with Don that TempAlloc is a candidate for druntime.  Other than that, yes.

I note that others have suggested that the matrix stuff go into Phobos. As long as it depends on BLAS, I would say that's out of the question.

-Lars
March 18, 2011
On 03/17/2011 11:25 PM, dsimcha wrote:
> On 3/17/2011 6:18 PM, spir wrote:
>> I'd have much use for both below.
>>
>> On 03/17/2011 04:33 PM, dsimcha wrote:
>>> 1. Rational: A library for handling rational numbers exactly.
>>> Templated on
>>> integer type, can use BigInts for guaranteed accuracy, or fixed-width
>>> integers
>>> for more speed where the denominator and numerator will be small.
>>> Completion
>>> state: Mostly finished. Just need to fix a litte bit rot and submit for
>>> review. (Phobos candidate)
>>
>> For decimal exactitude, what about plain fixed point (with decimal
>> factor and binary mantissa)?
>>
>
> I wouldn't mind having this, but I see it as completely orthogonal to rational
> numbers and don't have any near-term intentions of implementing it.

Right.

>>> 2. RandAA: A hash table implementation with deterministic memory
>>> management,
>>> based on randomized probing. Main advantage over builtin AAs is that
>>> it plays
>>> much nicer with the GC and multithreaded programs. Lookup times are also
>>> expected O(1) no matter how many collisions exist in modulus hash
>>> space, as
>>> long as there are few collisions in full 32- or 64-bit hash space.
>>> Completion
>>> state: Mostly finished. Just needs a little doc improvement, a few
>>> benchmarks and submission for review. (Phobos candidate)
>>
>> How complicated would it be to add (optional) support for keeping
>> insertion order (for iteration only)? Thought at a // array with
>> pointers to the cells holding key/value pairs.
>
> It's a good idea, but IMHO something like this should be templated on the type
> of the associative array and work with builtin AAs, etc., too. It should be a
> decorator or something:
>
> /**
> Wraps any type that conforms to the duck interface of an associative
> array to preserve ordering for iteration.
> */
> struct OrderedAA(AA) {
> alias typeof(AA.init.keys.front) K;
> alias typeof(AA.init.values.front) V;
> K[] order;
> AA aa;
>
> void opIndexAssign(V val, K key) {
> if(!(key in aa)) {
> order ~= key;
> }
>
> aa[key] = val;
> }
>
> // opApply, etc.
> }

Right too. But the reason I have not implemented it yet is I don't want to keep a // array of keys, which require AA lookup for each key. Instead, I thought at an array of pointers to where the (key:value) cells are stored (cell "buckets"), to avoid AA key lookups. This, I guess, requires tweaking the implementation of the actual data structure. Reason why I asked you as you are dfevelopping a new one (so, you may have such a use case in mind before design is frozen).
I have not yet had a look at the implementation of builtin AAs to see whether this would be easy. I guess not: it just require catching the very moment where a new pair is placed into a given bucket corresponding to its hash value. (And possibly the same thing at re-hash time.)

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

March 18, 2011
Hi Lars,

I agree on your order....but would like to see Matrix ops in Phobos over time (my understanding was that it can work without BLAS (just slower), people can always in BLAS when they need to extra performance, no?).

David, thanks a lot for your hard work...



On 18/03/2011 09:26, Lars T. Kyllingstad wrote:
> On Thu, 17 Mar 2011 15:33:10 +0000, dsimcha wrote:
>
>> I've accumulated a bunch of little libraries via various evening and
>> weekend hacking projects over the past year or so, in various states of
>> completion. Most are things I'm at least half-considering for Phobos,
>> though some belong as third-party libs.  I definitely don't have time to
>> finish/flesh out all of them anytime soon, so I've decided to ask the
>> community what to prioritize. Below is a summary of everything I've been
>> working on, with its current level of completion.  Please let me know
>> the following:
>>
>> 1.  A relative ordering of how useful you think these libraries would be
>> to the community.
>
> In order:
>
>   1. TempAlloc
>   2. Matrix ops (I'm biased here, of course...)
>   3. RandAA
>   4. CSV parser
>   5. Rational
>
> That said, it does make sense to start with the things which require the
> least amount of work.  If it would take you half an hour to complete the
> rationals lib, for instance, that may be a good starting point.
>
> Regarding std.mixins, I have to agree with Andrei and the others that
> code should be organised by functionality and not implementation method.
>
> Having thought some more about it, I also think std.file is not the right
> place for GZip support.  Phobos needs an std.compression package/module,
> and a method for bulk reading/writing of gzip files may well be a good
> start.
>
> Finally, I know next to nothing about machine learning, so I won't
> express any opinion about it.
>
>
>> 2.  In absolute terms, would you find this useful?
>
> Absolutely!
>
>
>> 3.  For the Phobos candidates, whether they're general enough to belong
>> in the **standard** library.
>
> I agree with Don that TempAlloc is a candidate for druntime.  Other than
> that, yes.
>
> I note that others have suggested that the matrix stuff go into Phobos.
> As long as it depends on BLAS, I would say that's out of the question.
>
> -Lars

« First   ‹ Prev
1 2 3