March 19, 2011 Re: Library Development: What to finish/flesh out? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to filgood | On Fri, Mar 18, 2011 at 2:19 PM, filgood <filgood@somewhere.net> wrote:
> Hi Lars,
>
> I agree on your order... but would like to see Matrix ops in Phobos over time (my understanding was that it can work without BLAS (just slower), and people can always link in BLAS when they need the extra performance, no?).
>
> David, thanks a lot for your hard work...
How about Eigen? http://eigen.tuxfamily.org/index.php?title=Benchmark Those benchmarks are old, but still. Eigen v3 performs even better, and they've added multi-threading and tons of other features and enhancements.
| |||
March 24, 2011 Re: Library Development: What to finish/flesh out? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Don | On Mar 17, 2011, at 11:56 PM, Don wrote:
> dsimcha wrote:
>> I've accumulated a bunch of little libraries via various evening and weekend hacking projects over the past year or so, in various states of completion. Most are things I'm at least half-considering for Phobos, though some belong as third-party libs. I definitely don't have time to finish/flesh out all of them anytime soon, so I've decided to ask the community what to prioritize. Below is a summary of everything I've been working on, with its current level of completion. Please let me know the following:
>
>> 3. TempAlloc: A memory allocator based on a thread-local segmented stack, useful for allocating large temporary buffers in things like numerics code. Also comes with a hash table, hash set and AVL tree optimized for this allocation scheme. The advantages over plain old stack allocation are that it's independent of function calls (meaning you can return pointers to TempAlloc-allocated memory from a function, etc.) and it's segmented, meaning you can allocate huge buffers w/o risking stack overflow. Its main weakness is that this stack is not scanned by the GC, meaning that you can't store the only reference to a GC-allocated piece of memory here. However, in practice large arrays of primitives are an extremely common case in performance-critical code. I find this module immensely useful in dstats and Lars Kyllingstad uses it in SciD. Getting it into Phobos would make it easy for other scientific/numerics code to use it. Completion state: Working and used. Needs a little cleanup and documentation. (Phobos candidate)
>
> This is #1. Far and away. Belongs in druntime.
Stuff like this is why core.memory isn't called core.gc.
| |||
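For readers unfamiliar with the idea, the segmented-stack scheme described in the TempAlloc summary above can be sketched roughly as follows. This is a toy illustration in Python, not dsimcha's code: the `SegmentedStack` class, its `alloc`/`free` methods, the `scratch_squares` helper and the tiny segment size are all made up for the example, and it ignores thread-locality, alignment, and requests larger than one segment.

```python
class SegmentedStack:
    """Toy LIFO allocator that grows one fixed-size segment at a time (hypothetical API)."""

    def __init__(self, segment_size=4096):
        self.segment_size = segment_size
        self.segments = [bytearray(segment_size)]  # segments are kept around for reuse
        self.current = 0    # index of the segment currently being filled
        self.used = 0       # bytes handed out from the current segment
        self.history = []   # (current, used) as it was before each live allocation

    def alloc(self, nbytes):
        """Return a writable view of nbytes, opening a new segment if needed."""
        if nbytes > self.segment_size:
            raise ValueError("request exceeds segment size (not handled in this sketch)")
        self.history.append((self.current, self.used))
        if self.used + nbytes > self.segment_size:
            # The current segment is too full: move on to the next one.
            self.current += 1
            self.used = 0
            if self.current == len(self.segments):
                self.segments.append(bytearray(self.segment_size))
        start = self.used
        self.used += nbytes
        return memoryview(self.segments[self.current])[start:start + nbytes]

    def free(self):
        """Release the most recent allocation (strict LIFO order)."""
        self.current, self.used = self.history.pop()


def scratch_squares(stack, n):
    """Allocate inside a function and return the buffer; the caller frees it later."""
    buf = stack.alloc(n)
    for i in range(n):
        buf[i] = (i * i) % 256
    return buf


stack = SegmentedStack(segment_size=16)
a = scratch_squares(stack, 10)  # fits in the first segment
b = scratch_squares(stack, 10)  # only 6 bytes left, so a second segment is opened
stack.free()                    # releases b
stack.free()                    # releases a
```

The point of the sketch is the bookkeeping: the buffer outlives the call to `scratch_squares` because the allocator's stack is separate from the call stack, and a large request opens a new segment instead of overflowing anything.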
March 24, 2011 Re: Library Development: What to finish/flesh out? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | == Quote from Sean Kelly (sean@invisibleduck.org)'s article
> On Mar 17, 2011, at 11:56 PM, Don wrote:
> > dsimcha wrote:
> >> I've accumulated a bunch of little libraries via various evening and weekend hacking projects over the past year or so, in various states of completion. Most are things I'm at least half-considering for Phobos, though some belong as third-party libs. I definitely don't have time to finish/flesh out all of them anytime soon, so I've decided to ask the community what to prioritize. Below is a summary of everything I've been working on, with its current level of completion. Please let me know the following:
> >
> >> 3. TempAlloc: A memory allocator based on a thread-local segmented stack, useful for allocating large temporary buffers in things like numerics code. Also comes with a hash table, hash set and AVL tree optimized for this allocation scheme. The advantages over plain old stack allocation are that it's independent of function calls (meaning you can return pointers to TempAlloc-allocated memory from a function, etc.) and it's segmented, meaning you can allocate huge buffers w/o risking stack overflow. Its main weakness is that this stack is not scanned by the GC, meaning that you can't store the only reference to a GC-allocated piece of memory here. However, in practice large arrays of primitives are an extremely common case in performance-critical code. I find this module immensely useful in dstats and Lars Kyllingstad uses it in SciD. Getting it into Phobos would make it easy for other scientific/numerics code to use it. Completion state: Working and used. Needs a little cleanup and documentation. (Phobos candidate)
> >
> > This is #1. Far and away. Belongs in druntime.
> Stuff like this is why core.memory isn't called core.gc.
Ok, this seems like a popular choice. I should have time after my )#*#$ thesis
proposal and after std.parallelism is done to clean this up and submit for review.
I'm sure the documentation will need to be improved, but as with std.parallelism
I'm not sure **how** it will be judged deficient by people not as intimately
familiar with the library as I am.
| |||
March 24, 2011 Re: Library Development: What to finish/flesh out? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | == Quote from Sean Kelly (sean@invisibleduck.org)'s article
> On Mar 17, 2011, at 11:56 PM, Don wrote:
> > dsimcha wrote:
> >> I've accumulated a bunch of little libraries via various evening and weekend hacking projects over the past year or so, in various states of completion. Most are things I'm at least half-considering for Phobos, though some belong as third-party libs. I definitely don't have time to finish/flesh out all of them anytime soon, so I've decided to ask the community what to prioritize. Below is a summary of everything I've been working on, with its current level of completion. Please let me know the following:
> >
> >> 3. TempAlloc: A memory allocator based on a thread-local segmented stack, useful for allocating large temporary buffers in things like numerics code. Also comes with a hash table, hash set and AVL tree optimized for this allocation scheme. The advantages over plain old stack allocation are that it's independent of function calls (meaning you can return pointers to TempAlloc-allocated memory from a function, etc.) and it's segmented, meaning you can allocate huge buffers w/o risking stack overflow. Its main weakness is that this stack is not scanned by the GC, meaning that you can't store the only reference to a GC-allocated piece of memory here. However, in practice large arrays of primitives are an extremely common case in performance-critical code. I find this module immensely useful in dstats and Lars Kyllingstad uses it in SciD. Getting it into Phobos would make it easy for other scientific/numerics code to use it. Completion state: Working and used. Needs a little cleanup and documentation. (Phobos candidate)
> >
> > This is #1. Far and away. Belongs in druntime.
> Stuff like this is why core.memory isn't called core.gc.
BTW, the TempAlloc module also includes a hash table, hash set and AVL tree that are specifically optimized for TempAlloc. Should these be included in the submission? The disadvantages I see here are that they are less generally useful (possibly too high level for druntime) and that they will make the review take a heck of a lot longer.
| |||
March 25, 2011 Re: Library Development: What to finish/flesh out? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to dsimcha | On Mar 24, 2011, at 1:00 PM, dsimcha wrote:
>
> BTW, the TempAlloc module also includes a hash table, hash set and AVL tree that are specifically optimized for TempAlloc. Should these be included in the submission? The disadvantages I see here is that they are less generally useful (possibly too high level for druntime) and that they will make the review take a heck of a lot longer.
Are they necessary for TempAlloc to function? If so, I'd add them but hidden, as I imagine there's more code than you'd want to simply drop in a private block in core.memory. It may be time for core to get a core.internal package for this kind of stuff.
| |||
March 25, 2011 Re: Library Development: What to finish/flesh out? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | On 3/25/2011 3:50 PM, Sean Kelly wrote:
> On Mar 24, 2011, at 1:00 PM, dsimcha wrote:
>>
>> BTW, the TempAlloc module also includes a hash table, hash set and AVL tree that
>> are specifically optimized for TempAlloc. Should these be included in the
>> submission? The disadvantages I see here is that they are less generally useful
>> (possibly too high level for druntime) and that they will make the review take a
>> heck of a lot longer.
>
> Are they necessary for TempAlloc to function? If so, I'd add them but hidden, as I imagine there's more code than you'd want to simply drop in a private block in core.memory. It may be time for core to get a core.internal package for this kind of stuff.
No, they are just data structures built on top of TempAlloc and optimized for it.
| |||
March 25, 2011 Re: Library Development: What to finish/flesh out? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to dsimcha | On Sat, 26 Mar 2011 00:26:40 +0300, dsimcha <dsimcha@yahoo.com> wrote:
> On 3/25/2011 3:50 PM, Sean Kelly wrote:
>> On Mar 24, 2011, at 1:00 PM, dsimcha wrote:
>>>
>>> BTW, the TempAlloc module also includes a hash table, hash set and AVL tree that
>>> are specifically optimized for TempAlloc. Should these be included in the
>>> submission? The disadvantages I see here is that they are less generally useful
>>> (possibly too high level for druntime) and that they will make the review take a
>>> heck of a lot longer.
>>
>> Are they necessary for TempAlloc to function? If so, I'd add them but hidden, as I imagine there's more code than you'd want to simply drop in a private block in core.memory. It may be time for core to get a core.internal package for this kind of stuff.
>
> No, they are just data structures built on top of TempAlloc and optimized for it.
I'd love to see them, in a separate module probably.
| |||
March 26, 2011 Re: Library Development: What to finish/flesh out? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Denis Koroskin | On 3/25/2011 5:59 PM, Denis Koroskin wrote:
> On Sat, 26 Mar 2011 00:26:40 +0300, dsimcha <dsimcha@yahoo.com> wrote:
>
>> On 3/25/2011 3:50 PM, Sean Kelly wrote:
>>> On Mar 24, 2011, at 1:00 PM, dsimcha wrote:
>>>>
>>>> BTW, the TempAlloc module also includes a hash table, hash set and
>>>> AVL tree that
>>>> are specifically optimized for TempAlloc. Should these be included
>>>> in the
>>>> submission? The disadvantages I see here is that they are less
>>>> generally useful
>>>> (possibly too high level for druntime) and that they will make the
>>>> review take a
>>>> heck of a lot longer.
>>>
>>> Are they necessary for TempAlloc to function? If so, I'd add them but
>>> hidden, as I imagine there's more code than you'd want to simply drop
>>> in a private block in core.memory. It may be time for core to get a
>>> core.internal package for this kind of stuff.
>>
>> No, they are just data structures built on top of TempAlloc and
>> optimized for it.
>
> I'd love to see them, in a separate module probably.
This suggests two separate proposals. The more I think about it, the more I think this is the way to go. TempAlloc per se is much more self-evidently useful than the extra data structures and doesn't need the extra data structures to work. The extras shouldn't hold up its inclusion. The extra data structures only use (or only should use; I don't remember whether I bend this rule) TempAlloc's public API. Furthermore, I'm not sure they're generally useful enough to belong in Phobos. I'd like feedback from others when/if TempAlloc is in Phobos and more people are familiar with it.
| |||
March 26, 2011 Re: Library Development: What to finish/flesh out? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to dsimcha | On 2011-03-17 08:33, dsimcha wrote:
> I've accumulated a bunch of little libraries via various evening and weekend hacking projects over the past year or so, in various states of completion. Most are things I'm at least half-considering for Phobos, though some belong as third-party libs. I definitely don't have time to finish/flesh out all of them anytime soon, so I've decided to ask the community what to prioritize. Below is a summary of everything I've been working on, with its current level of completion. Please let me know the following:
>
> 1. A relative ordering of how useful you think these libraries would be to the community.
>
> 2. In absolute terms, would you find this useful?
>
> 3. For the Phobos candidates, whether they're general enough to belong in the **standard** library.
I find the responses to this list to be rather interesting. Most of it, I find to be of mild interest at best (certainly for the sort of stuff that _I_ do anyway), but others find some of them to be very desirable.
> List in order from most to least finished:
>
> 1. Rational: A library for handling rational numbers exactly. Templated on integer type, can use BigInts for guaranteed accuracy, or fixed-width integers for more speed where the denominator and numerator will be small. Completion state: Mostly finished. Just need to fix a little bit rot and submit for review. (Phobos candidate)
Potentially interesting, but I don't know if I'd ever use it.
> 2. RandAA: A hash table implementation with deterministic memory management, based on randomized probing. Main advantage over builtin AAs is that it plays much nicer with the GC and multithreaded programs. Lookup times are also expected O(1) no matter how many collisions exist in modulus hash space, as long as there are few collisions in full 32- or 64-bit hash space. Completion state: Mostly finished. Just needs a little doc improvement, a few benchmarks and submission for review. (Phobos candidate)
I'm afraid that I don't understand how this is better than the current hash table, and if it really is better, perhaps it should replace the implementation of the built-in one?
> 3. TempAlloc: A memory allocator based on a thread-local segmented stack, useful for allocating large temporary buffers in things like numerics code. Also comes with a hash table, hash set and AVL tree optimized for this allocation scheme. The advantages over plain old stack allocation are that it's independent of function calls (meaning you can return pointers to TempAlloc-allocated memory from a function, etc.) and it's segmented, meaning you can allocate huge buffers w/o risking stack overflow. Its main weakness is that this stack is not scanned by the GC, meaning that you can't store the only reference to a GC-allocated piece of memory here. However, in practice large arrays of primitives are an extremely common case in performance-critical code. I find this module immensely useful in dstats and Lars Kyllingstad uses it in SciD. Getting it into Phobos would make it easy for other scientific/numerics code to use it. Completion state: Working and used. Needs a little cleanup and documentation. (Phobos candidate)
Personally, I see zero use for this in my stuff, but obviously others find it very compelling.
> 4. Streaming CSV Parser: Parses CSV files as they're read in, a few convenience functions for extracting columns into structs. If Phobos ever gets SQLite support I'll probably add sugar for turning a CSV file into an SQLite database, too. Completion state: Prototype working, needs testing, cleanup and documentation. (Phobos candidate)
This definitely sounds useful. I've had to deal with CSV parsing in Java before, and I'd love to see a solid CSV parser in Phobos, but I don't know how much I'd actually end up using it. In the few cases where I'd be looking to deal with CSV files though, it could be invaluable.
> 5. Matrix operations: SciD improvements that allow you to write matrix operations that look like normal math/MATLAB and optimizes them via expression templates so that a minimal number of temporary matrices are created. Uses/will use BLAS for multiplication. Completion state: Addition implemented. Multiplication not.
I don't expect that I'd _ever_ need this, but I'd fully expect that some folks would love it.
> 6. Machine learning: Decision trees, KNN, Random Forest, Logistic Regression, SVM, Naive Bayes, etc. This would be a dstats module. Completion state: Decision trees prototyped, logistic regression working.
I'd have to see the actual library to know whether I'd find much use in it. Probably not though.
> 7. std.mixins: Mixins for commonly needed boilerplate code. I stopped working on this when Andrei suggested that making a collection of mixins into a module is a bad idea. I've thought about it some more and I respectfully disagree. std.mixins would be a one-stop shop for pretty much any boilerplate you need to inject, and most of this code doesn't fit in any other obvious place. Completion state: A few things (struct comparison, simple class constructors, Singleton pattern) prototyped. (Phobos candidate)
This, I very much like. I'd _love_ to have standard mixins for stuff like opEquals and opCmp. I can kind of understand why Andrei might not like a module that's just mixins, but this just seems so useful that I think that it's a definite loss that we _don't_ have it right now. D has done a lot for reducing boilerplate code (the new overloaded operators in particular are quite good at that), but there's still plenty of stuff which is pretty boilerplate and could use solid, standard string mixins to solve it.
> 8. GZip support in std.file: I'll leave the stream stuff for someone else, but just simple stuff like read(), write(), append() IMHO belongs in std.file. Completion state: Not started, but this is the easiest of the bunch to implement. (Phobos candidate)
I don't know about this. std.stream should be able to handle gzip files, but I'm not quite sure what you'd do with std.file to support gzip. It would have to be done in a way that would work with other file types generically (decorator pattern kind of solution). This very much feels to me like the kind of thing that you'd want to do with streams rather than std.file or std.stdio. If all you want to do is compress and uncompress a file, then perhaps we should have some set of modules for dealing with different types of compression (we already have std.zip). I don't see what std.file would be doing though.
So, overall, the stuff that you have is mostly not stuff that I'd find useful at all, but I can see how someone else might (obviously you did). For the most part, I wouldn't have a problem with this sort of stuff being in Phobos. It's just not something that I'd be using, personally.
- Jonathan M Davis
| |||
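To make item 4 above a bit more concrete, here is a rough sketch of what "parse rows as they're read in and extract columns into structs" can look like. It is written in Python with the standard `csv` and `dataclasses` modules purely for illustration and says nothing about how dsimcha's D prototype is built; the `Sample` record, the `stream_samples` function and the column names are invented for the example.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Sample:
    """Hypothetical record type standing in for a user-defined struct."""
    name: str
    value: float


def stream_samples(lines: Iterable[str]) -> Iterator[Sample]:
    """Yield one Sample per CSV row, reading the input lazily."""
    reader = csv.DictReader(lines)  # takes the header from the first row
    for row in reader:
        yield Sample(name=row["name"], value=float(row["value"]))


# An in-memory file stands in for a large file on disk.
source = io.StringIO('name,value\nalpha,1.5\n"beta, quoted",2\n')
rows = stream_samples(source)
first = next(rows)   # only the header and the first record have been consumed so far
rest = list(rows)    # the remaining records are parsed on demand
```

Because `stream_samples` is a generator, memory use stays proportional to one row rather than to the whole file, which is the main reason to want a streaming parser in the first place.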
March 26, 2011 Re: Library Development: What to finish/flesh out? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | Jonathan M Davis wrote:
> On 2011-03-17 08:33, dsimcha wrote:
>> I've accumulated a bunch of little libraries via various evening and weekend hacking projects over the past year or so, in various states of completion. Most are things I'm at least half-considering for Phobos, though some belong as third-party libs. I definitely don't have time to finish/flesh out all of them anytime soon, so I've decided to ask the community what to prioritize. Below is a summary of everything I've been working on, with its current level of completion. Please let me know the following:
>>
>> 1. A relative ordering of how useful you think these libraries would be to the community.
>>
>> 2. In absolute terms, would you find this useful?
>>
>> 3. For the Phobos candidates, whether they're general enough to belong in the **standard** library.
>
> I find the responses to this list to be rather interesting. Most of it, I find to be of mild interest at best (certainly for the sort of stuff that _I_ do anyway), but others find some of them to be very desirable.
>
>> List in order from most to least finished:
>>
>> 1. Rational: A library for handling rational numbers exactly. Templated on integer type, can use BigInts for guaranteed accuracy, or fixed-width integers for more speed where the denominator and numerator will be small. Completion state: Mostly finished. Just need to fix a little bit rot and submit for review. (Phobos candidate)
>
> Potentially interesting, but I don't know if I'd ever use it.
>
>> 2. RandAA: A hash table implementation with deterministic memory management, based on randomized probing. Main advantage over builtin AAs is that it plays much nicer with the GC and multithreaded programs. Lookup times are also expected O(1) no matter how many collisions exist in modulus hash space, as long as there are few collisions in full 32- or 64-bit hash space. Completion state: Mostly finished. Just needs a little doc improvement, a few benchmarks and submission for review. (Phobos candidate)
>
> I'm afraid that I don't understand how this is better than the current hash table, and if it really is better, perhaps it should replace the implementation of the built-in one?
>
>> 3. TempAlloc: A memory allocator based on a thread-local segmented stack, useful for allocating large temporary buffers in things like numerics code. Also comes with a hash table, hash set and AVL tree optimized for this allocation scheme. The advantages over plain old stack allocation are that it's independent of function calls (meaning you can return pointers to TempAlloc-allocated memory from a function, etc.) and it's segmented, meaning you can allocate huge buffers w/o risking stack overflow. Its main weakness is that this stack is not scanned by the GC, meaning that you can't store the only reference to a GC-allocated piece of memory here. However, in practice large arrays of primitives are an extremely common case in performance-critical code. I find this module immensely useful in dstats and Lars Kyllingstad uses it in SciD. Getting it into Phobos would make it easy for other scientific/numerics code to use it. Completion state: Working and used. Needs a little cleanup and documentation. (Phobos candidate)
>
> Personally, I see zero use for this in my stuff, but obviously others find it very compelling.
>
>> 4. Streaming CSV Parser: Parses CSV files as they're read in, a few convenience functions for extracting columns into structs. If Phobos ever gets SQLite support I'll probably add sugar for turning a CSV file into an SQLite database, too. Completion state: Prototype working, needs testing, cleanup and documentation. (Phobos candidate)
>
> This definitely sounds useful. I've had to deal with CSV parsing in Java before, and I'd love to see a solid CSV parser in Phobos, but I don't know how much I'd actually end up using it. In the few cases where I'd be looking to deal with CSV files though, it could be invaluable.
>
>> 5. Matrix operations: SciD improvements that allow you to write matrix operations that look like normal math/MATLAB and optimizes them via expression templates so that a minimal number of temporary matrices are created. Uses/will use BLAS for multiplication. Completion state: Addition implemented. Multiplication not.
>
> I don't expect that I'd _ever_ need this, but I'd fully expect that some folks would love it.
Definitely some would love it, me among them. This to me feels like the most useful. Does your (dsimcha) implementation support sparse matrices as well, and if not, is the API general enough to support them if someone has time to implement them? When you say it looks like normal MATLAB, does that include b=A\x? Regardless, any standardization of linear algebra libraries would improve on the current state in most languages.
>> 6. Machine learning: Decision trees, KNN, Random Forest, Logistic Regression, SVM, Naive Bayes, etc. This would be a dstats module. Completion state: Decision trees prototyped, logistic regression working.
>
> I'd have to see the actual library to know whether I'd find much use in it. Probably not though.
>
>> 7. std.mixins: Mixins for commonly needed boilerplate code. I stopped working on this when Andrei suggested that making a collection of mixins into a module is a bad idea. I've thought about it some more and I respectfully disagree. std.mixins would be a one-stop shop for pretty much any boilerplate you need to inject, and most of this code doesn't fit in any other obvious place. Completion state: A few things (struct comparison, simple class constructors, Singleton pattern) prototyped. (Phobos candidate)
>
> This, I very much like. I'd _love_ to have standard mixins for stuff like opEquals and opCmp. I can kind of understand why Andrei might not like a module that's just mixins, but this just seems so useful that I think that it's a definite loss that we _don't_ have it right now. D has done a lot for reducing boilerplate code (the new overloaded operators in particular are quite good at that), but there's still plenty of stuff which is pretty boilerplate and could use solid, standard string mixins to solve it.
>
>> 8. GZip support in std.file: I'll leave the stream stuff for someone else, but just simple stuff like read(), write(), append() IMHO belongs in std.file. Completion state: Not started, but this is the easiest of the bunch to implement. (Phobos candidate)
>
> I don't know about this. std.stream should be able to handle gzip files, but I'm not quite sure what you'd do with std.file to support gzip. It would have to be done in a way that would work with other file types generically (decorator pattern kind of solution). This very much feels to me like the kind of thing that you'd want to do with streams rather than std.file or std.stdio. If all you want to do is compress and uncompress a file, then perhaps we should have some set of modules for dealing with different types of compression (we already have std.zip). I don't see what std.file would be doing though.
>
> So, overall, the stuff that you have is mostly not stuff that I'd find useful at all, but I can see how someone else might (obviously you did). For the most part, I wouldn't have a problem with this sort of stuff being in Phobos. It's just not something that I'd be using, personally.
>
> - Jonathan M Davis
| |||
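Since item 5 in the post above leans on expression templates, a small sketch of the underlying idea may help: `a + b + c` builds a description of the computation, and the sum is only materialised once, element by element, so no intermediate matrix for `a + b` is ever allocated. The sketch below is Python and works at run time, whereas D or C++ expression templates resolve the same structure at compile time; the `Expr`, `Matrix`, `Add` and `evaluate` names are hypothetical and are not SciD's API.

```python
class Expr:
    """Base class: adding two expressions builds a node instead of computing anything."""

    def __add__(self, other):
        return Add(self, other)


class Matrix(Expr):
    """A concrete matrix stored as a list of rows (assumed non-empty and rectangular)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self.shape = (len(self.rows), len(self.rows[0]))

    def at(self, i, j):
        return self.rows[i][j]


class Add(Expr):
    """Lazy element-wise sum of two expressions."""

    def __init__(self, left, right):
        if left.shape != right.shape:
            raise ValueError("shape mismatch")
        self.left, self.right = left, right
        self.shape = left.shape

    def at(self, i, j):
        # Computed on demand, so nested sums never allocate a temporary matrix.
        return self.left.at(i, j) + self.right.at(i, j)


def evaluate(expr):
    """Materialise an expression into a single result Matrix in one pass."""
    n_rows, n_cols = expr.shape
    return Matrix([[expr.at(i, j) for j in range(n_cols)] for i in range(n_rows)])


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[10, 20], [30, 40]])
c = Matrix([[100, 200], [300, 400]])

lazy = a + b + c          # an Add node wrapping another Add node; nothing is summed yet
result = evaluate(lazy)   # the only new Matrix that gets created
```

Multiplication is harder to treat this way because each output element depends on a whole row and column, which is presumably why the original summary hands that part to BLAS.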
Copyright © 1999-2021 by the D Language Foundation