October 02, 2008
Does anyone have a recommendation for a linear algebra library in D?

I need to do a singular value decomposition on large sparse matrices (approx 100,000 x 100,000).

I've glanced at the Blade library, on DSource, but my linear algebra is a bit rusty, and I can't tell whether the library simply lacks the ability to do an SVD or whether it can be done, but only by composing other operations.

Thanks!

--benji
October 02, 2008
You need something like ARPACK to do SVD on large sparse matrices.
I have implemented a wrapper for Python before.  And I could use one
for D right now too, but I haven't done it yet.
I have a bunch of other linear algebra wrappers in my MultiArray
project on dsource, including LAPACK which can do SVD on dense
matrices.

There's also GSL wrappers someone contributed in the bindings project.
 I don't know much about it, but supposedly it at least contains BLAS
wrappers.  I don't know if it has anything much beyond that though.

If I needed something like this today, I'd put an Arpack wrapper into multiarray, following the pattern used by the Blas, Lapack, SuperLU, Umfpack, and Taucs wrappers I've already written.  Arpack isn't exactly easy to call directly, so then I'd follow up by looking into what SciPy has got for Arpack wrappers these days and try porting that to D.

--bb

On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage@benjismith.net> wrote:
> Does anyone have a recommendation for a linear algebra library in D?
>
> I need to do a singular value decomposition on large sparse matrices (approx
> 100,000 x 100,000).
>
> I've glanced at the Blade library, on DSource, but my linear algebra is a bit rusty, and I can't tell whether the library simply lacks the ability to do an SVD or whether it can be done, but only by composing other operations.
>
> Thanks!
>
> --benji
>
October 02, 2008
Oh, and Blade is a very low-level library aimed at optimizing the very simplest linear algebra expressions like vector+vector.  Last I checked it didn't contain any matrix ops at all.  And it seems like Don is frying different fish these days (like Tango BigInt).

--bb

On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage@benjismith.net> wrote:
> Does anyone have a recommendation for a linear algebra library in D?
>
> I need to do a singular value decomposition on large sparse matrices (approx
> 100,000 x 100,000).
>
> I've glanced at the Blade library, on DSource, but my linear algebra is a bit rusty, and I can't tell whether the library simply lacks the ability to do an SVD or whether it can be done, but only by composing other operations.
>
> Thanks!
>
> --benji
>
October 02, 2008
I used LinAlg of the Shark-Project for SVD but is has no sparse matrix support and is C++.
http://shark-project.sourceforge.net/

GNU Scientific Library
http://www.gnu.org/software/gsl/
http://www.gnu.org/software/gsl/manual/html_node/Singular-Value-Decomposition.html
It seem to have no sparse matrix support but there are bindings in
http://www.dsource.org/projects/bindings/browser/trunk/gsl

SVDLIBC
http://tedlab.mit.edu/~dr/SVDLIBC/
It has sparse matrix support but you have to make bindings yourself or try bcd for that
http://www.dsource.org/projects/bcd

NumPy
http://www.scipy.org/doc/numpy_api_docs/numpy.linalg.linalg.html#svd
I've never used Python but there is Kirks well known pyd for interfacing between Python and D.

Sure you need SVD? Often it is used for least squares fitting or pseudoinverse. So depending on what you are doing there may be weaker alternatives like Moore-Penrose pseudoinverse.

October 03, 2008
Oliver Dathe wrote:
> Sure you need SVD? Often it is used for least squares fitting or pseudoinverse. So depending on what you are doing there may be weaker alternatives like Moore-Penrose pseudoinverse.


Yeah, actually, I definitely need SVD. I'm putting together an LSA algorithm for a machine-learning project.

My first instinct was to use Colt & JAMA (both Java libs) since the rest of the project is in Java. But I thought it would be interesting to compare the Java implementation with one in D, both performance-wise and API-wise.

Thanks, everybody, for your quick responses!

--benji
October 03, 2008
Bill Baxter wrote:
> Oh, and Blade is a very low-level library aimed at optimizing the very
> simplest linear algebra expressions like vector+vector.  Last I
> checked it didn't contain any matrix ops at all.  And it seems like
> Don is frying different fish these days (like Tango BigInt).

It's pretty much obselete now that array ops are in the language. That always happens to my most interesting, sophisticated code -- Walter turns it into a one-liner.

BTW BigInt will eventually get to Phobos, too.


> 
> --bb
> 
> On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage@benjismith.net> wrote:
>> Does anyone have a recommendation for a linear algebra library in D?
>>
>> I need to do a singular value decomposition on large sparse matrices (approx
>> 100,000 x 100,000).
>>
>> I've glanced at the Blade library, on DSource, but my linear algebra is a
>> bit rusty, and I can't tell whether the library simply lacks the ability to
>> do an SVD or whether it can be done, but only by composing other operations.
>>
>> Thanks!
>>
>> --benji
>>
October 03, 2008
On 2008-10-03 10:27:27 +0200, Don Clugston <nospam@nospam.com> said:

> Bill Baxter wrote:
>> Oh, and Blade is a very low-level library aimed at optimizing the very
>> simplest linear algebra expressions like vector+vector.  Last I
>> checked it didn't contain any matrix ops at all.  And it seems like
>> Don is frying different fish these days (like Tango BigInt).
> 
> It's pretty much obselete now that array ops are in the language. That always happens to my most interesting, sophisticated code -- Walter turns it into a one-liner.
> 
> BTW BigInt will eventually get to Phobos, too.
> 
> 
>> 
>> --bb
>> 
>> On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage@benjismith.net> wrote:
>>> Does anyone have a recommendation for a linear algebra library in D?
>>> 
>>> I need to do a singular value decomposition on large sparse matrices (approx
>>> 100,000 x 100,000).
>>> 
>>> I've glanced at the Blade library, on DSource, but my linear algebra is a
>>> bit rusty, and I can't tell whether the library simply lacks the ability to
>>> do an SVD or whether it can be done, but only by composing other operations.
>>> 
>>> Thanks!
>>> 
>>> --benji

I have written dense multidimensional arrays in D.
They provide some wrapper to various Lapack functions (through Bill's wrappers).
The library is available at
	http://github.com/fawzi
SVD is also there, but just for *dense* matrixes.
Bill's wrappers and his dflat also work with sparse matrixes.

Actually I think that you should think more about your problem
If you need full SVD U and V matrixes are *full* even if A is sparse.
100'000x100'000 double matrix ~74GB, floats is half of it, but you have at least two matrixes.

Either you have really big resources, or you switch to a cluster and scalapack, or (better) you reformulate your problem so that you either need just a partial decomposition, or you can solve it iteratively.

Fawzi

October 03, 2008
On 2008-10-03 11:46:11 +0200, Fawzi Mohamed <fmohamed@mac.com> said:

> On 2008-10-03 10:27:27 +0200, Don Clugston <nospam@nospam.com> said:
> 
>> Bill Baxter wrote:
>>> Oh, and Blade is a very low-level library aimed at optimizing the very
>>> simplest linear algebra expressions like vector+vector.  Last I
>>> checked it didn't contain any matrix ops at all.  And it seems like
>>> Don is frying different fish these days (like Tango BigInt).
>> 
>> It's pretty much obselete now that array ops are in the language. That always happens to my most interesting, sophisticated code -- Walter turns it into a one-liner.
>> 
>> BTW BigInt will eventually get to Phobos, too.
>> 
>> 
>>> 
>>> --bb
>>> 
>>> On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage@benjismith.net> wrote:
>>>> Does anyone have a recommendation for a linear algebra library in D?
>>>> 
>>>> I need to do a singular value decomposition on large sparse matrices (approx
>>>> 100,000 x 100,000).
>>>> 
>>>> I've glanced at the Blade library, on DSource, but my linear algebra is a
>>>> bit rusty, and I can't tell whether the library simply lacks the ability to
>>>> do an SVD or whether it can be done, but only by composing other operations.
>>>> 
>>>> Thanks!
>>>> 
>>>> --benji
> 
> I have written dense multidimensional arrays in D.
> They provide some wrapper to various Lapack functions (through Bill's wrappers).
> The library is available at
> 	http://github.com/fawzi
> SVD is also there, but just for *dense* matrixes.
> Bill's wrappers and his dflat also work with sparse matrixes.
> 
> Actually I think that you should think more about your problem
> If you need full SVD U and V matrixes are *full* even if A is sparse.
> 100'000x100'000 double matrix ~74GB, floats is half of it, but you have at least two matrixes.
> 
> Either you have really big resources, or you switch to a cluster and scalapack, or (better) you reformulate your problem so that you either need just a partial decomposition, or you can solve it iteratively.
> 
> Fawzi

Probably if you want least square approximation (that is what you meant with LSA?) some kind of direct minimizer is the way to go (but be careful the problem might ill conditioned).
Fawzi

October 03, 2008
Fawzi Mohamed Wrote:

[...]
> Probably if you want least square approximation (that is what you meant
> with LSA?) some kind of direct minimizer is the way to go (but be
> careful the problem might ill conditioned).
> Fawzi
> 
You normally use SVD only if the system is ill conditioned. Otherwise check for a conjugate gradient method (if the matrix is symmetric), or do some sort of pre-conditioning. But then again, the last time I did CFD was some 10 years ago.

Ciao
Tom
October 03, 2008
Fawzi Mohamed wrote:
> Probably if you want least square approximation (that is what you meant with LSA?) some kind of direct minimizer is the way to go (but be careful the problem might ill conditioned).
> Fawzi

Sorry. I meant "Latent Semantic Analysis". I'm building a table of synonyms from a large document repository.

There are about 100,000 documents in the repository, and the total corpus has about 100,000 unique words (after stemming). The LSA algorithm uses a matrix, where each row represents a single term, and each column represents a single document. Each cell contains the TF-IDF (Term Frequency / Inverse Document Frequency).

Performing an SVD on the matrix yields a collelation matrix, representing the synonymy of all terms, in all documents.

--benji
« First   ‹ Prev
1 2
Top | Discussion index | About this forum | D home