August 09, 2012
Which D features to emphasize for academic review article
Hello D Users,

The Software Editor for the Journal of Applied Econometrics has 
agreed to let me write a review of the D programming language for 
econometricians (econometrics is where economic theory and 
statistical analysis meet).  I will have only about 6 pages.  I 
have an idea of what I am going to write about, but I thought I 
would ask here what features are most relevant (in your minds) to 
numerical programmers writing codes for statistical inference.

I look forward to your suggestions.

Thanks,

TJB
August 09, 2012
Re: Which D features to emphasize for academic review article
OK, so IIUC the audience is academic BUT consists of people 
interested in using D as a means to an end, not computer 
scientists?  I use D for bioinformatics, which IIUC has similar 
requirements to econometrics.  From my point of view:

I'd emphasize the following:

Native efficiency.  (Important for large datasets and Monte Carlo 
simulations.)

Garbage collection.  (Important because it makes it much easier 
to write non-trivial data structures that don't leak memory, and 
statistical analyses are a lot easier if the data is structured 
well.)

Ranges/std.range/builtin arrays and associative arrays.  (Again, 
these make data handling a pleasure.)

Templates.  (Makes it easier to write algorithms that aren't 
overly specialized to the data structure they operate on.  This 
can also be done with OO containers but requires more boilerplate 
and compromises on efficiency.)

Disclaimer:  The next two items are things I'm the primary 
designer and implementer of.  I put them last intentionally so 
this doesn't look like a shameless plug.

std.parallelism  (Important because you can easily parallelize 
your simulation, etc.; see the sketch at the end of this post.)

dstats  (https://github.com/dsimcha/dstats  Important because a 
lot of statistical analysis code is already implemented for you.  
It's admittedly very basic compared to e.g. R or Matlab, but it's 
also in many cases better integrated and more efficient.  I'd say 
that it has the 15% of the functionality that covers ~70% of use 
cases.  I welcome contributors to add more stuff to it.  I 
imagine economists would be interested in time series, which is 
currently a big area of missing functionality.)
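
To give a flavor of the ranges and std.parallelism points, here 
is a minimal sketch (a toy workload, not real econometrics):

    import std.math : sqrt;
    import std.parallelism : parallel;
    import std.stdio : writeln;

    void main()
    {
        auto data = new double[](1_000_000);
        foreach (i, ref x; data)
            x = i;                    // fill with 0, 1, 2, ...

        // parallel() spreads the loop iterations across all
        // cores; the loop body itself is unchanged.
        foreach (ref x; parallel(data))
            x = sqrt(x);

        writeln(data[$ - 1]);         // sqrt(999_999)
    }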
August 09, 2012
Re: Which D features to emphasize for academic review article
On Thu, 09 Aug 2012 17:57:27 +0200, TJB wrote:

> Hello D Users,
> 
> The Software Editor for the Journal of Applied Econometrics has agreed
> to let me write a review of the D programming language for
> econometricians (econometrics is where economic theory and statistical
> analysis meet).  I will have only about 6 pages.  I have an idea of what
> I am going to write about, but I thought I would ask here what features
> are most relevant (in your minds) to numerical programmers writing codes
> for statistical inference.
> 
> I look forward to your suggestions.
> 
> Thanks,
> 
> TJB

Lazy ranges are a lifesaver when dealing with big data.  E.g. read a 
large CSV file, use filter and map to clean and transform the data, 
collect stats as you go, then output to a destination file.  The lazy 
nature of most of the ranges in Phobos means that you don't need to have 
the data in memory, but you can write simple imperative code just as if 
it was.
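
For instance, a sketch of that pipeline (assuming a hypothetical 
data.csv with one numeric value per line; byLine, filter, and map 
are all lazy):

    import std.algorithm : filter, map;
    import std.conv : to;
    import std.stdio : File, writefln;

    void main()
    {
        double sum = 0;
        size_t n = 0;
        auto outFile = File("cleaned.csv", "w");

        // Only one line is ever held in memory at a time.
        auto cleaned = File("data.csv").byLine()
            .filter!(line => line.length > 0)  // drop blank lines
            .map!(line => line.to!double);     // parse

        foreach (x; cleaned)
        {
            sum += x;                  // collect stats as we go
            ++n;
            outFile.writeln(x);        // output to destination
        }
        writefln("mean of %s rows: %s", n, sum / n);
    }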
August 09, 2012
Re: Which D features to emphasize for academic review article
On Thursday, 9 August 2012 at 18:20:08 UTC, Justin Whear wrote:
> On Thu, 09 Aug 2012 17:57:27 +0200, TJB wrote:
>
>> Hello D Users,
>>
>> The Software Editor for the Journal of Applied Econometrics has
>> agreed to let me write a review of the D programming language for
>> econometricians (econometrics is where economic theory and
>> statistical analysis meet).  I will have only about 6 pages.  I
>> have an idea of what I am going to write about, but I thought I
>> would ask here what features are most relevant (in your minds) to
>> numerical programmers writing codes for statistical inference.
>>
>> I look forward to your suggestions.
>>
>> Thanks,
>>
>> TJB
>
> Lazy ranges are a lifesaver when dealing with big data.  E.g. read
> a large CSV file, use filter and map to clean and transform the
> data, collect stats as you go, then output to a destination file.
> The lazy nature of most of the ranges in Phobos means that you
> don't need to have the data in memory, but you can write simple
> imperative code just as if it was.

Ah, the beauty of functional programming and streams.
August 09, 2012
Re: Which D features to emphasize for academic review article
On 8/9/2012 10:40 AM, dsimcha wrote:
> I'd emphasize the following:

I'd like to add to that:

1. Proper support for 80 bit floating point types. Many compilers' libraries 
have inaccurate 80 bit math functions, or don't implement 80 bit floats at all. 
80 bit floats reduce the incidence of creeping roundoff error.

2. Support for SIMD vectors as native types.

3. Floating point values are default initialized to NaN (see the example 
after this list).

4. Correct support for NaN and infinity values.

5. Correct support for unordered operations.

6. Array types do not degenerate into pointer types whenever passed to a 
function. In other words, array types know their dimension.

7. Array loop operations, i.e.:

    for (size_t i = 0; i < a.length; i++)
        a[i] = b[i] + c;

can be written as:

    a[] = b[] + c;

8. Global data is thread local by default, lessening the risk of unintentional 
unsynchronized sharing between threads.
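
To illustrate points 3, 7, and 8, a small contrived example:

    import std.math : isNaN;
    import std.stdio : writeln;

    int hits;  // module-level, yet thread local by default (point 8);
               // it would have to be declared shared to be shared

    void main()
    {
        double x;            // default initialized to NaN (point 3)
        assert(x.isNaN);     // an uninitialized read cannot hide as 0

        auto b = [1.0, 2.0, 3.0];
        auto a = new double[](b.length);
        a[] = b[] + 10.0;    // array loop operation (point 7)
        writeln(a);          // [11, 12, 13]
    }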
August 10, 2012
Re: Which D features to emphasize for academic review article
Walter Bright wrote:
> 3. Floating point values are default initialized to NaN.

This isn't a good feature, IMO. C# handles this much more 
conveniently with just as much optimization/debugging benefit 
(arguably more so, because it catches NaN issues at 
compile-time). In C#:

    class Foo
    {
        float x; // defaults to 0.0f

        void bar()
        {
            float y; // doesn't default
            y ++; // ERROR: use of unassigned local

            float z = 0.0f;
            z ++; // OKAY
        }
    }

This is the same behavior for any local variable, so where in D 
you need to explicitly initialize variables with '= void' to 
avoid assignment costs, C# automatically benefits and catches 
your NaN mistakes before runtime.

Sorry, I'm not trying to derail this thread. I just think D has 
other, much better advertising points than this one.
August 10, 2012
Re: Which D features to emphasize for academic review article
1) I think compile-time function execution is a very big plus for 
people doing calculations.

For example:

ulong fibonacci(ulong n)
{
    ulong a = 0, b = 1;  // one possible iterative implementation
    foreach (i; 0 .. n) { immutable t = a + b; a = b; b = t; }
    return a;
}

static x = fibonacci(50); // calculated at compile time! runtime cost = 0

2) It has support for a BigInt structure in its standard library 
(which is really fast!); see the example below.
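
For instance, a minimal sketch using std.bigint:

    import std.bigint : BigInt;
    import std.stdio : writeln;

    void main()
    {
        auto f = BigInt(1);
        foreach (i; 2 .. 51)
            f *= i;    // 50! overflows ulong; BigInt keeps going
        writeln(f);    // prints all 65 digits of 50!
    }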
August 10, 2012
Re: Which D features to emphasize for academic review article
On Thursday, 9 August 2012 at 18:35:22 UTC, Walter Bright wrote:
> On 8/9/2012 10:40 AM, dsimcha wrote:
>> I'd emphasize the following:
>
> I'd like to add to that:
>
> 1. Proper support for 80 bit floating point types. Many 
> compilers' libraries have inaccurate 80 bit math functions, or 
> don't implement 80 bit floats at all. 80 bit floats reduce the 
> incidence of creeping roundoff error.

How unique to D is this feature?  Does this imply that things 
like BLAS and LAPACK, random number generators, statistical 
distribution functions, and other numerical software should be 
rewritten in pure D rather than calling out to external C or 
Fortran codes?

TJB
August 10, 2012
Re: Which D features to emphasize for academic review article
On 8/10/2012 1:38 AM, F i L wrote:
> Walter Bright wrote:
>> 3. Floating point values are default initialized to NaN.
>
> This isn't a good feature, IMO. C# handles this much more conveniently with just
> as much optimization/debugging benefit (arguably more so, because it catches NaN
> issues at compile-time). In C#:
>
>      class Foo
>      {
>          float x; // defaults to 0.0f
>
>          void bar()
>          {
>              float y; // doesn't default
>              y ++; // ERROR: use of unassigned local
>
>              float z = 0.0f;
>              z ++; // OKAY
>          }
>      }
>
> This is the same behavior for any local variable,

It catches only a subset of these at compile time. I can craft any number of 
ways of getting it to miss diagnosing it. Consider this one:

    float z;
    if (condition1)
         z = 5;
    ... lotsa code ...
    if (condition2)
         z++;

To diagnose this correctly, the static analyzer would have to determine 
whether condition1 produces the same result as condition2. This is 
impossible to prove in the general case, so the static analyzer either 
gives up and lets it pass, or issues an incorrect diagnostic. So our 
intrepid programmer is forced to write:

    float z = 0;
    if (condition1)
         z = 5;
    ... lotsa code ...
    if (condition2)
         z++;

Now, as it may turn out, for your algorithm the value "0" is an out-of-range, 
incorrect value. Not a problem as it is a dead assignment, right?

But then the maintenance programmer comes along and changes condition1 so it is 
not always the same as condition2, and now the z++ sees the invalid "0" value 
sometimes, and a silent bug is introduced.

This bug will not remain undetected with the default NaN initialization.
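
Here is the scenario above as a small, complete program:

    import std.stdio : writeln;

    void main()
    {
        bool condition1 = false;
        bool condition2 = true;  // the maintenance change: they diverge

        double z;                // D default initializes z to NaN
        if (condition1)
            z = 5;
        // ... lotsa code ...
        if (condition2)
            z++;                 // NaN + 1 is still NaN

        writeln(z);  // prints "nan": the bug is loud, not silent
    }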


> so where in D you need to explicitly initialize variables with
> '= void' to avoid assignment costs,

This is incorrect, as the optimizer is perfectly capable of removing dead 
assignments like:

   f = nan;
   f = 0.0f;

The first assignment is optimized away.

> I just think D has other, much better advertising points than this one.

Whether you agree with it being a good feature or not, it is a feature unique to 
D and merits discussion when talking about D's suitability for numerical 
programming.
August 10, 2012
Re: Which D features to emphasize for academic review article
On 8/10/2012 8:31 AM, TJB wrote:
> On Thursday, 9 August 2012 at 18:35:22 UTC, Walter Bright wrote:
>> On 8/9/2012 10:40 AM, dsimcha wrote:
>>> I'd emphasize the following:
>>
>> I'd like to add to that:
>>
>> 1. Proper support for 80 bit floating point types. Many compilers' libraries
>> have inaccurate 80 bit math functions, or don't implement 80 bit floats at
>> all. 80 bit floats reduce the incidence of creeping roundoff error.
>
> How unique to D is this feature?  Does this imply that things like BLAS and
> LAPACK, random number generators, statistical distribution functions, and other
> numerical software should be rewritten in pure D rather than calling out to
> external C or Fortran codes?

I attended a talk given by a physicist a few months ago where he was using C 
transcendental functions. I pointed out to him that those functions were 
unreliable, producing wrong bits in a manner that suggested to me that they were 
internally truncating to double precision.

He expressed astonishment and told me I must be mistaken.

What can I say? I run across this repeatedly, and that's exactly why Phobos 
(with Don's help) has its own implementations, rather than simply calling the 
corresponding C ones.

I encourage you to run your own tests, and draw your own conclusions.
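
As a trivial starting point (on x86, where D's real is the 80 bit 
x87 format):

    import std.stdio : writeln;

    void main()
    {
        // ~16 significant decimal digits for double vs ~19 for real.
        writeln("double.epsilon = ", double.epsilon);
        writeln("real.epsilon   = ", real.epsilon);
    }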