April 22, 2004
Touche! <G>

All I can say is DTL 0.1 will be out as soon as Walter can find the bandwidth to give me the lang/comp changes I need. It will contain sequence containers. I'm hoping that this can be within a week, but it's hard to say at this point.

DTL 0.2 will be out once I get feedback on things from people, and will probably contain some tree and/or associative containers.

Once the basic lang/comp support is there for what I'm trying to do, I see no reason why the library cannot evolve quickly, and with the input of other contributors.


"Stephan Wienczny" <wienczny@web.de> wrote in message news:c66vig$2n7s$1@digitaldaemon.com...
> Matthew wrote:
>
> > I agree with that.
> >
> > Want to write them?
> >
>
> >
> I actually started two months ago, but then wanted to wait until DTL finishes ;-)


April 22, 2004
Walter wrote:
> "Norbert Nemec" <Norbert.Nemec@gmx.de> wrote in message
>> Object references on the other hand are far easier to check: objects cannot overlap, so two references are either equal or noalias.
> 
> This is, unfortunately, not true when you get into interfaces. It's also possible for pointers into class objects, as well as arrays referencing into class objects.

OK, at that point it really gets messy. Anyway: if an interface reference points into an object, it should certainly be possible to recover a pointer to the object itself? This, of course, adds a little overhead to the checking algorithm, but in debugging mode that should still be acceptable.

>> If you want to
>> allow the compiler to optimize a routine for nonaliased arguments, just
>> put in a precondition prohibiting references to identical objects.
> 
> Historically, adding in special keywords for such optimizations has not worked out well. That's why I was thinking of making it implicit for D function parameters.

True, the language would be simpler that way. Anyway:

* You will not only have to think about function arguments but also about references that are stored in objects. Every time the source code handles two references, it should be possible to tell the compiler that they are not aliased. For this, I would suggest a builtin function "bool nonaliased(x,y)" that takes two references and checks whether they refer to disjoint portions of memory. Then you just put an "assert(nonaliased(x,y))" before critical portions of the code and the compiler can happily optimize.

* Even for function arguments: there are certainly plenty of cases where it makes perfect sense to pass two references to the same object to some function. I wonder whether it is worth giving all of these up to be able to optimize in certain cases.

* If you take my proposal and assume that references may be aliased in general, but provide a powerful means (like the above-mentioned "nonaliased" builtin) to specify exactly where the compiler is allowed to optimize, then you don't restrict anyone unnecessarily. And still, authors of time-critical code can examine and specify exactly what they mean by "nonaliased".
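The proposed "nonaliased" check boils down to a range-overlap test. A minimal sketch in C++ (the thread is about D, so the names and the raw-pointer signature here are purely illustrative): two regions are non-aliased when their [begin, end) address ranges do not intersect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the proposed nonaliased(x, y) builtin.
// Two memory regions do not alias when one ends at or before
// the other begins.
inline bool nonaliased(const void* x, std::size_t xlen,
                       const void* y, std::size_t ylen) {
    auto xb = reinterpret_cast<std::uintptr_t>(x);
    auto yb = reinterpret_cast<std::uintptr_t>(y);
    return xb + xlen <= yb || yb + ylen <= xb;
}
```

In debug builds an `assert(nonaliased(...))` before a hot loop would catch violations at run time; in release builds the compiler could treat the assertion as an optimization license, as the post suggests.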

>> * accept that pointers may always be aliased to anything and don't try to optimize too much there.
> 
> This is a good idea, but I'm concerned it may not be sufficient.

I guess, then, that this is a misunderstanding: by "pointers" I mean raw, C-like pointers. You already agreed that these should be allowed to alias anything. Whoever uses pointers just has to accept that they don't get the utmost optimization.

>> * prohibit aliased object references by explicit preconditions/assertions
> 
> Having the compiler insert runtime checks for debug builds is a good idea. Unfortunately, as you pointed out, adding runtime checks for aliased pointers is impossible.

Again: forget about pointers. They may alias anything and cannot efficiently be checked. If anyone wants full optimization, they should use arrays, slices and object references that have enough semantics for the compiler to check for aliasing.


April 22, 2004
lacs wrote:

> Add primitive-units-checking to D and a lot of people who work with numbers will fall in love with D.
<snip>

If only we had a rational number type built in, implementing this with templates would be straightforward.  Just assign each primitive unit a different prime number.
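To make the prime-number idea concrete, here is a minimal run-time sketch in C++ (illustrative only; all names are made up): each primitive unit gets a distinct prime, a derived unit is a rational number whose numerator and denominator are products of those primes, and arithmetic on quantities manipulates the rationals accordingly.

```cpp
#include <cassert>
#include <numeric>   // std::gcd (C++17)

// Illustrative sketch: meter = 2/1, second = 5/1, meter/second = 2/5, etc.
struct Unit {
    long num, den;
    Unit reduced() const {
        long g = std::gcd(num, den);
        return {num / g, den / g};
    }
};

inline bool operator==(Unit a, Unit b) {
    a = a.reduced(); b = b.reduced();
    return a.num == b.num && a.den == b.den;
}

struct Quantity {
    double value;
    Unit unit;
};

// Multiplying quantities multiplies their unit rationals.
Quantity operator*(Quantity a, Quantity b) {
    return {a.value * b.value,
            Unit{a.unit.num * b.unit.num, a.unit.den * b.unit.den}.reduced()};
}

Quantity operator/(Quantity a, Quantity b) {
    return {a.value / b.value,
            Unit{a.unit.num * b.unit.den, a.unit.den * b.unit.num}.reduced()};
}

// Addition is only legal between identical units; here checked at run time.
Quantity operator+(Quantity a, Quantity b) {
    assert(a.unit == b.unit);
    return {a.value + b.value, a.unit};
}
```

Because unit mismatches are caught by the assert only at run time, this is exactly the performance/safety trade-off the next paragraph mentions.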

Otherwise, we could define types to represent units (possibly based on a rational number implementation) and values.  Of course, this would move unit checking to the runtime and so wouldn't be good for performance-critical apps.

The other approach is to define a struct for each unit (primitive and in combination) that the program is going to use.  Operations would be defined to take and return the right types.  This would be compile-time checking, but require quite some repetitive code to be written.
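In C++ the repetition of the struct-per-unit approach can be sketched away with a single template over unit exponents, while keeping the checking at compile time (a hedged illustration; these type names are invented, not from any real library):

```cpp
// Exponents of meter and second are template parameters, so each
// distinct unit is a distinct type and mismatches fail to compile.
template<int M, int S>
struct Qty {
    double value;
};

// Only quantities with identical units can be added.
template<int M, int S>
Qty<M, S> operator+(Qty<M, S> a, Qty<M, S> b) {
    return {a.value + b.value};
}

// Multiplication and division add/subtract the exponents.
template<int M1, int S1, int M2, int S2>
Qty<M1 + M2, S1 + S2> operator*(Qty<M1, S1> a, Qty<M2, S2> b) {
    return {a.value * b.value};
}

template<int M1, int S1, int M2, int S2>
Qty<M1 - M2, S1 - S2> operator/(Qty<M1, S1> a, Qty<M2, S2> b) {
    return {a.value / b.value};
}

using Meters       = Qty<1, 0>;
using Seconds      = Qty<0, 1>;
using MetersPerSec = Qty<1, -1>;
```

Adding a `Meters` to a `Seconds` simply does not compile, which is the compile-time checking the paragraph above asks for, without one hand-written struct per unit.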

> As you might have guessed, the notation float<meter> has nothing to do with C++ templates and wouldn't interfere with them, since it can only be used with primitives.
<snip>

Not sure about that.  Unless we're going to restrict units to a list of ad-hoc keywords, it'll break the CFG just as much as the C++ template syntax does by itself.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the
unfortunate victim of intensive mail-bombing at the moment.  Please keep
replies on the 'group where everyone may benefit.
April 22, 2004
"Norbert Nemec" <Norbert.Nemec@gmx.de> wrote in message news:c67t93$170u$1@digitaldaemon.com...
> * You will not only have to think about function arguments but also about references that are stored in objects. Every time the source code handles two references, it should be possible to tell the compiler that they are not aliased. For this, I would suggest a builtin function "bool nonaliased(x,y)" that takes two references and checks whether they refer to disjoint portions of memory. Then you just put an "assert(nonaliased(x,y))" before critical portions of the code and the compiler can happily optimize.

If I'm understanding this correctly, it has the same problem that the "restrict" and "noalias" keywords in C have - it's too confusing for users to use correctly, as well as being aesthetically not so pleasing. I think it would be better to have the compiler assume parameters are not aliased (since that is by far the usual case) and require you to say when they are aliased. Also, a runtime check that they really are not aliased might be appropriate in debug mode.

Now, since aliasing is sadly allowed in C functions, I was thinking:

    extern (C) int func(int a[], int b[])     // a and b may be aliased
    extern (D) int func(int a[], int b[])    // a and b must be disjoint


April 22, 2004
Walter wrote:
> "Norbert Nemec" <Norbert.Nemec@gmx.de> wrote in message news:c67t93$170u$1@digitaldaemon.com...
>> * You will not only have to think about function arguments but also about references that are stored in objects. Every time the source code handles two references, it should be possible to tell the compiler that they are not aliased. For this, I would suggest a builtin function "bool nonaliased(x,y)" that takes two references and checks whether they refer to disjoint portions of memory. Then you just put an "assert(nonaliased(x,y))" before critical portions of the code and the compiler can happily optimize.
> 
> If I'm understanding this correctly, it has the same problem that the "restrict" and "noalias" keywords in C have - it's too confusing for users to use correctly, as well as being aesthetically not so pleasing. I think it would be better to have the compiler assume parameters are not aliased (since that is by far the usual case) and require you to say when they are aliased. Also, a runtime check that they really are not aliased might be appropriate in debug mode.
> 
> Now, since aliasing is sadly allowed in C functions, I was thinking:
> 
>     extern (C) int func(int a[], int b[])     // a and b may be aliased
>     extern (D) int func(int a[], int b[])    // a and b must be disjoint

Yes, it may be confusing to users, but then, nobody has to use the feature - only those trying to squeeze out performance. People writing numeric libraries etc. will gladly accept the fine-tuning capabilities.

As I said: a simple solution as in Fortran will not buy you much. References can come not only through function arguments but also from object members. Saying that function arguments may not be aliased only covers part of the problem.

What makes Fortran 77 so highly optimizable without much language overhead is that it doesn't have pointers or references at all. Aliasing could *only* come through function arguments. So once this is prohibited, the Fortran compiler can simply assume that *nothing whatsoever* is aliased.

I don't know how Fortran 95 handles this issue, but I guess that, as soon as you use pointers, performance goes down.

In D, since we have references everywhere, any real solution to the aliasing problem will get a bit more complex.
April 28, 2004
In article <c66irg$216j$1@digitaldaemon.com>, Walter says...
>
>
>"Norbert Nemec" <Norbert.Nemec@gmx.de> wrote in message news:c65fmi$2js$1@digitaldaemon.com...
>> From what I have read so far, D really has the potential to close the gap between C++ and Fortran and in this way gain a huge share in the scientific/high-performance area of computing. Anyway, to have any chance to go there, much care has to be taken now.
>>
>> People unfamiliar with numeric programming often wonder why Fortran still has such a huge share among scientists. Many scientists still use Fortran77, and even those who have moved to Fortran95 only use it for its modern syntax, never touching its advanced concepts. And this is not only because they don't know better, but also because it is extremely hard to match the performance of Fortran77! (OK, 99% of the reason might actually be laziness to learn a different language and the existing code base, but still people usually argue based on the superior performance of the language)
>
>What Fortran has over C is the 'noalias' on function parameters which allows for aggressive optimization. What I'm thinking of is writing the spec for D functions so that parameters are always 'noalias' (for extern (C) functions this would not apply).
>
>What do you think?
>
>For reference: http://www.lysator.liu.se/c/restrict.html

I am a programmer, working in a scientific area (bioinformatics).  I think it would be bad to imply no-aliasing, because it trades safety for performance.

A lot of the code here is written by biologists and/or statisticians (some of whom are quite brilliant, but only a few are trained as programmers).  They are going to go nuts trying to find bugs like this.

If you work on the "heavy lifting" code that really needs performance, you generally understand about cache line size etc.; you can be trusted to know to use the "restrict" keyword.  If you are a cytologist writing new statistics functions, you will not know enough to use the "may-alias" keyword.  The burden of knowing the tradeoffs has to be on the performance guru, not on the other guy.

Kevin


April 29, 2004
On 2004-04-21 13:33:46 +0200, Norbert Nemec <Norbert.Nemec@gmx.de> said:

> Stewart Gordon wrote:
>> The features I like in F90 (and which are useful for SP) are built-in
>> vector arithmetic and aggregate functions.  The former is in the D
>> language, it just needs to be finally put into the compiler.  I've
>> briefly suggested the latter....
>> http://www.digitalmars.com/drn-bin/wwwnews?D/21671
> 
> I don't really know how much of that has to be in the language. It is nice
> to have arrays as part of the language, but vectors and matrices should
> rather be defined in the library. There are plenty of efficient ways to
> deal with arrays, which can be implemented in a very efficient way by
> optimizing (perhaps also parallelizing) compilers.
> 
> Vector arithmetic actually gives arrays special matrix semantics which is
> not what you would want in general. By far not every array is a matrix so
> why should it behave like one?
> 
> vector arithmetic and aggregate functions may just as well be defined in the
> library. Just encapsulate arrays in your own class and give it all the
> semantics you need, without forcing everyone else to get that semantics
> when his arrays are something completely different.
I agree that matrices, which are basically mathematical tools from linear algebra, do not belong in the core language. But powerful multidimensional arrays are important, if you are to avoid the mess that C++ is when it comes to high-performance programming.

To be effective, you must be able to create multidimensional arrays of fixed shape at run time on the heap. As far as I can tell, you can currently only set the size of dynamic arrays at run time, but the data in these arrays is not contiguous in memory, and thus not good for most high-performance computing.

We really need to be able to do this, to get a contiguous 2D array:

int n = 10, m = 20;
double[][] a = new double[n][m];
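The contiguous layout being asked for can be sketched in C++ (for illustration only; the class name and interface here are invented): one allocation of n*m doubles, with two-dimensional indexing computed on top of it.

```cpp
#include <cstddef>
#include <vector>

// Illustrative contiguous 2D array: a single buffer of rows*cols
// doubles, indexed in row-major order as data[i * cols + j].
struct Matrix2D {
    std::size_t rows, cols;
    std::vector<double> data;   // one contiguous allocation

    Matrix2D(std::size_t n, std::size_t m)
        : rows(n), cols(m), data(n * m) {}

    double& operator()(std::size_t i, std::size_t j) {
        return data[i * cols + j];
    }
};
```

Because every element lives in one block, row-major traversal is cache-friendly, which is the point of the request: D's jagged `double[][]` (an array of separately allocated rows) cannot guarantee this.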

Drew McCormack

April 29, 2004
On 2004-04-21 20:53:10 +0200, "Walter" <walter@digitalmars.com> said:

> 
> "Norbert Nemec" <Norbert.Nemec@gmx.de> wrote in message
> news:c65fmi$2js$1@digitaldaemon.com...
>> From what I have read so far, D really has the potential to close the gap
>> between C++ and Fortran and in this way gain a huge share in the
>> scientific/high-performance area of computing. Anyway, to have any chance
>> to go there, much care has to be taken now.
> 
> I agree totally with you, which is why D has several features with a
> numerics focus. If there are any I missed, I want to know about it.

As you would have gathered, had you read my previous posts, I am strongly in favor of more powerful multidimensional arrays. This should not be left to library writers like in C++, in my view. A library writer can, in theory, write a powerful and fast class, but it holds back adoption in practice because you get problems compiling the library on different platforms, and performance may sometimes be subpar for a given platform. It also often leads to ugly syntax. Basically, it leaves too much to chance to be left to a library.

Here are some more for the wish list:

It is important to be able to create a contiguous multidimensional array on the heap at run time. There are very few occasions where I know how big an array is at compile time.

Elementwise operations are important too, but I think these are already on the list of things to do. Note that they should work with arrays of any shape, eg,

double[10][5] a, b;
double[][] c;
// Initialize a and b here
..
c = a + b;
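Absent language support, the elementwise semantics wished for above can at least be emulated in library code. A minimal C++ sketch under that assumption (the helper name is made up), with a run-time shape check standing in for what the compiler would verify:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Elementwise addition of two equally-shaped (flattened) arrays.
std::vector<double> elementwise_add(const std::vector<double>& a,
                                    const std::vector<double>& b) {
    assert(a.size() == b.size());   // shapes must match
    std::vector<double> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        c[i] = a[i] + b[i];
    return c;
}
```

Built-in elementwise operators would let the compiler vectorize this loop without the function-call and shape-check overhead, which is why the poster wants it in the language rather than a library.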

Something which I missed when I was working yesterday with D was array literals. These don't work:

double[] a = { 1, 2, 3, 4 };
double[4] a = { 1, 2, 3, 4 };

It would be good if they did, and even that array literals were allowed in other contexts:

funcToDoSomething( someArgument, {1, 2, 3, 4} );

It would be nice if slicing worked for multidimensional arrays:

double[][] c = a[4..5][16..19];

Of course, there are any number of elementwise functions you can think of (e.g. max, min, sin, etc.), but these would belong in the library. I am more concerned to get powerful multidimensional arrays. It is possible to build the other stuff yourself, but if you have to write a high-performance multidimensional array class, you end up back in the C++ expression-template morass.

Drew McCormack
Free University, Amsterdam

April 29, 2004
Drew McCormack wrote:
> double[] a = { 1, 2, 3, 4 };
> double[4] a = { 1, 2, 3, 4 };
> 
> It would be good if they did, and even that array literals were allowed in other contexts:
> 
> funcToDoSomething( someArgument, {1, 2, 3, 4} );
> 
> Drew McCormack
> Free University, Amsterdam
> 

http://digitalmars.com/d/arrays.html#bounds

Scroll down to "Array Initialization"... it seems you use brackets. (meaning the kind in my name - [ and ].)

It doesn't work, of course, for associative arrays though.. sadly... I'm not sure if that's meant to change or not.

-[Unknown]
April 29, 2004
"Drew McCormack" <drewmccormack@mac.com> wrote in message news:c6q5h7$2u5k$1@digitaldaemon.com...
> On 2004-04-21 13:33:46 +0200, Norbert Nemec <Norbert.Nemec@gmx.de> said:
>
> > Stewart Gordon wrote:
> >> The features I like in F90 (and which are useful for SP) are built-in vector arithmetic and aggregate functions.  The former is in the D language, it just needs to be finally put into the compiler.  I've briefly suggested the latter.... http://www.digitalmars.com/drn-bin/wwwnews?D/21671
> >
> > I don't really know how much of that has to be in the language. It is nice to have arrays as part of the language, but vectors and matrices should rather be defined in the library. There are plenty of efficient ways to deal with arrays, which can be implemented in a very efficient way by optimizing (perhaps also parallelizing) compilers.
> >
> > Vector arithmetic actually gives arrays special matrix semantics which is not what you would want in general. By far not every array is a matrix so why should it behave like one?
> >
> > vector arithmetic and aggregate functions may just as well be defined in the library. Just encapsulate arrays in your own class and give it all the semantics you need, without forcing everyone else to get that semantics when his arrays are something completely different.
> I agree that matrices, which are basically mathematical tools from linear algebra, do not belong in the core language. But powerful multidimensional arrays are important, if you are to avoid the mess that C++ is when it comes to high-performance programming.

Have you looked at the very efficient (though poorly documented) multidimensional arrays in STLSoft (http://stlsoft.org/)? There's fixed_array for fixed-dimensional arrays (up to four) with dimension extents variable at runtime, and frame_array (soon to be renamed to something more meaningful), which has a fixed number of dimensions and fixed extents (a thin veneer for STL compatibility over built-in arrays).

I reckon they're about as close to the bone as C++ will let you get. I'd be interested in hearing your opinions of the implementations.

Their storage is contiguous