View mode: basic / threaded / horizontal-split · Log in · Help
March 09, 2006
Re: std.array suggestion
Oskar Linde wrote:
> Don Clugston wrote:
>> Oskar Linde wrote:
>> I'll just comment on one function:
>>
>>  > T sum(T[] arr)
>>  >
>>  > Returns the sum of the element in arr as defined by the + operator.
>>
>> This one has an interesting twist. I'm not sure that the sum should 
>> necessarily be of type T. I've been playing around with the concept of 
>> what I've called the Archetype of a type, which is the largest type 
>> with the same semantics as T (possibly with the same calculation 
>> speed). (ie,
>> Archetype!(byte)= Archetype!(short)
>> Archetype!(int) = long
>> Archetype!(float) = Archetype!(double) = real,
>> Archetype!(cfloat)= creal, etc). Obviously it's a trivial template.
> 
> Interesting...
> 
>> I think that at least,  sum(double[] ) should internally use a real 
>> while accumulating the sum, so that it can satisfy this test (at least 
>> on x86 platforms):
>>
>> unittest {
>>    const double a = [ double.max, double.max, -double.max];
>>    assert(sum(a) == double.max);
>> }
>>
>> After all, this is one of the reasons why reals exist. I'm still not 
>> sure if sum(double []) should return a double or a real, although I'm 
>> inclined to think that *any* function that returns a single floating 
>> point value should return a real (on x87, the 80-bit result is just 
>> left on the FPU stack anyway). But, I'm less confident about how a sum 
>> of ints should behave.
> 
> Int overflows is well defined, making the sum of ints behave correctly 
> in the case above. Int is also the promotion type for integral 
> operations and would be the natural return type for sum(int[]). A sum of 
> shorts should probably return an int though, following integer promotion 
> rules. Should Archetype for integers follow the integer promotion rules 
> or all be long?

I'm really not sure. But you make a good point -- if it mimics the 
promotion rules, it's easy to justify the choice.
There's an implementation of Archetype!() with a sum!() that behaves in 
this way at http://svn.dsource.org/projects/mathextra/trunk/mathtempl.d

I also have a sumList(a[]...) implementation there, but it's just an 
experiment. Doesn't work too well with the current IFTI limitations.

If IFTI worked for ... arguments, we could just write

template min(T)
{
   T min(T[]...) {
   }
}

and then it would work for both:
int z = 4;
int [] a = [3, 5, 67];

int x = min(5, 6, z, 2, 9);
int y = min(a);

but unfortunately ... arguments don't let you do the implicit property 
trick. So I don't know if it's a good idea or not.

>> However, all the other functions seem to be free of mathematical 
>> subtleties. sum() is the only one which involves arithmetic operators, 
>> and therefore it might not belong with the rest.
> 
> You make a strong argument and I agree. sum() is used as join() in other 
> languages that lack the distinction between addition and concatenation. 
> D doesn't need a non-arithmetic sum function.

The ~ operator was a stroke of genius. It has many wonderful consequences.
March 09, 2006
Re: std.array suggestion
Oskar Linde wrote:
<snip>
> Oskar

I would also add:

T[]  left( T[] array, int n )
T[]  right( T[] array, int n )
T[]  skip( T[] array, int n )

In my own personal parsing stuff I regularly use the above plus:

T[]  countUntil( char[] str, bool delegate(char) pred );
T[]  countWhile( char[] str, bool delegate(char) pred );

So skipping whitespace becomes:

text = skip( text.countWhile( delegate bool(char c){ c<=32; } ) );

-DavidM
March 09, 2006
Re: std.array suggestion
John C wrote:
> "Oskar Linde" <oskar.lindeREM@OVEgmail.com> wrote in message 
> news:dup4fr$2c0b$3@digitaldaemon.com...
>> With the corresponding inplace versions:
>>
>> T[] doSort(T[]);
>> T[] doStableSort(T[]);
>> T[] doSort(T[], delegate|function wrongOrder(T,T));
>> T[] doReverse(T[]);
>>
> 
> My only reservation would be the decision to prefix these functions with 
> "do". It's tautological and doesn't express anything about them being 
> in-place versions. I suggest just calling them sortInPlace/inPlaceSort (or 
> even sortInSitu or inSituSort). 

I have not been giving much thought to the naming of the in-place 
versions. I briefly considered sortInPlace and inPlaceSort, but they 
appeared too unwieldy. "do" was meant to signal the side-effect by 
implying that the function "does" something, rather than just being a 
definition of a mapping of values onto other values. (A void return type 
would help too.) If I were a native English speaker, I might have felt 
differently about the "do" prefix though.

The general idea is to make the default (short) form not have 
side-effects. (Which I believe reduces surprises). Otherwise, I guess 
you could have named the non-in-place versions sorted() filtered() 
mapped() reversed() etc.

Before trying to justify the naming, I would need to justify the 
decision to make non-in-place versions the default. This is really a 
philosophical question, but I think the most important thing is to be 
consistent throughout the library. This choice seems to go hand in hand 
with the Phobos philosophy of using copy-on-write as often as possible. 
There are also plenty of functions that behave like this: 
std.string.tolower, toupper, capitalize, capwords, ljustify and replace 
to name a few. On the other hand, we have the in-place .sort and 
.reverse that behaves contrary to this.

I'm not really set on any naming and appreciate all suggestions.

/Oskar
March 09, 2006
Re: std.array suggestion
Oskar Linde wrote:
<snip>
> T[] doSort(T[], delegate|function wrongOrder(T,T));
<snip>

You've left off the return type of wrongOrder.  Moreover, why the name 
wrongOrder?  To simply return whether or not these two elements are out 
of order?  Why not use what people are used to, i.e. a function that 
returns -ve, 0 or +ve for <, = or > respectively?

Stewart.

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++@ a->--- UB@ P+ L E@ W++@ N+++ o K-@ w++@ O? M V? PS- 
PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox.  Please keep replies on 
the 'group where everyone may benefit.
March 09, 2006
Re: std.array suggestion
Stewart Gordon wrote:
> Oskar Linde wrote:
> <snip>
>> T[] doSort(T[], delegate|function wrongOrder(T,T));
> <snip>
> 
> You've left off the return type of wrongOrder.  Moreover, why the name 
> wrongOrder?  To simply return whether or not these two elements are out 
> of order?  Why not use what people are used to, i.e. a function that 
> returns -ve, 0 or +ve for <, = or > respectively?

The return value would be anything usable as a condition in a 
conditional function if(wrongOrder(a,b)) {...}

The reason for not having -ve,0,+ve is that you only need a binary 
predicate for sorting. It is also in many cases simpler to define a 
ordering predicate than a 3-valued ordering. The name wrongOrder just 
helps as a reminder of what the predicate should return.

names.sort(delegate bool(char[] a, char[] b) { return a > b; });

would sort the names in alphabetical order. To sort in reverse 
alphabetical order:

names.sort(delegate bool(char[] a, char[] b) { return a < b; });

records.sort(delegate bool(Person a, Person b) { return a.name < b.name;});

/Oskar
March 09, 2006
Re: std.array suggestion
Oskar Linde wrote:
> Stewart Gordon wrote:
>> Oskar Linde wrote:
>> <snip>
>>> T[] doSort(T[], delegate|function wrongOrder(T,T));
>> <snip>
>>
>> You've left off the return type of wrongOrder.  Moreover, why the name 
>> wrongOrder?  To simply return whether or not these two elements are 
>> out of order?  Why not use what people are used to, i.e. a function 
>> that returns -ve, 0 or +ve for <, = or > respectively?
> 
> The return value would be anything usable as a condition in a 
> conditional function if(wrongOrder(a,b)) {...}
> 
> The reason for not having -ve,0,+ve is that you only need a binary 
> predicate for sorting.

But depending on the sorting algorithm, I can imagine that either may be 
more efficient.

Quite a lot of things at the moment rely on a three-valued ordering.

C: strcmp, memcmp, qsort
Java: compareTo (sic), Comparator
D: opCmp, std.string.cmp

Even if you do have it documented that doSort is unusual, somebody might 
either forget this or be already thinking on the three-valued terms when 
they see code that uses it.

> It is also in many cases simpler to define a 
> ordering predicate than a 3-valued ordering. The name wrongOrder just 
> helps as a reminder of what the predicate should return.
<snip>

If that's so, I wonder why so many things have stuck with the 
three-valued ordering.

Stewart.

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++@ a->--- UB@ P+ L E@ W++@ N+++ o K-@ w++@ O? M V? PS- 
PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox.  Please keep replies on 
the 'group where everyone may benefit.
March 09, 2006
Re: std.array suggestion
Oskar Linde wrote:
> Hello,
> 
> With the new IFTI support I have been looking at ways of upgrading the 
> standard library with new and more generic functions. What follows is my 
> suggestions for functions to be added to std.array. I have implemented 
> all of them and the current (0.149) limited IFTI-support is enough to 
> support them. That being said, I wish to hold off making a source code 
> submission until a) I get review comments on the suggested function 
> prototypes and b) it is more clear how far D is taking IFTI support 
> (meaning possibly neater implementation).

Just a few quick comments, as I've begun thinking about this for Ares as 
well.

> All functions are designed to be used both as free function and as 
> implicit array methods. Except for the inplace versions, no functions 
> modifies the array.
> 
> The prototype notation is my own. a|b means two alternative types. T is 
> the generic element type. T[] is the array.

To those who are unfamiliar with ITI, it's worth noting that any 
template parameters that cannot be determined implicitly should be 
placed at the beginning of the template parameter list.  Doing so should 
allow this to be possible:

	template func( Ret, Val )
	{
	    Ret func( Val val ) {}
	}
	
	int i;
	char c = func!(char)( i );

This isn't implemented yet, but I suspect it will be soon.

> -----------
> 
> T fold(T[] arr, T init, T delegate|function combiner(T,T));
> 
> Generic array recursion function. combiner is called recursively:
>     return combiner(init,fold(arr[1..$],arr[0],combiner));
> (The actual implementation will of course call the combiner iteratively)
> 
> 
> T max(T[] arr)
> 
> Returns the maximum element in arr as defined by the > operator.
> 
> 
> T min(T[] arr)
> 
> Returns the minimum element in arr as defined by the < operator.

min and max should probably allow a comparison predicate to be supplied 
as well.  Thus the declarations would be something like this:

T min(T[] arr, P pred = &less!(T));

I'm also hoping the above syntax will actually work, and that P will 
resolve to being a function pointer by default.

> T sum(T[] arr)
> 
> Returns the sum of the element in arr as defined by the + operator.
>
> ptrdiff_t find(T[] arr, T|delegate|function d);
> 
> Returns the index of the first occurence of d or the first true 
> predicate d applied to the elements in order. Returns -1 when no element 
> is found.

I suggest returning size_t instead and using size_t.max in place of -1. 
 The two are basically equivalent, but using size_t offers a bit more 
range in legal array sizes as the flag value occupies the least 
significant bit rather than the most significant.

I also almost suggested adding a "substring" find function, except that 
seems more appropriate for a std.string module.  Have you thought about 
how the two will overlap?  I'm not entirely certain how template 
overloading will work for functions defined in different modules.  In 
fact, if this is indeed a limitation them it might be prudent to put all 
such algorithms in std.array instead of splitting them between std.array 
and std.string (I just thought of this--I had been considering multiple 
modules for overloads in Ares, and I suspect this won't work).

> size_t indexOf(T[] arr, T|delegate|function d);
> 
> Like find, but throws on missing element.
> 
> 
> T[][] split(T[] arr, T|T[]|delegate|function d);
> 
> Split the array arr using a predicate/element/subarray d.
> (obsoletes std.string.split that only works for char[])
> 
> 
> T[] join(T[][] arr, T|T[]|delegate|function separator);
> 
> Join the elements array arr using separator.
> (obsoletes std.string.join that only works for char[])

How do you envision the delegate version being applied?  Is it expected 
to return an element/array for a separator?

> T[] join(T[][] arr);
> 
> Join the array T[][] without separator.
> 
> 
> U[] map(T[] arr, U delegate|function f(T));
> 
> Map the elements of arr over function f, returning an array of the results.
> 
> 
> T[] filter(T[] arr, delegate|function p(T));
> 
> Filters arr over the predicate p, returning array of elements of arr 
> where the predicate is true.
> 
> 
> Possible inplace version of map:
> 
> T[] doMap(T[], T delegate|function f(T));
> 
> ---------
> 
> I would also prefer those over the language built in .sort .reverse:
> 
> T[] sort(T[]);
> T[] stableSort(T[]);
> T[] sort(T[], delegate|function wrongOrder(T,T));
> T[] reverse(T[]);
> 
> 
> With the corresponding inplace versions:
> 
> T[] doSort(T[]);
> T[] doStableSort(T[]);
> T[] doSort(T[], delegate|function wrongOrder(T,T));
> T[] doReverse(T[]);
> 
> ---------
> 
> Is there in general even any interest in adding generic functions to the 
> standard library?

Heck yes.  Many of the functions you've mentioned above were ones I was 
planning to implement.  I also like that you're using delegates as 
predicates, as it makes them far more generic than some of the hardcoded 
versions in Phobos' std.string.  I don't suppose you'd like to submit 
them to Ares as well?  Or please don't object if I end up implementing 
something quite similar :-)


Sean
March 09, 2006
Re: std.array suggestion
Don Clugston wrote:
> Oskar Linde wrote:
>> Hello,
>>
>> With the new IFTI support I have been looking at ways of upgrading the 
>> standard library with new and more generic functions. 
> 
> [snip]
>  > Is there in general even any interest in adding generic functions to the
>  > standard library?
> 
> Some of these functions are the last thing remaining in std.math2 (but 
> they definitely don't below there, none of them are truly mathematical). 
> We definitely want to remove std.math2 prior to 1.0, std.array sounds 
> good to me.
> 
> I'll just comment on one function:
> 
>  > T sum(T[] arr)
>  >
>  > Returns the sum of the element in arr as defined by the + operator.
> 
> This one has an interesting twist. I'm not sure that the sum should 
> necessarily be of type T. I've been playing around with the concept of 
> what I've called the Archetype of a type, which is the largest type with 
> the same semantics as T (possibly with the same calculation speed). (ie,
> Archetype!(byte)= Archetype!(short)
> Archetype!(int) = long
> Archetype!(float) = Archetype!(double) = real,
> Archetype!(cfloat)= creal, etc). Obviously it's a trivial template.

Interesting idea.  I've used "LargerOf" templates for min/max 
comparisons before:

template min(T,U) { LargerOf!(T,U) min( T t, U u ) {} }

but the idea of Archetypes is a nice abstraction of this idea.

> I think that at least,  sum(double[] ) should internally use a real 
> while accumulating the sum, so that it can satisfy this test (at least 
> on x86 platforms)

Agreed.

> However, all the other functions seem to be free of mathematical 
> subtleties. sum() is the only one which involves arithmetic operators, 
> and therefore it might not belong with the rest.

sum almost seems out of place here, except that it's an array algorithm. 
 I wonder if it might be more appropriate to create a numerics module 
for this sort of thing?


Sean
March 09, 2006
Re: std.array suggestion
In article <dup8rj$68s$1@digitaldaemon.com>, Don Clugston says...
>
>Oskar Linde wrote:
>> Hello,
>> 
>> With the new IFTI support I have been looking at ways of upgrading the 
>> standard library with new and more generic functions. 
>
>[snip]
> > Is there in general even any interest in adding generic functions to the
> > standard library?
>
>Some of these functions are the last thing remaining in std.math2 (but 
>they definitely don't below there, none of them are truly mathematical). 
>We definitely want to remove std.math2 prior to 1.0, std.array sounds 
>good to me.
>
>I'll just comment on one function:
>
> > T sum(T[] arr)
> >
> > Returns the sum of the element in arr as defined by the + operator.
>
>This one has an interesting twist. I'm not sure that the sum should 
>necessarily be of type T. I've been playing around with the concept of 
>what I've called the Archetype of a type, which is the largest type with 
>the same semantics as T (possibly with the same calculation speed). (ie,
>Archetype!(byte)= Archetype!(short)
>Archetype!(int) = long
>Archetype!(float) = Archetype!(double) = real,
>Archetype!(cfloat)= creal, etc). Obviously it's a trivial template.
>
>I think that at least,  sum(double[] ) should internally use a real 
>while accumulating the sum, so that it can satisfy this test (at least 
>on x86 platforms):
>
>unittest {
>    const double a = [ double.max, double.max, -double.max];
>    assert(sum(a) == double.max);
>}
>
>After all, this is one of the reasons why reals exist. I'm still not 
>sure if sum(double []) should return a double or a real, although I'm 
>inclined to think that *any* function that returns a single floating 
>point value should return a real (on x87, the 80-bit result is just left 
>on the FPU stack anyway). But, I'm less confident about how a sum of 
>ints should behave.
>
>However, all the other functions seem to be free of mathematical 
>subtleties. sum() is the only one which involves arithmetic operators, 
>and therefore it might not belong with the rest.

I would prefer a syntax like this, instead of (or in addition to) the
archetype version:

void sum(in T1 a, out T2 b)

This way, I can sum ints into a double or doubles into a float if that's what I
really want to do.  A user who is worried about overflow of int, can sum into a
long.

Otherwise, most users will do a cast of the return value back to T, which means
there is cast proliferation.

One can imagine a "product" version:

void product(int T1 a, out T2 b);

In this case, "long" won't cut it for many cases - but a user can supply a
variable length integer type of their own design, so long as it provides a
method like "opAddAssign(int foo)".

Also note:  I deliberately use T1 instead of T1[] here.  I think the container
should not be restrained to a regular array.  This code (template syntax
omitted):

: void sum(T1 inp, T2 outp)
: {
:     foreach(auto sub; inp) {
:         outp += sub;
:     }
: }

. is more flexible than:

: void sum(T1[] inp, T2 outp)
: {
:     foreach(T1 sub; inp) {
:         outp += sub;
:     }
: }

Because it works with any class that can do opApply(), not just a vanilla array.

Kevin
March 09, 2006
Re: std.array suggestion
Kevin Bealer wrote:
> 
> Also note:  I deliberately use T1 instead of T1[] here.  I think the container
> should not be restrained to a regular array.  This code (template syntax
> omitted):
> 
> : void sum(T1 inp, T2 outp)
> : {
> :     foreach(auto sub; inp) {
> :         outp += sub;
> :     }
> : }
> 
> . is more flexible than:
> 
> : void sum(T1[] inp, T2 outp)
> : {
> :     foreach(T1 sub; inp) {
> :         outp += sub;
> :     }
> : }
> 
> Because it works with any class that can do opApply(), not just a vanilla array.

I agree.  However, opApply ranges don't support random access, so I 
think there's value in providing an array-specific overload.  find() is 
a good example here, as the opApply version would need to accept a 
delegate(size_t,inout T) and fewer optimizations would be possible.


Sean
1 2 3
Top | Discussion index | About this forum | D home