Jump to page: 1 24  
Page
Thread overview
Ode to the set
Aug 21, 2001
Daniel Mantione
Aug 21, 2001
Angus Graham
Aug 22, 2001
Daniel Mantione
Aug 22, 2001
Daniel Mantione
Aug 22, 2001
Daniel Mantione
Aug 22, 2001
Daniel Mantione
Oct 16, 2001
Walter
Oct 29, 2001
Sean L. Palmer
Aug 23, 2001
Walter
Aug 24, 2001
Rajiv Bhagwat
Oct 15, 2001
Walter
Aug 23, 2001
Walter
Aug 21, 2001
Sheldon Simms
Aug 21, 2001
D Man
Aug 21, 2001
Angus Graham
Aug 22, 2001
Daniel Mantione
Sep 14, 2001
Anthony Steele
Aug 22, 2001
Charles Hixson
Aug 22, 2001
Charles Hixson
Aug 22, 2001
Charles Hixson
Aug 22, 2001
Charles Hixson
Aug 26, 2001
Dan Hursh
Aug 22, 2001
Daniel Mantione
Aug 22, 2001
Charles Hixson
Aug 22, 2001
Daniel Mantione
Aug 22, 2001
Charles Hixson
Sep 14, 2001
Anthony Steele
Sep 17, 2001
Charles Hixson
Oct 15, 2001
Walter
August 21, 2001
Hi,

One of the features I miss the most in C is the set. Being a member of the Free Pascal development team, I program a lot in Pascal. Now I don't want to say that Pascal rules and C sucks, but there are some things that Pascal is superior in.

Why do you need a set?

I think it can best be illustrated with an example. The following code code from the AOLserver http server and is used to encode a parameter to place in in a URL. For example the text "This is a sentence" is changed into "This%20is%20a%20sentence".

char *
Ns_EncodeUrl(Ns_DString *pds, char *string)
{
    static char safechars[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";

    while (*string != '\0') {

        if (strchr(safechars, *string) != NULL) {
            Ns_DStringNAppend(pds, string, 1);
        } else {
            char buf[4];
            sprintf(buf, "%%%02x", (unsigned char) *string);
            Ns_DStringNAppend(pds, buf, 3);
        }
        ++string;
    }
    return pds->string;
}

That's really horrible inefficient isn't it? But it's easy to maintain, if the safechars are not correct, you can correct it by changing it. Now how would you do this into Pascal?

function ns_encodeurl(s:string):string;

const   safechars=['a'..'z','A'..'Z','0'..'9'];

var     i:longint;

begin
    ns_encodeurl:='';
    for i:=1 to length(s) do
        if s[i] in safechars then
            ns_encodeurl:=ns_encodeurl+s[i]
        else
            ns_encodeurl:=ns_encodeurl+'%'+hexstr(byte(s[i]));
end;

That looks much cleaner! And it's much more efficient; instead of a strchr call for each character the compiler now only needs a set lookup. And because this is a constant set, the compiler can optimize this into a few compare operations.

Set's make also good flags. In C you usually see code like this:

#define borders_enabled         1
#define shadow_enabled          2
#define lights_enabled          4

And then:

void do_nice_things(int flags) {
    if (flags & borders_enabled) {
    };
};

Now how would you do it with a set? A simple enumeration does the trick:

type    effects=(borders_enabled,shadow_enabled,lights_enabled);

And then:

procedure do_nice_things(flags:set of effects);

begin
    if borders_enabled in effects then
        begin
        end;
end;

... so no terrible long header files. Just simple enumerations, while the compiler will propably generate the same code.

So, how about a set in D??

Greetings,

Daniel Mantione
August 21, 2001
"Daniel Mantione" <daniel@deadlock.et.tudelft.nl> wrote
>
> That looks much cleaner! And it's much more efficient; instead of a strchr call for each character the compiler now only needs a set lookup. And because this is a constant set, the compiler can optimize this into a few compare operations.

Well, to be fair, the C version doesn't have to use strchr. You could just change

if (strchr(safechars, *string) != NULL) {

to

char c = *string;
if( (c>='a' && 'z'>=c) || (c>='A' && 'Z'>=c) || (c>='0' && '9'>=c) )

or even

char set[][] = { {'a','z'},
                         {'A','Z'},
                         {'0','9'};
char c = *string;
if( (c>=set[0][0] && set[0][1]>=c) || (c>=set[1][0] && set[1][1]>=c) ||
(c>=set[2][0] && set[2][1]>=c) )

or finally

char set[][] = { {'a','z'},
                         {'A','Z'},
                         {'0','9'};
char c = *string;
for( int i=0; i<sizeof(set); ++i )
{
    found = (c>=set[i][0] && set[i][1]>=c);
}
if(found)
{

which is a little hectic, bit is what I presume the Pascal version does internally.


Angus Graham


August 21, 2001
Angus Graham wrote:
> 
> "Daniel Mantione" <daniel@deadlock.et.tudelft.nl> wrote
> >
> > That looks much cleaner! And it's much more efficient; instead of a strchr call for each character the compiler now only needs a set lookup. And because this is a constant set, the compiler can optimize this into a few compare operations.
> 
> Well, to be fair, the C version doesn't have to use strchr. You could just change... [options snipped]

Those alternatives aren't very pretty, IMO.

strchr()'s actually pretty efficient -- only one function call involved, and I presume the typical Pascal system is going to make a function call to do a set lookup. However, strchr() isn't a general object-in-set function.

For the general case, are D's associative arrays good enough, Daniel?

-RB
August 21, 2001
Im Artikel <9luk4s$2oqi$1@digitaldaemon.com> schrieb "Angus Graham" <agraham_d@agraham.ca>:

> "Daniel Mantione" <daniel@deadlock.et.tudelft.nl> wrote
>>
>> That looks much cleaner! And it's much more efficient; instead of a strchr call for each character the compiler now only needs a set lookup. And because this is a constant set, the compiler can optimize this into a few compare operations.
> 
> Well, to be fair, the C version doesn't have to use strchr. You could just change
> 
> if (strchr(safechars, *string) != NULL) {
> 
> to
> 
> char c = *string;
> if( (c>='a' && 'z'>=c) || (c>='A' && 'Z'>=c) || (c>='0' && '9'>=c) )
> 
> or even
> 
> char set[][] = { {'a','z'},
>                          {'A','Z'},
>                          {'0','9'};
> char c = *string;
> if( (c>=set[0][0] && set[0][1]>=c) || (c>=set[1][0] && set[1][1]>=c) ||
> (c>=set[2][0] && set[2][1]>=c) )
> 
> or finally
> 
> char set[][] = { {'a','z'},
>                          {'A','Z'},
>                          {'0','9'};
> char c = *string;
> for( int i=0; i<sizeof(set); ++i )
> {
>     found = (c>=set[i][0] && set[i][1]>=c);
> }
> if(found)
> {
> 
> which is a little hectic, bit is what I presume the Pascal version does internally.

Which basically proves Daniel's point. Why should the programmer have to code this explicitly?

-- 
Sheldon Simms / sheldon@semanticedge.com
August 21, 2001
I second that.  I am not a fan of Pascal, but if there are features that are this handy...

"Sheldon Simms" <sheldon@semanticedge.com> wrote in message news:9lul5d$2p87$1@digitaldaemon.com...
> Im Artikel <9luk4s$2oqi$1@digitaldaemon.com> schrieb "Angus Graham" <agraham_d@agraham.ca>:
>
> > "Daniel Mantione" <daniel@deadlock.et.tudelft.nl> wrote
> >>
> >> That looks much cleaner! And it's much more efficient; instead of a strchr call for each character the compiler now only needs a set lookup. And because this is a constant set, the compiler can optimize this into a few compare operations.
> >
> > Well, to be fair, the C version doesn't have to use strchr. You could just change
> >
> > if (strchr(safechars, *string) != NULL) {
> >
> > to
> >
> > char c = *string;
> > if( (c>='a' && 'z'>=c) || (c>='A' && 'Z'>=c) || (c>='0' && '9'>=c) )
> >
> > or even
> >
> > char set[][] = { {'a','z'},
> >                          {'A','Z'},
> >                          {'0','9'};
> > char c = *string;
> > if( (c>=set[0][0] && set[0][1]>=c) || (c>=set[1][0] && set[1][1]>=c) ||
> > (c>=set[2][0] && set[2][1]>=c) )
> >
> > or finally
> >
> > char set[][] = { {'a','z'},
> >                          {'A','Z'},
> >                          {'0','9'};
> > char c = *string;
> > for( int i=0; i<sizeof(set); ++i )
> > {
> >     found = (c>=set[i][0] && set[i][1]>=c);
> > }
> > if(found)
> > {
> >
> > which is a little hectic, bit is what I presume the Pascal version does internally.
>
> Which basically proves Daniel's point. Why should the programmer have to code this explicitly?
>
> --
> Sheldon Simms / sheldon@semanticedge.com


August 21, 2001
> Which basically proves Daniel's point. Why should the programmer have to code this explicitly?

Becuase if you have a large list of things to test membership in you can use
a real container.  If you have a tiny list you can use one line of code.
How often would a built in set be useful?  To me, seldom to never, so why
cut down a tree and put it in every D manual?
I'm not even too happy about having associative arrays built in.  Much as I
love them in Perl, it's just gonna make the runtime bigger and slower, my
manual longer and heavier.  A nice standard library is the place for
containers.

Angus Graham


August 22, 2001
Russell Bornschlegel wrote:

> Those alternatives aren't very pretty, IMO.

That's indeed the point. Code like this is hard to maintain.

> strchr()'s actually pretty efficient -- only one function call involved, and I presume the typical Pascal system is going to make a function call to do a set lookup. However, strchr() isn't a general object-in-set function.

Our own Free Pascal compiler uses compares if the amount of compares that are needed is smaller than 9. If the amount of compares is greater than 9, it creates a bit map, and checks if bit number x is set.

So:

if s[i] in ['0'..'9']

.. will generate the same code as ..

if i>='0' and i<='9'

.. while ..

if s[i] in ['a','d','#','A..X','!','-','+','0..9']

.. will generate code similar to ..

const bitmap:packed array[#0..#255] of boolean=(0,0,0, .... );

if bitmap[s[i]] then

--

Both are much more efficient than a strchr call. Furthermore, strchr is often optmized for very long strings, look at the implementation in the GNU libc for an example how much code it is.


> For the general case, are D's associative arrays good enough, Daniel?

They are a cool feature, but how would you use them as a set replacement?

Daniel
August 22, 2001
Angus Graham wrote:

>> Which basically proves Daniel's point. Why should the programmer have to code this explicitly?
> 
> Becuase if you have a large list of things to test membership in you can use a real container.

This is true for complex types. Sets of records would be example be a very bad idea, the question comes wether they need to be stored in a tree, btree, or whatever. In a practical language the programmers needs to be in control of this. In these cases, a tree object or a btree object is more apropiate.

But in the case of ordinal types, a set isn't easily replaced by a library.

>  If you have a tiny list you can use one line of code.

Only in the case of a constant set!

And even then, people are going to do things like strchr yo make things easy to maintain. With a set, you have both: clean looking code and very efficient code.

> How often would a built in set be useful?  To me, seldom to never, so why cut down a tree and put it in every D manual?

You have never programmed in Pasacal or Delphi? If I program in C I *really* miss the set.

> I'm not even too happy about having associative arrays built in.  Much as
> I love them in Perl, it's just gonna make the runtime bigger and slower,
> my
> manual longer and heavier.  A nice standard library is the place for
> containers.

There is some truth in this. Associative arrays also need to be stored in internal datastructures over which the programmer has no control. An object might be a good idea too here. If you do it with properties, it will feel like an array too.

Greetings,

Daniel Mantione
August 22, 2001
Daniel Mantione wrote:
> Hi,
> 
> One of the features I miss the most in C is the set. Being a member of the Free Pascal development team, I program a lot in Pascal. Now I don't want to say that Pascal rules and C sucks, but there are some things that Pascal is superior in.
> 
>...
It's hard to argue that sets wouldn't be nice to have.  So would AVL-trees, and B+Trees.  And other classical data structures.  Perhaps a better approach would be to emphasize ways that make the language easy to extend with these features.  If not, then if it came to a choice my vote would be for the inclusion of B+Trees as a standard language feature.  I would want to use them far more frequently, and they are much more work to implement.

So, perhaps the question should be, what possible language (as opposed to library) features would make it easier to integrate sets into the language?  And how could these be generalized to support data strudtures in general?  (Presumably including associative arrays, strings, and the other built in features.)

One thing that most of these seem to do is to allow one to store data by means of some non-numeric keying function (usually a string, but lets try to keep this general).  Another is to test the structure for the existence of that key.  Another is to retrieve data associated with that key.  Another is to remove a key, and its associated data.  Anything else primitive-common?


Then the operations are, essentially a[key]=x, x=a[key], x in a, and a[key]=null.
Can these operations be defined for arbitrary user defined data structures?

The one remaining operation is initialization.  If a data structure were a class, then this could be handled with a constructor.  So the problem would be in specifying how the argument list of the constructor could cope with the kinds of argument that you want to use to initialize a set.

Is this a fair analysis?  If so, then I feel it a better basis to proceed than on a data structure by data structure basis.  And what is required here is that in be made an overloadable operator.  (Well, and the argument list problem.  I don't know how that should be handled.  I usually start with an empty data structure and add things to it.)


August 22, 2001
Charles Hixson wrote:
> If not, then if it came to a choice my
> vote would be for the inclusion of B+Trees as a standard language
> feature.  I would want to use them far more frequently, and they are
> much more work to implement.

Are you interested in the primitive functionality of B+Trees, or of some particular implementation-dependent characteristic of them?

In a garbage-collected language like D, where you have few or no performance guarantees to begin with, it doesn't seem necessary to make a big deal over the latter.

> One thing that most of these seem to do is to allow one to store data by means of some non-numeric keying function (usually a string, but lets try to keep this general).  Another is to test the structure for the existence of that key.  Another is to retrieve data associated with that key.  Another is to remove a key, and its associated data.  Anything else primitive-common?

Yes: iterate over the whole collection, performing an arbitrary operation on each element. This is the only operation I see in your list that isn't supported by associative arrays in the D spec.

A set can be represented by an associative array where the contents of an existing element are unused -- all you care about is the existence or absence of an element.

Consider that the STL set and map containers are normally implemented
as some flavor of binary tree, while Perl's associative array, which
does essentially the same thing, is normally referred to and
implemented as a hash. They're used the same way, but have slightly
different performance characteristics -- that most people aren't
worried about, because they're lazy and would otherwise be doing
linear searches on arrays or linked lists, so the near-constant
lookup time of a hash or the log2 lookup time of a tree is a huge
win for them. Trees can also maintain the collection in order,
while hashes can't.

> Then the operations are, essentially a[key]=x, x=a[key], x in a, and a[key]=null.

The D-spec associative array specifies "delete a[key]" instead of "a[key] = null", and I think you meant "key in a" instead of "x in a", but yeah.

Note -- I do agree that the language should support adding new data structures for your particular needs; to do them reasonably requires:

- operator overloading to make them pretty, including overloading operator delete, operator in, and operator [].

- templates or the equivalent to make them reusable and type-safe.

It looks like you aren't going to get both these features in early releases of D, unfortunately...

-Russell B
« First   ‹ Prev
1 2 3 4