August 22, 2001
Hi,

Yes, this looks like a very promising implementation to use sets. What you are essentially saying is that an array of bits has set semantics, like the "in" operator.

This fits very well in the philosophy of the language. In C you can use an int as boolean, in Pascal not. This can apply to arrays/sets as set as well.

Still, I would propose to also have some kind of set constructor, to be able to code:

if (i in ['a'..'z']) {};

So the set constructor ['a'..'z'] constructs an associative array of bits. Now the compiler can analize this and see that the same result can be achieved by doing compares. In fact, Free Pascal does this when it encounters a constant set.

Note that a set constructor is not something to write down a constant set, but an expression that results in a set. The following code is completely legal:

var a,b:longint;
    c:set;

begin
    a:=1;
    b:=3;
    c:=[a,b,2*b+7,4,5,6];
end;

How about using [...] as set constructor in D? I don't think it makes the syntax ambiguous. Or do we need a different constructor?

Your example would then be written as:

   char[] hexlate = "0123456789ABCDEF";
   bit[char] safechars = ['a'..'z','A'..'Z','0'..'9'];

   // all others left undefined/null/nonexistent

   char[] source, destination;

   for (i = 0; i < source.length(); i++)
   {
     char c = source[i];
     if (c in safechars)
     {
         // append character
         destination.append( c );
         // Walter: does destination += c do the Right Thing?
     }
     else
     {
         // append %## code
         destination.append( '%%' );
         destination.append( hexlate[ ( (c >> 4) & 0x0F ) ] );
         destination.append( hexlate[ (c & 0x0F) ] );
     }
   }

Russell Bornschlegel wrote:

> Daniel Mantione wrote:
> 
>> No, this won't do as replacement; it is much to cumbersome to use. You are right that a set allways needs to be a bitmap in unoptimized form. The compiler can then decide if it optimizes the bitmap away.
>> 
>> Try to code my first example with such an array. It's no replacement.
> 
> Your Ns_EncodeUrl() example? In C, that's a mess for a number of reasons, not the least of which is C's inadequate string handling.
> 
> In terms of performance efficiency, in C, the set lookup is going to get swamped by the sprintf. In Pascal, the set lookup may be faster but it'll be drowned out over the long haul by array bounds checking. :)

What do you mean? There are no arrays involved.

> In terms of implementation cleanliness, Pascal wins out here. In D, I think the _usage_ of the set is just as clean; if assoc-array-of-bits is optimized specially, then performance should be basically equal, and that just leaves the instantiation of the set.
> 
> So, I concede the desirability of static range-initialization in associative arrays in D:
> 
>   char[] hexlate = "0123456789ABCDEF";
>   bit[char] safechars;
> 
>   // we want a way of saying this in a static initializer
>   safechars[ 'a'..'z' ] = 1;
>   safechars[ 'A'..'Z' ] = 1;
>   safechars[ '0'..'9' ] = 1;
> 
>   // all others left undefined/null/nonexistent
> 
>   char[] source, destination;
> 
>   for (i = 0; i < source.length(); i++)
>   {
>     char c = source[i];
>     if (c in safechars)
>     {
>         // append character
>         destination.append( c );
> // Walter: does destination += c do the Right Thing?
>     }
>     else
>     {
>         // append %## code
>         destination.append( '%%' );
>         destination.append( hexlate[ ( (c >> 4) & 0x0F ) ] );   // upper 4
>         bits of c as hex digit
>         destination.append( hexlate[ (c & 0x0F) ] );            // lower 4
>         bits of c as hex digit
>     }
>   }
> 
> Which seems reasonably clean to me. It's not clear to me from my reading of the D spec whether Walter intended range initialization to work on associative arrays.
> 
> -RB

Greetings,

Daniel Mantione
August 22, 2001
Charles Hixson wrote:

> Considering that bit arrays are an included part of the language, I fail to understand why a set could not be implemented as a library.  Given that the compiler would have all the necessary information, it should (this might be optomistic) be able to optimize references to your "SmallSet" class as inline calls.  So the problem is just syntax, which is what I was discussing.  If that was answered, then I didn't understand the answer.

To optimize things well, you need compiler magic. A simple ...

if i in [0..9,18..20]

... would never get optimized if the set was implemented in a library. So compiler would be able to see that it can optimize the bitmap away in this case.

About syntax, you would like that a set that is implemented with a btree in a library for example, programs same as the built in set. To be able to do that you need operator overloading, which is being heavily dicussed.

Daniel
August 22, 2001
Daniel Mantione wrote:
> Yes, this looks like a very promising implementation to use sets. What you are essentially saying is that an array of bits has set semantics, like the "in" operator.

As currently specified, though, associative-array-of-bit actually has three states per key: 1, 0, or not-in-set. That's not a great fit to an underlying array-of-bit implementation.

> Still, I would propose to also have some kind of set constructor, to be able to code:
> 
> if (i in ['a'..'z']) {};
> 
> So the set constructor ['a'..'z'] constructs an associative array of bits. Now the compiler can analize this and see that the same result can be achieved by doing compares. In fact, Free Pascal does this when it encounters a constant set.

OK.

> > In terms of performance efficiency, in C, the set lookup is going to get swamped by the sprintf. In Pascal, the set lookup may be faster but it'll be drowned out over the long haul by array bounds checking. :)
> 
> What do you mean? There are no arrays involved.

Not in that code snippet, but in the rest of the program, whatever that program might be. That's what I meant by "the long haul".

I really tried to resist suggesting:

 static char* translate[] = { "%00", "%01", "%02" ... "0", "1", "2" ... "A", "B", "C" ... "a", "b", "c" ... "%FE", "%FF"
}; // about 2K bytes

 while (*string)
 {
   char* x = translate[*string++];
   while (*x)
   {
     *dest++ = *x++;
   }
 }

-RB
August 22, 2001
Russell Bornschlegel wrote:

>> What do you mean? There are no arrays involved.
> 
> Not in that code snippet, but in the rest of the program, whatever that program might be. That's what I meant by "the long haul".

Luckily you can turn off range checking, which you usually do when your program is bug-free.

Daniel
August 22, 2001
Russell Bornschlegel wrote:
> Charles Hixson wrote:
> 
>>...
> 
> Hmm. How about:
> 
>    foreach (keytype k in assoc_array.keys)
>    {
>        operate_on( k );
>    }
> 
> This overloads "in" somewhat, but the foreach should be a sufficient hint to the parser; the keytype (int, string, whatever) makes the k look like a declaration like any other. 
> 
Nice!

> 
>>>C++ can do just about anything, wrap that functionality in a
>>>class, and tuck it away...
>>>
>>Actually, except in the sense of algorithmic completeness, there are a
>>lot of things that neither C++ nor any other statically bound language
>>can do. 
>>
> 
> Frankly, I've never used a dynamically linked language, I don't think. Can you give me a short example of a cool thing you can
> do with dynamic linking?
This one's a bit strange, and you don't mind wierd syntax, so how about:
var = eval(include(filename.suf).something(include(otherfile.suf))) + "qwert";

The reason that most AI work is done in the dynamic languages isn't only because Lisp is easier to parse (or Forth would be more popular) and it sure isn't speed.  It's because code can copy itself, modify the copy, and then transfer execution to the copy.  Even where parsing is a bit difficult (can you say Python) this ability is maintained.  This normally means that you need to cart around an interpreter to do the job, but not always.  Some Smalltalk dialects can export their meaning as C code, and some dialects of Lisp are directly compileable.

Now truthfully, this could also be done with C (assuming decent cooperation from the OS), but it's so difficult that nobody does it.  So C, etc., don't get used in the areas that need that kind of flexibility.  And Lisp, Python, Ruby, etc. don't get used where speed is needed, though there's a study that "proves" that Lisp code can execute as fast as C code.  (I think that what they actually end up comparing is code optimizers, but that's not what they claim to show... and what's wrong with saying "My code can be more highly optimized with automatic tools, so that for a sufficiently complex task it is faster than your code!", but that's not what they said.)
> 
>...
> 
> Okay. I freely admit to being unafraid of ugly constructs, and to having
> been thoroughly warped by using C++. I firmly believe that no matter how
> elegant the language can be, some damn fool is going to use two-space indents, asinine bracketing styles, uncommented state machines implemented
> via numeric literal cases in huge nested switches, or other constructs of
> equal value, and that I'll be forced to maintain their code. C++ is the
> least of my problems this week. :)
> 
> -RB
> 
I generally end up maintaining my own code.  So I guess that's not a problem for me (except when stupid editors convert my tabs into spaces, and won't let me say what columns the tabs are supposed to represent).

August 22, 2001

Charles Hixson wrote:
> 
> Russell Bornschlegel wrote:
> > Frankly, I've never used a dynamically linked language, I don't think. Can you give me a short example of a cool thing you can do with dynamic linking?
> This one's a bit strange, and you don't mind wierd syntax, so how about:
> var = eval(include(filename.suf).something(include(otherfile.suf))) +
> "qwert";
> 
> The reason that most AI work is done in the dynamic languages isn't only because Lisp is easier to parse (or Forth would be more popular) and it sure isn't speed.  It's because code can copy itself, modify the copy, and then transfer execution to the copy.  Even where parsing is a bit difficult (can you say Python) this ability is maintained.  This normally means that you need to cart around an interpreter to do the job, but not always.  Some Smalltalk dialects can export their meaning as C code, and some dialects of Lisp are directly compileable.
>
> Now truthfully, this could also be done with C (assuming decent cooperation from the OS), but it's so difficult that nobody does it.  So C, etc., don't get used in the areas that need that kind of flexibility.

Ah, okay, Perl has this also -- I've seen it, seen that it could be
cool, but outside of one cookbook use (a Perl program eval'ing a startup/rc
file written in Perl) I haven't tried it. Really it's dynamic code-
evaluation/dynamic compilation, as much as dynamic linking. I suppose I
could invoke a C compiler to build a DLL and then bind the DLL, in C/Win32,
but that's probably not really very time-efficient... :)

And not really supported by standard C, but with the proper OS support you can build a sequence of native machine code in a buffer and then execute that buffer. High performance 3D applications have been known to do this, for highly critical polygon-pushing paths.

Also, blah blah blah security blah blah blah executing arbitrary code from an external file blah blah blah. :)

-RB
August 22, 2001
Daniel Mantione wrote:
> Charles Hixson wrote:
> 
> 
>>Considering that bit arrays are an included part of the language, I fail
>>to understand why a set could not be implemented as a library.  Given
>>that the compiler would have all the necessary information, it should
>>(this might be optomistic) be able to optimize references to your
>>"SmallSet" class as inline calls.  So the problem is just syntax, which
>>is what I was discussing.  If that was answered, then I didn't
>>understand the answer.
>>
> 
> To optimize things well, you need compiler magic. A simple ...
> 
> if i in [0..9,18..20]
> 
> ... would never get optimized if the set was implemented in a library. So compiler would be able to see that it can optimize the bitmap away in this case.
> 
> About syntax, you would like that a set that is implemented with a btree in a library for example, programs same as the built in set. To be able to do that you need operator overloading, which is being heavily dicussed.
> 
> Daniel
> 

A library should provide many different varieties of data structure. Not just one.  You chose the flavor that fits your problem.  As to compound ranges (not to my mind the same thing as a set, but rather one thing that is sometimes handled by sets), I don't know what the best choice would be.  If there were a range primitive, then one might want to say:
if i in [0..9] or i in [18..20]

a bit clumsier, I admit.  Or one could define a class that implemented range handling in the way that one chose, as defined by the problem under consideration.

OTOH, I will admit that it would be nice if complex range constants were a recognized data type (which would imply complex range variables).  And that one of the more reasonable ways to implement that would be as a set.  But not the only reasonable way.  It's perfectly reasonable to have a complex range that can handle things like:
Range r = "Alpha" .. "Blueberry" union "Peanuts" ... "Zebras"

All you need for this is two vectors of string, call them lower and upper.  (The rest is a lot of work, but basically obvious.)  I've needed to do this a couple of times, and once I needed to deal with complex ranges of floating point numbers, and even needed to perform arithmetic on the ranges.  Consider what:
r -= "Spinach";
would mean.  (Defining what arithmetic and comparison ops mean when applied to ranges of floating point values is a bit interesting.  That is probably very domain specific.  But I needed to do it the time I needed ranges of floating point numbers [well, rationals was all I really needed, but it was easier to use floats].)

Here union, intersect, remove, etc. are the basic ops.  But these map in a relatively straightforward way onto the general data structure operations that I was proposing.  (Relatively.  It's certainly a less good fit than most.)

But I rarely end up dealing with simple numbers except as numbers, so *I* don't see much need for the Pascal kind of set.  Except occasionally.

August 23, 2001
"Daniel Mantione" <daniel@deadlock.et.tudelft.nl> wrote in message news:9lvkln$a05$1@digitaldaemon.com...
> > For the general case, are D's associative arrays good enough, Daniel?
> They are a cool feature, but how would you use them as a set replacement?

int[char] set;
char c;

if (c in set)
{
}


August 23, 2001
"Russell Bornschlegel" <kaleja@estarcion.com> wrote in message news:3B83EF19.D755A19@estarcion.com...
> The D spec is silent on the implementation details for
> associative arrays, but they'd likely be binary trees (ordered
> keys, lookup time normally O(log2(n))) or hashes (disordered keys,
> lookup time normally O(1)).

They're a mix of hash tables and binary trees. The implementation would be free to pick its own way of implementing them.


August 24, 2001

Walter wrote:
> 
> "Russell Bornschlegel" <kaleja@estarcion.com> wrote in message news:3B83EF19.D755A19@estarcion.com...
> > The D spec is silent on the implementation details for
> > associative arrays, but they'd likely be binary trees (ordered
> > keys, lookup time normally O(log2(n))) or hashes (disordered keys,
> > lookup time normally O(1)).
> 
> They're a mix of hash tables and binary trees. The implementation would be free to pick its own way of implementing them.

So the spec won't offer promises about key ordering?

-R