Jump to page: 1 2
Thread overview
New set class
Feb 25, 2007
Michiel
Feb 25, 2007
Bill Baxter
Feb 25, 2007
Michiel
Feb 26, 2007
Bill Baxter
Feb 26, 2007
Michiel
Feb 26, 2007
Frits van Bommel
Feb 26, 2007
Michiel
Feb 26, 2007
Frits van Bommel
Feb 26, 2007
Michiel
Feb 26, 2007
Frits van Bommel
Feb 26, 2007
Michiel
Feb 26, 2007
Stephan Diehl
Feb 26, 2007
Michiel
Feb 26, 2007
Stephan Diehl
Feb 26, 2007
Michiel
Feb 26, 2007
Frits van Bommel
Feb 26, 2007
Michiel
February 25, 2007
I have written a new class to represent sets, with several useful features. Uploaded for your review and free use. I don't know much about open source and licensing, but I request that if you use the class, that you keep my name in the author section of the documentation.

http://files.michiel.eu/set.d

Tell me what you think.

There still are some problems, though. I could use some help:

* There are still problems involving static arrays as the set type. The set literal creates a set of the exact type you specify. So if you use a string literal, it will be a set of static char arrays. It is only a sort of hack in the Set class that allows for the dynamic array equivalent to also be accepted in. What I really need is to implement the set literal function so it creates a set of the dynamic array type directly. But I've run into some problems, sometimes strangely involving char-encodings. (invalid UTF-8 sequence, printed to the terminal at runtime when I try to print the toString())

* When I enable unit testing, several things can go wrong. Sometimes I get a segmentation fault. Sometimes I get a program that doesn't terminate and eats all of the memory until I kill it.

-- 
Michiel
February 25, 2007
Michiel wrote:
> I have written a new class to represent sets, with several useful
> features. Uploaded for your review and free use. I don't know much about
> open source and licensing, but I request that if you use the class, that
> you keep my name in the author section of the documentation.
> 
> http://files.michiel.eu/set.d
> 
> Tell me what you think.

If you add an opApply with the signature
   int opApply(int delegate(uint i, inout Type) dg) { ... }

it will allow this kind of foreach too:
   foreach(i, e; myset) {
       ...
   }

Also I would just leave out opApplyReverse.  That way using it is a compile-time error.  Or if you're dead set on it, at least have it throw an exception.  I believe assert() is a no-op in release builds, meaning your program will fail in strange ways in release.  Actually I'm surprised that compiles without a return statement.

Finally
   Set!(T[0]) set(T...)(T elements)
why not just
   Set!(T) set(T)(T elements...) ??

Have you looked at the HashSet implementation in Tango for comparison?

--bb
February 25, 2007
Bill Baxter wrote:

>> I have written a new class to represent sets, with several useful features. Uploaded for your review and free use. I don't know much about open source and licensing, but I request that if you use the class, that you keep my name in the author section of the documentation.
>>
>> http://files.michiel.eu/set.d
>>
>> Tell me what you think.
> 
> If you add an opApply with the signature
>    int opApply(int delegate(uint i, inout Type) dg) { ... }
> 
> it will allow this kind of foreach too:
>    foreach(i, e; myset) {
>        ...
>    }

That's why I didn't do it. What meaning would it have for a set, since they have no index?

> Also I would just leave out opApplyReverse.  That way using it is a compile-time error.  Or if you're dead set on it, at least have it throw an exception.  I believe assert() is a no-op in release builds, meaning your program will fail in strange ways in release.  Actually I'm surprised that compiles without a return statement.

assert(0) is evaluated by D as a point in the code you can't ever reach. I got the idea from the opCmp example:

--------------------------------------
For some objects, testing for less or greater makes no sense. For these,
override opCmp() with:

class A
{
    int opCmp(Object o)
    {
	assert(0);	// comparison makes no sense
	return 0;
    }
}
--------------------------------------

I see that that has a return statement, but I don't see why it would need one. assert(0) always exists the function. I think it's a feature of the compiler that it recognizes this.

> Finally
>    Set!(T[0]) set(T...)(T elements)
> why not just
>    Set!(T) set(T)(T elements...) ??

You only use that syntax if T is an aggregate type. So the resulting set would actually be Set!(T[0]) anyway. I think I tried this before and had some problems, but I'm not certain now.

> Have you looked at the HashSet implementation in Tango for comparison?

I had not. Is this the one?

http://www.dsource.org/projects/tango/docs/current/tango.util.collection.HashSet.html

Though I haven't looked at the source code, I see it exposes many of its internal workings to the outside. When you are working with a set you don't want to know if it's using a hash table or a balanced tree. Just that it implements a set.

Though I'm sure that they put more work into it than I and that it's more efficient. I just sneakily used an associative array on the inside. Which uses its own hash table, if I'm not mistaken.

-- 
Michiel
February 26, 2007
Michiel wrote:
> Bill Baxter wrote:
> 
>>> I have written a new class to represent sets, with several useful
>>> features. Uploaded for your review and free use. I don't know much about
>>> open source and licensing, but I request that if you use the class, that
>>> you keep my name in the author section of the documentation.
>>>
>>> http://files.michiel.eu/set.d
>>>
>>> Tell me what you think.
>> If you add an opApply with the signature
>>    int opApply(int delegate(uint i, inout Type) dg) { ... }
>>
>> it will allow this kind of foreach too:
>>    foreach(i, e; myset) {
>>        ...
>>    }
> 
> That's why I didn't do it. What meaning would it have for a set, since
> they have no index?

It's useful to the user who wants to print out the items numbered or something.  Or they have a set of 10 items and they want to copy them to a preallocated 10-item list.  Or they want to make another associative array that maps numbers to the items in the set.  Or they want to do something with only the first five items in the set then if(i>5) break;.  Or they want to append a number to every item in the set.  Or they want to split the set into two smaller sets arbitrarily using if (i%2).

Yes the user can always declare their own damn counter, but it doesn't make your code much more complex, and it does make their lives a little bit easier.

> 
>> Also I would just leave out opApplyReverse.  That way using it is a
>> compile-time error.  Or if you're dead set on it, at least have it throw
>> an exception.  I believe assert() is a no-op in release builds, meaning
>> your program will fail in strange ways in release.  Actually I'm
>> surprised that compiles without a return statement.
> 
> assert(0) is evaluated by D as a point in the code you can't ever reach.
> I got the idea from the opCmp example:
> 

Interesting.  I didn't realize that.  That's really not obvious from the docs that that's what happens:

"""
The expression assert(0) is a special case; it signifies that it is unreachable code. Either AssertError is thrown at runtime if it is reachable, or the execution is halted (on the x86 processor, a HLT instruction can be used to halt execution). The optimization and code generation phases of compilation may assume that it is unreachable code.
"""  -- http://www.digitalmars.com/d/expression.html#AssertExpression

Nowhere does it say it that it is "unconditional regardless of debug/release flags".

This page talks about asserts but doesn't mention debug/release behavior either.
http://www.digitalmars.com/d/dbc.html

Actually I can't find where disabling asserts for release builds is documented, though I'm sure I've seen it somewhere before.  A quick test, however, reveals that you are indeed correct: "regular" asserts turn off in a release build, but assert(0) does not.

Well that's good to know.  Still, it generates a runtime error, when you really would prefer a compile-time error.   So I think it's a bad idea to start putting in lots of
   opFoo() { assert(0,"This method doesn't exist"); }
Once you start putting those in, where do you stop?  I'm sure there are other operators and functions that Set does not implement.

It's almost like comments along the lines of:
   // this is an integer that does the counting:
   int counter;

The fact that the method is not there is all the documentation you need.

But if you really want to do put such things in your code you could try this:
struct Set(T) {
    ...
    void opApplyReverse()() {static assert(0,"don't be silly");}
}

Since foo is a template member, no attempt is made to compile it unless it's actually called from somewhere.  If you do try to call it then you trip the static assert at compile-time.

>> Finally
>>    Set!(T[0]) set(T...)(T elements)
>> why not just
>>    Set!(T) set(T)(T elements...) ??
> 
> You only use that syntax if T is an aggregate type. So the resulting set
> would actually be Set!(T[0]) anyway. I think I tried this before and had
> some problems, but I'm not certain now.

I don't follow what you mean about "aggregate type".  The unittest uses int as the type.  I don't think 'int' is considered an aggregate.



>> Have you looked at the HashSet implementation in Tango for comparison?
> 
> I had not. Is this the one?
> 
> http://www.dsource.org/projects/tango/docs/current/tango.util.collection.HashSet.html

Yes.

> Though I haven't looked at the source code, I see it exposes many of its
> internal workings to the outside. When you are working with a set you
> don't want to know if it's using a hash table or a balanced tree. Just
> that it implements a set.

It also doesn't include set operations like you have.  (union, intersection, difference, symmetric difference).  And I agree with you that it looks like it exposes too much of its innards for most users.

> Though I'm sure that they put more work into it than I and that it's
> more efficient. I just sneakily used an associative array on the inside.
> Which uses its own hash table, if I'm not mistaken.

It looks to be based on code by Doug Lea, who wrote dlmalloc, one of the fastest implementations of malloc around.  So I suspect performance is pretty good.


Another thing you might want to add to your set class is opCat (~) and opCatAssign (~=).  Those are pretty commonly used in D for appending something to something else.  They'd just be synonyms for add and union.

--bb

February 26, 2007
Bill Baxter wrote:

>>> If you add an opApply with the signature
>>>    int opApply(int delegate(uint i, inout Type) dg) { ... }
>>>
>>> it will allow this kind of foreach too:
>>>    foreach(i, e; myset) {
>>>        ...
>>>    }
>>
>> That's why I didn't do it. What meaning would it have for a set, since they have no index?
> 
> It's useful to the user who wants to print out the items numbered or
> something.  Or they have a set of 10 items and they want to copy them to
> a preallocated 10-item list.  Or they want to make another associative
> array that maps numbers to the items in the set.  Or they want to do
> something with only the first five items in the set then if(i>5) break;.
>  Or they want to append a number to every item in the set.  Or they want
> to split the set into two smaller sets arbitrarily using if (i%2).
> 
> Yes the user can always declare their own damn counter, but it doesn't make your code much more complex, and it does make their lives a little bit easier.

I feel that if I added that syntax, it would imply that the counter has some sort of relevance to the set, which it really doesn't.

A counter that increments by one each iteration has real meaning for an array. So does a key-type for an associative array. But for a set... sorry, I'd just feel dirty adding it.

>>> Also I would just leave out opApplyReverse.  That way using it is a compile-time error.  Or if you're dead set on it, at least have it throw an exception.  I believe assert() is a no-op in release builds, meaning your program will fail in strange ways in release.  Actually I'm surprised that compiles without a return statement.
>>
>> assert(0) is evaluated by D as a point in the code you can't ever reach. I got the idea from the opCmp example:
>>
> 
> Interesting.  I didn't realize that.  That's really not obvious from the docs that that's what happens:
> 
> """
> The expression assert(0) is a special case; it signifies that it is
> unreachable code. Either AssertError is thrown at runtime if it is
> reachable, or the execution is halted (on the x86 processor, a HLT
> instruction can be used to halt execution). The optimization and code
> generation phases of compilation may assume that it is unreachable code.
> """  -- http://www.digitalmars.com/d/expression.html#AssertExpression
> 
> Nowhere does it say it that it is "unconditional regardless of debug/release flags".
> 
> This page talks about asserts but doesn't mention debug/release behavior
> either.
> http://www.digitalmars.com/d/dbc.html
> 
> Actually I can't find where disabling asserts for release builds is documented, though I'm sure I've seen it somewhere before.  A quick test, however, reveals that you are indeed correct: "regular" asserts turn off in a release build, but assert(0) does not.
> 
> Well that's good to know.  Still, it generates a runtime error, when you
> really would prefer a compile-time error.   So I think it's a bad idea
> to start putting in lots of
>    opFoo() { assert(0,"This method doesn't exist"); }
> Once you start putting those in, where do you stop?  I'm sure there are
> other operators and functions that Set does not implement.
> 
> It's almost like comments along the lines of:
>    // this is an integer that does the counting:
>    int counter;
> 
> The fact that the method is not there is all the documentation you need.
> 
> But if you really want to do put such things in your code you could try
> this:
> struct Set(T) {
>     ...
>     void opApplyReverse()() {static assert(0,"don't be silly");}
> }
> 
> Since foo is a template member, no attempt is made to compile it unless it's actually called from somewhere.  If you do try to call it then you trip the static assert at compile-time.

No, if you use a static assert, the compilation will fail as soon as you instantiate the class somewhere. Not if you try to use foreach_reverse. If it weren't a template class, the compiler would throw the error in any case.

You're right that a compiler error would be preferable to a runtime error, but simply leaving the function out doesn't give as clear an error message ("Error: cannot infer type for e"). It would be ideal if I could have both.

But well, I don't understand why unittests are done at runtime either.

>>> Finally
>>>    Set!(T[0]) set(T...)(T elements)
>>> why not just
>>>    Set!(T) set(T)(T elements...) ??
>>
>> You only use that syntax if T is an aggregate type. So the resulting set would actually be Set!(T[0]) anyway. I think I tried this before and had some problems, but I'm not certain now.
> 
> I don't follow what you mean about "aggregate type".  The unittest uses int as the type.  I don't think 'int' is considered an aggregate.

Well, That's what the 'void func(T name...)' syntax is for. If T is an aggregate type (array, class, struct, etc.) you can either supply that type to the func, or all of its members separated by comma's.

I suspect that you meant this one: 'void func(T name, ...)', which is something different. I haven't tried it, and I will.

> Another thing you might want to add to your set class is opCat (~) and opCatAssign (~=).  Those are pretty commonly used in D for appending something to something else.  They'd just be synonyms for add and union.

Good idea. I'll add them.

-- 
Michiel
February 26, 2007
Michiel wrote:
> Bill Baxter wrote:
> 
[snip suggestion about counters in set-foreach]
> I feel that if I added that syntax, it would imply that the counter has
> some sort of relevance to the set, which it really doesn't.
> 
> A counter that increments by one each iteration has real meaning for an
> array. So does a key-type for an associative array. But for a set...
> sorry, I'd just feel dirty adding it.

I agree with you here, sets don't have indexes for their elements, they are unordered.

>> But if you really want to do put such things in your code you could try
>> this:
>> struct Set(T) {
>>     ...
>>     void opApplyReverse()() {static assert(0,"don't be silly");}
>> }
>>
>> Since foo is a template member, no attempt is made to compile it unless
>> it's actually called from somewhere.  If you do try to call it then you
>> trip the static assert at compile-time.
> 
> No, if you use a static assert, the compilation will fail as soon as you
> instantiate the class somewhere. Not if you try to use foreach_reverse.
> If it weren't a template class, the compiler would throw the error in
> any case.

The _member_ is _also_ a template here (with 0 parameters, but a template nonetheless). So compilation will only fail if that template is instantiated, which will happen only if foreach_reverse is used (or it's explicitly called, of course).

Here, let me show you (some modifications made to above code)
---
urxae@urxae:~/tmp$ cat test.d
import std.stdio;

struct Set(T) {
    void opApplyReverse()(int delegate(inout T)) { static assert(0,"don't be silly");}
}

void main()
{
    Set!(int) x;
    version(ForeachReverse)
        foreach_reverse(int e; x) {}
}
urxae@urxae:~/tmp$ dmd test.d && ./test
gcc test.o -o test -m32 -lphobos -lpthread -lm -Xlinker -L/home/urxae/opt/dmd/lib
urxae@urxae:~/tmp$ # Note: no error above
urxae@urxae:~/tmp$ dmd -version=ForeachReverse test.d
test.d(4): static assert  (0) is false, "don't be silly"
---

The static assert is only triggered with -version=ForeachReverse.

> You're right that a compiler error would be preferable to a runtime
> error, but simply leaving the function out doesn't give as clear an
> error message ("Error: cannot infer type for e"). It would be ideal if I
> could have both.

No, "cannot infer type for e" is indeed a bit unclear, it doesn't explain *why* it happened.

> But well, I don't understand why unittests are done at runtime either.

Yeah, there's been some discussion about that. A lot of people would like them to be ran after compiling & linking, but so far no change has been made to allow this to be done.

>>>> Finally
>>>>    Set!(T[0]) set(T...)(T elements)
>>>> why not just
>>>>    Set!(T) set(T)(T elements...) ??
>>> You only use that syntax if T is an aggregate type. So the resulting set
>>> would actually be Set!(T[0]) anyway. I think I tried this before and had
>>> some problems, but I'm not certain now.
>> I don't follow what you mean about "aggregate type".  The unittest uses
>> int as the type.  I don't think 'int' is considered an aggregate.
> 
> Well, That's what the 'void func(T name...)' syntax is for. If T is an
> aggregate type (array, class, struct, etc.) you can either supply that
> type to the func, or all of its members separated by comma's.
> 
> I suspect that you meant this one: 'void func(T name, ...)', which is
> something different. I haven't tried it, and I will.

I suspect he may have meant the "void func(T)(T[] name...)" syntax (or in this case, "Set!(T) set(T)(T[] elements...)"). That would let the user pass as many arguments as he wants, while presenting them as a nice typesafe array to the library.
By the way, "void func(int name...)" is also perfectly valid syntax but only allows one argument to be specified.
This is all explained in the section "Typesafe Variadic Functions" of http://www.digitalmars.com/d/function.html.
February 26, 2007
Michiel wrote:
> I have written a new class to represent sets, with several useful
> features. Uploaded for your review and free use. I don't know much about
> open source and licensing, but I request that if you use the class, that
> you keep my name in the author section of the documentation.
> 
> http://files.michiel.eu/set.d
> 
> Tell me what you think.
> 

excelent. I somewhat missed some subset operations. If it were up to me, I'd use the opCmp operator for this (just like they do in python).
February 26, 2007
Stephan Diehl wrote:

>> I have written a new class to represent sets, with several useful features. Uploaded for your review and free use. I don't know much about open source and licensing, but I request that if you use the class, that you keep my name in the author section of the documentation.
>>
>> http://files.michiel.eu/set.d
>>
>> Tell me what you think.
>>
> 
> excelent. I somewhat missed some subset operations. If it were up to me, I'd use the opCmp operator for this (just like they do in python).

You're right! I missed the subset operations. :)

Unfortunately the opCmp operator is taken. Implementing that correctly allows a set to be a key-type for an associative array (or the type of another set).

I can't make it double as a subset operation, because sets would have to be declared equal if one is not subset of the other. Even if they are not, indeed, equal. This would break their proper definition.

For a moment I was tempted to use the shifting operators, because
they're only one character apart. But I don't think I like it. They
already have a non-shifter use (right, Walter? :)). A third would be
confusing. Besides, <<= (which would mean: non-strict subset) is
recognized as an op= (or: opAssign) operator.

I also considered the 'in' operator for non-strict subset, but it's already used as a membership operator, and overloading it for this would also be confusing.

I will just create a function for it.

-- 
Michiel
February 26, 2007
Frits van Bommel wrote:

>>> But if you really want to do put such things in your code you could try
>>> this:
>>> struct Set(T) {
>>>     ...
>>>     void opApplyReverse()() {static assert(0,"don't be silly");}
>>> }
>>>
>>> Since foo is a template member, no attempt is made to compile it unless it's actually called from somewhere.  If you do try to call it then you trip the static assert at compile-time.
>>
>> No, if you use a static assert, the compilation will fail as soon as you instantiate the class somewhere. Not if you try to use foreach_reverse. If it weren't a template class, the compiler would throw the error in any case.
> 
> The _member_ is _also_ a template here (with 0 parameters, but a template nonetheless). So compilation will only fail if that template is instantiated, which will happen only if foreach_reverse is used (or it's explicitly called, of course).
> 
> Here, let me show you (some modifications made to above code)
> <SNIP>

Interesting trick. I missed the extra (). One problem, though. If I use
it like this:

foreach_reverse(e ; x) { }

Not specifying the element type (int). I get the "cannot infer type" error, instead of our custom error message. Why can't the compiler infer the type? Isn't it right in its face?

> I suspect he may have meant the "void func(T)(T[] name...)" syntax (or
> in this case, "Set!(T) set(T)(T[] elements...)"). That would let the
> user pass as many arguments as he wants, while presenting them as a nice
> typesafe array to the library.

I did try some variations of that. I couldn't make it work quite the way I wanted.

-- 
Michiel
February 26, 2007
Michiel wrote:
> Stephan Diehl wrote:
> 
>>> I have written a new class to represent sets, with several useful
>>> features. Uploaded for your review and free use. I don't know much about
>>> open source and licensing, but I request that if you use the class, that
>>> you keep my name in the author section of the documentation.
>>>
>>> http://files.michiel.eu/set.d
>>>
>>> Tell me what you think.
>>>
>> excelent. I somewhat missed some subset operations. If it were up to me,
>> I'd use the opCmp operator for this (just like they do in python).
> 
> You're right! I missed the subset operations. :)
> 
> Unfortunately the opCmp operator is taken. Implementing that correctly
> allows a set to be a key-type for an associative array (or the type of
> another set).
> 
> I can't make it double as a subset operation, because sets would have to
> be declared equal if one is not subset of the other. Even if they are
> not, indeed, equal. This would break their proper definition.
> 
> For a moment I was tempted to use the shifting operators, because
> they're only one character apart. But I don't think I like it. They
> already have a non-shifter use (right, Walter? :)). A third would be
> confusing. Besides, <<= (which would mean: non-strict subset) is
> recognized as an op= (or: opAssign) operator.
> 
> I also considered the 'in' operator for non-strict subset, but it's
> already used as a membership operator, and overloading it for this would
> also be confusing.
> 
> I will just create a function for it.
> 

the python people have 'set' and 'frozenset' (http://docs.python.org/lib/types-set.html). The main difference between these two is basicly, that a 'set' can't be the key to a dict (and therefore not being a member of a set. To make it short: every member of a set must be immutable, or, to be more technical, must always produce the same hash value. This really makes sense, if one thinks about it a bit (and one wants to model sets in the more mathematical sense)
« First   ‹ Prev
1 2