Jump to page: 1 2
Thread overview
imports and a data structure (any critique welcome)
Dec 27, 2013
Jonathan
Dec 27, 2013
Timon Gehr
Dec 27, 2013
bearophile
Dec 27, 2013
Timon Gehr
Dec 27, 2013
Jonathan
Dec 27, 2013
bearophile
Dec 27, 2013
Timon Gehr
Dec 27, 2013
bearophile
Dec 27, 2013
Marco Leise
Dec 27, 2013
Timon Gehr
Dec 27, 2013
bearophile
Dec 27, 2013
Jonathan
Dec 27, 2013
Timon Gehr
Dec 27, 2013
John Colvin
December 27, 2013
I come from Haskell, so please excuse any functional programming idiosyncracies I have :)

In Haskell, I have a datatype for representing so called terms which looks like:

    data Term = Var Char | Op Char [Term]

i.e. a Term is a variable (denoted by an Int) or an operation (whose name is also denoted by an int), applied to a list of its arguments.  For example,

    f(g(x,y),a(),x) is represented by Op 'f' [Op 'g' [Var 'x',Var 'y'],Op 'a' [], Var 'x]

Now, the reason I am writing in D is twofold.  First, I want to learn the D language, and second I really need pointers to reduce certain complexities in the operations (Haskell does not have observable sharing).

Okay, so the first thing I need to do is represent this as a data structure in D.  My choices are to represent the structure as a class or as a struct.  I choose struct because it is simpler (in terms of what is going on), and because the entire program takes one struct and makes changes to it.

Here is my first attempt at the datastructure.

struct Term {
	enum NodeType
	{
		Var, Op
	}
	NodeType node;
	union TermBody
	{
		int VarName;
		struct OpSymbol
		{
			int ConsName;	
			Term*[] Subterms;
		}
		OpSymbol terms;
	}
	TermBody term;
	this(int nodeValue)
	{
		node = NodeType.Var;
		term.VarName = nodeValue;
	}
	this(int OpSymb,Term*[] otherTerms)
	{
		node = NodeType.Op;
		term.terms.ConsName = OpSymb;
		term.terms.Subterms = otherTerms;
	}
}

I strongly encourage you guys to try to destroy my datastructure and critique anything and everything about it!!

But for now I will move on.  As a sanity check, I want to make sure I can print this datastructure out.

string termToString(Term* term)
{
	if ((*term).node == Term.NodeType.Var)
	{
		return [convertNum((*term).term.VarName)];
	}
	else
	{
		string retString = [convertNum((*term).term.terms.ConsName),'('];
		retString ~= termToString((*term).term.terms.Subterms[0]);
		for(size_t i = 1; i < (*term).term.terms.Subterms.length;i++)
		{
			retString ~= ',';
			retString ~= termToString((*term).term.terms.Subterms[i]);
		}
		retString ~= ')';
		return retString;
	}
}

For now, convertNum is rather silly,

char convertNum(int x){
	switch(x)
	{
		case 1: return 'x';
		case 2: return 'y';
		case 3: return 'z';
		case 4: return 'f';
		case 5: return 'g';
		case 6: return 'h';
		default: return 'e';
	}
}


Okay, here is some code to produce a simple example of a datastructure:

Term makeExTerm(){
	Term var1 = Term(1);//x
	Term var2 = Term(2);//y
	Term var3 = Term(3);//z
	Term fxx = Term(4, [&var1,&var1]);//f(x,x)
	Term gfxxy = Term(5, [&fxx,&var2]);//g(f(x,x),y)
	Term hzg = Term(6, [&gfxxy,&gfxxy]);//h(g(f(x,x),y),g(f(x,x),y))
	return hzg;
}

Using the above, I wrote a test program,

void main(string[] args)
{
	Term exTerm = makeExTerm();
	string printable = termToString(&exTerm);
	writeln(printable);
	Term var = Term(219);
	(*exTerm.term.terms.Subterms[0]).term.terms.Subterms[0] = &var;
        // it is now  (and should be) h(g(e,y),g(e,y))
	string printable2 = termToString(&exTerm);
	writeln(printable2);
}

If I put this in a file, compile it (with ldc or gdc), and run it, everything works just superbly.

However, if I extract everything except main to a different file and import that file, compilation passes, but running gives a segmentation fault.  I ran the program in a debugger, and the first line of main passes no problem, we are able to construct the term.  I checked the memory, and everything seems to work just fine.  However, when I step into the second line of main, the call to termToString, I get the segfault.

Can anyone tell me what is going on?
December 27, 2013
On 12/27/2013 01:40 AM, Jonathan wrote:
>
> Term makeExTerm(){
>      Term var1 = Term(1);//x
>      Term var2 = Term(2);//y
>      Term var3 = Term(3);//z
>      Term fxx = Term(4, [&var1,&var1]);//f(x,x)
>      Term gfxxy = Term(5, [&fxx,&var2]);//g(f(x,x),y)
>      Term hzg = Term(6, [&gfxxy,&gfxxy]);//h(g(f(x,x),y),g(f(x,x),y))
>      return hzg;
> }
>
> Using the above, I wrote a test program,
>...
>
> Can anyone tell me what is going on?

You are taking the address of stack-allocated variables. [1]
Accessing the terms after the function has returned results in undefined behaviour. In the case where it worked you just were lucky.

You may allocate your terms on the heap using 'new':
---
Term* makeExTerm(){
    auto var1 = new Term(1);//x
    auto var2 = new Term(2);//y
    auto var3 = new Term(3);//z
    auto fxx = new Term(4, [var1,var1]);//f(x,x)
    auto gfxxy = new Term(5, [fxx,var2]);//g(f(x,x),y)
    auto hzg = new Term(6, [gfxxy,gfxxy]);//h(g(f(x,x),y),g(f(x,x),y))
    return hzg;
}

void main(string[] args){
    auto exTerm = makeExTerm();
    string printable = termToString(exTerm);
    writeln(printable);
    exTerm.term.terms.Subterms[0].term.terms.Subterms[0] = new Term(219);
    // it is now  (and should be) h(g(e,y),g(e,y))
    string printable2 = termToString(exTerm);
    writeln(printable2);
}
---

prints:
---
h(g(f(x,x),y),g(f(x,x),y))
h(g(e,y),g(e,y))
---

[1] This would be disallowed in the memory safe subset. You might want to put '@safe:' on top of your files.
December 27, 2013
Jonathan:

> I come from Haskell, so please excuse any functional programming idiosyncracies I have :)

Welcome. You are the first Haskell programmer that I know willing to learn and try D :-)


> In Haskell, I have a datatype for representing so called terms which looks like:
>
>     data Term = Var Char | Op Char [Term]
>
> i.e. a Term is a variable (denoted by an Int) or an operation (whose name is also denoted by an int), applied to a list of its arguments.  For example,
>
>     f(g(x,y),a(),x) is represented by Op 'f' [Op 'g' [Var 'x',Var 'y'],Op 'a' [], Var 'x]

In almost-theory D should allow code like:


import std.stdio, std.typecons, std.variant;

alias Term = Algebraic!(string, Tuple!(string, This[]));

void main() {
    const expr = Term(tuple("f", [Term(tuple("g", ["x".Term, "y".Term])), Term(tuple("a", [])), "x".Term]));
}


In practice std.variant module needs a This defined as "struct This {}", plus another improvement to support recursive types better, so that code doesn't work.

That code also can't work because the second type of the "tuple("a", [])" literal is void[], so it's not compatible.

I am not sure a good enough library-defined Algebraic type can be defined.


> Okay, so the first thing I need to do is represent this as a data structure in D.  My choices are to represent the structure as a class or as a struct.  I choose struct because it is simpler (in terms of what is going on), and because the entire program takes one struct and makes changes to it.

Raw pointers are like razors, they can be useful, but they should must also handled with a good amount of discipline and safeties added by you and your experience, otherwise you cut yourself often.

For a first version of your code you can try to use class instances instead, that are easer to handle (especially in garbage-collected language. But don't forget to add to functions all the pre-conditions to assert the class references are not null).

I see tons of things that can be improved in your code. Like the switch, "(*term).node", the lack of immutable/const/pure/nothrow (you should be able to write good enough functional code in D), but better to think about the larger things first.

Bye,
bearophile
December 27, 2013
On 12/27/2013 02:49 AM, bearophile wrote:
>
> In almost-theory D should allow code like:
>
>
> import std.stdio, std.typecons, std.variant;
>
> alias Term = Algebraic!(string, Tuple!(string, This[]));
>
> void main() {
>      const expr = Term(tuple("f", [Term(tuple("g", ["x".Term,
> "y".Term])), Term(tuple("a", [])), "x".Term]));
> }
>
>
> In practice std.variant module needs a This defined as "struct This {}",
> plus another improvement to support recursive types better, so that code
> doesn't work.
>
> That code also can't work because the second type of the "tuple("a",
> [])" literal is void[], so it's not compatible.
>
> I am not sure a good enough library-defined Algebraic type can be defined.

mixin ADT!q{ Term: Var char | Op char Term[] };

void main(){
    const expr = Op('f', [Op('g',[Var('x'), Var('y')]), Op('a',[]), Var('x')]);
}
December 27, 2013
Thanks to everyone that has replied.

@Timon Gehr:
Thank you.  In reply to [1]: that is an interesting issue that I don't really understand right now.  Can you explain exactly where I invoke undefined behavior?  It was my understanding that structs are passed and returned by value, and the assignment
  x = some struct
makes a copy of some struct and stores it in x.  Perhaps I don't understand the type system of D?  If a function returns T where T is some struct, does the function result in a struct?  I am having trouble finding these subtleties in the documentation (but I also acknowledge that this is probably my fault and not the documentation's).

Does auto do a simple type inference to find out that Term* should be there?  Is there an advantage to using auto?

I can't seem to put @safe: at the top of the file -- is it the same as compiling with -safe.  Is there a way to add this as a source annotation?

Is ! an operator in e.g. ADT!q ...

What is the mixin bit doing?  Can you point me towards any tutorials, or should I buy the book?

@bearophile:
Thank you for the welcome.  That is odd; I find that D shares a philosophy closer to Haskell than many of the "lower level" languages like C.

So let me just say, I really do need to build a structure that can have a shared reference to a substructure.  I will be rewriting these terms, and the problem is that e.g.

    f(t) -> g(t,t)

I don't want to copy t, but store a reference to t, so that if t can be re-written, it will only be rewritten once.  However, this is really *not* an algebraic type since sharing is not algebraic, so I figured that this would not be a good solution.

The object oriented approach would be really nice.  Since objects are passed by reference, I should get sharing for free.  The only reason I didn't use an object oriented approach off the bat is that I don't think I completely understand when to use a struct versus an object.  Using structs seem to fit the story better, because the structure isn't really emitting any functionality, I just want to perform some functions to it.  So using structs feel like a better fit.  Feel free to correct me here.  What are some of the drawbacks to using object.  How much more time overhead is there once the object is created?

You are right, I probably look like an idiomless noob!  I totally am!  Hopefully an understanding of the appropriate keywords to use and where will come (there is quite a lot of syntax in D).  For now, can you tell me what I could do better about the switch statement and what is wrong with (*term).node?
December 27, 2013
Jonathan:

> Thank you.  In reply to [1]: that is an interesting issue that I don't really understand right now.  Can you explain exactly where I invoke undefined behavior?

It can be shown with this function:

int*[] foo() {
    int x;
    int*[] arr = [&x];
    return arr;
}


Here the value 'x' is allocated on the stack, in the stack frame of the function foo(). arr is a heap-allocated array, so you can return it safely from foo. But arr contains a pointer to x, that points still to the stack frame of foo(). When foo() ends, the stack space used by the stack frame of foo gets reused for other purposes, so now the pointer inside the array arr points to data unrelated to the original x, essentially garbage.


> It was my understanding that structs are passed and returned by value, and the assignment
>   x = some struct
> makes a copy of some struct and stores it in x.

This is right. Bit the problem is different.


> Does auto do a simple type inference to find out that Term* should be there?

Right.


> Is there an advantage to using auto?

It makes writing the code faster, the code less cluttered, and sometimes the type is "obvious" for the person that reads the code. Just don't abuse it.


> I can't seem to put @safe: at the top of the file -- is it the same as compiling with -safe.  Is there a way to add this as a source annotation?

There is a writeln in the main, so you have to use:

void main(string[] args) @system {

But still the code doesn't compile:

temp.d(30): Error: field TermBody.terms cannot be accessed in @safe code because it overlaps with a pointer
...

So @safe can't be used here.


> Is ! an operator in e.g. ADT!q ...

! is the "not" operator, but there it's used to instantiate a template.


> What is the mixin bit doing?

It probably mixes-in (statically) some code generated at compile-time.


> Can you point me towards any tutorials, or should I buy the book?

The book is useful if you want to get serious about learning D. Otherwise the online docs, Rosettacode and the book by a person that posts often in D.learn could suffice.


> That is odd; I find that D shares a philosophy closer to Haskell than many of the "lower level" languages like C.

In D you can write functional-style code, look at Rosettacode D entries for many examples. (Also regarding your first post, D1 language used to have literary programming built-in, as GHC-Haskell, but this feature was dropped in D2).


> However, this is really *not* an algebraic type since sharing
> is not algebraic,

If you share something that is immutable, and you assure there are no null pointers, then the fact that is shared is transparent, so why is it not algebraic?


>  The only reason I didn't use an object oriented approach off the bat is that I don't think I completely understand when to use a struct versus an object.  Using structs seem to fit the story better, because the structure isn't really emitting any functionality, I just want to perform some functions to it.  So using structs feel like a better fit.  Feel free to correct me here.  What are some of the drawbacks to using object.  How much more time overhead is there once the object is created?

Class instances have some overhead, +2 words of memory, plus a bit of time to initialize those fields. Class instances allocated on the heap are a bit simpler to use compared to structs handled by pointers.


> You are right, I probably look like an idiomless noob!  I totally am!

Don't worry, you are going to learn.


> (there is quite a lot of syntax in D).

D is a system language designed to be simpler to use, and it tries to be safer, and it tries to have many of the most useful tools. All this leads to a complex language. And the syntax reflects the varied semantics of all the things it contains. It's not just syntax sugar of the same bits of lambda calculus.


> For now, can you tell me what I could do better about the switch statement

That switch can become an array. Or you can use enumerations with "final switch", and so on.


> and what is wrong with (*term).node?

D is a bit smarter than C, so "term.node" suffices.

Bye,
bearophile
December 27, 2013
On 12/27/2013 05:54 AM, Jonathan wrote:
> Thanks to everyone that has replied.
>
> @Timon Gehr:
> Thank you.  In reply to [1]: that is an interesting issue that I don't
> really understand right now.  Can you explain exactly where I invoke
> undefined behavior?  It was my understanding that structs are passed and
> returned by value, and the assignment
>    x = some struct
> makes a copy of some struct and stores it in x.  Perhaps I don't
> understand the type system of D?

The original code takes the address of variables that are allocated on the stack. Function-local variables (fortunately/unfortunately) have limited lifetime unless they are closed over. If their address is dereferenced after they have gone out of scope, you are accessing locations on the stack that may have been populated with new data of different type in the meantime. Eg. any usage of the '&' operator in the original code is wrong, because the addresses are part of the structure that the function returns.

> If a function returns T where T is
> some struct, does the function result in a struct?  I am having trouble
> finding these subtleties in the documentation (but I also acknowledge
> that this is probably my fault and not the documentation's).
> ...

The documentation can indeed be a little sparse. This may help you:
http://ddili.org/ders/d.en/index.html
(I have not read a lot of it, but it appears to put special focus on subtleties as it is targeted also at programming novices.)

> Does auto do a simple type inference to find out that Term* should be
> there?

No, auto is just a place holder used to show that what follows is a declaration. What is important to invoke type deduction is that there is no type specified. (It indeed is very simple, all it does is copying the type of the initializer.)

> Is there an advantage to using auto?
> ...

Less redundancy. Eg. I would have been able to remove the undefined behaviour faster if type deduction was already in use.

> I can't seem to put @safe: at the top of the file

Yes, I missed the union. (It is not generally a memory safe construct.)

> -- is it the same as compiling with -safe.

I think this compiler switch no longer exists.

> Is there a way to add this as a source annotation?
> ...

Yes, @safe: at the top of the file is the source annotation.

> Is ! an operator in e.g. ADT!q ...
> ...

It is notation for template instantiation.
http://dlang.org/template.html
http://ddili.org/ders/d.en/templates.html
http://ddili.org/ders/d.en/templates_more.html

> What is the mixin bit doing?

It imports the declarations within the template body into the current scope.

> Can you point me towards any tutorials, or  should I buy the book?
> ...

http://ddili.org/ders/d.en/mixin.html
http://dlang.org/template-mixin.html
I don't think the book discusses mixins in depth.

> @bearophile:
> Thank you for the welcome.  That is odd;

Maybe he just does not know a lot of Haskell programmers. :o)

> I find that D shares a
> philosophy closer to Haskell than many of the "lower level" languages
> like C.
> ...

I guess some missing type system features can turn off Haskell programmers. Most importantly, the type system is monomorphic. (Templates cover some use cases of a polymorphic type system.)

> So let me just say, I really do need to build a structure that can have
> a shared reference to a substructure.  I will be rewriting these terms,
> and the problem is that e.g.
>
>      f(t) -> g(t,t)
>
> I don't want to copy t, but store a reference to t, so that if t can be
> re-written, it will only be rewritten once.  However, this is really
> *not* an algebraic type since sharing is not algebraic, so I figured
> that this would not be a good solution.
>
> The object oriented approach would be really nice.  Since objects are
> passed by reference, I should get sharing for free.  The only reason I
> didn't use an object oriented approach off the bat is that I don't think
> I completely understand when to use a struct versus an object.

Objects are often used for convenient virtual function tables, for reference semantics, and sometimes to emulate closed sum types in a memory safe fashion. structs are typically used for more manual control and for implementing types with (part-wise) value semantics.

> Using
> structs seem to fit the story better, because the structure isn't really
> emitting any functionality, I just want to perform some functions to
> it.

Some might claim that the structure emits visitor functionality. :o)

> So using structs feel like a better fit.  Feel free to correct me
> here.  What are some of the drawbacks to using object.

They may waste some memory.

> How much more time overhead is there once the object is created?
> ...

Virtual function calls have some overhead compared to static function calls, but it is less bad on x86/x64 than on some other architectures. How performance-critical is your application?

> You are right, I probably look like an idiomless noob!  I totally am!
> Hopefully an understanding of the appropriate keywords to use and where
> will come (there is quite a lot of syntax in D). For now, can you tell
> me what I could do better about the switch statement

I don't know what he was targeting at, but convertNum could be written more compactly as:
char convertNum(int x){
    return "exyzfgh"[x<$?x:0];
}

> and what is wrong with (*term).node?

The explicit dereference is redundant. term.node suffices.
December 27, 2013
Timon Gehr:

> mixin ADT!q{ Term: Var char | Op char Term[] };
>
> void main(){
>     const expr = Op('f', [Op('g',[Var('x'), Var('y')]), Op('a',[]), Var('x')]);
> }

Where is ADT defined? (And why isn't it a Phobos patch on GitHub?)


> convertNum could be written more compactly as:
> char convertNum(int x){
>     return "exyzfgh"[x<$?x:0];

That's too much compact :-) Humans put a space around operators, and sometimes some parentheses don't hurt:

char convertNum(in int x) pure nothrow
in {
    assert(x >= 0);
} body {
    return "exyzfgh"[(x < $) ? x : 0];
}

Bye,
bearophile
December 27, 2013
Am Fri, 27 Dec 2013 13:45:10 +0000
schrieb "bearophile" <bearophileHUGS@lycos.com>:

> Timon Gehr:
> 
> > mixin ADT!q{ Term: Var char | Op char Term[] };
> >
> > void main(){
> >     const expr = Op('f', [Op('g',[Var('x'), Var('y')]),
> > Op('a',[]), Var('x')]);
> > }
> 
> Where is ADT defined? (And why isn't it a Phobos patch on GitHub?)
> 
> 
> > convertNum could be written more compactly as:
> > char convertNum(int x){
> >     return "exyzfgh"[x<$?x:0];
> 
> That's too much compact :-) Humans put a space around operators, and sometimes some parentheses don't hurt:
> 
> char convertNum(in int x) pure nothrow
> in {
>      assert(x >= 0);
> } body {
>      return "exyzfgh"[(x < $) ? x : 0];
> }
> 
> Bye,
> bearophile

Before you go _that_ far, just write:

  char convertNum(in size_t x) pure nothrow
  {
      return "exyzfgh"[(x < $) ? x : 0];
  }

:-)

-- 
Marco

December 27, 2013
On 12/27/2013 02:45 PM, bearophile wrote:
> Timon Gehr:
>
>> mixin ADT!q{ Term: Var char | Op char Term[] };
>>
>> void main(){
>>     const expr = Op('f', [Op('g',[Var('x'), Var('y')]), Op('a',[]),
>> Var('x')]);
>> }
>
> Where is ADT defined?

https://d.puremagic.com/issues/show_bug.cgi?id=10431

> (And why isn't it a Phobos patch on GitHub?)
> ...

https://d.puremagic.com/issues/show_bug.cgi?id=11558

> That's too much compact :-)  Humans put a space around
> operators,

Feel free to reformat, but implying that you are somehow ethically superior and that I am not human seems to go a little far. :o)

> and sometimes some parentheses don't hurt:

They hurt there because they add noise and don't help.
« First   ‹ Prev
1 2