Translating Modula2 into D: variant records, pointer
January 09, 2007
Translating Modula2 into D: Subject: variant records,pointer

Hi, I have some Modula2 code which I would like to translate into D.
I often have to use something like this:

MODULE snippet

  FROM SYSTEM IMPORT TSIZE;
  FROM Heap IMPORT ALLOCATE;

  TYPE  NodePtr = POINTER TO Node;
        HeadPtr = POINTER TO Header;

        (* variant record *)
        Node = RECORD suc, alt: NodePtr;
         CASE terminal: BOOLEAN OF
           TRUE:  tsym: CHAR |
           FALSE: nSym: HeadPtr
         END
       END;

       Header = RECORD sym: CHAR;
                  entry: NodePtr;
                  suc: HeadPtr
                END;


  VAR list, sentinel: HeadPtr

  (* JUST A PROCEDURE FRAGMENT *)
  PROCEDURE Find(s: CHAR; VAR h: HeadPtr);
    VAR h1: HeadPtr;
  BEGIN
    h1 := list;
    sentinel^.sym := s;

    ALLOCATE(sentinel, TSIZE(Header));

    h := h1;
    (* etc. *)

  END
So, how should I implement this snippet in D?
Many thanks in advance! Bjoern
January 09, 2007
"BLS" <Killing_Zoe@web.de> wrote in message news:eo04h3$1us4$1@digitaldaemon.com...
> Translating Modula2 into D: Subject: variant records,pointer
>
> Hi I have some Modula2 code which I would like to translate in D.
> I often have to use something like that :
> ...
> So: How do I have to implement this snippet in D ?
> Many thanks !!!! in advance; Bjoern

Something like:

module snippet;

struct Node
{
    Node* suc;
    Node* alt;
    bool terminal;

    union
    {
        char tsym;
        Header* nSym;
    }
}

struct Header
{
    char sym;
    Node* entry;
    Header* suc;
}

Header* list;
Header* sentinel;

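void Find(char s, ref Header* h)  // 'ref' is the current spelling of D1's 'inout'
{
    Header* h1 = list;

    // Allocate before writing through the pointer;
    // 'new' replaces ALLOCATE/TSIZE.
    sentinel = new Header;
    sentinel.sym = s;

    h = h1;
    // etc.
}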

One thing I'm not really sure about is the variant record.  The closest thing in D, I think, is a union, but nothing prevents you from accessing either tsym or nSym in Node at any time.  I guess in Modula2, if 'terminal' is true, you can only access tsym, and otherwise you can only access nSym?
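
To make the union point concrete, here is a minimal sketch, assuming a current D2 compiler (the thread itself predates D2). The one-field Header is a stand-in for illustration only. It shows that both members share storage and that the compiler accepts a read of the "wrong" member without complaint.

```d
struct Header
{
    char sym;   // stand-in; the Header in this thread has more fields
}

struct Node
{
    bool terminal;

    union
    {
        char tsym;
        Header* nSym;
    }
}

// Both union members start at the same offset inside Node.
static assert(Node.tsym.offsetof == Node.nSym.offsetof);

// Try with: dmd -unittest -main -run snippet.d
unittest
{
    Node n;
    n.terminal = true;
    n.tsym = 'a';

    // Accepted by the compiler even though the tag says tsym is the
    // live member; the pointer value read here is meaningless and
    // must not be dereferenced.
    Header* bogus = n.nSym;

    assert(n.tsym == 'a');
}
```

Nothing ties 'terminal' to the union here; keeping the two in sync is entirely up to the calling code.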


January 09, 2007
Jarrett Billingsley wrote:
> One thing I'm not real sure on is the variant record.  The closest thing I think in D is a union, but nothing prevents you from accessing either tsym or nSym in Node at any time.  I guess in Modula2, if 'terminal' is true, you can only access tsym, and otherwise you can only access nSym? 

I think you can mostly fake this in D using private (differently-named) union members and property methods that assert if 'terminal' has the wrong value.
You can even make it so that the original names directly alias the private names in a non-debug build, and only use the checking version in a debug build[1].

Limitations:
* You can still access the private members (under the different name) from within the same module.
* You can't use operators with side-effects on the members (except for assignment of course). So no ++, +=, --, -=, ~=, etc.


[1]: Just make sure the code compiles in debug builds if you do this, because of the second limitation above (since it doesn't apply to the suggested non-debug version).
January 09, 2007
Hi Frits,
> I think you can mostly fake this in D using private (differently-named) union members and property methods that assert if 'terminal' has the wrong value.
> You can even make it so that the original names directly alias the private names in a non-debug build, and only use the checking version in a debug build[1].
Can you please offer an example based on ...
Modula

TYPE  NodePtr = POINTER TO Node;
        Node = RECORD suc, alt: NodePtr;
         CASE terminal: BOOLEAN OF
           TRUE:  tsym: CHAR |
           FALSE: nSym: HeadPtr
         END
       END;


----------------------------------
Do you mean something similar .... ?

Pascal

TYPE
  r2_rec = RECORD CASE INTEGER OF
  1:
    (e1: INTEGER;
     e2: INTEGER);
  2:
    (e3: REAL);
  END;


C/C++

typedef union {
  struct {
    int e1;
    int e2;
  } v1;
  float e3;
} R2Rec;

... or am I completely wrong?
Bjoern
January 09, 2007
Many thanks, this info gets me going!

> I guess in Modula2, if 'terminal' is true, you
> can only access tsym, and otherwise you can only access nSym?

No. Nothing prevents you from making a mistake. (In fact, variant records are a pretty rich source of bugs.) It would be nice if D could do this better!
Bjoern
January 09, 2007
BLS wrote:
> Hi Frits,
>> I think you can mostly fake this in D using private (differently-named) union members and property methods that assert if 'terminal' has the wrong value.
>> You can even make it so that the original names directly alias the private names in a non-debug build, and only use the checking version in a debug build[1].
> Can you please offer an example based on ...
> Modula
> 
> TYPE  NodePtr = POINTER TO Node;
>         Node = RECORD suc, alt: NodePtr;
>          CASE terminal: BOOLEAN OF
>            TRUE:  tsym: CHAR |
>            FALSE: nSym: HeadPtr
>          END
>        END;

-----
struct Header{};    // to make it compile as-is

struct Node
{
    Node* suc;
    Node* alt;
    bool terminal;

    union
    {
        private char tsym_;
        private Header* nSym_;
    }

    debug
    {
        // Property versions, with full error checking
        // (This block is used if -debug is passed to the compiler)
        // Per the Modula-2 record: terminal == true selects tsym,
        // terminal == false selects nSym.
        char tsym()
        in
        {
            assert(terminal == true);
        }
        body
        {
            return tsym_;
        }

        char tsym(char newval)
        out
        {
            assert(terminal == true);
        }
        body
        {
            terminal = true;
            return tsym_ = newval;
        }

        Header* nSym()
        in
        {
            assert(terminal == false);
        }
        body
        {
            return nSym_;
        }

        Header* nSym(Header* newval)
        out
        {
            assert(terminal == false);
        }
        body
        {
            terminal = false;
            return nSym_ = newval;
        }
    } else {
        // Hmm.. No way to automatically set 'terminal' in this
        // implementation. Damn.
        // (This block is used if -debug is NOT passed to the compiler)
        alias tsym_ tsym;
        alias nSym_ nSym;
    }
}
-----

It's unfortunately a bit wordy. It's also completely untested beyond the fact that it compiles ;).

I noticed a limitation of the optimization opportunity I mentioned:
In the debug version (with property setters) you can automatically adjust the value of 'terminal' depending on the last-set property.
This isn't possible with direct aliasing as far as I can see.
If this is a problem, you might want to always use the code in the first block. I see no reason the compiler can't inline the functions if passed the appropriate optimization flags, so it shouldn't really matter much.

Some notes:
* The first block of code (right after 'debug') is compiled in if -debug is passed to the compiler. Otherwise, the second (short) block is compiled.
* The 'in' and 'out' blocks are removed by the compiler if -release is present on the compiler command line.
* As mentioned in my previous post, tsym_ and nSym_ are accessible from code in the same module even though they are private.
* Another implementation option is to also rename 'terminal' and make it private, and then only provide a property getter function so the code using the Node type can't set it without also setting the appropriate union member.
* If you only ever set the union members right after constructing the Node instance, you may also want to think about classes and inheritance if you're OOP-inclined.
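
The fourth note above (a private tag with only a getter) might look roughly like this. It is a minimal sketch for a current D2 compiler, not tested against the original parser; the underscore names and the one-field Header are assumptions made for illustration.

```d
struct Header
{
    char sym;   // stand-in; the Header in this thread has more fields
}

struct Node
{
    Node* suc;
    Node* alt;

    // 'private' is module-wide in D, so code in this same module can
    // still reach the underscore names directly.
    private bool terminal_;

    union
    {
        private char tsym_;
        private Header* nSym_;
    }

    // Read-only view of the tag; only the setters below change it.
    bool terminal() { return terminal_; }

    char tsym()
    {
        assert(terminal_, "tsym read from a nonterminal node");
        return tsym_;
    }

    void tsym(char newval)
    {
        terminal_ = true;
        tsym_ = newval;
    }

    Header* nSym()
    {
        assert(!terminal_, "nSym read from a terminal node");
        return nSym_;
    }

    void nSym(Header* newval)
    {
        terminal_ = false;
        nSym_ = newval;
    }
}

// Try with: dmd -unittest -main -run snippet.d
unittest
{
    import core.exception : AssertError;
    import std.exception : assertThrown;

    Node n;
    n.tsym = 'a';               // also sets the tag
    assert(n.terminal);
    assert(n.tsym == 'a');

    // Reading the inactive member trips the assert
    // (asserts are compiled out with -release).
    assertThrown!AssertError(n.nSym());
}
```

Because the tag can only change through the setters, it always names the member that was written last.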
January 09, 2007
Thanks Frits,
a lot of interesting stuff for a D newbie like me.
Frits van Bommel wrote:

> * If you only ever set the union members right after constructing the Node instance, you may also want to think about classes and inheritance if you're OOP-inclined.

Indeed, the snippet is part of a table-driven (or better: data-structure-driven) general parser. So I always check 'terminal' first and then set the union values.
That was going to be my next question anyway: how do I implement this the OOP way?
 And ... can opCall() help somehow?
Thanks for being so patient with me. Bjoern
January 09, 2007
BLS wrote:
> Thanks Frits,
> a lot of interesting stuff for a D newbie like me.
> Frits van Bommel wrote:
> 
>> * If you only ever set the union members right after constructing the Node instance, you may also want to think about classes and inheritance if you're OOP-inclined.
> 
> Indeed the snippet is part of a table driven (better data-structure) general parser. So I allways check Terminal first and then I set the union values.
> It was anyway my next question : How to implement this the OOP way ?
>  and ... Can opCall() help somehow ?

OOP looks a lot cleaner:

-----
abstract class Node {
    Node suc;
    Node alt;
}

class TerminalNode : Node {
    char tsym;
}

class NonTerminalNode : Node {
    Header nSym;
}
-----

That's just the skeleton though. For starters, you'll want to either declare those members public or provide accessors ;).
Constructors would also be nice. Since you asked though, pretty much the same effect can be achieved with static opCall (one in each child class and/or two overloaded versions in Node itself) that creates a new object of the appropriate type and fills in the fields.
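
A minimal sketch of the static opCall idea, with one opCall per child class. It assumes a current D2 compiler and that Header is a class with just a 'sym' field (a stand-in for illustration); the parameter names are made up.

```d
class Header
{
    char sym;   // stand-in; the Header in this thread has more fields
}

abstract class Node
{
    Node suc;
    Node alt;
}

class TerminalNode : Node
{
    char tsym;

    // Lets callers write TerminalNode('a') instead of 'new' plus assignment.
    static TerminalNode opCall(char sym)
    {
        auto n = new TerminalNode;
        n.tsym = sym;
        return n;
    }
}

class NonTerminalNode : Node
{
    Header nSym;

    static NonTerminalNode opCall(Header head)
    {
        auto n = new NonTerminalNode;
        n.nSym = head;
        return n;
    }
}

// Try with: dmd -unittest -main -run snippet.d
unittest
{
    auto t = TerminalNode('a');
    assert(t.tsym == 'a');

    auto h = new Header;
    auto nt = NonTerminalNode(h);
    assert(nt.nSym is h);
}
```

Ordinary constructors (this(char sym) and so on) give the same effect with 'new' at the call site; which one reads better is mostly a matter of taste.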

To check whether a node is a terminal or not, there are several options:
* Try to cast a node to (Non)TerminalNode. If it's not of the appropriate type, the cast returns null. This has the benefit that it also returns a usable reference if it _is_ of the appropriate type, so that you can access its special members.
* Add 'abstract bool isTerminal()' to Node and override it in the subclasses.
* Add a 'const bool isTerminal' field to Node and initialize it in the constructor from a parameter.
* Add 'TerminalNode asTerminal() { return null; }' to Node and override it in TerminalNode to read 'return this;'. (Do something similar with NonTerminalNode)
* {{ Probably some others I can't think of right now }}
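
Three of the options listed above (the cast, isTerminal() and asTerminal()) can be sketched together as follows. It assumes a current D2 compiler; the constructors, the describe() helper and the one-field Header class are hypothetical additions for illustration.

```d
class Header
{
    char sym;   // stand-in; the Header in this thread has more fields
}

abstract class Node
{
    Node suc;
    Node alt;

    abstract bool isTerminal();

    // Default: not a terminal. TerminalNode overrides this.
    TerminalNode asTerminal() { return null; }
}

class TerminalNode : Node
{
    char tsym;

    this(char sym) { tsym = sym; }

    override bool isTerminal() { return true; }
    override TerminalNode asTerminal() { return this; }
}

class NonTerminalNode : Node
{
    Header nSym;

    this(Header head) { nSym = head; }

    override bool isTerminal() { return false; }
}

// Cast-based check: the cast yields null for the wrong type and a
// usable reference for the right one.
string describe(Node n)
{
    if (auto t = cast(TerminalNode) n)
        return "terminal " ~ t.tsym;
    if (auto nt = cast(NonTerminalNode) n)
        return "nonterminal " ~ nt.nSym.sym;
    return "unknown";
}

// Try with: dmd -unittest -main -run snippet.d
unittest
{
    auto h = new Header;
    h.sym = 'S';

    Node a = new TerminalNode('x');
    Node b = new NonTerminalNode(h);

    assert(describe(a) == "terminal x");
    assert(describe(b) == "nonterminal S");

    assert(a.isTerminal && !b.isTerminal);
    assert(a.asTerminal is a);
    assert(b.asTerminal is null);
}
```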
January 09, 2007
"BLS" <Killing_Zoe@web.de> wrote in message news:eo0ih9$2odp$1@digitaldaemon.com...
> Many thanks, this info will give me a Go!
>
>  I guess in Modula2, if 'terminal' is true, you
>> can only access tsym, and otherwise you can only access nSym?
>
> No. Nothing prevents you from making mistake.(In fact variant records are a pretty nice source for bugs)  Would be nice if D can do this better! Bjoern

Hmm.. what's that 'terminal' member for then, anyway?


January 10, 2007
Jarrett Billingsley wrote:
> "BLS" <Killing_Zoe@web.de> wrote in message news:eo0ih9$2odp$1@digitaldaemon.com...
> 
>>Many thanks, this info will give me a Go!
>>
>> I guess in Modula2, if 'terminal' is true, you
>>
>>>can only access tsym, and otherwise you can only access nSym?
>>
>>No. Nothing prevents you from making mistake.(In fact variant records are a pretty nice source for bugs)  Would be nice if D can do this better!
>>Bjoern
> 
> 
> Hmm.. what's that 'terminal' member for then, anyway? 
> 
> 
The official language definition (quoting N. Wirth, in my interpretation) says that Mr. Compiler has to take care of it. The real-world implementations are another story.
Bjoern, and thanks again, man!