Need Advice: Union or Variant?
jwatson-CO-edu
November 17, 2022

I have an implementation of the "Little Scheme" educational programming language written in D, here.

It has many problems, but the one I want to solve first is the size of the "atoms" (units of data).

Atom is a struct that has fields for every possible type of data that the language supports. This means that a bool Atom unnecessarily takes up space in memory with fields for number, string, structure, etc.

Here is the definition:

```d
enum F_Type{
    CONS, // Cons pair
    STRN, // String/Symbol
    NMBR, // Number
    EROR, // Error object
    BOOL, // Boolean value
    FUNC, // Function
}

struct Atom{
    F_Type  kind; // ---------------- What kind of atom this is
    Atom*   car; // ----------------- Left  `Atom` Pointer
    Atom*   cdr; // ----------------- Right `Atom` Pointer
    double  num; // ----------------- Number value
    string  str; // ----------------- String value, D-string underlies
    bool    bul; // ----------------- Boolean value
    F_Error err = F_Error.NOVALUE; // Error code
}
```

Question:
Where do I begin my consolidation of space within Atom? Do I use unions or variants?

November 17, 2022
On Thu, Nov 17, 2022 at 08:54:46PM +0000, jwatson-CO-edu via Digitalmars-d-learn wrote: [...]
> ```d
> enum F_Type{
>     CONS, // Cons pair
>     STRN, // String/Symbol
>     NMBR, // Number
>     EROR, // Error object
>     BOOL, // Boolean value
>     FUNC, // Function
> }
> 
> struct Atom{
>     F_Type  kind; // ---------------- What kind of atom this is
>     Atom*   car; // ----------------- Left  `Atom` Pointer
>     Atom*   cdr; // ----------------- Right `Atom` Pointer
>     double  num; // ----------------- Number value
>     string  str; // ----------------- String value, D-string underlies
>     bool    bul; // ----------------- Boolean value
>     F_Error err = F_Error.NOVALUE; // Error code
> }
> 
> ```
> Question:
> **Where do I begin my consolidation of space within `Atom`?  Do I use
> unions or variants?**

In this case, since you're already keeping track of what type of data is being stored in an Atom, use a union:

	struct Atom {
		F_Type kind;
		union {		// anonymous union
			Atom*   car; // ----------------- Left  `Atom` Pointer
			Atom*   cdr; // ----------------- Right `Atom` Pointer
			double  num; // ----------------- Number value
			string  str; // ----------------- String value, D-string underlies
			bool    bul; // ----------------- Boolean value
			F_Error err = F_Error.NOVALUE; // Error code
		}
	}

Use Variant if you don't want to keep track of the type yourself.
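
To make the trade-off concrete, here is a minimal sketch of the two approaches. It uses a cut-down atom with only number, string and boolean payloads; `Kind`, `Flat`, `Tagged`, `describe` and `holdsString` are hypothetical names for illustration and are not taken from the interpreter.

```d
import std.conv : to;
import std.variant : Variant;

enum Kind { NMBR, STRN, BOOL } // hypothetical, cut-down version of F_Type

// Every field gets its own storage.
struct Flat {
    Kind   kind;
    double num;
    string str;
    bool   bul;
}

// The payload fields share storage; `kind` says which one is live.
struct Tagged {
    Kind kind;
    union {
        double num;
        string str;
        bool   bul;
    }
}

// The union is as large as its largest member, not the sum of all members.
static assert(Tagged.sizeof < Flat.sizeof);

// Read only the member that `kind` says was written last.
string describe(Tagged t) {
    final switch (t.kind) {
        case Kind.NMBR: return "number " ~ t.num.to!string;
        case Kind.STRN: return "string " ~ t.str;
        case Kind.BOOL: return t.bul ? "true" : "false";
    }
}

// With Variant, the library tracks the stored type instead of a `kind` field.
bool holdsString(Variant v) {
    return v.type == typeid(string);
}
```

With the union, nothing stops code from reading `t.str` while `kind` says `NMBR`, so keeping the tag and the payload in sync is the caller's job. `Variant` does that bookkeeping itself, at the cost of a more general (and typically larger) container.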


T

-- 
An elephant: A mouse built to government specifications. -- Robert Heinlein
November 17, 2022

On Thursday, 17 November 2022 at 20:54:46 UTC, jwatson-CO-edu wrote:

> I have an implementation of the "Little Scheme" educational programming language written in D, here.
>
> It has many problems, but the one I want to solve first is the size of the "atoms" (units of data).
>
> Atom is a struct that has fields for every possible type of data that the language supports. This means that a bool Atom unnecessarily takes up space in memory with fields for number, string, structure, etc.
>
> Here is the definition:
>
> ```d
> enum F_Type{
>     CONS, // Cons pair
>     STRN, // String/Symbol
>     NMBR, // Number
>     EROR, // Error object
>     BOOL, // Boolean value
>     FUNC, // Function
> }
>
> struct Atom{
>     F_Type  kind; // ---------------- What kind of atom this is
>     Atom*   car; // ----------------- Left  `Atom` Pointer
>     Atom*   cdr; // ----------------- Right `Atom` Pointer
>     double  num; // ----------------- Number value
>     string  str; // ----------------- String value, D-string underlies
>     bool    bul; // ----------------- Boolean value
>     F_Error err = F_Error.NOVALUE; // Error code
> }
> ```
>
> Question:
> **Where do I begin my consolidation of space within `Atom`? Do I use unions or variants?**

In general, I recommend std.sumtype, as it is one of the best D libraries for this purpose. It is implemented as a struct containing two fields: the kind and a union of all the possible types.
That said, one difficulty you are likely to face is with refactoring your code to use the match and tryMatch functions, as std.sumtype.SumType does not expose the underlying kind field.
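
As a rough illustration of what code written against `match` looks like, here is a small sketch. `Value` and `kindName` are hypothetical names, and the atom is cut down to three payload types; it is not meant as the interpreter's actual design.

```d
import std.sumtype : SumType, match;

// Hypothetical cut-down atom: the tag lives inside SumType and is not exposed.
alias Value = SumType!(double, string, bool);

// Instead of switching on a `kind` field, supply one handler per type.
string kindName(Value v) {
    return v.match!(
        (bool b)   => "bool",   // bool converts implicitly to double, so the more specific handler goes first
        (double n) => "number",
        (string s) => "string"
    );
}
```

`match` requires the handlers to cover every member type, so adding a new type to `Value` turns any forgotten case into a compile error rather than a runtime surprise.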

Other notable alternatives are:

November 17, 2022
On Thursday, 17 November 2022 at 21:05:43 UTC, H. S. Teoh wrote:
>> Question:
>> **Where do I begin my consolidation of space within `Atom`?  Do I use
>> unions or variants?**
>
> In this case, since you're already keeping track of what type of data is being stored in an Atom, use a union:
>
> 	struct Atom {
> 		F_Type kind;
> 		union {		// anonymous union
> 			Atom*   car; // ----------------- Left  `Atom` Pointer
> 			Atom*   cdr; // ----------------- Right `Atom` Pointer
> 			double  num; // ----------------- Number value
> 			string  str; // ----------------- String value, D-string underlies
> 			bool    bul; // ----------------- Boolean value
> 			F_Error err = F_Error.NOVALUE; // Error code
> 		}
> 	}
>
> Use Variant if you don't want to keep track of the type yourself.
> T

Thank you!  This seems nice except there are a few fields that need to coexist.
I need {`car`, `cdr`} -or- {`num`} -or- {`str`} -or- {`bul`}.
`err` will be outside the union as well because I have decided that any type can have an error code attached.  That is, an error code (other than NaN) can be returned instead of reserving certain numbers to represent errors.  Imagine if there were a NaN for every datatype.


November 17, 2022

On Thursday, 17 November 2022 at 21:19:56 UTC, Petar Kirov [ZombineDev] wrote:

> On Thursday, 17 November 2022 at 20:54:46 UTC, jwatson-CO-edu wrote:
>> I have an implementation of the "Little Scheme" educational programming language written in D, here.
>>
>> It has many problems, but the one I want to solve first is the size of the "atoms" (units of data).
>>
>> Atom is a struct that has fields for every possible type of data that the language supports. This means that a bool Atom unnecessarily takes up space in memory with fields for number, string, structure, etc.
>>
>> [...]
>> Do I use unions or variants?
>
> In general, I recommend std.sumtype, as it is one of the best D libraries for this purpose. It is implemented as a struct containing two fields: the kind and a union of all the possible types.
> That said, one difficulty you are likely to face is with refactoring your code to use the match and tryMatch functions, as std.sumtype.SumType does not expose the underlying kind field.
>
> Other notable alternatives are:

Thank you! This is intriguing.
The different flavors of Atom I need will have either {car, cdr} -or- {num} -or- {str} -or- {bul}. Does SumType allow me to store the multiple fields {car, cdr} in one of the types, while the other types have only one field?

Since this is a dynamically-typed language, I need the atoms to both be interchangeable and to serve different purposes at the same time.
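
For what it is worth, one way a two-field member can sit next to single-field members in a `SumType` is to make that member a `Tuple` (or a small struct). The sketch below follows the recursive-type pattern from the `std.sumtype` documentation, where `This*` stands for a pointer to the type being defined; the names `Cons` and `countLeaves` are made up for illustration.

```d
import std.sumtype : SumType, This, match;
import std.typecons : Tuple;

// Three single-value members plus one member that carries two fields.
alias Atom = SumType!(
    double,
    string,
    bool,
    Tuple!(This*, "car", This*, "cdr") // the cons pair
);
alias Cons = Atom.Types[3];

// Count the non-pair atoms reachable from `a`, following both halves of each pair.
size_t countLeaves(Atom a) {
    return a.match!(
        (Cons c) => (c.car ? countLeaves(*c.car) : 0)
                  + (c.cdr ? countLeaves(*c.cdr) : 0),
        _ => size_t(1) // number, string or bool
    );
}
```

Every value is still just an `Atom`, so the atoms stay interchangeable; only the handler that receives a `Cons` gets to see `car` and `cdr`.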

November 17, 2022
On Thu, Nov 17, 2022 at 10:16:04PM +0000, jwatson-CO-edu via Digitalmars-d-learn wrote:
> On Thursday, 17 November 2022 at 21:05:43 UTC, H. S. Teoh wrote:
[...]
> > 	struct Atom {
> > 		F_Type kind;
> > 		union {		// anonymous union
> > 			Atom*   car; // ----------------- Left  `Atom` Pointer
> > 			Atom*   cdr; // ----------------- Right `Atom` Pointer
> > 			double  num; // ----------------- Number value
> > 			string  str; // ----------------- String value, D-string underlies
> > 			bool    bul; // ----------------- Boolean value
> > 			F_Error err = F_Error.NOVALUE; // Error code
> > 		}
> > 	}
[...]
> Thank you!  This seems nice except there are a few fields that need to
> coexist.
> I need {`car`, `cdr`} -or- {`num`} -or- {`str`} -or- {`bul`}.
> `err` will be outside the union as well because I have decided that
> any type can have an error code attached.  As in an error number
> (other than NaN) can be returned instead of reserving certain numbers
> to represent errors.  Imagine if there was NaN for every datatype.
[...]

Just create a nested anonymous struct, like this:

	struct Atom {
		F_Type kind;
		union {		// anonymous union
			struct {
				Atom*   car; // ----------------- Left  `Atom` Pointer
				Atom*   cdr; // ----------------- Right `Atom` Pointer
			}
			struct {
				double  num; // ----------------- Number value
				string  str; // ----------------- String value, D-string underlies
			}
			bool    bul; // ----------------- Boolean value
		}
		F_Error err = F_Error.NOVALUE; // Error code
	}
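
A quick way to convince yourself of what this layout does is to compare field addresses. The sketch below uses a cut-down, hypothetical `Cell` type (not the real `Atom`) and a made-up helper `sameStart`: fields inside the same anonymous struct sit side by side, while each union alternative starts at the same address.

```d
enum Kind { CONS, NMBR, BOOL } // hypothetical, cut-down F_Type

struct Cell {
    Kind kind;
    union {
        struct {        // both pointers coexist inside this alternative
            Cell* car;
            Cell* cdr;
        }
        double num;     // overlaps the pair as a whole
        bool   bul;     // overlaps it too
    }
}

// True when two fields start at the same address.
bool sameStart(A, B)(ref A a, ref B b) {
    return cast(void*) &a == cast(void*) &b;
}
```

Given a `Cell c`, `sameStart(c.car, c.num)` holds while `sameStart(c.car, c.cdr)` does not, which is exactly the "{`car`, `cdr`} -or- {`num`}" arrangement asked for.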


T

-- 
Meat: euphemism for dead animal. -- Flora
November 19, 2022
On Thursday, 17 November 2022 at 22:49:37 UTC, H. S. Teoh wrote:
> Just create a nested anonymous struct, like this:
>
>  	struct Atom {
>  		F_Type kind;
>  		union {		// anonymous union
> 			struct {
> 				Atom*   car; // ----------------- Left  `Atom` Pointer
> 				Atom*   cdr; // ----------------- Right `Atom` Pointer
> 			}
> 			struct {
> 				double  num; // ----------------- Number value
> 				string  str; // ----------------- String value, D-string underlies
> 			}
>  			bool    bul; // ----------------- Boolean value
>  		}
>  		F_Error err = F_Error.NOVALUE; // Error code
>  	}
>
>
> T

Thank you, something similar to what you suggested reduced the atom size from 72 bytes to 40.
November 19, 2022

On Saturday, 19 November 2022 at 03:38:26 UTC, jwatson-CO-edu wrote:

> Thank you, something similar to what you suggested reduced the atom size from 72 bytes to 40.

Oh, based on another forum post I added constructors in addition to reducing the atom size by 44%.

```d
struct Atom{
    F_Type  kind; // What kind of atom this is
    union{
        double  num; // NMBR: Number value
        string  str; // STRN: String value, D-string
        bool    bul; // BOOL: Boolean value
        struct{ // ---- CONS: pair
            Atom* car; // Left  `Atom` Pointer
            Atom* cdr; // Right `Atom` Pointer
        }
        struct{ // ---- EROR: Code + Message
            F_Error err; // Error code
            string  msg; // Detailed desc
        }
    }
    // https://forum.dlang.org/post/omsbr8$7do$1@digitalmars.com
    this( double n ){ kind = F_Type.NMBR; num = n; } // make number
    this( string s ){ kind = F_Type.STRN; str = s; } // make string
    this( bool   b ){ kind = F_Type.BOOL; bul = b; } // make bool
    this( Atom* a, Atom* d ){ kind = F_Type.CONS; car = a; cdr = d; } // make cons
    this( F_Error e, string m ){ kind = F_Type.EROR; err = e; msg = m; } // make error
}
```
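
For anyone reading along, here is a rough sketch of how a struct like this might be consumed. It repeats the definition so that it compiles on its own; the `F_Error.SYNTAX` member and the `show` function are made up for illustration (only `NOVALUE` appears earlier in the thread), and the printed forms are arbitrary.

```d
import std.conv : to;

enum F_Type { CONS, STRN, NMBR, EROR, BOOL, FUNC }
enum F_Error { NOVALUE, SYNTAX } // SYNTAX is a hypothetical member

struct Atom {
    F_Type kind;
    union {
        double num;
        string str;
        bool   bul;
        struct { Atom* car; Atom* cdr; }
        struct { F_Error err; string msg; }
    }
    this(double n) { kind = F_Type.NMBR; num = n; }
    this(string s) { kind = F_Type.STRN; str = s; }
    this(bool b) { kind = F_Type.BOOL; bul = b; }
    this(Atom* a, Atom* d) { kind = F_Type.CONS; car = a; cdr = d; }
    this(F_Error e, string m) { kind = F_Type.EROR; err = e; msg = m; }
}

// Render an atom as text, reading only the union member that `kind` selects.
string show(Atom* a) {
    if (a is null) return "()";
    final switch (a.kind) {
        case F_Type.NMBR: return a.num.to!string;
        case F_Type.STRN: return a.str;
        case F_Type.BOOL: return a.bul ? "#t" : "#f";
        case F_Type.CONS: return "(" ~ show(a.car) ~ " . " ~ show(a.cdr) ~ ")";
        case F_Type.EROR: return "error: " ~ a.msg;
        case F_Type.FUNC: return "<function>"; // no payload field for FUNC in the struct above
    }
}
```

Because each constructor sets `kind` together with the matching payload, the `final switch` in `show` can rely on the tag, and the compiler will complain if a new `F_Type` member is added without a corresponding case.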