Thread overview
Template-style polymorphism in table structure
Sep 04, 2016
data pulverizer
Sep 04, 2016
Lodovico Giaretta
Sep 04, 2016
data pulverizer
Sep 04, 2016
ZombineDev
Sep 04, 2016
data pulverizer
Sep 04, 2016
data pulverizer
Sep 04, 2016
data pulverizer
Sep 04, 2016
Lodovico Giaretta
Sep 04, 2016
data pulverizer
Sep 05, 2016
data pulverizer
Sep 05, 2016
data pulverizer
Sep 05, 2016
Lodovico Giaretta
Sep 09, 2016
Kagamin
September 04, 2016
I am trying to build a data table object with unrestricted column types. My approach is to define a generic BaseVector interface and a subtype GenericVector(T) that implements it. I then build a Table class that stores its columns as an array of BaseVector.

My main question is how to return GenericVector!(T) from the getCol() method in the Table class instead of BaseVector.

Perhaps my Table implementation somehow needs to be linked to GenericVector(T), or maybe what I have written is really a BaseTable and I need something like a GenericTable(T...). However, my previous approach created a tuple-typed data object, and once it was created its type structure (the column type configuration) could not be changed, so columns could not be added or removed.


------------------------------------------------
import std.stdio : writeln, write, writefln;
import std.format : format;

interface BaseVector{
    BaseVector get(size_t);
}

class GenericVector(T) : BaseVector{
    T[] data;
    alias data this;
    GenericVector get(size_t i){
        return new GenericVector!(T)(data[i]);
    }
    this(T[] arr){
        this.data = arr;
    }
    this(T elem){
        this.data ~= elem;
    }
    void append(T[] arr){
        this.data ~= arr;
    }

    override string toString() const {
        return format("%s", data);
    }
}

class Table{
private:
    BaseVector[] data;
public:
    // How to return GenericVector!(T) here instead of BaseVector
    BaseVector getCol(size_t i){
        return data[i];
    }
    this(BaseVector[] x ...){
        foreach(col; x)
            this.data ~= col;
    }
    this(BaseVector[] x){
        this.data ~= x;
    }
    this(Table x, BaseVector[] y ...){
        this.data = x.data;
        foreach(col; y){
            this.data ~= col;
        }
    }
    void append(BaseVector[] x ...){
        // append each column once (appending x here would add the whole array per iteration)
        foreach(col; x)
            this.data ~= col;
    }
}


void main(){
    auto index = new GenericVector!(int)([1, 2, 3, 4, 5]);
    auto numbers = new GenericVector!(double)([1.1, 2.2, 3.3, 4.4, 5.5]);
    auto names = new GenericVector!(string)(["one", "two", "three", "four", "five"]);
    Table df = new Table(index, numbers, names);
    // I'd like this to be GenericVector!(T)
    writeln(typeid(df.getCol(0)));
}

September 04, 2016
On Sunday, 4 September 2016 at 09:55:53 UTC, data pulverizer wrote:
> [...]

Your code is not very D style and, based on your needs, there may be better ways to achieve your goal, but without knowing your use case, it's difficult to give correct advice.

Regarding that writeln statement: your code does not print the expected type because of a known compiler bug [1]. If you change your interface BaseVector to an abstract class and add the needed override annotation to GenericVector, then typeid returns the expected result.

[1] https://issues.dlang.org/show_bug.cgi?id=13833


September 04, 2016
On Sunday, 4 September 2016 at 09:55:53 UTC, data pulverizer wrote:
> [...]

Since BaseVector is a polymorphic type, you can't know in advance (at compile time) the type of the object at a particular index. The only way to get a typed result is to specify the type that you expect, by providing a type parameter to the function. The cast operator then performs a dynamic cast at runtime, which returns an object of the requested type, or null if the object is of some other type:

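// Member of Table; relies on std.format : format.
GenericVector!ExpectedType getTypedCol(ExpectedType)(size_t i){

    // typeid on the Object gives the runtime class; typeof would only give the static type
    assert (cast(GenericVector!ExpectedType)data[i],
        format("The vector at col %s is not of type %s, but %s", i,
         ExpectedType.stringof, typeid(cast(Object)data[i])));

    return cast(GenericVector!ExpectedType)data[i];
}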

void main(){
     auto index = new GenericVector!(int)([1, 2, 3, 4, 5]);
     auto numbers = new GenericVector!(double)([1.1, 2.2, 3.3, 4.4, 5.5]);
     auto names = new GenericVector!(string)(["one", "two", "three", "four", "five"]);
     Table df = new Table(index, numbers, names);

     // a dynamic cast yields null when the column has another element type
     if (cast(GenericVector!string) df.getCol(0))
         writeln(df.getTypedCol!string(0).data);

     else if (cast(GenericVector!int) df.getCol(0))
         writeln(df.getTypedCol!int(0).data);

     // and so on...
 }
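
To make the null-on-mismatch behaviour of the cast concrete, here is a minimal sketch. The types are cut-down stand-ins for the ones in this thread, and tryCol is a hypothetical helper name used only for illustration:

```d
import std.stdio : writeln;

// Reduced stand-ins for the thread's types.
interface BaseVector {}

class GenericVector(T) : BaseVector
{
    T[] data;
    this(T[] arr) { data = arr; }
}

// Hypothetical helper: returns the column as GenericVector!T,
// or null when the column holds a different element type.
GenericVector!T tryCol(T)(BaseVector col)
{
    return cast(GenericVector!T) col;
}

void main()
{
    BaseVector col = new GenericVector!int([1, 2, 3]);

    // The declaration inside the if is only entered for a non-null result.
    if (auto v = tryCol!int(col))
        writeln("int column: ", v.data);

    if (tryCol!string(col) is null)
        writeln("not a string column");
}
```

The caller still has to name the type it expects; the cast only confirms or rejects that guess at runtime.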

Another way to approach the problem is to keep your data in an Algebraic (https://dpaste.dzfl.pl/7a4e9bf408d1):

import std.meta : AliasSeq;
import std.variant : Algebraic, visit;
import std.stdio : writefln;

alias AllowedTypes = AliasSeq!(int[], double[], string[]);
alias Vector = Algebraic!AllowedTypes;
alias Table = Vector[];

void main()
{
    Vector indexes = [1, 2, 3, 4, 5];
    Vector numbers = [1.1, 2.2, 3.3, 4.4, 5.5];
    Vector names = ["one", "two", "three", "four", "five"];

    Table table = [indexes, numbers, names];

    foreach (idx, col; table)
        col.visit!(
            (int[] indexColumn) =>
                 writefln("An index column at %s. Contents: %s", idx, indexColumn),
            (double[] numberColumn) =>
                 writefln("A number column at %s. Contents: %s", idx, numberColumn),
            (string[] namesColumn) =>
                 writefln("A string column at %s. Contents: %s", idx, namesColumn)
        );
}

Application output:
An index column at 0. Contents: [1, 2, 3, 4, 5]
A number column at 1. Contents: [1.1, 2.2, 3.3, 4.4, 5.5]
A string column at 2. Contents: ["one", "two", "three", "four", "five"]
September 04, 2016
On Sunday, 4 September 2016 at 09:55:53 UTC, data pulverizer wrote:
> My main question is how to return GenericVector!(T) from the getCol() method in the Table class instead of BaseVector.

I think I just solved my own query: change the BaseVector interface to a class and override its get method in the GenericVector(T) class:

----------------------------
class BaseVector{
    BaseVector get(size_t){
        return new BaseVector;
    }
}

class GenericVector(T) : BaseVector{
    T[] data;
    alias data this;
    override GenericVector get(size_t i){
        return new GenericVector!(T)(data[i]);
    }
    this(T[] arr){
        this.data = arr;
    }
    this(T elem){
        this.data ~= elem;
    }
    void append(T[] arr){
        this.data ~= arr;
    }

    override string toString() const {
        return format("%s", data);
    }
}

class Table{
// ... as before
}

void main(){
    auto index = new GenericVector!(int)([1, 2, 3, 4, 5]);
    auto numbers = new GenericVector!(double)([1.1, 2.2, 3.3, 4.4, 5.5]);
    auto names = new GenericVector!(string)(["one", "two", "three", "four", "five"]);
    Table df = new Table(index, numbers, names);
    // now prints table.GenericVector!int.GenericVector
    writeln(typeid(df.getCol(0)));
}
September 04, 2016
On Sunday, 4 September 2016 at 14:07:54 UTC, data pulverizer wrote:
@Lodovico Giaretta Thanks I just saw your update!
September 04, 2016
On Sunday, 4 September 2016 at 14:20:24 UTC, data pulverizer wrote:
> On Sunday, 4 September 2016 at 14:07:54 UTC, data pulverizer wrote:
> @Lodovico Giaretta Thanks I just saw your update!

@Lodovico Giaretta BTW what do you mean that my code is not very D style? Please expand on this ...
September 04, 2016
On Sunday, 4 September 2016 at 14:02:03 UTC, Lodovico Giaretta wrote:
> Your code is not very D style

... Well I guess I could have contracted the multiple constructors in GenericVector(T) and DataFrame?
September 04, 2016
On Sunday, 4 September 2016 at 14:24:12 UTC, data pulverizer wrote:
> On Sunday, 4 September 2016 at 14:20:24 UTC, data pulverizer wrote:
>> On Sunday, 4 September 2016 at 14:07:54 UTC, data pulverizer wrote:
>> @Lodovico Giaretta Thanks I just saw your update!
>
> @Lodovico Giaretta BTW what do you mean that my code is not very D style? Please expand on this ...

You could have fewer constructors. In fact, a typesafe variadic ctor also works for the single-element case and for the array case. But you already recognized that.
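
As a rough illustration of that point, the sketch below uses a trimmed-down GenericVector (a stand-in, not the full class from the thread) with a single typesafe variadic constructor. The .dup is my own precaution, on the assumption that the variadic arguments may be built on the caller's stack:

```d
import std.stdio : writeln;

class GenericVector(T)
{
    T[] data;

    // One constructor covers a single element, several elements,
    // and an existing array.
    this(T[] arr...)
    {
        data = arr.dup; // copy, so we never keep a slice of temporary storage
    }
}

void main()
{
    auto one = new GenericVector!int(7);
    auto many = new GenericVector!int(1, 2, 3);
    auto fromArray = new GenericVector!int([4, 5, 6]);

    writeln(one.data);
    writeln(many.data);
    writeln(fromArray.data);
}
```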

Instead of reinventing the wheel for your GenericVector!T, you could use an `alias this` to directly inherit all operations on the underlying array, without having to reimplement them (like your append method).
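
A small sketch of what `alias this` forwards, again on a trimmed-down stand-in for GenericVector rather than the full class from the thread:

```d
import std.stdio : writeln;

class GenericVector(T)
{
    T[] data;
    alias data this; // members and operators not found here are looked up on `data`

    this(T[] arr...) { data = arr.dup; }
}

void main()
{
    auto v = new GenericVector!int(1, 2, 3);

    v ~= [4, 5];        // array append, no custom append method needed
    writeln(v.length);  // array property
    writeln(v[0]);      // array indexing
    writeln(v[1 .. 3]); // array slicing
}
```

Note that this forwards everything the array supports, so a wrapper that needs to restrict or validate operations would still have to define them itself.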

Your getCol(i) could become getCol!T(i) and return an instance of GenericVector!T directly, after checking that the required column has in fact that type:

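// Member of Table; T must be supplied by the caller, e.g. getCol!int(0)
GenericVector!T getCol(T)(size_t i)
{
    if(typeid(cols[i]) == typeid(GenericVector!T))
        return cast(GenericVector!T)cols[i];
    else
        // or assert(0)
        throw new Exception("column does not have the requested type");
}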

Another solution: if you don't need to dynamically change the type of the columns, you can have the addColumn function create a new type. I'll show it with Tuples because it's easier:

Tuple!(T,U) append(U, T...)(Tuple!T tup, U col)
{
    return Tuple!(T,U)(tup.expand, col);
}

Tuple!int t1;
Tuple!(int, float) t2 = t1.append(2.0);
Tuple!(int, float, char) t3 = t2.append('c');
September 04, 2016
On Sunday, 4 September 2016 at 14:49:30 UTC, Lodovico Giaretta wrote:
> [...]

Thank you for the very useful suggestions; I shall take these forward. On the suggestion of creating Tuple-like tables, I already tried that but found, as you said, that once the table is created, adding or removing columns essentially creates a different data type, which needs a new variable name each time.

I am building a table type that I hope will be used for data manipulation in data science and statistics applications. I therefore need a data structure that allows adding and removing columns of various types and that can cope with any type that hasn't been planned for, which is why I selected this polymorphic template approach. It is more flexible than the data structures I have seen in dynamic programming languages, such as R's data frame and Python's pandas. Even Scala's Spark dataframes rely on wrapping everything in Any, and the user still has to write a special data structure for each new type. The only thing similar to this approach is Julia's DataFrame, but Julia, though a very good programming language, has limitations.

I feel as if I am constantly scratching the surface of what D can do, but I have recently managed to get more time on my hands and it looks as if that will continue into the future which will mean more focusing on D, improving my generic programming skills and hopefully creating some useful artifacts. Perhaps I need to read Andrei's Modern C++ Design book for a better way to think about generics.


September 05, 2016
On Sunday, 4 September 2016 at 14:49:30 UTC, Lodovico Giaretta wrote:
> Your getCol(i) could become getCol!T(i) and return an instance of GenericVector!T directly, after checking that the required column has in fact that type:
>
> GenericVector!T getCol(T)(size_t i)
> {
>     if(typeid(cols[i]) == typeid(GenericVector!T))
>         return cast(GenericVector!T)cols[i];
>     else
>         // assert(0) or throw exception
> }

I just realized that typeid only gives the runtime class information and not a type the compiler can use, so the object will still need to be cast, as you mentioned above. However, your function above cannot infer T, so the user will have to provide it. I wonder if there is a way to dispatch to the right type with a dynamic cast; otherwise I fear that ZombineDev may be correct and the types will have to be limited, which I definitely want to avoid!
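
One possible shape for such a dispatch, as a sketch only: try a dynamic cast for each type in a caller-supplied list and hand the first match to a generic function. The names dispatch and show are hypothetical, and the types are reduced stand-ins for the ones in this thread. It does not remove the need to name candidate types somewhere; it only moves that list to the call site instead of baking it into the table.

```d
import std.stdio : writeln;

abstract class BaseVector {}

class GenericVector(T) : BaseVector
{
    T[] data;
    this(T[] arr) { data = arr; }
}

// Hypothetical helper: tries each candidate element type in turn and calls
// `fun` with the first GenericVector!T accepted by the dynamic cast.
// Returns false when no candidate matches.
bool dispatch(alias fun, Candidates...)(BaseVector col)
{
    foreach (T; Candidates) // unrolled at compile time, one cast per type
    {
        if (auto typed = cast(GenericVector!T) col)
        {
            fun(typed);
            return true;
        }
    }
    return false;
}

// Generic handler; V is inferred as the concrete GenericVector!T.
void show(V)(V col)
{
    writeln(typeof(col.data).stringof, ": ", col.data);
}

void main()
{
    BaseVector[] cols;
    cols ~= new GenericVector!int([1, 2, 3]);
    cols ~= new GenericVector!string(["one", "two"]);

    foreach (col; cols)
        if (!dispatch!(show, int, double, string)(col))
            writeln("unsupported column type");
}
```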

