Simple DataFrames library

October 29

Posted by Aravinda VK

Hello everyone,

I am happy to share my DataFrame library for D. My primary focus was to make it simple to use and I haven't spent a lot of time optimizing the code for memory and performance.

Example:

import std.stdio;
import std.algorithm;
import std.array;
import std.range;

import dataframes;

struct Product
{
    string name;
    double unitPrice;
    int quantity;
    double discount;
    double totalPrice;
}

const DISCOUNTS = [
    "WELCOME": 5,
    "HAPPY": 2
];

double[] applyDiscounts(Column!double values, string coupon = "")
{
    auto pct = coupon in DISCOUNTS;
    if (pct is null)
        return iota(values.length).map!("0.0").array;

    return values.map!(v => v * (*pct/100.0)).array;
}

void main(string[] args)
{
    auto coupon = args.length > 1 ? args[1] : "";
    auto df = new DataFrame!Product(
        name: ["p1", "p2", "p3", "p4"],
        unitPrice: [10.0, 15.0, 5.0, 20.0],
        quantity: [3, 1, 5, 2]
    );

    df.discount = df.unitPrice.applyDiscounts(coupon);
    df.totalPrice = (df.unitPrice - df.discount) * df.quantity;

    // Preview
    df.writeln;

    auto total = df.rows
        .map!(r => r.totalPrice)
        .sum;

    writeln("Total: ", total);
}
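
For anyone who wants to check the arithmetic without installing the package, here is a minimal sketch of the same calculation using only Phobos arrays. The helper names `discounts` and `totals` are made up for this illustration and are not part of dataframes; the percentages are the ones from the DISCOUNTS table above.

```d
import std.algorithm : map, sum;
import std.array : array;
import std.math : isClose;
import std.range : zip;

/// Per-unit discount: unit price times the coupon percentage (hypothetical helper).
double[] discounts(const(double)[] unitPrice, int pct)
{
    return unitPrice.map!(p => p * (pct / 100.0)).array;
}

/// Row totals: (unit price - discount) * quantity, element by element (hypothetical helper).
double[] totals(const(double)[] unitPrice, const(double)[] discount, const(int)[] quantity)
{
    return zip(unitPrice, discount, quantity)
        .map!(t => (t[0] - t[1]) * t[2])
        .array;
}

void main()
{
    double[] unitPrice = [10.0, 15.0, 5.0, 20.0];
    int[] quantity = [3, 1, 5, 2];

    // No coupon: every discount is zero, so the total is 30 + 15 + 25 + 40.
    auto plain = totals(unitPrice, discounts(unitPrice, 0), quantity);
    assert(isClose(plain.sum, 110.0));

    // "WELCOME" (5%): row totals are roughly 28.5, 14.25, 23.75 and 38.
    auto welcome = totals(unitPrice, discounts(unitPrice, 5), quantity);
    assert(isClose(welcome.sum, 104.5));
}
```

The DataFrame version expresses the same two steps as column expressions instead of explicit `zip`/`map` calls.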

Highlights:

Creates a new class with all the fields of the given struct as arrays.
Supports column operations such as adding two columns or multiplying their values element-wise.
df.rows returns a list of Row values that hold only a reference to the main data.
Easy to use with std.algorithm goodies (refer to the README).
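
To give a rough idea of how these highlights can fit together in D, here is a small sketch. It is not the library's actual implementation: `Col`, `Frame` and the nested `Row` are hypothetical names used only here, and the real dataframes types may work quite differently. It assumes one array-backed column per struct field, element-wise operators through `opBinary`, and rows that store just a frame reference and an index.

```d
import std.algorithm : map;
import std.array : array;
import std.range : iota;
import std.traits : FieldNameTuple;

// Hypothetical column wrapper (not the library's Column type).
struct Col(T)
{
    T[] data;

    // Element-wise arithmetic between two columns of equal length.
    auto opBinary(string op, U)(Col!U rhs) const
        if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        assert(data.length == rhs.data.length, "column length mismatch");
        alias R = typeof(mixin("T.init " ~ op ~ " U.init"));
        auto result = new R[data.length];
        foreach (i; 0 .. data.length)
            result[i] = mixin("data[i] " ~ op ~ " rhs.data[i]");
        return Col!R(result);
    }
}

// Hypothetical frame: declares one Col member per field of the struct S.
class Frame(S)
{
    static foreach (field; FieldNameTuple!S)
        mixin("Col!(typeof(S." ~ field ~ ")) " ~ field ~ ";");

    // A row stores only a frame reference and an index; no data is copied.
    static struct Row
    {
        Frame frame;
        size_t index;

        // row.quantity reads frame.quantity.data[index].
        auto opDispatch(string field)() const
        {
            return mixin("frame." ~ field ~ ".data[index]");
        }
    }

    // Number of rows, taken from the first column.
    size_t length() const { return this.tupleof[0].data.length; }

    // Lazy range of row views over the column data.
    auto rows() { return iota(length).map!(i => Row(this, i)); }
}

struct Product
{
    string name;
    double unitPrice;
    int quantity;
}

void main()
{
    auto df = new Frame!Product;
    df.name = Col!string(["p1", "p2"]);
    df.unitPrice = Col!double([10.0, 15.0]);
    df.quantity = Col!int([3, 1]);

    // Column arithmetic: double * int yields a Col!double.
    auto totals = df.unitPrice * df.quantity;
    assert(totals.data == [30.0, 15.0]);

    // Rows plug into std.algorithm like any other range.
    auto names = df.rows.map!(r => r.name).array;
    assert(names == ["p1", "p2"]);
}
```

Keeping the data in per-field arrays makes whole-column arithmetic a simple loop, while the row views let range-based code read one record at a time without copying it.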

Add dataframes to your project by running:

dub add dataframes

The code and documentation are available on GitHub at https://github.com/aravindavk/dataframes-d and on the DUB registry at https://code.dlang.org/packages/dataframes

Please feel free to use it and let me know your experience and suggestions.

Thanks
Aravinda

On Tuesday, 29 October 2024 at 16:11:08 UTC, Aravinda VK wrote:

> [snip]
>
> Please feel free to use it and let me know your experience and suggestions.
>
> Thanks
> Aravinda

Thanks for working on this. I'm a big fan of using dataframes in R or pandas in Python.

That being said, I think there's more value in building dataframes either on top of or as part of mir's ndslices. There was previously a project (magpie, I believe) that built dataframes on top of ndslices. As I recall, the main obstacle within mir is that label support isn't fully implemented (also, Ilya is less involved with the project these days). Some of this support is complicated to implement, and there is a concern that it could lead to breaking changes. Here's a list of tasks:
https://github.com/libmir/mir-algorithm/issues/426
