Thread overview
Best way to read CSV data file into Mir (2d array) ndslice?
Sep 21, 2022
mw
Sep 21, 2022
jmh530
Sep 21, 2022
jmh530
Sep 21, 2022
mw
September 21, 2022

Hi,

I'm just wondering what is the best way to read a CSV data file into a Mir (2D array) ndslice? Especially if it can parse dates into int/float.

I searched a bit, but can't find any example.

Thanks.

September 21, 2022

On Wednesday, 21 September 2022 at 05:31:48 UTC, mw wrote:

> Hi,
>
> I'm just wondering what is the best way to read a CSV data file into a Mir (2D array) ndslice? Especially if it can parse dates into int/float.
>
> I searched a bit, but can't find any example.
>
> Thanks.

It probably can't hurt to try the simplest approach first. std.csv can return an input range that you can then use to create an ndslice. Offhand, I don't know what D tools are an alternative to std.csv for reading CSVs.

ndslice assumes homogeneous data, but you can put the dates (as Date types) in the labels (as in data frames). However, that functionality still leaves a bit to be desired in terms of integration with the rest of the package [1].

[1] https://github.com/libmir/mir-algorithm/issues/426
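If the date does have to live in the same homogeneous slice as the numbers, one option is to convert it to a number before storing it. The following is only a minimal sketch, assuming dates written as month/day/year (for example 1/31/2010); `dateToNumber` is a hypothetical helper name, not part of Mir or Phobos, and the choice of "days since 0001-01-01" as the numeric encoding is an assumption as well.

```d
import std.array : split;
import std.conv : to;
import std.datetime.date : Date;
import std.stdio : writeln;

// Hypothetical helper: turns an "M/D/YYYY" string into a day count
// (Date.dayOfGregorianCal, where 0001-01-01 is day 1), returned as a
// double so it can sit next to the other columns in a slice!double.
double dateToNumber(string s)
{
    auto parts = s.split("/");           // month, day, year as strings
    immutable month = parts[0].to!int;
    immutable day = parts[1].to!int;
    immutable year = parts[2].to!int;
    return Date(year, month, day).dayOfGregorianCal;
}

void main()
{
    // Differences between two converted dates are plain day counts.
    writeln(dateToNumber("2/28/2010") - dateToNumber("1/31/2010"));
}
```

Keeping the dates as Date values in the labels preserves more information; converting them is only useful when everything has to be one element type.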

September 21, 2022

On Wednesday, 21 September 2022 at 13:08:14 UTC, jmh530 wrote:

> On Wednesday, 21 September 2022 at 05:31:48 UTC, mw wrote:
>
>> Hi,
>>
>> I'm just wondering what is the best way to read a CSV data file into a Mir (2D array) ndslice? Especially if it can parse dates into int/float.
>>
>> I searched a bit, but can't find any example.
>>
>> Thanks.
>
> It probably can't hurt to try the simplest approach first. std.csv can return an input range that you can then use to create an ndslice. Offhand, I don't know what D tools are an alternative to std.csv for reading CSVs.
>
> ndslice assumes homogeneous data, but you can put the dates (as Date types) in the labels (as in data frames). However, that functionality still leaves a bit to be desired in terms of integration with the rest of the package [1].
>
> [1] https://github.com/libmir/mir-algorithm/issues/426

I just tried doing it with std.csv, but my version is a bit awkward, since it isn't straightforward to take the result of csvReader and put it in an array; I had to copy the values in one at a time. I also wanted to allocate the array up front, but to do that I needed to know how big it was, so I ended up making two passes over the data, which isn't ideal.

import std.csv : csvReader;
import std.stdio : writeln;
import mir.ndslice.allocation : slice;

void main() {
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";
    // Select only the numeric columns; the date column is skipped.
    auto records_firstpass = text.csvReader!double(["x1", "x2"]);
    auto records_secondpass = text.csvReader!double(["x1", "x2"]);

    // First pass: count the records so the slice can be allocated up front.
    size_t len = 0;
    foreach (record; records_firstpass) {
        len++;
    }
    auto data = slice!double(len, 2);

    // Second pass: copy each record into row i of the slice.
    size_t i = 0;
    foreach (record; records_secondpass) {
        size_t j = 0;
        foreach (r; record) {
            data[i, j] = r;
            j++;
        }
        i++;
    }
    writeln(data);
}
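One possible way around the second pass is to collect the values into a growable flat buffer and only afterwards view that buffer as a 2D slice. This is a sketch under assumptions rather than an established Mir idiom: `readColumns` is a hypothetical helper name, and it assumes every record yields exactly one value per requested column.

```d
import std.array : appender;
import std.csv : csvReader;
import std.stdio : writeln;
import mir.ndslice.slice : sliced;

// Hypothetical helper: reads the named columns as doubles in a single
// pass and returns them as a rows x columns.length ndslice.
auto readColumns(string text, string[] columns)
{
    // Append the values in row-major order; the buffer grows as needed,
    // so the number of rows does not have to be known in advance.
    auto buffer = appender!(double[]);
    foreach (record; text.csvReader!double(columns))
        foreach (value; record)
            buffer.put(value);

    // Reinterpret the flat array as a matrix without copying it.
    auto flat = buffer.data;
    return flat.sliced(flat.length / columns.length, columns.length);
}

void main()
{
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";
    auto data = readColumns(text, ["x1", "x2"]);
    writeln(data);
}
```

The trade-off is that the appender may reallocate while it grows, whereas the two-pass version allocates exactly once but parses the text twice.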
September 21, 2022

On Wednesday, 21 September 2022 at 19:14:30 UTC, jmh530 wrote:

> I just tried doing it with std.csv, but my version is a bit awkward, since it isn't straightforward to take the result of csvReader and put it in an array; I had to copy the values in one at a time. I also wanted to allocate the array up front, but to do that I needed to know how big it was, so I ended up making two passes over the data, which isn't ideal.

Thanks. As you said, this isn't ideal.

For Mir to catch up with numpy, being able to easily read a CSV file to import data is a must if it is to attract data scientists.

In numpy/pandas, it's just a one-liner.

I logged an issue here as a feature request:

https://github.com/libmir/mir-algorithm/issues/442