Best way to read CSV data file into Mir (2d array) ndslice? - D Programming Language Discussion Forum

September 21, 2022

Best way to read CSV data file into Mir (2d array) ndslice?

Posted by mw

Hi,

I'm wondering what the best way is to read a CSV data file into a Mir ndslice (2D array), especially if it can parse dates into int/float.

I searched a bit, but can't find any example.

Thanks.

September 21, 2022

Re: Best way to read CSV data file into Mir (2d array) ndslice?

Posted by jmh530
in reply to mw

It probably can't hurt to try the simplest approach first. std.csv can return an input range that you can then use to create an ndslice. Offhand, I don't know of any D alternatives to std.csv for reading CSVs.

ndslice assumes homogeneous data, but you can put the dates (as Date types) in the labels (as in data frames). However, that functionality still leaves a bit to be desired in terms of integration with the rest of the package [1].

[1] https://github.com/libmir/mir-algorithm/issues/426
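For the date column specifically, here is a minimal sketch that uses only Phobos and keeps the dates in a separate array next to the numeric data. It does not use Mir's label support. The helper name `parseUsDate` is hypothetical, and the M/D/YYYY format is an assumption based on the sample data used in this thread.

```d
import std.algorithm : map, splitter;
import std.array : array;
import std.conv : to;
import std.csv : csvReader;
import std.datetime.date : Date;
import std.stdio : writeln;

// Hypothetical helper: parses an "M/D/YYYY" string into a Date.
Date parseUsDate(string s)
{
    auto parts = s.splitter('/').map!(to!int).array;
    // Date's constructor takes (year, month, day).
    return Date(parts[2], parts[0], parts[1]);
}

void main()
{
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";

    // Select only the "date" column and read its fields as strings.
    Date[] dates;
    foreach (record; text.csvReader!string(["date"]))
        foreach (field; record)
            dates ~= parseUsDate(field);

    writeln(dates);
    // If an integer representation is wanted, one option is the
    // day count of the proleptic Gregorian calendar.
    writeln(dates.map!(d => d.dayOfGregorianCal));
}
```

Whether the integer form should be a day count, a Unix timestamp, or something else depends on what the downstream numeric code expects.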

September 21, 2022

Re: Best way to read CSV data file into Mir (2d array) ndslice?

Posted by jmh530
in reply to jmh530

I just tried doing it with std.csv, but my version was a bit awkward, since it isn't straightforward to take the result of csvReader and put it in an array; I had to copy the values in element by element. I also wanted to allocate the array up front, but to do that I needed to know how big it was, so I ended up making two passes over the data, which isn't ideal.

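import std.csv;
import std.stdio: writeln;
import mir.ndslice.allocation: slice;

void main() {
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";
    // csvReader returns an input range, so create one reader per pass.
    // The header argument selects only the numeric columns.
    auto records_firstpass = text.csvReader!double(["x1","x2"]);
    auto records_secondpass = text.csvReader!double(["x1","x2"]);
    // First pass: count the rows so the slice can be allocated up front.
    size_t len = 0;
    foreach (record; records_firstpass) {
        len++;
    }
    auto data = slice!double(len, 2);
    // Second pass: copy each field into the preallocated 2D slice.
    size_t i = 0;
    size_t j;
    foreach (record; records_secondpass)
    {
        j = 0;
        foreach (r; record) {
            data[i, j] = r;
            j++;
        }
        i++;
    }
    writeln(data);
}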
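One way to avoid the second pass, sketched here under the assumption that the number of selected columns is known in advance: append every value to a flat array in a single pass, then view that array as a 2D slice with `sliced` from `mir.ndslice.slice`. The helper name `readFlat` is hypothetical, and this is an untested variation on the code above rather than a recommended Mir idiom.

```d
import std.array : appender;
import std.csv : csvReader;
import std.stdio : writeln;
import mir.ndslice.slice : sliced;

// Hypothetical helper: reads the selected columns row by row
// into one flat array, in a single pass over the CSV text.
double[] readFlat(string text, string[] columns)
{
    auto flat = appender!(double[])();
    foreach (record; text.csvReader!double(columns))
        foreach (value; record)
            flat.put(value);
    return flat.data;
}

void main()
{
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";
    enum size_t cols = 2; // number of selected columns, assumed known

    auto values = readFlat(text, ["x1", "x2"]);
    // View the flat array as a rows-by-cols matrix without copying.
    auto data = values.sliced(values.length / cols, cols);
    writeln(data);
}
```

The trade-off is that the appender may reallocate as it grows, whereas the two-pass version allocates exactly once.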

September 21, 2022

Re: Best way to read CSV data file into Mir (2d array) ndslice?

Posted by mw
in reply to jmh530

On Wednesday, 21 September 2022 at 19:14:30 UTC, jmh530 wrote:

> I just tried doing it with std.csv, but my version was a bit awkward, since it isn't straightforward to take the result of csvReader and put it in an array; I had to copy the values in element by element. I also wanted to allocate the array up front, but to do that I needed to know how big it was, so I ended up making two passes over the data, which isn't ideal.

Thanks. As you said, this isn't ideal.

For Mir to catch up with numpy, being able to easily import CSV data is a must for attracting data scientists.

In numpy/pandas, it's just a one-liner.

I logged an issue here as a feature request:

https://github.com/libmir/mir-algorithm/issues/442
