April 06, 2015 getting started with std.csv

Hi. I'm a D newbie(!) coming from a Fortran/C/Python background. I'm struggling with the many new concepts needed in order to make any sense out of the documentation or traceback messages (ranges/templates/...). For example, the std.csv documentation is great but all the examples read from a string rather than a file. I feel stupid but I'm having trouble with the simple step of modifying the examples to read from a file. I can read the whole file into a string in memory and then read the records from the string just fine with csvReader (example A below) or read a line at a time from the file and call csvReader using a single line (example B below), but neither solution is satisfactory. In practice I need to read files with up to 80 million records so I'd like to understand how to do this properly/efficiently.

tia, Gerald

Example A
=========

    import std.stdio, std.file, std.csv;

    void main()
    {
        std.file.write("test.csv", "0,1,abc\n2,3,def");
        scope(exit) std.file.remove("test.csv");

        auto lines = readText!(string)("test.csv");

        struct Rec { int a,b; char[] c; }
        foreach (Rec r; csvReader!Rec(lines)) {
            writeln("struct -> ", r);
        }
    }

Example B
=========

    import std.stdio, std.file, std.csv;

    void main()
    {
        std.file.write("test.csv", "0,1,abc\n2,3,def");
        scope(exit) std.file.remove("test.csv");

        struct Rec { int a,b; char[] c; }

        Rec r;
        foreach (line; File("test.csv", "r").byLine) {
            r = csvReader!Rec(line).front;
            writeln("struct -> ", r);
        }
    }

Output
======

    struct -> Rec(0, 1, "abc")
    struct -> Rec(2, 3, "def")

April 07, 2015 Re: getting started with std.csv

Posted in reply to gjansen

I got this to work with:

```
import std.stdio, std.file, std.csv, std.range;

void main()
{
    std.file.write("test.csv", "0,1,abc\n2,3,def");
    scope(exit) std.file.remove("test.csv");

    static struct Rec { int a, b; char[] c; }

    auto file = File("test.csv", "r");
    foreach (s; csvReader!Rec(file.byLine().joiner("\n")))
    {
        writeln("struct -> ", s);
    }
}
```

I am not sure about using `file.byLine()` here, because `byLine` reuses its buffer, but this is working correctly (for some reason, anyone can comment?) as far as I tested.
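
On the buffer question, here is a minimal sketch (not from the thread) of the two usual ways to keep lines around after the loop has moved on. `byLine` hands out slices of a buffer that may be overwritten when the range advances, so anything stored needs its own copy, either via `.idup` or via `byLineCopy`. As far as I know `byLineCopy` only exists in Phobos releases newer than 2.066, so treat its availability as an assumption. The helper names `linesViaIdup` and `linesViaByLineCopy` are made up for illustration.

```d
import std.array : array;
import std.file : remove, write;
import std.stdio : File, writeln;

// Hypothetical helper: copies every line before byLine advances,
// so the stored strings never alias byLine's reused buffer.
string[] linesViaIdup(string path)
{
    string[] kept;
    foreach (line; File(path, "r").byLine)
        kept ~= line.idup;   // immutable copy of the current line
    return kept;
}

// Hypothetical helper: byLineCopy allocates a fresh string per line,
// so the range can be collected directly.
string[] linesViaByLineCopy(string path)
{
    return File(path, "r").byLineCopy.array;
}

void main()
{
    write("test.csv", "0,1,abc\n2,3,def");
    scope(exit) remove("test.csv");

    writeln(linesViaIdup("test.csv"));
    writeln(linesViaByLineCopy("test.csv"));
}
```

This does not by itself explain why the `joiner` version works. Presumably `csvReader` has finished with the characters of one line before the next line is fetched, but that is a guess rather than something confirmed here.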

April 07, 2015 Re: getting started with std.csv

Posted in reply to yazd

On Tuesday, 7 April 2015 at 05:49:48 UTC, yazd wrote:
> I got this to work with:
>
> ```
> import std.stdio, std.file, std.csv, std.range;
>
> void main()
> {
> std.file.write("test.csv", "0,1,abc\n2,3,def");
> scope(exit) std.file.remove("test.csv");
>
> static struct Rec { int a, b; char[] c; }
>
> auto file = File("test.csv", "r");
> foreach (s; csvReader!Rec(file.byLine().joiner("\n")))
> {
> writeln("struct -> ", s);
> }
> }
> ```
>
> I am not sure about using `file.byLine()` here, because `byLine` reuses its buffer, but this is working correctly (for some reason, anyone can comment?) as far as I tested.

Btw, `joiner` is a lazy algorithm. In other words, it doesn't join the whole file when it is called; it only produces elements as they are needed. This reduces the memory requirements, since the whole file never has to be in memory at once.
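
To make the laziness visible, here is a small sketch under stated assumptions: a generated range of one million fake CSV lines stands in for `file.byLine()`, and a counter records how many of them actually get built. The function name `linesProducedFor` and the counter are illustrative only. The exact count depends on how often Phobos evaluates `front`, so the code only checks that it stays tiny compared to the input.

```d
import std.algorithm : joiner, map;
import std.conv : text;
import std.range : iota, take, walkLength;
import std.stdio : writeln;

// Hypothetical helper: returns how many "lines" had to be generated
// in order to read the first `chars` characters of the joined stream.
size_t linesProducedFor(size_t chars)
{
    size_t produced = 0;

    // Lazy stand-in for file.byLine(): each line is built only on demand.
    auto lines = iota(1_000_000)
        .map!((int i) { ++produced; return text(i, ",", i); });

    // Creating the joiner does not walk the whole input.
    auto joined = lines.joiner("\n");

    // Pull just a few characters out of the joined stream.
    auto consumed = joined.take(chars).walkLength;
    assert(consumed == chars);

    return produced;
}

void main()
{
    writeln("lines produced to read 5 characters: ", linesProducedFor(5));
}
```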

April 07, 2015 Re: getting started with std.csv

Posted in reply to yazd

On Tuesday, 7 April 2015 at 05:51:33 UTC, yazd wrote:
> Btw, joiner is a lazy algorithm. In other words, it doesn't join the whole file when it is called but only when needed. This reduces the memory requirements as you won't need the whole file in memory at once.

Replace `std.range` with `std.algorithm` in the import list, since `joiner` lives in `std.algorithm`.
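
With that correction applied, the whole program might look like the sketch below. It is the same approach as in the earlier post, only with selective imports and the loop moved into a helper. The helper name `eachRecord` and the callback parameter are illustrative and not part of std.csv.

```d
import std.algorithm : joiner;
import std.csv : csvReader;
import std.file : remove, write;
import std.stdio : File, writeln;

struct Rec { int a, b; char[] c; }

// Hypothetical helper: streams records from `path` and hands each one
// to `sink` as soon as it is parsed, without loading the whole file.
void eachRecord(string path, scope void delegate(Rec) sink)
{
    auto file = File(path, "r");

    // byLine strips the line terminators, so joiner("\n") puts them back
    // lazily and csvReader sees one continuous character stream.
    foreach (rec; csvReader!Rec(file.byLine().joiner("\n")))
        sink(rec);
}

void main()
{
    write("test.csv", "0,1,abc\n2,3,def");
    scope(exit) remove("test.csv");

    eachRecord("test.csv", (Rec r) { writeln("struct -> ", r); });
}
```

The selective imports also avoid any clash between `std.file.write` and `std.stdio.write`, which is why the original needed the fully qualified `std.file.write`.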

April 07, 2015 Re: getting started with std.csv

Posted in reply to yazd

Many thanks for the feedback yazd! I've tested the approach with a large csv file and it works fine. Unfortunately, while csvReader is very convenient, it is no speed demon. To my dismay it was much slower (about 4x) than a simple approach I am using in Python, which is essentially equivalent to chomp(line).split(','). I guess I'll have to keep studying and learning. Thx again.

April 07, 2015 Re: getting started with std.csv

Posted in reply to gjansen

On Tuesday, 7 April 2015 at 09:44:11 UTC, gjansen wrote:
> Many thanks for the feedback yazd! I've tested the approach with a large csv file and it works fine. Unfortunately, while csvReader is very convenient, it is no speed demon. To my dismay it was much slower (about 4x) than a simple approach I am using in Python, which is essentially equivalent to chomp(line).split(','). I guess I'll have to keep studying and learning. Thx again.
What compiler are you using? What compilation flags?

April 07, 2015 Re: getting started with std.csv

Posted in reply to John Colvin

dmd -O (2.066.1) and gdc -O3 (4.9.2)

But... as I tried to convey, I was comparing apples to oranges. I have now rewritten the D test simply using split(',') instead of csvReader, to be more similar to the Python test, and it runs about 2x faster in D with dmd and about 4x faster with gdc compared to Python 3.4.3. :-)

On Tuesday, 7 April 2015 at 10:47:14 UTC, John Colvin wrote:
> What compiler are you using? What compilation flags?
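
The rewritten test itself is not shown in the thread, so the following is only a sketch of what a `chomp(line).split(',')` style reader could look like in D. The `Rec` layout is borrowed from the earlier examples, and the function name `parseLine` is made up. Unlike csvReader, plain splitting does not understand quoted fields or embedded commas, which is part of why the two are not directly comparable.

```d
import std.array : split;
import std.conv : to;
import std.exception : enforce;
import std.file : remove, write;
import std.stdio : File, writeln;
import std.string : chomp;

struct Rec { int a, b; string c; }

// Hypothetical helper: naive parsing with no support for quoting or escapes.
Rec parseLine(const(char)[] line)
{
    // chomp drops a trailing line terminator (including a stray '\r').
    auto fields = line.chomp.split(',');
    enforce(fields.length == 3, "expected exactly 3 fields");

    // fields alias byLine's reused buffer, so the text field is copied.
    return Rec(fields[0].to!int, fields[1].to!int, fields[2].idup);
}

void main()
{
    write("test.csv", "0,1,abc\n2,3,def");
    scope(exit) remove("test.csv");

    foreach (line; File("test.csv", "r").byLine)
        writeln("struct -> ", parseLine(line));
}
```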

April 07, 2015 Re: getting started with std.csv

Posted in reply to gjansen

On Tuesday, 7 April 2015 at 11:36:54 UTC, gjansen wrote:
> dmd -O (2.066.1) and gdc -O3 (4.9.2)
>
> But... as I tried to convey, I was comparing apples to oranges. I have now rewritten the D test simply using split(',') instead of csvReader, to be more similar to the python test, and it runs about 2x faster in D with dmd and about 4x faster with gdc compared to Python 3.4.3. :-)

Also consider `-inline` and `-release` for dmd, and `-frelease` for gdc.

With gdc, if you are building for a specific CPU family (e.g. broadwell), `-march=` can provide improvements. `-march=native` targets the CPU of the machine you are compiling on.
Copyright © 1999-2021 by the D Language Foundation