Newbie: Error parsing csv file with very long lines

Apr 23, 2016

salvari


Hello all!

I'm trying to read a CSV file (';' as separator) with very long lines.

It seems to be really simple: I read the column names with no problem. But as soon as the program parses the first line of data, the array containing the column names seems to be overwritten.

I'm using dmd: DMD64 D Compiler v2.071.0

My code:

import std.stdio;
import std.algorithm;
import std.array;

char[][] columns;


void main() {
  LINE:foreach(line; stdin.byLine()){
     if(line.startsWith("Interfaz")){
       writeln("IN HERE");
       columns = line.split(";");
       writeln(columns);               // Everything seems to be ok
       continue;
     } else{
       auto linedata = line.split(";");
       writefln("My line: %s", line);        // Fine.
       writefln("LineData: %s", linedata);   // Fine. Line data is ok
       writefln("Columns: %s", columns);     // Wrong!!! columns array
                                             // contains garbage data
                                             // from linedata
     }
   }
}

April 23, 2016

Re: Newbie: Error parsing csv file with very long lines

Posted by rikki cattermole
in reply to salvari


On 23/04/2016 10:40 PM, salvari wrote:
> Hello all!
>
> I'm trying to read a csv file (';' as separator) with very long lines.
>
> It seems to be really simple: I read the column names with no problem.
> But as soon as the program parses the first line of data, the array
> containing the column names seems to be overwritten.
>
> I'm using dmd: DMD64 D Compiler v2.071.0
>
> My code:
>
> import std.stdio;
> import std.algorithm;
> import std.array;
>
> char[][] columns;
>
>
> void main() {
>   LINE:foreach(line; stdin.byLine()){
>      if(line.startsWith("Interfaz")){
>        writeln("IN HERE");
>        columns = line.split(";");
>        writeln(columns);               // Everything seems to be ok
>        continue;
>      } else{
>        auto linedata = line.split(";");
>        writefln("My line: %s", line);        // Fine.
>        writefln("LineData: %s", linedata);   // Fine. Line data is ok
>        writefln("Columns: %s", columns);     // Wrong!!! columns array
>                                              // contains garbage data
>                                              // from linedata
>      }
>    }
> }

It's probably reusing a buffer, so the memory behind line gets overwritten when the next line is read.
columns = line.dup.split(";");
should fix it.
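For reference, here is a minimal sketch of the original loop with the .dup fix applied. The helper name readColumns is made up for this illustration and is not from the thread; it assumes, as in the original post, that the header line starts with "Interfaz".

```d
import std.algorithm : startsWith;
import std.array : split;
import std.stdio : File, stdin, writeln;

/// Hypothetical helper (not from the thread): returns the header columns
/// of a ';'-separated input whose header line starts with "Interfaz".
char[][] readColumns(File input)
{
    char[][] columns;
    foreach (line; input.byLine())
    {
        if (line.startsWith("Interfaz"))
        {
            // line is a view into byLine's reusable buffer; .dup makes a
            // private copy, so the slices stored in columns stay valid.
            columns = line.dup.split(";");
        }
        else
        {
            // Slices of line are fine here as long as they are only used
            // before the next iteration overwrites the buffer.
            auto linedata = line.split(";");
            // ... process linedata within this iteration ...
        }
    }
    return columns;
}

void main()
{
    writeln(readColumns(stdin));
}
```

Only the header is duplicated, so the data lines still avoid one allocation per line.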

On Saturday, 23 April 2016 at 10:57:04 UTC, salvari wrote:
> Fixed!!!
>
> Thanks a lot. :-)
>
>
> But I have to think about this. I don't understand the failure.

stdin.byLine() reuses its buffer, so the old arrays in columns point to the data in byLine's buffer, and they get overwritten by subsequent calls.

Also, if you're trying to parse CSV, check out std.csv.

From the docs:

string str = "Hello;65;63.63\nWorld;123;3673.562";
struct Layout
{
    string name;
    int value;
    double other;
}

auto records = csvReader!Layout(str,';');

foreach(record; records)
{
    writeln(record.name);
    writeln(record.value);
    writeln(record.other);
}
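Since the file in the original post has a header row, a variant of the std.csv approach that reads column names from the first line may be closer to what is needed. This is only a sketch: the helper name parseWithHeader and the sample data are invented for illustration, and it assumes the csvReader overload that takes null as the header argument.

```d
import std.csv : csvReader;
import std.stdio : writeln;

/// Hypothetical helper (not from the thread): parses ';'-separated text
/// whose first row holds the column names.
string[string][] parseWithHeader(string text)
{
    string[string][] rows;
    // Passing null as the header asks csvReader to take the column names
    // from the first row; each record is then keyed by column name.
    foreach (record; csvReader!(string[string])(text, null, ';'))
        rows ~= record.dup; // defensive copy of each record
    return rows;
}

void main()
{
    // Sample data made up for this sketch.
    auto rows = parseWithHeader("name;value;other\nHello;65;63.63\nWorld;123;3673.562");
    foreach (row; rows)
        writeln(row["name"], " -> ", row["value"]);
}
```

Every field comes back as a string here; the struct-based form from the docs is the better fit when the column types are known in advance.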

On 23/04/2016 10:57 PM, salvari wrote:
> Fixed!!!
>
> Thanks a lot. :-)
>
>
> But I have to think about this. I don't understand the failure.

.dup duplicates memory.
What this means is that it allocates a new block of memory and copies the values across.

What byLine does is read up to \n and copy that into a buffer of memory.
Then you get access to said buffer, aka line.
So it reuses the memory containing said line, meaning no allocations beyond the first one and any growth of the buffer.
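The effect can be reproduced without any I/O. The sketch below is an illustration only: simulateNextRead is a made-up stand-in for a reader that reuses its buffer, and it assumes the new line is no longer than the buffer.

```d
import std.array : split;
import std.stdio : writeln;

/// Hypothetical stand-in for a reader that reuses its buffer: overwrites
/// the start of buffer with nextLine instead of allocating new memory.
/// Assumes nextLine.length <= buffer.length.
void simulateNextRead(char[] buffer, string nextLine)
{
    buffer[0 .. nextLine.length] = nextLine[];
}

void main()
{
    char[] buffer = "name;value".dup;

    char[][] aliased = buffer.split(";");     // slices into buffer itself
    char[][] copied  = buffer.dup.split(";"); // slices into a private copy

    simulateNextRead(buffer, "12345;6789");

    writeln(aliased); // reflects the overwritten buffer
    writeln(copied);  // still holds the original fields
}
```

The slices in aliased keep their old offsets and lengths, which is why the columns array in the original post ended up showing fragments of later lines.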

On Saturday, 23 April 2016 at 11:18:08 UTC, rikki cattermole wrote:
> On 23/04/2016 10:57 PM, salvari wrote:
>> Fixed!!!
>>
>> Thanks a lot. :-)
>>
>>
>> But I have to think about this. I don't understand the failure.
>
> .dup duplicates memory.
> What this means is, it allocates a new block of memory and copies the values across.
>
> What byLine does is, read up to \n and copies it into a buffer of memory.
> Then you get access to said buffer aka line.
> So it reuses the memory containing said line, meaning no allocations beyond the first and growth of it.

Now I understand. Slices are still biting me every now and then.

On Saturday, 23 April 2016 at 11:13:19 UTC, Nicholas Wilson wrote:
> On Saturday, 23 April 2016 at 10:57:04 UTC, salvari wrote:
>> Fixed!!!
>>
>> Thanks a lot. :-)
>>
>>
>> But I have to think about this. I don't understand the failure.
>
> stdin.byLine() reuses its buffer. so the old arrays in columns point to the data in byLine's buffer and they get overwritten by subsequent calls.
>
> Also if you're trying to parse csv check out std.csv
>
> from the docs
>
> string str = "Hello;65;63.63\nWorld;123;3673.562";
> struct Layout
> {
>     string name;
>     int value;
>     double other;
> }
>
> auto records = csvReader!Layout(str,';');
>
> foreach(record; records)
> {
>     writeln(record.name);
>     writeln(record.value);
>     writeln(record.other);
> }

Thanks for your clue on std.csv! I think I will use it a lot. I totally missed it.

On Saturday, 23 April 2016 at 10:40:13 UTC, salvari wrote:
> It seems to be really simple, I read the columns name with no problem. But as soon as the program parses the first line of data, the array containing the columns names seems to be overwrited.

Another possibility not yet mentioned is to change
foreach(line; stdin.byLine())
into
foreach(line; stdin.byLineCopy())
to make the older lines' contents available after you read the next line.
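As a rough sketch of the byLineCopy variant: because every line is a fresh copy, all rows can be kept around after the loop. The helper name readAllRows and the temporary file contents are made up for this example.

```d
import std.algorithm : map;
import std.array : array, split;
import std.stdio : File, writeln;

/// Hypothetical helper (not from the thread): reads every line of a
/// ';'-separated input and keeps all of the fields.
string[][] readAllRows(File input)
{
    // byLineCopy hands out a newly allocated string per line, so the
    // slices produced by split remain valid after the next line is read.
    return input.byLineCopy()
                .map!(line => line.split(";"))
                .array;
}

void main()
{
    // Temporary file with made-up contents, used only for demonstration.
    auto f = File.tmpfile();
    f.write("Interfaz;a;b\n1;2;3\n");
    f.rewind();
    writeln(readAllRows(f));
}
```

The trade-off is one allocation per line, which byLine avoids by reusing its buffer.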
