Schroedinger's Ranges
Thread participants (Jun 03, 2021): vacuum_tube, Paul Backus, kdevel, Mike Parker, WebFreak001
June 03, 2021

I've been trying to make a struct for CSV parsing and manipulating. The code was as follows:

import std.algorithm : map;
import std.array : array, split;
import std.stdio : File, writeln;

struct CSVData(bool HeaderFromFirstLine)
{
	char[][] header = [];
	char[][][] rest = [];

	this(string filename)
	{
		auto tmp = File(filename).byLine();
		
		if(HeaderFromFirstLine)
		{
			this.header = CSVData.parseCSV(tmp.front()).array;
			tmp.popFront();
		}

		this.rest = tmp.map!(e => parseCSV(e)).array;
	}

	static char[][] parseCSV(char[] str)
	{
		char[][] tmp = split(str, ",");
		return tmp;
	}
	
	void print()
	{
		writeln(this.header);
		foreach(e; this.rest)
			writeln(e);
	}
}

void main()
{
	auto data = CSVData!true("testdata");
	data.print();
}

The "testdata" text file looked like this:

10,15,Hello world
stuff,,more stuff

And the output from running it looked like this:

["st", "ff", ",more stuff"]
["stuff", "", "more stuff"]

As you can see, the header field is not printing correctly. In an attempt to debug, I added several writelns to the constructor:

this(string filename)
{
	auto tmp = File(filename).byLine();
	
	if(HeaderFromFirstLine)
	{
		this.header = CSVData.parseCSV(tmp.front()).array;
		tmp.popFront();
		writeln(this.header);
	}

	this.rest = tmp.map!(e => parseCSV(e)).array;
	writeln(this.header);
}

This produced the following output:

["10", "15", "Hello world"]
["st", "ff", ",more stuff"]
["st", "ff", ",more stuff"]
["stuff", "", "more stuff"]

I then tried commenting out the offending line (the one with the map) and got the expected result:

["10", "15", "Hello world"]
["10", "15", "Hello world"]
["10", "15", "Hello world"]

Finally, I replaced the offending line and called a different function on tmp:

writeln(tmp.front);

And got the following result:

["10", "15", "Hello world"]
stuff,,more stuff
["st", "ff", ",more stuff"]
["st", "ff", ",more stuff"]

So it appears that observing or modifying tmp somehow modifies header, despite not interacting with it in any visible way.

What is the reason for this? I'm guessing it either has to do with the internals of ranges, or that the arrays were messing up somehow, but I'm not sure.

Thanks in advance!

June 03, 2021

On Thursday, 3 June 2021 at 00:39:04 UTC, vacuum_tube wrote:

> I've been trying to make a struct for CSV parsing and manipulating. The code was as follows:
>
> struct CSVData(bool HeaderFromFirstLine)
> {
> 	char[][] header = [];
> 	char[][][] rest = [];
>
> 	this(string filename)
> 	{
> 		auto tmp = File(filename).byLine();
>
> 		if(HeaderFromFirstLine)
> 		{
> 			this.header = CSVData.parseCSV(tmp.front()).array;
> 			tmp.popFront();
> 		}
>
> 		this.rest = tmp.map!(e => parseCSV(e)).array;
> 	}
>
> [...]
>
> The "testdata" text file looked like this:
>
> 10,15,Hello world
> stuff,,more stuff
>
> And the output from running it looked like this:
>
> ["st", "ff", ",more stuff"]
> ["stuff", "", "more stuff"]

File.byLine overwrites the previous line's data every time it reads a new line. If you want to store each line's data for later use, you need to use byLineCopy instead.
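
To make the difference concrete, here is a minimal sketch, not the original poster's code. It assumes a scratch file named testdata_demo.csv with the same two lines as "testdata"; the file name and the helper readRows are made up for illustration.

```d
import std.algorithm : map;
import std.array : array, split;
import std.file : remove, write;
import std.stdio : File, writeln;

// Hypothetical helper: reads a CSV file into rows of immutable strings.
string[][] readRows(string path)
{
    // byLineCopy allocates a fresh string for every line, so slices taken
    // from one line stay valid after the range advances.
    return File(path).byLineCopy().map!(line => line.split(",")).array;
}

void main()
{
    // Assumed scratch file reproducing the "testdata" contents.
    enum path = "testdata_demo.csv";
    write(path, "10,15,Hello world\nstuff,,more stuff\n");
    scope (exit) remove(path);

    // byLine: the fields are slices into an internal buffer that
    // later reads are allowed to overwrite.
    auto lines = File(path).byLine();
    char[][] aliased = lines.front.split(",");
    lines.popFront();
    writeln(aliased); // may no longer show the first line's fields

    // byLineCopy: every row keeps its own data.
    auto rows = readRows(path);
    writeln(rows[0]); // ["10", "15", "Hello world"]
    writeln(rows[1]); // ["stuff", "", "more stuff"]
}
```

Note that byLineCopy yields string by default, so the fields come out as string[] rather than char[][].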

June 03, 2021

On Thursday, 3 June 2021 at 00:39:04 UTC, vacuum_tube wrote:

> I've been trying to make a struct for CSV parsing and manipulating. The code was as follows:
>
> struct CSVData(bool HeaderFromFirstLine)
> {
> 	char[][] header = [];
> 	char[][][] rest = [];
>
> [...]

In addition to the other comment: you probably want to use string (immutable(char)[]) instead of char[] here, since you want your data to stay the same and not be modified after assignment.

If you replace them with string and mark your code @safe, the compiler will tell you where you try to assign char[] data that may still be modified. In those cases, call .idup to duplicate the data and make it persistent.
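
A rough sketch of that suggestion, using an in-memory buffer instead of a file so it stays short. The name parseCsvLine is made up here; it is a hypothetical string-returning variant of the original parseCSV.

```d
import std.algorithm : map;
import std.array : array, split;
import std.stdio : writeln;

// Hypothetical @safe variant of parseCSV: it accepts any character slice
// and returns immutable copies of the fields.
string[] parseCsvLine(const(char)[] line) @safe
{
    // .idup duplicates each field, so the result no longer aliases `line`.
    return line.split(",").map!(field => field.idup).array;
}

void main()
{
    // Stand-in for the reusable buffer that byLine hands out.
    char[] buffer = "10,15,Hello world".dup;
    string[] header = parseCsvLine(buffer);

    // Overwriting the buffer does not affect the duplicated fields.
    buffer[] = 'x';
    writeln(header); // ["10", "15", "Hello world"]

    // string direct = buffer; // rejected: char[] does not convert to string
}
```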

June 03, 2021

On Thursday, 3 June 2021 at 01:22:14 UTC, Paul Backus wrote:

>> auto tmp = File(filename).byLine();
>
> File.byLine overwrites the previous line's data every time it reads a new line. If you want to store each line's data for later use, you need to use byLineCopy instead.

a) What is the rationale behind not making byLineCopy the default?

b) With byLineCopy, the code does not compile:

csv.d(17): Error: function csv.CSVData!true.CSVData.parseCSV(char[] str) is not callable using argument types (string)
csv.d(17): cannot pass argument tmp.front() of type string to parameter char[] str
csv.d(21): Error: function csv.CSVData!true.CSVData.parseCSV(char[] str) is not callable using argument types (string)
csv.d(21): cannot pass argument e of type string to parameter char[] str
[...]/../../src/phobos/std/algorithm/iteration.d(525): instantiated from here: MapResult!(__lambda2, ByLineCopy!(immutable(char), char))
csv.d(21): instantiated from here: map!(ByLineCopy!(immutable(char), char))
csv.d(40): instantiated from here: CSVData!true

c) This reminds me of the need to add dups here and there, and of "helping the compiler" [1].

[1] https://wiki.c2.com/?HelpingTheCompilerIsEvil

June 03, 2021

On Thursday, 3 June 2021 at 10:18:25 UTC, kdevel wrote:

> a) What is the rationale behind not making byLineCopy the default?

byLine was the original implementation. byLineCopy was added later after the need for it became apparent.

June 03, 2021

On Thursday, 3 June 2021 at 10:30:24 UTC, Mike Parker wrote:

> On Thursday, 3 June 2021 at 10:18:25 UTC, kdevel wrote:
>> a) What is the rationale behind not making byLineCopy the default?
>
> byLine was the original implementation. byLineCopy was added later after the need for it became apparent.

See:

https://forum.dlang.org/post/lg4l7s$11rl$1@digitalmars.com

June 03, 2021
>>> a) What is the rationale behind not making byLineCopy the default?
>>
>> byLine was the original implementation. byLineCopy was added later after the need for it became apparent.
>
> See:
>
> https://forum.dlang.org/post/lg4l7s$11rl$1@digitalmars.com

THX. BTW byLineCopy defaults to immutable char. That's why one has to use

auto tmp = File(filename).byLineCopy!(char, char);

or

auto tmp = File(filename).byLine.map!dup;
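
For illustration, a small sketch of the first option. It assumes a scratch file named mutable_demo.csv, and the helper readMutableRows is made up; parseCSV has the same signature as in the original post.

```d
import std.array : split;
import std.file : remove, write;
import std.stdio : File, writeln;

// Same signature as the original parseCSV: it needs a mutable char[].
char[][] parseCSV(char[] str)
{
    return split(str, ",");
}

// Hypothetical helper: collects every line as mutable, independently
// allocated fields.
char[][][] readMutableRows(string path)
{
    char[][][] rows;
    // Template arguments are (Terminator, Char); Char = char makes front
    // return a freshly allocated char[] instead of a string.
    foreach (line; File(path).byLineCopy!(char, char)())
        rows ~= parseCSV(line);
    return rows;
}

void main()
{
    // Assumed scratch file reproducing the "testdata" contents.
    enum path = "mutable_demo.csv";
    write(path, "10,15,Hello world\nstuff,,more stuff\n");
    scope (exit) remove(path);

    auto rows = readMutableRows(path);
    writeln(rows[0]); // ["10", "15", "Hello world"]
    writeln(rows[1]); // ["stuff", "", "more stuff"]
}
```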

June 03, 2021

On 6/3/21 9:00 AM, kdevel wrote:
>>>> a) What is the rationale behind not making byLineCopy the default?
>>>
>>> byLine was the original implementation. byLineCopy was added later after the need for it became apparent.
>>
>> See:
>>
>> https://forum.dlang.org/post/lg4l7s$11rl$1@digitalmars.com
> 
> THX. BTW byLineCopy defaults to immutable char. That's why one has to use
> 
>      auto tmp = File(filename).byLineCopy!(char, char);
> 
> or
> 
>      auto tmp = File(filename).byLine.map!dup;
> 
> 

I was going to suggest using byLineCopy!(char, char), because the second option with map makes a copy every time you call front.
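
The extra copies are easy to see with an in-memory range. This is only a sketch: countCopies is a made-up helper, and a counting lambda stands in for plain dup.

```d
import std.algorithm : map;
import std.stdio : writeln;

// Hypothetical helper: counts how often the mapping function runs when
// front is read `frontCalls` times without advancing the range.
size_t countCopies(size_t frontCalls)
{
    size_t copies;
    char[][] lines = ["10,15,Hello world".dup];

    // map is lazy and does not cache: the lambda runs on every front access.
    auto copied = lines.map!((char[] line) { ++copies; return line.dup; });

    foreach (i; 0 .. frontCalls)
    {
        auto unused = copied.front; // each read triggers a fresh dup
    }
    return copies;
}

void main()
{
    writeln(countCopies(1)); // 1
    writeln(countCopies(2)); // 2
}
```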

And, my goodness, that is backwards for the template parameters. The terminator type should be determined by IFTI; it should never have been the first template parameter!

-Steve