Thread overview
[phobos] CSVRange: RFC
Jan 29, 2011
David Simcha
Jan 30, 2011
Jesse Phillips
Jan 30, 2011
David Simcha
Jan 30, 2011
Jesse Phillips
Jan 30, 2011
David Simcha
Jan 30, 2011
Jesse Phillips
Jan 31, 2011
Jesse Phillips
Feb 01, 2011
Jesse Phillips
January 29, 2011
I've written a small module for reading CSV and similar delimited files.  I've been meaning to do this for a while.  Basically, it allows reading a CSV file with O(1) memory usage (i.e. it can be parsed one character at a time) to a range of ranges of cells.  Quotes, escaped quotes, etc. are handled properly.  I tested it on a nasty CSV file produced by Affymetrix, and it works rather well.

CSVRange also allows for iteration over rows as a range of structs.  For example, let's say you had a file:

Height,Weight,Shoe Size
6.5,210,13
...

You could read this file lazily into a range of structs with something like:

struct Person
{
     float height;
     uint weight;
     uint shoeSize;
}

auto csvRange = csvFile(someCharacterRange, ',');
auto structs = csvStructRange(csvRange, ["Height", "Weight", "Shoe Size"]);

// Iterate lazily through the rows.
foreach(s; structs) {
     // Do stuff.
}

Note that this still works even if you have tons of columns you don't care about in the file.

Code:

http://dsource.org/projects/scrapple/browser/trunk/csvRange/csvRange.d

Docs:

http://cis.jhu.edu/~dsimcha/csvRange.html


January 29, 2011
That is about the same as what I have, though I was attempting to handle custom delimiters for fields, records, and quotes.

https://github.com/he-the-great/JPDLibs/tree/csv

But about your code. I was getting a Range Violation with your unittests active. Also you don't handle a quoted empty field correctly. Otherwise you pass the unittest I ported from mine:

https://gist.github.com/802502




-- 
Liberty means responsibility. That is why most men dread it. - George Bernard Shaw
January 30, 2011
Jesse,

I was unaware of your efforts.  At first glance, your lib looks pretty good.  I definitely think Phobos needs a real CSV parser, as I seem to write ad-hoc ones all the time.  Since your module mostly looks a little further along and better engineered than mine (mine was really just a prototype that I spent about half a day on), maybe we should focus on getting yours up to Phobos quality.  The one major feature yours is missing, though, is the ability for csvText() to extract a subset of the available columns by header.  I also like the idea of doing things by column header instead of hard coding the column order because it's less brittle if the layout changes.

--David Simcha


January 30, 2011
Looks like a good candidate for std.format, but I think it's a ways from getting there.

Code review:

#50 RefRange is really ForceInputRange. What do you need it for? It's unusual to want to reduce the capabilities of a range.

#78 isCharRange is incorrect. Correct version:

enum bool isCharRange = isInputRange!R && isSomeChar!(ElementType!R);

#100 Why not struct?

#102 private Appender!(char[]) _front;

#305 No comment?

#306 This is CsvRange not a CsvFile as it builds on another range (that may or may not be backed up by a file)

#386 No comment?

#387 The name is confusing - it's a class with struct in its name.

#582 We also need a way to read CSV files into string arrays in case the user just wants to do the parsing and decide on typing later. Seemingly the current design forces choice of type before parsing.
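
As an illustration of the corrected isCharRange check from point #78, here is a minimal sketch of how the trait could be declared and checked at compile time. It assumes a current Phobos (std.range.primitives and std.traits); the static asserts and the empty main are only there for demonstration and are not taken from the reviewed module.

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : isSomeChar;

/// True if R is an input range whose elements are char, wchar or dchar.
template isCharRange(R)
{
    enum bool isCharRange = isInputRange!R && isSomeChar!(ElementType!R);
}

// Strings are input ranges of characters.
static assert(isCharRange!string);
static assert(isCharRange!(dchar[]));

// Ranges of non-character elements and non-range types are rejected.
static assert(!isCharRange!(int[]));
static assert(!isCharRange!int);

void main() {}
```

Such a trait would typically appear in a template constraint, e.g. `if (isCharRange!R)`, on the function that constructs the CSV range.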

Documentation review:

* No spellchecking (e.g. 'teh')

* Malformatted Wikipedia URL

* No need for copying the license, a URL is sufficient.

* O(1) is a bit inaccurate - memory consumed is proportional to that of one element. What you might have meant is that it does not depend on the number of lines in a file or on the number of CSV elements in a line.

* A few artifacts have no examples.

* The example should compile. getCharRange() does not exist. FWIW your design should work with byLine().

* It's unclear what colHeaders do from the code and the documentation.


Andrei

January 30, 2011
Oh, I take no offense as I didn't make any effort to announce my work, as there were still things to add and I didn't want my public interface to be final. I was quite surprised that it came up as an answer on SO two weeks after creating it[1].

You are correct in that I have completely ignored the header. It was the next thing I wanted to work on, but then I realized that I wasn't handling \r\n and since my separators aren't hard coded I needed to make it possible to pass multiple separators for each and handle a range of dchar instead of just one. The latter resulted in a nice restructuring, but the former is giving me trouble.

I have a work in progress[2] which makes use of the new countUntil, with a small modification[3]. And I believe that if startsWith were updated to take a range of ranges, then this implementation should be able to handle multiple separators much more easily[4].

All that said, I wouldn't really say it is ready for review. I hope the two BugZilla entries are seen as good additions so that I don't need to deal with them.

1. http://stackoverflow.com/questions/4457481/extracting-values-from-comma-separated-lists/4457515#4457515

2. https://github.com/he-the-great/JPDLibs/tree/separator

3. http://d.puremagic.com/issues/show_bug.cgi?id=5507

4. http://d.puremagic.com/issues/show_bug.cgi?id=5508

January 30, 2011
Thanks for the review.  As I said before, this was a prototype.  I'm aware it's a ways from being Phobos quality, but I posted here anyway because I wanted to gauge whether there was sufficient interest and whether the high-level design was good enough for it to be worth cleaning up the details.  Responses to specific points below:

On 1/30/2011 2:11 PM, Andrei Alexandrescu wrote:
> Looks like a good candidate for std.format, but I think it's a ways from getting there.
>
> Code review:
>
> #50 RefRange is really ForceInputRange. What do you need it for? It's unusual to want to reduce the capabilities of a range.

Basically, when CsvLine makes changes to the range, the changes need to be visible from CsvFile, necessitating reference semantics. Furthermore, the assumption is that this lib will mostly be used with input ranges anyhow, so I don't see how that's much of a restriction in practice.
>
> #78 isCharRange is incorrect. Correct version:
>
> enum bool isCharRange = isInputRange!R && isSomeChar!(ElementType!R);

Good point.

>
> #100 Why not struct?

Again, because it needs reference semantics so that changes made to it (popping, etc.) by the user of the library are visible to the CsvFile struct.

>
> #102 private Appender!(char[]) _front;

I thought we were thinking of making Appender a nested struct defined in a function.
>
> #305 No comment?

It's the type returned by csvFile().  The csvFile() function pretty much tells you what you need to know about how to use a CsvFile object.
>
> #306 This is CsvRange not a CsvFile as it builds on another range (that may or may not be backed up by a file)

Good point.

>
> #386 No comment?

Again, it's supposed to be instantiated from the csvStructRange() function, so this is where the documentation is.
>
> #387 The name is confusing - it's a class with struct in its name.

Agreed.  I was hoping someone would come up with a better name.  Since I'm
>
> #582 We also need a way to read CSV files into string arrays in case the user just wants to do the parsing and decide on typing later. Seemingly the current design forces choice of type before parsing.

Trivial with the rest of the lib, assuming you have a character range called charRange, though it might deserve a convenience function.

string[][] res;
auto csvIter = csvFile(charRange, ',');
foreach(row; csvIter) {
     res ~= array(map!"a.idup"(row));
}

>
> Documentation review:
>
> * No spellchecking (e.g. 'teh')
>
> * Malformatted Wikipedia URL
>
> * No need for copying the license, a URL is sufficient.
>

Good points.

> * O(1) is a bit inaccurate - memory consumed is proportional to that of one element. What you might have meant is that it does not depend on the number of lines in a file or on the number of CSV elements in a line.

Ok, very good point.  That's exactly what I meant.
>
> * A few artifacts have no examples.
>
> * The example should compile. getCharRange() does not exist. FWIW your design should work with byLine().

I would have liked to make it work with byLine(), but the problem is that a CSV file can have a newline character inside quotation marks, in which case this does not mean "start a new row".  Therefore, the proper way to parse a CSV file is character-by-character, not line by line. Ideally, std.stdio.File should have a .byChar range.  As far as I can tell, it doesn't.  Should I add one?  (I guess the proper way to do it would be to use byChunk() to buffer things and then pop off one character at a time from this.)
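
The byChunk()-based buffering described above can be sketched roughly as follows. This is only an illustration under stated assumptions: byCharSketch is a hypothetical name, not a std.stdio API; it relies on a current Phobos in which byChunk() can be flattened with std.algorithm's joiner; and each byte is simply viewed as a UTF-8 code unit without decoding.

```d
import std.algorithm : equal, joiner, map;
import std.stdio : File;

/// Hypothetical helper (not part of std.stdio): exposes a file as a lazy
/// input range of char, read in blocks of chunkSize bytes.
auto byCharSketch(File f, size_t chunkSize = 4096)
{
    return f.byChunk(chunkSize)          // range of ubyte[] buffers
            .joiner                      // flattened to a range of ubyte
            .map!(b => cast(char) b);    // each byte viewed as a code unit
}

void main()
{
    import std.file : remove, tempDir, write;
    import std.path : buildPath;

    // A record whose quoted field contains a newline, which is why
    // line-by-line reading is not enough for CSV.
    auto path = buildPath(tempDir(), "bychar_sketch.csv");
    write(path, "a,\"b\nc\"\n");
    scope (exit) remove(path);

    // A tiny chunk size forces the range to cross chunk boundaries.
    assert(equal(byCharSketch(File(path), 3), "a,\"b\nc\"\n"));
}
```

The resulting range is a plain input range, so a character-at-a-time CSV parser could consume it directly.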

>
> * It's unclear what colHeaders do from the code and the documentation.
I'm not sure how to make it much more clear.  colHeaders selects the columns to be read from the CSV file by name.  It's assumed that the first row is a "header" and gives the name of each column.  The first entry of colHeaders should be what you want read into the first field of the struct, etc.
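
One way to picture the colHeaders lookup is the sketch below. It is not the code from csvRange.d: columnIndices is a hypothetical helper, and the extra "ID" and "Age" columns are made up for the example. For each requested name it finds the position of that column in the header row, so that the i-th struct field can be filled from the matching cell of every later row.

```d
import std.exception : enforce;

/// Hypothetical helper: for each name in `wanted`, the index of the
/// column with that name in the header row.
size_t[] columnIndices(const string[] header, const string[] wanted)
{
    size_t[] indices;
    foreach (name; wanted)
    {
        bool found = false;
        foreach (i, h; header)
        {
            if (h == name)
            {
                indices ~= i;
                found = true;
                break;
            }
        }
        enforce(found, "No such column: " ~ name);
    }
    return indices;
}

void main()
{
    // Header row of a file with columns the caller does not care about.
    auto header = ["ID", "Height", "Age", "Weight", "Shoe Size"];
    auto idx = columnIndices(header, ["Height", "Weight", "Shoe Size"]);

    // Field 0 comes from column 1, field 1 from column 3, field 2 from column 4.
    assert(idx == [1, 3, 4]);
}
```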

>
>
> Andrei
>

January 30, 2011
In some ways your comments apply to my implementation too, so I'll answer assuming the latest design I have.

https://github.com/he-the-great/JPDLibs/tree/separator

And as David mentioned, it probably would be best to go with mine, but it too is not ready for inclusion.

On Sun, Jan 30, 2011 at 11:11 AM, Andrei Alexandrescu <andrei at erdani.com> wrote:
> Code review:
>
> #50 RefRange is really ForceInputRange. What do you need it for? It's unusual to want to reduce the capabilities of a range.
>
> #78 isCharRange is incorrect. Correct version:
>
> enum bool isCharRange = isInputRange!R && isSomeChar!(ElementType!R);
>
> #100 Why not struct?

Answering for all of the above as mine doesn't use any of these.

I make use of two ranges and a token function. Most of the work is done by csvNextToken and has the most unittests against it. I then build a range which iterates each token until the end of the record, this is struct Record. On top of that I have a range which iterates over each Record, this is RecordList.

I avoid the need to create reference semantics by having both Record and csvNextToken take a ref parameter of the Range.
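
To make the ref-parameter idea concrete, here is a much simplified sketch of a token function in that style. It is not the actual csvNextToken from JPDLibs: nextField is a hypothetical name, it only handles a single-character separator, '\n' as the record break, and doubled quotes as escapes. The point is that the caller's range is advanced in place, so no wrapper with reference semantics is needed.

```d
import std.array : appender;
import std.range.primitives;

/// Hypothetical, simplified stand-in for a CSV token function.
/// Reads one field from `input`, advancing the caller's range in place.
/// Stops in front of the separator or record break without consuming it.
string nextField(R)(ref R input, dchar sep = ',', dchar quote = '"')
if (isInputRange!R)
{
    auto field = appender!string();
    bool quoted = false;
    while (!input.empty)
    {
        dchar c = input.front;
        if (quoted)
        {
            input.popFront();
            if (c != quote)
                field.put(c);
            else if (!input.empty && input.front == quote)
            {
                field.put(quote);      // doubled quote is an escaped quote
                input.popFront();
            }
            else
                quoted = false;        // closing quote
        }
        else if (c == quote)
        {
            quoted = true;             // opening quote
            input.popFront();
        }
        else if (c == sep || c == '\n')
            break;                     // leave the delimiter for the caller
        else
        {
            field.put(c);
            input.popFront();
        }
    }
    return field.data;
}

void main()
{
    string text = "a,\"b \"\"x\"\",c\"\nnext";

    assert(nextField(text) == "a");
    assert(text.front == ',');         // the caller's range has moved
    text.popFront();

    assert(nextField(text) == "b \"x\",c");
    assert(text.front == '\n');        // end of the first record
}
```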

> #102 private Appender!(char[]) _front;

As my result may not be an array, I use concatenation. I do not have a constraint on this yet...

> #306 This is CsvRange not a CsvFile as it builds on another range (that may or may not be backed up by a file)

As said, mine are named Record and RecordList, and I believe these are good names as we aren't dealing with lines or files. The only concern I have is that Record has many meanings, though I believe I am using it correctly for CSV row data.

CsvRecord? DataRecord?

> #387 The name is confusing - it's a class with struct in its name.

I believe this is an implementation detail. I use csvText!CastTo(...) where CastTo can be a struct. It will be interesting to see what I do when I get heading support put in.

> #582 We also need a way to read CSV files into string arrays in case the user just wants to do the parsing and decide on typing later. Seemingly the current design forces choice of type before parsing.

Mine defaults to a slice of the original range, with the exception of quoted entries. A helper function to return a Range[][] could be made. But I think just leaving standard Range semantics is best.

> Documentation review:
>
> * O(1) is a bit inaccurate - memory consumed is proportional to that of one element. What you might have meant is that it does not depend on the number of lines in a file or on the number of CSV elements in a line.

I believe my new method has a better footprint than David's, but it no longer operates on an InputRange, but a range with slicing and appending. If there is no quoted data it just returns the slice of the needed data, otherwise it appends slices of the data to a new Range.

It may be useful to have an implementation which doesn't make use of these.

> * A few artifacts have no examples.

I haven't done much for examples either.

> * The example should compile. getCharRange() does not exist. FWIW your
> design should work with byLine().

As David said, you really can't do this. You can make it work, but it is more trouble than it is worth. I built my design off of the ideas found in Splitter.

Feedback and suggestions are welcome, but the code really hasn't been brought up to the commenting or coding standard it needs for easy understanding.
January 31, 2011
Without having studied the code closely, I could say that asking for an input range with slicing is quite a tall order that virtually restricts you to random-access ranges.

An input range only allows you to move one character forward and never save your position or go back. A range with slicing in this context means that we can confidently calculate how much of the range we need to take, and that automatically requires the range to be able to go forward and then restart from a previous position.

Regarding overall design and user-level API, it may be reasonable to assume that:

1. CSV readers are usually used for reading an entire file through to the end, so optimizations that are mostly applicable to reading one single line are unnecessary. At the same time, optimizations for repeated use of empty/front/popFront are likely to be beneficial.

2. An entire line's representation as strings must fit in memory as a requirement.

As such, David's implementation that works on a character stream is the most general and the theoretically perfect one because one character of lookahead is all CSV needs. At the same time, if an implementation assuming (1) and (2) above has considerable advantages (speed, convenience) then it might trump the theoretically perfect one.

David, LockingTextReader in std.stdio implements character level input straight from a file. It does so very slowly by means of getc/ungetc, which is the only portable way. I know how to make it faster on Linux and OSX and Walter knows how to make it faster on Windows, but we never got around to it.


Andrei

January 31, 2011
On Sun, Jan 30, 2011 at 10:52 PM, Andrei Alexandrescu <andrei at erdani.com> wrote:
> Without having studied the code closely, I could say that asking for an input range with slicing is quite a tall order that virtually restricts you to random-access ranges.

I agree. The two benefits I saw were returning the original content for probably most data, and an easier implementation for separators which are more than one character.

> An input range only allows you to move one character forward and never save your position or go back. A range with slicing in this context means that we can confidently calculate how much of the range we need to take, and that automatically requires the range to be able to go forward and then restart from a previous position.

True, ForwardRange with slicing and appending.

> Regarding overall design and user-level API, it may be reasonable to assume that:
>
> 1. CSV readers are usually used for reading an entire file through to the end, so optimizations that are mostly applicable to reading one single line are unnecessary. At the same time, optimizations for repeated use of empty/front/popFront are likely to be beneficial.

I could see streaming an infinite amount of data too, though CSV is probably not the way to do that.

I think optimizing for repeated use of empty/front will not depend on the approach taken.

> 2. An entire line's representation as strings must fit in memory as a requirement.

I don't think either implementation requires the entire record to be in memory in string form. Both will operate on each field value and stop processing before the entire record is read.

> As such, David's implementation that works on a character stream is the most general and the theoretically perfect one because one character of lookahead is all CSV needs. At the same time, if an implementation assuming (1) and (2) above has considerable advantages (speed, convenience) then it might trump the theoretically perfect one.

If we find benefits to my second approach, I think it would be worth having an implementation for both. The InputRange version would be restricted to just CSV text which doesn't use custom separators.

I don't have my heart set on which one should be placed in Phobos; I just want to make it clear why I changed directions, especially since I think it will be most common to just read in an entire file anyway. But I agree it is very restrictive if an implementation to handle InputRange isn't available.
January 31, 2011
Ok, so to prevent theoretical debate on performance I decided to get a CSV file large enough to do testing. I didn't come across any with a quick Google search, so I wrote a program to generate one[1]; it just needs a list of words, though the file name is hard coded to english.words[2].

1. https://gist.github.com/805392
2. ftp://nic.funet.fi/pub/unix/security/dictionaries/DEC-collection/

Anyway, looping over a 25MB file took about 10 sec with mine while David's took only about 5 sec. With some digging around I found that a Record did not modify the RecordList data; storing a ref parameter is of no use. A quick attempt to use pointers failed. To check my hypothesis I modified[3] the code to not use Record when given a struct. The result was about half a second faster than David's. Interestingly, optimizations will slow mine down by one second and do nothing to David's.

I don't really know the best way to test memory footprint.

3. https://github.com/he-the-great/JPDLibs/tree/csvoptimize

Also my implementation accepts a heading now.