October 28, 2011
Piotr Szturmaj Wrote:

> dsimcha wrote:
> > Formal review of Jesse Phillips's CSV parser module begins today and runs through November . Following that, a vote will take place for one week. Please post any comments about this library in this thread.
> >
> > Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html
> >
> > Code:
> > https://github.com/he-the-great/phobos/tree/csv
> 
> Could csvText!Struct(...) be generalized to also support Tuples and (eventually) classes?

I restricted to struct since a class had to be new'd. I and then realized most likely it won't be the final destination. It is better just letting the user move the items over themselves.

Tuples should just work since they are just struct. Will double check.
October 28, 2011
Ary Manzana Wrote:

> I like the idea of reading a CSV into a struct, or treating all fields as ints.
> 
> But, in my experience, CSVs are never perfect... because humans fill them, and humans aren't perfect or don't know "Oh, of course I must fill only numbers in this column, otherwise programs will explode".
> 
> So although it's a nice functionality, I think just returning everything as strings is more than enough.

If you are dealing with incorrect data you must deal with it. If you expect everything to be an int, and it isn't what are you going to do? I should check how easy it is to recover from an error.
October 28, 2011
Vladimir Panteleev Wrote:

> On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha@yahoo.com> wrote:
> 
> > Docs: http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html
> 
> Documentation should have examples of basic usage at the top. It should be possible to figure out how to use a CSV parser within 10 seconds of looking at the docs.

Sounds good. What kind of example should be first? Maybe something that just gives you an itterable:

auto records = csvText(text);

> It looks like this is only a CSV parser, but not a CSV formatter? That's pretty weird. It'd be like std.xml or std.json only having reading functionality.

I have done quite a bit of CSV output and none of them have been the same. The only similarity is surrounding with quotes and escaping quotes.

"\"" ~ text.replace("\"", "\"\"") ~ "\""

The problem is that how your data is stored and where it needs to go is different for each app. I don't see a reason to provide a output function where one must convert their data into a range of struct and specify an output range, when that conversion is the hard part.

I would consider a function that does escaping/quoting only if the element needs it. But such a function couldn't work on an Input Range.
October 28, 2011
Forgot to answer one.

dsimcha Wrote:

> Why should the order of the heading provided to csvText matter?  csvText should simply rearrange its results.

Because I do not store anymore information then requested. I must keep pumping out values as I receive them, so order can not be rearranged. With a struct, you must match the header to the order of fields. This is because I can not match it by type and requiring that your field names match the file heading limits what can be in a heading (spaces, punctuation).

Providing specialization for associative arrays also looks like a good option to resolve this.
October 28, 2011
On 10/28/11 1:14 PM, Jesse Phillips wrote:
> Ary Manzana Wrote:
>
>> I like the idea of reading a CSV into a struct, or treating all fields
>> as ints.
>>
>> But, in my experience, CSVs are never perfect... because humans fill
>> them, and humans aren't perfect or don't know "Oh, of course I must fill
>> only numbers in this column, otherwise programs will explode".
>>
>> So although it's a nice functionality, I think just returning everything
>> as strings is more than enough.
>
> If you are dealing with incorrect data you must deal with it. If you expect everything to be an int, and it isn't what are you going to do? I should check how easy it is to recover from an error.

You are right.

I see that you throw ConvException, because when some to!... fails it throws that. But you don't catch it. So you'd get something like "Can't convert 'hello' to an int", but you loose the information of which row and column caused the problem. So maybe catching it and throwing a more detailed exception...

Also, sometimes people fill out numbers with some comments, like "1 apple", so maybe a less strict parsing for numbers might be good (as an option).

Just some comments :)
October 28, 2011
Jacob Carlborg Wrote:

> If a header is specified, is it possible to iterate over the content as an associative array, something like this:
> 
> string str = "a,b,c\nHello,65,63.63\nWorld,123,3673.562";
> auto records = csvText(str, ["b"]);
> 
> foreach (header, value ; records) {}
> 
> Or accessing a value by header:
> 
> foreach (record ; records)
> {
>      auto value = record["a"];
>      auto val = record.b; // perhaps using opDispatch as well
> }
> 
> -- 
> /Jacob Carlborg

If I implemented it this way then the entire record would be read in when a header exists.

auto records = csvText!(string[string])(str, ["b"]);

Would that be acceptable?
October 28, 2011
Ary Manzana Wrote:

> You are right.
> 
> I see that you throw ConvException, because when some to!... fails it throws that. But you don't catch it. So you'd get something like "Can't convert 'hello' to an int", but you loose the information of which row and column caused the problem. So maybe catching it and throwing a more detailed exception...

I'll look into that.

> Also, sometimes people fill out numbers with some comments, like "1 apple", so maybe a less strict parsing for numbers might be good (as an option).

Loose rules are hard to setup to make everyone happy. I think this use case should be left to the user. I just want to get the information out quickly. Adding a row count shouldn't be too bad.
October 28, 2011
On 2011-10-28 19:07, Jesse Phillips wrote:
> Jacob Carlborg Wrote:
>
>> If a header is specified, is it possible to iterate over the content as
>> an associative array, something like this:
>>
>> string str = "a,b,c\nHello,65,63.63\nWorld,123,3673.562";
>> auto records = csvText(str, ["b"]);
>>
>> foreach (header, value ; records) {}
>>
>> Or accessing a value by header:
>>
>> foreach (record ; records)
>> {
>>       auto value = record["a"];
>>       auto val = record.b; // perhaps using opDispatch as well
>> }
>>
>> --
>> /Jacob Carlborg
>
> If I implemented it this way then the entire record would be read in when a header exists.
>
> auto records = csvText!(string[string])(str, ["b"]);
>
> Would that be acceptable?

That would be acceptable to me, don't know what how other would see it.

-- 
/Jacob Carlborg
October 28, 2011
On Fri, 28 Oct 2011 19:41:35 +0300, Jesse Phillips <jessekphillips+D@gmail.com> wrote:

> Vladimir Panteleev Wrote:
>
>> On Fri, 28 Oct 2011 16:18:27 +0300, dsimcha <dsimcha@yahoo.com> wrote:
>>
>> > Docs:
>> > http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html
>>
>> Documentation should have examples of basic usage at the top. It should be
>> possible to figure out how to use a CSV parser within 10 seconds of
>> looking at the docs.
>
> Sounds good. What kind of example should be first? Maybe something that just gives you an itterable:
>
> auto records = csvText(text);

That's not a very good example, since it doesn't show anything about how to use the result of the csvText call. If it's an iterable, the example should use it in a foreach loop.

IMO the example should be a very simple realistic use case, for example printing the data in a user-friendly way:
writefln("%s works as a %s and earns %d per year", ...);

>> It looks like this is only a CSV parser, but not a CSV formatter? That's
>> pretty weird. It'd be like std.xml or std.json only having reading
>> functionality.
>
> I have done quite a bit of CSV output and none of them have been the same. The only similarity is surrounding with quotes and escaping quotes.
>
> "\"" ~ text.replace("\"", "\"\"") ~ "\""
>
> The problem is that how your data is stored and where it needs to go is different for each app. I don't see a reason to provide a output function where one must convert their data into a range of struct and specify an output range, when that conversion is the hard part.
>
> I would consider a function that does escaping/quoting only if the element needs it. But such a function couldn't work on an Input Range.

I think you're overcomplicating this.

You already have a way of specifying the specific format of the CSV file (delimiters, quote characters) for the parser. Use the same idiom for the formatter.

The formatter should accept data in the same format (datatypes) as what's emitted by the parser. If you use functional idioms (ranges), it ought to be possible to pipe it all together neatly.

The specifics don't matter very much, but it should be possible to put some data (array of string arrays, array of structs or whatnot) into a .csv file in a single line of code. Surely this isn't hard?

-- 
Best regards,
 Vladimir                            mailto:vladimir@thecybershadow.net
October 28, 2011
On 10/28/2011 6:18 AM, dsimcha wrote:
> Formal review of Jesse Phillips's CSV parser module begins today and runs
> through November . Following that, a vote will take place for one week. Please
> post any comments about this library in this thread.
>
> Docs:
> http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html
>
> Code:
> https://github.com/he-the-great/phobos/tree/csv

First off, mucho thanks to Jesse for writing this. It's an important module for Phobos.

This is a review of the documentation:

Include this link: http://en.wikipedia.org/wiki/Comma-separated_values so the programmer can learn more about CSV.

"This parser will loosely follow the RFC-4180"

How exactly will it differ from RFC-4180?

"No exceptions are thrown due to incorrect CSV."

What happens, then, when the CSV is incorrect?

"csvText"

Be more specific about what csvText() returns.

I'm confused about the heading parameter to csvText().

"the Content of the input"

What is that? There's no parameter named "Content".

"RecordList"

"Range which provides access to CSV Records and Fields."

What is the Range? Do you mean RecordList is a Range? And what does "provides access" mean? If RecordList is a Range, why is it labeled a "List"?

"Array of the heading contained in the file."

Now I'm really confused as to what a heading is. I thought it was the first line? Now there's an array of them?

No examples given for constructors.

No documentation for front() or empty().

"Record"

"Returned by a RecordList when Contents is a non-struct."

RecordList is a struct, it does not return anything.

No examples.

"csvNextToken"

Example?