View mode: basic / threaded / horizontal-split · Log in · Help
November 11, 2011
Re: Review of Jesse Phillips's CSV Parser
On Fri, 11 Nov 2011 13:50:52 +0200, Don <nospam@nospam.com> wrote:

> Far more horrible than that -- in an old Excel from around 2000 the user  
> interface used US settings for reading short dates, if the day of the  
> month was 12 or less. Otherwise it uses the locale. But it always uses  
> the locale for displaying them.
>
> So 11/1, 12/1, 13/1 gets changed to 1 Nov, 1 Dec, 13 Jan.
>
> I wrote a macro called "November 9" (9/11) for swapping them back if  
> they were in the buggy range.

Wow. Reminds me of this:  
http://thedailywtf.com/Articles/TransAtlantic-Time-Trap.aspx

-- 
Best regards,
 Vladimir                            mailto:vladimir@thecybershadow.net
November 11, 2011
Re: Review of Jesse Phillips's CSV Parser
On Fri, 11 Nov 2011 02:17:40 -0800, Jonathan M Davis wrote:

> Actually, I'd argue that Row would be better than Record, since it _is_
> a row in a table. Personally, I'd find it to be more immediately clear
> that way. With Record, I have to figure out what the heck it is a record
> of, whereas Row is immediately obvious in this context.
> 
> - Jonathan M Davis

It _is_ a record of data, just as much as it is a row in a table. The RFC 
I reference continuously refers to records and never even says "row."
November 11, 2011
Re: Review of Jesse Phillips's CSV Parser
On Friday, November 11, 2011 15:25:29 Jesse Phillips wrote:
> On Fri, 11 Nov 2011 02:17:40 -0800, Jonathan M Davis wrote:
> > Actually, I'd argue that Row would be better than Record, since it _is_
> > a row in a table. Personally, I'd find it to be more immediately clear
> > that way. With Record, I have to figure out what the heck it is a record
> > of, whereas Row is immediately obvious in this context.
> > 
> > - Jonathan M Davis
> 
> It _is_ a record of data, just as much as it is a row in a table. The RFC
> I reference continuously refers to records and never even says "row."

Well, then you have a good argument for keeping it as Record. But spreadsheets 
are tables of rows and columns, and a CSV file is a spreadsheet. Personally, I 
find the term record overly vague and wouldn't use it for much of anything - 
not to mention that it makes me think of the analog music device rather than 
anything else. So, I don't particularly like the name Record. I wouldn't use 
it in reference to a DB either. I'd use the term row there as well. But others 
may not agree with me, and if the RFC uses the term record, then that's a 
definite argument for using Record instead of Row.

- Jonathan M Davis
November 11, 2011
Re: Review of Jesse Phillips's CSV Parser
On 11 November 2011 19:07, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> On Friday, November 11, 2011 15:25:29 Jesse Phillips wrote:
> > On Fri, 11 Nov 2011 02:17:40 -0800, Jonathan M Davis wrote:
> > > Actually, I'd argue that Row would be better than Record, since it _is_
> > > a row in a table. Personally, I'd find it to be more immediately clear
> > > that way. With Record, I have to figure out what the heck it is a
> record
> > > of, whereas Row is immediately obvious in this context.
> > >
> > > - Jonathan M Davis
> >
> > It _is_ a record of data, just as much as it is a row in a table. The RFC
> > I reference continuously refers to records and never even says "row."
>
> Well, then you have a good argument for keeping it as Record. But
> spreadsheets
> are tables of rows and columns, and a CSV file is a spreadsheet.
> Personally, I
> find the term record overly vague and wouldn't use it for much of anything
> -
> not to mention that it makes me think of the analog music device rather
> than
> anything else. So, I don't particularly like the name Record. I wouldn't
> use
> it in reference to a DB either. I'd use the term row there as well. But
> others
> may not agree with me, and if the RFC uses the term record, then that's a
> definite argument for using Record instead of Row.


+1 for row... although I appreciate the RFC says record all over the place.
That said though, upon reading it, I think that might be the worst RFC ever
written ;)
November 12, 2011
Re: Review of Jesse Phillips's CSV Parser
On Thu, 10 Nov 2011 22:17:10 -0500, Jesse Phillips <jessekphillips+d@gmail.com> wrote:
> On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:
>> On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips
>> <jessekphillips+d@gmail.com> wrote:
[snip]
>> I also have some API design concerns. Primarily, most of the type names
>> are overly broad and unclear. For example: Record, Records, Malformed
>> could just as easily be part of a database API. And because of this, not
>> only does one reading the code not know what the object is, but if
>> namespace conflicts can occur, then code reviewers will probably make
>> the wrong assumption half the time as to what Record is and not bother
>> looking it up.
>
> The reason it is name like what you would find in a database, is because
> they are the same thing. It is a row of data. In past discussion about
> existing name clashes it was concluded that was what modules, static, and
> named imports wore for.
>
> Malformed, I just don't have any clue what to actually call it. No
> CSVMalformed doesn't help.
>
> But then you go on to the point below, that you shouldn't even see them.
>
>> Furthermore, Records and Record, are essentially internal ranges/structs
>> returned by csvReader. In the tradition of std.algorithms these should
>> be defined inside the csvReader function (preferably) or as CsvReader /
>> CsvReader.Record.
>
> I was going to explain how wrong you are and how I tried having all the
> options part of csvReader but it just muddies up the clean API when
> customizing something. But actually I already found the solution with my
> previous updates and incorrectly chose to move Malformed customization
> out.
>
> Long story short the problems I see:
>
> * The documentation still needs to exist and will be a little harder to
> do without these
> * Records contains the heading field, this is abnormal for an InputRange
> and means it won't show as an auto link if it becomes a hidden range.
> * Currently you can find exactly which function call result in a specific
> error, if I hide the ranges you don't.
>
> I'm expecting that introducing CSVFormatter will require another review,
> at that time I can try making two forms of documentation to see which is
> preferred.
>
> Thank you! Very useful food for thought.

Modules, static and named imports are extremely useful tools for making disparate third party libraries work together. But Phobos is one (big) library, not a bunch of separate ones. We shouldn't ever need to use these tools to resolve Phobos name conflicts. In fact one of the (many) goals of D is to scale down to quick and dirty scripts. rdmd helps with this a lot, but the other piece of this puzzle is 'import std.all;' which doesn't work today. And while I don't think we should break existing code just to fix this problem, going forward, as we update and add libraries to Phobos, we need to strive to prevent name collisions. And, like std.file and std.stdio, I'd expect std.csv and std.databases to be used together often. Furthermore, your primary case for exposing these structs is for documentation purposes. As the user isn't expected to actually instantiate these structs (or at least not often), naming them CsvReader and CsvReader.Row is perfectly acceptable. Lastly, I'd like to  
draw your attention to std.algorithm.findSplit. findSplit has to return three ranges, which it does so using a Tuple. Records is simply a manual tuple of the headings and the row ranges, so why not leverage Phobos for this instead?
November 16, 2011
Re: Review of Jesse Phillips's CSV Parser
On Sat, 12 Nov 2011 00:21:55 -0500, Robert Jacques wrote:

> On Thu, 10 Nov 2011 22:17:10 -0500, Jesse Phillips
> <jessekphillips+d@gmail.com> wrote:
>> On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:
>>> On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips
>>> <jessekphillips+d@gmail.com> wrote:
> [snip]
>>> I also have some API design concerns. Primarily, most of the type
>>> names are overly broad and unclear. For example: Record, Records,
>>> Malformed could just as easily be part of a database API. And because
>>> of this, not only does one reading the code not know what the object
>>> is, but if namespace conflicts can occur, then code reviewers will
>>> probably make the wrong assumption half the time as to what Record is
>>> and not bother looking it up.
>>
>> The reason it is name like what you would find in a database, is
>> because they are the same thing. It is a row of data. In past
>> discussion about existing name clashes it was concluded that was what
>> modules, static, and named imports wore for.
>>
>> Malformed, I just don't have any clue what to actually call it. No
>> CSVMalformed doesn't help.
>>
>> But then you go on to the point below, that you shouldn't even see
>> them.
>>
>>> Furthermore, Records and Record, are essentially internal
>>> ranges/structs returned by csvReader. In the tradition of
>>> std.algorithms these should be defined inside the csvReader function
>>> (preferably) or as CsvReader / CsvReader.Record.
>>
>> I was going to explain how wrong you are and how I tried having all the
>> options part of csvReader but it just muddies up the clean API when
>> customizing something. But actually I already found the solution with
>> my previous updates and incorrectly chose to move Malformed
>> customization out.
>>
>> Long story short the problems I see:
>>
>> * The documentation still needs to exist and will be a little harder to
>> do without these * Records contains the heading field, this is abnormal
>> for an InputRange and means it won't show as an auto link if it becomes
>> a hidden range.
>> * Currently you can find exactly which function call result in a
>> specific error, if I hide the ranges you don't.
>>
>> I'm expecting that introducing CSVFormatter will require another
>> review, at that time I can try making two forms of documentation to see
>> which is preferred.
>>
>> Thank you! Very useful food for thought.
> 
> Modules, static and named imports are extremely useful tools for making
> disparate third party libraries work together. But Phobos is one (big)
> library, not a bunch of separate ones. We shouldn't ever need to use
> these tools to resolve Phobos name conflicts. In fact one of the (many)
> goals of D is to scale down to quick and dirty scripts. rdmd helps with
> this a lot, but the other piece of this puzzle is 'import std.all;'
> which doesn't work today. And while I don't think we should break
> existing code just to fix this problem, going forward, as we update and
> add libraries to Phobos, we need to strive to prevent name collisions.
> And, like std.file and std.stdio, I'd expect std.csv and std.databases
> to be used together often. Furthermore, your primary case for exposing
> these structs is for documentation purposes. As the user isn't expected
> to actually instantiate these structs (or at least not often), naming
> them CsvReader and CsvReader.Row is perfectly acceptable. 

I like the point on std.all, but isn't std.csv.Rows and std.csv.Row about 
the same as your example names? And if dmd could do partial matches (not 
exactly requesting such a feature) csv.Rows, csv.Row.

> Lastly, I'd
> like to draw your attention to std.algorithm.findSplit. findSplit has to
> return three ranges, which it does so using a Tuple. Records is simply a
> manual tuple of the headings and the row ranges, so why not leverage
> Phobos for this instead?

Records requires access to the header for building the associative array. 
So while I still could return both, the range would still be holding the 
header. If you think it would make an improvement then you'd probably 
want to update your vote to no. Namely what I would see coming from that:


auto records = csvReader(txt);

records.header ...;
foreach(row; records.rows) { ... }
November 16, 2011
Re: Review of Jesse Phillips's CSV Parser
On Tue, 15 Nov 2011 22:30:51 -0500, Jesse Phillips <jessekphillips+d@gmail.com> wrote:
> On Sat, 12 Nov 2011 00:21:55 -0500, Robert Jacques wrote:
>> On Thu, 10 Nov 2011 22:17:10 -0500, Jesse Phillips
>> <jessekphillips+d@gmail.com> wrote:
>>> On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:
>>>> On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips
>>>> <jessekphillips+d@gmail.com> wrote:

[snip]

>> Modules, static and named imports are extremely useful tools for making
>> disparate third party libraries work together. But Phobos is one (big)
>> library, not a bunch of separate ones. We shouldn't ever need to use
>> these tools to resolve Phobos name conflicts. In fact one of the (many)
>> goals of D is to scale down to quick and dirty scripts. rdmd helps with
>> this a lot, but the other piece of this puzzle is 'import std.all;'
>> which doesn't work today. And while I don't think we should break
>> existing code just to fix this problem, going forward, as we update and
>> add libraries to Phobos, we need to strive to prevent name collisions.
>> And, like std.file and std.stdio, I'd expect std.csv and std.databases
>> to be used together often. Furthermore, your primary case for exposing
>> these structs is for documentation purposes. As the user isn't expected
>> to actually instantiate these structs (or at least not often), naming
>> them CsvReader and CsvReader.Row is perfectly acceptable.
>
> I like the point on std.all, but isn't std.csv.Rows and std.csv.Row about
> the same as your example names?

Yes and no. Yes, from the point of view of reading 'std.csv.Rows' in existing code. No, from the point of view writing code. You're much more likely to write just 'Rows' and then have to go through a quick-compile-fix-compile loop. Worse, consider the people using the colliding std.databases.Row struct, which now have to use the fully qualified name. I don't feel that overly succinct names are important to std.csv, as most users will be using 'auto records = csvReader(txt);' and will never have to type out std.csv.Rows or csvRows or whatever.

> And if dmd could do partial matches (not
> exactly requesting such a feature) csv.Rows, csv.Row.
>> Lastly, I'd
>> like to draw your attention to std.algorithm.findSplit. findSplit has to
>> return three ranges, which it does so using a Tuple. Records is simply a
>> manual tuple of the headings and the row ranges, so why not leverage
>> Phobos for this instead?
>
> Records requires access to the header for building the associative array.
> So while I still could return both, the range would still be holding the
> header. If you think it would make an improvement then you'd probably
> want to update your vote to no. Namely what I would see coming from that:
>
>
> auto records = csvReader(txt);
>
> records.header ...;
> foreach(row; records.rows) { ... }

Thank you for explaining. I wasn't seeing the implementation efficiency that comes from the custom struct approach.
November 16, 2011
Re: Review of Jesse Phillips's CSV Parser
On Tue, 15 Nov 2011 23:12:57 -0500, Robert Jacques wrote:

> Thank you for explaining. I wasn't seeing the implementation efficiency
> that comes from the custom struct approach.

No, thank you. It is good to have someone fighting a different opinion. 
It is because of you we can have this bike-shedding and not even have 
shed :D. Had the hiding of both structs not worked so well I would have 
changed it (as I would have thought on it more, realized it was two 
against zero, and wouldn't be distracted with the idea of simplifying the 
docs)
Next ›   Last »
2 3 4 5 6
Top | Discussion index | About this forum | D home