Add duration parsing to core.time?
August 20, 2013
While working on a configuration file parser, I found myself trying to decide which units to use for various time variables (e.g. `expireInterval`), which is silly because we have an excellent Duration structure in core.time. I was pleased to discover that Duration has a toString method which prints a nice, human-readable description. Unfortunately, there appears to be no corresponding parse method. It turns out that one is surprisingly easy to write thanks to the existing functionality in std.conv: http://dpaste.dzfl.pl/1500b834

It appears that DPaste stumbles over the Unicode 'μs' in the units enum, so here's a test invocation and output:

$ dmd -unittest test_duration.d && ./test_duration '12 hours, 30 minutes' '1w2d20m12h5m2s'
12 hours and 30 minutes
1 week, 2 days, 12 hours, 25 minutes, and 2 secs

I've made the implementation more flexible than simply parsing the standard output of Duration.toString by adding more unit synonyms and making whitespace, commas, and 'and' optional. All it really requires is a sequence of digits followed by a unit name, possibly repeated, which allows the very compact form used in '1w2d20m12h5m2s'. All validation is performed by the two calls to std.conv.parse, so invalid strings should fail (e.g. 'four madeupunits').
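
For readers who cannot open the paste, here is a minimal sketch of the approach described above. It is not the code from the paste: the name `parseDuration` and the list of unit synonyms are assumptions made for illustration, and the unit lookup is a hand-written switch rather than a second call to std.conv.parse. It also only handles ASCII unit names, so 'μs' is left out.

```d
import core.time : Duration, dur;
import std.algorithm.searching : skipOver;
import std.ascii : isAlpha;
import std.conv : ConvException, parse;
import std.string : stripLeft;

/// Hypothetical helper; not part of core.time and not the code from the paste.
Duration parseDuration(string s)
{
    Duration total; // Duration.init is a zero-length duration
    s = s.stripLeft;
    while (s.length)
    {
        // parse consumes the leading number and throws ConvException
        // if the input does not start with one (e.g. "four").
        immutable count = parse!long(s);
        s = s.stripLeft;

        // The unit name is the following run of ASCII letters.
        size_t n = 0;
        while (n < s.length && isAlpha(s[n]))
            ++n;
        string unit = s[0 .. n];
        s = s[n .. $];

        // Synonym list is an assumption, chosen to cover toString's output.
        switch (unit)
        {
            case "w", "week", "weeks":               total += dur!"weeks"(count);   break;
            case "d", "day", "days":                 total += dur!"days"(count);    break;
            case "h", "hour", "hours":               total += dur!"hours"(count);   break;
            case "m", "min", "mins", "minute", "minutes":
                                                     total += dur!"minutes"(count); break;
            case "s", "sec", "secs", "second", "seconds":
                                                     total += dur!"seconds"(count); break;
            case "ms", "msec", "msecs":              total += dur!"msecs"(count);   break;
            case "us", "usec", "usecs":              total += dur!"usecs"(count);   break;
            default:
                throw new ConvException("unknown duration unit: '" ~ unit ~ "'");
        }

        // Optional separators: whitespace, a comma, and the word "and".
        s = s.stripLeft;
        if (s.skipOver(","))
            s = s.stripLeft;
        if (s.skipOver("and"))
            s = s.stripLeft;
    }
    return total;
}

unittest
{
    assert(parseDuration("12 hours, 30 minutes") == dur!"hours"(12) + dur!"minutes"(30));
    assert(parseDuration("1w2d20m12h5m2s")
        == dur!"weeks"(1) + dur!"days"(2) + dur!"hours"(12) + dur!"minutes"(25) + dur!"seconds"(2));
}
```

Because the unit is read as a whole run of letters, 'm' and 'ms' do not collide, but a unit written directly against 'and' with no whitespace (e.g. '12hoursand30minutes') would be rejected by this sketch.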

One possible improvement is to support written-out numbers such as "seven" and "forty-two", but I suspect this would entail a much more involved implementation.

Thoughts on including something like this in core.time?  My thought is that Duration could have a `this(string)` with a non-consuming version of this function for automatic to! support, in addition to providing parse.

Justin
August 20, 2013
On Tuesday, August 20, 2013 17:57:19 Justin Whear wrote:
> Thoughts on including something like this in core.time? My thought is that Duration could have a `this(string)` with a non-consuming version of this function for automatic to! support in addition to providing parse.

If such a function were added, it would be fromString on Duration, and it would accept the exact format that toString uses (and only that format). Anything more complicated would have to be part of the functionality for user-defined format strings, which I haven't finished yet. That'll probably end up in std.datetime.format at some point after I've finished splitting std.datetime.

- Jonathan M Davis
August 21, 2013
On Tuesday, August 20, 2013 15:35:20 Jonathan M Davis wrote:
> If such a function were added, it would be fromString on Duration, and it would accept the exact format that toString uses (and only that format). Anything more complicated would have to be part of a functionality relating to user-defined format strings, which I haven't finished yet. That'll probably end up in std.datetime.format at some point after I've finished splitting std.datetime.

And actually, I really don't like the idea of adding a function for parsing the result of Duration's toString. Duration's toString was intended for human legibility, not for being written out and then read in again. std.datetime has several to*String functions with corresponding from*String functions, but they're all in standard formats, whereas Duration's toString is not. So, if any kind of from*String is going to be added to Duration, then a standard format needs to be used and a corresponding to*String function created. There are several standard formats for dates and times, so I assume that there's one for durations as well, but I'd have to look into it. Preferably something from ISO 8601 would be used if it has a standard string format for durations, since that's the main ISO standard for time-related stuff.
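
For reference, ISO 8601 does define a duration notation of the form PnDTnHnMnS (for example, P9DT12H25M2S). The sketch below shows what a to*String function in that format might look like for Duration. The name `toISODurationString` is hypothetical and nothing like it exists in core.time. The sketch also assumes non-negative durations, drops anything below one second, and folds weeks into the day count, since the standard's week form (PnW) is a separate notation.

```d
import core.time : Duration, dur;
import std.conv : to;

/// Hypothetical helper; core.time has no such function.
string toISODurationString(Duration d)
{
    long secs = d.total!"seconds"; // sub-second precision is dropped in this sketch
    if (secs < 0)
        throw new Exception("negative durations are not handled in this sketch");

    // Break the total down into days, hours, minutes and leftover seconds.
    immutable days = secs / 86_400;   secs %= 86_400;
    immutable hours = secs / 3_600;   secs %= 3_600;
    immutable minutes = secs / 60;    secs %= 60;

    string s = "P";
    if (days)
        s ~= days.to!string ~ "D";

    // The time part is introduced by "T"; a zero duration is written "PT0S".
    if (hours || minutes || secs || !days)
    {
        s ~= "T";
        if (hours)
            s ~= hours.to!string ~ "H";
        if (minutes)
            s ~= minutes.to!string ~ "M";
        if (secs || (!hours && !minutes))
            s ~= secs.to!string ~ "S";
    }
    return s;
}

unittest
{
    auto d = dur!"weeks"(1) + dur!"days"(2) + dur!"hours"(12)
           + dur!"minutes"(25) + dur!"seconds"(2);
    assert(toISODurationString(d) == "P9DT12H25M2S");
}
```

A matching from*String would only need to accept this one fixed grammar, which is the property being asked for here.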

In general, I'm very much opposed to functions which try and parse arbitrary strings as they're incredibly error-prone and have to guess at what you mean. In pretty much any case where the string was emitted by a computer in the first place rather than a human, that's just plain sloppy, and ideally, a human would be required to put a string in a standard format when inputting it (or input the values separately rather than as a string) in order to avoid interpretation errors.

- Jonathan M Davis
August 21, 2013
On Wednesday, 21 August 2013 at 06:46:49 UTC, Jonathan M Davis wrote:
> In general, I'm very much opposed to functions which try and parse arbitrary
> strings as they're incredibly error-prone and have to guess at what you mean.
> In pretty much any case where the string was emitted by a computer in the first
> place rather than a human, that's just plain sloppy, and ideally, a human
> would be required to put a string in a standard format when inputting it (or
> input the values separately rather than as a string) in order to avoid
> interpretation errors.
>
> - Jonathan M Davis

I agree completely and can speak from experience.  We used wxWidgets' wxDateTime class for years at work, along with its ParseDateTime, which accepts free-format strings. It was a source of never-ending problems for us until we finally stopped using it.  The implementation was fine; it's just that dates are not amenable to unstructured reading. Date strings with locale information embedded in them might be workable, but they basically don't exist.

Date strings are a lot like string encodings.  They are unsafe to use without knowing a definitive format/encoding.