October 09, 2010
On Saturday 09 October 2010 03:46:42 Denis wrote:
> This is an interesting discussion that I personally find very important.
> 
> One of the most annoying things in Phobos is that modules are too large and by too large I mean they contain lots and lots of stuff that are loosely related to each other.
> 
> I'm strongly against design like that. As a programmer I've been grown with idea of module separation in mind. At work, we are spending tens of hours discussing the same problem: code dependency reduction. In Phobos, it's just a disaster - every module is a monster that heavily depends or other monsters alike.
> 
> I've heard an opinion that it's hard for people to remember too many modules names. However, they know that you need to import std.array to use Appender, or std.algorithm to use sort etc. As such, it shouldn't be any harder for them to import std.algorithm.sort; or import std.array.appender; Alternatively, they may import std.algorithm.all and have the whole bulk of features at their disposal, but from my experience, that's not what is needed anyway. Here is a random import snippet from my code:
> 
> import core.memory : GC;
> import core.atomic : cas;
> import core.stdc.string : memcpy;
> import core.sync.semaphore : Semaphore;
> 
> In ddmd, I have
> 
> module dmd.expression.Add;
> module dmd.expression.And;
> module dmd.expression.ArrayLength;
> etc
> 
> each declaring just *one* method and only imports stuff that that particular module needs. It helps reducing module dependencies *a lot*.

I, on the other hand, think that it's horrible to be importing functions or types individually. I'd sooner want to import the entirety of Phobos than import an individual function.

Now, you don't want modules to get too large to be sure, but I'm not aware of a single module at this point that I'd consider too large. std.container may get that way once it has a full complement of containers, but none are currently what I'd consider too big. My datetime code is pushing it as a single module, but I don't think that it's really a problem that way except that the file itself gets quite large.

Certainly, it's far easier to know that an algorithm is likely to be in std.algorithm than to have a host of different modules with different types of algorithms in them and have to figure out what is where. Obviously some stuff splits out fairly nicely (like string and range being their own modules), but taking something like the Java approach and essentially having a class per module would create _way_ too many modules. I think that, for the most part, the current module scheme in Phobos is pretty good. I expect that we'll have to breakdown at some point and introduce a multi-level hierarchy for at least some modules rather than have the module hierarchy be entirely flat, but for the most part, what we have is good. What we lack is functionality, not good module layout.

And really, if for some reason which I can't comprehend, you actually _want_ to import functions individually (like you and various other folks out there seem to like to), you can still do that. And as soon as you're importing them individually, they could each be in their own module for all it matters with regards to importing (though obviously that would be bad for the documentation). So, I'd say that it's generally better to have modules that are on the large side rather than on the small side.

- Jonathan M Davis
October 09, 2010
On Saturday 09 October 2010 04:16:59 Michel Fortin wrote:
> Le 2010-10-09 ? 1:47, Jonathan M Davis a ?crit :
> > Which is why PosixTimeZone and WindowsTimeZone will be getting that information from the OS, but the OS does not make it easy. On Posix, you have to actually read in the time zone files from disk, and on Windows, you have to read the registry. No system calls are provided to properly deal with time zones. Honestly, time zone support for anything other than the local time zone is very poor on both Posix and Windows systems. And Windows won't even let you set the time zone for your program without setting for the whole OS. It's not a pleasant situation really, but I hope to be able to overcome it well enough that D programmers won't have to worry about it.
> 
> I had the "pleasure" to work with time zones on Windows once, what a mess!
> 
> On OSX, Cocoa has an API for that, but you can probably get it the posix way too.

I wasn't aware that there was an API. I either need to use the API (at which point, I'd end up with a MacOSXTimeZone in addition to PosixTimeZone and WindowsTimeZone), or I need to know where the time zone files are in Mac OS X. As I understand it, Mac OS X does use the same time zone files as Linux, but I don't know where it keeps them. Linux uses /usr/share/zoneinfo, but I have no idea if Mac OS X even has /usr, let along /usr/share/zoneinfo. Unfortunately, I have no access to either a Mac or a machine with FreeBSD, so some of the OS-specific stuff in my code may need some work (though PosixTimeZone would likely be the main culprit).

- Jonathan M Davis
October 09, 2010
On Sat, Oct 9, 2010 at 3:23 PM, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
>
> I, on the other hand, think that it's horrible to be importing functions or types individually. I'd sooner want to import the entirety of Phobos than import an individual function.
>

That's just a matter of preference, no one is pushing anyone to do so.
Everyone is free to choose the level of granularity that they want
when they import stuff.
Code separation isn't about that at all. After all from user's
perspective writing

import std.stdio : writeln;
import std.alorithm;

isn't any different from writing

import std.stdio.writeln;
import std.alorithm.all;

but there is quite a difference for code maintainer - in the latter case all the related code is isolated, e.g. std.algorithm.sort only contains stuff that it uses.

Imagine a D newcomer that is interested in the std.algorithm.sort implementation. What we offer now is "dig into that mess the std.algorithms is" (sorry, Andrei, std.algorithms is written well but it is very similar to C++ <algorithm> header in terms of readability - you need a *very* deep code knowledge to understand it).

On the other hand, "sort" (or any other class/method) only uses a handful of other symbols (SwapStrategy, binaryFun, lessFun, sortImpl, assumeSorted and a few others), but by and large it is very hard to tell where this particular symbol comes from and it is quite hard to jump back and forth between source files to find proper definition. But it would be A TON easier if std.algorithm.sort was a (relatively) small file with all the primitives used declared together, followed by an implementation:

import std.functional.binaryFun;
import std.functional.lessFun;
import std.algorithm.isSorted;
...

/**
Sorts a random-access range according to predicate $(D less).
...
*/

SortedRange!(Range, less)
sort(alias less = "a < b", SwapStrategy ss = SwapStrategy.unstable,
Range)(Range r)
{
    ...
}

unittest
{
 ...
}

It's small, it's clean, it's easy to understand.

> Now, you don't want modules to get too large to be sure, but I'm not aware of a single module at this point that I'd consider too large. std.container may get that way once it has a full complement of containers, but none are currently what I'd consider too big. My datetime code is pushing it as a single module, but I don't think that it's really a problem that way except that the file itself gets quite large.
>

I didn't tell you split datetime into a few modules, it's just there is nothing wrong to do so, but current Phobos policy discourages (or even prohibits) that.

> Certainly, it's far easier to know that an algorithm is likely to be in std.algorithm than to have a host of different modules with different types of algorithms in them and have to figure out what is where. Obviously some stuff splits out fairly nicely (like string and range being their own modules), but taking something like the Java approach and essentially having a class per module would create _way_ too many modules. I think that, for the most part, the current module scheme in Phobos is pretty good. I expect that we'll have to breakdown at some point and introduce a multi-level hierarchy for at least some modules rather than have the module hierarchy be entirely flat, but for the most part, what we have is good. What we lack is functionality, not good module layout.
>
> And really, if for some reason which I can't comprehend, you actually _want_ to import functions individually (like you and various other folks out there seem to like to), you can still do that.

You certainly can do that, but you get little advantages and many
disadvantages for doing so.
Once again, it isn't any different from user point of view (same
namespace pollution, slightly increased compilation time in case of
import std.algorithm : sort; over std.algorith.sort), but it's very
different from code maintainer's point of view. It starts to matter *a
lot* when your source code grows large, and Phobos is already large.

> And as soon as you're importing them
> individually, they could each be in their own module for all it matters with
> regards to importing (though obviously that would be bad for the documentation).

Not so obvious. Just compare tango.io.stream (has many submodules) with std.stream (whole bunch of stuff merged): http://dsource.org/projects/tango/docs/stable/ http://www.digitalmars.com/d/2.0/phobos/std_stream.html

> So, I'd say that it's generally better to have modules that are on the large side rather than on the small side.
>
> - Jonathan M Davis
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
>
October 09, 2010
On Oct 8, 2010, at 10:47 PM, Jonathan M Davis wrote:

> On Friday 08 October 2010 22:30:02 Walter Bright wrote:
> 
>>> There are multiple time zone classes because it uses polymorphism to deal with the rules for a given time zone. The really basic LocalTime and UTC deal with most cases, but for anyone who really needs to deal with multiple time zones, PosixTimeZone and WindowsTimeZone will be invaluable (I'd love to only have one of those, but Windows just doesn't deal with time zones like the Posix world does).
>> 
>> I agree some kind of polymorphism for time zones is necessary. [rant] Timezones should be an operating system service, if for no other reason than it sucks to have all your apps break when some tinhorn country decides to change their daylight savings time, etc.[/rant]
> 
> Which is why PosixTimeZone and WindowsTimeZone will be getting that information from the OS, but the OS does not make it easy. On Posix, you have to actually read in the time zone files from disk, and on Windows, you have to read the registry. No system calls are provided to properly deal with time zones. Honestly, time zone support for anything other than the local time zone is very poor on both Posix and Windows systems. And Windows won't even let you set the time zone for your program without setting for the whole OS. It's not a pleasant situation really, but I hope to be able to overcome it well enough that D programmers won't have to worry about it.

libicu does a decent job with timezones, but the library is immense.  I'd love to have this functionality in Phobos.
October 09, 2010
On 10/9/10 0:30 CDT, Walter Bright wrote:
> I agree some kind of polymorphism for time zones is necessary. [rant] Timezones should be an operating system service, if for no other reason than it sucks to have all your apps break when some tinhorn country decides to change their daylight savings time, etc.[/rant]

Broke many Ubuntu installations a couple of years ago.

Andrei
October 09, 2010
On Oct 9, 2010, at 4:29 AM, Jonathan M Davis wrote:

> On Saturday 09 October 2010 04:16:59 Michel Fortin wrote:
>> Le 2010-10-09 ? 1:47, Jonathan M Davis a ?crit :
>>> Which is why PosixTimeZone and WindowsTimeZone will be getting that information from the OS, but the OS does not make it easy. On Posix, you have to actually read in the time zone files from disk, and on Windows, you have to read the registry. No system calls are provided to properly deal with time zones. Honestly, time zone support for anything other than the local time zone is very poor on both Posix and Windows systems. And Windows won't even let you set the time zone for your program without setting for the whole OS. It's not a pleasant situation really, but I hope to be able to overcome it well enough that D programmers won't have to worry about it.
>> 
>> I had the "pleasure" to work with time zones on Windows once, what a mess!
>> 
>> On OSX, Cocoa has an API for that, but you can probably get it the posix way too.
> 
> I wasn't aware that there was an API. I either need to use the API (at which point, I'd end up with a MacOSXTimeZone in addition to PosixTimeZone and WindowsTimeZone), or I need to know where the time zone files are in Mac OS X. As I understand it, Mac OS X does use the same time zone files as Linux, but I don't know where it keeps them. Linux uses /usr/share/zoneinfo, but I have no idea if Mac OS X even has /usr, let along /usr/share/zoneinfo.

It does.  OSX is based on BSD so the directory structure is pretty standard.  All the OSX-specific stuff lives in a parallel directory structure.
October 09, 2010

Andrei Alexandrescu wrote:
> On 10/9/10 0:30 CDT, Walter Bright wrote:
>> I agree some kind of polymorphism for time zones is necessary. [rant] Timezones should be an operating system service, if for no other reason than it sucks to have all your apps break when some tinhorn country decides to change their daylight savings time, etc.[/rant]
>
> Broke many Ubuntu installations a couple of years ago.
>

It should be a device driver, or at least something easily updated.
October 09, 2010
On 10/8/10 16:04 CDT, Jonathan M Davis wrote:
> Okay. Here is my current datetime code for review.

Solid work. I'm on vacation and I gave it a thorough read last night. David's parallelism library would have come first, but that's not vacation reading...

A word of caution. I've been on the boost mailing list for a good while and they have the best review process ever. The result is invariably twofold: (a) a very good quality library; (b) a suicidal-depressive library author.

You need to expect to take quite a few hits from me and others, but please don't take it personally. Also, a word for the other reviewers: it's much easier (and therefore more luring) to bury a submission for its weaknesses instead of suggesting concrete steps to make it stronger.

High-level thoughts
===================

* At 40KLOC, the library is probably on a par with the first Netscape browser. This is a bit extravagant for a date and time library, and last thing we want is Phobos to become a collection of elaborate trivialities (I'm not saying date/time is a trivial matter, but of the 40KLOC there are probably only a couple thousand that do actual work, of which a couple hundred that do nontrivial work). We should look into ways to reduce its sheer size, and, most importantly, the size of its user-visible API. (More concrete suggestions to follow.)

* The library has a fundamental inconsistency: it can't decide whether it goes with sheer "long" representation for various durations (days, months, hnseconds etc.) versus typed representations (MonthDuration, HNDuration, JointDuration). This hurts everybody everywhere: the library spends real estate to implement durations, and then the API explodes because it doesn't use them so it needs to be explicit about measurement units. For example, we have a hecatomb of addXxx functions: addDays, addWeeks, addThis, addThat, addTheOther. The design should have exploited typed durations and support += for durations. Then you write d += add(weeks(3)); and so on.

* Speaking of which, we come back at Walter's point: should we go with "long" for representing various durations and then make the meaning clear with function names, OR should we go with explicit strong types for times and durations? It's a good question, and arguments could be made for either case. Using "long" would probably trim away 60% of the code size. On the other hand, if we look at successful date and time libraries, we see that they do tend to be elaborate about granular types. So my suggestion is to not mess with success and go with elaborate typing. BUT I do want to cut a good fraction of sheer repetition by use of the language. AND I also want to not be able to find "long" in the library except in e.g. multiplication of a duration with a scalar and in the private definition of the typed abstractions.

* This brings me to one beef I have: the library is repetitive and tedious. This is because it makes very little use of generative and generic capabilities. Most of it could have been written in C with structs and functions. I'll follow up with concrete suggestions.

* The library uses at least one actively unrecommended technique (global alias for built-ins) and one questionable or at least not-yet-widely-used technique (use of a class as a namespace). Explanations follow.

Details
=======

* core.d is by far the weakest part of the library. In fact, it being the first one I've read, I got biased early on against the library. I'm very glad I continued to read through.

* Global aliases of the form

alias short  Year;              /// Unit for holding a Gregorian year.

are a poor technique in C, C++, and D. I've read articles and rants about it in several places. In brief, the problem is that such aliases lure their user into thinking they are genuine types with the usual safeguards that types offer (such as disallowing illegal operations during compilation time and/or run time).

DayOfMonth dom;
Month m;
...
m += dom; // won't compile I guess?!?
dom = 32; // throw I suppose?!?

Recommendation: eliminate these. At best make them private and give them the "Rep" (representation) suffix. But I suspect you don't even need them.

* This enum:

enum Month : MonthOfYear { jan = 1, /// January.
                            feb,     /// February.
                            mar,     /// March.
                            apr,     /// April.
                            may,     /// May.
                            jun,     /// June.
                            jul,     /// July.
                            aug,     /// August.
                            sep,     /// September.
                            oct,     /// October.
                            nov,     /// November.
                            dec      /// December.
                          }

The doc comments are not needed. Just put "/// Month" for the first and "/// Ditto" for all others.

* Similar remark about comments to:

enum DayOfWeek : ushort { sunday = 0, /// Sunday.
                           monday,     /// Monday.
                           tuesday,    /// Tuesday.
                           wednesday,  /// Wednesday.
                           thursday,   /// Thursday.
                           friday,     /// Friday.
                           saturday    /// Saturday.
                         }

The names are self-explanatory, no need to repeat them! I also notice that (a) DayOfWeek uses ushort directly (why not ubyte?) instead of going through a global alias (like Month) that I'd recommend you remove anyway.

* These types suggest to me that we need to finally implement a Bounded type such that Bounded!(ubyte, 1, 7) would be a good representation for a day of week. We discussed Bounded a fair amount in the newsgroup a while ago. I think that could save a lot of code down the road (e.g. in duration.d), in addition to being a good library artifact of its own.

* This enum and its kin:

enum AllowDayOverflow
{
     /// Yes, allow day overflow.
     yes,

     /// No, don't allow day overflow.
     no
}

should be like this:

enum AllowDayOverflow
{
     /// No, don't allow day overflow.
     no,
     /// Yes, allow day overflow.
     yes
}

such that code can write:

void fun(AllowDayOverflow allowDayOverflow) {
    ...
    if (allowDayOverflow) ...
    ...
    if (!allowDayOverflow) ...
}

Right now writing such works but with the reverse effect.

* "this(string msg, string file = __FILE__, size_t line = __LINE__, Throwable next = null) nothrow" -> the __FILE__ and __LINE__ are useless (they are yours, not your caller's). You need to make them template parameters to work, see other places in Phobos. At least that's what I recall. Walter?

* "final class TUnitConv" used as a namespace. Can't we organize things such that we introduce few enough names for such namespaces to be unneeded?

* "static long to(TUnit tuFrom, TUnit tuTo)(long value)" is a spark of brilliance that I'd love to see a lot more of. It's essentially the difference between reading and memorizing 90 function names versus memorizing one function name and an enum with 10 values. That being said, the rest of the code sadly doesn't transform this spark into a full-blown explosion - it goes right back to triviality such as e.g. hnsecsToDays. Those are hardly needed in the implementation (use template constraints) and DEFINITELY not needed in the documentation. They just pollute it.

Suggestions: (a) throw TUnitConv away; (b) define this and friends at global scope:

long convert(TUnit from, TUnit to)(long);

That being said, convert() does go against the grain of elaborate typing for durations. I'd accept it as a low-level function, but really the library currently fosters elaborate types for durations... so there's a disconnect here.

* To be specific: throw away yearsToMonths et al. No need for them.

* I suggest even experimenting with strings instead of TUnit. Really TUnit is a vestige from C++ and Java, but we do accept string template parameters so how about

assert(convert!("years", "months")(2) == 24);

etc. Opinions?

* All string functions in core.d make me nervous because they hint to no support for locales, and are very pedestrian as they go. I'd prefer at least some sort of table with names for "year", "years" etc. in conjunction with reusable code, instead of rigid, nonreusable code with inline strings. Same about monthOfYearToString etc.

* validXxx and validXxxOfYyy is in desperate need for a template approach a la to/convert. Virtually anything that encodes types in function names is such a candidate.

* As another random example:

DayOfWeek getDayOfWeek(DayOfGregorianCal day) pure nothrow

could/should be (in client use):

get!(TUnit.dayOfWeek, TUnit.dayOfGregorianCal)(day);

* Generally, it's clear from above why I was displeased with core.d. At 1952 lines, it's a lot of pain for very little gain. It encodes date artifacts in function names leading to an explosion of API functions. It uses too little generic mechanisms to reduce both API size and implementation size.

* On to duration.d. It's simple: there's no need for three durations. Rename JointDuration to Duration and throw everything else. I'd find myself hard pressed to produce situations in which efficiency considerations make it impossible to cope with a 128-bit representation.

* If you do want to go fine-grained durations without the aggravation, use templates. Then you can have Duration!HNSeconds, Duration!Seconds, Duration!Months, whatever you want, without inflicting pain on yourself and others. If you play your cards right, such durations would match 100% the templates you define in core.d and elsewhere so you don't need to make a lot of implementation effort. BUT again I think one duration is enough. Maximum two - short duration and duration.

* timezone.d looks okay.

* At 23KLOC, timepoint.d better had a walkOnWater() function in there.

* currLocalTime, currUTC, currOtherTZ should be current!(TUnit.localTime), current!(TUnit.utc) - bringing again API consolidation into discussion.

(need to finish this just about here)

* other.d - bad name. Like Walter, I think the physical design should be one .di file and one or more .d files.

* Duration and Interval should be consolidated.


Andrei
October 09, 2010
Andrei Alexandrescu <andrei at erdani.com> wrote:
> * "this(string msg, string file = __FILE__, size_t line = __LINE__, Throwable next = null) nothrow" -> the __FILE__ and __LINE__ are useless (they are yours, not your caller's). You need to make them template parameters to work, see other places in Phobos. At least that's what I recall. Walter?

You're wrong, they are replaced at call point.


> long convert(TUnit from, TUnit to)(long);

Couldn't this be more succinct by copying std.conv.to's system?

i.e:
assert( convert!years( months( 24 ) ) == 2 );

-- 
Simen
October 10, 2010
On Sun, Oct 10, 2010 at 12:54 AM, Simen Kjaeraas <simen.kjaras at gmail.com> wrote:
> Andrei Alexandrescu <andrei at erdani.com> wrote:
>>
>> * "this(string msg, string file = __FILE__, size_t line = __LINE__, Throwable next = null) nothrow" -> the __FILE__ and __LINE__ are useless (they are yours, not your caller's). You need to make them template parameters to work, see other places in Phobos. At least that's what I recall. Walter?
>
> You're wrong, they are replaced at call point.
>

I could swear it worked like Andrei said, but apparently it's not. Oh well, we live and learn?.