February 25, 2013
23-Feb-2013 21:14, H. S. Teoh пишет:
>> P.S. Time to go for the formal review?
> [...]
>
> Alright, I decided to just jump in and re-review std.uni. I *really*
> want to see this in Phobos, the sooner the better.
>

Great. Sorry, I had to put your comments on the back-burner, and then I'd found out that std.uni no longer compiles. New release darn it...

Turned out that some screws were tightened up in the compiler and some meaningless qualifiers are no longer accepted. That plus some vague shit with apparently obligatory @property on save() for isForwardRange trait (wtf?).

Phh... OK, now let's go over these.

> Here are some comments:
>
> - In the first part of the docs, Terminology section, under "Code unit":
>    I think you mistyped a ddoc macro, it should be ($(D char)) instead of
>    (($D char)).

Fixed.

> - lineSep, paraSep: are these fixed values? It would be nice to indicate
>    what their values are.


Yup, a carry-over from the old std.uni. Documented.

> - UnicodeDecomposition: it would be nice to document the values in this
>    enum.
>
> - normalize(): I think your code example has a duplicated line (NFKC
>    example appears twice).
>

Fixed.

> - allowedIn(): How about an example where a character is *not* allowed
>    in a normalization form?
>

These are typically hard to recognize visually but I'll try :)

> - InversionList.opBinary: I still prefer ^ instead of ~ for symmetric
>    difference. In D, ~ means "append", and it's very confusing when x~y
>    means symmetric difference instead of append.

I need more then a single opinion on this matter. Yes, I don't quite like '~' but it's the symbol used in std.regex patterns and to make matters worse '^' means something completely different there.

> - unicode.opDispatch: it would be nice to provide links to official
>    Unicode documentation that lists all blocks/scripts, as a reference.

I provided one, but it's probably not listed where it should be, see first paragraphs that outline the stuff in this module.

I'm thinking I'd better compose a small table of _guaranteed_ properties. This would allow me to safely cleanup the ridiculous sets later on. Since the stuff was extracted from Unicode data files, even I'm not sure which sets are there exactly :)

The guaranteed ones are Scripts, Blocks, General Category and few nice sets like ASCII (plus HangulSyllableType, of course!).

> - combiningClass: maybe provide a link to official Unicode docs that
>    list combining class values?

If there was one good link... I might make an enum with symbolic names for the ones that are in use and have meaning.

> OK, a lot of this is just nitpicks... but overall, this new std.uni
> looks very good. Looking forward to it being merged into Phobos!
>

And for that to happen somebody has to put on the review manager's robe and do the ceremony (hint) :)

I the meantime I need to present it also as a pull request to help reviewing the code.

-- 
Dmitry Olshansky
February 26, 2013
On 2013-02-25 19:21, Dmitry Olshansky wrote:

> What can I say Phobos is an example of software evolution ;)

Is the new std.uni completely backwards compatible with the old one. Otherwise we have the same problem as with std.process. But in this case we have another name that is actually better.

I'm having some problem understanding the obsession in shortening names, even when the full name isn't particular long.

-- 
/Jacob Carlborg
February 26, 2013
On Tuesday, February 26, 2013 08:20:54 Jacob Carlborg wrote:
> On 2013-02-25 19:21, Dmitry Olshansky wrote:
> > What can I say Phobos is an example of software evolution ;)
> 
> Is the new std.uni completely backwards compatible with the old one. Otherwise we have the same problem as with std.process. But in this case we have another name that is actually better.
> 
> I'm having some problem understanding the obsession in shortening names, even when the full name isn't particular long.

Well, it can get pretty bad with module names when you're forced to give the full import path. For instance, std.string, std.ascii, and std.uni all have toLower, and std.unicode.toLower is definitely longer than std.uni.toLower.

In general, names should be as long as they need to be in order to be properly clear and descriptive but no longer. Making names too short makes code harder to read and understand, and making them too long makes it harder to fit as much in a line of code without it getting too long.

That being said, std.unicode is probably a better name than std.uni, but at this point, it's better to maintain backwards compatibility than to rename it. If we need to rename it because of changes in the API, then going with std.unicode makes sense, but if the necessary changes are backwards compatible, then we should avoid renaming it and thus avoid breaking code.

- Jonathan M Davis
February 26, 2013
On 2013-02-26 08:34, Jonathan M Davis wrote:

> Well, it can get pretty bad with module names when you're forced to give the
> full import path. For instance, std.string, std.ascii, and std.uni all have
> toLower, and std.unicode.toLower is definitely longer than std.uni.toLower.

If you think that "std.unicode.toLower" is too long then create an alias for it.

BTW, I don't think it's fair to use the fully qualified name when comparing. Then we can never have a nested package hierarchy in Phobos or we will get names looking like:

std.a.b.c.d

> In general, names should be as long as they need to be in order to be properly
> clear and descriptive but no longer. Making names too short makes code harder
> to read and understand, and making them too long makes it harder to fit as much
> in a line of code without it getting too long.
>
> That being said, std.unicode is probably a better name than std.uni, but at
> this point, it's better to maintain backwards compatibility than to rename it.
> If we need to rename it because of changes in the API, then going with
> std.unicode makes sense, but if the necessary changes are backwards
> compatible, then we should avoid renaming it and thus avoid breaking code.

-- 
/Jacob Carlborg
February 26, 2013
On Tuesday, February 26, 2013 08:40:02 Jacob Carlborg wrote:
> On 2013-02-26 08:34, Jonathan M Davis wrote:
> > Well, it can get pretty bad with module names when you're forced to give the full import path. For instance, std.string, std.ascii, and std.uni all have toLower, and std.unicode.toLower is definitely longer than std.uni.toLower.
> If you think that "std.unicode.toLower" is too long then create an alias for it.
> 
> BTW, I don't think it's fair to use the fully qualified name when comparing. Then we can never have a nested package hierarchy in Phobos or we will get names looking like:
> 
> std.a.b.c.d

It's still something that needs to be considered. Having overly long names _will_ create problems, as will overly short names. A balance is needed. Fully qualified names just show one of the costs to longer names. Picking good names is a bit of an art. Going the C route of overly short names is a mistake as is going the Java route of overly descriptive ones.

In the case of std.uni vs std.unicode, I probably would agree that std.unicode is better, because it's more destriptive, whereas std.uni is arguably not descriptive enough as nice as its length may be. So, if we were starting from scratch, std.unicode would make more sense. However, as we're not starting from scratch, and switching to std.unicode would break a lot of code, it's not worth switching to std.unicode just for a more descriptive name. It would be a good choice if we had to change the name for other reasons (e.g. the API needed to be changed in a non-backwards-compatible manner), but improving the name isn't a good enough reason.

- Jonathan M Davis
February 26, 2013
Am 25.02.2013 19:31, schrieb Dmitry Olshansky:
> 24-Feb-2013 12:32, dennis luehring пОшет:
>> would it make sense to incoporate test from the ICU testsuite - there
>> are api tests and many data tests around
>
> For key algorithms I'm using consortium's test data files + plus
> running random generated stress-tests against ICU.
>
> It might make sense to incorporate some of their tests but I'm wondering
> if it'll end up only as a difference in the API.

are doing any benchmark tests again other implementations like ICU or Qt's QString, etc.?

are there any operations that are slower then other implementations?

February 26, 2013
26-Feb-2013 11:20, Jacob Carlborg пишет:
> On 2013-02-25 19:21, Dmitry Olshansky wrote:
>
>> What can I say Phobos is an example of software evolution ;)
>
> Is the new std.uni completely backwards compatible with the old one.
> Otherwise we have the same problem as with std.process. But in this case
> we have another name that is actually better.
>

It is backwards compatible and was created to be a drop in replacement as one of its requirements. In any case the API of std.uni is small and simple enough to keep it as is.

> I'm having some problem understanding the obsession in shortening names,
> even when the full name isn't particular long.

Let's not go there. And uni is not obsession at all. POSIX 'O_CREAT' is.

-- 
Dmitry Olshansky
February 26, 2013
26-Feb-2013 12:27, dennis luehring пишет:
> Am 25.02.2013 19:31, schrieb Dmitry Olshansky:
>> 24-Feb-2013 12:32, dennis luehring пОшет:
>>> would it make sense to incoporate test from the ICU testsuite - there
>>> are api tests and many data tests around
>>
>> For key algorithms I'm using consortium's test data files + plus
>> running random generated stress-tests against ICU.
>>
>> It might make sense to incorporate some of their tests but I'm wondering
>> if it'll end up only as a difference in the API.
>
> are doing any benchmark tests again other implementations like ICU or
> Qt's QString, etc.?

Not all that much but this is on todo list.
If anything it takes a fair bit of time to install the lib,
decipher the API and see how to make it do things like normalization
of UTF32 buffer in memory.

The 2 major moments I was concerned with are:
a) all old things should not get slower then std.uni (in fact they are many times faster)
b) correct (as in spec) and in line with others - currently only ICU but that's the defacto standard

> are there any operations that are slower then other implementations?

Till the data is available. It would be of tremendous help if you can test it against some library that you are comfortable with.

-- 
Dmitry Olshansky
February 26, 2013
On 2013-02-26 17:51, Dmitry Olshansky wrote:

> Let's not go there. And uni is not obsession at all. POSIX 'O_CREAT' is.

The obsession is that there's quite a lot of shortened names in D.

-- 
/Jacob Carlborg
February 26, 2013
On Tuesday, February 26, 2013 20:34:13 Jacob Carlborg wrote:
> On 2013-02-26 17:51, Dmitry Olshansky wrote:
> > Let's not go there. And uni is not obsession at all. POSIX 'O_CREAT' is.
> 
> The obsession is that there's quite a lot of shortened names in D.

Really? There are a few (particularly older ones), but for the most part, I'd argue that that's not the case at all. In general, IMHO, Phobos does an excellent job of having function names that are reasonably descriptive and are of a reasonable length. And, if anything, we tend to err on the side of being too long (e.g. ElementEncodingType).

- Jonathan M Davis