Formal Review of std.uni

First off, Dconf is this next weekend and effects the schedule of this review. Review will be held for 3 weeks, instead of holding off a week I'm extending the period and starting the review now. (Dmitry may be unable to respond due to his being a speaker) This is a replacement module for the current std.uni by Dmitry Olshansky. The std.uni module provides an implementation of fundamental Unicode algorithms and data structures. To use this module, install 2.63 beta, import uni; and not std.uni, compile two files from the source uni.d unicode_tables.d Docs: http://blackwhale.github.io/phobos/uni.html Source: https://github.com/blackwhale/gsoc-bench-2012 DMD Beta: http://forum.dlang.org/post/517C8552.7040704@digitalmars.com It should be noted that inclusion into Phobos may require addressing inter-dependencies, see "Reducing the inter-dependencies" http://forum.dlang.org/post/kl8hn8$bm3$1@digitalmars.com

On 2013-04-28 18:56, Jesse Phillips wrote: > First off, Dconf is this next weekend and effects the schedule of this > review. Review will be held for 3 weeks, instead of holding off a week > I'm extending the period and starting the review now. (Dmitry may be > unable to respond due to his being a speaker) > > This is a replacement module for the current std.uni by Dmitry > Olshansky. The std.uni module provides an implementation of fundamental > Unicode algorithms and data structures. Two minor things: * The docs for "decomposeHangul" contains a ddoc macro * "toLower" and "toUpper" mention overloads which take strings instead of a dchar. I cannot find those overloads in the std.uni module. If they refer to another module it should say so -- /Jacob Carlborg

29-Apr-2013 10:45, Jacob Carlborg пишет: > On 2013-04-28 18:56, Jesse Phillips wrote: >> First off, Dconf is this next weekend and effects the schedule of this >> review. Review will be held for 3 weeks, instead of holding off a week >> I'm extending the period and starting the review now. (Dmitry may be >> unable to respond due to his being a speaker) >> >> This is a replacement module for the current std.uni by Dmitry >> Olshansky. The std.uni module provides an implementation of fundamental >> Unicode algorithms and data structures. > > Two minor things: > > * The docs for "decomposeHangul" contains a ddoc macro Thanks, fixed that and the same typo elsewhere. > * "toLower" and "toUpper" mention overloads which take strings instead > of a dchar. I cannot find those overloads in the std.uni module. If they > refer to another module it should say so > Technically these should be in std.string and are there but incorrect. Plus the impossible to implement toLowerInPlace/toUpperInPlace since length may change during case-conversion. Darn... I got to implement them in std.uni then. Not much work given that all of pieces are already here. -- Dmitry Olshansky

I was working on a list of suggestions, but turned it into a pull request instead, as the list had gotten big enough to be inconvenient to go through manually. The pull focuses on improving some of the phrasing in the DDoc comments. https://github.com/blackwhale/gsoc-bench-2012/pull/2

29-Apr-2013 22:50, Jonathan M Davis пишет: > On Monday, April 29, 2013 22:13:09 Dmitry Olshansky wrote: >> Technically these should be in std.string and are there but incorrect. > > Then fix them there. > I think it will take a certain amount of leaked implementation details to get it done at least half-decently. In particular to re-use the same case-folding table as for case-insensitive matching. Would be a strategic mistake IMHO to spread the internals across 2 modules. Then std.string could provide a public alias(es) as it imports std.uni anyway? Going further if we are to preserve the status quo then std.uni will declare them as package to not advertise their new location. > - Jonathan M Davis -- Dmitry Olshansky

30-Apr-2013 02:40, Walter Bright пишет: > Looks nice. > Cool, this means we are getting there ;) > The glossary should be alphabetized. > > "combining class" is defined under combiningClass - should it be in the > glossary instead? > Good ideas. Linked combining class as everything else. -- Dmitry Olshansky

On Tuesday, April 30, 2013 03:02:17 Dmitry Olshansky wrote: > 29-Apr-2013 22:50, Jonathan M Davis пишет: > > On Monday, April 29, 2013 22:13:09 Dmitry Olshansky wrote: > >> Technically these should be in std.string and are there but incorrect. > > > > Then fix them there. > > I think it will take a certain amount of leaked implementation details to get it done at least half-decently. In particular to re-use the same case-folding table as for case-insensitive matching. > > Would be a strategic mistake IMHO to spread the internals across 2 modules. > > Then std.string could provide a public alias(es) as it imports std.uni > anyway? > > Going further if we are to preserve the status quo then std.uni will declare them as package to not advertise their new location. An alias would be one option. The primary issue I see is that the general design of the modules has been that std.ascii and std.uni operate on individual characters and not strings, whereas std.string operates on strings (generally going with the unicode versions of things when there's a choice rather than the ASCII ones). And having overloads in std.uni which operate on strings doesn't follow that organizational model. Usually, std.string has called the std.uni functions to do what it's needed, and no implementation details have been leaked. I haven't looked at what you've done with std.uni yet, so I don't know how well that would work. But toLower and toUpper are far from them only functions which would be affected by something like this (icmp would be another obvious one). But even if we decided to reorganize where some functionality or functions live, we shouldn't have std.uni replacing std.string functions while leaving the old functions in std.string. So, worst case, aliases should be used, but if at all reasonable, I'd prefer to keep the module organizational structure that we've had with regards to how characters and string functionality is organized. - Jonathan M Davis

April 30, 2013

Re: Formal Review of std.uni

Posted by Dmitry Olshansky
in reply to Jonathan M Davis

Permalink

Dmitry Olshansky

Posted in reply to Jonathan M Davis

Permalink

30-Apr-2013 04:12, Jonathan M Davis пишет:
> On Tuesday, April 30, 2013 03:02:17 Dmitry Olshansky wrote:
>> 29-Apr-2013 22:50, Jonathan M Davis пишет:
>>> On Monday, April 29, 2013 22:13:09 Dmitry Olshansky wrote:
>>>> Technically these should be in std.string and are there but incorrect.
>>>
>>> Then fix them there.
>>
>> I think it will take a certain amount of leaked implementation details
>> to get it done at least half-decently. In particular to re-use the same
>> case-folding table as for case-insensitive matching.
>>
>> Would be a strategic mistake IMHO to spread the internals across 2 modules.
>>
>> Then std.string could provide a public alias(es) as it imports std.uni
>> anyway?
>>
>> Going further if we are to preserve the status quo then std.uni will
>> declare them as package to not advertise their new location.
>
> An alias would be one option. The primary issue I see is that the general
> design of the modules has been that std.ascii and std.uni operate on
> individual characters and not strings, whereas std.string operates on strings
> (generally going with the unicode versions of things when there's a choice
> rather than the ASCII ones). And having overloads in std.uni which operate on
> strings doesn't follow that organizational model. Usually, std.string has
> called the std.uni functions to do what it's needed, and no implementation
> details have been leaked. I haven't looked at what you've done with std.uni
> yet, so I don't know how well that would work. But toLower and toUpper are far
> from them only functions which would be affected by something like this (icmp
> would be another obvious one).

Unicode --> can't be done on character by character basis

>
> But even if we decided to reorganize where some functionality or functions
> live, we shouldn't have std.uni replacing std.string functions while leaving
> the old functions in std.string. So, worst case, aliases should be used, but
> if at all reasonable, I'd prefer to keep the module organizational structure
> that we've had with regards to how characters and string functionality is
> organized.

Aliases it is then.

-- 
Dmitry Olshansky

Forums