Jump to page: 1 2
Thread overview
Formal Review of std.uni
Apr 28, 2013
Jesse Phillips
Apr 29, 2013
Jacob Carlborg
Apr 29, 2013
Dmitry Olshansky
Apr 29, 2013
Jonathan M Davis
Apr 29, 2013
Dmitry Olshansky
Apr 30, 2013
Jonathan M Davis
Apr 30, 2013
Dmitry Olshansky
Apr 30, 2013
Jonathan M Davis
May 12, 2013
Dmitry Olshansky
Apr 29, 2013
Brian Schott
Apr 29, 2013
Walter Bright
Apr 29, 2013
Dmitry Olshansky
May 12, 2013
Dmitry Olshansky
May 13, 2013
Jacob Carlborg
May 18, 2013
Jesse Phillips
May 19, 2013
Dmitry Olshansky
May 19, 2013
Dmitry Olshansky
May 21, 2013
Jakob Ovrum
April 28, 2013
First off, Dconf is this next weekend and effects the schedule of this review. Review will be held for 3 weeks, instead of holding off a week I'm extending the period and starting the review now. (Dmitry may be unable to respond due to his being a speaker)

This is a replacement module for the current std.uni by Dmitry Olshansky. The std.uni module provides an implementation of fundamental Unicode algorithms and data structures.

To use this module, install 2.63 beta, import uni; and not std.uni, compile two files from the source uni.d unicode_tables.d

Docs:
http://blackwhale.github.io/phobos/uni.html

Source:
https://github.com/blackwhale/gsoc-bench-2012

DMD Beta:
http://forum.dlang.org/post/517C8552.7040704@digitalmars.com

It should be noted that inclusion into Phobos may require addressing inter-dependencies, see "Reducing the inter-dependencies"
http://forum.dlang.org/post/kl8hn8$bm3$1@digitalmars.com
April 29, 2013
On 2013-04-28 18:56, Jesse Phillips wrote:
> First off, Dconf is this next weekend and effects the schedule of this
> review. Review will be held for 3 weeks, instead of holding off a week
> I'm extending the period and starting the review now. (Dmitry may be
> unable to respond due to his being a speaker)
>
> This is a replacement module for the current std.uni by Dmitry
> Olshansky. The std.uni module provides an implementation of fundamental
> Unicode algorithms and data structures.

Two minor things:

* The docs for "decomposeHangul" contains a ddoc macro

* "toLower" and "toUpper" mention overloads which take strings instead of a dchar. I cannot find those overloads in the std.uni module. If they refer to another module it should say so

-- 
/Jacob Carlborg
April 29, 2013
29-Apr-2013 10:45, Jacob Carlborg пишет:
> On 2013-04-28 18:56, Jesse Phillips wrote:
>> First off, Dconf is this next weekend and effects the schedule of this
>> review. Review will be held for 3 weeks, instead of holding off a week
>> I'm extending the period and starting the review now. (Dmitry may be
>> unable to respond due to his being a speaker)
>>
>> This is a replacement module for the current std.uni by Dmitry
>> Olshansky. The std.uni module provides an implementation of fundamental
>> Unicode algorithms and data structures.
>
> Two minor things:
>
> * The docs for "decomposeHangul" contains a ddoc macro

Thanks, fixed that and the same typo elsewhere.

> * "toLower" and "toUpper" mention overloads which take strings instead
> of a dchar. I cannot find those overloads in the std.uni module. If they
> refer to another module it should say so
>

Technically these should be in std.string and are there but incorrect. Plus the impossible to implement toLowerInPlace/toUpperInPlace since length may change during case-conversion.

Darn... I got to implement them in std.uni then. Not much work given that all of pieces are already here.

-- 
Dmitry Olshansky
April 29, 2013
On Monday, April 29, 2013 22:13:09 Dmitry Olshansky wrote:
> Technically these should be in std.string and are there but incorrect.

Then fix them there.

- Jonathan M Davis
April 29, 2013
I was working on a list of suggestions, but turned it into a pull request instead, as the list had gotten big enough to be inconvenient to go through manually. The pull focuses on improving some of the phrasing in the DDoc comments.

https://github.com/blackwhale/gsoc-bench-2012/pull/2

April 29, 2013
Looks nice.

The glossary should be alphabetized.

"combining class" is defined under combiningClass - should it be in the glossary instead?


April 29, 2013
29-Apr-2013 22:50, Jonathan M Davis пишет:
> On Monday, April 29, 2013 22:13:09 Dmitry Olshansky wrote:
>> Technically these should be in std.string and are there but incorrect.
>
> Then fix them there.
>

I think it will take a certain amount of leaked implementation details to get it done at least half-decently. In particular to re-use the same case-folding table as for case-insensitive matching.

Would be a strategic mistake IMHO to spread the internals across 2 modules.

Then std.string could provide a public alias(es) as it imports std.uni anyway?

Going further if we are to preserve the status quo then std.uni will declare them as package to not advertise their new location.

> - Jonathan M Davis



-- 
Dmitry Olshansky
April 29, 2013
30-Apr-2013 02:40, Walter Bright пишет:
> Looks nice.
>

Cool, this means we are getting there ;)

> The glossary should be alphabetized.
>
> "combining class" is defined under combiningClass - should it be in the
> glossary instead?
>

Good ideas. Linked combining class as everything else.

-- 
Dmitry Olshansky
April 30, 2013
On Tuesday, April 30, 2013 03:02:17 Dmitry Olshansky wrote:
> 29-Apr-2013 22:50, Jonathan M Davis пишет:
> > On Monday, April 29, 2013 22:13:09 Dmitry Olshansky wrote:
> >> Technically these should be in std.string and are there but incorrect.
> > 
> > Then fix them there.
> 
> I think it will take a certain amount of leaked implementation details to get it done at least half-decently. In particular to re-use the same case-folding table as for case-insensitive matching.
> 
> Would be a strategic mistake IMHO to spread the internals across 2 modules.
> 
> Then std.string could provide a public alias(es) as it imports std.uni
> anyway?
> 
> Going further if we are to preserve the status quo then std.uni will declare them as package to not advertise their new location.

An alias would be one option. The primary issue I see is that the general design of the modules has been that std.ascii and std.uni operate on individual characters and not strings, whereas std.string operates on strings (generally going with the unicode versions of things when there's a choice rather than the ASCII ones). And having overloads in std.uni which operate on strings doesn't follow that organizational model. Usually, std.string has called the std.uni functions to do what it's needed, and no implementation details have been leaked. I haven't looked at what you've done with std.uni yet, so I don't know how well that would work. But toLower and toUpper are far from them only functions which would be affected by something like this (icmp would be another obvious one).

But even if we decided to reorganize where some functionality or functions live, we shouldn't have std.uni replacing std.string functions while leaving the old functions in std.string. So, worst case, aliases should be used, but if at all reasonable, I'd prefer to keep the module organizational structure that we've had with regards to how characters and string functionality is organized.

- Jonathan M Davis
April 30, 2013
30-Apr-2013 04:12, Jonathan M Davis пишет:
> On Tuesday, April 30, 2013 03:02:17 Dmitry Olshansky wrote:
>> 29-Apr-2013 22:50, Jonathan M Davis пишет:
>>> On Monday, April 29, 2013 22:13:09 Dmitry Olshansky wrote:
>>>> Technically these should be in std.string and are there but incorrect.
>>>
>>> Then fix them there.
>>
>> I think it will take a certain amount of leaked implementation details
>> to get it done at least half-decently. In particular to re-use the same
>> case-folding table as for case-insensitive matching.
>>
>> Would be a strategic mistake IMHO to spread the internals across 2 modules.
>>
>> Then std.string could provide a public alias(es) as it imports std.uni
>> anyway?
>>
>> Going further if we are to preserve the status quo then std.uni will
>> declare them as package to not advertise their new location.
>
> An alias would be one option. The primary issue I see is that the general
> design of the modules has been that std.ascii and std.uni operate on
> individual characters and not strings, whereas std.string operates on strings
> (generally going with the unicode versions of things when there's a choice
> rather than the ASCII ones). And having overloads in std.uni which operate on
> strings doesn't follow that organizational model. Usually, std.string has
> called the std.uni functions to do what it's needed, and no implementation
> details have been leaked. I haven't looked at what you've done with std.uni
> yet, so I don't know how well that would work. But toLower and toUpper are far
> from them only functions which would be affected by something like this (icmp
> would be another obvious one).

Unicode --> can't be done on character by character basis

>
> But even if we decided to reorganize where some functionality or functions
> live, we shouldn't have std.uni replacing std.string functions while leaving
> the old functions in std.string. So, worst case, aliases should be used, but
> if at all reasonable, I'd prefer to keep the module organizational structure
> that we've had with regards to how characters and string functionality is
> organized.

Aliases it is then.

-- 
Dmitry Olshansky
« First   ‹ Prev
1 2