New std.uni: ready for more beating - D Programming Language Discussion Forum

January 29, 2013

New std.uni: ready for more beating

Posted by Dmitry Olshansky

Recap:
During a couple of rounds of the informal review new std.uni had its docs happily destroyed, and later re-written based on the feedback.

Notable changes:

- Fixed a couple of latent bugs (ouch!)

- The unicode.xyz helper was redesigned to have a clear path for extension to properties other than binary ones. For instance, to get all code points with hangul syllable type L (leading Jamo):

auto leadingJamo = unicode.hangulSyllableType("L");

- Squeezed extra 31Kb slack from object-file size (32 bits, more on 64). Now all of the packed tables occupy around 350Kb (32bits) and
If you happen to know some tricks to reduce object file size (and in turn the executable size), please chime in.
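
For reference, a minimal sketch of the redesigned helper in use. It is written against the `std.uni` that ships with present-day Phobos, on the assumption that it behaves like the standalone `uni` module discussed here; the sample code points are chosen purely for illustration.

```d
import std.uni;

void main()
{
    // Set of all code points whose Hangul_Syllable_Type is L (leading Jamo).
    auto leadingJamo = unicode.hangulSyllableType("L");

    assert(leadingJamo['\u1100']);  // HANGUL CHOSEONG KIYEOK, a leading consonant
    assert(!leadingJamo['\u1161']); // HANGUL JUNGSEONG A, a vowel (type V)
    assert(!leadingJamo['A']);      // not Hangul at all
}
```

To try the same thing with the standalone module, the import would presumably become `import uni;`, as described in the post.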

Code & benchmark: https://github.com/blackwhale/gsoc-bench-2012

Docs: http://blackwhale.github.com/phobos/uni.html
(looks far better without the JS jump-table)

It's a standalone module at the moment. To use it in place of the current std.uni, replace 'std.uni'->'uni' in your programs and compare the results. Make sure that both the uni and unicode_tables modules are linked in; rdmd can take care of this dependency.

P.S. Time to go for the formal review?

P.P.S. Got to catch some sleep ...
-- 
Dmitry Olshansky

January 31, 2013

Re: New std.uni: ready for more beating

Posted by Dmitry Olshansky
in reply to Dmitry Olshansky

30-Jan-2013 01:52, Dmitry Olshansky wrote:
> Recap:
> During a couple of rounds of the informal review new std.uni had its
> docs happily destroyed, and later re-written based on the feedback.
>

[snip]

> - Squeezed extra 31Kb slack from object-file size (32 bits, more on 64).
> Now all of the packed tables occupy around 350Kb (32bits) and
> If you happen to know some tricks to reduce object file size (and in
> turn the executable size), please chime in.

My post got lost in the ether, apparently. And it wasn't even complete: on 64 bits it's 464Kb of tables alone. Needless to say, I'm worried about these sizes getting too large, given that D is pretty much statically linked ATM.

>
> Code & benchmark: https://github.com/blackwhale/gsoc-bench-2012
>
> Docs: http://blackwhale.github.com/phobos/uni.html
> (looks far better without the JS jump-table)
>
> It's a standalone module at the moment. To use in place of current
> std.uni replace 'std.uni'->'uni' in your programs and compare the
> results. Make sure that both uni and unicode_tables modules are linked
> in, rdmd can take care of this dependency.

Let me make it more explicit.

I'm looking for a review manager and anybody willing to revive the review process instead of venting steam on proper property (pun intended) design and seeking a value in requiring parens on no-arg call (or proving otherwise).


-- 
Dmitry Olshansky

January 31, 2013

Re: New std.uni: ready for more beating

Posted by H. S. Teoh
in reply to Dmitry Olshansky

On Thu, Jan 31, 2013 at 11:27:57PM +0400, Dmitry Olshansky wrote:
> 30-Jan-2013 01:52, Dmitry Olshansky wrote:
> >Recap:
> >During a couple of rounds of the informal review new std.uni had its
> >docs happily destroyed, and later re-written based on the feedback.
> >
> 
> [snip]
> 
> >- Squeezed extra 31Kb slack from object-file size (32 bits, more on
> >64).  Now all of the packed tables occupy around 350Kb (32bits) and
> >If you happen to know some tricks to reduce object file size (and in
> >turn the executable size), please chime in.
> 
> My post got lost in the ether, apparently. And it wasn't even complete: on 64 bits it's 464Kb of tables alone. Needless to say, I'm worried about these sizes getting too large, given that D is pretty much statically linked ATM.

It didn't get lost. I saw it. I just haven't had the chance to review it yet. :)

[...]
> Let me make it more explicit.
> 
> I'm looking for a review manager and anybody willing to revive the review process instead of venting steam on proper property (pun intended) design and seeking a value in requiring parens on no-arg call (or proving otherwise).
[...]

Yeah I've basically resorted to thread-deleting the entire @property thread along with its several unending sibling threads. It's not so much that I don't care about it, as that it's just gotten too long-winded and tiring. I'm ready to throw up my hands and let it all go down the pipes.

I don't think I've the time/energy to be a review manager, but I *will* try to get to reviewing the code again sometime soon. IMNSHO, getting the new std.uni into Phobos is *far* more important (and far more profitable!) than the mountain out of molehill that is the current property discussion.

T

-- 
I'm still trying to find a pun for "punishment"...

January 31, 2013

Re: New std.uni: ready for more beating

Posted by David Nadlinger
in reply to Dmitry Olshansky

On Thursday, 31 January 2013 at 19:27:59 UTC, Dmitry Olshansky wrote:
> I'm looking for a review manager and anybody willing to revive the review process instead of venting steam on proper property (pun intended) design and seeking a value in requiring parens on no-arg call (or proving otherwise).

If nobody else steps up in the next few days, I'll do it. But I really hope somebody beats me to it, as I'd rather focus completely on getting a new 2.061-based LDC release out.

David

February 02, 2013

Re: New std.uni: ready for more beating

Posted by Dmitry Olshansky
in reply to H. S. Teoh

31-Jan-2013 23:48, H. S. Teoh wrote:
> On Thu, Jan 31, 2013 at 11:27:57PM +0400, Dmitry Olshansky wrote:
>> 30-Jan-2013 01:52, Dmitry Olshansky wrote:
>>> Recap:
>>> During a couple of rounds of the informal review new std.uni had its
>>> docs happily destroyed, and later re-written based on the feedback.
>>>
>>
>> [snip]
>>
>>> - Squeezed extra 31Kb slack from object-file size (32 bits, more on
>>> 64).  Now all of the packed tables occupy around 350Kb (32bits) and
>>> If you happen to know some tricks to reduce object file size (and in
>>> turn the executable size), please chime in.
>>
>> My post got lost in the ether, apparently. And it wasn't even complete:
>> on 64 bits it's 464Kb of tables alone. Needless to say, I'm
>> worried about these sizes getting too large given that D is pretty
>> much statically linked ATM.
>
> It didn't get lost. I saw it. I just haven't had the chance to review it
> yet. :)
>

Great, I think I was spoiled by the great speed of the previous destructive review. I guess no news is good news :)

>
> [...]
>> Let me make it more explicit.
>>
>> I'm looking for a review manager and anybody willing to revive the
>> review process instead of venting steam on proper property (pun
>> intended) design and seeking a value in requiring parens on no-arg
>> call (or proving otherwise).
> [...]
>
> Yeah I've basically resorted to thread-deleting the entire @property
> thread along with its several unending sibling threads. It's not so much
> that I don't care about it, as that it's just gotten too long-winded and
> tiring. I'm ready to throw up my hands and let it all go down the pipes.
>
> I don't think I've the time/energy to be a review manager, but I *will*
> try to get to reviewing the code again sometime soon. IMNSHO, getting
> the new std.uni into Phobos is *far* more important (and far more
> profitable!) than the mountain out of molehill that is the current
> property discussion.
>
>
> T
>


-- 
Dmitry Olshansky

February 23, 2013

Re: New std.uni: ready for more beating

Posted by H. S. Teoh
in reply to Dmitry Olshansky

On Wed, Jan 30, 2013 at 01:52:20AM +0400, Dmitry Olshansky wrote:
> Recap:
> During a couple of rounds of the informal review new std.uni had its
> docs happily destroyed, and later re-written based on the feedback.
> 
> Notable changes:
> 
> - Fixed a couple of latent bugs (ouch!)
> 
> - The unicode.xyz helper was redesigned to have a clear path for extension to properties other than binary ones. For instance, to get all code points with hangul syllable type L (leading Jamo):
> 
> auto leadingJamo = unicode.hangulSyllableType("L");
> 
> - Squeezed extra 31Kb slack from object-file size (32 bits, more on
> 64). Now all of the packed tables occupy around 350Kb (32bits) and
> If you happen to know some tricks to reduce object file size (and in
> turn the executable size), please chime in.
> 
> Code & benchmark: https://github.com/blackwhale/gsoc-bench-2012
> 
> Docs: http://blackwhale.github.com/phobos/uni.html
> (looks far better without the JS jump-table)
> 
> It's a standalone module at the moment. To use in place of current std.uni replace 'std.uni'->'uni' in your programs and compare the results. Make sure that both uni and unicode_tables modules are linked in, rdmd can take care of this dependency.
> 
> P.S. Time to go for the formal review?
[...]

Alright, I decided to just jump in and re-review std.uni. I *really* want to see this in Phobos, the sooner the better.

Here are some comments:

- In the first part of the docs, Terminology section, under "Code unit":
  I think you mistyped a ddoc macro, it should be ($(D char)) instead of
  (($D char)).

- lineSep, paraSep: are these fixed values? It would be nice to indicate
  what their values are.

- UnicodeDecomposition: it would be nice to document the values in this
  enum.

- normalize(): I think your code example has a duplicated line (NFKC
  example appears twice).

- allowedIn(): How about an example where a character is *not* allowed
  in a normalization form?

- InversionList.opBinary: I still prefer ^ instead of ~ for symmetric
  difference. In D, ~ means "append", and it's very confusing when x~y
  means symmetric difference instead of append.

- unicode.opDispatch: it would be nice to provide links to official
  Unicode documentation that lists all blocks/scripts, as a reference.

- combiningClass: maybe provide a link to official Unicode docs that
  list combining class values?
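
Several of the points in this list can be checked directly. The following is a sketch against present-day Phobos `std.uni`, which may differ from the standalone module under review at the time; it shows the fixed values of `lineSep` and `paraSep`, the four normalization forms of one code point, and the set operators, where `~` is still the symmetric-difference operator in current Phobos. The intervals passed to `CodepointSet` are arbitrary illustrative choices.

```d
import std.uni;

void main()
{
    // lineSep and paraSep are fixed code points.
    assert(lineSep == '\u2028'); // LINE SEPARATOR
    assert(paraSep == '\u2029'); // PARAGRAPH SEPARATOR

    // U+03D3 (Greek upsilon with acute and hook symbol) differs in all four forms.
    assert(normalize!NFC("\u03D3") == "\u03D3");
    assert(normalize!NFD("\u03D3") == "\u03D2\u0301");
    assert(normalize!NFKC("\u03D3") == "\u038E");
    assert(normalize!NFKD("\u03D3") == "\u03A5\u0301");

    // Set operators on CodepointSet (an InversionList); intervals are half-open.
    auto a = CodepointSet('a', 'n');      // 'a' .. 'm'
    auto b = CodepointSet('k', 'z' + 1);  // 'k' .. 'z'
    auto symDiff = a ~ b;                 // symmetric difference
    assert(symDiff['a'] && symDiff['z']); // each is in exactly one of the sets
    assert(!symDiff['k']);                // in both sets, so excluded
    assert((a & b).length == 3);          // intersection is 'k', 'l', 'm'
}
```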


OK, a lot of this is just nitpicks... but overall, this new std.uni looks very good. Looking forward to it being merged into Phobos!


T

-- 
Marketing: the art of convincing people to pay for what they didn't need before which you can't deliver after.

February 24, 2013

Re: New std.uni: ready for more beating

Posted by dennis luehring
in reply to Dmitry Olshansky

Would it make sense to incorporate tests from the ICU test suite? There are API tests and many data tests around.

The tests can be found in the current release under

icu4c-50_1_2-src\icu\source\test

On 29.01.2013 22:52, Dmitry Olshansky wrote:
> Recap:
> During a couple of rounds of the informal review new std.uni had its
> docs happily destroyed, and later re-written based on the feedback.
>
> Notable changes:
>
> - Fixed a couple of latent bugs (ouch!)
>
> - The unicode.xyz helper was redesigned to have a clear path for extension
> to properties other than binary ones. For instance, to get all code
> points with hangul syllable type L (leading Jamo):
>
> auto leadingJamo = unicode.hangulSyllableType("L");
>
> - Squeezed extra 31Kb slack from object-file size (32 bits, more on 64).
> Now all of the packed tables occupy around 350Kb (32bits) and
> If you happen to know some tricks to reduce object file size (and in
> turn the executable size), please chime in.
>
> Code & benchmark: https://github.com/blackwhale/gsoc-bench-2012
>
> Docs: http://blackwhale.github.com/phobos/uni.html
> (looks far better without the JS jump-table)
>
> It's a standalone module at the moment. To use in place of current
> std.uni replace 'std.uni'->'uni' in your programs and compare the
> results. Make sure that both uni and unicode_tables modules are linked
> in, rdmd can take care of this dependency.
>
> P.S. Time to go for the formal review?
>
> P.P.S. Got to catch some sleep ...
>

February 25, 2013

Re: New std.uni: ready for more beating

Posted by tn
in reply to Dmitry Olshansky

Hi. Just a couple stupid questions:

* What is the relation between std.uni and std.utf? Why are two modules needed? Seems confusing to me. Shouldn't these be combined? If not, then please explain the distinction in the beginning of the module documentation.

* Shouldn't the module be renamed to std.unicode? We do not have std.arr, std.alg or std.cont either. To me, it is not at all obvious what std.uni contains based on the module name.

February 25, 2013

Re: New std.uni: ready for more beating

Posted by Dmitry Olshansky
in reply to tn

25-Feb-2013 22:08, tn wrote:
> Hi. Just a couple stupid questions:
>
> * What is the relation between std.uni and std.utf? Why are two modules
> needed? Seems confusing to me. Shouldn't these be combined? If not, then
> please explain the distinction in the beginning of the module
> documentation.

std.uni was meant to be the Unicode counterpart of C's "ctype", except that it has failed to deliver even that since about Unicode 5.1.

std.utf is all about encoding/decoding UTF-8, UTF-16. If I were designing it from scratch (and, what the hell, I might one day have to), I'd put these into std.encoding or even std.encoding.utf.

I'd probably put a small note that basic encoding is both:
a) built-in into the language (foreach)
b) to be found in std.utf
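
To make the split concrete, here is a small sketch using present-day Phobos: decoding comes either from the language (`foreach` over `dchar`) or from `std.utf.decode`, while classification of the decoded code points comes from `std.uni`. The function name `countAlpha` and the sample string are made up for illustration.

```d
import std.uni : isAlpha;
import std.utf : decode;

// Count alphabetic code points using the language's built-in decoding.
size_t countAlpha(string s)
{
    size_t n;
    foreach (dchar c; s) // foreach decodes UTF-8 code units into code points
        if (isAlpha(c))  // classification is std.uni's job
            ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo, мир"; // U+00E9 is a precomposed 'é'
    assert(countAlpha(s) == 8);   // 5 Latin letters + 3 Cyrillic letters

    // The same decoding done explicitly with std.utf.
    size_t i = 1;                 // index of the first code unit of U+00E9
    dchar c = decode(s, i);       // advances i past the two-byte sequence
    assert(c == '\u00E9' && i == 3);
}
```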

>
> * Shouldn't the module be renamed to std.unicode?

Good idea. But part of the reason was fixing the existing std.uni to:
a) let it work in Unicode 6.1 world (and even 6.2 as of now)
b) make it faster when dealing with Unicode code points in all of the isAlpha etc. functions.
c) add a bunch of new cool tools for Unicode

Basically the API is a superset of the existing one. I didn't want to change the name.

> We do not have
> std.arr, std.alg or std.cont either. To me, it is not at all obvious
> what std.uni contains based on the module name.

What can I say, Phobos is an example of software evolution ;)


-- 
Dmitry Olshansky

February 25, 2013

Re: New std.uni: ready for more beating

Posted by Dmitry Olshansky
in reply to dennis luehring

24-Feb-2013 12:32, dennis luehring wrote:
> Would it make sense to incorporate tests from the ICU test suite? There
> are API tests and many data tests around.

For key algorithms I'm using the consortium's test data files, plus running randomly generated stress tests against ICU.

It might make sense to incorporate some of their tests, but I'm wondering if it'll end up only as a difference in the API.

That being said, the tests are already unwieldy and largely run as separate programs depending on the said data files.

The unittests there are mostly sanity checks and tests of self-agreement between components.
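
As an illustration of what a randomized self-agreement test can look like (this is not the author's actual harness, just a sketch against present-day Phobos), the function below checks that the set operators agree with plain boolean logic on membership at randomly chosen code points. The name `setOpsAgree`, the seed, and the choice of sets are all assumptions made for the example.

```d
import std.random : Random, uniform;
import std.uni;

// Returns true if the set operators agree with per-code-point boolean
// logic at `samples` pseudo-random code points.
bool setOpsAgree(CodepointSet a, CodepointSet b, uint seed, size_t samples)
{
    auto either = a | b;
    auto both = a & b;
    auto onlyA = a - b;
    auto oneOf = a ~ b; // symmetric difference
    auto notA = a.inverted;

    auto rng = Random(seed); // fixed seed keeps failures reproducible
    foreach (_; 0 .. samples)
    {
        immutable c = uniform(0u, 0x110000u, rng); // any code point value
        if (either[c] != (a[c] || b[c])) return false;
        if (both[c] != (a[c] && b[c])) return false;
        if (onlyA[c] != (a[c] && !b[c])) return false;
        if (oneOf[c] != (a[c] != b[c])) return false;
        if (notA[c] == a[c]) return false;
    }
    return true;
}

void main()
{
    assert(setOpsAgree(unicode.Cyrillic, unicode.LowerCase, 2013, 10_000));
}
```

A test like this says nothing about whether the tables match the Unicode data; that is what the consortium's data files and the comparison against ICU are for.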

> The tests can be found in the current release under
>
> icu4c-50_1_2-src\icu\source\test


-- 
Dmitry Olshansky
