January 02, 2013
numericValue for (unicode) characters
There is an enhancement request (ER) that would allow converting characters to numbers:
http://d.puremagic.com/issues/show_bug.cgi?id=5543

For example: '1' => 1
Or, unicode considered: 'Ⅶ' => 7

Long story short, it was decided that it wasn't std.conv.to's job 
to do this conversion, but rather, there should be a function 
called "numericValue" inside std.uni and std.ascii that would do 
this job.

What remains is defining how these functions should work. Things 
to keep in mind:
- ASCII to int should be fast.
- unicode numeric values span from -0.5 to 1.0e12.
- unicode numeric values can be fractional.
- ALL unicode numeric values can be EXACTLY represented in a 
double.

Given these observations, I'd like to propose these:

//------------------------------
//std.ascii.numericValue
/** Given an ascii character, returns that character's
    numeric value if it is numeric ($(D isNumeric)),
    and -1 otherwise
 */
pure @safe nothrow
int numericValue(dchar c);
//------------------------------
//std.uni.numericValue
/** Given a unicode character, returns that character's
    numeric value if it is numeric ($(D isNumeric)),
    and throws an exception otherwise
 */
pure @safe
double numericValue(dchar c);
//------------------------------

The rationale for this:
std.ascii: I think returning -1 as a magic number should help 
keep the code faster and with less clutter than with exceptions. 
Returning an int is the obvious choice for numbers that span -1 
to 9.
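
A minimal sketch of the sentinel-returning ASCII variant described above. The name `asciiNumericValue` is hypothetical, chosen only for illustration, and the digit test used is `std.ascii.isDigit`:

```d
import std.ascii : isDigit;

/// Hypothetical sketch of the proposed ASCII function:
/// returns 0..9 for '0'..'9' and -1 for every other character.
int asciiNumericValue(dchar c) pure nothrow @safe
{
    return isDigit(c) ? cast(int)(c - '0') : -1;
}

void main()
{
    assert(asciiNumericValue('7') == 7);
    assert(asciiNumericValue('x') == -1);
    // A non-ASCII numeral such as ROMAN NUMERAL SEVEN is rejected here.
    assert(asciiNumericValue('\u2166') == -1);
}
```

Because the function never throws, it can be marked nothrow, which is part of the speed argument made above.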

std.uni: double is the only type that can hold all ranges of 
unicode's numeric values.
This time, uni throws exceptions. This is for two reasons:
1. Choosing a magic number is difficult, and error prone. Correct 
code would have to look like: "if (std.uni.numericValue(c) > 
-0.7) {...}"
2. When dealing with Unicode, an exception is probably cleaner, 
and its overhead is not as critical as with ASCII.
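
A rough sketch of the throwing Unicode variant, under loud assumptions: the name `uniNumericValue` is hypothetical, and the lookup covers only a handful of code points picked for illustration, where a real version would need the full Unicode numeric data:

```d
import std.exception : assertThrown;

/// Hypothetical stand-in for the proposed Unicode function.
/// Only a few code points are handled; everything else throws.
double uniNumericValue(dchar c) pure @safe
{
    switch (c)
    {
        case '0': .. case '9': return c - '0';
        case '\u2166': return 7;      // ROMAN NUMERAL SEVEN
        case '\u00BD': return 0.5;    // VULGAR FRACTION ONE HALF
        case '\u0F33': return -0.5;   // TIBETAN DIGIT HALF ZERO
        default:
            throw new Exception("character has no numeric value");
    }
}

void main()
{
    assert(uniNumericValue('\u2166') == 7);
    assert(uniNumericValue('\u0F33') == -0.5);
    assertThrown(uniNumericValue('x'));
}
```

Since -0.5 is a legitimate result, a caller cannot test for "value below zero" as a failure signal, which is the difficulty with magic numbers mentioned in point 1.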

***********************************************
Thoughts?

I wanted to get this ER moved forward. I don't think 
uni.numericValue will be finished soon, but I would like 
std.ascii's done sooner rather than later.
January 02, 2013
Re: numericValue for (unicode) characters
monarch_dodra:

> The rationale for this:
> std.ascii: I think returning -1 as a magic number should help 
> keep the code faster and with less clutter than with exceptions.

For the ASCII version I have two use cases:
- Where I want to go fast&unsafe I just use "c - '0'".
- When I want more safety I'd like to use something like to!(), 
which raises exceptions in case of errors.

A function that works on ASCII and returns -1 doesn't give me 
much more than "c - '0'". So maybe exceptions are good in the 
ASCII case too.

There is also std.typecons.nullable, it's a possibility for 
std.uni.numericValue. Generally Phobos should eat more of its dog 
food :-)

Bye,
bearophile
January 02, 2013
Re: numericValue for (unicode) characters
1/2/2013 7:24 PM, bearophile wrote:
> monarch_dodra:
>
>> The rationale for this:
>> std.ascii: I think returning -1 as a magic number should help keep the
>> code faster and with less clutter than with exceptions.
>
> For the ASCII version I have two use cases:
> - Where I want to go fast&unsafe I just use "c - '0'".
> - When I want more safety I'd like to use something as to!(), that
> raises exceptions in case of errors.
>
> A function that works on ASCII and returns -1 doesn't give me much more
> than "c - '0'". So maybe exceptions are good in the ASCII case too.
>

Then we can maybe just drop this function? What's wrong with
if(std.ascii.isNumeric(a))
   a -= '0';
else
   enforce(false);

I mean that the time to look it up in the standard library is much greater than the time to 
roll your own with either of the 2 semantics.

Unlike the unicode version, of course. Then IMO having the std.ascii one 
is mostly just for symmetry and thus I think that both should just use 
some sentinel value.

> There is also std.typecons.nullable, it's a possibility for
> std.uni.numericValue. Generally Phobos should eat more of its dog food :-)
>

double.nan sounds more like it.

> Bye,
> bearophile


-- 
Dmitry Olshansky
January 02, 2013
Re: numericValue for (unicode) characters
On 1/2/13 3:13 PM, Dmitry Olshansky wrote:
> 1/2/2013 7:24 PM, bearophile wrote:
>> monarch_dodra:
>>
>>> The rationale for this:
>>> std.ascii: I think returning -1 as a magic number should help keep the
>>> code faster and with less clutter than with exceptions.
>>
>> For the ASCII version I have two use cases:
>> - Where I want to go fast&unsafe I just use "c - '0'".
>> - When I want more safety I'd like to use something as to!(), that
>> raises exceptions in case of errors.
>>
>> A function that works on ASCII and returns -1 doesn't give me much more
>> than "c - '0'". So maybe exceptions are good in the ASCII case too.
>>
>
> Then we can maybe just drop this function? What's wrong with
> if(std.ascii.isNumeric(a))
> a -= '0';
> else
> enforce(false);

Unnecessary flow :o).

enforce(std.ascii.isNumeric(a));
a -= '0';


Andrei
January 02, 2013
Re: numericValue for (unicode) characters
1/3/2013 12:21 AM, Andrei Alexandrescu wrote:
> On 1/2/13 3:13 PM, Dmitry Olshansky wrote:
>> 1/2/2013 7:24 PM, bearophile wrote:
>>> monarch_dodra:
>>>
>>>> The rationale for this:
>>>> std.ascii: I think returning -1 as a magic number should help keep the
>>>> code faster and with less clutter than with exceptions.
>>>
>>> For the ASCII version I have two use cases:
>>> - Where I want to go fast&unsafe I just use "c - '0'".
>>> - When I want more safety I'd like to use something as to!(), that
>>> raises exceptions in case of errors.
>>>
>>> A function that works on ASCII and returns -1 doesn't give me much more
>>> than "c - '0'". So maybe exceptions are good in the ASCII case too.
>>>
>>
>> Then we can maybe just drop this function? What's wrong with
>> if(std.ascii.isNumeric(a))
>> a -= '0';
>> else
>> enforce(false);
>
> Unnecessary flow :o).
>
> enforce(std.ascii.isNumeric(a));
> a -= '0';

Yup, and it's 2 lines then. And if one really wants to chain it:
map(a => enforce(std.ascii.isNumeric(a)), a -= '0')(...);

Hardly makes it a Phobos candidate then ;)


-- 
Dmitry Olshansky
January 02, 2013
Re: numericValue for (unicode) characters
On Wednesday, 2 January 2013 at 20:49:38 UTC, Dmitry Olshansky 
wrote:
>
> Yup, and it's 2 lines then. And if one really wants to chain it:
> map(a => enforce(std.ascii.isNumeric(a)), a -= '0')(...);
>
> Hardly makes it Phobos candidate then ;)

Well, just because it's almost trivial to us doesn't mean it hurts 
to have it. The fact that you can even operate on chars in such a 
fashion (c - '0') is not obvious to everyone: I've seen time and 
time again code such as:
//----
if (97 <= c && c <= 122)
    c -= 97;
//----

numericValue helps keep things clean and self documented.

What's more, it helps keep ascii complete. Code originally 
written for ascii is easily upgradable to support uni (and 
vice-versa). Furthermore, *writing* "std.ascii.numericValue" 
self-documents ascii-only support, which is less obvious in 
code using "c - '0'":

In the original pull request to "improve" conv.to, the fact that 
it did not support unicode didn't even cross our minds. Seeing 
"std.ascii.numericValue" raises the eyebrow. It *forces* unicode 
consideration (regardless of which is right, it can't be ignored).

Really, by the rationale of "it's 2 lines", we shouldn't even 
have "std.ascii.isNumeric" at all...

On Wednesday, 2 January 2013 at 20:13:32 UTC, Dmitry Olshansky 
wrote:
> 1/2/2013 7:24 PM, bearophile wrote:
>> There is also std.typecons.nullable, it's a possibility for
>> std.uni.numericValue. Generally Phobos should eat more of its 
>> dog food :-)
>>
>
> double.nan sounds more like it.

Hum... nan. I like it.
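
A small sketch of the NaN-as-sentinel idea, assuming a hypothetical function `numericValueOrNaN` with a deliberately tiny lookup (a real version would need the full Unicode numeric data). The caller checks the result with `std.math.isNaN`:

```d
import std.math : isNaN;

/// Hypothetical sketch: returns double.nan for characters
/// that have no numeric value instead of throwing.
double numericValueOrNaN(dchar c) pure nothrow @safe @nogc
{
    switch (c)
    {
        case '0': .. case '9': return c - '0';
        case '\u2166': return 7;      // ROMAN NUMERAL SEVEN
        case '\u0F33': return -0.5;   // TIBETAN DIGIT HALF ZERO
        default: return double.nan;
    }
}

void main()
{
    assert(numericValueOrNaN('\u2166') == 7);
    assert(numericValueOrNaN('\u0F33') == -0.5);
    // NaN never compares equal to anything, so test with isNaN.
    assert(isNaN(numericValueOrNaN('x')));
}
```

Unlike a magic number such as -1, NaN cannot collide with any real numeric value, and the function can stay nothrow.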
January 02, 2013
Re: numericValue for (unicode) characters
On Wed, Jan 02, 2013 at 11:15:31PM +0100, monarch_dodra wrote:
> On Wednesday, 2 January 2013 at 20:49:38 UTC, Dmitry Olshansky
> wrote:
> >
> >Yup, and it's 2 lines then. And if one really wants to chain it:
> >map(a => enforce(std.ascii.isNumeric(a)), a -= '0')(...);
> >
> >Hardly makes it Phobos candidate then ;)
> 
> Well, just because it's almost trivial to us doesn't mean it hurts to
> have it. The fact that you can even operate on chars in such a
> fashion (c - '0') is not obvious to everyone: I've seen time and
> time again code such as:
> //----
> if (97 <= c && c <= 122)
>     c -= 97;
> //----
> 
> numericValue helps keep things clean and self documented.

+1. Code intent is important.


[...]
> On Wednesday, 2 January 2013 at 20:13:32 UTC, Dmitry Olshansky
> wrote:
> >1/2/2013 7:24 PM, bearophile wrote:
> >>There is also std.typecons.nullable, it's a possibility for
> >>std.uni.numericValue. Generally Phobos should eat more of its
> >>dog food :-)
> >>
> >
> >double.nan sounds more like it.
> 
> Hum... nan. I like it.

+1 for nan. It's about time we used nan for something useful beyond just
an annoying default value for floating-point variables. :)


T

-- 
People say I'm indecisive, but I'm not sure about that. -- YHL, CONLANG
January 03, 2013
Re: numericValue for (unicode) characters
Dmitry Olshansky:

> Yup, and it's 2 lines then. And if one really wants to chain it:
> map(a => enforce(std.ascii.isNumeric(a)), a -= '0')(...);
>
> Hardly makes it Phobos candidate then ;)

I think you meant to write:

map(a => enforce(std.ascii.isNumeric(a)), a - '0')(...);

To avoid some bugs I try not to use the comma expression like 
that.

Compare that code with:

map!numericValue(...);
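
To make the comparison concrete, here is a sketch with a named function passed to map. `digitValue` is a hypothetical throwing variant written only for this example, and it uses `std.ascii.isDigit` as the digit test:

```d
import std.algorithm : equal, map;
import std.ascii : isDigit;
import std.exception : assertThrown, enforce;

/// Hypothetical throwing variant, named here only for illustration.
int digitValue(dchar c)
{
    enforce(isDigit(c), "not an ASCII digit");
    return cast(int)(c - '0');
}

void main()
{
    // The named function chains without any lambda or comma expression.
    assert("2013".map!digitValue.equal([2, 0, 1, 3]));
    // map is lazy, so the error surfaces when the bad element is read.
    assertThrown("20x3".map!digitValue.equal([2, 0, 0, 3]));
}
```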

Bye,
bearophile
January 03, 2013
Re: numericValue for (unicode) characters
1/3/2013 2:15 AM, monarch_dodra wrote:
> On Wednesday, 2 January 2013 at 20:49:38 UTC, Dmitry Olshansky wrote:
>>
>> Yup, and it's 2 lines then. And if one really wants to chain it:
>> map(a => enforce(std.ascii.isNumeric(a)), a -= '0')(...);
>>
>> Hardly makes it Phobos candidate then ;)
>
> Well, just because it's almost trivial to us doesn't mean it hurts to
> have it. The fact that you can even operate on chars in such a fashion
> (c - '0') is not obvious to everyone: I've seen time and time again code
> such as:
> //----
> if (97 <= c && c <= 122)
>      c -= 97;
> //----
>
> numericValue helps keep things clean and self documented.
>
> What's more, it helps keep ascii complete. Code originally written for
> ascii is easily upgradable to support uni (and vice-versa). Furthermore,
> *writing* "std.ascii.numericValue" self-documents ascii-only support,
> which is less obvious in code using "c - '0'":
>
> In the original pull request to "improve" conv.to, the fact that it did
> not support unicode didn't even cross our minds. Seeing
> "std.ascii.numericValue" raises the eyebrow. It *forces* unicode
> consideration (regardless of which is right, it can't be ignored).
>
> Really, by the rationale of "it's 2 lines", we shouldn't even have
> "std.ascii.isNumeric" at all...
>

I don't mind adding it from a completeness and/or symmetry standpoint, 
as I said.

I do see another cool issue popping up though. It's a problem of how the 
anti-hijacking works. Say we add numericValue right now to std.ascii but 
not std.uni. A release later we have numericValue in std.uni (well 
hopefully they are both in the same 2.062 ;) ).

Now take this code:
map!numericValue(...)

If the code also happens to import std.uni it's going to stop compiling.

That's one of the reasons I think our hopes for stability (as in: compiles 
5 years from now) are misplaced, as we can't have it until the library 
is essentially dead in stone.
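
One way user code can guard against this kind of breakage is to qualify the call, sketched below under the assumption that static imports are acceptable in the code base. It uses `isAlpha`, a name that std.ascii and std.uni both define, since numericValue does not exist in either module in this discussion:

```d
import std.algorithm : equal, map;
static import std.ascii;
static import std.uni;

void main()
{
    // Fully qualified names stay unambiguous even if another imported
    // module later gains a function with the same name.
    assert("a1".map!(c => std.ascii.isAlpha(c)).equal([true, false]));

    // U+00E9 (e with acute) is alphabetic for std.uni but not ASCII.
    assert(std.uni.isAlpha('\u00E9'));
    assert(!std.ascii.isAlpha('\u00E9'));
}
```

The cost is more verbose call sites, which is the trade-off against the short `map!numericValue(...)` form.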


-- 
Dmitry Olshansky
January 03, 2013
Re: numericValue for (unicode) characters
On Thursday, 3 January 2013 at 08:23:06 UTC, Dmitry Olshansky 
wrote:
> Now take this code:
> map!numericValue(...)
>
> If the code also happens to import std.uni it's going to stop 
> compiling.

Hum... We could always "camp" the std.uni's numericValue function?

//----
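// Placeholder template: the static assert only fires if someone
// actually instantiates (calls) it. "const" is dropped because it
// is only valid on member functions.
double numericValue()(dchar c) nothrow @safe
{
    static assert(false, "Sorry, std.uni.numericValue is not yet implemented");
}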
//----

This would avoid the breakage you mentioned.