January 02, 2013 numericValue for (unicode) characters
There is an ER that would allow converting characters to numbers:
http://d.puremagic.com/issues/show_bug.cgi?id=5543

For example:
'1' => 1
Or, unicode considered:
'Ⅶ' => 7

Long story short, it was decided that it wasn't std.conv.to's job to do this conversion, but rather, there should be a function called "numericValue" inside std.uni and std.ascii that would do this job.

What remains is defining how these functions should work. Things to keep in mind:
- ASCII to int should be fast.
- unicode numeric values span from -0.5 to 1.0e12.
- unicode numeric values can be fractional.
- ALL unicode numeric values can be EXACTLY represented in a double.

Given these observations, I'd like to propose these:

//------------------------------
//std.ascii.numericValue
/**
Given an ascii character, returns that character's numeric value if it is
numeric ($(D isNumeric)), and -1 otherwise
*/
pure @safe nothrow
int numericValue(dchar c);

//------------------------------
//std.uni.numericValue
/**
Given a unicode character, returns that character's numeric value if it is
numeric ($(D isNumeric)), and throws an exception otherwise
*/
pure @safe
double numericValue(dchar c);
//------------------------------

The rationale for this:

std.ascii: I think returning -1 as a magic number should help keep the code faster and with less clutter than with exceptions. Returning an int is the obvious choice for numbers that span -1 to 9.

std.uni: double is the only type that can hold the full range of unicode's numeric values. This time, uni throws exceptions. This is for two reasons:
1. Choosing a magic number is difficult and error prone. Correct code would have to look like: "if (std.uni.numericValue(c) > -0.7) {...}"
2. When dealing with unicode, an exception is probably cleaner, and its overhead is not as critical as with ascii.

***********************************************

Thoughts? I wanted to get this ER moved forward.
I don't think uni.numericValue will be finished soon, but I would like std.ascii's done sooner rather than later.
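As a rough illustration of the std.ascii half of the proposal above, here is a minimal sketch. It assumes `std.ascii.isDigit` as the digit test (the post's `isNumeric` is left as written), and the name `asciiNumericValue` is hypothetical, chosen so it cannot be mistaken for an existing Phobos function.

```d
import std.ascii : isDigit;

/// Hypothetical sketch of the proposed std.ascii.numericValue:
/// returns 0..9 for an ASCII digit and the sentinel -1 for anything else.
int asciiNumericValue(dchar c) pure nothrow @safe
{
    return isDigit(c) ? cast(int)(c - '0') : -1;
}

void main()
{
    assert(asciiNumericValue('7') == 7);
    assert(asciiNumericValue('x') == -1);      // sentinel, no exception
    assert(asciiNumericValue('\u2166') == -1); // 'Ⅶ' is not ASCII
}
```

With a sentinel, the caller has to remember to test for -1 before using the result; that is the trade-off against the throwing design proposed for std.uni.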
January 02, 2013 Re: numericValue for (unicode) characters
Posted in reply to monarch_dodra
monarch_dodra:
> The rationale for this:
> std.ascii: I think returning -1 as a magic number should help keep the code faster and with less clutter than with exceptions.
For the ASCII version I have two use cases:
- Where I want to go fast&unsafe I just use "c - '0'".
- When I want more safety I'd like to use something like to!(), that raises exceptions in case of errors.
A function that works on ASCII and returns -1 doesn't give me much more than "c - '0'". So maybe exceptions are good in the ASCII case too.
There is also std.typecons.nullable, it's a possibility for std.uni.numericValue. Generally Phobos should eat more of its dog food :-)
Bye,
bearophile
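The Nullable idea mentioned above could look roughly like this. It is only a sketch: `tryNumericValue` is a hypothetical name, the function handles ASCII digits only, and `std.ascii.isDigit` stands in for the digit test.

```d
import std.ascii : isDigit;
import std.typecons : Nullable;

/// Hypothetical helper: an empty Nullable signals "not a digit",
/// so no magic number and no exception is needed.
Nullable!int tryNumericValue(dchar c)
{
    Nullable!int result;               // starts out null
    if (isDigit(c))
        result = cast(int)(c - '0');   // assigning a value makes it non-null
    return result;
}

void main()
{
    auto ok = tryNumericValue('4');
    assert(!ok.isNull && ok.get == 4);

    auto bad = tryNumericValue('x');
    assert(bad.isNull);
}
```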
January 02, 2013 Re: numericValue for (unicode) characters
Posted in reply to bearophile
1/2/2013 7:24 PM, bearophile wrote:
> monarch_dodra:
>
>> The rationale for this:
>> std.ascii: I think returning -1 as a magic number should help keep the
>> code faster and with less clutter than with exceptions.
>
> For the ASCII version I have two use cases:
> - Where I want to go fast&unsafe I just use "c - '0'".
> - When I want more safety I'd like to use something like to!(), that
> raises exceptions in case of errors.
>
> A function that works on ASCII and returns -1 doesn't give me much more
> than "c - '0'". So maybe exceptions are good in the ASCII case too.
>

Then we can maybe just drop this function? What's wrong with

if(std.ascii.isNumeric(a))
    a -= '0';
else
    enforce(false);

I mean that the time to look it up in the std library is much bigger than the time to roll your own with either of the 2 semantics. Unlike the unicode version, of course.

Then IMO having the std.ascii one is mostly just for symmetry and thus I think that both should just use some sentinel value.

> There is also std.typecons.nullable, it's a possibility for
> std.uni.numericValue. Generally Phobos should eat more of its dog food :-)
>

double.nan sounds more like it.

> Bye,
> bearophile

--
Dmitry Olshansky
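To make the double.nan suggestion concrete, here is a small sketch. The function name `sketchNumericValue` and its three-entry table are assumptions for illustration only; a real std.uni implementation would need the full Unicode numeric data.

```d
import std.math : isNaN;

/// Hypothetical sketch: returns the numeric value of a few known
/// characters and double.nan as the "not numeric" sentinel.
double sketchNumericValue(dchar c) pure nothrow @safe
{
    if (c >= '0' && c <= '9')
        return c - '0';
    switch (c)
    {
        case '\u2166': return 7;     // Ⅶ ROMAN NUMERAL SEVEN
        case '\u00BD': return 0.5;   // ½ VULGAR FRACTION ONE HALF
        case '\u00BC': return 0.25;  // ¼ VULGAR FRACTION ONE QUARTER
        default:       return double.nan;
    }
}

void main()
{
    assert(sketchNumericValue('\u2166') == 7);
    assert(sketchNumericValue('\u00BD') == 0.5);
    // NaN never compares equal to anything, so test it with isNaN.
    assert(isNaN(sketchNumericValue('x')));
}
```

Unlike the "> -0.7" test discussed earlier in the thread, the caller checks the sentinel with `isNaN` rather than with a comparison against a chosen threshold.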
January 02, 2013 Re: numericValue for (unicode) characters
Posted in reply to Dmitry Olshansky
On 1/2/13 3:13 PM, Dmitry Olshansky wrote:
> 1/2/2013 7:24 PM, bearophile wrote:
>> monarch_dodra:
>>
>>> The rationale for this:
>>> std.ascii: I think returning -1 as a magic number should help keep the
>>> code faster and with less clutter than with exceptions.
>>
>> For the ASCII version I have two use cases:
>> - Where I want to go fast&unsafe I just use "c - '0'".
>> - When I want more safety I'd like to use something like to!(), that
>> raises exceptions in case of errors.
>>
>> A function that works on ASCII and returns -1 doesn't give me much more
>> than "c - '0'". So maybe exceptions are good in the ASCII case too.
>>
>
> Then we can maybe just drop this function? What's wrong with
> if(std.ascii.isNumeric(a))
>     a -= '0';
> else
>     enforce(false);

Unnecessary flow :o).

enforce(std.ascii.isNumeric(a));
a -= '0';

Andrei
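Wrapped into a function, the two-line enforce version above might look like the following sketch. `checkedDigit` is a hypothetical name, and `std.ascii.isDigit` is assumed as the digit test in place of the thread's `isNumeric`.

```d
import std.ascii : isDigit;
import std.exception : assertThrown, enforce;

/// Hypothetical throwing variant: enforce raises an Exception
/// when the character is not an ASCII digit.
int checkedDigit(dchar a)
{
    enforce(isDigit(a), "not an ASCII digit");
    return cast(int)(a - '0');
}

void main()
{
    assert(checkedDigit('9') == 9);
    assertThrown(checkedDigit('x')); // the failure path throws
}
```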
January 02, 2013 Re: numericValue for (unicode) characters
Posted in reply to Andrei Alexandrescu
1/3/2013 12:21 AM, Andrei Alexandrescu wrote:
> On 1/2/13 3:13 PM, Dmitry Olshansky wrote:
>> 1/2/2013 7:24 PM, bearophile wrote:
>>> monarch_dodra:
>>>
>>>> The rationale for this:
>>>> std.ascii: I think returning -1 as a magic number should help keep the
>>>> code faster and with less clutter than with exceptions.
>>>
>>> For the ASCII version I have two use cases:
>>> - Where I want to go fast&unsafe I just use "c - '0'".
>>> - When I want more safety I'd like to use something like to!(), that
>>> raises exceptions in case of errors.
>>>
>>> A function that works on ASCII and returns -1 doesn't give me much more
>>> than "c - '0'". So maybe exceptions are good in the ASCII case too.
>>>
>>
>> Then we can maybe just drop this function? What's wrong with
>> if(std.ascii.isNumeric(a))
>>     a -= '0';
>> else
>>     enforce(false);
>
> Unnecessary flow :o).
>
> enforce(std.ascii.isNumeric(a));
> a -= '0';

Yup, and it's 2 lines then. And if one really wants to chain it:
map(a => enforce(std.ascii.isNumeric(a)), a -= '0')(...);

Hardly makes it a Phobos candidate then ;)

--
Dmitry Olshansky
January 02, 2013 Re: numericValue for (unicode) characters
Posted in reply to Dmitry Olshansky
On Wednesday, 2 January 2013 at 20:49:38 UTC, Dmitry Olshansky wrote:
>
> Yup, and it's 2 lines then. And if one really wants to chain it:
> map(a => enforce(std.ascii.isNumeric(a)), a -= '0')(...);
>
> Hardly makes it a Phobos candidate then ;)

Well, just because it's almost trivial to us doesn't mean it hurts to have it. The fact that you can even operate on chars in such a fashion (c - '0') is not obvious to everyone: I've seen time and time again code such as:
//----
if (97 <= c && c <= 122)
    c -= 97;
//----

numericValue helps keep things clean and self-documented.

What's more, it helps keep ascii complete. Code originally written for ascii is easily upgradable to support uni (and vice versa). Furthermore, *writing* "std.ascii.numericValue" self-documents ascii-only support, which is less obvious with code using "c - '0'":

In the original pull request to "improve" conv.to, the fact that it did not support unicode didn't even cross our minds. Seeing "std.ascii.numericValue" raises the eyebrow. It *forces* unicode consideration (regardless of which is right, it can't be ignored).

Really, by the rationale of "it's 2 lines", we shouldn't even have "std.ascii.isNumeric" at all...

On Wednesday, 2 January 2013 at 20:13:32 UTC, Dmitry Olshansky wrote:
> 1/2/2013 7:24 PM, bearophile wrote:
>> There is also std.typecons.nullable, it's a possibility for
>> std.uni.numericValue. Generally Phobos should eat more of its dog food :-)
>>
>
> double.nan sounds more like it.

Hum... nan. I like it.
January 02, 2013 Re: numericValue for (unicode) characters
Posted in reply to monarch_dodra
On Wed, Jan 02, 2013 at 11:15:31PM +0100, monarch_dodra wrote:
> On Wednesday, 2 January 2013 at 20:49:38 UTC, Dmitry Olshansky wrote:
> >
> >Yup, and it's 2 lines then. And if one really wants to chain it:
> >map(a => enforce(std.ascii.isNumeric(a)), a -= '0')(...);
> >
> >Hardly makes it a Phobos candidate then ;)
>
> Well, just because it's almost trivial to us doesn't mean it hurts to
> have it. The fact that you can even operate on chars in such a
> fashion (c - '0') is not obvious to everyone: I've seen time and
> time again code such as:
> //----
> if (97 <= c && c <= 122)
>     c -= 97;
> //----
>
> numericValue helps keep things clean and self-documented.

+1. Code intent is important.

[...]
> On Wednesday, 2 January 2013 at 20:13:32 UTC, Dmitry Olshansky wrote:
> >1/2/2013 7:24 PM, bearophile wrote:
> >>There is also std.typecons.nullable, it's a possibility for
> >>std.uni.numericValue. Generally Phobos should eat more of its dog food :-)
> >>
> >
> >double.nan sounds more like it.
>
> Hum... nan. I like it.

+1 for nan. It's about time we used nan for something useful beyond just an annoying default value for floating-point variables. :)

T

--
People say I'm indecisive, but I'm not sure about that. -- YHL, CONLANG
January 03, 2013 Re: numericValue for (unicode) characters
Posted in reply to Dmitry Olshansky
Dmitry Olshansky:

> Yup, and it's 2 lines then. And if one really wants to chain it:
> map(a => enforce(std.ascii.isNumeric(a)), a -= '0')(...);
>
> Hardly makes it a Phobos candidate then ;)

I think you meant to write:

map(a => enforce(std.ascii.isNumeric(a)), a - '0')(...);

To avoid some bugs I try to not use the comma expression like that.

Compare that code with:

map!numericValue(...);

Bye,
bearophile
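For comparison, here is how the `map!numericValue(...)` style might read in a complete program. Since no such Phobos function is assumed to exist, the sketch defines a hypothetical `digitValue` helper with the -1 sentinel semantics from the original proposal.

```d
import std.algorithm : equal, map;
import std.ascii : isDigit;

/// Hypothetical stand-in for the proposed std.ascii.numericValue.
int digitValue(dchar c) pure nothrow @safe
{
    return isDigit(c) ? cast(int)(c - '0') : -1;
}

void main()
{
    // A named function can be passed to map directly; no lambda and no
    // comma expression is needed.
    auto values = "4x2".map!digitValue;
    assert(values.equal([4, -1, 2]));
}
```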
January 03, 2013 Re: numericValue for (unicode) characters
Posted in reply to monarch_dodra
1/3/2013 2:15 AM, monarch_dodra wrote:
> On Wednesday, 2 January 2013 at 20:49:38 UTC, Dmitry Olshansky wrote:
>>
>> Yup, and it's 2 lines then. And if one really wants to chain it:
>> map(a => enforce(std.ascii.isNumeric(a)), a -= '0')(...);
>>
>> Hardly makes it a Phobos candidate then ;)
>
> Well, just because it's almost trivial to us doesn't mean it hurts to
> have it. The fact that you can even operate on chars in such a fashion
> (c - '0') is not obvious to everyone: I've seen time and time again code
> such as:
> //----
> if (97 <= c && c <= 122)
>     c -= 97;
> //----
>
> numericValue helps keep things clean and self-documented.
>
> What's more, it helps keep ascii complete. Code originally written for
> ascii is easily upgradable to support uni (and vice versa). Furthermore,
> *writing* "std.ascii.numericValue" self-documents ascii-only support,
> which is less obvious with code using "c - '0'":
>
> In the original pull request to "improve" conv.to, the fact that it did
> not support unicode didn't even cross our minds. Seeing
> "std.ascii.numericValue" raises the eyebrow. It *forces* unicode
> consideration (regardless of which is right, it can't be ignored).
>
> Really, by the rationale of "it's 2 lines", we shouldn't even have
> "std.ascii.isNumeric" at all...
>

I don't mind adding it from a completeness and/or symmetry standpoint, as I said.

I do see another cool issue popping up though. It's a problem of how the anti-hijacking works.

Say we add numericValue right now to std.ascii but not std.uni. A release later we have numericValue in std.uni (well, hopefully they are both in the same 2.062 ;) ).

Now take this code:
map!numericValue(...)

If the code also happens to import std.uni it's going to stop compiling.

That's one of the reasons I think our hopes for stability (as in, compiles 5 years from now) are ill-placed, as we can't have it until the library is essentially dead in stone.

--
Dmitry Olshansky
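One way user code can protect itself from this kind of breakage is to name the module explicitly. The sketch below uses `isAlpha`, which exists in both std.ascii and std.uni, as a stand-in for the not-yet-existing `numericValue` pair; the static imports are a choice made for this example, not something the thread prescribes.

```d
import std.algorithm : equal, map;
static import std.ascii; // static import: names must be fully qualified
static import std.uni;

void main()
{
    // Fully qualified names stay unambiguous even when both modules
    // define a function with the same name.
    auto asciiOnly = "a\u00E91".map!(std.ascii.isAlpha);
    assert(asciiOnly.equal([true, false, false]));

    auto unicodeAware = "a\u00E91".map!(std.uni.isAlpha);
    assert(unicodeAware.equal([true, true, false]));
}
```

The cost is verbosity at every call site, which is part of why the issue is a library-design concern and not only a user-side one.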
January 03, 2013 Re: numericValue for (unicode) characters
Posted in reply to Dmitry Olshansky
On Thursday, 3 January 2013 at 08:23:06 UTC, Dmitry Olshansky wrote:
> Now take this code:
> map!numericValue(...)
>
> If the code also happens to import std.uni it's going to stop compiling.
Hum... We could always "camp" the std.uni's numericValue function?
//----
double numericValue()(dchar c) nothrow @safe
{
    static assert(false, "Sorry, std.uni.numericValue is not yet implemented");
}
//----
This would avoid the breakage you mentioned.
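The placeholder trick relies on the function being a template: the `static assert` only fires when someone instantiates it. A minimal sketch under that assumption, with the hypothetical name `placeholderNumericValue` so it does not clash with any real symbol:

```d
/// Hypothetical "camped" function: the empty template parameter list
/// delays semantic analysis until the function is actually called.
double placeholderNumericValue()(dchar c)
{
    static assert(false, "Sorry, numericValue is not yet implemented");
}

void main()
{
    // Declaring the template compiles fine; only an attempt to call it
    // is rejected, which __traits(compiles) lets us observe here.
    static assert(!__traits(compiles, placeholderNumericValue('1')));
}
```

Because the name already exists in the module, code importing both modules would hit the ambiguity immediately instead of breaking in a later release.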
Copyright © 1999-2021 by the D Language Foundation