January 03, 2013
Re: numericValue for (unicode) characters
03-Jan-2013 21:13, monarch_dodra wrote:
> On Thursday, 3 January 2013 at 08:23:06 UTC, Dmitry Olshansky wrote:
>> Now take this code:
>> map!numericValue(...)
>>
>> If the code also happens to import std.uni it's going to stop compiling.
>
> Hum... We could always "camp" the std.uni's numericValue function?
>
> //----
> double numericValue()(dchar c) nothrow @safe
> {
>      static assert(false, "Sorry, std.uni.numericValue is not yet implemented");
> }
> //----

We'd pretty much have to.
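
For anyone following along, here is a minimal sketch of the "camping" idea. It assumes a stand-in name (numericValuePlaceholder is hypothetical, not the real std.uni symbol). The empty template parameter list is what makes it work: the static assert is only evaluated if somebody actually instantiates the function, so the name is reserved without breaking code that never calls it.

```d
import std.stdio : writeln;

// Hypothetical placeholder, not the real std.uni function. Because this is
// a template (note the empty template parameter list), the body is only
// analysed when the function is instantiated.
double numericValuePlaceholder()(dchar c)
{
    static assert(false, "Sorry, numericValue is not yet implemented");
}

void main()
{
    // Declaring the placeholder is harmless; only a call would fail,
    // and it would fail at compile time with the message above.
    static assert(!__traits(compiles, numericValuePlaceholder('1')));
    writeln("placeholder declared, never instantiated");
}
```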


-- 
Dmitry Olshansky
January 03, 2013
Re: numericValue for (unicode) characters
On Thursday, 3 January 2013 at 18:11:45 UTC, Dmitry Olshansky 
wrote:
> 03-Jan-2013 21:13, monarch_dodra wrote:
>> On Thursday, 3 January 2013 at 08:23:06 UTC, Dmitry Olshansky 
>> wrote:
>>> Now take this code:
>>> map!numericValue(...)
>>>
>>> If the code also happens to import std.uni it's going to stop 
>>> compiling.
>>
>> Hum... We could always "camp" the std.uni's numericValue 
>> function?
>> [SNIP]
>
> We'd pretty much have to.

Or, you know... I could just implement both at the same time. 
It's not like there's an *urgency* for the ascii version or 
anything. I think I'll just do that.

So... do we agree on
ascii: int - not found => -1
uni: double - not found => nan
?

I can still get started anyways, even if it isn't definite.
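
To make the proposed convention concrete, here is a rough sketch. The names asciiNumericValue and uniNumericValueSketch are made up for illustration, and the Unicode version only knows a handful of code points; it is not the implementation discussed in this thread.

```d
import std.math : isNaN;

// Hypothetical ASCII version: int result, -1 when the character has no value.
int asciiNumericValue(dchar c)
{
    return (c >= '0' && c <= '9') ? cast(int)(c - '0') : -1;
}

// Hypothetical Unicode version: double result, nan when there is no value.
// Only a few code points are covered, purely for illustration.
double uniNumericValueSketch(dchar c)
{
    switch (c)
    {
        case '0': .. case '9':
            return c - '0';
        case '\u00BD':                      // VULGAR FRACTION ONE HALF
            return 0.5;
        case '\u0660': .. case '\u0669':    // ARABIC-INDIC DIGIT ZERO..NINE
            return c - '\u0660';
        default:
            return double.nan;
    }
}

void main()
{
    assert(asciiNumericValue('7') == 7);
    assert(asciiNumericValue('x') == -1);
    assert(uniNumericValueSketch('\u00BD') == 0.5);
    assert(uniNumericValueSketch('\u0663') == 3);
    assert(isNaN(uniNumericValueSketch('x')));
}
```

The double return type is what lets fractions such as one half come back exactly, which an int cannot do.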
January 03, 2013
Re: numericValue for (unicode) characters
03-Jan-2013 23:40, monarch_dodra wrote:
> On Thursday, 3 January 2013 at 18:11:45 UTC, Dmitry Olshansky wrote:
>> 03-Jan-2013 21:13, monarch_dodra wrote:
>>> On Thursday, 3 January 2013 at 08:23:06 UTC, Dmitry Olshansky wrote:
>>>> Now take this code:
>>>> map!numericValue(...)
>>>>
>>>> If the code also happens to import std.uni it's going to stop
>>>> compiling.
>>>
>>> Hum... We could always "camp" the std.uni's numericValue function?
>>> [SNIP]
>>
>> We'd pretty much have to.
>
> Or, you know... I could just implement both at the same time. It's not
> like there's an *urgency* for the ascii version or anything. I think
> I'll just do that.
>
> So... do we agree on
> ascii: int - not found => -1
> uni: double - not found => nan
> ?
>
Me fine.

> I can still get started anyways, even if it isn't definite.

It's just that I have an exceptionally fast version for Unicode 
just around the corner, but I wouldn't mind some competition ;)

-- 
Dmitry Olshansky
January 03, 2013
Re: numericValue for (unicode) characters
On Thu, Jan 03, 2013 at 08:40:47PM +0100, monarch_dodra wrote:
[...]
> Or, you know... I could just implement both at the same time. It's
> not like there's an *urgency* for the ascii version or anything. I
> think I'll just do that.
> 
> So... do we agree on
> ascii: int - not found => -1
> uni: double - not found => nan
[...]

LGTM. :)

I did think of what might happen if somebody wrote an int cast for
std.uni.numericValue:

	import std.stdio : writeln;
	static import std.uni; // for the fully qualified std.uni.numericValue

	void sloppyProgrammersFunction(dchar ch) {
		// First attempt: compiler error: can't implicitly
		// convert double -> int ...
		//int val = std.uni.numericValue(ch);

		// ... so sloppy programmer inserts a cast
		int val = cast(int)std.uni.numericValue(ch);

		// On Linux/64, if numericValue returns nan, this prints
		// int.min.
		writeln(val);

		// So this should work:
		if (val < 0) {
			// (In fact, it will still work if
			// std.ascii.numericValue were used instead.)
			writeln("Sloppy code caught the problem correctly!");
		}
	}

So it seems that everything should be alright.

This particular example occurred to me, 'cos I'm thinking of how often
one wishes to extract an integral value from a string, and usually one
doesn't think that floating point is necessary(!), so the cast from
double is a rather big temptation (even though it's wrong!).


T

-- 
Tell me and I forget. Teach me and I remember. Involve me and I understand. -- Benjamin Franklin
January 04, 2013
Re: numericValue for (unicode) characters
On Thursday, 3 January 2013 at 21:51:14 UTC, H. S. Teoh wrote:
> On Thu, Jan 03, 2013 at 08:40:47PM +0100, monarch_dodra wrote:
> [...]
>> Or, you know... I could just implement both at the same time. 
>> It's
>> not like there's an *urgency* for the ascii version or 
>> anything. I
>> think I'll just do that.
>> 
>> So... do we agree on
>> ascii: int - not found => -1
>> uni: double - not found => nan
> [...]
>
> LGTM. :)
>
> I did think of what might happen if somebody wrote an int cast 
> for
> std.uni.numericValue
> [SNIP]
> writeln("Sloppy code caught the problem correctly!");

... almost! 1e12 will have a negative value when cast to int. To 
be 100% correct with regard to converting, the end user would have 
to use long.

But that'd be a *really exceptional* case behavior...

Even with long, the only problem with the code is that the user 
would not know the difference between exact integral, and inexact 
integral. Well, that's what the user gets for being sloppy I 
guess.

In any case, I think we'd have to provide an example section with 
a "recommended" way for casting to integral.
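
One possible shape for such a "recommended" conversion, as a sketch only. toExactIntegral is a hypothetical helper, and it assumes std.uni.numericValue returns nan for characters without a numeric value, as proposed above. It rejects nan, fractions, and anything outside the range where a double holds integers exactly, instead of casting blindly.

```d
import std.math : isNaN;
import std.uni : numericValue;

// Hypothetical helper: returns true and stores the value only when `v`
// is an exact integer that a double can represent without rounding.
bool toExactIntegral(double v, out long result)
{
    if (isNaN(v)) return false;                     // no numeric value
    if (v < -9.0e15 || v > 9.0e15) return false;    // below 2^53, so exact
    immutable n = cast(long) v;                     // in range, cast is safe
    if (n != v) return false;                       // fractions such as 0.5
    result = n;
    return true;
}

void main()
{
    long n;
    assert(toExactIntegral(numericValue('7'), n) && n == 7);
    assert(!toExactIntegral(numericValue('x'), n)); // nan is rejected
    assert(!toExactIntegral(0.5, n));               // fraction is rejected
    assert(toExactIntegral(1e12, n) && n == 1_000_000_000_000);
}
```

Using long rather than int is what keeps the 1e12 case from wrapping around.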
January 04, 2013
Re: numericValue for (unicode) characters
On Thursday, 3 January 2013 at 20:14:43 UTC, Dmitry Olshansky 
wrote:
> It's just that I have an exceptionally fast version for 
> Unicode just around the corner, but I wouldn't mind some 
> competition ;)

Well, I already mentioned to you how I was planning to do it: 
Just stupid binary search over ranges of numbers indexed on 0.

The "big" chunk of work, actually (IMO), is just creating the raw 
data...
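
Roughly what the binary search looks like, as a sketch under assumptions. DigitRun, digitRuns and lookupNumericValue are illustrative names, and the table is a tiny hand-picked subset rather than the generated data. Each entry is a run of consecutive code points whose first member has the value 0, so the value is just the offset into the run.

```d
import std.math : isNaN;

// A run of consecutive code points; `first` has numeric value 0.
struct DigitRun { dchar first; dchar last; }

// Sorted and non-overlapping. Hand-picked subset for illustration only.
immutable DigitRun[] digitRuns = [
    DigitRun('0', '9'),             // DIGIT ZERO..NINE
    DigitRun('\u0660', '\u0669'),   // ARABIC-INDIC DIGIT ZERO..NINE
    DigitRun('\u0966', '\u096F'),   // DEVANAGARI DIGIT ZERO..NINE
    DigitRun('\uFF10', '\uFF19'),   // FULLWIDTH DIGIT ZERO..NINE
];

double lookupNumericValue(dchar c)
{
    size_t lo = 0, hi = digitRuns.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (c < digitRuns[mid].first)
            hi = mid;                           // search the lower half
        else if (c > digitRuns[mid].last)
            lo = mid + 1;                       // search the upper half
        else
            return c - digitRuns[mid].first;    // offset within the run
    }
    return double.nan;                          // not in any run
}

void main()
{
    assert(lookupNumericValue('5') == 5);
    assert(lookupNumericValue('\u0663') == 3);
    assert(lookupNumericValue('\uFF19') == 9);
    assert(isNaN(lookupNumericValue('a')));
}
```

Characters that do not sit in such a run (fractions, large values) would need a separate table, which is not shown here.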
January 04, 2013
Re: numericValue for (unicode) characters
On Thursday, January 03, 2013 20:40:47 monarch_dodra wrote:
> So... do we agree on
> ascii: int - not found => -1
> uni: double - not found => nan

I'm not a fan of the ASCII version returning -1, but I don't really have a 
better suggestion. I suppose that you could throw instead, but I don't know if 
that's a good idea or not. It _would_ be more consistent with our other 
conversion functions however.

- Jonathan M Davis
January 04, 2013
Re: numericValue for (unicode) characters
04-Jan-2013 15:58, Jonathan M Davis wrote:
> On Thursday, January 03, 2013 20:40:47 monarch_dodra wrote:
>> So... do we agree on
>> ascii: int - not found => -1
>> uni: double - not found => nan
>
> I'm not a fan of the ASCII version returning -1, but I don't really have a
> better suggestion. I suppose that you could throw instead, but I don't know if
> that's a good idea or not. It _would_ be more consistent with our other
> conversion functions however.
>
> - Jonathan M Davis

I find low-level stuff that throws to be overly awkward to deal with 
(not to mention performance problems).

Hm... I've found a brilliant primitive, Expected!T, that could be of 
great help with the error code vs. exceptions problem. See Andrei's 
recent talk that went live not long ago:

http://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2012-Andrei-Alexandrescu-Systematic-Error-Handling-in-C

Time to put the analogous stuff into Phobos?
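
A very rough sketch of what such a primitive might look like in D. This is my own simplification, not the design from the talk and not an existing Phobos type; Expected, ok, fail and digitValue are all hypothetical names. The result carries either a value or an exception, and the exception is only thrown if the caller asks for a value that is not there.

```d
// Hypothetical sketch: holds either a value or the error explaining
// why there is none.
struct Expected(T)
{
    private T value_;
    private Exception error_;   // null when a value is present

    static Expected ok(T v)
    {
        Expected e;
        e.value_ = v;
        return e;
    }

    static Expected fail(Exception ex)
    {
        Expected e;
        e.error_ = ex;
        return e;
    }

    bool hasValue() { return error_ is null; }

    // Throws the stored exception only when the value is requested.
    T get()
    {
        if (error_ !is null) throw error_;
        return value_;
    }
}

// Hypothetical use: an ASCII digit lookup with no sentinel value.
Expected!int digitValue(dchar c)
{
    if (c >= '0' && c <= '9')
        return Expected!int.ok(cast(int)(c - '0'));
    return Expected!int.fail(new Exception("not an ASCII digit"));
}

void main()
{
    assert(digitValue('4').hasValue);
    assert(digitValue('4').get == 4);
    assert(!digitValue('x').hasValue);  // cheap check, nothing thrown
}
```

Callers who care about speed test hasValue, and callers who want exceptions just call get.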

-- 
Dmitry Olshansky
January 04, 2013
Re: numericValue for (unicode) characters
On Friday, 4 January 2013 at 13:18:48 UTC, Dmitry Olshansky wrote:
> 04-Jan-2013 15:58, Jonathan M Davis wrote:
>> On Thursday, January 03, 2013 20:40:47 monarch_dodra wrote:
>>> So... do we agree on
>>> ascii: int - not found => -1
>>> uni: double - not found => nan
>>
>> I'm not a fan of the ASCII version returning -1, but I don't 
>> really have a
>> better suggestion. I suppose that you could throw instead, but 
>> I don't know if
>> that's a good idea or not. It _would_ be more consistent with 
>> our other
>> conversion functions however.
>>
>> - Jonathan M Davis
>
> I find low-level stuff that throws to be overly awkward to deal 
> with (not to mention performance problems).
>
> Hm... I've found a brilliant primitive, Expected!T, that could 
> be of great help with the error code vs. exceptions problem. See 
> Andrei's recent talk that went live not long ago:
>
> http://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2012-Andrei-Alexandrescu-Systematic-Error-Handling-in-C
>
> Time to put the analogous stuff into Phobos?

I finished an implementation:

https://github.com/D-Programming-Language/phobos/pull/1052

It is not "pull ready", so we can still discuss it.

I raised a couple of issues in the pull, which I'll copy here:

//----
I did run into a couple of issues, namely that I'm not getting 
100% equivalence between chars that are numeric and chars with a 
numeric value... Is this normal...?

* There are a fair number of chars that have a numeric value but 
aren't isNumber. I think they might be new in 6.1.0. But I'm not 
sure. I decided it was best to have them return nan, instead of 
having inconsistent behavior.
* There are a couple of characters in tableLo that have numeric 
values. These aren't considered in isNumber either. I think this 
might be a bug though.
* There are 4 "non-number numeric" characters in "CUNEIFORM 
NUMERIC SIGN". These return wild values, and in particular two of 
them return -1. I *think* this should actually return nan for us, 
because (AFAIK), -1 is just wild for invalid :/

Maybe we should just return -1 on invalid unicode? Or maybe it's 
just my input file:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
It doesn't have a separate field for isNumber/numericValue, so it 
is forced to write a wild number. Maybe these four chars should 
return nan?
//----

Oh yeah, I also added isNumber to std.ascii. Feels wrong to not 
have it if we have numericValue.
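
The kind of consistency check described above can be sketched like this. It assumes a Phobos where std.uni provides both isNumber and numericValue (with nan meaning "no value"); countMismatches is a hypothetical helper. It makes no claim about what the counts will be, since that depends on the Unicode tables in use.

```d
import std.math : isNaN;
import std.stdio : writefln;
import std.uni : isNumber, numericValue;

// Hypothetical helper. Returns two counts:
// [0] code points with a numeric value that are not isNumber,
// [1] code points that are isNumber but have no numeric value.
size_t[2] countMismatches()
{
    size_t[2] counts;
    foreach (uint cp; 0 .. 0x110000)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue;                       // skip surrogates
        immutable c = cast(dchar) cp;
        immutable hasValue = !isNaN(numericValue(c));
        if (hasValue && !isNumber(c)) ++counts[0];
        if (!hasValue && isNumber(c)) ++counts[1];
    }
    return counts;
}

void main()
{
    immutable m = countMismatches();
    writefln("value but not isNumber: %s", m[0]);
    writefln("isNumber but no value:  %s", m[1]);
}
```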
January 04, 2013
Re: numericValue for (unicode) characters
On Friday, 4 January 2013 at 17:48:28 UTC, monarch_dodra wrote:
> //----
> Maybe we should just return -1 on invalid unicode? Or maybe 
> it's just my input file:
> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
> It doesn't have a separate field for isNumber/numericValue, so 
> it is forced to write a wild number. Maybe these four chars 
> should return nan?

Wait: I figured it out: They are just non-numbers that happen to 
be inside Nl (Number Letter): 
http://unicode.org/cldr/utility/character.jsp?a=12433

Documentation on this is not very clear, nor consistent, so sorry 
for any confusion.

Well, I guess there is a bug in std.uni.isNumber then...