January 03, 2013
03-Jan-2013 21:13, monarch_dodra wrote:
> On Thursday, 3 January 2013 at 08:23:06 UTC, Dmitry Olshansky wrote:
>> Now take this code:
>> map!numericValue(...)
>>
>> If the code also happens to import std.uni it's going to stop compiling.
>
> Hum... We could always "camp" std.uni's numericValue function?
>
> //----
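> // The empty "()" makes this a template, so the static assert only
> // fires if someone actually calls it.
> double numericValue()(dchar c) pure nothrow @safe
> {
>      static assert(false, "Sorry, std.uni.numericValue is not yet implemented");
> }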
> //----

We'd pretty much have to.


-- 
Dmitry Olshansky
January 03, 2013
On Thursday, 3 January 2013 at 18:11:45 UTC, Dmitry Olshansky wrote:
> 03-Jan-2013 21:13, monarch_dodra wrote:
>> On Thursday, 3 January 2013 at 08:23:06 UTC, Dmitry Olshansky wrote:
>>> Now take this code:
>>> map!numericValue(...)
>>>
>>> If the code also happens to import std.uni it's going to stop compiling.
>>
>> Hum... We could always "camp" std.uni's numericValue function?
>> [SNIP]
>
> We'd pretty much have to.

Or, you know... I could just implement both at the same time. It's not like there's an *urgency* for the ascii version or anything. I think I'll just do that.

So... do we agree on
ascii: int - not found => -1
uni: double - not found => nan
?

I can still get started anyway, even if it isn't final.
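A minimal sketch of the two proposed conventions, for illustration only: `asciiNumericValue`, `uniNumericValueStub`, `hasAsciiValue` and `hasUniValue` are hypothetical names, not Phobos functions, and the Unicode stub knows only the ASCII digits and U+00BD VULGAR FRACTION ONE HALF (a real version would be generated from the Unicode data). It shows how a caller tests for "no value" under each convention: compare against -1 for the int version, and use isNaN for the double version, since nan never compares equal to anything.

```d
import std.math : isNaN;

// Hypothetical stand-ins for the two proposed conventions.

/// ascii: int, "not found" => -1
int asciiNumericValue(dchar c) pure nothrow @safe @nogc
{
    return (c >= '0' && c <= '9') ? cast(int)(c - '0') : -1;
}

/// uni: double, "not found" => nan.
/// Stub: only ASCII digits and U+00BD are known here.
double uniNumericValueStub(dchar c) pure nothrow @safe @nogc
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c == '\u00BD')
        return 0.5;          // VULGAR FRACTION ONE HALF
    return double.nan;       // no numeric value
}

/// Caller-side checks for each convention.
bool hasAsciiValue(dchar c) pure nothrow @safe @nogc
{
    return asciiNumericValue(c) != -1;
}

bool hasUniValue(dchar c) pure nothrow @safe @nogc
{
    return !isNaN(uniNumericValueStub(c)); // `== double.nan` would always be false
}
```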
January 03, 2013
03-Jan-2013 23:40, monarch_dodra wrote:
> On Thursday, 3 January 2013 at 18:11:45 UTC, Dmitry Olshansky wrote:
>> 03-Jan-2013 21:13, monarch_dodra wrote:
>>> On Thursday, 3 January 2013 at 08:23:06 UTC, Dmitry Olshansky wrote:
>>>> Now take this code:
>>>> map!numericValue(...)
>>>>
>>>> If the code also happens to import std.uni it's going to stop
>>>> compiling.
>>>
>>> Hum... We could always "camp" std.uni's numericValue function?
>>> [SNIP]
>>
>> We'd pretty much have to.
>
> Or, you know... I could just implement both at the same time. It's not
> like there's an *urgency* for the ascii version or anything. I think
> I'll just do that.
>
> So... do we agree on
> ascii: int - not found => -1
> uni: double - not found => nan
> ?
>
Fine by me.

> I can still get started anyways, even if it isn't definite.

It's just that I have an exceptionally fast version for Unicode just around the corner, but I wouldn't mind some competition ;)

-- 
Dmitry Olshansky
January 03, 2013
On Thu, Jan 03, 2013 at 08:40:47PM +0100, monarch_dodra wrote: [...]
> Or, you know... I could just implement both at the same time. It's not like there's an *urgency* for the ascii version or anything. I think I'll just do that.
> 
> So... do we agree on
> ascii: int - not found => -1
> uni: double - not found => nan
[...]

LGTM. :)

I did think of what might happen if somebody wrote an int cast for std.uni.numericValue:

	import std.stdio : writeln;
	import std.uni; // assumes the proposed std.uni.numericValue exists

	void sloppyProgrammersFunction(dchar ch) {
		// First attempt: compiler error: can't implicitly
		// convert double -> int ...
		//int val = std.uni.numericValue(ch);

		// ... so sloppy programmer inserts a cast
		int val = cast(int)std.uni.numericValue(ch);

		// On Linux/64 (x86-64), if numericValue returns nan, this
		// prints int.min (-2147483648).
		writeln(val);

		// So this should work:
		if (val < 0) {
			// (In fact, it will still work if
			// std.ascii.numericValue were used instead.)
			writeln("Sloppy code caught the problem correctly!");
		}
	}

So it seems that everything should be alright.

This particular example occurred to me, 'cos I'm thinking of how often one wishes to extract an integral value from a string, and usually one doesn't think that floating point is necessary(!), so the cast from double is a rather big temptation (even though it's wrong!).


T

-- 
Tell me and I forget. Teach me and I remember. Involve me and I understand. -- Benjamin Franklin
January 04, 2013
On Thursday, 3 January 2013 at 21:51:14 UTC, H. S. Teoh wrote:
> On Thu, Jan 03, 2013 at 08:40:47PM +0100, monarch_dodra wrote:
> [...]
>> Or, you know... I could just implement both at the same time. It's
>> not like there's an *urgency* for the ascii version or anything. I
>> think I'll just do that.
>> 
>> So... do we agree on
>> ascii: int - not found => -1
>> uni: double - not found => nan
> [...]
>
> LGTM. :)
>
> I did think of what might happen if somebody wrote an int cast for
> std.uni.numericValue
> [SNIP]
> writeln("Sloppy code caught the problem correctly!");

... almost! 1e12 will also come out negative when cast to int, so it would be wrongly reported as having no value. To be 100% correct with regard to conversion, the end user would have to use long.

But that'd be a *really exceptional* case behavior...

Even with long, the only problem with the code is that the user could not tell exact integral values from inexact ones (e.g. U+00BD VULGAR FRACTION ONE HALF would silently become 0). Well, that's what the user gets for being sloppy, I guess.

In any case, I think we'd have to provide an example section showing a "recommended" way to cast to an integral type.
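One possible shape for such a "recommended" conversion, sketched under the proposed double/nan convention. `toExactLong` is a hypothetical helper, not a Phobos function. It rejects nan, fractional values (such as 0.5 for U+00BD) and anything outside long's range, and signals failure with std.typecons.Nullable rather than a magic number, which avoids both pitfalls raised above.

```d
import std.math : isNaN, floor;
import std.typecons : Nullable;

/// Hypothetical helper: converts a numericValue-style double to long
/// only when the conversion is exact; otherwise returns a null Nullable.
Nullable!long toExactLong(double v) pure nothrow @safe
{
    Nullable!long result;            // starts out null ("no value")
    if (isNaN(v))
        return result;               // character has no numeric value
    if (v != floor(v))
        return result;               // fraction, e.g. 0.5 for U+00BD
    if (v < -0x1p63 || v >= 0x1p63)
        return result;               // does not fit in a long
    result = cast(long) v;           // exact, in-range conversion
    return result;
}
```

A caller would then write something like `auto n = toExactLong(x); if (!n.isNull) use(n.get);`, keeping the "no value" and "not an integer" cases explicit.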
January 04, 2013
On Thursday, 3 January 2013 at 20:14:43 UTC, Dmitry Olshansky wrote:
> It's just that I have an exceptionally fast version for Unicode just around the corner, but I wouldn't mind some competition ;)

Well, I already mentioned to you how I was planning to do it: just a stupid binary search over ranges of numbers, indexed on 0.

The "big" chunk of work, actually (IMO), is just creating the raw data...
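A rough sketch of the binary-search approach, assuming "ranges indexed on 0" means runs of consecutive code points whose values increase by one from a known starting value. `NumericRun`, `runs` and `numericValueSketch` are hypothetical, and the five-entry table is only a sample; producing the full table from UnicodeData.txt is the real work, as noted above.

```d
/// One run of consecutive code points whose numeric values increase by 1.
struct NumericRun
{
    dchar first;     // first code point of the run
    dchar last;      // last code point of the run (inclusive)
    int firstValue;  // numeric value of `first`
}

// Tiny illustrative table, sorted by `first`, runs disjoint.
immutable NumericRun[] runs = [
    NumericRun('\u0030', '\u0039', 0), // ASCII digits
    NumericRun('\u0660', '\u0669', 0), // Arabic-Indic digits
    NumericRun('\u0966', '\u096F', 0), // Devanagari digits
    NumericRun('\u2160', '\u2169', 1), // Roman numerals one..ten
    NumericRun('\uFF10', '\uFF19', 0), // fullwidth digits
];

/// Binary search for the run containing c; nan when there is none.
double numericValueSketch(dchar c) pure nothrow @safe @nogc
{
    size_t lo = 0, hi = runs.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (c < runs[mid].first)
            hi = mid;                // c lies in an earlier run
        else if (c > runs[mid].last)
            lo = mid + 1;            // c lies in a later run
        else
            return runs[mid].firstValue + cast(int)(c - runs[mid].first);
    }
    return double.nan;               // not covered by any run
}
```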
January 04, 2013
On Thursday, January 03, 2013 20:40:47 monarch_dodra wrote:
> So... do we agree on
> ascii: int - not found => -1
> uni: double - not found => nan

I'm not a fan of the ASCII version returning -1, but I don't really have a better suggestion. I suppose that you could throw instead, but I don't know if that's a good idea or not. It _would_ be more consistent with our other conversion functions however.

- Jonathan M Davis
January 04, 2013
04-Jan-2013 15:58, Jonathan M Davis wrote:
> On Thursday, January 03, 2013 20:40:47 monarch_dodra wrote:
>> So... do we agree on
>> ascii: int - not found => -1
>> uni: double - not found => nan
>
> I'm not a fan of the ASCII version returning -1, but I don't really have a
> better suggestion. I suppose that you could throw instead, but I don't know if
> that's a good idea or not. It _would_ be more consistent with our other
> conversion functions however.
>
> - Jonathan M Davis

I find low-level functions that throw overly awkward to deal with (not to mention the performance cost).

Hm... I've found a brilliant primitive, Expected!T, that could be of great help with the error-codes-vs-exceptions problem. See Andrei's recent talk, which went live not long ago:

http://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2012-Andrei-Alexandrescu-Systematic-Error-Handling-in-C

Time to put something analogous into Phobos?
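For reference, a minimal D sketch of the idea behind Expected!T as described in the talk: a value that holds either a result or the exception that prevented computing it, which the caller can check cheaply or have rethrown on access. The `Expected` struct and `parseDigit` below are hypothetical illustrations, not Phobos types, and omit most of what a real implementation would need.

```d
/// Hypothetical Expected!T: holds either a value or the exception
/// that prevented computing it.
struct Expected(T)
{
    private T value_;
    private Exception error_;   // null when a value is present

    static Expected fromValue(T v) { Expected e; e.value_ = v; return e; }
    static Expected fromError(Exception ex) { Expected e; e.error_ = ex; return e; }

    /// Cheap check, no exception involved.
    bool valid() const { return error_ is null; }

    /// Returns the value, or rethrows the stored exception.
    T get()
    {
        if (error_ !is null)
            throw error_;
        return value_;
    }
}

/// Hypothetical low-level function using it instead of -1 or a throw.
Expected!int parseDigit(dchar c)
{
    if (c >= '0' && c <= '9')
        return Expected!int.fromValue(cast(int)(c - '0'));
    return Expected!int.fromError(new Exception("not an ASCII digit"));
}
```

Callers that care about speed test `valid` first; callers that prefer exceptions just call `get`.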

-- 
Dmitry Olshansky
January 04, 2013
On Friday, 4 January 2013 at 13:18:48 UTC, Dmitry Olshansky wrote:
> 04-Jan-2013 15:58, Jonathan M Davis wrote:
>> On Thursday, January 03, 2013 20:40:47 monarch_dodra wrote:
>>> So... do we agree on
>>> ascii: int - not found => -1
>>> uni: double - not found => nan
>>
>> I'm not a fan of the ASCII version returning -1, but I don't really have a
>> better suggestion. I suppose that you could throw instead, but I don't know if
>> that's a good idea or not. It _would_ be more consistent with our other
>> conversion functions however.
>>
>> - Jonathan M Davis
>
> I find low-level functions that throw overly awkward to deal with (not to mention the performance cost).
>
> Hm... I've found a brilliant primitive, Expected!T, that could be of great help with the error-codes-vs-exceptions problem. See Andrei's recent talk, which went live not long ago:
>
> http://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2012-Andrei-Alexandrescu-Systematic-Error-Handling-in-C
>
> Time to put something analogous into Phobos?

I finished an implementation:

https://github.com/D-Programming-Language/phobos/pull/1052

It is not "pull ready", so we can still discuss it.

I raised a couple of issues in the pull, which I'll copy here:

//----
I did run into a couple of issues, namely that I'm not getting 100% equivalence between chars that are numeric and chars that have a numeric value... Is this normal?

* There are a fair number of chars that have a numeric value but aren't isNumber. I think they might be new in 6.1.0, but I'm not sure. I decided it was best to have them return nan, instead of having inconsistent behavior.
* There are a couple of characters in tableLo that have numeric values. These aren't considered by isNumber either. I think this might be a bug, though.
* There are 4 "non-number numeric" characters in "CUNEIFORM NUMERIC SIGN". These return wild values; in particular, two of them return -1. I *think* these should actually return nan for us, because (AFAIK) -1 is just a wildcard for "invalid" :/

Maybe we should just return -1 on invalid unicode? Or maybe it's just my input file:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
It doesn't have a separate field for isNumber/numericValue, so it is forced to write a wild number. Maybe these four chars should return nan?
//----

Oh yeah, I also added isNumber to std.ascii. It feels wrong not to have it if we have numericValue.
January 04, 2013
On Friday, 4 January 2013 at 17:48:28 UTC, monarch_dodra wrote:
> //----
> Maybe we should just return -1 on invalid unicode? Or maybe it's just my input file:
> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
> It doesn't have a separate field for isNumber/numericValue, so it is forced to write a wild number. Maybe these four chars should return nan?

Wait, I figured it out: they are just non-numbers that happen to be inside Nl (Number Letter): http://unicode.org/cldr/utility/character.jsp?a=12433

The documentation on this is neither very clear nor consistent, so sorry for any confusion.

Well, I guess there is a bug in std.uni.isNumber then...