September 20, 2012
This is the signature of a function of std.ascii:

http://dlang.org/phobos/std_ascii.html#toLower

pure nothrow @safe dchar toLower(dchar c);

If this function is supposed to be used on ASCII strings, what's the point of returning a dchar? When I use it I usually have to cast its result back to char, and I prefer to avoid casts in my D code.

Bye,
bearophile
September 20, 2012
Sorry, the thread title was "About std.ascii.toLower"...
September 20, 2012
On Thursday, 20 September 2012 at 16:00:18 UTC, bearophile wrote:
> This is the signature of a function of std.ascii:
>
> http://dlang.org/phobos/std_ascii.html#toLower
>
> pure nothrow @safe dchar toLower(dchar c);
>
> If this function is supposed to be used on ASCII strings, what's the point of returning a dchar? When I use it I usually have to cast its result back to char, and I prefer to avoid casts in my D code.
>
> Bye,
> bearophile

It's not, it only *operates* on ASCII, but non ascii is still a legal arg:

----
import std.stdio;
import std.ascii;

void main(){
    string s = "héllö";
    write("\"");
    foreach(dchar c; s) // iterate by code point rather than by UTF-8 code unit
        write(c.toUpper);
    write("\"");
}
----
"HéLLö"
----
September 20, 2012
monarch_dodra:

> It's not, it only *operates* on ASCII, but non ascii is still a legal arg:

Then maybe std.ascii.toLower needs a pre-condition that constrains it to just ASCII inputs, so it's free to return a char.

Bye,
bearophile
September 20, 2012
On Thursday, 20 September 2012 at 16:34:22 UTC, bearophile wrote:
> monarch_dodra:
>
>> It's not, it only *operates* on ASCII, but non ascii is still a legal arg:
>
> Then maybe std.ascii.toLower needs a pre-condition that constrains it to just ASCII inputs, so it's free to return a char.
>
> Bye,
> bearophile

I was thinking the exact same thing right after replying actually.

Would that actually change anything though? I mean, what with alignment and everything, wouldn't returning a char be just as expensive? I'm not 100% sure. What is your use case that would require this?
September 20, 2012
monarch_dodra:

> Would that actually change anything though? I mean, what with alignment and everything, wouldn't returning a char be just as expensive? I'm not 100% sure.

If you are thinking about the number of operations, then it's the same, as both a char and dchar value go in a register. The run time is the same, especially after inlining.


> What is your use case that would require this?

I have a char[] like:

['a','x','b','a','c','x','f']

Every char encodes something. Putting it to upper case means that that data was already used:

['a','X','b','a','C','x','f']

In this case to use toUpper I have to use:

cast(char)toUpper(foo[1])

What I am trying to minimize is the number of cast(). On the other hand, even in C toupper returns a type larger than char:

http://www.acm.uiuc.edu/webmonkeys/book/c_guide/2.2.html

It's just that D has contract programming, and this module is written for ASCII, so it's able to be smarter than the C functions and return a char.

Bye,
bearophile
September 20, 2012
On Thursday, September 20, 2012 18:35:21 bearophile wrote:
> monarch_dodra:
> > It's not, it only *operates* on ASCII, but non ascii is still a legal arg:
>
> Then maybe std.ascii.toLower needs a pre-condition that constrains it to just ASCII inputs, so it's free to return a char.

Goodness no.

1. Operating on a char is almost always the wrong thing to do. If you really want to do that, then cast. It should _not_ be encouraged.

2. It would be disastrous if std.ascii's functions didn't work on unicode. Right now, you can use them with ranges on strings which are unicode, which can be very useful. I grant you that that's more obvious with something like isDigit than toLower, but regardless, std.ascii is designed such that its functions will all operate on unicode strings. It just doesn't alter unicode characters and returns false for them with any of the query functions.
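
To make point 2 concrete, here is a minimal sketch, assuming a Phobos where range functions iterate a string as decoded dchar values. The helper names countAsciiDigits and lowerAsciiOnly are made up for this illustration and are not part of std.ascii.

```d
import std.algorithm : count, map;
import std.array : array;
import std.ascii : isDigit, toLower;

// Hypothetical helper: counts ASCII digits only.
// std.ascii.isDigit returns false for every non-ASCII character.
size_t countAsciiDigits(string s)
{
    return s.count!(c => isDigit(c));
}

// Hypothetical helper: lower-cases ASCII letters and passes
// every other character through unchanged.
dchar[] lowerAsciiOnly(string s)
{
    return s.map!(c => toLower(c)).array;
}

void main()
{
    import std.stdio : writeln;

    writeln(countAsciiDigits("Año 2012: ÉTÉ")); // 4
    writeln(lowerAsciiOnly("Año 2012: ÉTÉ"));   // año 2012: ÉtÉ
}
```

Note how 'T' is lowered while 'É' is left alone: the functions accept the whole string, they just only act on the ASCII part of it.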

- Jonathan M Davis
September 20, 2012
On Thursday, 20 September 2012 at 17:05:18 UTC, bearophile wrote:
> monarch_dodra:
>
>> Would that actually change anything though? I mean, what with alignment and everything, wouldn't returning a char be just as expensive? I'm not 100% sure.
>
> If you are thinking about the number of operations, then it's the same, as both a char and dchar value go in a register. The run time is the same, especially after inlining.
>
>
>> What is your use case that would require this?
>
> I have a char[] like:
>
> ['a','x','b','a','c','x','f']
>
> Every char encodes something. Putting it to upper case means that that data was already used:
>
> ['a','X','b','a','C','x','f']
>
> In this case to use toUpper I have to use:
>
> cast(char)toUpper(foo[1])
>
> What I am trying to minimize is the number of cast(). On the other hand, even in C toupper returns a type larger than char:
>
> http://www.acm.uiuc.edu/webmonkeys/book/c_guide/2.2.html
>
> It's just that D has contract programming, and this module is written for ASCII, so it's able to be smarter than the C functions and return a char.
>
> Bye,
> bearophile

That's what I thought. You have a valid point (IMO), but at the same time, using the ASCII functions on non-ASCII characters is also a legit operation.

I guess we'd need an extra "std.strictascii" module (!) for operations that would accept an ASCII char and return an ASCII char.

I'd support such an ER. I think it would allow users (such as you) to have tighter constraints if needed, while still keeping std.ascii for "safer" ASCII operations.
September 20, 2012
Jonathan M Davis:

> Goodness no.

:-)


> 1. Operating on a char is almost always the wrong thing to do.

A single char is often not so useful, but I have to keep many mutable chars, and keeping them as char[] instead of dchar[] both saves memory and reduces cache misses. The same is true for types like short or float: single ones are not so useful, but they sometimes become useful when you have many of them in arrays.

If I have to modify such a char[], using toUpper() requires a cast. And in my opinion it's not a good idea to return a dchar if you know that both the input and the output of the function are a char.


> If you really want to do that, then cast.

On the other hand casts in D have a certain risk, so reducing their number as much as possible is a good idea.


> It should _not_ be encouraged.

This is silly, see the above explanation.


> 2. It would be disastrous if std.ascii's functions didn't work on unicode.
> Right now, you can use them with ranges on strings which are unicode, which
> can be very useful. [...] but regardless, std.ascii is designed such that its
> functions will all operate on unicode strings. It just doesn't alter unicode
> characters and returns false for them with any of the query functions.

I see, and I didn't know this, I have misunderstood. I have thought of std.ascii functions as functions meant to work on just ASCII characters/text. But they are better defined as Unicode-passing functions. And yeah, it's written at the top of the module:

>Functions which operate on ASCII characters. All of the functions in std.ascii accept unicode characters but effectively ignore them. All isX functions return false for unicode characters, and all toX functions do nothing to unicode characters.<

So now I'd like a new set of functions designed for ASCII text, with contracts to refuse not-ASCII things ;-)

Thank you for the answers Jonathan.

Bye,
bearophile
September 21, 2012
On Thursday, 20 September 2012 at 17:32:52 UTC, bearophile wrote:
> Jonathan M Davis:
>>Functions which operate on ASCII characters. All of the functions in std.ascii accept unicode characters but effectively ignore them. All isX functions return false for unicode characters, and all toX functions do nothing to unicode characters.<
>
> So now I'd like a new set of functions designed for ASCII text, with contracts to refuse not-ASCII things ;-)
>
> Thank you for the answers Jonathan.
>
> Bye,
> bearophile

What do you (you two) think of my proposition for a "std.strictascii" module?

The signatures would be:
char toLower(dchar c);

And the implementations be like:

----
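static import std.ascii; // fully qualified calls avoid clashing with this module's own toLower

char toLower(dchar c)
in
{
    assert(std.ascii.isASCII(c)); // contract: reject anything outside ASCII
}
body
{
    return cast(char) std.ascii.toLower(c);
}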
----

The rationale for taking a dchar as input is so that its own input can be correctly validated, and so that it can easily operate with foreach etc, doing the cast internally. The returned value would be pre-cast to char.

Usage:

----
import std.stdio;
import std.strictascii;

void main(){
    string s1 = "axbacxf";
    string s2 = "àxbécxf";
    char[] cs = new char[](7);

    //bearophile use case: no casts
    foreach(i, c; s1)
        cs[i] = c.toUpper();

    //illegal use case: correct input validation
    foreach(i, c; s2)
        cs[i] = c.toUpper(); //fails the in-contract assert
}
----
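
Since std.strictascii does not exist, the same idea can be tried in a single file. This is only a sketch under assumptions: strictToUpper and upperAscii are hypothetical names, the contract uses `do` (the spelling current compilers expect in place of `body`), and contracts are only checked in non-release builds.

```d
import std.ascii : isASCII, toUpper;

// Hypothetical helper, not part of Phobos: ASCII in, char out.
char strictToUpper(dchar c)
in
{
    assert(isASCII(c), "strictToUpper: non-ASCII input");
}
do
{
    // The one cast lives here instead of at every call site.
    return cast(char) toUpper(c);
}

// Hypothetical helper: upper-cases an ASCII-only string into a new char[].
char[] upperAscii(string s)
{
    auto result = new char[](s.length);
    foreach (i, char c; s)
        result[i] = strictToUpper(c);
    return result;
}

void main()
{
    import std.stdio : writeln;

    // bearophile's use case: mark single entries as "used", no casts.
    char[] cs = "axbacxf".dup;
    cs[1] = strictToUpper(cs[1]);
    cs[4] = strictToUpper(cs[4]);
    writeln(cs);                    // aXbaCxf
    writeln(upperAscii("axbacxf")); // AXBACXF
}
```

Passing a non-ASCII character such as 'à' to strictToUpper trips the in-contract in a build with contracts enabled.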

It doesn't add *much* functionality, and arguably, it is a specialized functionality, but there are usecases where you want to operate ONLY on ascii, as pointed out by bearophile.

Just curious if I should even consider investing some effort in this.