[phobos] std.ctype vs std.string
May 22, 2011
std.ctype and std.string overlap. std.ctype defines the character-classification functions found in standard C, keeping their non-camelcased names and returning int instead of bool. std.string contains all the string stuff (which std.ctype doesn't have) as well as some character-specific stuff. It has hexdigits, digits, etc., which list the characters for which the corresponding std.ctype functions return true (or non-zero at the moment), and it defines some functions similar to those in std.ctype. In fact, std.string defines iswhite and std.ctype defines isspace, both of which do the same thing with different implementations. So, I really think that their common functionality needs to be refactored.
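
To make the overlap concrete, here is a minimal sketch in C (the language whose ctype.h std.ctype mirrors), not Phobos code. Only isxdigit is a real ctype.h function; HEXDIGITS, is_hex_digit, count_hex_digits, and check_overlap are hypothetical names made up for illustration. It shows an int-returning classifier wrapped to return bool, next to a character list that names exactly the characters the classifier accepts.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant playing the role of a "hexdigits" character list. */
static const char HEXDIGITS[] = "0123456789ABCDEFabcdef";

/* isxdigit returns int: zero for false, an unspecified non-zero value for
   true. This wrapper (hypothetical) normalizes the result to bool and
   restricts the input to the ASCII range 0..127. */
static bool is_hex_digit(int c)
{
    return c >= 0 && c <= 127 && isxdigit(c) != 0;
}

/* Counts how many ASCII code points the predicate accepts. */
static size_t count_hex_digits(void)
{
    size_t n = 0;
    for (int c = 0; c < 128; ++c)
        if (is_hex_digit(c))
            ++n;
    return n;
}

/* The list and the predicate describe the same set: every listed character
   is accepted, and nothing outside the list is accepted. */
static void check_overlap(void)
{
    for (size_t i = 0; HEXDIGITS[i] != '\0'; ++i)
        assert(is_hex_digit((unsigned char)HEXDIGITS[i]));
    assert(count_hex_digits() == strlen(HEXDIGITS));
}
```

Calling check_overlap() from a test passes only while the list and the predicate stay in sync, which is easier to guarantee when both live in the same module.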

Would anyone be opposed to my moving the pieces of std.string which are similar to std.ctype's functionality (hexdigits, letters, whitespace, iswhite, etc.) into std.ctype and fixing the std.ctype functions so that their names are properly camelcased and they return bool (obviously, I'd leave in the old stuff as scheduled for deprecation)?

- Jonathan M Davis
May 23, 2011
On 5/23/11 7:25 AM, Jonathan M Davis wrote:
> Would anyone be opposed to my moving the pieces of std.string which are similar to std.ctype's functionality (hexdigits, letters, whitespace, iswhite, etc.) into std.ctype and fixing the std.ctype functions so that their names are properly camelcased and they return bool (obviously, I'd leave in the old stuff as scheduled for deprecation)?

I haven't really thought about what would be the cleanest solution, but I agree that the current state is quite a mess. You might want to take std.uni into consideration as well, which contains some lonely Unicode character classification functions.

David

May 22, 2011
On 2011-05-22 22:39, David Nadlinger wrote:
> On 5/23/11 7:25 AM, Jonathan M Davis wrote:
> > Would anyone be opposed to my moving the pieces of std.string which are similar to std.ctype's functionality (hexdigits, letters, whitespace, iswhite, etc.) into std.ctype and fixing the std.ctype functions so that their names are properly camelcased and they return bool (obviously, I'd leave in the old stuff as scheduled for deprecation)?
> 
> I haven't really thought about what would be the cleanest solution, but I agree that the current state is quite a mess. You might want to take std.uni into consideration as well, which contains some lonely Unicode character classification functions.

Yeah. I think that the Unicode-specific stuff should go in std.uni and the ASCII stuff should go in std.ctype. So, for instance, std.string.LS and std.string.PS should be moved to std.uni, whereas std.string.digits and std.string.newline should be moved to std.ctype.

- Jonathan M Davis
May 23, 2011

On 5/22/2011 10:25 PM, Jonathan M Davis wrote:
> Would anyone be opposed to my moving the pieces of std.string which are similar to std.ctype's functionality (hexdigits, letters, whitespace, iswhite, etc.) into std.ctype and fixing the std.ctype functions so that their names are properly camelcased and they return bool (obviously, I'd leave in the old stuff as scheduled for deprecation)?

I understand the sentiment, but I don't like breaking existing code. It really annoys people who have to go back and constantly rename things in their production code.
May 23, 2011
On 2011-05-23 00:30, Walter Bright wrote:
> On 5/22/2011 10:25 PM, Jonathan M Davis wrote:
> > Would anyone be opposed to my moving the pieces of std.string which are similar to std.ctype's functionality (hexdigits, letters, whitespace, iswhite, etc.) into std.ctype and fixing the std.ctype functions so that their names are properly camelcased and they return bool (obviously, I'd leave in the old stuff as scheduled for deprecation)?
> 
> I understand the sentiment, but I don't like breaking existing code. It really annoys people who have to go back and constantly rename things in their production code.

Well, that's why the old versions would still be there, scheduled for deprecation. And that's why we should be doing it sooner rather than later. The sooner it gets done, the less code it will break.

- Jonathan M Davis
May 25, 2011
Personally I have never used std.ctype and would never guess what is actually in there. I thought it would be more like core.stdc.types (or whatever it is called).

On Sun, May 22, 2011 at 10:25 PM, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> std.ctype and std.string overlap. std.ctype defines the character-classification functions found in standard C, keeping their non-camelcased names and returning int instead of bool. std.string contains all the string stuff (which std.ctype doesn't have) as well as some character-specific stuff. It has hexdigits, digits, etc., which list the characters for which the corresponding std.ctype functions return true (or non-zero at the moment), and it defines some functions similar to those in std.ctype. In fact, std.string defines iswhite and std.ctype defines isspace, both of which do the same thing with different implementations. So, I really think that their common functionality needs to be refactored.
>
> Would anyone be opposed to my moving the pieces of std.string which are similar to std.ctype's functionality (hexdigits, letters, whitespace, iswhite, etc.) into std.ctype and fixing the std.ctype functions so that their names are properly camelcased and they return bool (obviously, I'd leave in the old stuff as scheduled for deprecation)?
May 25, 2011
On 2011-05-25 13:19, Jesse Phillips wrote:
> Personally I have never used std.ctype and would never guess what is actually in there. I thought it would be more like core.stdc.types (or whatever it is called).

It is modeled after C's ctype.h, which I believe is meant to stand for "character type." It holds functions for querying the type of a character (digit, hex digit, uppercase letter, etc.). It's all ASCII-specific. std.uni holds what corresponding Unicode functions we have. So, essentially, std.ctype has functions for operating on ASCII characters, std.uni has functions for operating on Unicode characters, and std.string has functions for operating on strings. std.ascii would probably be a better name than std.ctype, but we already have std.ctype, and C or C++ folks may recognize it. But std.ctype holds pure D functions (albeit based on their C counterparts, including returning int instead of bool for true and false, though they do take dchar, not char), so they wouldn't be in core.stdc. However, over time, std.string seems to have taken on some functionality which makes more sense in std.ctype or std.uni, so it should all be better organized.

I'm moving the functions which operate on dchar instead of strings to std.ctype and std.uni (depending on whether they're ASCII-specific or deal with Unicode). I'm also fixing the names in std.string and std.ctype so that they're properly camelcased. A discussion a few months ago on revamping std.string made it quite clear that the majority want those functions (and Phobos in general) to follow Phobos' own naming conventions consistently, and std.string and std.ctype are two of the major places where the function names aren't properly camelcased. So, I'm fixing that as well as trying to properly unicodify a few of the std.string functions which are still too ASCII-specific. All of the old functions will still be there, scheduled for deprecation, for the time being though.
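
The rename-but-keep-the-old-name approach can be sketched as follows. This is an illustration in C under stated assumptions, not the actual Phobos change: isWhiteAscii, iswhite_old, and check_old_matches_new are hypothetical names, and the code assumes an ASCII-compatible character set.

```c
#include <assert.h>
#include <stdbool.h>

/* New-style predicate (hypothetical name): camel-cased, returns bool.
   Assumes ASCII, where '\t', '\n', '\v', '\f', '\r' are the contiguous
   codes 9 through 13. Accepts those five characters plus space. */
static bool isWhiteAscii(int c)
{
    return c == ' ' || (c >= '\t' && c <= '\r');
}

/* Old-style name (hypothetical), kept so existing callers still compile.
   It forwards to the new function and keeps the old int return type. */
static int iswhite_old(int c)
{
    return isWhiteAscii(c) ? 1 : 0;
}

/* Checks that the old and new names agree on every ASCII code point. */
static void check_old_matches_new(void)
{
    for (int c = 0; c < 128; ++c)
        assert((iswhite_old(c) != 0) == isWhiteAscii(c));
}
```

Because the old name only forwards, there is a single implementation to maintain, and removing the wrapper later does not change behavior for code that has already switched to the new name.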

- Jonathan M Davis