std.ctype vs std.string

[phobos] std.ctype vs std.string

May 23, 2011

Jonathan M Davis

std.ctype and std.string overlap. std.ctype defines the standard C functions for determining the type of a character, including their non-camelcased names and their int (rather than bool) return types. std.string contains all of the string functionality (which std.ctype doesn't have) as well as some character-specific functionality. It has hexdigits, digits, etc., which list the characters for which the various std.ctype functions return true (or, at the moment, non-zero), and it defines some functions similar to those in std.ctype. In fact, std.string defines iswhite and std.ctype defines isspace, and both do the same thing with different implementations. So, I really think that their common functionality needs to be refactored.

Would anyone be opposed to my moving the pieces of std.string which are similar to std.ctype's functionality (hexdigits, letters, whitespace, iswhite, etc.) into std.ctype and fixing the std.ctype functions so that their names are properly camelcased and they return bool? (Obviously, I'd leave the old versions in, scheduled for deprecation.)

- Jonathan M Davis
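A minimal C sketch of the duplication described in the first post (C because std.ctype is modeled on C's ctype.h). The table names mirror std.string's digits, hexdigits and whitespace, but these C constants and the helper table_matches are hypothetical, and the check assumes the default "C" locale. It confirms that each table lists exactly the ASCII characters for which the matching ctype.h predicate returns non-zero, so the same information is kept in two places.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical C counterparts of std.string's digits, hexdigits and
   whitespace tables. */
static const char digits[]     = "0123456789";
static const char hexdigits[]  = "0123456789ABCDEFabcdef";
static const char whitespace[] = " \t\v\r\n\f";

/* Returns true if, over the ASCII range 1..127, `table` contains exactly the
   characters for which `pred` returns non-zero. Character 0 is skipped
   because strchr() treats the terminating '\0' as part of the string.
   Assumes the default "C" locale. */
static bool table_matches(const char *table, int (*pred)(int))
{
    for (int c = 1; c < 128; ++c) {
        bool in_table  = strchr(table, c) != NULL;
        bool pred_true = pred(c) != 0;   /* ctype.h: non-zero means true */
        if (in_table != pred_true)
            return false;
    }
    return true;
}
```

For example, table_matches(whitespace, isspace) should hold, which is the iswhite/isspace overlap in miniature: a table and a predicate that encode the same character class.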

On 5/23/11 7:25 AM, Jonathan M Davis wrote:
> Would anyone be opposed to my moving the pieces of std.string which are similar to std.ctype's functionality (hexdigits, letters, whitespace, iswhite, etc.) into std.ctype and fixing the std.ctype functions so that they're names are properly camelcased and return bool (obviously, I'd leave in the old stuff as scheduled for deprecation)?

I haven't really thought about what the cleanest solution would be, but I agree that the current state is quite a mess. You might want to take std.uni into consideration as well, which contains a few lonely Unicode character classification functions.

David

On 2011-05-22 22:39, David Nadlinger wrote:
> On 5/23/11 7:25 AM, Jonathan M Davis wrote:
> > Would anyone be opposed to my moving the pieces of std.string which are similar to std.ctype's functionality (hexdigits, letters, whitespace, iswhite, etc.) into std.ctype and fixing the std.ctype functions so that they're names are properly camelcased and return bool (obviously, I'd leave in the old stuff as scheduled for deprecation)?
>
> I haven't really thought about what would be the cleanest solution, but I agree that the current state is quite a mess. You might want to take std.uni in consideration as well, which contains some lonely Unicode character classification functions.

Yeah. I think that the Unicode-specific stuff should go in std.uni and the ASCII stuff should go in std.ctype. So, for instance, std.string.LS and std.string.PS should be moved to std.uni, whereas std.string.digits and std.string.newline should be moved to std.ctype.

- Jonathan M Davis

On 5/22/2011 10:25 PM, Jonathan M Davis wrote:
> Would anyone be opposed to my moving the pieces of std.string which are similar to std.ctype's functionality (hexdigits, letters, whitespace, iswhite, etc.) into std.ctype and fixing the std.ctype functions so that they're names are properly camelcased and return bool (obviously, I'd leave in the old stuff as scheduled for deprecation)?

I understand the sentiment, but I don't like breaking existing code. It really annoys people who have to go back and constantly rename things in their production code.

On 2011-05-23 00:30, Walter Bright wrote:
> On 5/22/2011 10:25 PM, Jonathan M Davis wrote:
> > Would anyone be opposed to my moving the pieces of std.string which are similar to std.ctype's functionality (hexdigits, letters, whitespace, iswhite, etc.) into std.ctype and fixing the std.ctype functions so that they're names are properly camelcased and return bool (obviously, I'd leave in the old stuff as scheduled for deprecation)?
>
> I understand the sentiment, but I don't like breaking existing code. It really annoys people who have to go back and constantly rename things in their production code.

Well, that's why the old versions would still be there, scheduled for deprecation. And that's why we should be doing it sooner rather than later: the sooner it gets done, the less code it will break.

- Jonathan M Davis

Personally, I have never used std.ctype and would never guess what is actually in there. I thought it would be more like core.stdc.types (or whatever it is called).

On Sun, May 22, 2011 at 10:25 PM, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> std.ctype and std.string overlap. std.ctype defines functions for determining the type of a character which are in standard C - including the non-camelcased names and return int instead of bool. std.string contains all the string stuff (which std.ctype doesn't have) as well as some character-specific stuff. It has hexdigits, digits, etc. which give the characters which return true (or non-zero at the moment) for the various functions in std.ctype, and it defines some functions similar to those in std.ctype. In fact, std.string defines iswhite and std.ctype defines isspace - both of which do the same thing with different implementations. So, I really think that their common functionality needs to be refactored.
>
> Would anyone be opposed to my moving the pieces of std.string which are similar to std.ctype's functionality (hexdigits, letters, whitespace, iswhite, etc.) into std.ctype and fixing the std.ctype functions so that they're names are properly camelcased and return bool (obviously, I'd leave in the old stuff as scheduled for deprecation)?

May 25, 2011

Posted by Jonathan M Davis
in reply to Jesse Phillips

On 2011-05-25 13:19, Jesse Phillips wrote:
> Personally I have never used std.ctype and would never guess what is actually in there. I thought it would be more like core.stdc.types (or whatever it is called).

It is modeled after C's ctype.h, which I believe is meant to stand for "character type." It holds functions for querying the type of a character (digit, hex digit, uppercase letter, etc.). It's all ASCII-specific. std.uni holds what corresponding Unicode functions we have. So, essentially, std.ctype has functions for operating on ASCII characters, std.uni has functions for operating on Unicode characters, and std.string has functions for operating on strings. std.ascii would probably be a better name than std.ctype, but we already have std.ctype, and C or C++ folks may recognize it. But std.ctype holds pure D functions (albeit based on their C counterparts, including returning int instead of bool for true and false, though they do take dchar rather than char), so they wouldn't belong in core.stdc. However, over time, std.string seems to have taken on some functionality which makes more sense in std.ctype or std.uni, so things should be better organized.

I'm moving the functions which operate on dchar instead of strings to std.ctype and std.uni (depending on whether they're ASCII-specific or deal with Unicode). I'm also fixing the names in std.string and std.ctype so that they're properly camelcased. A previous discussion on revamping std.string a few months ago made it quite clear that the majority want those functions (and Phobos in general) to follow its own naming conventions consistently, and std.string and std.ctype are two of the major places where the function names aren't properly camelcased. So, I'm fixing that as well as trying to properly unicodify a few of the std.string functions which are still too ASCII-specific. All of the old functions will remain for the time being, though, scheduled for deprecation.

- Jonathan M Davis
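For comparison, here is a minimal C sketch of the bool-returning, ASCII-only style of predicate discussed in this thread. The names (ascii_is_digit, ascii_is_white, agrees_with_ctype, etc.) are hypothetical, not Phobos or libc functions. Unlike ctype.h, whose predicates return an int that is merely non-zero for true (and several of which may depend on the current locale), these return bool and never look past ASCII; the uint32_t parameter stands in for D's dchar.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ASCII-only predicates. They return bool rather than int,
   take a 32-bit code point (standing in for D's dchar) and never consult
   the locale, so anything outside ASCII is simply false. */
static bool ascii_is_digit(uint32_t c) { return c >= '0' && c <= '9'; }
static bool ascii_is_upper(uint32_t c) { return c >= 'A' && c <= 'Z'; }
static bool ascii_is_lower(uint32_t c) { return c >= 'a' && c <= 'z'; }
static bool ascii_is_alpha(uint32_t c) { return ascii_is_upper(c) || ascii_is_lower(c); }

static bool ascii_is_hex_digit(uint32_t c)
{
    return ascii_is_digit(c) || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
}

/* ' ', '\t', '\n', '\v', '\f', '\r': the set isspace() uses in the "C" locale. */
static bool ascii_is_white(uint32_t c)
{
    return c == ' ' || (c >= '\t' && c <= '\r');
}

/* Checks that every predicate agrees with its ctype.h counterpart
   (int result converted to true/false) for all ASCII values.
   Assumes the default "C" locale. */
static bool agrees_with_ctype(void)
{
    for (int c = 0; c < 128; ++c) {
        if (ascii_is_digit(c)     != (isdigit(c)  != 0)) return false;
        if (ascii_is_hex_digit(c) != (isxdigit(c) != 0)) return false;
        if (ascii_is_upper(c)     != (isupper(c)  != 0)) return false;
        if (ascii_is_lower(c)     != (islower(c)  != 0)) return false;
        if (ascii_is_alpha(c)     != (isalpha(c)  != 0)) return false;
        if (ascii_is_white(c)     != (isspace(c)  != 0)) return false;
    }
    return true;
}
```

In the "C" locale these should agree with ctype.h over 0 to 127, while a code point such as U+0663 (ARABIC-INDIC DIGIT THREE) is rejected, which is the point of keeping Unicode classification in a separate module such as std.uni.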
