Thread overview
char vs ascii

Aug 15, 2001  Walter
Aug 15, 2001  Jan Knepper
Aug 15, 2001  Erik Funkenbusch
Aug 15, 2001  Walter
Aug 15, 2001  Ivan Frohne
Aug 16, 2001  Walter
Aug 16, 2001  Jan Knepper
Aug 16, 2001  Sheldon Simms
Aug 17, 2001  Walter
Apr 29, 2002  c. keith ray
Apr 30, 2002  Walter
Apr 30, 2002  Keith Ray
Apr 30, 2002  Walter
May 01, 2002  Keith Ray
May 03, 2002  Walter
Aug 16, 2001  Tobias Weingartner
Aug 16, 2001  Jeff Frohwein
Aug 17, 2001  Charles Hixson
Aug 17, 2001  Walter
Aug 17, 2001  Walter
Aug 17, 2001  Walter
Aug 17, 2001  Kent Sandvik
Aug 17, 2001  Russ Lewis
Aug 18, 2001  Kent Sandvik
Aug 20, 2001  Tobias Weingartner
Aug 21, 2001  Walter
Aug 22, 2001  Tobias Weingartner
August 15, 2001
What do people think about using the keyword:

    ascii or char?
    unicode or wchar?

-Walter


August 15, 2001
I guess ascii makes more sense than char and unicode makes more sense than wchar or wchar_t...



Walter wrote:

> What do people think about using the keyword:
>
>     ascii or char?
>     unicode or wchar?
>
> -Walter

August 15, 2001
Just some suggestions that come to mind, in no particular order or coherence:

No, ascii makes little sense.  ascii refers explicitly to one character set; there are many 8-bit character sets, or locales, or code pages, or whatever you want to call them.

Also, unicode can be encoded in 8-bit or 16-bit units, and there is talk of a 32-bit form as well in the future.  I think any language that expects to stick around for any length of time needs to address forward compatibility with new code sets.

I'd much rather see a way to define your character type and use it throughout your program.  Also remember that you might be creating an application that needs to display multiple character sets simultaneously (for instance, both English and Japanese).

Now, while much of this will be OS-specific and doesn't belong in a language, you at least need some way to deal with such things cleanly in that language.  char and wchar_t do not have specific sizes; they are implementation-defined.

I'd say define the types: char8 and char16.  This allows char32 or char64 later (or char12, for that matter; remember that some CPUs have non-standard word sizes).
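
(For illustration only: the sized names suggested here could be written, in present-day D syntax, as aliases over the fixed-size character types the language eventually ended up with. char8/char16/char32 are hypothetical names, not actual D types.)

    alias char8  = char;    // 8-bit UTF-8 code unit
    alias char16 = wchar;   // 16-bit UTF-16 code unit
    alias char32 = dchar;   // 32-bit UTF-32 code unit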

An alternative would be a syntax like char(8) or char(16), perhaps even a simple "char" and a modifier like "unicode(16) char".

Finally, I might suggest doing away with char altogether and making the entire language unicode.  On platforms that don't support it, provide a seamless mapping mechanism to down-convert 16-bit chars to 8-bit.

"Jan Knepper" <jan@smartsoft.cc> wrote in message news:3B79CF33.94F71602@smartsoft.cc...
> I guess ascii makes more sense than char and unicode makes more sense than wchar or wchar_t...
>
> Walter wrote:
>
> > What do people think about using the keyword:
> >
> >     ascii or char?
> >     unicode or wchar?



August 15, 2001
"Erik Funkenbusch" <erikf@seahorsesoftware.com> wrote in message news:9lcsqr$2s9p$1@digitaldaemon.com...
> Just some suggestions that come to mind, in no particular order or
> coherance:
> No, ascii makes little sense.  ascii refers explicitly to one character
set.
> There are many 8 bit character sets or locales or code pages or whatever
you
> want to call them.

Yes, I think it should just be called "char", and it will be an unsigned 8-bit type.

> Also, unicode can be encoded in 8-bit or 16-bit units, and there is talk
> of a 32-bit form as well in the future.  I think any language that
> expects to stick around for any length of time needs to address forward
> compatibility with new code sets.

32-bit wchar_t's are a reality on Linux now. I think it will work out best to just make a wchar type that maps to whatever wchar_t is for the local native C compiler.
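
(A minimal check of that platform difference, in present-day D rather than the D of this thread: druntime's core.stdc.stddef exposes the host C compiler's wchar_t.)

    import core.stdc.stddef : wchar_t;
    import std.stdio : writeln;

    void main()
    {
        // Typically prints 4 on Linux/glibc and 2 on Windows.
        writeln("wchar_t is ", wchar_t.sizeof, " bytes on this platform");
    }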


> I'd much rather see a way to define your character type and use it throughout your program.  Also remember that you might be creating an application that needs to display multiple character sets simultaneously (for instance, both English and Japanese).

I've found I've wanted to support both ascii and unicode simultaneously in programs, hence I thought two different types were appropriate. I was constantly irritated by having to go through and either add or remove L's in front of the strings. The macros to do it automatically are ugly. Hence the idea that string literals should be implicitly convertible to either char[] or wchar[].
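
(A minimal sketch of that idea in present-day D, where string literals did end up implicitly convertible to both the narrow and the wide array types; no L prefix or TEXT()-style macro is needed.)

    import std.stdio : writeln;

    void main()
    {
        immutable(char)[]  narrow = "hello";  // UTF-8 code units
        immutable(wchar)[] wide   = "hello";  // UTF-16 code units, same literal
        writeln(narrow.length, " / ", wide.length);
    }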

Next, there is the D typedef facility, which actually does introduce a new,
overloadable type. So, you could:
    typedef char mychar;
or
    typedef wchar mychar;
and through the magic of overloading <g> the rest of the code should not
need changing.
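
(A rough sketch of the retargeting idea, again in present-day D: the original typedef keyword is gone, and alias does not create a distinct type the way typedef did, but code written against the alias still picks up whichever overload matches the underlying character type.)

    import std.stdio : writeln;

    alias mychar = char;   // change to wchar and the rest stays the same

    void show(const(char)[]  s) { writeln("narrow: ", s); }
    void show(const(wchar)[] s) { writeln("wide:   ", s); }

    void main()
    {
        const(mychar)[] buf = "hello";  // the literal adapts to either alias
        show(buf);                      // overload resolution picks the right one
    }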


> Now, while much of this will be OS-specific and doesn't belong in a
> language, you at least need some way to deal with such things cleanly in
> that language.  char and wchar_t do not have specific sizes; they are
> implementation-defined.
> I'd say define the types: char8 and char16.  This allows char32 or
> char64 later (or char12, for that matter; remember that some CPUs have
> non-standard word sizes).
> An alternative would be a syntax like char(8) or char(16), perhaps even a
> simple "char" and a modifier like "unicode(16) char".
> Finally, I might suggest doing away with char altogether and making the
> entire language unicode.  On platforms that don't support it, provide a
> seamless mapping mechanism to down-convert 16-bit chars to 8-bit.

Java went the way of chucking ascii entirely. While that makes sense for a web language, I think that for systems languages ascii is going to be around for a long time, so we might as well make it easy to deal with! Ascii is really never going to be anything but an 8-bit type; it is unicode that has the varying size. Hence I think having a wchar type of a varying size is the way to go.



August 15, 2001
There's something clean and neat about calling things
what they are.  Instead of larding up your code with

    typedef char ascii
    typedef wchar unicode

why not just use 'ascii' and 'unicode' in the first place? Save the typedefs for

    typedef ascii ebcdic

Now, about that cast notation ....


--Ivan Frohne


August 16, 2001
I suspect that ascii and unicode are trademarked names!

"Ivan Frohne" <frohne@gci.net> wrote in message news:9lf20l$11og$1@digitaldaemon.com...
> There's something clean and neat about calling things
> what they are.  Instead of larding up your code with
>
>     typedef char ascii
>     typedef wchar unicode
>
> why not just use 'ascii' and 'unicode' in the first place? Save the typedefs for
>
>     typedef ascii ebcdic
>
> Now, about that cast notation ....
>
>
> --Ivan Frohne
>
>


August 16, 2001
<g>
I had not thought of that one!

Jan



Walter wrote:

> I suspect that ascii and unicode are trademarked names!

August 16, 2001
In article <9lchvd$2miu$1@digitaldaemon.com>, Walter wrote:
> What do people think about using the keyword:
> 
>     ascii or char?
>     unicode or wchar?


Ascii makes little sense.  In most cases where it is used (other than for strings), it is to get a "byte".  Since you have a byte type, char is sort of redundant.  IMHO it would be better to extend the string type (unicode, etc.) to be able to specify a restricted subset: unicode would be the superset (for strings, and the default if not constrained), with some other form (unicode.byte[10] string_of_10_byte_sized_positions) for restricting the type of "string" you have.
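
(Purely a hypothetical sketch of that restricted-subset idea, in present-day D; nothing like a unicode.byte[10] type exists in the language, so the restriction is modelled here as a wrapper that rejects code points above a chosen limit.)

    struct Restricted(uint maxCode)
    {
        dchar[] data;

        void put(dchar c)
        {
            assert(c <= maxCode, "code point outside the restricted subset");
            data ~= c;
        }
    }

    alias ByteChars = Restricted!0xFF;  // roughly the "unicode.byte" case above

    void main()
    {
        ByteChars s;
        s.put('A');          // fine: within the 8-bit subset
        // s.put('\u20AC');  // would fail the assert: U+20AC is outside it
    }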

-- 
Tobias Weingartner |        Unix Guru, Admin, Systems-Dude
Apt B 7707-110 St. |        http://www.tepid.org/~weingart/
Edmonton, AB       |-------------------------------------------------
Canada, T6G 1G3    | %SYSTEM-F-ANARCHISM, The OS has been overthrown

August 16, 2001
In article <9levtq$10ji$1@digitaldaemon.com>, "Walter" <walter@digitalmars.com> wrote:

> I've found I've wanted to support both ascii and unicode simultaneously in programs, hence I thought two different types was appropriate. I was constantly irritated by having to go through and either subtract or add L's in front of the strings. The macros to do it automatically are ugly. Hence, the idea that the string literals should be implicitly convertible to either char[] or wchar[].

Well, it seems that you already have standard-size integral types: byte, short, int, long.

Why not make char a 2- or 4-byte unicode char and use the syntax

byte[] str = "My ASCII string";

for ascii?

-- 
Sheldon Simms / sheldon@semanticedge.com

August 16, 2001
Walter wrote:
> 
> What do people think about using the keyword:
> 
>     ascii or char?
>     unicode or wchar?

 I personally think C might have started a bad habit by using
types that were generally vague in nature. All I ask is that
simplicity be given impartial consideration. Since we are all
used to seeing types such as short, long, and int in code,
perhaps it would be better for all of us to spend some time
thinking about the following types rather than forming an
immediate opinion. I can easily see how unfamiliar-looking
types can look highly offensive to the newly or barely
acquainted, as they did to me at one time:

 u8,s8,u16,s16,u32,s32,...

 Some will be adamantly opposed because they don't use these,
or know anyone who does. SGI, for one, has used these types
for Nintendo 64 development, and now Nintendo is using them
for GameBoy Advance development. There are probably others...

 As 128-bit and 256-bit systems are released, adding new types
would be as easy as u128, s128, u256, s256... rather than having
to consider something like "long long long long", or a new name
in general. Those who want to use vague types can always typedef
their own types.
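
(For illustration: in present-day D syntax, the names listed above are one-liners over the fixed-size integer types, and anyone preferring the vaguer names can layer them back on top the same way.)

    alias u8  = ubyte;   alias s8  = byte;
    alias u16 = ushort;  alias s16 = short;
    alias u32 = uint;    alias s32 = int;
    alias u64 = ulong;   alias s64 = long;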

 Thanks for listening, :)

 Jeff