Thread overview
char vs ascii

Aug 15, 2001  Walter
Aug 15, 2001  Jan Knepper
Aug 15, 2001  Erik Funkenbusch
Aug 15, 2001  Walter
Aug 15, 2001  Ivan Frohne
Aug 16, 2001  Walter
Aug 16, 2001  Jan Knepper
Aug 16, 2001  Sheldon Simms
Aug 17, 2001  Walter
Apr 29, 2002  c. keith ray
Apr 30, 2002  Walter
Apr 30, 2002  Keith Ray
Apr 30, 2002  Walter
May 01, 2002  Keith Ray
May 03, 2002  Walter
Aug 16, 2001  Tobias Weingartner
Aug 16, 2001  Jeff Frohwein
Aug 17, 2001  Charles Hixson
Aug 17, 2001  Walter
Aug 17, 2001  Walter
Aug 17, 2001  Walter
Aug 17, 2001  Kent Sandvik
Aug 17, 2001  Russ Lewis
Aug 18, 2001  Kent Sandvik
Aug 20, 2001  Tobias Weingartner
Aug 21, 2001  Walter
Aug 22, 2001  Tobias Weingartner
August 15, 2001
What do people think about using the keyword:

    ascii or char?
    unicode or wchar?

-Walter


August 15, 2001
I guess ascii makes more sense than char and unicode makes more sense than wchar or wchar_t...



Walter wrote:

> What do people think about using the keyword:
>
>     ascii or char?
>     unicode or wchar?
>
> -Walter

August 15, 2001
Just some suggestions that come to mind, in no particular order or coherence:

No, ascii makes little sense.  ascii refers explicitly to one character set; there are many 8-bit character sets, or locales, or code pages, or whatever you want to call them.

Also, unicode can be encoded in 8-bit or 16-bit units, and there is talk of a 32-bit form as well in the future.  I think any language that expects to stick around for any length of time needs to address forward compatibility with new code sets.

I'd much rather see a way to define your character type and use it throughout your program.  Also remember that you might be creating an application that needs to display multiple character sets simultaneously (for instance, both English and Japanese).

Now, while much of this will be OS-specific and doesn't belong in a language, you at least need some way to deal with such things cleanly in that language.  char and wchar_t do not have specific sizes; they are implementation-defined.

I'd say define the types: char8 and char16.  This allows char32 or char64 later (or char12, for that matter; remember that some CPUs have non-standard word sizes).
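
(For illustration only: the sized names suggested here could be written, in present-day D syntax, as aliases over the fixed-size character types the language eventually ended up with. char8/char16/char32 are hypothetical names, not actual D types.)

    alias char8  = char;    // 8-bit UTF-8 code unit
    alias char16 = wchar;   // 16-bit UTF-16 code unit
    alias char32 = dchar;   // 32-bit UTF-32 code unit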

An alternative would be a syntax like char(8) or char(16), perhaps even a simple "char" and a modifier like "unicode(16) char".

Finally, I might suggest doing away with char altogether and making the entire language unicode.  On platforms that don't support it, provide a seamless mapping mechanism to down-convert 16-bit chars to 8-bit.

"Jan Knepper" <jan@smartsoft.cc> wrote in message news:3B79CF33.94F71602@smartsoft.cc...
> I guess ascii makes more sense than char and unicode makes more sense than wchar or wchar_t...
>
> Walter wrote:
>
> > What do people think about using the keyword:
> >
> >     ascii or char?
> >     unicode or wchar?



August 15, 2001
"Erik Funkenbusch" <erikf@seahorsesoftware.com> wrote in message news:9lcsqr$2s9p$1@digitaldaemon.com...
> Just some suggestions that come to mind, in no particular order or
> coherance:
> No, ascii makes little sense.  ascii refers explicitly to one character
set.
> There are many 8 bit character sets or locales or code pages or whatever
you
> want to call them.

Yes, I think it should just be called "char", and it will be an unsigned 8-bit type.

> Also, unicode can be encoded in 8-bit or 16-bit units, and there is talk
> of a 32-bit form as well in the future.  I think any language that
> expects to stick around for any length of time needs to address forward
> compatibility with new code sets.

32-bit wchar_t's are a reality on Linux now. I think it will work out best to just make a wchar type that maps to whatever wchar_t is for the local native C compiler.
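
(A minimal check of that platform difference, in present-day D rather than the D of this thread: druntime's core.stdc.stddef exposes the host C compiler's wchar_t.)

    import core.stdc.stddef : wchar_t;
    import std.stdio : writeln;

    void main()
    {
        // Typically prints 4 on Linux/glibc and 2 on Windows.
        writeln("wchar_t is ", wchar_t.sizeof, " bytes on this platform");
    }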


> I'd much rather see a way to define your character type and use it throughout your program.  Also remember that you might be creating an application that needs to display multiple character sets simultaneously (for instance, both English and Japanese).

I've found I've wanted to support both ascii and unicode simultaneously in programs, hence I thought two different types were appropriate. I was constantly irritated by having to go through and either add or remove L's in front of the strings. The macros to do it automatically are ugly. Hence the idea that string literals should be implicitly convertible to either char[] or wchar[].
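
(A minimal sketch of that idea in present-day D, where string literals did end up implicitly convertible to both the narrow and the wide array types; no L prefix or TEXT()-style macro is needed.)

    import std.stdio : writeln;

    void main()
    {
        immutable(char)[]  narrow = "hello";  // UTF-8 code units
        immutable(wchar)[] wide   = "hello";  // UTF-16 code units, same literal
        writeln(narrow.length, " / ", wide.length);
    }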

Next, there is the D typedef facility, which actually does introduce a new,
overloadable type. So, you could:
    typedef char mychar;
or
    typedef wchar mychar;
and through the magic of overloading <g> the rest of the code should not
need changing.
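
(A rough sketch of the retargeting idea, again in present-day D: the original typedef keyword is gone, and alias does not create a distinct type the way typedef did, but code written against the alias still picks up whichever overload matches the underlying character type.)

    import std.stdio : writeln;

    alias mychar = char;   // change to wchar and the rest stays the same

    void show(const(char)[]  s) { writeln("narrow: ", s); }
    void show(const(wchar)[] s) { writeln("wide:   ", s); }

    void main()
    {
        const(mychar)[] buf = "hello";  // the literal adapts to either alias
        show(buf);                      // overload resolution picks the right one
    }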


> Now, while much of this will be OS-specific and doesn't belong in a
> language, you at least need some way to deal with such things cleanly in
> that language.  char and wchar_t do not have specific sizes; they are
> implementation-defined.
> I'd say define the types: char8 and char16.  This allows char32 or
> char64 later (or char12, for that matter; remember that some CPUs have
> non-standard word sizes).
> An alternative would be a syntax like char(8) or char(16), perhaps even a
> simple "char" and a modifier like "unicode(16) char".
> Finally, I might suggest doing away with char altogether and making the
> entire language unicode.  On platforms that don't support it, provide a
> seamless mapping mechanism to down-convert 16-bit chars to 8-bit.

Java went the way of chucking ascii entirely. While that makes sense for a web language, I think that for systems languages ascii is going to be around for a long time, so we might as well make it easy to deal with! Ascii is really never going to be anything but an 8-bit type; it is unicode that has the varying size. Hence I think having a wchar type of a varying size is the way to go.



August 15, 2001
There's something clean and neat about calling things
what they are.  Instead of larding up your code with

    typedef char ascii
    typedef wchar unicode

why not just use 'ascii' and 'unicode' in the first place? Save the typedefs for

    typedef ascii ebcdic

Now, about that cast notation ....


--Ivan Frohne


August 16, 2001
I suspect that ascii and unicode are trademarked names!

"Ivan Frohne" <frohne@gci.net> wrote in message news:9lf20l$11og$1@digitaldaemon.com...
> There's something clean and neat about calling things
> what they are.  Instead of larding up your code with
>
>     typedef char ascii
>     typedef wchar unicode
>
> why not just use 'ascii' and 'unicode' in the first place? Save the typedefs for
>
>     typedef ascii ebcdic
>
> Now, about that cast notation ....
>
>
> --Ivan Frohne
>
>


August 16, 2001
<g>
I had not thought of that one!

Jan



Walter wrote:

> I suspect that ascii and unicode are trademarked names!

August 16, 2001
In article <9lchvd$2miu$1@digitaldaemon.com>, Walter wrote:
> What do people think about using the keyword:
> 
>     ascii or char?
>     unicode or wchar?


Ascii makes little sense.  In most cases where it is used (other than for strings), it is to get a "byte".  Since you have a byte type, char is sort of redundant.  IMHO it would be better to extend the string type (unicode, etc.) to be able to specify a restricted subset: unicode would be the superset (for strings, and the default if not constrained), with some other form (unicode.byte[10] string_of_10_byte_sized_positions) for restricting the type of "string" you have.
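
(Purely a hypothetical sketch of that restricted-subset idea, in present-day D; nothing like a unicode.byte[10] type exists in the language, so the restriction is modelled here as a wrapper that rejects code points above a chosen limit.)

    struct Restricted(uint maxCode)
    {
        dchar[] data;

        void put(dchar c)
        {
            assert(c <= maxCode, "code point outside the restricted subset");
            data ~= c;
        }
    }

    alias ByteChars = Restricted!0xFF;  // roughly the "unicode.byte" case above

    void main()
    {
        ByteChars s;
        s.put('A');          // fine: within the 8-bit subset
        // s.put('\u20AC');  // would fail the assert: U+20AC is outside it
    }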

-- 
Tobias Weingartner |        Unix Guru, Admin, Systems-Dude
Apt B 7707-110 St. |        http://www.tepid.org/~weingart/
Edmonton, AB       |-------------------------------------------------
Canada, T6G 1G3    | %SYSTEM-F-ANARCHISM, The OS has been overthrown

August 16, 2001
In article <9levtq$10ji$1@digitaldaemon.com>, "Walter" <walter@digitalmars.com> wrote:

> I've found I've wanted to support both ascii and unicode simultaneously in programs, hence I thought two different types was appropriate. I was constantly irritated by having to go through and either subtract or add L's in front of the strings. The macros to do it automatically are ugly. Hence, the idea that the string literals should be implicitly convertible to either char[] or wchar[].

Well, it seems that you already have standard-size integral types: byte, short, int, long.

Why not make char a 2- or 4-byte unicode char and use the syntax

byte[] str = "My ASCII string";

for ascii?

-- 
Sheldon Simms / sheldon@semanticedge.com

August 16, 2001
Walter wrote:
> 
> What do people think about using the keyword:
> 
>     ascii or char?
>     unicode or wchar?

 I personally think C might have started a bad habit by using
types that were generally vague in nature. All I ask is that
simplicity be given impartial consideration. Since we are all
used to seeing types such as short, long, and int in code,
perhaps it would be better for all of us to spend some time
thinking about the following types rather than forming an
immediate opinion. I can easily see how unfamiliar-looking
types can look highly offensive to the newly or barely
acquainted, as they did to me at one time:

 u8,s8,u16,s16,u32,s32,...

 Some will be adamantly opposed because they don't use these,
or know anyone who does. SGI, for one, has used these types
for Nintendo 64 development, and now Nintendo is using them
for GameBoy Advance development. There are probably others...

 As 128-bit and 256-bit systems are released, adding new types
would be as easy as u128, s128, u256, s256... rather than having
to consider something like "long long long long", or a new name
in general. Those who want to use vague types can always typedef
their own types.
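
(For illustration: in present-day D syntax, the names listed above are one-liners over the fixed-size integer types, and anyone preferring the vaguer names can layer them back on top the same way.)

    alias u8  = ubyte;   alias s8  = byte;
    alias u16 = ushort;  alias s16 = short;
    alias u32 = uint;    alias s32 = int;
    alias u64 = ulong;   alias s64 = long;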

 Thanks for listening, :)

 Jeff