UNICODE operators (page 3) - D Programming Language Discussion Forum

December 04, 2003

Re: UNICODE operators

Posted by Elias Martenson
in reply to Hauke Duden

On Thu, 04 Dec 2003 01:44:25 +0100, Hauke Duden wrote:

>> Win95 is dying, if not dead, for development purposes.
> 
> Win95 is close to dead: about 2% of our customers. But we still have 30% of our customers using Win98 or WinME.
> 
> And I'm sure there are lots of Unix systems that would also have their problems with this - having been invented when ASCII ruled the world and Unicode didn't even exist.

Unix has pretty much settled on using UTF-8 for external representation and before long all text files in Unix will be UTF-8 instead of some local encoding.

Here's a quote from the excellent UTF-8 for Unix FAQ
(http://www.cl.cam.ac.uk/~mgk25/unicode.html):

"With the UTF-8 encoding, Unicode can be used in a convenient and backwards compatible way in environments that, like Unix, were designed entirely around ASCII. UTF-8 is the way in which Unicode is used under Unix, Linux, and similar systems. It is now time to make sure that you are well familiar with it and that your software supports UTF-8 smoothly."
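
The backwards compatibility the FAQ describes can be checked directly. This is only a minimal sketch in present-day D (not the 2003 compiler); the helper name isPureAscii is made up for illustration. It prints the UTF-8 bytes of an ASCII-only string and of "Año", assuming std.string.representation is available to expose the raw code units.

```d
import std.stdio : writefln;
import std.string : representation;

/// Hypothetical helper: true when every UTF-8 code unit of `s`
/// is below 0x80, i.e. the text is byte-for-byte plain ASCII.
bool isPureAscii(string s)
{
    foreach (ubyte b; s.representation)
        if (b >= 0x80)
            return false;
    return true;
}

void main()
{
    string ascii = "int main()";
    string mixed = "A\u00F1o";   // "Año", written with an escape

    // ASCII text keeps one byte per character, all below 0x80.
    writefln("%s -> %(%02X %)", ascii, ascii.representation);
    // U+00F1 becomes the two bytes C3 B1; 'A' and 'o' are unchanged.
    writefln("%s -> %(%02X %)", mixed, mixed.representation);

    assert(isPureAscii(ascii));
    assert(!isPureAscii(mixed));
    assert(mixed.length == 4);   // 3 characters, 4 UTF-8 code units
}
```

Tools that only understand ASCII see the first string exactly as before, and every byte of a non-ASCII character has its high bit set, so it cannot be mistaken for an ASCII character.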

Regards

Elias

December 04, 2003

Re: UNICODE operators

Posted by Roald Ribe
in reply to Hauke Duden

> > UNICODE support files for Win95 -> Me
> >
> > Microsoft Layer for Unicode on Windows 95/98/ME Systems (MSLU)
> >     version 1.0  (http://tinyurl.com/qynq)
> >
> > The question at hand is: is D going to be a language of the future, for all languages, all over the globe, or will it be a conservative backward looking effort?
>
> The MSLU is just a layer above the normal ANSI API. It converts all Unicode strings to ANSI before passing it to functions and converts the results back to Unicode afterwards.
>
> That means that Unicode characters that cannot be represented in the current (ANSI) code page will just be replaced with '?', or whatever the conversion routines use in such a case.

Yes, that is true. But it also means that if the user/admin has set up the correct codepage/fonts for the language they work in, the application using the API will not need to know what codepage that is, it will just work with UNICODE. (openoffice.org uses this system on older Win9X platforms)
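
The lossy step in the quoted description can be sketched roughly as follows. This is not MSLU code and not any Win32 call: toLatin1Lossy is a hypothetical helper, and ISO-8859-1 merely stands in for "the current ANSI code page", which on a real system could be a different one.

```d
import std.stdio : writefln;

/// Hypothetical helper (not an MSLU or Win32 function): narrows text to
/// ISO-8859-1, standing in for an "ANSI" code page. Code points above
/// U+00FF have no Latin-1 byte, so they are replaced with '?'.
ubyte[] toLatin1Lossy(string text)
{
    ubyte[] result;
    foreach (dchar c; text)   // foreach with dchar decodes the UTF-8 input
        result ~= (c <= 0xFF) ? cast(ubyte) c : cast(ubyte) '?';
    return result;
}

void main()
{
    // U+00F1 fits in Latin-1; U+0416 (Cyrillic) and U+2260 (not-equal sign) do not.
    ubyte[] bytes = toLatin1Lossy("A\u00F1o \u0416 \u2260");
    writefln("%(%02X %)", bytes);

    ubyte[] expected = [0x41, 0xF1, 0x6F, 0x20, 0x3F, 0x20, 0x3F];
    assert(bytes == expected);
}
```

Text that stays inside the configured code page survives the round trip; anything outside it is flattened to '?' and cannot be recovered afterwards.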

It is a stopgap measure to allow modern programs to run on older platforms, not the greatest invention since sliced bread ;-)

It would allow a full UNICODE D app to run unmodified on any of those systems, get full use of UNICODE on newer systems, and still just use one API.

Roald

December 04, 2003

Re: UNICODE operators

Posted by Hauke Duden
in reply to Roald Ribe

Roald Ribe wrote:
>>That means that Unicode characters that cannot be represented in the
>>current (ANSI) code page will just be replaced with '?', or whatever the
>>conversion routines use in such a case.
> 
> 
> Yes, that is true. But it also means that if the user/admin has set
> up the correct codepage/fonts for the language they work in, the
> application using the API will not need to know what codepage that
> is, it will just work with UNICODE. (openoffice.org uses this
> system on older Win9X platforms)
> 
> It is a stopgap measure to allow modern programs to run on older
> platforms, not the greatest invention since sliced bread ;-)
> 
> It would allow a full UNICODE D app to run unmodified on any
> of those systems, get full use of UNICODE on newer systems,
> and still just use one API.

That was not the topic of this discussion. My point was that we shouldn't use Unicode characters for something as essential to the language as operators, because then the code will only be readable if your editor/OS uses a code page that happens to contain these symbols.

Creating Unicode applications in D is a completely different thing (and it was/is already discussed in a different thread).

Hauke

December 04, 2003

Re: UNICODE operators

Posted by Sean L. Palmer
in reply to Elias Martenson

Right.  And the OS should provide at least one font that has every single unicode character, for use as fallback for fonts that are missing such characters.

Sean

"Elias Martenson" <no@spam.spam> wrote in message news:pan.2003.12.04.11.26.05.375275@spam.spam...
> On Thu, 04 Dec 2003 01:44:25 +0100, Hauke Duden wrote:
>
> >> Win95 is dying, if not dead, for development purposes.
> >
> > Win95 is close to dead: about 2% of our customers. But we still have 30% of our customers using Win98 or WinME.
> >
> > And I'm sure there are lots of Unix systems that would also have their problems with this - having been invented when ASCII ruled the world and Unicode didn't even exist.
>
> Unix has pretty much settled on using UTF-8 for external representation and before long all text files in Unix will be UTF-8 instead of some local encoding.
>
> Here's a quote from the excellent UTF-8 for Unix FAQ
> (http://www.cl.cam.ac.uk/~mgk25/unicode.html):
>
> "With the UTF-8 encoding, Unicode can be used in a convenient and backwards compatible way in environments that, like Unix, were designed entirely around ASCII. UTF-8 is the way in which Unicode is used under Unix, Linux, and similar systems. It is now time to make sure that you are well familiar with it and that your software supports UTF-8 smoothly."
>
> Regards
>
> Elias

December 04, 2003

Re: UNICODE operators

Posted by Elias Martenson
in reply to Sean L. Palmer

On Thu, 04 Dec 2003 10:56:46 -0800, Sean L. Palmer wrote:

> Right.  And the OS should provide at least one font that has every single unicode character, for use as fallback for fonts that are missing such characters.

Yes, it certainly should. Now, my Linux installation lacks fonts for a large set of the Unihan code points, but other than that I have most of them.

In fact, I think that almost all existing installed operating systems today would be able to handle unicode operators. However, I think the problem with them is more related to the fact that you more than likely will need a special editor for the code (at least if you don't want to try to remember all the \u-codes for the operators).

Unicode is very important, as I have pointed out several times in the other Unicode thread, but that concerns strings in the language, not the source code itself.

Do I think the designers of Java made a mistake when they supported Unicode in its symbols? A few years ago I would have said yes. Now, I say that it really didn't matter. People don't use Unicode symbols anyway. Therefore, I believe that this discussion is a non-issue. Even if Unicode operators were supported, I doubt people would use them, in the name of interoperability.

Regards

Elias

December 05, 2003

Re: UNICODE operators

Posted by Sean L. Palmer
in reply to Elias Martenson

That's fine with me; so long as they are not expressly prohibited, I can use them for my own personal projects.  Support for them would then grow grassroots-style.  I have text editors that support Unicode, and I don't mind cutting and pasting.  Ease of entry is a minor issue to me.

The problem is, if we can't define new operators in D, and it doesn't provide enough overloadable builtin operators, I'm stuck.  I can do nothing but invest in a Unicode-aware preprocessor.  I want the option of moving forward.

What good is being able to compile D source encoded in UTF-8 if you aren't allowed to use any symbols that aren't in ASCII?  (except embedded in string literals)

Sean

"Elias Martenson" <no@spam.spam> wrote in message news:pan.2003.12.04.23.39.50.952964@spam.spam...
> On Thu, 04 Dec 2003 10:56:46 -0800, Sean L. Palmer wrote:
>
> > Right.  And the OS should provide at least one font that has every single unicode character, for use as fallback for fonts that are missing such characters.
>
> Yes, it certainly should. Now, my Linux installation lacks fonts for a large set of the Unihan code points, but other than that I have most of them.
>
> In fact, I think that almost all existing installed operating systems today would be able to handle unicode operators. However, I think the problem with them is more related to the fact that you more than likely will need a special editor for the code (at least if you don't want to try to remember all the \u-codes for the operators).
>
> Unicode is very important, as I have pointed out several times in the other Unicode thread, but that concerns strings in the language, not the source code itself.
>
> Do I think the designers of Java made a mistake when they supported Unicode in its symbols? A few years ago I would have said yes. Now, I say that it really didn't matter. People don't use Unicode symbols anyway. Therefore, I believe that this discussion is a non-issue. Even if Unicode operators were supported, I doubt people would use them, in the name of interoperability.
>
> Regards
>
> Elias

December 05, 2003

Re: [OT] UNICODE operators

Posted by J C Calvarese
in reply to Sean L. Palmer

Attachments:

utf_8.d

Sean L. Palmer wrote:
> That's fine with me; so long as they are not expressly prohibited, I can use them for my own personal projects.  Support for them would then grow grassroots-style.  I have text editors that support Unicode, and I don't mind cutting and pasting.  Ease of entry is a minor issue to me.
> 
> The problem is, if we can't define new operators in D, and it doesn't provide enough overloadable builtin operators, I'm stuck.  I can do nothing but invest in a Unicode-aware preprocessor.  I want the option of moving forward.
> 
> What good is being able to compile D source encoded in UTF-8 if you aren't allowed to use any symbols that aren't in ASCII?  (except embedded in string literals)

Actually, since DMD 0.74, non-ASCII characters (as long as they are "unicode alpha") are allowed in identifier names.  (See the attached example.) Also, comments can contain any non-ASCII character.

I do think Unicode operators are an interesting idea.


Justin

> 
> Sean

December 05, 2003

Re: [OT] UNICODE operators

Posted by Sean L. Palmer
in reply to J C Calvarese

Yeah, just have to set this "free" browser to Encoding... Unicode UTF-8

That's pretty cool.  Pretty cool indeed.

I bet you if I cut and paste some D program made by someone in a far-away land into some web-based translator engine, it would probably not do that bad of a job of translating the identifiers back into English again ;)

Most likely, I'll rarely if ever see any source written in some other language, and if I did, I'd just consider it obfuscation.  It's not a sin punishable by death.

I think it's cool that people can finally more or less program in their own language, once they learn the English keywords.  A preprocessor would allow even those to be replaced.

In fact, whose idea was it to allow infix notation for regular identifiers? We could use a preprocessor to translate our D + Unicode Symbols into D that will actually compile.  ;)  Right now it would only work with prefix (lisp-like) notation, however.

They have some really interesting brackets in Unicode, as well.  Surely there's one just begging to be used for template syntax.

Sean

"J C Calvarese" <jcc7@cox.net> wrote in message news:bqpbqo$8no$1@digitaldaemon.com...
> Sean L. Palmer wrote:
> > That's fine with me; so long as they are not expressly prohibited, I can use them for my own personal projects.  Support for them would then grow grassroots-style.  I have text editors that support Unicode, and I don't mind cutting and pasting.  Ease of entry is a minor issue to me.
> >
> > The problem is, if we can't define new operators in D, and it doesn't provide enough overloadable builtin operators, I'm stuck.  I can do nothing but invest in a Unicode-aware preprocessor.  I want the option of moving forward.
> >
> > What good is being able to compile D source encoded in UTF-8 if you aren't allowed to use any symbols that aren't in ASCII?  (except embedded in string literals)
>
> Actually, since DMD 0.74, non-ASCII characters (as long as they are "unicode alpha") are allowed in identifier names.  (See the attached example.) Also, comments can contain any non-ASCII character.
>
> I do think Unicode operators are an interesting idea.
>
>
> Justin
>
> >
> > Sean
>

--------------------------------------------------------------------------------

>
>
> const char[] Sí = "yes";
> const char[] Año = "year";
>
> /+
>
> These don't work (it might be because they are iconic symbols rather than part of any actual language)
> const char[] ╠╢╦╬ = "box drawing";
> const char[] ♠♥♣♦ = "cards";
>
> +/
>
>
> int main()
> {
>
>   int AñoNúmero = 2003;
>   int Cyrillic???? = 1;
>   int Hebrew?????;
>
>   printf("%d", AñoNúmero);
>
>   return 0;
> }

December 05, 2003

Re: [OT] UNICODE operators

Posted by Elias Martenson
in reply to J C Calvarese

On Fri, 05 Dec 2003 01:34:19 -0600, J C Calvarese wrote:

> Actually, since DMD 0.74, non-ASCII characters (as long as they are "unicode alpha") are allowed in identifier names.  (See the attached example.) Also, comments can contain any non-ASCII character.

Neat. Although your newsreader didn't include a proper encoding header. Not your fault, but rather the broken software. :-)
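
Without a charset header the receiving client has to guess the encoding. A rough sketch of what happens when UTF-8 bytes are guessed to be ISO-8859-1 follows; misreadAsLatin1 is a hypothetical helper written in present-day D, not the behaviour of any particular newsreader.

```d
import std.conv : to;
import std.stdio : writeln;
import std.string : representation;

/// Hypothetical helper: re-reads UTF-8 bytes as if each byte were one
/// ISO-8859-1 character, as a client might when no charset is declared.
string misreadAsLatin1(string utf8Text)
{
    dchar[] wrong;
    foreach (ubyte b; utf8Text.representation)
        wrong ~= cast(dchar) b;   // every byte becomes its own character
    return wrong.to!string;       // re-encode the wrong characters as UTF-8
}

void main()
{
    string original = "A\u00F1o";   // "Año"
    string garbled = misreadAsLatin1(original);
    writeln(original, " -> ", garbled);

    // U+00F1 is the bytes C3 B1, which Latin-1 shows as U+00C3 and U+00B1.
    assert(garbled == "A\u00C3\u00B1o");
}
```

Every non-ASCII character turns into two or more unrelated Latin-1 characters, while the ASCII parts of the text pass through unchanged.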

Regards

Elias

December 05, 2003

Re: [OT] UNICODE operators

Posted by Mark J. Brudnak
in reply to J C Calvarese

"J C Calvarese" <jcc7@cox.net> wrote


<snip>

> Actually, since DMD 0.74, non-ASCII characters (as long as they are "unicode
> alpha") are allowed as identifier names.  (See the attached example.)
> Also, comments can contain any non-ASCII character.
>

I think only "letter-like" unicode characters should be allowed in D identifiers.  Having variables like

int  ï»¿ = 42 ;
float ±×§ =3.14159 ;

will really confuse things.  Punctuation, shapes, box-drawing characters, dingbats, and math symbols should be prohibited from being used in identifiers.
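
One way such a "letter-like" rule could be expressed is sketched below, in present-day D. It assumes std.uni.isAlpha from Phobos as the test for alphabetic characters; that is an assumption for illustration and not necessarily the rule DMD 0.74 applies. The helper name isLetterLike is hypothetical.

```d
import std.stdio : writefln;
import std.uni : isAlpha;

/// Hypothetical rule for illustration: a character is "letter-like" if it
/// is '_' or Unicode-alphabetic according to std.uni.isAlpha.
bool isLetterLike(dchar c)
{
    return c == '_' || isAlpha(c);
}

void main()
{
    // Three letters first, then symbols of the kinds listed above.
    foreach (dchar c; "\u00F1\u00FA\u0496\u00B1\u00D7\u00A7\u2660\u2566"d)
        writefln("U+%04X letter-like: %s", cast(uint) c, isLetterLike(c));

    assert(isLetterLike('\u00F1'));    // Latin small n with tilde
    assert(isLetterLike('\u0496'));    // a Cyrillic letter
    assert(!isLetterLike('\u00B1'));   // plus-minus sign
    assert(!isLetterLike('\u00A7'));   // section sign
    assert(!isLetterLike('\u2660'));   // black spade suit
    assert(!isLetterLike('\u2566'));   // a box-drawing character
}
```

Under this rule accented Latin and Cyrillic letters pass, while math signs, card suits and box-drawing characters are rejected.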

> I do think Unicode operators are an interesting idea.
>
>
> Justin
>
> >
> > Sean
>


--------------------------------------------------------------------------------


>
>
> const char[] Sí = "yes";
> const char[] Año = "year";
>
> /+
>
> These don't work (it might be because they are iconic symbols rather than part of any actual language)
> const char[] ╠╢╦╬ = "box drawing";
> const char[] ♠♥♣♦ = "cards";
>
> +/
>
>
> int main()
> {
>
>   int AñoNúmero = 2003;
>   int CyrillicÒ-Ñ?Ò"Ò± = 1;
>   int Hebrew××"×Y×£×§;
>
>   printf("%d", AñoNúmero);
>
>   return 0;
> }
