December 04, 2003
On Thu, 04 Dec 2003 01:44:25 +0100, Hauke Duden wrote:

>> Win95 is dying, if not dead, for development purposes.
> 
> Win95 is close to dead: about 2% of our customers. But we still have 30% of our customers using Win98 or WinME.
> 
> And I'm sure there are lots of Unix systems that would also have their problems with this - having been invented when ASCII ruled the world and Unicode didn't even exist.

Unix has pretty much settled on using UTF-8 for external representation and before long all text files in Unix will be UTF-8 instead of some local encoding.
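One reason UTF-8 fits systems built around ASCII is that pure ASCII text is byte-for-byte unchanged when encoded as UTF-8; only non-ASCII characters expand into multi-byte sequences. A minimal sketch in Python (the helper name `utf8_bytes` is made up for illustration):

```python
# Sketch: ASCII text encodes to the same bytes in UTF-8 as in ASCII,
# while a non-ASCII character becomes a multi-byte sequence.

def utf8_bytes(text: str) -> bytes:
    """Return the UTF-8 encoding of text."""
    return text.encode("utf-8")

ascii_text = "int main()"
mixed_text = "Año"

# Same bytes as plain ASCII, so old ASCII-only tools still read it.
print(utf8_bytes(ascii_text) == ascii_text.encode("ascii"))

# The ñ takes two bytes, each with the high bit set, so neither can be
# mistaken for an ASCII character.
print(list(utf8_bytes(mixed_text)))
```

Because every byte of a multi-byte sequence is 0x80 or above, software that only looks for ASCII delimiters such as `/` or newline keeps working on UTF-8 data.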

Here's a quote from the excellent UTF-8 for Unix FAQ
(http://www.cl.cam.ac.uk/~mgk25/unicode.html):

"With the UTF-8 encoding, Unicode can be used in a convenient and backwards compatible way in environments that, like Unix, were designed entirely around ASCII. UTF-8 is the way in which Unicode is used under Unix, Linux, and similar systems. It is now time to make sure that you are well familiar with it and that your software supports UTF-8 smoothly."

Regards

Elias

December 04, 2003
> > UNICODE support files for Win95 -> Me
> >
> > Microsoft Layer for Unicode on Windows 95/98/ME Systems (MSLU)
> >     version 1.0  (http://tinyurl.com/qynq)
> >
> > The question at hand is: is D going to be a language of the future, for all languages, all over the globe, or will it be a conservative backward looking effort?
>
> The MSLU is just a layer above the normal ANSI API. It converts all Unicode strings to ANSI before passing it to functions and converts the results back to Unicode afterwards.
>
> That means that Unicode characters that cannot be represented in the current (ANSI) code page will just be replaced with '?', or whatever the conversion routines use in such a case.

Yes, that is true. But it also means that if the user/admin has set up the correct codepage/fonts for the language they work in, the application using the API does not need to know which codepage that is; it just works with UNICODE. (OpenOffice.org uses this system on older Win9X platforms.)
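To make the trade-off concrete, here is a rough sketch of the lossy round trip through a code page. It is written in Python and only imitates the behaviour described above; it is not MSLU's actual implementation, and `cp1252` is just an assumed example code page.

```python
# Sketch: push Unicode text through a single-byte code page and back.
# Characters the code page cannot represent come back as '?'.

def through_code_page(text: str, code_page: str = "cp1252") -> str:
    """Encode to the given code page, replacing unrepresentable
    characters, then decode again to see what survives."""
    encoded = text.encode(code_page, errors="replace")
    return encoded.decode(code_page)

# German text fits the Western European code page and survives intact.
print(through_code_page("Grüße"))

# Cyrillic is not in cp1252, so every letter is replaced.
print(through_code_page("Мир"))
```

With a matching code page (say `cp1251` for the Cyrillic string) the second call would round-trip cleanly, which is the "correct codepage for the language" case.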

It is a stopgap measure to allow modern programs to run on older platforms, not the greatest invention since sliced bread ;-)

It would allow a full UNICODE D app to run unmodified on any of those systems, get full use of UNICODE on newer systems, and still just use one API.

Roald


December 04, 2003
Roald Ribe wrote:
>>That means that Unicode characters that cannot be represented in the
>>current (ANSI) code page will just be replaced with '?', or whatever the
>>conversion routines use in such a case.
> 
> 
> Yes, that is true. But it also means that if the user/admin has set
> up the correct codepage/fonts for the language they work in, the
> application using the API will not need to know what codepage that
> is, it will just work with UNICODE. (openoffice.org uses this
> system on older Win9X platforms)
> 
> It is a stopgap measure to allow modern programs to run on older
> platforms, not the greatest invention since sliced bread ;-)
> 
> It would allow a full UNICODE D app to run unmodified on any
> of those systems, get full use of UNICODE on newer systems,
> and still just use one API.

That was not the topic of this discussion. My point was that we shouldn't use Unicode characters for something as essential to the language as operators, because then the code will only be readable if your editor/OS uses a code page that happens to contain these symbols.

Creating Unicode applications in D is a completely different thing (and it was/is already discussed in a different thread).

Hauke

December 04, 2003
Right.  And the OS should provide at least one font that has every single unicode character, for use as fallback for fonts that are missing such characters.

Sean

"Elias Martenson" <no@spam.spam> wrote in message news:pan.2003.12.04.11.26.05.375275@spam.spam...
> On Thu, 04 Dec 2003 01:44:25 +0100, Hauke Duden wrote:
>
> >> Win95 is dying, if not dead, for development purposes.
> >
> > Win95 is close to dead: about 2% of our customers. But we still have 30% of our customers using Win98 or WinME.
> >
> > And I'm sure there are lots of Unix systems that would also have their problems with this - having been invented when ASCII ruled the world and Unicode didn't even exist.
>
> Unix has pretty much settled on using UTF-8 for external representation and before long all text files in Unix will be UTF-8 instead of some local encoding.
>
> Here's a quote from the excellent UTF-8 for Unix FAQ
> (http://www.cl.cam.ac.uk/~mgk25/unicode.html):
>
> "With the UTF-8 encoding, Unicode can be used in a convenient and backwards compatible way in environments that, like Unix, were designed entirely around ASCII. UTF-8 is the way in which Unicode is used under Unix, Linux, and similar systems. It is now time to make sure that you are well familiar with it and that your software supports UTF-8 smoothly."
>
> Regards
>
> Elias


December 04, 2003
On Thu, 04 Dec 2003 10:56:46 -0800, Sean L. Palmer wrote:

> Right.  And the OS should provide at least one font that has every single unicode character, for use as fallback for fonts that are missing such characters.

Yes, it certainly should. Now, my Linux installation lacks fonts for a large set of the Unihan code points, but other than that I have most of them.

In fact, I think that almost all existing installed operating systems today would be able to handle unicode operators. However, I think the problem with them is more related to the fact that you more than likely will need a special editor for the code (at least if you don't want to try to remember all the \u-codes for the operators).
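To give an idea of what "remembering the \u-codes" means in practice, this small Python sketch prints the escape and the official Unicode name for a few candidate operator symbols. The helper `u_escape` and the choice of symbols are mine, purely for illustration.

```python
import unicodedata

def u_escape(ch: str) -> str:
    """Return the four-digit \\uXXXX escape for a single character.
    Only meaningful for code points up to U+FFFF."""
    return f"\\u{ord(ch):04X}"

# Three symbols someone might want as operators.
for ch in "∪∩×":
    print(ch, u_escape(ch), unicodedata.name(ch))
```

Nobody is going to memorise a table of these, which is why an editor with proper input support matters more than the OS being able to display the glyphs.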

Unicode is very important, as I have pointed out several times in the other unicode thread, but it deals with strings in the language. Not the source code itself.

Do I think the designers of Java made a mistake when they supported Unicode in its symbols? A few years ago I would have said yes. Now, I say that it really didn't matter. People don't use Unicode symbols anyway. Therefore, I believe that this discussion is a non-issue. Even if Unicode operators were supported, I doubt people would use them, in the name of interoperability.

Regards

Elias

December 05, 2003
That's fine with me, so long as they are not expressly prohibited, I can use them for my own personal projects.  Support for them would then grow grassroots-style.  I have text editors that support Unicode, and I don't mind cutting and pasting.  Ease of entry is a minor issue to me.

The problem is, if we can't define new operators in D, and it doesn't provide enough overloadable builtin operators, I'm stuck.  I can do nothing but invest in a Unicode-aware preprocessor.  I want the option of moving forward.

What good is being able to compile D source encoded in UTF-8 if you aren't allowed to use any symbols that aren't in ASCII?  (except embedded in string literals)

Sean

"Elias Martenson" <no@spam.spam> wrote in message news:pan.2003.12.04.23.39.50.952964@spam.spam...
> On Thu, 04 Dec 2003 10:56:46 -0800, Sean L. Palmer wrote:
>
> > Right.  And the OS should provide at least one font that has every single unicode character, for use as fallback for fonts that are missing such characters.
>
> Yes, it certainly should. Now, my Linux installation lacks fonts for a large set of the Unihan code points, but other than that I have most of them.
>
> In fact, I think that almost all existing installed operating systems today would be able to handle unicode operators. However, I think the problem with them is more related to the fact that you more than likely will need a special editor for the code (at least if you don't want to try to remember all the \u-codes for the operators).
>
> Unicode is very important, as I have pointed out several times in the other unicode thread, but it deals with strings in the language. Not the source code itself.
>
> Do I think the designers of Java made a mistake when they supported Unicode in its symbols? A few years ago I would have said yes. Now, I say that it really didn't matter. People don't use Unicode symbols anyway. Therefore, I believe that this discussion is a non-issue. Even if Unicode operators were supported, I doubt people would use them, in the name of interoperability.
>
> Regards
>
> Elias


December 05, 2003
Sean L. Palmer wrote:
> That's fine with me, so long as they are not expressly prohibited, I can use them for my own personal projects.  Support for them would then grow grassroots-style.  I have text editors that support Unicode, and I don't mind cutting and pasting.  Ease of entry is a minor issue to me.
> 
> The problem is, if we can't define new operators in D, and it doesn't provide enough overloadable builtin operators, I'm stuck.  I can do nothing but invest in a Unicode-aware preprocessor.  I want the option of moving forward.
> 
> What good is being able to compile D source encoded in UTF-8 if you aren't allowed to use any symbols that aren't in ASCII?  (except embedded in string literals)

Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode alpha") are allowed in identifier names.  (See the attached example.) Also, comments can contain any non-ASCII character.
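As a loose analogy (this is Python's identifier rule, not DMD's, so treat it only as an illustration of the "letters yes, symbols no" idea), Python's built-in `str.isidentifier()` makes the same kind of distinction:

```python
# Sketch: Python's own identifier rule, used here only as an analogy
# for "unicode alpha" identifiers. It is NOT the rule DMD implements.

def is_python_identifier(name: str) -> bool:
    """True if name would be accepted as an identifier by Python."""
    return name.isidentifier()

candidates = ["Sí", "Año", "AñoNúmero", "♠♥♣♦", "a+b"]

for name in candidates:
    print(name, is_python_identifier(name))
```

Accented Latin letters pass, while card suits and operator characters are rejected.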

I do think Unicode operators are an interesting idea.


Justin

> 
> Sean


December 05, 2003
Yeah, just have to set this "free" browser to Encoding... Unicode UTF-8

That's pretty cool.  Pretty cool indeed.

I bet you if I cut and paste some D program made by someone in a far-away land into some web-based translator engine, it would probably not do that bad of a job of translating the identifiers back into English again ;)

Most likely, I'll rarely if ever see any source written in some other language, and if I did, I'd just consider it obfuscation.  It's not a sin punishable by death.

I think it's cool that finally people can more or less program in their own language, once they learn the English keywords.  A preprocessor would allow even those to be replaced.

In fact, whose idea was it to allow infix notation for regular identifiers? We could use a preprocessor to translate our D + Unicode Symbols into D that will actually compile.  ;)  Right now it would only work with prefix (lisp-like) notation, however.
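A very naive version of such a preprocessor, sketched in Python rather than D, could just substitute each symbol with an ordinary function name. The mapping and the names `setUnion` and `setIntersection` are hypothetical, and this only handles the prefix (call-style) form.

```python
# Sketch of a symbol-substituting preprocessor.
# SYMBOL_NAMES is a made-up mapping; the target function names are
# hypothetical and would have to exist in the D code being compiled.
SYMBOL_NAMES = {
    "∪": "setUnion",
    "∩": "setIntersection",
}

def preprocess(source: str) -> str:
    """Replace every mapped Unicode symbol with its ASCII name."""
    return source.translate(str.maketrans(SYMBOL_NAMES))

print(preprocess("auto r = ∪(a, ∩(b, c));"))
```

This blindly rewrites symbols inside string literals and comments as well; a real tool would need to tokenize the source first, and infix use would need an actual parser.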

They have some really interesting brackets in Unicode, as well.  Surely there's one just begging to be used for template syntax.

Sean

"J C Calvarese" <jcc7@cox.net> wrote in message news:bqpbqo$8no$1@digitaldaemon.com...
> Sean L. Palmer wrote:
> > That's fine with me, so long as they are not expressly prohibited, I can use them for my own personal projects.  Support for them would then grow grassroots-style.  I have text editors that support Unicode, and I don't mind cutting and pasting.  Ease of entry is a minor issue to me.
> >
> > The problem is, if we can't define new operators in D, and it doesn't provide enough overloadable builtin operators, I'm stuck.  I can do nothing but invest in a Unicode-aware preprocessor.  I want the option of moving forward.
> >
> > What good is being able to compile D source encoded in UTF-8 if you aren't allowed to use any symbols that aren't in ASCII?  (except embedded in string literals)
>
> Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode
> alpha") are allowed in identifier names.  (See the attached example.)
> Also, comments can contain any non-ASCII character.
>
> I do think Unicode operators are an interesting idea.
>
>
> Justin
>
> >
> > Sean
>


--------------------------------------------------------------------------------


>
>
> const char[] Sí = "yes";
> const char[] Año = "year";
>
> /+
>
> These don't work (it might be because they are iconic symbols rather than part of any actual language)
> const char[] ╠╢╦╬ = "box drawing";
> const char[] ♠♥♣♦ = "cards";
>
> +/
>
>
> int main()
> {
>
>   int AñoNúmero = 2003;
>   int Cyrillic???? = 1;
>   int Hebrew?????;
>
>   printf("%d", AñoNúmero);
>
>   return 0;
> }


December 05, 2003
On Fri, 05 Dec 2003 01:34:19 -0600, J C Calvarese wrote:

> Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode alpha") are allowed in identifier names.  (See the attached example.) Also, comments can contain any non-ASCII character.

Neat. Although your newsreader didn't include a proper encoding header. Not your fault, but rather the broken software. :-)
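What a missing encoding header leads to can be reproduced in a few lines. This Python sketch assumes the reader falls back to Windows-1252 (`cp1252`); the actual fallback depends on the newsreader.

```python
# Sketch: UTF-8 bytes displayed by a reader that assumes cp1252.

def misread_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then decode as if it were cp1252."""
    return text.encode("utf-8").decode("cp1252")

def repair(garbled: str) -> str:
    """Undo the misreading, as long as no bytes were lost on the way."""
    return garbled.encode("cp1252").decode("utf-8")

garbled = misread_as_cp1252("Año")
print(garbled)          # the two UTF-8 bytes of ñ show up as two characters
print(repair(garbled))  # back to the original
```

The repair only works while the garbled text is intact. A few byte values are undefined in cp1252, so other input can fail to decode at all, and once a character has been replaced by `?` the information is gone.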

Regards

Elias

December 05, 2003
"J C Calvarese" <jcc7@cox.net> wrote


<snip>

> Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode
> alpha") are allowed in identifier names.  (See the attached example.)
> Also, comments can contain any non-ASCII character.
>

I think only "letter-like" unicode characters should be allowed in D identifiers.  Having variables like

int   = 42 ;
float ±×§ =3.14159 ;

will really confuse things.  Punctuation, shapes, box drawing, dingbats and math symbols should be prohibited from being used in identifiers.
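One way to express a "letter-like only" rule is through Unicode general categories. The following is a sketch in Python under my own assumptions (letters are categories starting with `L`, decimal digits are `Nd`, underscore is allowed); it is not how DMD decides, and the function names are made up.

```python
import unicodedata

def is_letter_like(ch: str) -> bool:
    """True for characters in a Unicode letter category (Lu, Ll, Lo, ...)."""
    return unicodedata.category(ch).startswith("L")

def is_allowed_identifier(name: str) -> bool:
    """Letters or '_' first, then letters, '_' or decimal digits."""
    if not name:
        return False
    first_ok = name[0] == "_" or is_letter_like(name[0])
    rest_ok = all(
        c == "_" or is_letter_like(c) or unicodedata.category(c) == "Nd"
        for c in name[1:]
    )
    return first_ok and rest_ok

for name in ["AñoNúmero", "±×§", "♠♥♣♦", "x1", "1x"]:
    print(name, is_allowed_identifier(name))
```

Math symbols, punctuation and dingbats all fall into symbol or punctuation categories, so they are rejected without having to list them one by one.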

> I do think Unicode operators are an interesting idea.
>
>
> Justin
>
> >
> > Sean
>


--------------------------------------------------------------------------------


> 
>
> const char[] Sí = "yes";
> const char[] Año = "year";
>
> /+
>
> These don't work (it might be because they are iconic symbols rather than part of any actual language)
> const char[] ╠╢╦╬ = "box drawing";
> const char[] ♠♥♣♦ = "cards";
>
> +/
>
>
> int main()
> {
>
>   int AñoNúmero = 2003;
>   int CyrillicÒ-Ñ?Ò"Ò± = 1;
>   int Hebrewא×"×Yףק;
>
>   printf("%d", AñoNúmero);
>
>   return 0;
> }