March 02, 2009
Re: std.locale
Mon, 02 Mar 2009 07:02:10 -0800, Andrei Alexandrescu wrote:

> Consider some code in phobos that must throw an exception:
> 
> throw Exception("File `%s' not found, system error is %s.",
>      filename, errnomsg);
> 
> The localized version will look like this:
> 
> auto format = "File `%s' not found, system error is %s.";
> auto localFormat = currentLocale ? currentLocale.peek(format) : null;
> if (!localFormat) localFormat = format;
> throw Exception(localFormat, filename, errnomsg);

This example does not address the encoding problem.  Currently, errnomsg
is in Russian, UTF-8 encoded.  So I get "system error is <garbage>" on
the console.  If you adopt locales I'll get garbage not only for the
system error but for the rest of the exception message as well.

To actually solve this problem the default exception handler must be
fixed to convert any UTF-8 into the current OEM code page before
printing.  It would also help if default stdin and stdout performed such
a conversion.

> What happens is that the default format string _is_ the key for looking 
> up the localized strings.

Nice.  This means that error messages become a part of API and are
subject to backward and forward compatibility issues.  Isn't it too
much?
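
For reference, the lookup-with-fallback pattern quoted above can be sketched in compilable form. This is only an illustration: `Locale`, `peek`, `currentLocale` and `localize` are stand-ins for the proposed API, not an existing Phobos interface. It also shows the compatibility concern: because the English text is the key, rewording it silently loses the translation.

```d
import std.format : format;

/// Hypothetical locale table: maps a default (English) format string
/// to its translation. Not an existing Phobos type.
class Locale
{
    private string[string] table;

    void add(string key, string translated) { table[key] = translated; }

    /// Returns the translation for `key`, or null if there is none.
    string peek(string key)
    {
        auto p = key in table;
        return p ? *p : null;
    }
}

Locale currentLocale; // null means "no locale loaded"

/// Looks `fmt` up in the current locale and falls back to `fmt` itself.
string localize(string fmt)
{
    auto local = currentLocale ? currentLocale.peek(fmt) : null;
    return local is null ? fmt : local;
}

unittest
{
    enum fmt = "File `%s' not found, system error is %s.";
    currentLocale = new Locale;
    currentLocale.add(fmt, "Fichier `%s' introuvable, erreur système : %s.");

    // The default string is the key, so the translation is found.
    assert(format(localize(fmt), "a.txt", "ENOENT")
        == "Fichier `a.txt' introuvable, erreur système : ENOENT.");

    // A reworded default no longer matches and falls back to itself.
    assert(localize("File '%s' not found.") == "File '%s' not found.");
}
```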
March 02, 2009
Re: std.locale
Andrei Alexandrescu wrote:
> Georg Wrede wrote:
> You see, we're not communicating. I sent this link:
> 
> http://www.unicode.org/cldr/
> 
> Did you look at it? It is essentially a database of locale information 
> in a highly structured format. All I want is to define a structure 
> expressive enough to gobble the part of that database that is of 
> interest. The Phobos documentation will say, we just adopt their schema. 
> If users don't want to load any, then fine - everything is just like today.

I read the page. It says "This data is used by a wide spectrum of 
companies for their software internationalization and localization".

The first link in the text part is to the CLDR Overview ppt. I read it. 
On page 5 it says:

"Companies / Organizations
Adobe, Apple (Mac OS X), abas Software, Ascential Software, Avaya, BEA, 
BluePhoenix Solutions, BMC Software (Remedy), Business Objects, caris, 
CERN, ClearCommerce, Cognos, Debian Linux, D programming language, 
Gentoo Linux, GNU Classpath, HP, Hyperion, IBM, Inktomi, Innodata 
Isogen, Isogon, Informatica, Intel, Interlogics, IONA, IXOS, Macromedia, 
Mathworks, OpenOffice, Language Analysis Systems, Lawson Software, Leica 
Geosystems GIS & Mapping LLC, Mandrake Linux, Novell (SuSE), Optio 
Software, PayPal, Progress Software, Python, QNX, Quark, Rogue Wave, 
SAP, Siebel, SIL, SPSS, Software AG, Sun Microsystems (Solaris, Java), 
Sybase, Teradata (NCR), Trados, Trend Micro, Virage, webMethods, WMS 
Gaming, Xerox, Yahoo!, and many more…"

One sees here major companies, operating systems, and three languages: 
D, Python and Java. The page is from 2005.

So D "has had this since at least 2005". What can I say? I guess we have 
to implement it then...

>> What I'm saying is, it's debatable whether this stuff belongs to "the 
>> programming language itself" at all. Rather, it should be an external 
>> library, provided by someone else than us. It belongs to SourceForge 
>> or Dsource, not here.
> 
> http://www.unicode.org/cldr/
> 
> We just need to load it if there is such a need.

In another post you sounded as if there is a connection between this 
stuff and printing arrays. I'm not sure I see the connection.

> Let me try again: I don't want to define locale support. I want to 
> provide the basics for people to roll it out themselves.

I downloaded the files in http://unicode.org/Public/cldr/1.6.1/ which 
were core.zip, posix.zip, tests.zip and tools.zip. They unzipped to 
140MB, containing some 200 java files and some 800 xml files, among others.

The readme.txt in tools.zip says:

"The code is very preliminary, so don't expect stability from the APIs 
(or documentation!), since we still have to work out how we want to do 
the architecture."

The main web page says "CLDR 1.7 Tentative Schedule: 2008-09", but it 
still isn't on the download page. The latest release is version 1.6.1, 
dated 2008-07-23.

==============

My take:

 * This is still a moving target
 * Using this is a major hassle for the programmer
 * With D2 itself a moving target, nobody is going to invest enough time 
in this to actually use it for something worthwhile in the next 6 to 12 
months anyway
 * This is more application level stuff than language level stuff
 * Doing this now will steal time from you, Walter, and many of us, 
both directly, and indirectly by leeching bandwidth in the newsgroup -- 
time that should be spent on more urgent or more important things, or 
even documentation
 * If it's so easy to do, then why not do it a week before the release 
of final D2

I really can't help it, but this is how I see it.
March 02, 2009
Re: std.locale
Georg Wrede wrote:
> Andrei Alexandrescu wrote:
>> Georg Wrede wrote:
>> You see, we're not communicating. I sent this link:
>>
>> http://www.unicode.org/cldr/
>>
>> Did you look at it? It is essentially a database of locale information 
>> in a highly structured format. All I want is to define a structure 
>> expressive enough to gobble the part of that database that is of 
>> interest. The Phobos documentation will say, we just adopt their 
>> schema. If users don't want to load any, then fine - everything is 
>> just like today.
> 
> I read the page. It says "This data is used by a wide spectrum of 
> companies for their software internationalization and localization".
> 
> The first link in the text part is to the CLDR Overview ppt. I read it. 
> On page 5 it says:
> 
> "Companies / Organizations
> Adobe, Apple (Mac OS X), abas Software, Ascential Software, Avaya, BEA, 
> BluePhoenix Solutions, BMC Software (Remedy), Business Objects, caris, 
> CERN, ClearCommerce, Cognos, Debian Linux, D programming language, 
> Gentoo Linux, GNU Classpath, HP, Hyperion, IBM, Inktomi, Innodata 
> Isogen, Isogon, Informatica, Intel, Interlogics, IONA, IXOS, Macromedia, 
> Mathworks, OpenOffice, Language Analysis Systems, Lawson Software, Leica 
> Geosystems GIS & Mapping LLC, Mandrake Linux, Novell (SuSE), Optio 
> Software, PayPal, Progress Software, Python, QNX, Quark, Rogue Wave, 
> SAP, Siebel, SIL, SPSS, Software AG, Sun Microsystems (Solaris, Java), 
> Sybase, Teradata (NCR), Trados, Trend Micro, Virage, webMethods, WMS 
> Gaming, Xerox, Yahoo!, and many more…"
> 
> One sees here major companies, operating systems, and three languages: 
> D, Python and Java. The page is from 2005.
> 
> So D "has had this since at least 2005". What can I say? I guess we have 
> to implement it then...

Hehe, didn't see that.

>>> What I'm saying is, it's debatable whether this stuff belongs to "the 
>>> programming language itself" at all. Rather, it should be an external 
>>> library, provided by someone else than us. It belongs to SourceForge 
>>> or Dsource, not here.
>>
>> http://www.unicode.org/cldr/
>>
>> We just need to load it if there is such a need.
> 
> In another post you sounded as if there is a connection between this 
> stuff and printing arrays. I'm not sure I see the connection.

Very simple. If we have a locale table, I am thinking of dedicating a 
branch "std" in it to stuff that's in std. For example, I can use 
currentLocale.get("std", "array-separator") or something.
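
A minimal sketch of what such a branch-keyed table might look like, assuming a two-level string map with a caller-supplied fallback. `LocaleTable`, `set`, `get` and `formatArray` are hypothetical names chosen for illustration; only the "std" branch and the "array-separator" key come from the post.

```d
import std.algorithm.iteration : map;
import std.array : join;
import std.conv : to;

/// Hypothetical two-level locale table: branch -> key -> value.
struct LocaleTable
{
    private string[string][string] branches;

    void set(string branch, string key, string value)
    {
        if (auto b = branch in branches)
            (*b)[key] = value;
        else
            branches[branch] = [key: value];
    }

    /// Returns the stored value, or `fallback` if branch or key is missing.
    string get(string branch, string key, string fallback = null)
    {
        if (auto b = branch in branches)
            if (auto v = key in *b)
                return *v;
        return fallback;
    }
}

/// Formats an array using the separator stored under the "std" branch.
string formatArray(int[] a, ref LocaleTable loc)
{
    auto sep = loc.get("std", "array-separator", ", ");
    return "[" ~ a.map!(x => x.to!string).join(sep) ~ "]";
}

unittest
{
    LocaleTable loc;
    assert(formatArray([1, 2, 3], loc) == "[1, 2, 3]"); // nothing loaded
    loc.set("std", "array-separator", "; ");
    assert(formatArray([1, 2, 3], loc) == "[1; 2; 3]");
}
```

With nothing loaded the behaviour is the same as today, which matches the "if users don't want to load any, then fine" goal stated earlier in the thread.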

>> Let me try again: I don't want to define locale support. I want to 
>> provide the basics for people to roll it out themselves.
> 
> I downloaded the files in http://unicode.org/Public/cldr/1.6.1/ which 
> were core.zip, posix.zip, tests.zip and tools.zip. They unzipped to 
> 140MB, containing some 200 java files and some 800 xml files, among others.
> 
> The readme.txt in tools.zip says:
> 
> "The code is very preliminary, so don't expect stability from the APIs 
> (or documentation!), since we still have to work out how we want to do 
> the architecture."
> 
> The main web page says "CLDR 1.7 Tentative Schedule: 2008-09", but it 
> still isn't on the download page. The latest release is version 1.6.1, 
> dated 2008-07-23.
> 
> ==============
> 
> My take:
> 
>  * This is still a moving target
>  * Using this is a major hassle for the programmer
>  * With D2 itself a moving target, nobody is going to invest enough time 
> in this to actually use it for something worthwhile in the next 6 to 12 
> months anyway
>  * This is more application level stuff than language level stuff
>  * Doing this now will steal time from you, Walter, and many of us, both 
> directly, and indirectly by leeching bandwidth in the newsgroup -- time 
> that should be spent on more urgent or more important things, or even 
> documentation
>  * If it's so easy to do, then why not do it a week before the release 
> of final D2
> 
> I really can't help it, but this is how I see it.

I understand.


Andrei
March 02, 2009
Re: std.locale
Sergey Gromov wrote:
> Mon, 02 Mar 2009 07:02:10 -0800, Andrei Alexandrescu wrote:
> 
>> Consider some code in phobos that must throw an exception:
>>
>> throw Exception("File `%s' not found, system error is %s.",
>>      filename, errnomsg);
>>
>> The localized version will look like this:
>>
>> auto format = "File `%s' not found, system error is %s.";
>> auto localFormat = currentLocale ? currentLocale.peek(format) : null;
>> if (!localFormat) localFormat = format;
>> throw Exception(localFormat, filename, errnomsg);
> 
> This example does not address the encoding problem.  Currently, errnomsg
> is in Russian, UTF-8 encoded.  So I get "system error is <garbage>" on
> the console.  If you adopt locales I'll get garbage not only for the
> system error but for the rest of the exception message as well.
> 
> To actually solve this problem the default exception handler must be
> fixed to convert any UTF-8 into the current OEM code page before
> printing.  It would also help if default stdin and stdout performed such
> a conversion.

I see.

>> What happens is that the default format string _is_ the key for looking 
>> up the localized strings.
> 
> Nice.  This means that error messages become a part of API and are
> subject to backward and forward compatibility issues.  Isn't it too
> much?

I think it isn't too much, considering the sorry state of affairs of 
today's exceptions. You can't even answer the question: "Given this 
FileException object, what file name was concerned?" And each module 
defines its own exception class that is equally useless. It's 
ridiculous. 95% of them must be removed. And we must have systematic 
formatting of all strings initiated by Phobos.
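
To make the "what file name was concerned?" question concrete, here is a rough sketch of an exception that keeps the file name as data next to the formatted message. `MissingFileException` and its fields are hypothetical and are not an existing Phobos class.

```d
import std.format : format;

/// Hypothetical exception that carries the offending file name as a
/// field, so callers do not have to parse it out of the message.
class MissingFileException : Exception
{
    string fileName;
    string systemError;

    this(string fileName, string systemError,
         string file = __FILE__, size_t line = __LINE__)
    {
        super(format("File `%s' not found, system error is %s.",
                     fileName, systemError), file, line);
        this.fileName = fileName;
        this.systemError = systemError;
    }
}

unittest
{
    try
    {
        throw new MissingFileException("data.txt", "ENOENT");
    }
    catch (MissingFileException e)
    {
        assert(e.fileName == "data.txt"); // answerable without parsing msg
        assert(e.msg == "File `data.txt' not found, system error is ENOENT.");
    }
}
```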


Andrei
March 02, 2009
Re: std.locale
Christopher Wright wrote:
>> -- All very nice, but no cigar. That's about as smart as letting 
>> people define *unlimited* length variable names!)
> 
> I recently dealt with a programming language that specified a limit of 
> 63 characters for identifier names. This wouldn't have been a 
> significant problem, except that I was generating code automatically, 
> and some of my identifiers were over 90 characters. Identifier length 
> limits are evil, unless they're ridiculously large (C#, I think, limits 
> identifiers to 4096 characters).

As soon as you put in a limit on identifier name length, sooner or later 
you'll get a bug report on it.

For example, C++ can be compiled to C code. C++ templates encode their 
entire state into the template instance identifier, and these can easily 
reach 10,000 characters or more. So if your C compiler has a length 
limit on identifiers, then C++ templates become severely limited.

Another thing to consider is it's actually *more* work to put a limit 
on, where you have to document it, explain it, detect it, diagnose it, 
recover from it, than if you just make it unlimited.

There are really only 3 numbers in computer programming: 0, 1, and 
unlimited. I always chuckle when I see an ad for like, an editor, that 
says "up to 5 files open at once!".
March 02, 2009
Re: std.locale
Georg Wrede wrote:
> So D "has had this since at least 2005". What can I say? I guess we have 
> to implement it then...

Wow, D usually gets slammed for not having a feature that even a cursory 
glance at the documentation shows it has. This is the first vaporware 
feature!
March 02, 2009
Re: std.locale
Michel Fortin wrote:
> Translating strings is a little harder because 1) strings are 
> application-defined, 2) strings are often not available in the user's 
> preferred language, adding the need for a fallback mechanism, and 3) 
> different applications will want to store those strings in different 
> ways. Perhaps we could define a base class for getting translated 
> strings, then allow the program to use whatever subclass it wants.

It's a silly thing, but I love the little google widget you can add to a 
web page to automatically translate the pages. All the D site pages have 
it in the left column.
March 02, 2009
Re: std.locale
Sergey Gromov wrote:
> Mon, 02 Mar 2009 09:34:32 +0200, Georg Wrede wrote:
>
>>>> Of course, eventually we will want to "do something" about this. But
>>>> that should be left to the day when real issues are all sorted out in
>>>> D. This is a non-urgent, low-priority thing.
>> Had there been any need for locales, believe me, the "foreigners" in
>> this NG would have asked for it.
>
> I'm Russian.  For me, encoding problems are a PITA of such epic
> proportions that little format inconsistencies simply fade away.  Yes
> it's sometimes hard to decipher what 02/03/08 means since our custom is
> to put day first and separate with dots.  But compare this to Adobe Flex
> SDK which prints half compiler error messages in Russian (thank you
> Adobe!) using system default code page, 1251, while default /console/
> code page is actually so-called IBM 866.  Whenever I use MXML compiler
> from console I get rubbish for error messages.  And there is no way to
> disable translation--I've found none.  Phobos is no better.  Any
> exception resulting from an invalid OS call dumps UTF-8 garbage instead
> of an error message.  std.file.read("non-existent") for instance.
>
> I think games are not an issue.  I've worked for a company producing
> cell phone games for a long time.  I've localized my game for Chinese
> market, too.  The thing is, game interfaces are always custom, always
> ad-hoc.  They *never* work in untested locales.  Well, with some
> experience you can make them work most of the time in languages you are
> familiar with, from localization perspective.  Anyway, all you need to
> know is an ID of a supported locale so that you can replace text and
> locale-specific images accordingly.  Then you have correctors and native
> testing to make sure the localization works.

Encoding isn't that hard compared to other issues.
For instance, have you ever tried to make a website go both ways?
March 02, 2009
Re: std.locale
Sergey Gromov wrote:
> To actually solve this problem the default exception handler must be
> fixed to convert any UTF-8 into the current OEM code page before
> printing.  It would also help if default stdin and stdout performed such
> a conversion.

No, stdin/stdout *must* perform this conversion.  It is a serious bug if
they don't.

The conversion cannot be performed at any other level.  D uses Unicode
internally.  The console uses a specific encoding.  Therefore all data
passing between D and the console must be encoded/decoded.
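
As a rough illustration of that encoding step, the sketch below converts UTF-8 text to code page 866 bytes by hand. It is an assumption-laden example: `toCp866` is a hypothetical helper, it covers only ASCII and the Russian alphabet, and a real implementation on Windows would ask the OS for the active console code page (for example via `GetConsoleOutputCP`) instead of hard-coding one.

```d
/// Hypothetical helper: encodes UTF-8 text as CP866 bytes.
/// Only ASCII and Russian letters are mapped; anything else becomes '?'.
ubyte[] toCp866(string utf8)
{
    ubyte[] result;
    foreach (dchar c; utf8) // foreach with dchar decodes the UTF-8 input
    {
        if (c < 0x80)
            result ~= cast(ubyte) c;                     // ASCII is unchanged
        else if (c >= 0x0410 && c <= 0x042F)
            result ~= cast(ubyte)(c - 0x0410 + 0x80);    // А..Я
        else if (c >= 0x0430 && c <= 0x043F)
            result ~= cast(ubyte)(c - 0x0430 + 0xA0);    // а..п
        else if (c >= 0x0440 && c <= 0x044F)
            result ~= cast(ubyte)(c - 0x0440 + 0xE0);    // р..я
        else if (c == 0x0401)
            result ~= cast(ubyte) 0xF0;                  // Ё
        else if (c == 0x0451)
            result ~= cast(ubyte) 0xF1;                  // ё
        else
            result ~= cast(ubyte) '?';                   // not representable
    }
    return result;
}

unittest
{
    ubyte[] expected = [0x8F, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2];
    assert(toCp866("Привет") == expected);
}
```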


-- 
Rainer Deyke - rainerd@eldwood.com
March 02, 2009
Re: std.locale
On Mon, Mar 2, 2009 at 1:52 PM, Georg Wrede <georg.wrede@iki.fi> wrote:
>
> My take:
>
>  * This is still a moving target
>  * Using this is a major hassle for the programmer
>  * With D2 itself a moving target, nobody is going to invest enough time in
> this to actually use it for something worthwhile in the next 6 to 12 months
> anyway
>  * This is more application level stuff than language level stuff
>  * Doing this now will steal time from you, Walter, and many of us, both
> directly, and indirectly by leeching bandwidth in the newsgroup -- time that
> should be spent on more urgent or more important things, or even
> documentation
>  * If it's so easy to do, then why not do it a week before the release of
> final D2

I agree entirely.  Localization and internationalization seem like
things that should be at a much higher level than a standard library.
Everyone's going to want to do it differently.  Providing a thin,
cross-platform wrapper over what the OS exposes is fine, but creating
a proper i18n/l10n framework is a huge project in and of itself (I
think the 140MB Java package makes that abundantly clear).

I'd much rather see a rewritten std.stream and proper Unicode support
in std.string (support for types other than string, functions for
indexing and slicing on character boundaries) before this.
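
For illustration, slicing on character boundaries could look roughly like the sketch below. `codeUnitOffset` and `sliceCodePoints` are hypothetical helpers, not std.string functions, and "character" here means a Unicode code point, so combining sequences are not treated as one unit.

```d
/// Returns the code-unit offset at which the n-th code point of `s`
/// starts, or s.length if `s` has n or fewer code points.
size_t codeUnitOffset(string s, size_t n)
{
    size_t seen = 0;
    foreach (i, dchar c; s) // i is the code-unit index of each code point
    {
        if (seen == n)
            return i;
        ++seen;
    }
    return s.length;
}

/// Slices `s` by code-point positions [from, to); requires from <= to.
string sliceCodePoints(string s, size_t from, size_t to)
{
    return s[codeUnitOffset(s, from) .. codeUnitOffset(s, to)];
}

unittest
{
    string s = "привет";
    assert(s.length == 12);                    // 6 letters, 2 bytes each
    assert(sliceCodePoints(s, 1, 3) == "ри");  // never splits a sequence
}
```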