March 02, 2009
Jarrett Billingsley wrote:
> functions for
> indexing and slicing on character boundaries) before this.

These already exist in std.uni.
March 02, 2009
Rainer Deyke wrote:
> Sergey Gromov wrote:
>> To actually solve this problem the default exception handler must be
>> fixed to convert any UTF-8 into the current OEM code page before
>> printing.  It would also help if default stdin and stdout performed such
>> a conversion.
> 
> No, stdin/stdout *must* perform this conversion.  It is a serious bug if
> they don't.
> 
> The conversion cannot be performed at any other level.  D uses unicode
> internally.  The console uses a specific encoding.  Therefore all data
> passing between D and the console must be encoded/decoded.
> 
> 

What API to use to detect the encoding used by the console?

Andrei
March 02, 2009

Andrei Alexandrescu wrote:
> Rainer Deyke wrote:
>> Sergey Gromov wrote:
>>> To actually solve this problem the default exception handler must be fixed to convert any UTF-8 into the current OEM code page before printing.  It would also help if default stdin and stdout performed such a conversion.
>>
>> No, stdin/stdout *must* perform this conversion.  It is a serious bug if they don't.
>>
>> The conversion cannot be performed at any other level.  D uses unicode internally.  The console uses a specific encoding.  Therefore all data passing between D and the console must be encoded/decoded.
>>
>>
> 
> What API to use to detect the encoding used by the console?
> 
> Andrei

According to <http://markmail.org/message/neu2pllqz3sst4tq>, it's uint GetConsoleOutputCP() <http://msdn.microsoft.com/en-us/library/ms683169%28VS.85%29.aspx>.

Interestingly, there's a SetConsoleOutputCP
<http://msdn.microsoft.com/en-us/library/ms686036(VS.85).aspx> function.
 Check this out:

> module utf;
>
> import tango.io.Stdout;
>
> extern(Windows) int SetConsoleOutputCP(uint wCodePageID);
>
> void main()
> {
>     SetConsoleOutputCP(65001);
>     Stdout("Не∟└Ω").newline;
> }

FYI, "65001" is how Windows spells "UTF-8".  Also note that this won't work in anything earlier than Windows 2000, but then, even that's not supported any more.

Note that you MUST change the console's font to Lucida Console (right-click title, properties, font tab) for this to actually display, but that's not something D can control.  :P
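
For what it's worth, here is a minimal sketch of the detect-and-restore side, assuming the same kind of hand-written extern(Windows) declarations as in the snippet above. The helper describeCodePage is hypothetical and only names a few well-known code page IDs; on non-Windows builds the sketch just reports that there is nothing to query.

```d
import std.stdio : writefln, writeln;

// Hand-declared Win32 console functions (kernel32), Windows builds only.
version (Windows)
{
    extern (Windows) uint GetConsoleOutputCP();
    extern (Windows) int SetConsoleOutputCP(uint wCodePageID);
}

/// Hypothetical helper: names a few well-known Windows code page IDs.
string describeCodePage(uint codePage)
{
    switch (codePage)
    {
        case 437:   return "OEM United States";
        case 866:   return "OEM Russian (Cyrillic)";
        case 1252:  return "Windows Western European";
        case 65001: return "UTF-8";
        default:    return "unknown to this sketch";
    }
}

void main()
{
    version (Windows)
    {
        // Remember the console's current output code page so it can be restored.
        immutable uint previous = GetConsoleOutputCP();
        scope (exit) SetConsoleOutputCP(previous);

        writefln("console code page: %s (%s)", previous, describeCodePage(previous));

        // Switch the console to UTF-8 for the rest of main.
        SetConsoleOutputCP(65001);
        writeln("Не∟└Ω");
    }
    else
    {
        writeln("no Windows console code page to query on this platform");
    }
}
```

Restoring the previous code page on exit is a design choice of this sketch, so the program does not leave the user's console in a different state than it found it.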

  -- Daniel
March 02, 2009
Daniel Keep wrote:
> 
> Andrei Alexandrescu wrote:
>> Rainer Deyke wrote:
>>> Sergey Gromov wrote:
>>>> To actually solve this problem the default exception handler must be
>>>> fixed to convert any UTF-8 into the current OEM code page before
>>>> printing.  It would also help if default stdin and stdout performed such
>>>> a conversion.
>>> No, stdin/stdout *must* perform this conversion.  It is a serious bug if
>>> they don't.
>>>
>>> The conversion cannot be performed at any other level.  D uses unicode
>>> internally.  The console uses a specific encoding.  Therefore all data
>>> passing between D and the console must be encoded/decoded.
>>>
>>>
>> What API to use to detect the encoding used by the console?
>>
>> Andrei
> 
> According to <http://markmail.org/message/neu2pllqz3sst4tq>, it's uint
> GetConsoleOutputCP()
> <http://msdn.microsoft.com/en-us/library/ms683169%28VS.85%29.aspx>.
> 
> Interestingly, there's a SetConsoleOutputCP
> <http://msdn.microsoft.com/en-us/library/ms686036(VS.85).aspx> function.
>  Check this out:
> 
>> module utf;
>>
>> import tango.io.Stdout;
>>
>> extern(Windows) int SetConsoleOutputCP(uint wCodePageID);
>>
>> void main()
>> {
>>     SetConsoleOutputCP(65001);
>>     Stdout("Не∟└Ω").newline;
>> }
> 
> FYI, "65001" is how Windows spells "UTF-8".  Also note that this won't
> work in anything earlier than Windows 2000, but then, even that's not
> supported any more.
> 
> Note that you MUST change the console's font to Lucida Console
> (right-click title, properties, font tab) for this to actually display,
> but that's not something D can control.  :P
> 
>   -- Daniel

Ahhhh... Windows you mean? Ehm. I need to get to a Windows machine. If you could paste this into a bug report that would be great.

Thanks,

Andrei
March 02, 2009
Jarrett Billingsley wrote:
> On Mon, Mar 2, 2009 at 1:52 PM, Georg Wrede <georg.wrede@iki.fi> wrote:
>> My take:
>>
>>  * This is still a moving target
>>  * Using this is a major hassle for the programmer
>>  * With D2 itself a moving target, nobody is going to invest enough time in
>> this to actually use it for something worthwhile in the next 6 to 12 months
>> anyway
>>  * This is more application level stuff than language level stuff
>>  * Doing this now will steal time from you, Walter, and many of us, both
>> directly, and indirectly by leaching bandwidth in the newsgroup -- time that
>> should be spent on more urgent or more important things, or even
>> documentation
>>  * If it's so easy to do, then why not do it a week before the release of
>> final D2
> 
> I agree entirely.  Localization and internationalization seem like
> things that should be at a much higher level than a standard library.
> Everyone's going to want to do it differently.  Providing a thin,
> cross-platform wrapper over what the OS exposes is fine, but creating
> a proper i18n/l10n framework is a huge project in and of itself (I
> think the 140MB Java package makes that abundantly clear).

I must be missing something huge because I keep on misunderestimating (sic :o)) the scope of this project.

Let me try to state my point again: I don't want to provide locale-specific strings, collation orders, date, time, and number formatters, or class hierarchies that do all of the above. Zip. Nada. Zilch.

I want to put together a string-based hierarchical string table that allows depositing ALL OF THE ABOVE in it, without initially putting ANYTHING in it. What's nice is that others have already defined the keys and the possible values used by that table.

Possibly you are missing one or more of the following points:

1) The existence of a hierarchical nomenclature for localization;

2) The existence of a large database containing localized values for said nomenclature;

3) The power of Algebraic, which allows depositing data, functions, and subtables alike in a uniform format.
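
To make the string table idea a bit more concrete, here is a rough sketch of a table that starts out empty and can hold plain values, functions, and subtables. It is only an illustration under stated assumptions: it uses a plain class with one slot per kind of payload instead of Algebraic, and the names Node, DateFormatter, child, and lookup, as well as the dotted keys, are all made up for the example.

```d
import std.algorithm.iteration : splitter;
import std.format : format;
import std.stdio : writeln;

/// Hypothetical formatter signature for this sketch.
alias DateFormatter = string delegate(int year, int month, int day);

/// Hypothetical table node: may carry plain data, a function, and subtables.
class Node
{
    string text;             // plain localized value, if any
    DateFormatter formatter; // user-planted delegate, if any
    Node[string] children;   // subtables keyed by name

    /// Returns the child named `key`, creating it on first use.
    Node child(string key)
    {
        if (auto existing = key in children)
            return *existing;
        auto created = new Node;
        children[key] = created;
        return created;
    }
}

/// Walks a dotted path such as "dates.formatter"; returns null if a step is missing.
Node lookup(Node root, string path)
{
    Node current = root;
    foreach (part; path.splitter('.'))
    {
        auto next = part in current.children;
        if (next is null)
            return null;
        current = *next;
    }
    return current;
}

void main()
{
    auto root = new Node; // starts out empty: mechanism only, no policy

    // A user deposits a plain value and a formatter delegate.
    root.child("dates").child("separator").text = ".";
    root.child("dates").child("formatter").formatter =
        delegate string(int y, int m, int d) { return format("%02d.%02d.%04d", d, m, y); };

    // Library code looks up the delegate and calls it if present.
    auto node = lookup(root, "dates.formatter");
    if (node !is null && node.formatter !is null)
        writeln(node.formatter(2009, 3, 2));

    // Nothing was deposited under "numbers", so the lookup comes back empty.
    writeln(lookup(root, "numbers.decimal") is null);
}
```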

> I'd much rather see a rewritten std.stream and proper Unicode support
> in std.string (support for types other than string, functions for
> indexing and slicing on character boundaries) before this.

That, incidentally, is more complicated :o).


Andrei
March 02, 2009
On Mon, Mar 2, 2009 at 3:48 PM, Walter Bright <newshound1@digitalmars.com> wrote:
> Jarrett Billingsley wrote:
>>
>> functions for
>> indexing and slicing on character boundaries) before this.
>
> These already exist in std.uni.

It's std.utf, but good to know.
March 02, 2009
Mon, 02 Mar 2009 12:53:48 -0800, Andrei Alexandrescu wrote:

> Rainer Deyke wrote:
>> Sergey Gromov wrote:
>>> To actually solve this problem the default exception handler must be fixed to convert any UTF-8 into the current OEM code page before printing.  It would also help if default stdin and stdout performed such a conversion.
>> 
>> No, stdin/stdout *must* perform this conversion.  It is a serious bug if they don't.
>> 
>> The conversion cannot be performed at any other level.  D uses unicode internally.  The console uses a specific encoding.  Therefore all data passing between D and the console must be encoded/decoded.
>> 
> 
> What API to use to detect the encoding used by the console?

There is std.windows.charset.toMBSz(str, 1) which does the right thing.
March 03, 2009
Andrei Alexandrescu wrote:
> If you want to provide a specific date formatter, you plant a delegate in the locale table. The code in Phobos doing formatting will detect that and call your delegate passing in the date. You do whatever you want on your side (format on the spot, use your own class hierarchy etc.)
> 
> Again: mechanism only. Not policy.
> 
> 
> Andrei

Weak typing for the win!
March 03, 2009
Christopher Wright wrote:
> Andrei Alexandrescu wrote:
>> If you want to provide a specific date formatter, you plant a delegate in the locale table. The code in Phobos doing formatting will detect that and call your delegate passing in the date. You do whatever you want on your side (format on the spot, use your own class hierarchy etc.)
>>
>> Again: mechanism only. Not policy.
>>
>>
>> Andrei
> 
> Weak typing for the win!

Yes. Sometimes it's exactly what the doctor prescribed, as I believe it is in this case.

Andrei
March 03, 2009
Andrei Alexandrescu wrote:
> The localized version will look like this:
> 
> auto format = "File `%s' not found, system error is %s.";
> auto localFormat = currentLocale ? currentLocale.peek(format) : null;
> if (!localFormat) localFormat = format;
> throw Exception(localFormat, filename, errnomsg);

This short example suggests:
Locale.peek(T)(char[] key, T ifNotFound = T.init)

auto localFormat = currentLocale ? currentLocale.peek(format, format) : format;
throw new Exception(localFormat);
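
A minimal sketch of what such a peek could look like, assuming the table is a flat Variant[string] keyed by the untranslated format string. The Locale struct, its table field, and the German text are made up for illustration; the key is taken as string rather than char[] to keep the example short.

```d
import std.format : format;
import std.stdio : writeln;
import std.variant : Variant;

/// Hypothetical locale table for this sketch.
struct Locale
{
    Variant[string] table;

    /// Returns the entry under `key` if it exists and holds a T, else `ifNotFound`.
    T peek(T)(string key, T ifNotFound = T.init)
    {
        if (auto entry = key in table)
            if (auto value = entry.peek!T)
                return *value;
        return ifNotFound;
    }
}

void main()
{
    auto fmt = "File `%s' not found, system error is %s.";

    Locale german;
    german.table[fmt] = Variant("Datei `%s' nicht gefunden, Systemfehler: %s.");

    // With a locale installed, the localized format is picked up.
    Locale* currentLocale = &german;
    auto localFormat = currentLocale ? currentLocale.peek(fmt, fmt) : fmt;
    writeln(format(localFormat, "a.txt", "ENOENT"));

    // Without a locale, the original format is used unchanged.
    currentLocale = null;
    localFormat = currentLocale ? currentLocale.peek(fmt, fmt) : fmt;
    writeln(format(localFormat, "a.txt", "ENOENT"));
}
```

Passing the key itself as the fallback keeps the call site down to one expression, which is the point of the suggested default parameter.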