July 13, 2004
"Blandger" <zeroman@prominvest.com.ua> wrote in message news:cd085g$29tq$1@digitaldaemon.com...
> For example, recently I got stuck with:
> Object {
> ...
> char[] toString()
> ...
> }
> but I need wchar[] at least for supporting non ASCII languages. DMD
> complains about another return type.

char[] isn't ASCII, it's UTF-8. Any UTF-8 string can be converted to UTF-16
(which is wchar[]) by calling std.utf.toUTF16(). So, char[] toString() does
fully support non-ASCII languages.
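
A minimal sketch of that conversion, assuming a present-day D compiler, where `string` and `wstring` stand in for the char[] and wchar[] discussed here. The sample text is arbitrary and the source file is assumed to be saved as UTF-8.

```d
import std.stdio : writeln;
import std.utf : toUTF8, toUTF16;

void main()
{
    // `string` holds UTF-8 code units, so non-ASCII text is fine.
    string s = "Привет";

    // Transcode UTF-8 to UTF-16; the result is a wstring.
    wstring w = toUTF16(s);

    // Six Cyrillic letters: 12 UTF-8 code units, 6 UTF-16 code units.
    assert(s.length == 12);
    assert(w.length == 6);

    // Converting back to UTF-8 gives the original text.
    assert(toUTF8(w) == s);

    writeln(s);
}
```

Note that `.length` counts code units, not characters, which is why the two lengths differ.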


July 13, 2004
"Blandger" <zeroman@prominvest.com.ua> wrote in message news:cd0lhh$30mc$1@digitaldaemon.com...
>
> "Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cd0jdn$2sru$1@digitaldaemon.com...
> > In article <cd085g$29tq$1@digitaldaemon.com>, Blandger says...
> >
> > >but I need wchar[] at least for supporting non ASCII languages.
> >
> > Not true. char[] stores UTF-8, not ASCII. The whole of Unicode is available to
> > char[] arrays.
> >
> > #    char[] s = "&#1041;&#1075;&#1047;&#1049; &#10077;&#13181;&#9283;&#10078; &#5797;&#5801;&#5804; &#1600;&#1601;&#1602;";
> >
> > is perfectly legal. (And you can use etc.unicode's getSimpleUppercaseMapping() to uppercase it too).
>
> Thanks for addition.
>
> You are right, it's legal, but it looks (and I think works) ugly. It seems to
> me there is no 'normal way' to work with upper/lowercase, sort, search, collate, replace, code page stuff with non-ASCII letters within Phobos in
> this case. Or have I missed something?

It looks ugly because it's written with unicode code numbers rather than the actual characters. If you write your source code using an editor that supports UTF-8, UTF-16, or UTF-32 you can write it using the actual characters. The D compiler can handle UTF-8, UTF-16, or UTF-32 source text.


July 13, 2004
In article <cd0lhh$30mc$1@digitaldaemon.com>, Blandger says...
>
>> #    char[] s = "&#1041;&#1075;&#1047;&#1049; &#10077;&#13181;&#9283;&#10078; &#5797;&#5801;&#5804; &#1600;&#1601;&#1602;";
>>
>You are right it's legal but it looks (and I think works) ugly.

Errm. That was an artifact of this forum's web interface. When I typed it in, it looked to me like a nice bunch of Russian and Chinese characters with a few Runes and Dingbats thrown in. It would look like that in my text editor too. And it would work. Alas, the HTML capabilities of the D forum web site were not up to the job, so you didn't see what I intended for you to see.

Apparently you have to be a virgin to see unicode.  :)

Something like that anyway. Walter says Unicode is the future. I think he's right, but unfortunately it isn't the present.


>It seems to
>me there is no 'normal way' to work with upper/lowercase, sort, search,
>collate, replace, code page stuff with non-ASCII letters within Phobos in
>this case. Or have I missed something?

Right now, no. But you can use the getSimpleUppercaseMapping() etc. functions from Deimos to do casing. Lexicographical sort isn't a problem, obviously. Search - depends what you mean. If you're waiting for the Unicode regular expression engine, you'll have to wait a while - that will be one of the last things we get. If you want an exact match though, that's pretty easy right now - a string is just an array, after all. Collation will be available (but isn't yet) via the Unicode Collation Algorithm - for which we'll have to download the CLDR (Common Locale Data Repository) from Unicode to get all the locale-specific weightings, but that will come.
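
As a rough sketch of the two easy cases (exact-match search and plain lexicographical sort), assuming a present-day D compiler and its standard library; the sample strings are made up for illustration:

```d
import std.algorithm.sorting : sort;
import std.string : indexOf;

void main()
{
    string text = "Unicode is the future, Юникод";

    // Exact match: indexOf returns the UTF-8 code unit offset, or -1.
    assert(text.indexOf("future") == 15);
    assert(text.indexOf("Юникод") == 23);
    assert(text.indexOf("past") == -1);

    // Plain lexicographical sort: compares code units, with no
    // locale-specific collation rules applied.
    string[] words = ["яблоко", "zebra", "apple", "Арбуз"];
    sort(words);
    assert(words == ["apple", "zebra", "Арбуз", "яблоко"]);
}
```

This ordering is by code point value only, so it is not what a Russian dictionary would give for mixed-case words; that is the job of the collation algorithm mentioned above.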

"Code pages", note, have nothing to do with Unicode. They come into play in our sphere during transcoding (encoding/decoding), which is something that I imagine will ultimately be built into streams.

Much of Phobos was written in the early days of D, when there was no access to Unicode property data. It takes time to organize a proper Unicode library. Unicode has layers of features, with each algorithm relying on the services of the next layer down. Phobos had access to none of this, when it was written. Even now, Deimos's Unicode support is still only at the character level, but we'll get to the string level eventually.

But all this will come. And I strongly suspect that D's Unicode support will eventually make it the language of choice for Unicode projects.

Arcane Jill



July 13, 2004
"Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cd18gg$135d$1@digitaldaemon.com...
> In article <cd0lhh$30mc$1@digitaldaemon.com>, Blandger says...
> >
> it would work. Alas, the HTML capabilities of the D forum web site were not up to the
> job, so you didn't see what I intended for you to see.

I see. :)

> Apparently you have to be a virgin to see unicode.  :)
>
> Something like that anyway. Walter says Unicode is the future. I think he's
> right, but unfortunately it isn't the present.

Agree with you both.

> "Code pages", note, have nothing to do with Unicode. They come into play in our
> sphere during transcoding (encoding/decoding), which is something that I imagine
> will ultimately be built into streams.

Exactly. I meant that I don't want to think about code pages when I use something like a 'String class' in D code, because it should be 'internally Unicode' as it is in Java. But I do have to think about code pages for I/O, because there are lots of 'old files' with 'old non-Unicode' content.

> Even now, Deimos's Unicode support is still only at the character level, but
> we'll get to the string level eventually.
> But all this will come. And I strongly suspect that D's Unicode support will
> eventually make it the language of choice for Unicode projects.

Hope so. :)


July 13, 2004
"Walter" <newshound@digitalmars.com> wrote in message news:cd17ev$115j$1@digitaldaemon.com...

> char[] isn't ASCII, it's UTF-8. Any UTF-8 string can be converted to UTF-16
> (which is wchar[]) by calling std.utf.toUTF16(). So, char[] toString() does
> fully support non-ASCII languages.

Sorry for misleading you all a little.

DWT has an 'internal convention' of using 'alias wchar[] String;' as a replacement for the Java String class. I don't know why. It seems it was Andy's decision. I hope it's right but...

Recently I got stuck with this:

alias wchar[] String;
  public class ToStringTest {
    this() {
    }
    String toString() {
      return "ff";
    }
  }
DMD complains about another return type:
//function toString overrides but is not covariant with toString

How can we get around this 'probable error'? The error has since gone away for some unknown reason (it happened before), but I'm not sure it won't come back later... (sorry for my probably wrong English grammar here).



July 13, 2004
"Walter" <newshound@digitalmars.com> wrote in message news:cd17f0$115j$2@digitaldaemon.com...

> It looks ugly because it's written with unicode code numbers rather than the
> actual characters. If you write your source code using an editor that supports UTF-8, UTF-16, or UTF-32 you can write it using the actual characters. The D compiler can handle UTF-8, UTF-16, or UTF-32 source text.

I always catch myself being afraid to write code in a UTF editor.
Actually I don't know why!
Maybe it's an old, outdated habit, maybe it's something like an 'internal
fear' of UTF-x stuff. I really don't know why.

So I decided to ask: how many people in the NG use UTF-x editors for writing source code?



July 13, 2004
Blandger wrote:
> So I decided to ask: how many people in the NG use UTF-x editors for writing source code?

Hasn't this been the standard for several years now - at least in the Perl and Java world?

Thomas


July 13, 2004
"Blandger" <zeroman@aport.ru> wrote in message news:cd1fmq$1fqa$2@digitaldaemon.com...
>
> "Walter" <newshound@digitalmars.com> wrote in message news:cd17ev$115j$1@digitaldaemon.com...
>
> > char[] isn't ASCII, it's UTF-8. Any UTF-8 string can be converted to UTF-16
> > (which is wchar[]) by calling std.utf.toUTF16(). So, char[] toString() does
> > fully support non-ASCII languages.
>
> Sorry for misleading you all a little.
>
> DWT has an 'internal convention' of using 'alias wchar[] String;' as a replacement for the Java String class. I don't know why. It seems it was Andy's decision. I
> hope it's right but...
>
> Recently I got stuck with this:
>
> alias wchar[] String;
>   public class ToStringTest {
>     this() {
>     }
>     String toString() {
>       return "ff";
>     }
>   }
> DMD complains about another return type:
> //function toString overrides but is not covariant with toString
>
> How can we get around this 'probable error'? The error has since gone away for some unknown reason (it happened before), but I'm not sure it won't come back later... (sorry for my probably wrong English grammar here).

The "not covariant" error happens when the overriding function's return type is neither the same as the return type of the overridden function nor derived from it. Object.toString() returns char[], and wchar[] is neither char[] nor derived from it, so the override is rejected.
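
One possible way around it, sketched under the assumption of a present-day D compiler (where Object.toString returns `string`, the modern counterpart of char[]): keep the override in UTF-8 and convert on demand. The `String` alias mirrors the DWT convention quoted above, and `toWString` is a hypothetical helper name, not an existing Phobos or DWT function.

```d
import std.utf : toUTF16;

// Hypothetical alias, following the DWT convention from the quoted post.
alias String = wstring;

class ToStringTest
{
    // Same return type as Object.toString, so the override is accepted.
    override string toString()
    {
        return "ff";
    }

    // Hypothetical helper for callers that want UTF-16 text.
    String toWString()
    {
        return toUTF16(toString());
    }
}

void main()
{
    auto t = new ToStringTest;
    assert(t.toString() == "ff");
    assert(t.toWString() == "ff"w);
}
```

The trade-off is an extra conversion whenever UTF-16 is needed, in exchange for staying compatible with everything that calls toString() through Object.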


July 14, 2004
In article <cd17f0$115j$2@digitaldaemon.com>, Walter says...
>
[...]
>If you write your source code using an editor that
>supports UTF-8, UTF-16, or UTF-32 you can write it using the actual
>characters. The D compiler can handle UTF-8, UTF-16, or UTF-32 source text.

This leads to some questions:

How can it detect the right encoding?
Does endianness matter?
And what about my current default codepage (windows-1252)?
If I pass an HTML file as source, does it honor the encoding specified in the header?

Ciao


July 14, 2004
In article <cd1fmv$1fqa$3@digitaldaemon.com>, Blandger says...
>
>So I decided to ask: how many people in the NG use UTF-x editors for writing source code?

I wasn't aware that there were still any _non_ UTF-XX editors in use! Even Microsoft Notepad - the bottom end of text editors if you're a programmer (no syntax highlighting, etc.) - understands UTF-8. These days, what text editors don't?

Me, I use TextPad. TextPad is not fully Unicode-aware (yet), but it CAN save files in UTF-8 format, which is all I need.

Arcane Jill