October 14, 2004
Would it be yet another "blasphemy" to
add a string *alias* to the language?
(No, not a string typedef, just an alias.)

I think that, together with some char type
aliases similar to stdint.d, could do *wonders*
for readability and understandability.


alias char  utf8_t;
alias wchar utf16_t;
alias dchar utf32_t;

alias utf8_t[]   string; // ASCII-optimized
alias utf16_t[] ustring; // Unicode-optimized
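
To show why I say alias and not typedef, here is a minimal sketch. It
assumes a current D2 compiler, which is not what the snippet above was
written for: the alias syntax is the newer `alias name = type;` form,
and `string` itself is left out because current D already predefines it.
The helper countUnits is hypothetical, only there for illustration.

```d
alias utf8_t  = char;
alias utf16_t = wchar;
alias utf32_t = dchar;

alias ustring = utf16_t[]; // same name as in the proposal above

// An alias is only another name: no new type is created.
static assert(is(utf8_t == char));
static assert(is(utf32_t == dchar));
static assert(is(ustring == wchar[]));

// Hypothetical helper, written against the plain built-in type.
size_t countUnits(const(char)[] s)
{
    return s.length; // number of UTF-8 code units, not characters
}

void main()
{
    utf8_t[] s = "abc".dup; // a mutable char[] under its alias name
    char[] t = s;           // no cast needed in either direction
    static assert(is(typeof(t) == utf8_t[]));
    assert(countUnits(s) == 3);
}
```

So existing code that takes char[] keeps working unchanged; the
aliases only document the intent.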


They can be used as in the following example D program,
which prints all args as UTF-8 code units and as UTF-32 code points:

void main(string[] args)
{
  foreach (int a, string arg; args) {
    printf("%d: %.*s\n", a, arg);
    printf("    ");
    // each UTF-8 code unit (byte) of the argument, in hex
    foreach (utf8_t b; arg) {
      printf("%02x ", b);
    }
    printf("\n");
    // the same argument, decoded into UTF-32 code points
    foreach (utf32_t c; arg) {
      printf("\t\\U%08x\n", c);
    }
  }
}

For simple ASCII, the output looks something like:

0: ./unichar
    2e 2f 75 6e 69 63 68 61 72
        \U0000002e
        \U0000002f
        \U00000075
        \U0000006e
        \U00000069
        \U00000063
        \U00000068
        \U00000061
        \U00000072

With non-ASCII Unicode arguments, it looks ... different.
(since each such character takes several UTF-8 code units,
 the first loop prints more items than the second)
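
To make the non-ASCII case concrete, here is a small sketch, again
assuming a current D2 compiler with Phobos rather than the compiler of
the day. The helpers utf8Hex and codePointDump are hypothetical names
of my own; they do the same two loops as the program above, but return
the text instead of printing it.

```d
import std.format : format;
import std.stdio : write, writeln;

// Hypothetical helper: hex dump of the UTF-8 code units in s.
string utf8Hex(const(char)[] s)
{
    string result;
    foreach (char b; s) // iterates code units, no decoding
        result ~= format("%02x ", cast(uint) b);
    return result;
}

// Hypothetical helper: one line per decoded code point in s.
string codePointDump(const(char)[] s)
{
    string result;
    foreach (dchar c; s) // foreach decodes UTF-8 into code points
        result ~= format("\t\\U%08x\n", cast(uint) c);
    return result;
}

void main()
{
    // "h" followed by U+00E9 (e with acute accent)
    writeln(utf8Hex("h\u00e9"));     // prints: 68 c3 a9
    write(codePointDump("h\u00e9")); // prints \U00000068 and \U000000e9
}
```

Two characters, but three UTF-8 code units: the accented letter is
stored as the two bytes c3 a9 and comes back as a single code point.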

--anders


PS:
I think this string alias and UTF-8 chars are way
better than Java's String class and UTF-16 chars!
(pretty much the same way that the compiled D code
vastly outperforms the Java code once JVM startup is counted)