extended characterset output
April 08, 2022

What's the proper way to output all characters in the extended character set?

void main()
{
    foreach(char c; 0 .. 256)
    {
       write(isControl(c) ? '.' : c);
    }
}

Expected output:

................................ !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[.]^_`abcdefghijklmnopqrstuvwxyz{|}~..................................¡¢£¤¥¦§¨©ª«¬.®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

Actual output:

................................ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~.................................������������������������������������������������������������������������������������������������

Works as expected in Python.

Thanks

April 08, 2022
On 4/7/22 23:13, anonymous wrote:
> What's the proper way to output all characters in the extended character
> set?

It is not easy to answer because there are a number of concepts here that may make it trivial or complicated.

The configuration of the output device matters. Is it set to Windows-1252 or are you using Unicode strings in Python?
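
To illustrate why the device configuration matters, here is a minimal sketch (the helper name latin1Bytes is made up for this example) that prints the bytes of the same character in UTF-8 and in Latin-1. It assumes std.encoding.transcode is used for the conversion; a terminal configured for one encoding will not display the other's bytes correctly.

```d
import std.encoding : Latin1String, transcode;
import std.stdio;

/// Hypothetical helper: the bytes of `s` after re-encoding it as Latin-1.
immutable(ubyte)[] latin1Bytes(string s)
{
    Latin1String l1;
    transcode(s, l1); // UTF-8 -> ISO-8859-1
    return cast(immutable(ubyte)[]) l1;
}

void main()
{
    string s = "\u00E9"; // é
    writefln("UTF-8:   %(%02X %)", cast(immutable(ubyte)[]) s); // C3 A9
    writefln("Latin-1: %(%02X %)", latin1Bytes(s));             // E9
}
```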

>
> ```d
> void main()
> {
>      foreach(char c; 0 .. 256)

'char' is wrong there because 'char' has a very specific meaning in D: a UTF-8 code unit. In many cases that is not a full Unicode character, especially in the "extended" set, where every character above 127 takes two code units in UTF-8.

I think your problem will be solved simply by replacing 'char' with 'dchar' there:

  foreach (dchar c; ...
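
As a rough illustration of the code unit versus code point distinction (the helper name utf8Units is made up for this sketch), the following shows how many UTF-8 code units a single dchar needs. Writing a lone 'char' with a value above 127 emits only one such byte, which is not valid UTF-8 by itself; that would be consistent with the replacement characters in the original output.

```d
import std.stdio;
import std.utf : encode;

/// Hypothetical helper: the UTF-8 code units that encode one code point.
ubyte[] utf8Units(dchar c)
{
    char[4] buf;
    immutable len = encode(buf, c); // number of code units written
    return cast(ubyte[]) buf[0 .. len].dup;
}

void main()
{
    writefln("%(%02X %)", utf8Units('A'));      // 41
    writefln("%(%02X %)", utf8Units('\u00E9')); // C3 A9
}
```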

However, isControl() below won't work because isControl() only knows about the ASCII table. It would miss the unprintable characters above 127.

>      {
>         write(isControl(c) ? '.' : c);
>      }
> }
> ```

This works:

import std.stdio;

bool isPrintableLatin1(dchar value) {
  if (value < 32) {
    return false;
  }

  if (value > 126 && value < 161) {
    return false;
  }

  return true;
}

void main() {
  foreach (dchar c; 0 .. 256) {
    write(isPrintableLatin1(c) ? c : '.');
  }

  writeln();

  // import std.encoding;

  // foreach(ubyte c; 0 .. 256) {
  //   if (isPrintableLatin1(c)) {
  //     Latin1Char[1] from = [ cast(Latin1Char)c ];
  //     string to;
  //     transcode(from, to);
  //     write(to);

  //   } else {
  //     write('.');
  //   }
  // }

  // writeln();
}

I left some code commented-out, which I experimented with. (That works as well.)

Ali

April 08, 2022
On Friday, 8 April 2022 at 08:36:33 UTC, Ali Çehreli wrote:
> On 4/7/22 23:13, anonymous wrote:
> > What's the proper way to output all characters in the extended character
> > set?
>
> It is not easy to answer because there are a number of concepts here that may make it trivial or complicated.
>
> The configuration of the output device matters. Is it set to Windows-1252 or are you using Unicode strings in Python?

I'm running Ubuntu and my default language is en_US.UTF-8.

> >
> > ```d
> > void main()
> > {
> >      foreach(char c; 0 .. 256)
>
> 'char' is wrong there because 'char' has a very special meaning in D: A UTF-8 code unit. Not a full Unicode character in many cases, especially in the "extended" set.
>
> I think your problem will be solved simply by replacing 'char' with 'dchar' there:
>
>   foreach (dchar c; ...

I tried that. It didn't work.

> However, isControl() below won't work because isControl() only knows about the ASCII table. It would miss the unprintable characters above 127.
>
> >      {
> >         write(isControl(c) ? '.' : c);
> >      }
> > }
> > ```

Oh okay, that may have been the reason.

> This works:
>
> import std.stdio;
>
> bool isPrintableLatin1(dchar value) {
>   if (value < 32) {
>     return false;
>   }
>
>   if (value > 126 && value < 161) {
>     return false;
>   }
>
>   return true;
> }
>
> void main() {
>   foreach (dchar c; 0 .. 256) {
>     write(isPrintableLatin1(c) ? c : '.');
>   }

Nope... running this code, I get a bunch of digits as the output. The dots don't even show up. Maybe I'm drunk or lacking sleep.

Weird, I got this strange feeling that this problem stemmed from the compiler I'm using (GDC) so I installed DMD. Would you believe everything worked fine afterwards? That includes the original version where I used isControl and 'dchar' instead of 'char'. I wonder why that is?

Thanks Ali.

April 08, 2022
On Friday, 8 April 2022 at 08:36:33 UTC, Ali Çehreli wrote:
[snip]
> However, isControl() below won't work because isControl() only knows about the ASCII table. It would miss the unprintable characters above 127.
[snip]

This actually works because I'm using std.uni.isControl() instead of std.ascii.isControl().
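
A small sketch of the difference, using static imports so both functions can be called by their full names. U+0085 (NEL) is picked here only as an example of a control character above 127:

```d
import std.stdio;
static import std.ascii;
static import std.uni;

void main()
{
    dchar nel = '\u0085'; // a C1 control character, outside ASCII

    writeln(std.ascii.isControl(nel)); // false: only ASCII controls are recognized
    writeln(std.uni.isControl(nel));   // true: Unicode general category Cc
}
```

Both functions agree on the ASCII range, so the choice only matters for code points above 127.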

April 08, 2022
On 4/8/22 02:51, anonymous wrote:

> Weird, I got this strange feeling that this problem stemmed from the
> compiler I'm using (GDC)

Some distributions install an old gdc. What version is yours?

Ali

April 09, 2022
On Friday, 8 April 2022 at 15:06:41 UTC, Ali Çehreli wrote:
> On 4/8/22 02:51, anonymous wrote:
>
> > Weird, I got this strange feeling that this problem stemmed from the
> > compiler I'm using (GDC)
>
> Some distributions install an old gdc. What version is yours?
>
> Ali

Not sure actually. I just did "apt install gdc" and assumed the latest available. Let me check. Here's the version output (10.3.0?):

anon@ymous:~/$ gdc --version
gdc (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.