UTF-8 char and write(f)ln
March 04

Program:

```d
import std.stdio;

void main() {
    string s = "ταυ";
    foreach(i, elem; s) {
        writefln("%s %s '%s'", i, cast(int)elem, elem);
        writefln("%s", elem);
    }
}
```

Output:

```
0 207 '�'

1 132 '�'
τ
2 206 '�'

3 177 '�'
α
4 207 '�'

5 133 '�'
υ
```

How does the second writefln know about the context and can adequately output a character on every other iteration?

However, if you do it this way (see below), the output is very strange with arbitrary line breaks.

```d
    foreach(i, elem; s) {
        writefln("'%s', %s", elem, elem);
    }
```

Output:

```
'�',
�'�', �
'�',
�'�', �
'�',
�'�', �
```

March 05
On 3/4/25 12:09 AM, Vindex9 wrote:
> Program:
> ```d
> import std.stdio;
>
> void main() {
>      string s = "ταυ";
>      foreach(i, elem; s) {
>          writefln("%s %s '%s'", i, cast(int)elem, elem);
>          writefln("%s", elem);
>      }
> }
>
> ```
> Output:
> ```
> 0 207 '�'
>
> 1 132 '�'
> τ

[...]

> How does the second writefln know about the context and can adequately
> output a character on every other iteration?

It shouldn't and does not work in my environment (both inside an Emacs buffer and inside a Linux terminal).

I think you are running your program in an environment where 132 is mapped to τ, etc. Perhaps a "code page" setting is helping (or hurting) you there?

Ali

March 05
On Wednesday, 5 March 2025 at 17:56:25 UTC, Ali Çehreli wrote:
> It shouldn't and does not work in my environment (both inside an Emacs buffer and inside a Linux terminal).

I used Terminator. I tried other terminal emulators and they behaved differently (output unrecognized bits of characters as question marks). Apparently Terminator has some weird buffering. Sorry to bother you.

March 05
On 3/5/25 11:59 AM, Vindex9 wrote:
> On Wednesday, 5 March 2025 at 17:56:25 UTC, Ali Çehreli wrote:
>> It shouldn't and does not work in my environment (both inside an Emacs
>> buffer and inside a Linux terminal).
>

The program just outputs characters to its stdout. One way to see this process is to redirect 'stdout' to a file:

$ my_program > my_output

Then, when you open file 'my_output' in a hex editor, you should see that the program did output just a single char with value e.g. 132. There are no other UTF-8 characters right after it, so I wouldn't expect 'τ' to be formed on the output.
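The same bytes can also be inspected from within D rather than in a hex editor. The following is only a minimal sketch: `codeUnitsHex` is a hypothetical helper name, not something from the original program. It prints the UTF-8 code units of the string, which for "ταυ" are CF 84 CE B1 CF 85 (207 132 206 177 207 133 in decimal, matching the numbers in the output above), and then iterates with an explicit `dchar` loop variable so that `foreach` decodes whole code points instead of handing out single code units.

```d
import std.format : format;
import std.stdio;
import std.string : representation;

// Hypothetical helper: renders the UTF-8 code units of `s` as hex bytes.
string codeUnitsHex(string s) {
    // "%(%02X %)" formats each element of the range as two hex digits,
    // separated by single spaces.
    return format("%(%02X %)", s.representation);
}

void main() {
    string s = "ταυ";

    // The raw code units, i.e. what a plain `foreach (elem; s)` walks over.
    writeln(codeUnitsHex(s));

    // Asking foreach for dchar decodes code points instead of code units.
    foreach (dchar c; s) {
        writefln("U+%04X %s", cast(uint) c, c);
    }
}
```

With `dchar` the loop body runs three times, once per letter, so each `writefln` call emits a complete UTF-8 sequence and the result no longer depends on how the terminal treats stray bytes.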

> I used Terminator. I tried other terminal emulators and they behaved
> differently (output unrecognized bits of characters as question marks).
> Apparently Terminator has some weird buffering.

They probably keep state for Unicode characters but don't reset it. (I don't know whether they are required to.) Could you please try the following program to see whether it prints τ for all tau arrays below?

import std.stdio;
import std.algorithm;

void main() {
    char[][] taus = [ [ 207, 132 ],
                      [ 207, 0, 132 ],
                      [ 207, 'a', 132 ] ];

    foreach (i, tau; taus) {
        write(i, ": ");
        tau.each!(write);
        writeln();
    }
}

For me, only the first one is a τ in a Unicode environment. You may see three taus under Terminator.
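To check independently of any terminal which of these arrays is well-formed UTF-8, `std.utf.validate` can be used; it throws a `UTFException` on malformed input. This is a sketch under that assumption, and `isValidUtf8` is a hypothetical wrapper name introduced here for illustration.

```d
import std.stdio;
import std.utf : UTFException, validate;

// Hypothetical wrapper: true if `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s) {
    try {
        validate(s);   // throws UTFException on malformed input
        return true;
    } catch (UTFException) {
        return false;
    }
}

void main() {
    // Same arrays as in the program above.
    char[][] taus = [ [ 207, 132 ],
                      [ 207, 0, 132 ],
                      [ 207, 'a', 132 ] ];

    foreach (i, tau; taus) {
        writefln("%s: %s", i, isValidUtf8(tau));
    }
}
```

Only the first array should pass: in the other two, the lead byte 207 is not followed directly by a continuation byte, and 132 is left as a stray continuation byte. A terminal that still shows τ for those is therefore stitching the bytes together on its own.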

> Sorry to bother you.

Not at all! I assume everybody here finds these topics very interesting like I do. :)

Ali

March 06
On Wednesday, 5 March 2025 at 21:26:32 UTC, Ali Çehreli wrote:
> They probably keep state for Unicode characters but don't reset it. (I don't know whether they are required to.)

Further experiments showed inconsistent behavior. The oddities reproduce well with the terminal plugin for neovim ('akinsho/toggleterm.nvim'). So things aren't that interesting anymore.