Thread overview
Rune strings. Like in Go.
Sep 30, 2021
Alexey
Sep 30, 2021
Paul Backus
Sep 30, 2021
Alexey
Sep 30, 2021
H. S. Teoh
Sep 30, 2021
jfondren
Oct 01, 2021
Basile B.
Oct 01, 2021
Ali Çehreli
September 30, 2021

Can we have them in D? :)

September 30, 2021

On Thursday, 30 September 2021 at 20:44:54 UTC, Alexey wrote:
> Can we have them in D? :)

We have them already. :) What Go calls a "rune" is called a dchar in D, and a string of them is a dchar[].
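As a minimal sketch of what that looks like in practice (the sample text and variable names are only illustrative), a `dstring` is an immutable `dchar[]`, so its length and indexing are in code points rather than bytes:

```d
import std.stdio : writeln;

void main()
{
    // The `d` suffix makes this a UTF-32 literal: one dchar per code point.
    dstring runes = "no\u0308el"d;

    // dstring is just an alias for an immutable array of dchar.
    static assert(is(dstring == immutable(dchar)[]));

    assert(runes.length == 5);      // n, o, U+0308, e, l
    assert(runes[2] == '\u0308');   // direct indexing by code point

    // A mutable copy is a plain dchar[], the closest match to Go's []rune.
    dchar[] editable = runes.dup;
    editable[0] = 'N';
    writeln(editable);
}
```

Note that a code point is still not a user-perceived character: U+0308 is a combining mark that attaches to the preceding `o`.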

September 30, 2021

On Thursday, 30 September 2021 at 20:57:34 UTC, Paul Backus wrote:
> On Thursday, 30 September 2021 at 20:44:54 UTC, Alexey wrote:
> > Can we have them in D? :)
>
> We have them already. :) What Go calls a "rune" is called a dchar in D, and a string of them is a dchar[].

Go's runes and D's dchars behave differently.
In Go, converting a rune string to a byte array with []byte(string_var) and saving it to a file yields UTF-8, while a dchar is UTF-32.

This means that frequent conversions between string and dstring are required for comfortable work with Unicode in D.

September 30, 2021
On Thu, Sep 30, 2021 at 10:47:25PM +0000, Alexey via Digitalmars-d wrote:
> On Thursday, 30 September 2021 at 20:57:34 UTC, Paul Backus wrote:
> > On Thursday, 30 September 2021 at 20:44:54 UTC, Alexey wrote:
> > > Can we have them in D? :)
> > 
> > We have them already. :) What Go calls a "rune" is called a `dchar` in D, and a string of them is a `dchar[]`.
> 
> Go's runes and D's dchars behave differently.
> In Go, converting a rune string to a byte array with
> `[]byte(string_var)` and saving it to a file yields UTF-8, while a dchar
> is UTF-32.
> 
> This means that frequent conversions between string and dstring are required for comfortable work with Unicode in D.

This is not true. D strings are autodecoded with Phobos range functions, i.e., if you iterate over a string with Phobos, you will get a stream of dchars without having to convert the encoding.

(Ironically enough, autodecoding is regarded as a bad thing!)

Also, IIRC, writing a stream of dchars to a string sink, e.g., appender!string, will automatically encode into UTF-8, so no explicit conversion is needed afterwards.
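A rough sketch of both points, assuming the current Phobos behaviour where range primitives autodecode narrow strings (the helper name `reencode` is made up for illustration):

```d
import std.array : appender;
import std.range.primitives : ElementType, empty, front, popFront, walkLength;
import std.stdio : writeln;

// Hypothetical helper: walk a UTF-8 string one code point at a time
// and push each dchar into a UTF-8 sink.
string reencode(string s)
{
    auto sink = appender!string();
    for (auto r = s; !r.empty; r.popFront())
        sink.put(r.front);   // r.front is a decoded dchar; put() encodes it back to UTF-8
    return sink.data;
}

void main()
{
    string s = "no\u0308el";

    // As far as the range primitives are concerned, a string is a range of dchar.
    static assert(is(ElementType!string == dchar));

    assert(s.length == 6);       // UTF-8 code units
    assert(s.walkLength == 5);   // code points seen through the range interface

    // Round trip through dchars without any explicit toUTF32/toUTF8 call.
    assert(reencode(s) == s);
    writeln(reencode(s));
}
```

No dstring is ever materialised here; decoding and encoding happen one element at a time.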


T

-- 
If you look at a thing nine hundred and ninety-nine times, you are perfectly safe; if you look at it the thousandth time, you are in frightful danger of seeing it for the first time. -- G. K. Chesterton
September 30, 2021

On Thursday, 30 September 2021 at 22:47:25 UTC, Alexey wrote:
> Go's rune string, being converted to byte array like so []byte(string_var) and saved to file - results in UTF-8, while dchar is UTF-32.

You're explicitly asking for a byte cast, and instead of a byte cast you get a re-encoding? You might prefer that because it's familiar, but it's a really confusing thing to do. std.string.representation, meanwhile, turns a dchar[] into a uint[].
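To make the difference concrete, here is a small sketch (variable names are illustrative) of what representation returns for each string type, and of the explicit re-encoding step that corresponds to the Go conversion:

```d
import std.string : representation;
import std.utf : toUTF8, toUTF32;

void main()
{
    string s = "no\u0308el";
    dstring d = s.toUTF32;

    // representation only reinterprets the code units; nothing is re-encoded.
    immutable(ubyte)[] utf8Units = s.representation;
    immutable(uint)[] utf32Units = d.representation;

    assert(utf8Units == [110, 111, 204, 136, 101, 108]);
    assert(utf32Units == [0x6E, 0x6F, 0x308, 0x65, 0x6C]);

    // Getting UTF-8 bytes out of UTF-32 text is a separate, explicit step.
    assert(d.toUTF8.representation == utf8Units);
}
```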

And to get UTF-8 in a file, just write the string:

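import std;

// Write the same text twice: once from a string, once from a dstring.
void writeFiles() {
    string noel = "no\u0308el";
    dstring dstr = noel.toUTF32;

    // writeln transcodes the dstring, so both files receive UTF-8 here.
    File("a", "w").writeln(noel);
    File("b", "w").writeln(dstr);
}

void main() {
    writeFiles();
    // Read the raw bytes back to compare what actually landed on disk.
    ubyte[] fromA = cast(ubyte[]) read("a");
    ubyte[] fromB = cast(ubyte[]) read("b");

    assert(fromA == fromB);
    writeln(fromA, "\n", fromB);
}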

output:

[110, 111, 204, 136, 101, 108, 10]
[110, 111, 204, 136, 101, 108, 10]
October 01, 2021

On Thursday, 30 September 2021 at 20:44:54 UTC, Alexey wrote:
> Can we have them in D? :)

Do you want a verbatim about runes on IRC?

September 30, 2021
On 9/30/21 1:44 PM, Alexey wrote:
> Can we have them in D? :)

Looks like Go "could" have them. ;)

Ali