Best way to read/write Chinese (GBK/GB18030) files?
March 07, 2023

I'm new to dlang. I couldn't find many tutorials online about how to read/write Chinese text easily. std.encoding doesn't seem to support GBK or GB18030:

"Encodings currently supported are UTF-8, UTF-16, UTF-32, ASCII, ISO-8859-1 (also known as LATIN-1), ISO-8859-2 (LATIN-2), WINDOWS-1250, WINDOWS-1251 and WINDOWS-1252."

So what is the best way to read GBK/GB18030 content, or even GBK/GB18030 file names?

March 06, 2023

On 3/6/23 8:45 PM, John Xu wrote:

> I'm new to dlang. I couldn't find many tutorials online about how to read/write Chinese text easily. std.encoding doesn't seem to support GBK or GB18030:
>
> "Encodings currently supported are UTF-8, UTF-16, UTF-32, ASCII, ISO-8859-1 (also known as LATIN-1), ISO-8859-2 (LATIN-2), WINDOWS-1250, WINDOWS-1251 and WINDOWS-1252."

It appears that encoding is not supported.

There is a scant mention of it, in the BOM detection. But I don't think there's any mechanism to encode/decode it.

> So what is the best way to read GBK/GB18030 content, or even GBK/GB18030 file names?

D has direct bindings to C, so one option is to use a C library. Nothing on code.dlang.org jumps out at me, though.
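For anyone following along, here is a rough sketch of what the C-library route could look like, using the POSIX `iconv` API. It is only an illustration under some assumptions: a Linux system with glibc (where `iconv` lives in libc and the GBK converter is installed), hand-written `extern (C)` prototypes instead of a binding package, and a helper name, `transcode`, that is made up for this example.

```d
import std.exception : enforce;
import std.string : toStringz;

// Hand-written prototypes for the POSIX iconv API.
// Assumption: Linux with glibc; other platforms may need to link libiconv.
extern (C) nothrow @nogc
{
    alias iconv_t = void*;
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft,
                 char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

/// Converts `input` from encoding `fromCode` to encoding `toCode`.
/// `transcode` is a hypothetical helper, not part of Phobos.
ubyte[] transcode(const(ubyte)[] input, string fromCode, string toCode)
{
    auto cd = iconv_open(toCode.toStringz, fromCode.toStringz);
    enforce(cd != cast(iconv_t) -1, "iconv_open failed: unsupported conversion");
    scope (exit) iconv_close(cd);

    // Generous output buffer: conversions between UTF-8 and GBK/GB18030
    // never need more than four times the input size.
    auto output = new ubyte[input.length * 4 + 4];
    char* inPtr = cast(char*) input.ptr;
    size_t inLeft = input.length;
    char* outPtr = cast(char*) output.ptr;
    size_t outLeft = output.length;

    auto rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    enforce(rc != cast(size_t) -1, "iconv failed: invalid or incomplete input");

    // Keep only the bytes iconv actually wrote.
    return output[0 .. $ - outLeft];
}

void main()
{
    // "中文" encoded as GBK is the byte sequence D6 D0 CE C4.
    immutable ubyte[] gbk = [0xD6, 0xD0, 0xCE, 0xC4];

    // GBK -> UTF-8, then back again.
    auto utf8 = cast(string) transcode(gbk, "GBK", "UTF-8");
    assert(utf8 == "中文");
    assert(transcode(cast(const(ubyte)[]) "中文", "UTF-8", "GBK") == gbk);
}
```

Combined with `std.file.read` and `std.file.write`, a helper like this would let a program keep UTF-8 `string`s internally and only convert at the file boundary.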

-Steve

March 07, 2023

On Tuesday, 7 March 2023 at 01:45:27 UTC, John Xu wrote:

> I'm new to dlang. I couldn't find many tutorials online about how to read/write Chinese text easily. std.encoding doesn't seem to support GBK or GB18030:
>
> "Encodings currently supported are UTF-8, UTF-16, UTF-32, ASCII, ISO-8859-1 (also known as LATIN-1), ISO-8859-2 (LATIN-2), WINDOWS-1250, WINDOWS-1251 and WINDOWS-1252."
>
> So what is the best way to read GBK/GB18030 content, or even GBK/GB18030 file names?

I found this: https://github.com/meatatt/exCode/blob/master/source/excode/package.d

It mentions Unicode/GBK conversion, so it might be helpful.

March 10, 2023
> I found this: https://github.com/meatatt/exCode/blob/master/source/excode/package.d
>
> It mentions Unicode/GBK conversion, so it might be helpful.

Thanks for the quick answers. I've now found that I can read both UTF-8 and UTF-16LE Chinese files:
string txt = std.file.read(chineseFile).to!string;

and write to a UTF-8 file:
std.file.write(utf8ChineseFile, txt);

But I still need to figure out how to read/write GBK directly.
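One caveat on the UTF-16LE case: as far as I can tell, `to!string` on the `void[]` returned by `std.file.read` copies the bytes as they are rather than transcoding them, so a UTF-16LE file probably needs an explicit conversion step before it is a valid UTF-8 `string`. A minimal sketch of that step follows; `decodeUtf16le` is a made-up helper name, not a Phobos function.

```d
import std.utf : toUTF8;

/// Decodes UTF-16LE bytes (with or without a BOM) into a UTF-8 string.
/// `decodeUtf16le` is a hypothetical helper for illustration only.
string decodeUtf16le(const(ubyte)[] bytes)
{
    // Skip the UTF-16LE byte order mark (FF FE) if present.
    if (bytes.length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        bytes = bytes[2 .. $];

    // Combine each little-endian byte pair into one UTF-16 code unit.
    auto units = new wchar[bytes.length / 2];
    foreach (i, ref u; units)
        u = cast(wchar)(bytes[2 * i] | (bytes[2 * i + 1] << 8));

    // Transcode the UTF-16 code units to UTF-8.
    return toUTF8(units);
}

void main()
{
    // "中文" as UTF-16LE with a BOM: U+4E2D, U+6587.
    immutable ubyte[] sample = [0xFF, 0xFE, 0x2D, 0x4E, 0x87, 0x65];
    assert(decodeUtf16le(sample) == "中文");
}
```

In a real program the bytes would come from `cast(const(ubyte)[]) std.file.read(path)`, and the result could be passed straight to `std.file.write` to produce a UTF-8 file.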

March 10, 2023

On Friday, 10 March 2023 at 02:48:43 UTC, John Xu wrote:

module chinese;
import std.stdio : writeln;
import std.conv : to;
import std.windows.charset : toMBSz;

int main(string[] argv)
{
    auto s1 = "中文"; // UTF-8 string
    writeln("word:" ~ s1); // garbled
    writeln("word:" ~ to!string(toMBSz(s1))); // displays correctly after conversion
    writeln("Hello D-World!");
    return 0;
}
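The same module also has `fromMBSz`, so on Windows the two functions could in principle cover reading and writing GBK files as well. The sketch below is untested and rests on a few assumptions: it only does anything on Windows, it passes code page 936 (GBK) explicitly instead of relying on the system default, and `readGbkFile` / `writeGbkFile` are made-up helper names.

```d
version (Windows)
{
    import std.file : read, write;
    import std.string : fromStringz;
    import std.windows.charset : fromMBSz, toMBSz;

    // Windows code page number for GBK.
    enum gbkCodePage = 936;

    /// Reads a GBK-encoded file and returns its contents as UTF-8 (hypothetical helper).
    string readGbkFile(string path)
    {
        // fromMBSz expects a zero-terminated buffer, so append a terminator.
        auto bytes = cast(immutable(char)[]) read(path) ~ '\0';
        return fromMBSz(bytes.ptr, gbkCodePage);
    }

    /// Writes UTF-8 text to a file as GBK (hypothetical helper).
    /// Text containing an embedded NUL would be cut off at that point.
    void writeGbkFile(string path, string text)
    {
        auto gbk = toMBSz(text, gbkCodePage).fromStringz;
        write(path, gbk);
    }
}

void main()
{
    version (Windows)
    {
        writeGbkFile("gbk-sample.txt", "中文");
        assert(readGbkFile("gbk-sample.txt") == "中文");
    }
}
```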
March 10, 2023

On Friday, 10 March 2023 at 06:19:38 UTC, zjh wrote:

D language is too unfriendly for Chinese users!
You can't even write gbk files.

March 11, 2023
On Friday, 10 March 2023 at 07:16:32 UTC, zjh wrote:
> `D language` is too unfriendly for Chinese users!
> You can't even write `gbk` files.

D’s char + string types are Unicode.
To quote the tour, “In D, *all* strings are Unicode strings”.

If you desire to use other encodings, how about using ubyte + ubyte[]?
March 12, 2023

On Saturday, 11 March 2023 at 19:56:09 UTC, 0xEAB wrote:

> If you desire to use other encodings, how about using ubyte + ubyte[]?

There is no example. An example should be added in an obvious place.
I tried for a long time, but couldn't output GBK, and I finally gave up.

March 12, 2023

On Sunday, 12 March 2023 at 00:54:53 UTC, zjh wrote:

> On Saturday, 11 March 2023 at 19:56:09 UTC, 0xEAB wrote:
>> If you desire to use other encodings, how about using ubyte + ubyte[]?
>
> There is no example.

To read binary data from a file and dump it into another, you do:

import std.file : read, write;

void[] data = read("infile.txt");
write("outfile.txt", data);

To write binary data to a file:

import std.file : write;

ubyte[] data = [0xA0, 0x0A, 0x30, 0x01, 0xFF, 0x00, 0xFE];
write("myfile.txt", data);

data could contain GBK-encoded text, for example. (Just don’t use "Unicode literals".)
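To make that concrete, here is a small sketch with real GBK bytes: "中文" is `D6 D0 CE C4` in GBK. The file name `gbk-sample.txt` is just a placeholder for this example.

```d
import std.file : read, remove, write;

void main()
{
    // "中文" encoded as GBK: 中 = D6 D0, 文 = CE C4.
    immutable ubyte[] gbkText = [0xD6, 0xD0, 0xCE, 0xC4];

    // std.file.write stores the bytes verbatim; no encoding conversion happens.
    write("gbk-sample.txt", gbkText);
    scope (exit) remove("gbk-sample.txt");

    // Read them back as raw bytes rather than as a `string`.
    auto roundTrip = cast(ubyte[]) read("gbk-sample.txt");
    assert(roundTrip == gbkText);
}
```

Keeping the data typed as `ubyte[]` makes it clear to the compiler and to readers that it is not UTF-8, so it never gets passed by accident to functions that expect a Unicode `string`.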

March 13, 2023

On Sunday, 12 March 2023 at 20:03:23 UTC, 0xEAB wrote:

> ...

Thank you for your reply, but is there any way to output GBK-encoded text to the console?
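One possible approach, sketched here under the assumption that the console itself is set to GBK (for example code page 936 on Windows), is to bypass `writeln` and send the raw bytes to standard output with `File.rawWrite`. The `emitRaw` helper name is made up for this example.

```d
import std.stdio : File, stdout;

/// Writes bytes to `sink` unchanged and flushes (hypothetical helper).
void emitRaw(File sink, const(ubyte)[] bytes)
{
    // rawWrite sends the bytes as they are, with no text-mode translation.
    sink.rawWrite(bytes);
    sink.flush();
}

void main()
{
    // "中文" in GBK (D6 D0 CE C4) followed by a line feed.
    immutable ubyte[] gbkLine = [0xD6, 0xD0, 0xCE, 0xC4, 0x0A];

    // Only readable if the terminal interprets its input as GBK.
    emitRaw(stdout, gbkLine);
}
```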
