August 06, 2010 [phobos] Transcoded text stdio | ||||
---|---|---|---|---|
| ||||
Hello,

I'm trying to integrate a codeset conversion facility into std.stdio. But how should it be done?

Mixing transcoded and non-transcoded (UTF-8) I/O in the same File structure will mess up the source. I think separating UTF-8 based I/O from transcoded I/O is necessary.

I can think of the following four ways.

1. Integrate everything into File anyway.

2. Make File always perform conversion.

3. Create a distinct type for transcoded I/O.
----------
    shared TranscodedFile stdout;
    stdout.writeln("Hallå, Värld!");
----------
# http://github.com/sinfu/misc/blob/master/stdio/test01.d

4. Simplify File and define upper-layer structures.
----------
    // File itself doesn't provide byLine etc.
    shared File stdout;

    // these 'ports' perform actual I/O for specific purposes
    shared UTF8TextIOPort stdoutUTF8;
    shared NativeTextIOPort stdoutText;
    shared BinaryIOPort stdoutBin;

    // wrap stdout with various 'I/O ports'
    stdoutUTF8 = UTF8TextIOPort(stdout);
    stdoutText = NativeTextIOPort(stdout);
    stdoutBin = BinaryIOPort(stdout);

    // write text in UTF-8
    stdoutUTF8.writeln("Hallå, Värld!");

    // write text in console encoding
    stdoutText.writeln("Hallå, Värld!");

    // free functions use stdoutText
    writeln("Hallå, Värld!");
----------
# http://github.com/sinfu/misc/blob/master/stdio/test02.d

...

I'm not sure which is best. Perhaps there are more reasonable ways. What do you think? Any ideas?

Thanks,
Shin
|
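For readers who want something compilable to experiment with, the layering in option 4 could look roughly like the sketch below. It is not the code from the linked test02.d. `RawFile` is a hypothetical in-memory stand-in for the simplified File, and Latin-1 is only an assumed placeholder for the console encoding. The port names are taken from the post, but their members are invented here.

```d
// Hypothetical stand-in for the simplified File: raw byte output only.
struct RawFile
{
    ubyte[] written;   // captured in memory so the sketch is testable
    void rawWrite(const(ubyte)[] bytes) { written ~= bytes; }
}

// Text port that passes D strings through unchanged (already UTF-8).
struct UTF8TextIOPort
{
    RawFile* file;
    void writeln(const(char)[] s)
    {
        file.rawWrite(cast(const(ubyte)[]) s);
        file.rawWrite([cast(ubyte) '\n']);
    }
}

// Text port that transcodes; Latin-1 stands in for the console encoding.
struct NativeTextIOPort
{
    RawFile* file;
    void writeln(const(char)[] s)
    {
        ubyte[] bytes;
        foreach (dchar c; s)   // foreach decodes UTF-8 into code points
            bytes ~= c <= 0xFF ? cast(ubyte) c : cast(ubyte) '?';
        bytes ~= cast(ubyte) '\n';
        file.rawWrite(bytes);
    }
}

// Binary port: forwards bytes with no text semantics at all.
struct BinaryIOPort
{
    RawFile* file;
    void write(const(ubyte)[] bytes) { file.rawWrite(bytes); }
}

unittest
{
    RawFile f;
    UTF8TextIOPort(&f).writeln("å");     // two UTF-8 bytes plus newline
    NativeTextIOPort(&f).writeln("å");   // one Latin-1 byte plus newline
    ubyte[] expected = [0xC3, 0xA5, 0x0A, 0xE5, 0x0A];
    assert(f.written == expected);
}
```

All three ports share one underlying file, so the choice of encoding is made at the call site rather than inside File.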
August 06, 2010 [phobos] Transcoded text stdio | ||||
---|---|---|---|---|
| ||||
Posted in reply to Shin Fujishiro | I like #4.
I think we should start developing a specification for the interface of D's new I/O.
|
September 17, 2010 [phobos] Transcoded text stdio | ||||
---|---|---|---|---|
| ||||
Posted in reply to Shin Fujishiro | Hi Shin and everyone,
Regarding transcoding output, please let me know whether I understand the problem correctly: under Windows (and possibly under other OSs in certain configurations) the console is not UTF and cannot reasonably be forced to be UTF.
I think for such situations, the classic Decorator-based design with stacked interfaces works well: you have a TranscodingStream wrapping a NativeStream or a UTFStream or whatever.
The streaming interface question comes again, i.e. what is the interface that allows such stacking with minimal cost in efficiency?
File was not designed for transcoding, but as long as it supports raw reads and writes, I think writing a wrapper over it should be possible. I'm talking about something like this:
auto nativeStdout = nativeTranscoder(stdout);
nativeStdout.writeln("yah");
The native transcoder would only use rawWrite and flush for stdout - not the higher level text functions.
Then we can define more sophisticated transcoders, e.g. one that transcodes from UTF to some Eurasian codepages etc.
Works?
Andrei
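A minimal sketch of such a wrapper, assuming only that the wrapped sink offers `rawWrite` and `flush`, might look as follows. `TranscodingWriter`, `toLatin1`, and `CaptureSink` are hypothetical names. Only `nativeTranscoder` comes from the message above, and its implementation here is a guess. Latin-1 again stands in for whatever the console code page really is.

```d
// Hypothetical decorator: encodes each code point with `encodeChar` and
// forwards the bytes through the sink's rawWrite and flush only.
struct TranscodingWriter(Sink)
{
    Sink sink;
    ubyte function(dchar) encodeChar;

    void writeln(const(char)[] text)
    {
        ubyte[] bytes;
        foreach (dchar c; text)   // decode UTF-8 one code point at a time
            bytes ~= encodeChar(c);
        bytes ~= cast(ubyte) '\n';
        sink.rawWrite(bytes);
        sink.flush();
    }
}

// Latin-1 stands in for the console code page (an assumption).
ubyte toLatin1(dchar c)
{
    return c <= 0xFF ? cast(ubyte) c : cast(ubyte) '?';
}

auto nativeTranscoder(Sink)(Sink sink)
{
    return TranscodingWriter!Sink(sink, &toLatin1);
}

// In-memory sink exposing the same two members the wrapper uses.
class CaptureSink
{
    ubyte[] data;
    void rawWrite(const(ubyte)[] b) { data ~= b; }
    void flush() {}
}

unittest
{
    auto cap = new CaptureSink;
    auto w = nativeTranscoder(cap);
    w.writeln("Värld");
    ubyte[] expected = [0x56, 0xE4, 0x72, 0x6C, 0x64, 0x0A];
    assert(cap.data == expected);
}
```

Because std.stdio's File provides both `rawWrite` and `flush`, passing `stdout` to `nativeTranscoder` should fit the same template. A different encoder function would give a transcoder for another single-byte code page.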
|
September 18, 2010 [phobos] Transcoded text stdio | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Thank you for picking up the topic!

Andrei Alexandrescu <andrei at erdani.com> wrote:
> Regarding transcoding output, please let me know whether I understand the problem correctly: under Windows (and possibly under other OSs in certain configurations) the console is not UTF and cannot reasonably be forced to be UTF.

Yes. Neither input nor output is UTF under Windows.

> I think for such situations, the classic Decorator-based design with stacked interfaces works well: you have a TranscodingStream wrapping a NativeStream or a UTFStream or whatever.
>
> The streaming interface question comes again, i.e. what is the interface that allows such stacking with minimal cost in efficiency?

The cost is minimal when the transcoder has direct access to both the I/O device and the buffer. I mean, there would be no redundant copy involved:

    ubyte[N] tmp = void;
    convert(buffer, tmp);
    device_write(tmp);

So, the best layer for doing converted (or filtered) I/O is the stream buffer. But I feel like it's not quite right... the buffering layer might be too 'low level' for character code conversion.

Shin
|
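The three-line fragment above can be expanded into a compilable sketch. This is an illustration only: `writeTranscoded` and the `deviceWrite` delegate are hypothetical names, UTF-16 is an assumed target encoding, and the 256-unit scratch size is arbitrary. The only library call used is `std.utf.encode`.

```d
import std.utf : encode;

// Converts UTF-8 text to UTF-16 in a fixed stack buffer and hands each
// filled chunk to `deviceWrite`, a hypothetical stand-in for the device.
void writeTranscoded(const(char)[] buffer,
                     scope void delegate(const(wchar)[]) deviceWrite)
{
    wchar[256] tmp = void;   // uninitialized scratch space, as in the post
    size_t used = 0;

    foreach (dchar c; buffer)   // decode UTF-8 one code point at a time
    {
        wchar[2] units;
        immutable n = encode(units, c);   // 1 or 2 UTF-16 code units

        // Flush first if the next code point would not fit, so a
        // surrogate pair is never split across two device writes.
        if (used + n > tmp.length)
        {
            deviceWrite(tmp[0 .. used]);
            used = 0;
        }
        tmp[used .. used + n] = units[0 .. n];
        used += n;
    }

    if (used > 0)
        deviceWrite(tmp[0 .. used]);
}

unittest
{
    wchar[] got;
    writeTranscoded("Hallå, Värld!",
                    (const(wchar)[] chunk) { got ~= chunk; });
    assert(got == "Hallå, Värld!"w);
}
```

The converted data goes straight from the scratch array to the device and no heap allocation happens inside the function, which is the "no redundant copy" property the post describes.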
September 18, 2010 [phobos] Transcoded text stdio | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | By the way: For now, how about working around the Windows console problem by putting the following code in LockingTextWriter?

    // workaround
    if (fps == core.stdc.stdio.stdout && orientation <= 0)
    {
        foreach (dchar c; writeme)
        {
            immutable cp = GetConsoleOutputCP();
            wchar[2] wc;
            char[16] mb;
            // UTF-32 -> UTF-16, then UTF-16 -> console code page
            immutable wcLen = encode(wc, c);
            immutable mbLen = WideCharToMultiByte(
                cp, 0, wc.ptr, cast(int) wcLen, mb.ptr, mb.length, null, null);
            // emit each converted byte
            foreach (char b; mb[0 .. mbLen])
            {
                FPUTC(b, handle);
            }
        }
    }

Although the long-term solution is a conversion-aware I/O system, we should make sure that the following works under Windows:

    import std.stdio;
    void main()
    {
        writeln("Hallå, Värld!");
    }

Shin
|
January 02, 2011 [phobos] Transcoded text stdio | ||||
---|---|---|---|---|
| ||||
Posted in reply to Shin Fujishiro | We definitely need to keep an eye on this issue while the new stream design is in flux. Transcoding Windows I/O is a major application.
Andrei
|
January 02, 2011 [phobos] Transcoded text stdio | ||||
---|---|---|---|---|
| ||||
Posted in reply to Shin Fujishiro | Shin, I think you may put this into Phobos. Please make sure you do so only on version(Windows).
Andrei
|
Copyright © 1999-2021 by the D Language Foundation