August 17, 2004 Transcoding - Summary | ||||
---|---|---|---|---|
| ||||
We have two separate problems:
(1) formatted I/O
(2) unformatted I/O

For unformatted I/O, we need the ability to read a sequence of dchars from some source, and the ability to write a sequence of dchars to some sink. The class which acts as a dchar source must perform decoding from some underlying ubyte source. The class which acts as a dchar sink must perform encoding to some underlying ubyte sink.

The source and sink could be anything - a string; a console; a file; a socket; - even a simple counter which counts bytes and throws away data. So, to keep things generic, I shall use the terms "ubyte source", "ubyte sink", "dchar source" and "dchar sink". The traditional terms are:

ubyte source = input stream
ubyte sink = output stream
dchar source = reader
dchar sink = writer

(I'm using new terms merely in order to avoid confusion with objects in std.stream, mango.io, and Java).

For formatted I/O, we need:
(1a) a replacement for printf() which emits a formatted sequence of dchars to an arbitrary dchar sink
(1b) a replacement for scanf() which parses a sequence of dchars obtained from an arbitrary dchar source

Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.

Observe that if the output of (1a) is plumbed into an encoder, and the input to (1b) is plumbed into a decoder, then formatted transcoding is achieved. This makes our printf/scanf replacements relatively easy to write. They are likely to require very little modification from the existing format()/unformat() routines, with essentially the only difference being that they must be dchar-based, not char-based. (Random-access of the arguments would be a new feature, however, though not necessarily an urgent one).

Another oft-voiced requirement is that transcoding be independent of any particular string/stream implementation. (I suspect that if Phobos streams were fully-featured, fully-documented, bug-free and intuitive, then nobody would be asking for this requirement. But as things are, the requirement is there).

So ... listed below are the jobs which need to be done. Volunteers are requested for any unclaimed jobs:

(1) The source and sink interfaces need to be nailed down.
(2) Given (1), dchar-based format()/unformat() replacements can be written.
(3) Given (1), encoder and decoder classes/interfaces can be written.
(4) Given (3), classes can be written to attach our encoders/decoders to std and mango streams, to strings, etc.
(5) Given (3), encoders and decoders for SPECIFIC encodings can now be written.
(6) Will somebody /please/ document std.Stream?

I volunteer for (1) and (3). I'm hoping Sean will volunteer for (2). AntiAlias's excellent ideas for throughput enhancement using buffers are part of (1) and (3), so I suggest AntiAlias and I send each other code back and forth until we are both happy with it.

Volunteers still needed for (4), (5) and (6) (though (4) and (5) are dependent upon (3)). Anyone who's a dab hand at Wiki might like to volunteer for (6).

Arcane Jill
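One way the four roles named in the post above might fit together is sketched here. This is only an illustration, assuming a present-day D2 compiler and Phobos: the interface and class names (UbyteSource, UbyteSink, DcharSource, DcharSink, ArraySource, CountingSink, Utf8Decoder, Utf8Encoder) are all made up for this sketch and are not the interfaces the thread eventually settled on. Only std.utf.encode and std.utf.decode are real library calls. Buffering, which the post assigns to jobs (1) and (3), is left out.

```d
// Hypothetical interfaces: none of these names exist in Phobos or Mango.
interface UbyteSource { bool read(out ubyte b); }   // returns false at end of input
interface UbyteSink   { void write(ubyte b); }
interface DcharSource { bool read(out dchar c); }   // returns false at end of input
interface DcharSink   { void write(dchar c); }

/// ubyte source backed by an in-memory array.
class ArraySource : UbyteSource
{
    private const(ubyte)[] data;
    this(const(ubyte)[] data) { this.data = data; }
    bool read(out ubyte b)
    {
        if (data.length == 0) return false;
        b = data[0];
        data = data[1 .. $];
        return true;
    }
}

/// ubyte sink that counts bytes and throws the data away.
class CountingSink : UbyteSink
{
    size_t count;
    void write(ubyte b) { ++count; }
}

/// dchar sink that encodes to UTF-8 and forwards the bytes to a ubyte sink.
class Utf8Encoder : DcharSink
{
    private UbyteSink sink;
    this(UbyteSink sink) { this.sink = sink; }
    void write(dchar c)
    {
        import std.utf : encode;
        char[4] buf;
        immutable n = encode(buf, c);   // throws on an invalid code point
        foreach (ch; buf[0 .. n]) sink.write(cast(ubyte) ch);
    }
}

/// dchar source that decodes UTF-8 pulled from a ubyte source.
class Utf8Decoder : DcharSource
{
    private UbyteSource source;
    this(UbyteSource source) { this.source = source; }
    bool read(out dchar c)
    {
        import std.utf : decode;
        ubyte lead;
        if (!source.read(lead)) return false;
        // Sequence length implied by the lead byte; 0 marks an invalid lead byte.
        immutable size_t len = lead < 0x80 ? 1
            : (lead & 0xE0) == 0xC0 ? 2
            : (lead & 0xF0) == 0xE0 ? 3
            : (lead & 0xF8) == 0xF0 ? 4 : 0;
        if (len == 0) throw new Exception("invalid UTF-8 lead byte");
        char[4] buf;
        buf[0] = cast(char) lead;
        foreach (i; 1 .. len)
        {
            ubyte next;
            if (!source.read(next)) throw new Exception("truncated UTF-8 sequence");
            buf[i] = cast(char) next;
        }
        size_t index = 0;
        const(char)[] seq = buf[0 .. len];
        c = decode(seq, index);   // std.utf validates the collected sequence
        return true;
    }
}

unittest
{
    // 'A' is 1 byte, U+00E9 is 2 bytes and U+20AC is 3 bytes in UTF-8.
    auto bytes = cast(const(ubyte)[]) "A\u00E9\u20AC";
    auto source = new Utf8Decoder(new ArraySource(bytes));
    auto counter = new CountingSink;
    auto sink = new Utf8Encoder(counter);
    dchar c;
    size_t chars;
    while (source.read(c)) { sink.write(c); ++chars; }
    assert(chars == 3);
    assert(counter.count == 6);
}
```

The unittest block plumbs a decoder into an encoder, which is the unformatted transcoding path described in the post; it can be run with `dmd -unittest -main`. Swapping Utf8Encoder for an encoder of a different encoding would turn the same loop into a transcoder.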
August 17, 2004 Re: Transcoding - Summary | ||||
---|---|---|---|---|
| ||||
Posted in reply to Arcane Jill

In article <cfsm6d$va0$1@digitaldaemon.com>, Arcane Jill says...
>(1) The source and sink interfaces need to be nailed down.
>(2) Given (1), dchar-based format()/unformat() replacements can be written.
>(3) Given (1), encoder and decoder classes/interfaces can be written.
>(4) Given (3), classes can be written to attach our encoders/decoders to std and mango streams, to strings, etc.
>(5) Given (3), encoders and decoders for SPECIFIC encodings can now be written.
>(6) Will somebody /please/ document std.Stream?

Nick, I think your work falls into category (5). If you want that job, I guess it's yours, but if so, please wait for (3) before you start.

Jill
August 17, 2004 Re: Transcoding - Summary | ||||
---|---|---|---|---|
| ||||
Posted in reply to Arcane Jill

On Tue, 17 Aug 2004 10:21:01 +0000 (UTC), Arcane Jill wrote:
> We have two separate problems:
> (1) formatted I/O
> (2) unformatted I/O

I hope I'm not stating the bleeding obvious, but you are talking about TEXT I/O aren't you? There is also a lot of other I/O that is not text based - sound and image files, databases, etc...

--
Derek
Melbourne, Australia
August 18, 2004 Re: Transcoding - Summary | ||||
---|---|---|---|---|
| ||||
Posted in reply to Arcane Jill

"Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cfsm6d$va0$1@digitaldaemon.com...
> Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.

I disagree with this requirement. It breaks the nice way that std.format works. The only place where reordering the arguments is useful is in date/time formatting, and a specialized formatter would be suitable for that (and there are many other nice things one can do with a specialized date/time formatter).
August 18, 2004 Re: Transcoding - Summary | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter

On Tue, 17 Aug 2004 15:00:28 -0700, Walter <newshound@digitalmars.com> wrote:
> "Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cfsm6d$va0$1@digitaldaemon.com...
>> Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.
>
> I disagree with this requirement. It breaks the nice way that std.format works. The only place where reordering the arguments is useful is in date/time formatting, and a specialized formatter would be suitable for that (and there are many other nice things one can do with a specialized date/time formatter).

Did you miss the thread that mentioned that sentence structure in various languages differ?
Example:

    english :- "The DOG is BIG"
    other   :- ".. BIG .. DOG"

(I don't actually know any other languages)

So, it would be kind of useful to be able to define the format strings as:

    english :- "The $1 is $2"
    other   :- ".. $2 .. $1"

and be able to go:

    printf(format[lang_id],"DOG","BIG");

Regan

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
August 18, 2004 Re: Transcoding - Summary | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter

On Tue, 17 Aug 2004 15:00:28 -0700, Walter wrote:
> "Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cfsm6d$va0$1@digitaldaemon.com...
>> Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.
>
> I disagree with this requirement. It breaks the nice way that std.format works. The only place where reordering the arguments is useful is in date/time formatting, and a specialized formatter would be suitable for that (and there are many other nice things one can do with a specialized date/time formatter).

I think that AJ was suggesting that there exists a business need for a type of formatter that can express, in its template, the order in which arguments will appear in the resultant string, regardless of the order in which they are presented to the formatter.

For example (contrived for simplicity):

    char[] Msg;
    char[] temp;   // the template; left undeclared in the original snippet

    if (gUserLang == LANG_english)
        temp = "%{1}s %{2}s %{3}s %{4}s %{5}s\n";
    else
        temp = "%{2}s %{1}s %{5}s %{4}s %{3}s\n";

    // expand() is the contrived formatter being proposed, not an existing function
    Msg = expand(temp, pSubjectDesc, pSubject, pVerb, pObjectDesc, pObject);
    writef(Msg);

--
Derek
Melbourne, Australia
18/Aug/04 10:31:55 AM
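To make Derek's contrived example concrete, here is a minimal sketch of what such an expand() could look like. It assumes a present-day D2 compiler and Phobos, string arguments only, and the `%{n}s` placeholder syntax from the post; expand() itself is hypothetical and does not exist in any library. Anything that is not a well-formed placeholder with an in-range index is copied through unchanged, which is a design choice of this sketch rather than something the post specifies.

```d
import std.array : appender;
import std.ascii : isDigit;

/// Hypothetical helper: replaces each "%{n}s" in tmpl with args[n - 1].
string expand(string tmpl, string[] args...)
{
    auto result = appender!string();
    size_t i = 0;
    while (i < tmpl.length)
    {
        // Try to match "%{" digits "}s" starting at position i.
        if (i + 1 < tmpl.length && tmpl[i] == '%' && tmpl[i + 1] == '{')
        {
            size_t j = i + 2;
            size_t n = 0;
            while (j < tmpl.length && isDigit(tmpl[j]))
            {
                n = n * 10 + (tmpl[j] - '0');
                ++j;
            }
            immutable wellFormed = j > i + 2
                && j + 1 < tmpl.length && tmpl[j] == '}' && tmpl[j + 1] == 's';
            if (wellFormed && n >= 1 && n <= args.length)
            {
                result.put(args[n - 1]);   // argument chosen by position
                i = j + 2;                 // skip past "}s"
                continue;
            }
        }
        result.put(tmpl[i]);   // ordinary character, or a malformed placeholder
        ++i;
    }
    return result.data;
}

unittest
{
    auto english = "%{1}s %{2}s %{3}s %{4}s %{5}s\n";
    auto other   = "%{2}s %{1}s %{5}s %{4}s %{3}s\n";
    assert(expand(english, "big", "dog", "eats", "red", "meat") == "big dog eats red meat\n");
    assert(expand(other,   "big", "dog", "eats", "red", "meat") == "dog big meat red eats\n");
    assert(expand("%{1}s and %{1}s", "again") == "again and again");
}
```

Because the template alone decides the order, the call site passes its arguments in one fixed order for every language, and a placeholder may be repeated.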
August 18, 2004 Re: Transcoding - Summary | ||||
---|---|---|---|---|
| ||||
Posted in reply to Regan Heath

Regan Heath wrote:
> Did you miss the thread that mentioned that sentence structure in various languages differ?
> Example:
>
> english :- "The DOG is BIG"
> other :- ".. BIG .. DOG"
>
> (I don't actually know any other languages)
>
> So, it would be kind of useful to be able to define the format strings as:
>
> english :- "The $1 is $2"
> other :- ".. $2 .. $1"
>
> and be able to go:
>
> printf(format[lang_id],"DOG","BIG");

This isn't strictly a requirement of the formatting tools. Perhaps a library function which, given a number of varargs, reordered them and passed them to another function?

Your code could look (very roughly) like this:

    char[] formatString  = LookupNLSFormat (msgID, language);
    char[] reorderString = LookupNLSReorder(msgID, language);
    vwritef(formatString, doArgumentReorder(reorderString, <args>));

The advantage here is that you can do reordering for NLS support but writef stays simple.
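Russ's two-step approach can be sketched as follows, assuming a present-day D2 compiler and Phobos and string arguments only. The shape of the reorder string (a comma-separated list of 1-based positions such as "1,3,2") is an assumption made for this sketch, since the post does not define it. doArgumentReorder here is a stand-in for the function Russ names, and fillSequential is a hypothetical substitute for a formatter that only understands plain, in-order `%s`.

```d
import std.algorithm : findSplit, splitter;
import std.array : appender;
import std.conv : to;

/// Stand-in for the doArgumentReorder of the post. The order string is
/// assumed to be a comma-separated list of 1-based positions, e.g. "1,3,2".
string[] doArgumentReorder(string order, string[] args)
{
    string[] result;
    foreach (field; order.splitter(','))
    {
        immutable n = field.to!size_t;   // throws ConvException on bad input
        if (n < 1 || n > args.length)
            throw new Exception("argument index out of range: " ~ field);
        result ~= args[n - 1];
    }
    return result;
}

/// Hypothetical simple formatter: fills each "%s" with the next argument, in order.
string fillSequential(string fmt, string[] args)
{
    auto result = appender!string();
    foreach (arg; args)
    {
        auto parts = fmt.findSplit("%s");
        if (parts[1].length == 0) break;   // no placeholder left
        result.put(parts[0]);
        result.put(arg);
        fmt = parts[2];
    }
    result.put(fmt);   // whatever follows the last placeholder
    return result.data;
}

unittest
{
    // Two pieces of per-language data: the format string and the reorder string.
    auto english = fillSequential("%s %s %s",
        doArgumentReorder("1,2,3", ["the", "red", "house"]));
    auto french = fillSequential("%s %s %s",
        doArgumentReorder("1,3,2", ["la", "rouge", "maison"]));
    assert(english == "the red house");
    assert(french == "la maison rouge");
}
```

The formatter stays simple, as Russ says, at the cost of keeping the format string and the reorder string in step for every message.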
August 18, 2004 Re: Transcoding - Summary | ||||
---|---|---|---|---|
| ||||
Posted in reply to Russ Lewis

On Tue, 17 Aug 2004 19:45:47 -0700, Russ Lewis <spamhole-2001-07-16@deming-os.org> wrote:
> Regan Heath wrote:
>> Did you miss the thread that mentioned that sentence structure in various languages differ?
>> Example:
>>
>> english :- "The DOG is BIG"
>> other :- ".. BIG .. DOG"
>>
>> (I don't actually know any other languages)
>>
>> So, it would be kind of useful to be able to define the format strings as:
>>
>> english :- "The $1 is $2"
>> other :- ".. $2 .. $1"
>>
>> and be able to go:
>>
>> printf(format[lang_id],"DOG","BIG");
>
> This isn't strictly a requirement of the formatting tools. Perhaps a library function which, given a number of varargs, reordered them and passed them to another function?
>
> Your code could look (very roughly) like this:
>
> char[] formatString = LookupNLSFormat (msgID, language);
> char[] reorderString = LookupNLSReorder(msgID, language);
> vwritef(formatString, doArgumentReorder(reorderString, <args>));
>
> The advantage here is that you can do reordering for NLS support but writef stays simple.

The disadvantage being that the above idea is harder to maintain, there are 2 things that define how the message is displayed, 2 things in which a mistake could be made, 2 things in which you have to make changes, ..

How hard or complex is it to implement a writef that can do:

    writef("The %1 is %2","dog","big");

(%1 and %2 can be changed to any symbol that fits with the current symbol set used in writef)

I can't see it being a particularly big leap from what it currently does.

Also consider:

    writef("A really long %1 that contains the same %1 several times. %1's like this could be quite common, yes?","string");

Regan

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
August 18, 2004 POSIX printf() (was Re: Transcoding - Summary) | ||||
---|---|---|---|---|
| ||||
Posted in reply to Regan Heath

In article <opscwr50rl5a2sq9@digitalmars.com>, Regan Heath says...
>
>On Tue, 17 Aug 2004 19:45:47 -0700, Russ Lewis
>> This isn't strictly a requirement of the formatting tools. Perhaps a library function which, given a number of varargs, reordered them and passed them to another function?
>>
>> Your code could look (very roughly) like this:
>>
>> char[] formatString = LookupNLSFormat (msgID, language);
>> char[] reorderString = LookupNLSReorder(msgID, language);
>> vwritef(formatString, doArgumentReorder(reorderString, <args>));
>>
>The disadvantage being that the above idea is harder to maintain, there are 2 things that define how the message is displayed, 2 things in which a mistake could be made, 2 things in which you have to make changes, ..
>
>How hard or complex is it to implement a writef that can do:
>
> writef("The %1 is %2","dog","big");
>
>(%1 and %2 can be changed to any symbol that fits with the current symbol set used in writef)
>
>I can't see it being a particularly big leap from what it currently does.
>
>
>Also consider:
>
> writef("A really long %1 that contains the same %1 several times. %1's like this could be quite common, yes?","string");

Well, I didn't mean to cause trouble here. :)

Anyway. I'm agreeing with Regan, and slightly disagreeing with Walter. There /is/ a need to be able to do:

# // English
# article = "the";
# adjective = "red";
# noun = "house";
# formatString = "%s %s %s"; // default order
#
# // French
# article = "la";
# adjective = "rouge";
# noun = "maison";
# formatString = "%(1)s %(3)s %(2)s";
#
# writef(formatString, article, adjective, noun);

Sorry, but that's a requirement. It's not an /urgent/ requirement, but you can bet vast sums of money that internationalization will start to become more and more of an issue once other transcoding issues have been dealt with.

Russ's idea is good, but obviously not /as/ good as simply coming up with an improved printf() replacement. Right now, POSIX-printf() can do this random-access, but D's writef() can't. It's not urgent, and we'll solve it in time. But it /is/ an internationalization issue, and it won't go away.

Arcane Jill
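For readers on a current toolchain: present-day Phobos std.format accepts POSIX-style positional specifiers of the form `%1$s`, so the English/French example above can be written roughly as below. This is a sketch assuming a D2 compiler, not a description of the 2004 writef() the thread is discussing. The helper nounPhraseFormat and its language codes are hypothetical, standing in for a per-locale message lookup, and the `%(1)s` spelling from the post is replaced by the `%1$s` spelling that std.format understands.

```d
import std.format : format;

/// Hypothetical message lookup; a real application would load these per locale.
string nounPhraseFormat(string lang)
{
    return lang == "fr"
        ? "%1$s %3$s %2$s"    // article, noun, adjective
        : "%1$s %2$s %3$s";   // article, adjective, noun
}

unittest
{
    // The call site always passes article, adjective, noun in the same order;
    // only the format string differs between languages.
    assert(format(nounPhraseFormat("en"), "the", "red", "house") == "the red house");
    assert(format(nounPhraseFormat("fr"), "la", "rouge", "maison") == "la maison rouge");
}
```

Only one piece of per-language data is needed here, which is the maintenance point Regan raises against a separate reorder string.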
August 18, 2004 Re: Transcoding - Summary | ||||
---|---|---|---|---|
| ||||
Posted in reply to Arcane Jill

Jill ~ I have a utf-8 transcoder that I'm using as a plaything within Mango; if you're interested, I'll send it on.
Copyright © 1999-2021 by the D Language Foundation