Transcoding - Summary - D Programming Language Discussion Forum

August 17, 2004

Transcoding - Summary

Posted by Arcane Jill

We have two separate problems:
(1) formatted I/O
(2) unformatted I/O

For unformatted I/O, we need the ability to read a sequence of dchars from some source, and the ability to write a sequence of dchars to some sink. The class which acts as a dchar source must perform decoding from some underlying ubyte source. The class which acts as a dchar sink must perform encoding to some underlying ubyte sink.
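To make the layering concrete, here is a minimal sketch of the decoding side, written for present-day D (D2/Phobos, not the 2004 compiler). The interface and class names (`UbyteSource`, `DcharSource`, `ArrayUbyteSource`, `Utf8Decoder`) are hypothetical and are not part of std.stream or mango.io. The decoder assumes well-formed UTF-8 and does no validation.

```d
import std.stdio : writeln;

// Hypothetical "ubyte source" (input stream) interface.
interface UbyteSource
{
    // Returns false when no more bytes are available.
    bool get(out ubyte b);
}

// Hypothetical "dchar source" (reader) interface.
interface DcharSource
{
    // Returns false at end of input.
    bool get(out dchar c);
}

// A ubyte source backed by an in-memory array (e.g. the bytes of a string).
class ArrayUbyteSource : UbyteSource
{
    private const(ubyte)[] data;
    this(const(ubyte)[] data) { this.data = data; }

    bool get(out ubyte b)
    {
        if (data.length == 0) return false;
        b = data[0];
        data = data[1 .. $];
        return true;
    }
}

// A dchar source that decodes UTF-8 from an underlying ubyte source.
// Sketch only: malformed or overlong sequences are not detected.
class Utf8Decoder : DcharSource
{
    private UbyteSource src;
    this(UbyteSource src) { this.src = src; }

    bool get(out dchar c)
    {
        ubyte b;
        if (!src.get(b)) return false;

        // The lead byte says how many continuation bytes follow.
        int extra;
        uint value;
        if (b < 0x80)      { value = b;        extra = 0; }
        else if (b < 0xE0) { value = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { value = b & 0x0F; extra = 2; }
        else               { value = b & 0x07; extra = 3; }

        foreach (i; 0 .. extra)
        {
            if (!src.get(b)) return false;   // truncated input
            value = (value << 6) | (b & 0x3F);
        }
        c = cast(dchar) value;
        return true;
    }
}

void main()
{
    auto dec = new Utf8Decoder(new ArrayUbyteSource(cast(const(ubyte)[]) "h\u00E9\u20AC"));
    dchar c;
    while (dec.get(c))
        writeln(cast(uint) c);   // 104, 233, 8364 (one per line)
}
```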

The source and sink could be anything - a string; a console; a file; a socket; - even a simple counter which counts bytes and throws away data. So, to keep things generic, I shall use the terms "ubyte source", "ubyte sink", "dchar source" and "dchar sink". The traditional terms are:

ubyte source = input stream
ubyte sink = output stream
dchar source = reader
dchar sink = writer

(I'm using new terms merely in order to avoid confusion with objects in std.stream, mango.io, and Java).
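The sink side mirrors the source side. This sketch (present-day D again, with hypothetical names `UbyteSink`, `DcharSink`, `CountingSink`, `ArraySink` and `Utf8Encoder`) encodes each dchar as UTF-8 and hands the bytes to a ubyte sink. One of the sinks is the "simple counter" that counts bytes and throws the data away. It assumes every dchar is a valid code point and does no surrogate checking.

```d
import std.stdio : writeln;

// Hypothetical "ubyte sink" (output stream) and "dchar sink" (writer).
interface UbyteSink
{
    void put(ubyte b);
}

interface DcharSink
{
    void put(dchar c);
}

// The simple counter: counts bytes and discards the data.
class CountingSink : UbyteSink
{
    size_t count;
    void put(ubyte b) { ++count; }
}

// A sink that keeps the bytes, handy for inspecting the encoding.
class ArraySink : UbyteSink
{
    ubyte[] data;
    void put(ubyte b) { data ~= b; }
}

// A dchar sink that encodes to UTF-8 on an underlying ubyte sink.
class Utf8Encoder : DcharSink
{
    private UbyteSink dst;
    this(UbyteSink dst) { this.dst = dst; }

    void put(dchar c)
    {
        uint v = c;
        if (v < 0x80)
        {
            dst.put(cast(ubyte) v);
        }
        else if (v < 0x800)
        {
            dst.put(cast(ubyte)(0xC0 | (v >> 6)));
            dst.put(cast(ubyte)(0x80 | (v & 0x3F)));
        }
        else if (v < 0x10000)
        {
            dst.put(cast(ubyte)(0xE0 | (v >> 12)));
            dst.put(cast(ubyte)(0x80 | ((v >> 6) & 0x3F)));
            dst.put(cast(ubyte)(0x80 | (v & 0x3F)));
        }
        else
        {
            dst.put(cast(ubyte)(0xF0 | (v >> 18)));
            dst.put(cast(ubyte)(0x80 | ((v >> 12) & 0x3F)));
            dst.put(cast(ubyte)(0x80 | ((v >> 6) & 0x3F)));
            dst.put(cast(ubyte)(0x80 | (v & 0x3F)));
        }
    }
}

void main()
{
    auto counter = new CountingSink;
    DcharSink w = new Utf8Encoder(counter);
    foreach (dchar c; "h\u00E9\u20AC"d)
        w.put(c);
    writeln(counter.count);   // 6 bytes: 1 + 2 + 3
}
```

Because the encoder only sees the `UbyteSink` interface, the same class can sit on top of a file, a socket or a string buffer without change.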

For formatted I/O, we need:
(1a) a replacement for printf() which emits a formatted sequence of dchars to an arbitrary dchar sink
(1b) a replacement for scanf() which parses a sequence of dchars obtained from an arbitrary dchar source

Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.

Observe that if the output of (1a) is plumbed into an encoder, and the input to (1b) is plumbed into a decoder, then formatted transcoding is achieved. This makes our printf/scanf replacements relatively easy to write. They are likely to require very little modification from the existing format()/unformat() routines; essentially the only difference is that they must be dchar-based, not char-based. (Random access to the arguments would, however, be a new feature, though not necessarily an urgent one.)
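A rough sketch of that plumbing in present-day D. `dformat` is a hypothetical dchar-based writef replacement; for brevity it reuses `std.format.format` and decodes the result, whereas a real replacement would format directly into dchars. The hypothetical `Utf8Encoder` uses `std.utf.encode` to turn each dchar back into bytes, with a `ubyte[]` standing in for the underlying ubyte sink.

```d
import std.format : format;
import std.stdio : writeln;
import std.utf : encode;

// A dchar sink, modelled here as a delegate (an illustrative choice).
alias DcharSink = void delegate(dchar);

// Hypothetical dchar-based writef replacement: formats its arguments and
// hands the result to the sink one dchar at a time.
void dformat(Args...)(DcharSink sink, string fmt, Args args)
{
    foreach (dchar c; format(fmt, args))   // foreach over dchar decodes UTF-8
        sink(c);
}

// Encoder plumbed onto the sink: turns each dchar into UTF-8 bytes.
class Utf8Encoder
{
    ubyte[] bytes;   // stands in for an underlying ubyte sink

    void put(dchar c)
    {
        char[4] buf;
        const n = encode(buf, c);   // number of UTF-8 code units written
        bytes ~= cast(ubyte[]) buf[0 .. n];
    }
}

void main()
{
    auto enc = new Utf8Encoder;
    dformat(&enc.put, "%s costs %d\u20AC", "caf\u00E9", 3);
    writeln(cast(string) enc.bytes);
}
```

Swapping `Utf8Encoder` for, say, a UTF-16 or Latin-1 encoder would change the output encoding without touching `dformat`, which is the point of keeping the formatter dchar-based.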

Another oft-voiced requirement is that transcoding be independent of any particular string/stream implementation. (I suspect that if Phobos streams were fully-featured, fully-documented, bug-free and intuitive, then nobody would be asking for this requirement. But as things are, the requirement is there).

So ... listed below are the jobs which need to be done. Volunteers are requested for any unclaimed jobs:

(1) The source and sink interfaces need to be nailed down.
(2) Given (1), dchar-based format()/unformat() replacements can be written.
(3) Given (1), encoder and decoder classes/interfaces can be written.
(4) Given (3), classes can be written to attach our encoders/decoders to std and mango streams, to strings, etc.
(5) Given (3), encoders and decoders for SPECIFIC encodings can now be written.
(6) Will somebody /please/ document std.Stream?

I volunteer for (1) and (3). I'm hoping Sean will volunteer for (2). AntiAlias's excellent ideas for throughput enhancement using buffers are part of (1) and (3), so I suggest AntiAlias and I send each other code back and forth until we are both happy with it.

Volunteers are still needed for (4), (5) and (6) (though (4) and (5) depend on (3)). Anyone who's a dab hand at Wiki might like to volunteer for (6).

Arcane Jill

August 17, 2004

Re: Transcoding - Summary

Posted by Arcane Jill
in reply to Arcane Jill

In article <cfsm6d$va0$1@digitaldaemon.com>, Arcane Jill says...

>(1) The source and sink interfaces need to be nailed down.
>(2) Given (1), dchar-based format()/unformat() replacements can be written.
>(3) Given (1), encoder and decoder classes/interfaces can be written.
>(4) Given (3), classes can be written to attach our encoders/decoders to std and
>mango streams, to strings, etc.
>(5) Given (3), encoders and decoders for SPECIFIC encodings can now be written.
>(6) Will somebody /please/ document std.Stream?

Nick, I think your work falls into category (5). If you want that job, I guess
it's yours, but if so, please wait for (3) before you start.

Jill

August 17, 2004

Re: Transcoding - Summary

Posted by Derek
in reply to Arcane Jill

On Tue, 17 Aug 2004 10:21:01 +0000 (UTC), Arcane Jill wrote:

> We have two separate problems:
> (1) formatted I/O
> (2) unformatted I/O

I hope I'm not stating the bleeding obvious, but you are talking about TEXT I/O, aren't you? There is also a lot of other I/O that is not text-based: sound and image files, databases, etc.

-- 
Derek
Melbourne, Australia

August 18, 2004

Re: Transcoding - Summary

Posted by Walter
in reply to Arcane Jill

"Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cfsm6d$va0$1@digitaldaemon.com...
> Further, for reasons of internationalization, our printf replacement must be
> able to random-access its variadic arguments.

I disagree with this requirement. It breaks the nice way that std.format works. The only place where reordering the arguments is useful is in date/time formatting, and a specialized formatter would be suitable for that (and there are many other nice things one can do with a specialized date/time formatter).
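For comparison, a minimal sketch of the kind of specialised date formatter Walter describes, in present-day D. `formatDate` and its `yyyy`/`mm`/`dd` pattern letters are hypothetical. The locale-specific pattern fixes the field order, so callers always pass year, month and day in the same order and no argument reordering is needed.

```d
import std.array : replace;
import std.format : format;
import std.stdio : writeln;

// Hypothetical specialised date formatter: the pattern decides where each
// field appears; the argument order never changes.
string formatDate(string pattern, int year, int month, int day)
{
    return pattern
        .replace("yyyy", format("%04d", year))
        .replace("mm", format("%02d", month))
        .replace("dd", format("%02d", day));
}

void main()
{
    writeln(formatDate("dd/mm/yyyy", 2004, 8, 17));   // e.g. a UK-style pattern
    writeln(formatDate("mm/dd/yyyy", 2004, 8, 17));   // e.g. a US-style pattern
}
```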

August 18, 2004

Re: Transcoding - Summary

Posted by Regan Heath
in reply to Walter

On Tue, 17 Aug 2004 15:00:28 -0700, Walter <newshound@digitalmars.com> wrote:
> "Arcane Jill" <Arcane_member@pathlink.com> wrote in message
> news:cfsm6d$va0$1@digitaldaemon.com...
>> Further, for reasons of internationalization, our printf replacement must be
>> able to random-access its variadic arguments.
>
> I disagree with this requirement. It breaks the nice way that std.format
> works. The only place where reordering the arguments is useful is in
> date/time formatting, and a specialized formatter would be suitable for that
> (and there are many other nice things one can do with a specialized
> date/time formatter).

Did you miss the thread that mentioned that sentence structure differs between languages?
Example:

  english :- "The DOG is BIG"
  other   :- ".. BIG .. DOG"

(I don't actually know any other languages)

So, it would be kind of useful to be able to define the format strings as:

  english :- "The $1 is $2"
  other   :- ".. $2 .. $1"

and be able to go:

  printf(format[lang_id],"DOG","BIG");
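A minimal sketch of the `$n` idea in present-day D, restricted to string arguments. `substitute` is a hypothetical helper, not printf or writef. It is 1-based like the example and lets the same `$n` appear more than once.

```d
import std.array : appender;
import std.ascii : isDigit;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: replaces $1, $2, ... in `fmt` with the matching
// (1-based) argument. The same $n may be used any number of times.
string substitute(string fmt, string[] args...)
{
    auto result = appender!string();
    size_t i = 0;
    while (i < fmt.length)
    {
        if (fmt[i] == '$' && i + 1 < fmt.length && isDigit(fmt[i + 1]))
        {
            // Read the whole number after '$', so $10 works as well as $1.
            size_t j = i + 1;
            while (j < fmt.length && isDigit(fmt[j]))
                ++j;
            const n = to!size_t(fmt[i + 1 .. j]);
            if (n == 0 || n > args.length)
                throw new Exception("no argument for $" ~ fmt[i + 1 .. j]);
            result.put(args[n - 1]);
            i = j;
        }
        else
        {
            // '$' and digits are ASCII, so copying byte by byte keeps UTF-8 intact.
            result.put(fmt[i]);
            ++i;
        }
    }
    return result.data;
}

void main()
{
    // One format string per language, as in the example above.
    string[] format = ["The $1 is $2", ".. $2 .. $1"];
    foreach (f; format)
        writeln(substitute(f, "DOG", "BIG"));
}
```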

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

August 18, 2004

Re: Transcoding - Summary

Posted by Derek Parnell
in reply to Walter

On Tue, 17 Aug 2004 15:00:28 -0700, Walter wrote:

> "Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cfsm6d$va0$1@digitaldaemon.com...
>> Further, for reasons of internationalization, our printf replacement must be
>> able to random-access its variadic arguments.
> 
> I disagree with this requirement. It breaks the nice way that std.format works. The only place where reordering the arguments is useful is in date/time formatting, and a specialized formatter would be suitable for that (and there are many other nice things one can do with a specialized date/time formatter).

I think that AJ was suggesting that there is a business need for a type of formatter that can express, in its template, the order in which arguments will appear in the resulting string, regardless of the order in which they are presented to the formatter.

For example (contrived for simplicity):

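   char[] Msg;
   char[] temp;

   if (gUserLang == LANG_english)
     temp = "%{1}s %{2}s %{3}s %{4}s %{5}s\n";
   else
     temp = "%{2}s %{1}s %{5}s %{4}s %{3}s\n";

   // expand() is the proposed formatter: each %{n}s is replaced by argument n
   Msg = expand(temp, pSubjectDesc, pSubject, pVerb, pObjectDesc, pObject);
   writef("%s", Msg);   // print Msg as data, not as a format string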
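`expand()` above is a proposal, not an existing function. A minimal sketch of how it might be written in present-day D shows that "random access" to variadic arguments is straightforward with a template: the argument list is walked by index and the n-th one is picked at run time. Only the `%{n}s` form is handled, and every argument is converted with `std.conv.to!string`.

```d
import std.array : appender;
import std.conv : to;
import std.stdio : write;

// Sketch of the hypothetical expand(): "%{n}s" is replaced by the n-th
// (1-based) argument as text; all other characters are copied unchanged.
string expand(Args...)(string tmpl, Args args)
{
    auto result = appender!string();
    size_t i = 0;
    while (i < tmpl.length)
    {
        if (tmpl[i] == '%' && i + 1 < tmpl.length && tmpl[i + 1] == '{')
        {
            size_t close = i + 2;
            while (close < tmpl.length && tmpl[close] != '}')
                ++close;
            // Expect "}s" after the number.
            if (close + 1 >= tmpl.length || tmpl[close + 1] != 's')
                throw new Exception("malformed %{n}s specifier");
            const n = to!size_t(tmpl[i + 2 .. close]);

            // Random access into the argument list: select argument n at run time.
            bool found = false;
            foreach (k, arg; args)
            {
                if (k + 1 == n)
                {
                    result.put(to!string(arg));
                    found = true;
                }
            }
            if (!found)
                throw new Exception("no argument %{" ~ tmpl[i + 2 .. close] ~ "}");
            i = close + 2;
        }
        else
        {
            result.put(tmpl[i]);
            ++i;
        }
    }
    return result.data;
}

void main()
{
    bool english = true;   // stands in for gUserLang == LANG_english
    string temp = english ? "%{1}s %{2}s\n" : "%{2}s %{1}s\n";
    write(expand(temp, "big", "dog"));   // prints "big dog"
}
```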

-- 
Derek
Melbourne, Australia
18/Aug/04 10:31:55 AM

August 18, 2004

Re: Transcoding - Summary

Posted by Russ Lewis
in reply to Regan Heath

Regan Heath wrote:
> Did you miss the thread that mentioned that sentence structure differs between languages?
> Example:
> 
>   english :- "The DOG is BIG"
>   other   :- ".. BIG .. DOG"
> 
> (I don't actually know any other languages)
> 
> So, it would be kind of useful to be able to define the format strings as:
> 
>   english :- "The $1 is $2"
>   other   :- ".. $2 .. $1"
> 
> and be able to go:
> 
>   printf(format[lang_id],"DOG","BIG");

This isn't strictly a requirement of the formatting tools.  Perhaps a library function which, given a number of varargs, reordered them and passed them to another function?

Your code could look (very roughly) like this:

    char[] formatString  = LookupNLSFormat (msgID, language);
    char[] reorderString = LookupNLSReorder(msgID, language);
    vwritef(formatString, doArgumentReorder(reorderString, <args>));

The advantage here is that you can do reordering for NLS support but writef stays simple.
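A minimal sketch of Russ's approach in present-day D. In the snippet above, `LookupNLSFormat`, `LookupNLSReorder` and `doArgumentReorder` are hypothetical and `<args>` is a placeholder. Here the reorder string is assumed to be a comma-separated list of 1-based argument numbers, the arguments are assumed to be pre-converted to strings, and `sequentialFormat` is a tiny stand-in for vwritef that only fills `%s` slots in order.

```d
import std.array : split;
import std.conv : to;
import std.stdio : writeln;
import std.string : strip;

// Hypothetical doArgumentReorder: `reorder` lists, slot by slot, which
// (1-based) argument goes there, e.g. "1,3,2" swaps the 2nd and 3rd.
string[] doArgumentReorder(string reorder, string[] args...)
{
    string[] result;
    foreach (field; reorder.split(","))
        result ~= args[to!size_t(field.strip) - 1];
    return result;
}

// Stand-in for vwritef: fills each "%s" strictly in order, so the
// formatter itself needs no positional syntax.
string sequentialFormat(string fmt, string[] args)
{
    string result;
    size_t next = 0;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        if (fmt[i] == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
        {
            result ~= args[next++];
            ++i;   // skip the 's'
        }
        else
        {
            result ~= fmt[i];
        }
    }
    return result;
}

void main()
{
    // In real code both strings would come from a per-language lookup.
    string formatString  = "%s %s %s";
    string reorderString = "1,3,2";   // French: article, noun, adjective
    auto args = doArgumentReorder(reorderString, "la", "rouge", "maison");
    writeln(sequentialFormat(formatString, args));   // la maison rouge
}
```

The trade-off Regan raises below is visible here: the message's layout now lives in two strings that must be kept in step.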

August 18, 2004

Re: Transcoding - Summary

Posted by Regan Heath
in reply to Russ Lewis

On Tue, 17 Aug 2004 19:45:47 -0700, Russ Lewis <spamhole-2001-07-16@deming-os.org> wrote:
> Regan Heath wrote:
>> Did you miss the thread that mentioned that sentence structure differs between languages?
>> Example:
>>
>>   english :- "The DOG is BIG"
>>   other   :- ".. BIG .. DOG"
>>
>> (I don't actually know any other languages)
>>
>> So, it would be kind of useful to be able to define the format strings as:
>>
>>   english :- "The $1 is $2"
>>   other   :- ".. $2 .. $1"
>>
>> and be able to go:
>>
>>   printf(format[lang_id],"DOG","BIG");
>
> This isn't strictly a requirement of the formatting tools.  Perhaps a library function which, given a number of varargs, reordered them and passed them to another function?
>
> Your code could look (very roughly) like this:
>
>      char[] formatString  = LookupNLSFormat (msgID, language);
>      char[] reorderString = LookupNLSReorder(msgID, language);
>      vwritef(formatString, doArgumentReorder(reorderString, <args>));
>
> The advantage here is that you can do reordering for NLS support but writef stays simple.

The disadvantage is that the above idea is harder to maintain: there are 2 things that define how the message is displayed, 2 things in which a mistake could be made, 2 things in which you have to make changes, ..

How hard or complex is it to implement a writef that can do:

  writef("The %1 is %2","dog","big");

(%1 and %2 can be changed to any symbol that fits with the current symbol set used in writef)

I can't see it being a particularly big leap from what it currently does.


Also consider:

  writef("A really long %1 that contains the same %1 several times. %1's like this could be quite common, yes?","string");

Regan

August 18, 2004

POSIX printf() (was Re: Transcoding - Summary)

Posted by Arcane Jill
in reply to Regan Heath

In article <opscwr50rl5a2sq9@digitalmars.com>, Regan Heath says...
>
>On Tue, 17 Aug 2004 19:45:47 -0700, Russ Lewis

>> This isn't strictly a requirement of the formatting tools.  Perhaps a library function which, given a number of varargs, reordered them and passed them to another function?
>>
>> Your code could look (very roughly) like this:
>>
>>      char[] formatString  = LookupNLSFormat (msgID, language);
>>      char[] reorderString = LookupNLSReorder(msgID, language);
>>      vwritef(formatString, doArgumentReorder(reorderString, <args>));
>>
>The disadvantage is that the above idea is harder to maintain: there are 2 things that define how the message is displayed, 2 things in which a mistake could be made, 2 things in which you have to make changes, ..
>
>How hard or complex is it to implement a writef that can do:
>
>   writef("The %1 is %2","dog","big");
>
>(%1 and %2 can be changed to any symbol that fits with the current symbol set used in writef)
>
>I can't see it being a particularly big leap from what it currently does.
>
>
>Also consider:
>
>   writef("A really long %1 that contains the same %1 several times. %1's like this could be quite common, yes?","string");


Well, I didn't mean to cause trouble here. :)

Anyway. I'm agreeing with Regan, and slightly disagreeing with Walter. There /is/ a need to be able to do:

#    // English
#    article = "the";
#    adjective = "red";
#    noun = "house";
#    formatString = "%s %s %s"; // default order
#
#    // French
#    article = "la";
#    adjective = "rouge";
#    noun = "maison";
#    formatString = "%(1)s %(3)s %(2)s";
#
#    writef(formatString, article, adjective, noun);

Sorry, but that's a requirement. It's not an /urgent/ requirement, but you can bet vast sums of money that internationalization will start to become more and more of an issue once other transcoding issues have been dealt with.

Russ's idea is good, but obviously not /as/ good as simply coming up with an improved printf() replacement. Right now, POSIX printf() can do this random access, but D's writef() can't.

It's not urgent, and we'll solve it in time. But it /is/ an internationalization issue, and it won't go away.

Arcane Jill

August 18, 2004

Re: Transcoding - Summary

Posted by antiAlias
in reply to Arcane Jill

Jill ~ I have a utf-8 transcoder that I'm using as a plaything within Mango; if you're interested, I'll send it on.

Copyright © 1999-2021 by the D Language Foundation