September 24, 2004
readf/unformat 1.4 released
For those of you who don't know, readf began as an attempt at a full
C99-compliant scanf implementation in D.  It has since been renamed to match the
Phobos writef/format functions a bit more closely, and this version attempts to
bring usage a bit closer to writef.  What's new in this version:

- Support for negative zero and negative infinity (the previous version ignored
the sign in these cases).  This version also does not allow the optional sign to
appear before "NAN", as IMO it's meaningless, so "+NAN" and "-NAN" will both
cause an error.  This is contrary to the C99 spec; if you don't like it, please
let me know.
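For comparison with the C99 behavior mentioned above: C99's strtod, whose rules scanf's floating-point conversions follow, accepts an optional sign before INF and NAN and preserves the sign of a negative zero. A minimal C sketch (C rather than D, to match the C99 reference), assuming a C99 libc such as glibc; `parses_fully` and `is_negative_zero` are hypothetical helper names:

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if strtod consumed all of s, storing the value in *out. */
int parses_fully(const char *s, double *out)
{
    char *end;
    *out = strtod(s, &end);
    return end != s && *end == '\0';
}

/* Returns 1 for a negative zero: it compares equal to 0.0 but has its sign bit set. */
int is_negative_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}
```

With such a libc, "+NAN" and "-nan" both parse, which is exactly the case readf 1.4 now rejects.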

- Format strings are no longer necessary.  Default formats are:
%s: char arrays
%c: char pointers
%i: integer/bit
%f: floating point

unFormat still does not throw an exception on a parameter mismatch; it returns
immediately instead.  This is the only interface difference I know of between
this package and doFormat/writef.  The format string parsing is still fully
scanf-compliant, so some format specifiers are redundant.  Check the C99 spec or
the included text file to see what each specifier does.
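As an illustration of that redundancy, C99 scanf treats %e, %f and %g (and their upper-case forms) as the same conversion for a float* target. A small C sketch, assuming a C99 libc; `scan_with` is a hypothetical helper:

```c
#include <stdio.h>

/* Hypothetical helper: scans one float from src using a single-conversion format.
   Returns the value, or -1.0f if the conversion failed. */
float scan_with(const char *src, const char *fmt)
{
    float f;
    if (sscanf(src, fmt, &f) != 1)
        return -1.0f;
    return f;
}
```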

By the way, the code currently assumes input data to be in UTF-8 or UTF-16
(native) format, as it uses the Phobos toUTFXX functions for conversion.  The
package consists of two implementation files, unformat.d and stdio.d, plus a
custom utf.d that supports delegates.  The format string and all incoming data
are converted to UTF-32 before evaluation to simplify comparison.  As usual,
please let me know what you think.  The file is here:

http://home.f4.ca/sean/d/stdio.zip


Sean
September 24, 2004
Re: readf/unformat 1.4 released
"Sean Kelly" <sean@f4.ca> wrote in message
news:cj1of5$cjp$1@digitaldaemon.com...
> For those of you who don't know, readf began as an attempt at a full
> C99-compliant scanf implementation in D.
[...]

How does unFormat take advantage of D's _arguments feature (if it does)? I'm
not quite sure why the parsing code needs to think about %s or %i or
whatever since it can look at the type of the target variable. If it sees
int* then it parses an int and if it sees char[]* it parses a string. The
only role of the format would be to specify where to parse and where to
match literal characters.

-Ben
September 24, 2004
Re: readf/unformat 1.4 released
In article <cj1r40$e0k$1@digitaldaemon.com>, Ben Hinkle says...
>
>How does unFormat take advantage of D's _arguments feature (if it does)? I'm
>not quite sure why the parsing code needs to think about %s or %i or
>whatever since it can look at the type of the target variable. If it sees
>int* then it parses an int and if it sees char[]* it parses a string. The
>only role of the format would be to specify where to parse and where to
>match literal characters.

Format strings can also specify how to parse the incoming data.  Integers, for
example, have a bunch of different format specifiers for different types of
input.  I chose "%i" as the default, since it's the most flexible, but "%d"
specifies decimal numbers only, "%o" is octal, you can include width specifiers,
etc.  I also may have forgotten to allow a bit to be parsed as a string (%s) to
convert "true" and "false" to 1 and 0, respectively.
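The integer specifiers differ only in which input forms they accept. A C99 sketch of the cases listed above (C rather than D, for direct comparison with scanf); `scan_int` and `scan_uint` are hypothetical helpers, and %o gets an unsigned target because that is what C requires:

```c
#include <stdio.h>

/* Hypothetical helper: scans one int with fmt (%i, %d, ...); returns -1 on failure. */
int scan_int(const char *src, const char *fmt)
{
    int n;
    if (sscanf(src, fmt, &n) != 1)
        return -1;
    return n;
}

/* Same for unsigned conversions such as %o and %x; returns (unsigned)-1 on failure. */
unsigned scan_uint(const char *src, const char *fmt)
{
    unsigned n;
    if (sscanf(src, fmt, &n) != 1)
        return (unsigned)-1;
    return n;
}
```

%i picks the base from the prefix (0x for hex, 0 for octal), %d is decimal only, and a width such as %3d limits how many characters are consumed.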

For a contrived example:

# int i, r;
# char[] s;
#
# r = sreadf( "0x1 hello", &i, &s );
# assert( r == 2 && i == 1 && s == "hello" );
#
# i = i.init; s = s.init;
# r = sreadf( "0x1 hello", "%d%*s", &i, &s );
# assert( r == 1 && i == 0 && s == "hello" );

In the second case, "%d" expects a decimal number, so it reads only the "0" and
stops at the "x", which is non-numeric.  The "%*s" indicates that a string should
be read but not assigned (which throws out the "x1"), and the final string is
read as normal because the format string has been exhausted.
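For comparison, here is the second call written against C99 sscanf. C has no default for leftover arguments, so the final string needs its own %15s (the width guards the 16-byte buffer). `contrived` is a hypothetical wrapper name:

```c
#include <stdio.h>

/* "%d" reads only the "0", "%*s" consumes "x1" without assigning it,
   and "%15s" skips the space and reads "hello".
   Returns sscanf's count of assigned conversions. */
int contrived(int *i, char s[16])
{
    return sscanf("0x1 hello", "%d%*s%15s", i, s);
}
```

C's return value counts only assigned conversions, so the suppressed %*s is not included in it.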

So in many cases there's no need to use format specifiers.  The code still uses
them internally even when one isn't supplied because it simplifies things, but
this is all invisible to the programmer.

unFormat takes advantage of the _arguments collection in two ways: it uses it to
determine what type is being written to (it will return if you try to read a
string into an integer, for example, whereas writef would throw a FormatError in
this situation), and to determine what to expect if no format string is supplied.
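C has no _arguments, but C11's _Generic can sketch the same idea at compile time: choose a default specifier from the pointer type of the target. This is only an analogy, not how unFormat is implemented; `DEFAULT_SPEC` and `READ_DEFAULT` are hypothetical macros:

```c
#include <stdio.h>

/* Hypothetical sketch: map the target's pointer type to a default scanf specifier.
   Strings are left out: a char buffer decays to char*, which looks the same as &c here. */
#define DEFAULT_SPEC(p) _Generic((p), \
    int *: "%i",                      \
    unsigned *: "%u",                 \
    float *: "%f",                    \
    double *: "%lf",                  \
    char *: "%c")

/* Reads one value of whatever type p points to, using its default specifier. */
#define READ_DEFAULT(src, p) sscanf((src), DEFAULT_SPEC(p), (p))
```

The run-time TypeInfo approach is more flexible, since it also lets format strings appear between the arguments.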

You can also do stuff like this:

# int i;
# char c;
# char[] s, t;
#
# sreadf( "0x1 hello", &i, "%2s", &s, "%2c", &t, &c );
# assert( i == 1 && s == "he" && t == "ll" && c == 'o' );

So there's no restriction on the number or the location of format strings.  All
arguments are evaluated left to right.
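The same read expressed as a single C99 sscanf call, as a sketch assuming standard scanf width semantics; `mixed` is a hypothetical wrapper. Note that %2c, unlike %2s, neither skips whitespace nor appends a terminator:

```c
#include <stdio.h>

/* %i reads "0x1" as hex, %2s skips the space and reads "he" (NUL-terminated),
   %2c reads exactly "ll" with no terminator, %c reads 'o'.
   Returns the number of assignments. */
int mixed(int *i, char s[3], char t[2], char *c)
{
    return sscanf("0x1 hello", "%i%2s%2c%c", i, s, t, c);
}
```

In C all the specifiers must be collected into one format string up front; the readf form lets each one sit next to the argument it applies to.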


Sean
September 24, 2004
Re: readf/unformat 1.4 released
"Sean Kelly" <sean@f4.ca> wrote in message
news:cj1thq$fd3$1@digitaldaemon.com...
[...]
> So there's no restriction on the number or the location of format strings.
> All arguments are evaluated left to right.

cool! so a C scanf call

scanf("%d %d",&i,&j)

can be any of

readf("%d %d",&i,&j)
readf("%d",&i,"%d",&j)
readf(&i,&j)

assuming i and j are ints. very nifty.
September 24, 2004
Re: readf/unformat 1.4 released
In article <cj219m$h14$1@digitaldaemon.com>, Ben Hinkle says...
>
>cool! so a C scanf call
>
> scanf("%d %d",&i,&j)
>
>can be any of
>
> readf("%d %d",&i,&j)
> readf("%d",&i,"%d",&j)
> readf(&i,&j)
>
>assuming i and j are ints. very nifty.

Yup.  I didn't think about it until just now, but this may come in handy for
internationalization, since the whole "%1" concept that's been talked about can
be faked just by reordering parameters.


Sean
March 17, 2005
Re: readf/unformat 1.4 released (updated)
Sean Kelly wrote: (back in 2004-09-24, that was)

> For those of you who don't know, readf began as an attempt at a full
> C99-compliant scanf implementation in D.  It's since been renamed to match the
> Phobos writef/format functions a bit more closely, and this version attempts to
> bring usage a bit closer to writef. 
[...]
> unFormat still will not throw an exception on parameter mismatch, but will
> return immediately instead.  This is the only interface issue I know of where
> this package diverges from doFormat/writef.  

I changed this package to break it into
std.stdio.readf and std.string.unformat...

I also made it throw exceptions on format errors (FormatError)
and on parameter mismatches, e.g. not passing pointers.


The missing TypeInfo for pointers, and the fact that you cannot
pass _arguments and _argptr around with GDC without losing info,
make it a horrible kludge at the moment - but it does work! ;-)
(currently unformat only works from within std.string, though...)


Sean Kelly's old declaration of unFormat was:

> int unFormat( bit delegate( out dchar ) getc,
>               bit delegate( dchar ) ungetc,
>               TypeInfo[] arguments,
>               void* argptr )

This was changed to use EOF and lose the "bit":

> void unFormat( dchar delegate() getc, dchar delegate(dchar) ungetc,
>               TypeInfo[] arguments, va_list argptr,
>               Mangle[] mangle = null, Mangle[] mangle2 = null)

(the last two parameters are part of the GDC kludge;
you should already know "va_list" from std.stdarg:)

> version (GNU) {
>     // va_list might be a pointer, but assuming so is not portable.
>     private import gcc.builtins;
>     alias __builtin_va_list va_list;
> } else {
>     alias void* va_list;
> }


You only provide two delegates: getc and ungetc,
which are very similar to their C counterparts...
(making the wrappers for fgetc and ungetc simple)

    dchar getc();
    dchar ungetc(dchar c);

Should an EOF occur, the "new" versions now return
cast(dchar) std.c.stdio.EOF, i.e. 0xFFFFFFFF as UTF-32
(which is not a valid code point, and thus "safe" here).
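A C sketch of the simple wrappers around fgetc/ungetc that this interface allows, assuming EOF is -1 (true of common libcs, although C only requires a negative value). They are byte-level only; `file_getc`, `file_ungetc` and `EOF32` are hypothetical names:

```c
#include <stdint.h>
#include <stdio.h>

/* End-of-input sentinel: EOF (-1) cast to 32 bits, i.e. 0xFFFFFFFF,
   which lies above the last Unicode code point (0x10FFFF). */
#define EOF32 ((uint32_t)EOF)

/* Byte-level wrapper: does not reassemble multi-byte UTF-8 sequences. */
uint32_t file_getc(FILE *f)
{
    int c = fgetc(f);
    return c == EOF ? EOF32 : (uint32_t)c;
}

/* ungetc only guarantees one byte of pushback, so only byte values are accepted.
   Returns c on success, EOF32 on failure. */
uint32_t file_ungetc(uint32_t c, FILE *f)
{
    if (c > 0xFF || ungetc((int)c, f) == EOF)
        return EOF32;
    return c;
}
```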

Otherwise, the internals work more or less as before
(except that it doesn't internalize the exceptions...)


Here is the "ideal" version of std.string.unformat,
ignoring the current GDC Mangling preprocessing hacks:

> void unformat(char[] s, ...)
> {
>     size_t idx = 0, old_idx;
> 	
>     dchar getc()
>     {
>     	old_idx = idx;
>     	if (idx >= s.length)
>     		return cast(dchar) EOF;
>     	return std.utf.decode(s, idx);
>     }
> 
>     dchar ungetc(dchar c)
>     {
>     	idx = old_idx;
>     	return c;
>     }
>
>     std.unformat.unFormat(&getc, &ungetc, _arguments, _argptr);
> }
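The nested getc/ungetc pair above, translated to C as a sketch: getc records where the last character began, and ungetc simply rewinds there, so exactly one character of pushback is supported. To stay short it reads one byte at a time instead of decoding UTF-8 as std.utf.decode does; `struct str_src`, `str_getc` and `str_ungetc` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define EOF32 ((uint32_t)EOF)

/* String-backed source: idx is the read position, old_idx where the last character began. */
struct str_src {
    const char *s;
    size_t len, idx, old_idx;
};

uint32_t str_getc(struct str_src *src)
{
    src->old_idx = src->idx;
    if (src->idx >= src->len)
        return EOF32;
    /* One byte at a time for brevity; a UTF-8 version would decode a full sequence here. */
    return (unsigned char)src->s[src->idx++];
}

/* Rewinds to the start of the last character read; c is returned unchanged. */
uint32_t str_ungetc(struct str_src *src, uint32_t c)
{
    src->idx = src->old_idx;
    return c;
}
```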

You can use this as: (very similar to "format")

int i, j;
unformat("1 2", "%d %d", &i, &j);
unformat("1 2", "%d", &i, "%d", &j);
unformat("1 2", &i, &j);
assert(i == 1 && j == 2);

Since i and j are ints, it'll default to "%d".


Then there is the readf function, which also works as expected:

> import std.stdio;
> 
> void main()
> {
>   char[] s;
>   write("What is your name: ");
>   readf("%s", &s);
>   writefln("Hello, %s!", s);
> }

Which inputs/outputs something like:
    What is your name: Anders
    Hello, Anders!
(yes, this is the actual D program)

Note that if you pass "s" instead of "&s", an Error will be thrown...
(this should stop the usual scanf bug of forgetting the & prefix?)

The program also uses the formatless version of writef called "write",
which doesn't treat '%' characters specially but just prints them out...


Once the new version of GDC is out, I will try to see whether the TypeInfo
passing can be fixed for that compiler too, and then post some code.

>  * Copyright (C) 2004 by Sean Kelly
>  * Copyright (C) 2005 by Anders F Bjoerklund
>  *
>  * Permission to use, copy, modify, distribute and sell this software
>  * and its documentation for any purpose is hereby granted without fee,
>  * provided that the above copyright notice appear in all copies and
>  * that both that copyright notice and this permission notice appear
>  * in supporting documentation.  Author makes no representations about
>  * the suitability of this software for any purpose. It is provided
>  * "as is" without express or implied warranty.

Original file came from: http://home.f4.ca/sean/d/stdio.zip
(it carries the open source license agreement reproduced above)


Thanks to Sean for doing the grunt-work with format parsing,
so we (still) don't have to rename it to "std.stdo" anymore :-)
--anders


PS. Yes, it uses pointers. Kris has already written C++-style
    bitshift-operator overloads for people who want that... ?

http://svn.dsource.org/svn/projects/mango/trunk/doc/html/classAbstractReader.html
http://svn.dsource.org/svn/projects/mango/trunk/doc/html/classAbstractWriter.html

    When (and if) D supports "out" arguments for variadic lists,
    the code can be changed to support those "out" vars instead.
    Although it would then also need some kind of R/O attribute to
    be able to differentiate between format strings and string params?
    Meanwhile, the pointers work just fine (and it checks the types!)
March 17, 2005
Re: readf/unformat 1.4 released (updated)
In article <d1bhnv$ek6$1@digitaldaemon.com>,
Anders F Björklund says...
>
>I changed this package to break it into
>std.stdio.readf and std.string.unformat...

Nice!  Is this version available online?


Sean
March 17, 2005
Re: readf/unformat 1.4 released (updated)
Sean Kelly wrote:

>>I changed this package to break it into
>>std.stdio.readf and std.string.unformat...
> 
> Nice!  Is this version available online?

Not yet, I have to clean it up and backport
it into the DMD release again...

(currently it's done against a tweaked GDC,
you see) And it *really* wants TypeInfo?


The main reason was to get opinions on:
1) changed getc delegate definitions
2) throwing exceptions on errors

Also, I still need to write the
multibyte wrappers of getc/ungetc
for file based streams (regular, as
well as the wide orientation kind)

--anders
March 17, 2005
Re: readf/unformat 1.4 released (updated)
In article <d1co48$1nlr$1@digitaldaemon.com>,
Anders F Björklund says...
>
>Sean Kelly wrote:
>
>>>I changed this package to break it into
>>>std.stdio.readf and std.string.unformat...
>> 
>> Nice!  Is this version available online?
>
>Not yet, have to clean it up and backport
>it back into the DMD release again...
>
>(currently it's done to a tweaked GDC,
>you see) And it *really* wants TypeInfo?

I'll have to re-evaluate TypeInfo in DMD.  I don't suppose it's working yet for
pointer types?  And why in the world doesn't GDC properly generate copies of the
TypeInfo array?

>Main reason was to get opinions on:
>1) changed getc delegate definitions

I mostly created the getc/ungetc specs as they were because they were easier to
embed in boolean expressions, i.e.:

if( !getc( ch ) ) return;

is easier to write than:

if( ( ch = getc() ) == WEOF ) return;

I figured the C function would need to be wrapped either way, so this seemed a
decent gain.  Especially since I think the reason the C routines are written the
way they are is because C lacks an output qualifier.  But it's mostly a cosmetic
issue, so it doesn't matter much to me either way.
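The two delegate conventions side by side, as a C sketch over a hypothetical in-memory source (`struct src`, `getc_out`, `getc_eof`, `count_out` and `count_eof` are illustrative names, not the package's API). The loop shapes mirror the two `if` tests above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define EOF32 ((uint32_t)EOF)

/* Hypothetical in-memory source shared by both styles. */
struct src { const char *s; size_t len, idx; };

/* Original style: returns false at end of input, delivers the character via an out pointer. */
bool getc_out(struct src *in, uint32_t *ch)
{
    if (in->idx >= in->len)
        return false;
    *ch = (unsigned char)in->s[in->idx++];
    return true;
}

/* Revised style: returns the character, or the EOF32 sentinel at end of input. */
uint32_t getc_eof(struct src *in)
{
    if (in->idx >= in->len)
        return EOF32;
    return (unsigned char)in->s[in->idx++];
}

/* Count the characters with each style, to show the two loop shapes. */
size_t count_out(struct src *in)
{
    uint32_t ch;
    size_t n = 0;
    while (getc_out(in, &ch))
        n++;
    return n;
}

size_t count_eof(struct src *in)
{
    uint32_t ch;
    size_t n = 0;
    while ((ch = getc_eof(in)) != EOF32)
        n++;
    return n;
}
```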

>2) throwing exceptions on errors

I mostly didn't do this with my version of the functions because I thought it
made sense that unFormat should throw the same exceptions as format, but doing
so created a dependency I wasn't happy with for an add-on library.  If this
stuff made it into Phobos I fully support the idea of consistency between the
functions.  This fix should be pretty easy anyway, as it just amounts to putting
a "throw" in the necessary catch blocks at the bottom of the unFormat
implementation (unFormat uses exceptions internally for flow control).

>Also, I still need to write the multibyte wrappers of getc/ungetc
>for file based streams (regular, as well as the wide orientation kind)

My release used the same wrappers for file i/o as it used for file i/o.  Are
these functions not available in Linux?


Sean
March 17, 2005
Re: readf/unformat 1.4 released (updated)
Sean Kelly wrote:

> I'll have to re-evaluate TypeInfo in DMD.  I don't suppose it's working yet for
> pointer types? 

Nope, they are all of the "TypeInfo" base class... :-(

> And why in the world doesn't GDC properly generate copies of the
> TypeInfo array

Maybe I explained myself badly. You can of course pass
_arguments and _argptr off to subroutines. It is just
that "arguments[i] is typeid(int*)" will no longer work...

The identity is lost, when doing the workaround like that.

It still works, if they are done against the original
_arguments and in the same module (I'm a little shady
on the details why that is so, just *that* it is so...)

If the TypeInfo/typeid was working, all would be cool.

> I figured the C function would need to be wrapped either way, so this seemed a
> decent gain.  Especially since I think the reason the C routines are written the
> way they are is because C lacks an output qualifier.  But it's mostly a cosmetic
> issue, so it doesn't matter much to me either way.

The read/write functions in std.stream work like you describe,
with out parameters (they throw Exceptions on EOF, instead
of returning a bit, but that's just a matter of preference...)

Just thought that "dchar getc()" was a better match for
"putc(dchar)", and that there seemed to be a lot of
checking for eof spread out in the code ? That's all.

> I mostly didn't do this with my version of the functions because I thought it
> made sense that unFormat should throw the same exceptions as format, but doing
> so created a dependency I wasn't happy with for an add-on library. 

Yeah, it did mean hacking a few things in std.format...

And there are some *nasty* circular dependencies going on, and
double (if not triple) definitions of things like "stdin" and "va_list"

Had to resort to e.g. "alias std.stdarg.va_list va_list;"

> If this
> stuff made it into Phobos I fully support the idea of consistency between the
> functions.  This fix should be pretty easy anyway, as it just amounts to putting
> a "throw" in the necessary catch blocks at the bottom of the unFormat
> implementation (unFormat uses exceptions internally for flow control).

I did leave the overflow checks in, but most should be passed further ?

>>Also, I still need to write the multibyte wrappers of getc/ungetc
>>for file based streams (regular, as well as the wide orientation kind)
> 
> My release used the same wrappers for file i/o as it used for file i/o.  Are
> these functions not available in Linux?

Yes, I just didn't loop over the bytes to reassemble UTF-32 (yet)
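A sketch of what reassembling UTF-32 from single bytes might look like, written in C over fgetc. Malformed input maps to U+FFFD here, which is this sketch's own choice (the package may handle errors differently), and overlong forms and surrogates are not rejected; `utf8_fgetc`, `EOF32` and `BAD32` are hypothetical names:

```c
#include <stdint.h>
#include <stdio.h>

#define EOF32 ((uint32_t)EOF) /* 0xFFFFFFFF where EOF is -1 */
#define BAD32 0xFFFDu         /* U+FFFD, returned for malformed input (sketch's choice) */

/* Reads one UTF-8 encoded code point from f, one byte at a time via fgetc. */
uint32_t utf8_fgetc(FILE *f)
{
    int b = fgetc(f);
    if (b == EOF)
        return EOF32;
    if (b < 0x80)
        return (uint32_t)b;              /* 0xxxxxxx: single byte */

    uint32_t cp;
    int extra;                           /* continuation bytes still expected */
    if ((b & 0xE0) == 0xC0) {            /* 110xxxxx */
        cp = (uint32_t)(b & 0x1F);
        extra = 1;
    } else if ((b & 0xF0) == 0xE0) {     /* 1110xxxx */
        cp = (uint32_t)(b & 0x0F);
        extra = 2;
    } else if ((b & 0xF8) == 0xF0) {     /* 11110xxx */
        cp = (uint32_t)(b & 0x07);
        extra = 3;
    } else {
        return BAD32;                    /* stray continuation byte or invalid lead */
    }

    while (extra-- > 0) {
        int c = fgetc(f);
        if (c == EOF || (c & 0xC0) != 0x80) {
            if (c != EOF)
                ungetc(c, f);            /* let that byte start the next character */
            return BAD32;
        }
        cp = (cp << 6) | (uint32_t)(c & 0x3F);
    }
    return cp > 0x10FFFF ? BAD32 : cp;
}
```

Because ungetc guarantees only one byte of pushback, a matching multibyte ungetc would need its own buffer.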


Either way, readf and unformat *definitely* have a place in
a future release of Phobos - next to writef and format...

Just need to get the TypeInfo stuff completed first ?
(the major part being adding ti for pointer types...)

--anders