unFormat marginally complete
Jul 30, 2004
Sean Kelly
July 30, 2004
http://home.f4.ca/sean/d/unformat.d

The D compiler is currently a bit weird with templates and stdarg, so to use unformat.d in 0.97 you have to compile in std.format.d as well.  If anyone feels inclined to play with it, please let me know if stuff is broken, if you'd like the exceptions to match doFormat, etc.


Prototypes:

int unFormat( bit delegate( out dchar ) getc,
              bit delegate( dchar ) ungetc,
              TypeInfo[] arguments,
              void* argptr );
int sreadf( ... ); // first va_arg is string, second is format
int freadf( FILE* buf, ... ); // first va_arg is format
int readf( ... ); // first va_arg is format (console input)
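
A minimal sketch of what a getc/ungetc pair for the unFormat prototype might look like over an in-memory buffer. The StringSource type and its members are hypothetical and not part of unformat.d, and the code uses present-day D, so bool stands in for the bit type shown in the prototype.

```d
/// Hypothetical in-memory source exposing a getc/ungetc pair shaped like
/// the delegates in the unFormat prototype (bool replaces the older bit).
struct StringSource
{
    const(dchar)[] data; // UTF-32 input
    size_t pos;          // index of the next unread character

    /// Stores the next character in c; returns false at end of input.
    bool getc(out dchar c)
    {
        if (pos >= data.length)
            return false;
        c = data[pos++];
        return true;
    }

    /// Pushes back the most recently read character; returns false if
    /// nothing has been read or c is not that character.
    bool ungetc(dchar c)
    {
        if (pos == 0 || data[pos - 1] != c)
            return false;
        --pos;
        return true;
    }
}

unittest
{
    auto src = StringSource("42x"d);
    bool delegate(out dchar) get = &src.getc;
    bool delegate(dchar) unget = &src.ungetc;

    dchar c;
    assert(get(c) && c == '4');
    assert(unget(c));            // put the '4' back
    assert(get(c) && c == '4');  // and read it again
    assert(get(c) && c == '2');
    assert(get(c) && c == 'x');
    assert(!get(c));             // input exhausted
}
```

Keeping the position inside one struct means both delegates share state without any globals, which is presumably how sreadf and freadf would each wrap their own source.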


Ways in which unFormat differs from vscanf (and possibly doFormat):

- The format string can be either UTF-8, UTF-16, or UTF-32.
- If there is a mismatch between the arguments and the format specification, the function will return and will not evaluate the rest of the format string.
- unFormat will return prematurely on an input failure (if getc returns false), an argument mismatch, or a UTF conversion error.  UtfError exceptions will not be passed out of the function.


For reference, the conversion specifiers are:

d, u: An optionally signed decimal integer.
i: An optionally signed integer.  Base can be decimal, hex, or octal and
   will be detected automatically.  If the input is preceded by 0x or 0X
   then the number will be interpreted as hex.  If the input is preceded
   only by 0 then the number will be interpreted as octal.  Any other
   value will be interpreted as decimal.
o: An optionally signed octal integer.
x, X: An optionally signed hex integer.
a, e, f, g
A, E, F, G: An optionally signed floating point number, infinity,
            or NaN.
  Examples:   1
              -5.6
              1.2e5
              0x3p-2
              0X1234
              NAN
              INF
              infinity
c: A single UTF-32 character, or sequence of characters if the width
   modifier is present.
s: A sequence of non-whitespace characters.
[: Defines a scanset.  Contents can be single characters or a range
   indicated by a hyphen.
   Examples:   [a-z]    indicates the set of characters whose values lie
                        between a and z, inclusive.
               [abc123] indicates the characters a, b, c, 1, 2, and 3.
p: A pointer in hex format without the leading 0x.
n: Returns the number of UTF-32 characters read from the input stream.
%: Matches a single % character.
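
The base-detection rule described for the i specifier can be sketched as follows. This is an illustration of the rule only, written in present-day D. The parseAutoBase helper is hypothetical and is not the code in unformat.d, and it ignores overflow and width modifiers.

```d
import std.ascii : isDigit, isHexDigit, toLower;

/// Hypothetical helper: parses the longest valid integer prefix of s
/// using the base-detection rule given for the i specifier.
long parseAutoBase(string s)
{
    size_t i = 0;
    bool negative = false;
    if (i < s.length && (s[i] == '+' || s[i] == '-'))
    {
        negative = (s[i] == '-');
        ++i;
    }

    int base = 10;
    if (i + 1 < s.length && s[i] == '0' && (s[i + 1] == 'x' || s[i + 1] == 'X'))
    {
        base = 16; // 0x or 0X prefix: hex
        i += 2;
    }
    else if (i < s.length && s[i] == '0')
    {
        base = 8;  // lone leading 0: octal (the 0 itself is a valid digit)
    }

    long value = 0;
    for (; i < s.length; ++i)
    {
        immutable char c = s[i];
        int digit;
        if (isDigit(c))
            digit = c - '0';
        else if (base == 16 && isHexDigit(c))
            digit = toLower(c) - 'a' + 10;
        else
            break;
        if (digit >= base)
            break; // e.g. an 8 or 9 inside an octal number ends the match
        value = value * base + digit;
    }
    return negative ? -value : value;
}

unittest
{
    assert(parseAutoBase("0x1A") == 26); // hex
    assert(parseAutoBase("017") == 15);  // octal
    assert(parseAutoBase("-42") == -42); // decimal
}
```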
August 05, 2004
Sean Kelly wrote:
> http://home.f4.ca/sean/d/unformat.d

I just realized I'd misread a part of the scanf spec.  I've fixed the code and re-uploaded it with another unit test.


Sean
August 05, 2004
In article <ces7ck$s6f$1@digitaldaemon.com>, Sean Kelly says...
>
>Sean Kelly wrote:
>> http://home.f4.ca/sean/d/unformat.d
>
>I just realized I'd misread a part of the scanf spec.  I've fixed the code and re-uploaded it with another unit test.

Looks pretty useful.  I like it.  I haven't had a chance to run with it myself, so I'll have to ask: do you have any provisions for reading or handling whitespace?

One critique though: why check all your exception instances (Underflow, BadFmt, etc) for each call of unFormat?  You can set all these up ahead of time in a static block outside your function, without breaking encapsulation too badly.

# private class Underflow: Exception{ this(){ super("Underflow"); }}
# private static Underflow    underflow;
# static this(){
#     underflow = new Underflow();
# }

That way you can prevent redundant allocations (which you've already done) plus
eliminate all those extra "if" statements. :)

- Pragma
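
A slightly fuller sketch of the suggestion above, in present-day D, showing the shared instance being thrown without any per-call check. The nextChar function is hypothetical and only stands in for the places where unFormat would raise Underflow. Note that in current D a module-level variable is thread-local and static this() runs once per thread.

```d
/// Stand-in for one of the exception types mentioned in the post.
private class Underflow : Exception
{
    this() { super("Underflow"); }
}

/// Shared instance, allocated once instead of on every call.
private Underflow underflow;

static this()
{
    // Runs before main(), so later code never needs a null check.
    underflow = new Underflow();
}

/// Hypothetical reader: returns the next character and advances the
/// slice, or throws the preallocated instance when input runs out.
dchar nextChar(ref const(dchar)[] input)
{
    if (input.length == 0)
        throw underflow;
    immutable c = input[0];
    input = input[1 .. $];
    return c;
}

unittest
{
    const(dchar)[] src = "a"d;
    assert(nextChar(src) == 'a');

    bool caught = false;
    try
        nextChar(src);
    catch (Underflow e)
        caught = (e is underflow); // same object, no new allocation
    assert(caught);
}
```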


August 05, 2004
In article <cetftt$1nd2$1@digitaldaemon.com>, pragma <EricAnderton at yahoo dot com> says...
>
>In article <ces7ck$s6f$1@digitaldaemon.com>, Sean Kelly says...
>>
>>Sean Kelly wrote:
>>> http://home.f4.ca/sean/d/unformat.d
>>
>>I just realized I'd misread a part of the scanf spec.  I've fixed the code and re-uploaded it with another unit test.
>
>Looks pretty useful.  I like it.  I haven't had a chance to run with it myself, so I'll have to ask: do you have any provisions for reading or handling whitespace?

Everything is done internally in terms of dchars, so hopefully the functions will be able to correctly recognize all whitespace chars.  I know there may also be some locale dependent whitespace sequences (Jill?) but as D doesn't have any concept of locales yet, that will have to wait.

>One critique though: why check all your exception instances (Underflow, BadFmt, etc) for each call of unFormat?  You can set all these up ahead of time in a static block outside your function, without breaking encapsulation too badly.
>
># private class Underflow: Exception{ this(){ super("Underflow"); }}
># private static Underflow    underflow;
># static this(){
>#     underflow = new Underflow();
># }
>
>That way you can prevent redundant allocations (which you've already done) plus
>eliminate all those extra "if" statements. :)

Good point.  I think I'm still in a C++ mindset as far as statics are concerned. I'll make this change today :)


Sean


August 05, 2004
In article <ceti16$1oa9$1@digitaldaemon.com>, Sean Kelly says...

>Everything is done internally in terms of dchars, so hopefully the functions will be able to correctly recognize all whitespace chars.  I know there may also be some locale dependent whitespace sequences (Jill?)

Nope, whitespace is locale independent. You only have to import etc.unicode.unicode and call isWhitespace(dchar). But I'd suggest waiting until next week because I'm planning to finally get the linkable library + header files together this weekend, which will make things somewhat easier for you.


>but as D doesn't have any
>concept of locales yet, that will have to wait.

It will have soon, but as I said, it's not relevant to whitespace.

Arcane Jill
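
A minimal sketch of locale-independent whitespace skipping on dchars. It assumes present-day Phobos, where std.uni.isWhite plays the role of the isWhitespace(dchar) call mentioned above. The etc.unicode library itself is not used here, and skipWhite is a hypothetical helper.

```d
import std.uni : isWhite;

/// Hypothetical helper: drops leading Unicode whitespace from a UTF-32 slice.
const(dchar)[] skipWhite(const(dchar)[] input)
{
    size_t i = 0;
    while (i < input.length && isWhite(input[i]))
        ++i;
    return input[i .. $];
}

unittest
{
    // U+2003 (em space) and U+3000 (ideographic space) are whitespace
    // in Unicode regardless of locale.
    assert(skipWhite(" \t\u2003\u3000abc"d) == "abc"d);
    assert(skipWhite("abc"d) == "abc"d);
    assert(skipWhite("\n"d).length == 0);
}
```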


August 06, 2004
In article <cetftt$1nd2$1@digitaldaemon.com>, pragma <EricAnderton at yahoo dot com> says...
>
>In article <ces7ck$s6f$1@digitaldaemon.com>, Sean Kelly says...
>>
>>Sean Kelly wrote:
>>> http://home.f4.ca/sean/d/unformat.d
>>
>>I just realized I'd misread a part of the scanf spec.  I've fixed the code and re-uploaded it with another unit test.
>
>Looks pretty useful.  I like it.  I haven't had a chance to run with it myself, so I'll have to ask: do you have any provisions for reading or handling whitespace?

By the way, I like that doFormat doesn't require a format string at all.  Since I was working off the scanf spec I didn't do anything about that with unFormat. I assume that doFormat can handle things like this:

doFormat( &put, "hello world", 1, "%d", 2 );

and would print:

hello world12

I suppose the equivalent bit for unFormat would be:

char[] buf;
int x, y;
float f;
unFormat( &get, &unget, &buf, &x, "%2d", &y, &f );

which would read a string, an integer, an int with width 2, and a float.  The only thing I don't know offhand is if I can tell a char** from a char* using TypeInfo (for %p).  In any case, would people like this syntax rather than having to specify a format string?  I think I may start on it today just to see how it goes.


Sean
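
A rough sketch of how the format-free call above might be dispatched from TypeInfo alone, written in present-day D. The describe and plan functions are hypothetical and read nothing; they only classify each argument. Under these assumptions typeid does distinguish char* from char**, since the two pointer types compare unequal.

```d
/// Hypothetical classifier: says how a format-free unFormat might treat
/// one argument, judging only by its TypeInfo.
string describe(TypeInfo ti)
{
    if (ti == typeid(int*))    return "integer";
    if (ti == typeid(float*))  return "float";
    if (ti == typeid(char[]*)) return "string";
    if (ti == typeid(string))  return "format";
    if (ti == typeid(char*))   return "C string";
    if (ti == typeid(char**))  return "pointer (%p)";
    return "unsupported";
}

/// Walks the hidden _arguments array of a D-style variadic function and
/// returns one description per argument, in order.
string[] plan(...)
{
    string[] result;
    foreach (ti; _arguments)
        result ~= describe(ti);
    return result;
}

unittest
{
    char[] buf;
    int x, y;
    float f;
    assert(plan(&buf, &x, "%2d", &y, &f)
           == ["string", "integer", "format", "integer", "float"]);

    char* p;
    assert(plan(p, &p) == ["C string", "pointer (%p)"]);
}
```

A string argument that is not a destination pointer is the only way to tell an embedded format such as "%2d" from a buffer to fill, which is why the sketch treats string and char[]* differently.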