new streams (page 2)

May 10, 2002

Re: new streams

Posted by Pavel Minayev
in reply to Andrew Feldstein

"Andrew Feldstein" <Andrew_member@pathlink.com> wrote in message news:abh3e0$3031$1@digitaldaemon.com...

> I agree that Russ's way is better, but it is still not ideal.  The user should
> be able to set some sort of library flag to determine how to handle end of lines
> *correctly* given the needs of the program.  This flag could control both writing as well as reading, knowing how to handle \n, for example.  For example,
> under *nix, it is incorrect to treat CR as part of a newline, and under MAC, I
> believe, the LF the same.  Of course, any implementation should default
> to the text model used by the underlying operating system and should handle the
> oddball cases cleanly.  Of course reading and writing don't *have* to be the
> same....

Under *nix, CR is a control character, and thus it is NOT supposed to appear in ASCII text files, which is what readLine() is designed for. However, if readLine() occasionally comes across a file made in a Windows or Mac text editor, it will still be able to read it properly. The same is true for the Mac: text files SHOULDN'T contain LF. The stream's ability to handle such files is an advantage, not a bug.

> Pavel, how would your new function read, say, a file containing nothing but
> three <CR>'s followed by two <LF>'s?  Under various text models this could be
> interpreted as any of 1, 2, 3, 4, or 5 blank lines.

It will treat it as CR, CR, CR+LF, LF: 4 lines.
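A minimal Python sketch of the splitting rule described above (the readLine() under discussion is D code and is not shown in the thread). It assumes the whole input is already in memory, so checking the character after a CR costs nothing: CR followed by LF counts as one terminator, and a lone CR or a lone LF each end a line. The function name `split_lines` is hypothetical.

```python
from typing import List


def split_lines(data: str) -> List[str]:
    """Split text on CR, LF, or CR+LF, treating CR+LF as a single terminator."""
    lines: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == "\r":
            lines.append("".join(current))
            current = []
            # A CR directly followed by LF is one terminator: skip the LF.
            if i + 1 < len(data) and data[i + 1] == "\n":
                i += 1
        elif ch == "\n":
            lines.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    # Keep trailing text that has no terminator after it.
    if current:
        lines.append("".join(current))
    return lines


print(split_lines("\r\r\r\n\n"))  # ['', '', '', '']
```

Three CRs followed by two LFs give four empty lines, matching the count above. Python's built-in `str.splitlines()` also accepts these three terminators but splits on several others as well (form feed, vertical tab, U+2028 and more), which is why the rule is spelled out by hand here.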

"Burton Radons" <loth@users.sourceforge.net> wrote in message news:3f4oduspb6fuiseeg7a4a025c92pnnfl1e@4ax.com...

> It's a word, but so is "Catholicity", and it's as appropriate as "Set". My dictionary gives 125 meanings for set. The only thing that could be related is in the context of "setting sun", which is quite the opposite.

Hmm.. I always thought that "set" is a short form of "just set that #%$@ file pointer to whatever I say", but I could be wrong...

"Andrew Feldstein" <Andrew_member@pathlink.com> wrote in message news:abh3e0$3031$1@digitaldaemon.com...

> I agree that Russ's way is better, but it is still not ideal. The user should
> be able to set some sort of library flag to determine how to handle end of lines
> *correctly* given the needs of the program. This flag could control both writing as well as reading, knowing how to handle \n, for example. For example,
> under *nix, it is incorrect to treat CR as part of a newline, and under MAC, I
> believe, the LF the same. Of course, any implementation should default
> to the text model used by the underlying operating system and should handle the
> oddball cases cleanly. Of course reading and writing don't *have* to be the
> same....

The problem is that files are transferred from machine to machine, and a program cannot reliably know where a file came from.

> Pavel, how would your new function read, say, a file containing nothing but
> three <CR>'s followed by two <LF>'s? Under various text models this could be
> interpreted as any of 1, 2, 3, 4, or 5 blank lines.

That would be CR,CR,CR,LF,LF, or 4 lines.

"Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:3CDBF812.D68ED4F6@deming-os.org...

> This has caused me some HUGE headaches doing streaming on UNIX boxes. At least some of the tools do "lookahead", so they don't echo a line out until
> you have printed 1 character AFTER the newline...in some cases, it has caused my programs to hang for minutes or hours (while, say, a long find command runs) until either another (unnecessary) line is printed, or the stream runs into EOF.

One solution is to use isatty() and, if the input is a stream rather than a file, time out instead of blocking for the lookahead. I've used similar tricks when reading escape sequences from terminals.
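A sketch of the timeout idea in Python, assuming a POSIX system (on Windows, `select.select()` accepts only sockets). Instead of isatty() it checks for a regular file with `os.fstat()`, because isatty() returns false for a pipe such as the output of `find`; that substitution is an assumption of this sketch. The helper name `peek_after_cr` and the 50 ms default are hypothetical.

```python
import os
import select
import stat
from typing import Optional


def peek_after_cr(fd: int, timeout: float = 0.05) -> Optional[bytes]:
    """Called right after a CR was read from fd.

    Returns the next byte, or None if nothing arrives in time (or at EOF),
    in which case the caller should treat the CR as a complete line ending.
    POSIX only: select() on pipes and terminals is not supported on Windows.
    """
    if stat.S_ISREG(os.fstat(fd).st_mode):
        # Regular file: a read never waits on a writer, so just read.
        b = os.read(fd, 1)
    else:
        # Pipe, socket or terminal: wait at most `timeout` seconds.
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            return None
        b = os.read(fd, 1)
    return b or None  # b"" means EOF
```

If the returned byte is not an LF, the caller has to keep it as the first character of the next line, because `os.read()` has already consumed it.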

"Pavel Minayev" <evilone@omen.ru> wrote in message news:abh2p1$2vfr$1@digitaldaemon.com...

> "Burton Radons" <loth@users.sourceforge.net> wrote in message news:sjunduovt39c2tcntmkv6rp23cn8thmk9g@4ax.com...
>
> > I think we should get the scanf and fmt format codes aligned. My method is "%s" for char[], "%S" for wchar[], "%+s" for char*, and "%+S" for wchar*. Different semantics for what looks like the same thing is bad city.
>
> Agreed, but I think we should first have Walter agree with this, so it'd become "official". Once it is, I will be happy to standardize streams appropriately.

But that means I have to think about it <g>. In any case, I think it is just a matter of reviewing the C printf and scanf format strings, and coming up with something as equivalent as practical that still supports the full range of D types. Note that D enables some cool things like a format specifier for Objects, too, which will cast the argument to an Object and call toString() on it.
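To make Burton's proposed string codes concrete, here is a toy Python parser that maps each string conversion in a format string to the D argument type it would expect under that proposal. It deliberately ignores width, precision and all other conversions, and the names `string_arg_types` and `_STRING_SPECS` are hypothetical; this is not how any D library actually parses formats.

```python
import re
from typing import List

# Burton's proposal: lowercase = char, uppercase = wchar,
# "+" flag = C-style pointer (char*/wchar*) instead of a D array.
_STRING_SPECS = {
    ("", "s"): "char[]",
    ("", "S"): "wchar[]",
    ("+", "s"): "char*",
    ("+", "S"): "wchar*",
}

# "%%" is matched first so a literal percent sign is never read as a spec.
_SPEC_RE = re.compile(r"%%|%(\+?)([sS])")


def string_arg_types(fmt: str) -> List[str]:
    """List the D type each string conversion in fmt would expect."""
    types = []
    for m in _SPEC_RE.finditer(fmt):
        if m.group(2):  # None for a literal "%%"
            types.append(_STRING_SPECS[(m.group(1), m.group(2))])
    return types


print(string_arg_types("%s %S %+s %+S 100%%"))
# ['char[]', 'wchar[]', 'char*', 'wchar*']
```

The point of the scheme shows up in the table: the same letter always means the same character width, and only the "+" flag switches between a D array and a C pointer.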

Russ Lewis wrote:

> Pavel Minayev wrote:
>
> > "Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:3CDBF812.D68ED4F6@deming-os.org...
> >
> > > IMHO, you should immediately interpret CR as a newline, but put a marker on
> > > the stream such that if another character is read and that character is a LF, then it will be consumed LATER. DON'T lookahead for it :(
> >
> > I do a lookahead, but I have ungetc() implemented and working...
>
> Ungetc doesn't help the problem I was talking about. If you do lookahead but there is not a character available, then your library will block until one more character is available to read (or you detect EOF)...which could be a LONG time from now.

On serial device drivers I've written, and on at least one of the many RTOS systems I've used, we had peekc() and/or lookc() calls that would, without side effects, look at the next character in the device driver's buffer, and if that buffer was empty, the call would wait a single character time and sneak a nondestructive look at the UART buffer (a tricky thing to do on some UARTs). I have no idea if Windows has similar capabilities.

-BobC
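Russ's suggestion (end the line on CR at once, and drop a following LF only when it eventually arrives) can be sketched as a small state machine that is fed one character at a time and never peeks ahead, so it cannot block waiting for a character that may never come. The class name `LineAssembler` and its `feed()` method are hypothetical, written in Python for illustration.

```python
from typing import List, Optional


class LineAssembler:
    """Builds lines from characters fed one at a time, with no lookahead.

    A CR ends the current line immediately; a flag remembers it, and if the
    next character to arrive is an LF, that LF is dropped at that point.
    """

    def __init__(self) -> None:
        self._buf: List[str] = []
        self._after_cr = False

    def feed(self, ch: str) -> Optional[str]:
        """Consume one character; return a finished line, or None."""
        if self._after_cr:
            self._after_cr = False
            if ch == "\n":
                return None  # second half of a CR+LF pair
        if ch == "\r":
            self._after_cr = True
            return self._flush()
        if ch == "\n":
            return self._flush()
        self._buf.append(ch)
        return None

    def _flush(self) -> str:
        line = "".join(self._buf)
        self._buf = []
        return line


asm = LineAssembler()
lines = [line for line in map(asm.feed, "\r\r\r\n\n") if line is not None]
print(lines)  # ['', '', '', '']
```

Unlike a lookahead-plus-ungetc() design, a line ending in CR is handed back before the next character is read; the cost is one boolean of state carried between calls.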

"Walter" <walter@digitalmars.com> wrote in message news:abhjha$c6h$1@digitaldaemon.com...

> But that means I have to think about it <g>. In any case, I think it is just
> a matter of reviewing the C printf and scanf format strings, and coming up with something as equivalent as practical but still support the full D

Yes, exactly. But you don't want anarchy here, do you?

Hi,

"Burton Radons" <loth@users.sourceforge.net> wrote in message news:sjunduovt39c2tcntmkv6rp23cn8thmk9g@4ax.com...

> Since this format is our own (that is to say, there's no standard for counted strings -- some are 32-bit, some are 16-bit, some are 8-bit, with varying rules on NUL termination and alignment), we may as well use dynamic-sized integers for this. For each byte we take the first seven bits and read another byte if the eighth bit is set, like:

This is very much like the ASN.1/DER encoding of lengths, but not exactly. We might consider that encoding, see:

ftp://ftp.rsasecurity.com/pub/pkcs/ascii/layman.asc

> When writing uint you'll usually get three or two bytes savings, which really adds up when writing meshes, and you have your future covered, and it's endian neutral.

These are good properties. ASN.1/DER also specifies how to encode type information used to distinguish between ASCII and UNICODE strings; that might be usable too.

Regards,
Martin M. Pedersen
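A Python sketch of the two length encodings being compared. Burton's description does not say in which order the 7-bit groups are stored; `encode_varint` assumes least significant group first (the LEB128 convention), which is an assumption. `encode_der_length` follows the ASN.1 DER definite-length rules: a single byte below 128, otherwise a byte `0x80 | n` followed by the length in `n` big-endian bytes. All function names are hypothetical.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """7 payload bits per byte, least significant group first (assumed);
    the high bit is set on every byte except the last."""
    if n < 0:
        raise ValueError("lengths must be non-negative")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # more bytes follow
        else:
            out.append(group)
            return bytes(out)


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    value = 0
    shift = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


def encode_der_length(n: int) -> bytes:
    """ASN.1 DER definite-form length octets."""
    if n < 0:
        raise ValueError("lengths must be non-negative")
    if n < 0x80:
        return bytes([n])  # short form: one byte
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body  # long form


print(encode_varint(300).hex(), encode_der_length(300).hex())  # ac02 82012c
```

A length below 128 takes one byte in both schemes, which is where the three-byte saving over a fixed 32-bit uint comes from; the varint stays at two bytes up to 16383, while DER spends an extra byte announcing how many length bytes follow.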

"Pavel Minayev" <evilone@omen.ru> wrote in message news:abi475$qe3$1@digitaldaemon.com...

> "Walter" <walter@digitalmars.com> wrote in message news:abhjha$c6h$1@digitaldaemon.com...
>
> > But that means I have to think about it <g>. In any case, I think it is just
> > a matter of reviewing the C printf and scanf format strings, and coming up
> > with something as equivalent as practical but still support the full D
>
> Yes, exactly. But you don't want anarchy here, do you?

No, but I'm getting spread too thin, which makes it hard to give each issue the attention it needs. I'm currently trying to finish another project (and get paid for it) so I can spend more time on D.

"Pavel Minayev" <evilone@omen.ru> wrote in message news:abha3v$4f7$1@digitaldaemon.com...

> "Burton Radons" <loth@users.sourceforge.net> wrote in message news:3f4oduspb6fuiseeg7a4a025c92pnnfl1e@4ax.com...
>
> > It's a word, but so is "Catholicity", and it's as appropriate as "Set". My dictionary gives 125 meanings for set. The only thing that could be related is in the context of "setting sun", which is quite the opposite.
>
> Hmm.. I always thought that "set" is a short form of "just set that #%$@ file pointer to whatever I say", but I could be wrong...

LOL :)

--
Stijn
OddesE_XYZ@hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail
