May 25, 2005
"Vathix" <vathix@dprogramming.com> wrote in message news:op.srb9qssxkcck4r@esi...
> What about having 2 different streams: binary and text.

That's essentially what getc() and readLine() do. They treat the stream as a text stream and look at the unget buffer etc. The read() functions directly ask the OS for data and ignore the unget buffer.

> Binary one will work as it does now where eof() just checks the file pointer.
>
> Text one will use the unget buffer. If the unget buffer contains a character, it is not eof; otherwise it tries to read one into it.

The problem is that, generally speaking, the stream doesn't know it has hit eof until it tries to read past the end and fails. So when you say "otherwise it tries to read one into it", one has to say what happens if that read fails. Currently it throws. One could argue it should return a special "eof" character and set the stream's eof flag, so that future calls to eof() indicate that eof has been reached.


May 25, 2005
>> Text one will use the unget buffer. If the unget buffer contains a
>> character, it is not eof; otherwise it tries to read one into it.
>
> The problem is that, generally speaking, the stream doesn't know it has hit
> eof until it tries to read past the end and fails. So when you say
> "otherwise it tried to read one into it" one has to say what happens if that
> fails. Currently it throws. One can argue it should return a special "eof"
> character and then set the stream eof flag so that future calls to eof()
> will indicate that eof has been reached.

That's why eof() would try to read into the unget buffer: if that fails, it's eof; otherwise it has a char stored for the next getc(). But this won't work right now since the different-size chars use different unget buffers. If they shared an unget buffer that is just an array of bytes, you could, for example, unget a wchar and get 2 chars from it.

Removing the unget buffer from a binary stream is also desirable since it's not wise to use ungetc and readBlock on the same stream.
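A minimal sketch of the proposed text-stream eof(), assuming a shared byte-level unget buffer; ungetBuffer and rawRead are illustrative names, not actual std.stream members:

  bool eof() {
    if (ungetBuffer.length > 0)
      return false;            // a pushed-back byte is still pending
    ubyte b;
    if (rawRead(&b, 1) == 0)   // attempt a one-byte read-ahead
      return true;             // the read failed: truly at end of file
    ungetBuffer ~= b;          // keep the byte for the next getc()
    return false;
  }

Note this is exactly the read-ahead behavior that makes eof() block on interactive streams, which comes up below.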
May 25, 2005
"Vathix" <vathix@dprogramming.com> wrote in message news:op.srccyakakcck4r@esi...
>>> Text one will use the unget buffer. If the unget buffer contains a character, it is not eof; otherwise it tries to read one into it.
>>
>> The problem is that, generally speaking, the stream doesn't know it has
>> hit
>> eof until it tries to read past the end and fails. So when you say
>> "otherwise it tried to read one into it" one has to say what happens if
>> that
>> fails. Currently it throws. One can argue it should return a special
>> "eof"
>> character and then set the stream eof flag so that future calls to eof()
>> will indicate that eof has been reached.
>
> That's why eof() would try to read into the unget buffer: if that fails, it's eof; otherwise it has a char stored for the next getc(). But this won't work right now since the different-size chars use different unget buffers. If they shared an unget buffer that is just an array of bytes, you could, for example, unget a wchar and get 2 chars from it.
>
> Removing the unget buffer from a binary stream is also desirable since it's not wise to use ungetc and readBlock on the same stream.

I understand you now; I had missed that eof() would block. Would that be
a problem with something like
import std.stream;
int main() {
  while (!stdin.eof()) {
    stdout.writefln("type a line, please");
    char[] line = stdin.readLine();
    stdout.writefln("you typed: %s",line);
  }
  return 0;
}

type a line, please
hello
you typed: hello
type a line, please
there
you typed: there
type a line, please
^Z
you typed:

If stdin.eof() blocked waiting for input then the writefln inside the loop wouldn't get run until after the user has typed a line and hit enter.


May 25, 2005
Note another option is instead of

>  void read(out char x) { readExact(&x, x.sizeof); }
> would become something like
>  void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }

to keep read(out char x) the same and only redo getc and getcw to not call
read(ch) directly. So getc() would look something like
  char getc() {
    if (<unget buffer non-empty>)
      return next-char-from-unget-buffer
    else {
      char ch;
      readBlock(&ch,1); // default ch is char.init which is 0xFF
      return ch;
    }
  }
That way readLine and other user code wouldn't have to try/catch getc
failures but would look for char.init instead.
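Under that scheme, caller code could be as simple as the following sketch. It assumes getc() returns char.init (0xFF) at end of input, per the proposal above; collect is a made-up helper name:

  // Reads characters until getc() hands back the char.init sentinel,
  // with no try/catch required.
  char[] collect(Stream s) {
    char[] result;
    for (char c = s.getc(); c != char.init; c = s.getc())
      result ~= c;             // append each character read
    return result;
  }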


May 26, 2005
> I understand you now; I had missed that eof() would block.

sorry, I wasn't thinking
May 26, 2005
Ben Hinkle wrote:
<snip>
> But for the situation of the original post (reading stdin) the OS doesn't tell us eof has happened until you try to read and it fails. So in other words for stdin "eof" means "did the last read attempt try to read past eof".

Actually it doesn't _mean_ this, it gives this as a possible alternative behaviour for situations where EOF can't be determined directly.

But you have a point there.  Another thing for me to consider when I get round to writing text I/O classes....

<snip>
>>>The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof).  The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.
>>
>>That doesn't follow either.  For example, suppose you're writing a utility that manipulates binary files in general.  E.g. a hex editor or a file compression utility.  At no point while reading the file can you just expect that there is or isn't more.
> 
> I don't get you. What do you mean by "follow"?

Basically that claiming any difference between binary and text files in EOF handling doesn't derive from any consistent logic.

> I'm not trying to chain a sequence of statements into a proof or something. I'm stating that from a practical point of view binary files should throw if a read is incomplete and text files should return EOF. I don't understand what you are arguing read() do for different situations.

If there's data left to read, read it.
If there isn't data left to read, throw an exception.

<snip>
> The semantic content of the text file (e.g. a D source file) is independent of std.stream. You say some D source code can't end in the middle of a comment. I think such a file would be a semantically incorrect source file, but there's no way std.stream can determine that. If someone writes a subclass of stream that knows about comments and throws on eof in a comment, then that's fine with me. I don't see why that conflicts with returning EOF from getc.

Nobody said anything about std.stream knowing about comments.  Just think about it.  Just look at this natural way of skipping over a comment (once it's established that we're in a comment):

    char[] nextChars;
    while((nextChars = file.readString(2)) != "*/") {
        file.ungetc(nextChars[1]); // *
    }

* OK, so under getc, "This is the only method that will handle ungetc properly."  But you get the idea.

It doesn't check for EOF, because this isn't part of the normal program logic.  Instead, it relies on exception handling to catch an input file malformed in this respect, just as we might use exception handling to catch file not found and other file access errors.

And so we shouldn't be surprised to see this technique in use. Especially in quick and dirty programs, which are a significant part of the motivation for exceptions.

>> So really there is no correlation.
> 
> So are you arguing for throwing in getc or not throwing? 

Throwing.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on the 'group where everyone may benefit.
May 26, 2005
>>>>The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof).  The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.
>>>
>>>That doesn't follow either.  For example, suppose you're writing a utility that manipulates binary files in general.  E.g. a hex editor or a file compression utility.  At no point while reading the file can you just expect that there is or isn't more.
>> 
>> I don't get you. What do you mean by "follow"?
>
>Basically that claiming any difference between binary and text files in EOF handling doesn't derive from any consistent logic.

OK, I agree. The concepts of "text file" and "binary file" are context-dependent, and from an abstract point of view EOF handling shouldn't depend on text vs binary. It is practical, though, to tailor parts of the API to text files and to binary files, so I think it's worth it even though it breaks the uniformity.

><snip>
>> The semantic content of the text file (e.g. a D source file) is independent of std.stream. You say some D source code can't end in the middle of a comment. I think such a file would be a semantically incorrect source file, but there's no way std.stream can determine that. If someone writes a subclass of stream that knows about comments and throws on eof in a comment, then that's fine with me. I don't see why that conflicts with returning EOF from getc.
>
>Nobody said anything about std.stream knowing about comments.  Just think about it.  Just look at this natural way of skipping over a comment (once it's established that we're in a comment:
>
>     char[] nextChars;
>     while((nextChars = file.readString(2)) != "*/") {
>         file.ungetc(nextChars[1]); // *
>     }
>
>* OK, so under getc, "This is the only method that will handle ungetc properly."  But you get the idea.
>
>It doesn't check for EOF, because this isn't part of the normal program logic.  Instead, it relies on exception handling to catch an input file malformed in this respect, just as we might use exception handling to catch file not found and other file access errors.
>
>And so we shouldn't be surprised to see this technique in use. Especially in quick and dirty programs, which are a significant part of the motivation for exceptions.

Note the only functions that would no longer throw are getc and getcw. So
readString(2) would continue to throw (plus readString doesn't use the unget
buffer). Asking to read a fixed number of characters will throw if there aren't
enough. From a practical point of view, the difference is that some code that
uses getc will be able to switch from something like
#try {
#  while (true) {
#    ... blah blah stream.getc() blah blah ...
#  }
#} catch (ReadException ex) {
#  if (!stream.eof()) throw ex;
#}
to
#while (!stream.eof()) {
#  ... blah blah stream.getc() blah blah ...
#}
Everything else should remain the same. Inside std.stream, when I made the change
I was able to remove the try/catches from readLine/w and scanf, plus some
try/catches in std.socketstream.

>>> So really there is no correlation.
>> 
>> So are you arguing for throwing in getc or not throwing?
>
>Throwing.

ok - understood.


May 26, 2005
Stewart Gordon wrote:
> Ben Hinkle wrote:
> <snip>
> 
>> More specifically the key change would be to std.Stream
>>   void read(out char x) { readExact(&x, x.sizeof); }
>> would become something like
>>   void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }
>> Since D uses unicode setting EOF=0xFF means it won't get confused with a regular character.
> 
> <snip>
> 
> That doesn't follow.  The input stream might not be Unicode; moreover, it might even be a binary file.
<snip>

Just thinking about it, even if the program does expect UTF-8 input, this has the drawback that a malformed input file containing a 0xFF byte could cause the input to be truncated, which probably wouldn't be desirable.  So if we're going to do this, should we make it throw an exception if it reads in 0xFF?
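A sketch of that stricter variant: 0xFF doubles as the EOF sentinel, and since a well-formed UTF-8 stream never contains a raw 0xFF byte, reading one from the data is treated as malformed input. EOF here is assumed to be the 0xFF constant discussed above:

  void read(out char x) {
    if (readBlock(&x, x.sizeof) == 0)
      x = EOF;       // nothing left: signal end of stream
    else if (x == EOF)
      throw new ReadException("malformed UTF-8: raw 0xFF byte in input");
  }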

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on the 'group where everyone may benefit.
May 28, 2005
In article <d74p6u$2kkq$1@digitaldaemon.com>, Stewart Gordon says...
>
>Stewart Gordon wrote:
>> Ben Hinkle wrote:
>> <snip>
>> 
>>> More specifically the key change would be to std.Stream
>>>   void read(out char x) { readExact(&x, x.sizeof); }
>>> would become something like
>>>   void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }
>>> Since D uses unicode setting EOF=0xFF means it won't get confused with
>>> a regular character.
>> 
>> <snip>
>> 
>> That doesn't follow.  The input stream might not be Unicode; moreover, it might even be a binary file.
><snip>
>
>Just thinking about it, even if the program does expect UTF-8 input, this has the drawback that a malformed input file containing a 0xFF byte could cause the input to be truncated.  Which probably wouldn't be desirable.  So if we're going to do this, should we make it throw an exception if it reads in 0xFF?

True. It is fairly evil to co-opt a valid return value to mean EOF. I think it is possible to check whether the EOF was actually in the stream by calling eof() after getc, though. If eof() is true, then the EOF came from reaching end-of-file; if eof() is false, then the EOF was read from the stream. The one little edge case that might not work is if the EOF was the last character in the stream and the stream was seekable (since then the stream can figure out when eof is true without having to read past the end). Maybe the "readEOF" flag that indicates the last read was past the end needs to be publicly readable.
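The disambiguation described above could look like this sketch in caller code; handleEndOfInput and handleBadByte are hypothetical placeholders:

  char c = stream.getc();
  if (c == char.init) {
    if (stream.eof())
      handleEndOfInput();   // getc hit a real end-of-file
    else
      handleBadByte();      // a literal 0xFF byte was in the stream
  }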

