How to detect end of stdin?
May 25, 2005
k2
test.d
---
import std.stream;

void main()
{
    while(!stdin.eof())
        printf("%c", stdin.getc());
}

---

>dmd test.d
>type test.d | test.exe
void main()
{
    while(!stdin.eof())
        printf("%c", stdin.getc());
}
Error: not enough data in stream


What is wrong here?
Windows 2000, DMD v0.125


May 25, 2005
It does seem weird, but here's what's going on: stdin.eof returns true
*after* eof is hit, but not before (since eof would have to do a read to
check). So that means you have to wrap the getc in a try/catch. I am tempted
to make getc return EOF at eof. What do people think? Returning EOF would
get rid of some ugly try-catches, but it would make reading char different
from reading anything else (if you call read(x) with an int x then it can't
"return" eof, so it must throw). More specifically, the key change would be to
std.stream:
  void read(out char x) { readExact(&x, x.sizeof); }
would become something like
  void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }
Since D uses Unicode, setting EOF=0xFF means it won't get confused with a
regular character.

Does that seem like a good trade-off?
-Ben
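
To make the trade-off concrete, here is a minimal sketch in present-day D rather than the std.stream of this thread. MemSource, getcThrowing, getcSentinel and the two copy functions are hypothetical names made up for illustration. MemSource only imitates a piped stdin, where eof() cannot turn true until a read has already come up short. Treat it as an illustration of the two styles under those assumptions, not as the actual std.stream implementation.

```d
import std.stdio;

// Hypothetical in-memory stand-in for a piped stdin; not the real std.stream API.
struct MemSource
{
    const(ubyte)[] data; // bytes not yet consumed
    bool hitEof;         // set only after a read comes up short

    // Copies up to buf.length bytes into buf and returns how many were copied.
    size_t readBlock(ubyte[] buf)
    {
        size_t n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        if (n < buf.length)
            hitEof = true;
        return n;
    }

    // Like a pipe: true only once a read has already run past the end.
    bool eof() const { return hitEof; }
}

// Throwing style: running out of data is reported as an exception.
char getcThrowing(ref MemSource s)
{
    ubyte[1] b;
    if (s.readBlock(b[]) == 0)
        throw new Exception("not enough data in stream");
    return cast(char) b[0];
}

// Sentinel style: return char.init (0xFF) at end of input instead of throwing.
char getcSentinel(ref MemSource s)
{
    ubyte[1] b;
    return s.readBlock(b[]) == 0 ? char.init : cast(char) b[0];
}

// Same loop shape as the original post: eof() is still false after the last
// byte, so the final getc throws and has to be caught.
string copyUntilThrow(const(ubyte)[] input)
{
    auto s = MemSource(input);
    string result;
    try
    {
        while (!s.eof())
            result ~= getcThrowing(s);
    }
    catch (Exception e)
    {
        // reached when the read after the last byte finds nothing
    }
    return result;
}

// Sentinel loop: no try/catch, but a genuine 0xFF byte would also end it.
string copyUntilSentinel(const(ubyte)[] input)
{
    auto s = MemSource(input);
    string result;
    for (char c = getcSentinel(s); c != char.init; c = getcSentinel(s))
        result ~= c;
    return result;
}

void main()
{
    auto input = cast(const(ubyte)[]) "abc";
    writeln(copyUntilThrow(input));
    writeln(copyUntilSentinel(input));
}
```

Both loops copy "abc" unchanged. The sentinel version stops early on input that contains a real 0xFF byte, which is the price of dropping the try/catch.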

"k2" <k2_member@pathlink.com> wrote in message news:d71eoj$23uv$1@digitaldaemon.com...
> test.d
> ---
> import std.stream;
>
> void main()
> {
> while(!stdin.eof())
> printf("%c", stdin.getc());
> }
>
> ---
>
>>dmd test.d
>>type test.d | test.exe
> void main()
> {
> while(!stdin.eof())
> printf("%c", stdin.getc());
> }
> Error: not enough data in stream
>
>
> Where is wrong?
> Windows 2000, DMD v0.125
>
> 


May 25, 2005
Ben Hinkle wrote:
<snip>
> More specifically the key change would be to std.Stream
>   void read(out char x) { readExact(&x, x.sizeof); }
> would become something like
>   void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }
> Since D uses unicode setting EOF=0xFF means it won't get confused with a regular character.
<snip>

That doesn't follow.  The input stream might not be Unicode; moreover, it might even be a binary file.

Moreover, read is designed to be called once you've already established that there should not be an EOF.  We should keep intact the concepts of expected and unexpected EOF.

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on the 'group where everyone may benefit.
May 25, 2005
"Stewart Gordon" <smjg_1998@yahoo.com> wrote in message news:d71r04$2jsb$1@digitaldaemon.com...
> Ben Hinkle wrote:
> <snip>
>> More specifically the key change would be to std.Stream
>>   void read(out char x) { readExact(&x, x.sizeof); }
>> would become something like
>>   void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }
>> Since D uses unicode setting EOF=0xFF means it won't get confused with a
>> regular character.
> <snip>
>
> That doesn't follow.  The input stream might not be Unicode; moreover, it might even be a binary file.

char, wchar and dchar imply Unicode since this is D. Are you referring to the fact that D doesn't enforce Unicode in "char" arrays? Reading a non-Unicode stream using std.stream isn't possible without another library like libiconv or ICU to map encodings. I would think that if one is reading a non-Unicode stream, one wouldn't use char[] or char or wchar[] or friends; instead one would use byte[] and such.

> Moreover, read is designed to be called once you've already established that there should not be an EOF.  We should keep intact the concepts of expected and unexpected EOF.
>
> http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085

I'm not sure what you mean by "intact", since std.stream doesn't really have the notion of expected and unexpected eof: right now they are all unexpected. With the proposed change, the non-char reads would still throw (unexpected eof), and only trying to read char (or I suppose wchar or dchar) would return EOF (expected eof). The idea is that reaching eof during a read is unexpected in a binary file, while reaching eof in a text file is expected.


May 25, 2005
On Wed, 25 May 2005 08:44:28 -0400, Ben Hinkle <ben.hinkle@gmail.com> wrote:
> "Stewart Gordon" <smjg_1998@yahoo.com> wrote in message
> news:d71r04$2jsb$1@digitaldaemon.com...
>> Ben Hinkle wrote:
>> <snip>
>>> More specifically the key change would be to std.Stream
>>>   void read(out char x) { readExact(&x, x.sizeof); }
>>> would become something like
>>>   void read(out char x) { if (readBlock(&x, x.sizeof) == 0) x = EOF; }
>>> Since D uses unicode setting EOF=0xFF means it won't get confused with a
>>> regular character.
>> <snip>
>>
>> That doesn't follow.  The input stream might not be Unicode; moreover, it
>> might even be a binary file.
>
> char, wchar and dchar imply unicode since this is D. Are you referring to
> the fact that D doesn't enforce unicode "char" arrays? Reading a non-unicode
> stream using std.stream isn't possible without another library like libiconv
> or ICU to map encodings. I would think if one is reading a non-unicode
> stream one wouldn't use char[] or char or wchar[] or friends - instead one
> would use byte[] and such.
>
>> Moreover, read is designed to be called once you've already established
>> that there should not be an EOF.  We should keep intact the concepts of
>> expected and unexpected EOF.
>>
>> http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085
>
> I'm not sure what you mean by "intact" since std.stream doesn't really have
> the notion of expected and unexpected eof - right now they are all
> unexpected. The non-char reads will throw (unexpected eof). Only trying to
> read char (or I suppose wchar or dchar) will return EOF (expected eof). The
> idea is that in a binary file reaching eof in a read is unexpected while
> reaching eof in a text file is expected.

It's a curly problem, that's for sure.

My impression is that EOF is expected when reading one byte at a time, and maybe also when reading the first byte of a multi-byte thing (where thing is a wchar, dchar, short, int, long, float, etc). But EOF is unexpected when in the middle of reading something.

So, for example, if you try to read an 'int' and get 2 bytes then EOF, it's unexpected. But if you're reading chars or bytes one at a time, you expect to hit/read EOF eventually.

It could be argued that 'char' is different from 'byte' as, correct me if I am wrong, a single 'char' is a Unicode fragment, possibly an incomplete character. So it's conceivable you might want to validate it, and if it's incomplete you have an unexpected EOF as opposed to an expected one.

Regan
May 25, 2005
> My impression is that the EOF is expected when reading one byte at a time. Maybe also when reading the first byte of a greater than 1 byte thing (where thing is a wchar, dchar, short, int, long, float, etc). But EOF is unexpected when in the middle of reading something.

Good point. Half a wchar is unexpected.

> So, for example if you try to read an 'int' and get 2 bytes then EOF, it's unexpected. But, if you're reading chars or bytes, one at time, you expect to hit/read EOF eventually.

I had assumed reading bytes would be considered binary io and so hitting eof would throw. Off the top of my head I would prefer to keep bytes as numeric and chars as text.

> It could be argued that 'char' is different to 'byte' as, correct me if I am wrong, a single 'char' is a unicode fragment, possibly an incomplete character. So it's concievable you might want to validate it, and if it's incomplete you have an un-expected EOF as opposed to an expected one.

I agree char is different than byte. The trouble with trying to validate multi-byte codepoints is that you would need to look ahead or keep state about what the previous bytes were in order to know if the current byte being read is in the middle of a codepoint or not. It seems like a lot of trouble for unclear benefit.


May 25, 2005
> My impression is that the EOF is expected when reading one byte at a time. Maybe also when reading the first byte of a greater than 1 byte thing (where thing is a wchar, dchar, short, int, long, float, etc). But EOF is unexpected when in the middle of reading something.

Sorry for the double post, but here's a possible read(out wchar x):
  void read(out wchar x) {
    size_t n = readBlock(&x, x.sizeof);
    if (n == 0)
      x = wchar.init;
    else if (n == 1) { // could be partial read
      void* buf = &x;
      if (readBlock(buf+1, 1) == 0)
        throw new ReadException(...);
    }
  }

That way an eof with half a wchar throws but eof with no data returns EOF. The dchar read would be something similar but probably with a loop for partial reads since it can read up to four times instead of twice.
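
The loop for wider types could look roughly like the following sketch, written in present-day D and independent of std.stream. TrickleSource, readWhole and classify are hypothetical names for illustration only. TrickleSource deliberately hands out one byte per call so that partial reads actually occur. Unlike the snippet above, readWhole signals a clean end of input through its return value rather than by storing an init value.

```d
import std.stdio;

// Hypothetical source that yields at most one byte per call, to force partial reads.
struct TrickleSource
{
    const(ubyte)[] data; // bytes not yet consumed

    // Copies at most one byte into buf and returns how many were copied.
    size_t readBlock(ubyte[] buf)
    {
        if (buf.length == 0 || data.length == 0)
            return 0;
        buf[0] = data[0];
        data = data[1 .. $];
        return 1;
    }
}

// Fills buf completely. Returns false on a clean end of input (no bytes at all);
// throws if the input ends after only part of the value has arrived.
bool readWhole(ref TrickleSource s, ubyte[] buf)
{
    size_t got = 0;
    while (got < buf.length)
    {
        size_t n = s.readBlock(buf[got .. $]);
        if (n == 0)
        {
            if (got == 0)
                return false; // expected eof: nothing was read
            throw new Exception("not enough data in stream"); // eof mid-value
        }
        got += n;
    }
    return true;
}

// Tries to read one dchar-sized value and reports which case occurred.
string classify(const(ubyte)[] input)
{
    auto s = TrickleSource(input);
    ubyte[dchar.sizeof] buf;
    try
    {
        return readWhole(s, buf[]) ? "complete" : "clean eof";
    }
    catch (Exception e)
    {
        return "truncated";
    }
}

void main()
{
    writeln(classify([1, 2, 3, 4])); // four bytes available
    writeln(classify(null));         // no input at all
    writeln(classify([1, 2]));       // input ends halfway through
}
```

The same helper covers the wchar case by passing a two-byte buffer, so the "how many times can it read" question reduces to the buffer length.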


May 25, 2005
Ben Hinkle wrote:
<snip>
> char, wchar and dchar imply unicode since this is D. Are you referring to the fact that D doesn't enforce unicode "char" arrays?  Reading a non-unicode stream using std.stream isn't possible without another library like libiconv or ICU to map encodings.

std.stream doesn't care at all about the format of input to that level.

> I would think if one is reading a non-unicode stream one wouldn't use char[] or char or wchar[] or friends - instead one would use byte[] and such.

Up until the point where you need to do console I/O or access an external API that relies on whatever encoding the input is in.

>> Moreover, read is designed to be called once you've already established that there should not be an EOF.  We should keep intact the concepts of expected and unexpected EOF.
>> 
>> http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085
> 
> I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected.

An expected EOF is handled by checking for EOF before attempting to read.  It's part of common sense rather than of std.stream itself.  I.e. you check for EOF before reading if this is part of the normal program logic.

At the moment one can rely on exceptions to catch a premature end of file.  This should remain so.  I refer you back to the error handling philosophy.

> The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof).  The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.

That doesn't follow either.  For example, suppose you're writing a utility that manipulates binary files in general.  E.g. a hex editor or a file compression utility.  At no point while reading the file can you just expect that there is or isn't more.

Conversely, suppose you're writing a D compiler.  A D code file is a text file.  And yet it can't end abruptly in the middle of a comment or string literal.  Similarly, many of my department's programs use parameter files designed to be edited directly by the user, with one parameter per line.    If you're expecting the next parameter but instead reach the end of the file, then that's unexpected.

So really there is no correlation.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on
the 'group where everyone may benefit.
May 25, 2005
What about having 2 different streams: binary and text?

The binary one would work as it does now, where eof() just checks the file pointer.

The text one would use the unget buffer. If the unget buffer contains a character, it is not eof; otherwise eof() tries to read one character into the buffer, and reports eof only if that read gets nothing.
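
A minimal sketch of that lookahead idea, in present-day D and not tied to std.stream, might look like this. PeekingReader, readOne and copyAll are hypothetical names, and the single buffered byte stands in for the unget buffer.

```d
import std.stdio;

// Hypothetical text reader with one byte of lookahead; not the real std.stream API.
struct PeekingReader
{
    const(ubyte)[] data; // stands in for the underlying stream
    bool hasPeeked;      // true while `peeked` holds an unconsumed byte
    ubyte peeked;

    // Pulls one byte from the underlying data; returns false when none is left.
    private bool readOne(out ubyte b)
    {
        if (data.length == 0)
            return false;
        b = data[0];
        data = data[1 .. $];
        return true;
    }

    // Not at eof if a byte is buffered; otherwise try to buffer one.
    bool eof()
    {
        if (!hasPeeked)
            hasPeeked = readOne(peeked);
        return !hasPeeked;
    }

    // Hands out the buffered byte first, then reads; still throws past the end.
    char getc()
    {
        ubyte b;
        if (hasPeeked)
        {
            hasPeeked = false;
            b = peeked;
        }
        else if (!readOne(b))
            throw new Exception("not enough data in stream");
        return cast(char) b;
    }
}

// The loop from the original post now terminates without a try/catch.
string copyAll(const(ubyte)[] input)
{
    auto r = PeekingReader(input);
    string result;
    while (!r.eof())
        result ~= r.getc();
    return result;
}

void main()
{
    writeln(copyAll(cast(const(ubyte)[]) "abc"));
}
```

One cost of this design is that eof() itself performs a read, so on an interactive stdin it may block until input arrives.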
May 25, 2005
>>> Moreover, read is designed to be called once you've already established that there should not be an EOF.  We should keep intact the concepts of expected and unexpected EOF.
>>>
>>> http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/4085
>>
>> I'm not sure what you mean by "intact" since std.stream doesn't really have the notion of expected and unexpected eof - right now they are all unexpected.
>
> An expected EOF is handled by checking for EOF before attempting to read. It's part of common sense rather than of std.stream itself.  I.e. you check for EOF before reading if this is part of the normal program logic.

But for the situation of the original post (reading stdin) the OS doesn't tell us eof has happened until you try to read and it fails. So in other words for stdin "eof" means "did the last read attempt try to read past eof".

> At the moment one can rely on exceptions to catch a premature end of file. This should remain so.  I refer you back to the error handling philosophy.

That would work fine if eof could detect that stdin has ended without attempting to read.

>> The non-char reads will throw (unexpected eof). Only trying to read char (or I suppose wchar or dchar) will return EOF (expected eof).  The idea is that in a binary file reaching eof in a read is unexpected while reaching eof in a text file is expected.
>
> That doesn't follow either.  For example, suppose you're writing a utility that manipulates binary files in general.  E.g. a hex editor or a file compression utility.  At no point while reading the file can you just expect that there is or isn't more.

I don't get you. What do you mean by "follow"? I'm not trying to chain a sequence of statements into a proof or something. I'm stating that, from a practical point of view, binary files should throw if a read is incomplete and text files should return EOF. I don't understand what you are arguing read() should do in the different situations.

> Conversely, suppose you're writing a D compiler.  A D code file is a text file.  And yet it can't end abruptly in the middle of a comment or string literal.  Similarly, many of my department's programs use parameter files designed to be edited directly by the user, with one parameter per line. If you're expecting the next parameter but instead reach the end of the file, then that's unexpected.

The semantic content of the text file (e.g. a D source file) is independent of std.stream. You say a D source file can't end in the middle of a comment. I think such a file would be a semantically incorrect source file, but there's no way std.stream can determine that. If someone writes a subclass of stream that knows about comments and throws on eof inside a comment, that's fine with me. I don't see why that conflicts with returning EOF from getc.

> So really there is no correlation.

So are you arguing for throwing in getc or not throwing?

