stream.getc() doesn't recognize eof

Mar 12, 2008

Brian White

Mar 13, 2008

Regan Heath

Mar 13, 2008

Mar 13, 2008

Mar 13, 2008

Mar 14, 2008

Mar 16, 2008

I was looking through the std.stream code of Phobos and found this function:

// reads and returns next character from the stream,
// handles characters pushed back by ungetc()
// returns char.init on eof.
char getc() {
  char c;
  if (prevCr) {
    prevCr = false;
    c = getc();
    if (c != '\n')
      return c;
  }
  if (unget.length > 1) {
    c = cast(char)unget[unget.length - 1];
    unget.length = unget.length - 1;
  } else {
    readBlock(&c,1);
  }
  return c;
}

Is there something I don't understand?  How does it recognize EOF?  The "readBlock" function is defined as returning 0 (zero) if there is no more data, but its return value is not checked.

-- Brian

Brian White wrote:
> I was looking through the std.stream code of Phobos and found this function:
> 
> // reads and returns next character from the stream,
> // handles characters pushed back by ungetc()
> // returns char.init on eof.
> char getc() {
>   char c;
>   if (prevCr) {
>     prevCr = false;
>     c = getc();
>     if (c != '\n')
>       return c;
>   }
>   if (unget.length > 1) {
>     c = cast(char)unget[unget.length - 1];
>     unget.length = unget.length - 1;
>   } else {
>     readBlock(&c,1);
>   }
>   return c;
> }
> 
> Is there something I don't understand?  How does it recognize EOF?  The "readBlock" function is defined as returning 0 (zero) if there is no more data, but its return value is not checked.

At EOF readBlock returns 0, but more importantly it does not modify the value of 'c' which it is passed.

The value of 'c' is char.init (due to D's automatic initialisation of variables to their init value).

So, because c == char.init and nothing has modified it, the path which calls readBlock will return char.init when EOF is reached.

:)

Regan
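To make the explanation above concrete, here is a minimal sketch in D. The function `readBlockAtEof` is a hypothetical stand-in for a stream's readBlock at end of file, not Phobos code, and it assumes (as the reply does) that an implementation leaves the buffer alone when it returns 0.

```d
import std.stdio;

// Hypothetical stand-in for readBlock on a stream that is at EOF:
// it reports 0 bytes read and never touches the caller's buffer.
size_t readBlockAtEof(void* buffer, size_t size)
{
    return 0;
}

void main()
{
    char c;                      // default-initialised to char.init
    assert(c == char.init);
    assert(char.init == 0xFF);   // 0xFF is never a valid UTF-8 code unit

    auto n = readBlockAtEof(&c, 1);
    assert(n == 0);
    assert(c == char.init);      // nothing wrote to c, so it still signals EOF

    writefln("c = 0x%02X", cast(ubyte) c);   // prints "c = 0xFF"
}
```

The EOF signal here is purely a consequence of default initialisation plus a read that stores nothing.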

> So, because c == char.init and nothing has modified it, the path which calls readBlock will return char.init when EOF is reached.

Ah, thanks!

I must say that this technique worries me somewhat.  "readBlock" is an abstract function definable by any derived class and I don't believe that "c must remain unchanged where data is not stored" is a defined output requirement of that method.

-- Brian

March 13, 2008

Re: stream.getc() doesn't recognize eof

Posted by Regan Heath
in reply to Brian White


Brian White wrote:
>> So, because c == char.init and nothing has modified it, the path which calls readBlock will return char.init when EOF is reached.
> 
> Ah, thanks!
> 
> I must say that this technique worries me somewhat.  "readBlock" is an abstract function definable by any derived class and I don't believe that "c must remain unchanged where data is not stored" is a defined output requirement of that method.

Good point, might be safer to check for the 0 return and set c to char.init explicitly.
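A minimal sketch of that explicit check, written as a standalone example rather than as a patch to Phobos. `MiniStream` and `ScribblingStream` are hypothetical names, the prevCr handling of the real getc() is left out, and the pushback buffer is simplified so that it holds only pushed-back characters.

```d
// Hypothetical, cut-down stand-in for the relevant part of a stream class.
abstract class MiniStream
{
    private char[] unget;   // pushed-back characters, most recent last

    // Returns the number of bytes stored in buffer; 0 means EOF.
    abstract size_t readBlock(void* buffer, size_t size);

    void ungetc(char c) { unget ~= c; }

    // Returns the next character, or char.init on EOF.
    char getc()
    {
        char c;
        if (unget.length > 0)
        {
            c = unget[$ - 1];
            unget.length = unget.length - 1;
        }
        else if (readBlock(&c, 1) == 0)
        {
            // Explicit EOF handling: do not rely on readBlock
            // having left c untouched.
            c = char.init;
        }
        return c;
    }
}

// Hypothetical subclass that writes to the buffer even when it returns 0.
class ScribblingStream : MiniStream
{
    private string data;
    private size_t pos;

    this(string data) { this.data = data; }

    override size_t readBlock(void* buffer, size_t size)
    {
        if (size == 0)
            return 0;
        auto p = cast(char*) buffer;
        if (pos >= data.length)
        {
            p[0] = '?';   // misbehaves: modifies the buffer at EOF
            return 0;
        }
        p[0] = data[pos++];
        return 1;
    }
}

void main()
{
    auto s = new ScribblingStream("ab");
    assert(s.getc() == 'a');
    assert(s.getc() == 'b');
    assert(s.getc() == char.init);   // EOF reported despite the scribble
    s.ungetc('z');
    assert(s.getc() == 'z');
}
```

With the check in place, getc() no longer depends on how a derived class treats the buffer when it has nothing to deliver.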

Your comment did get me thinking... Is there some way of expressing the requirement using design by contract?  I think the answer is: not easily. You'd have to do something like:

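// (module scope) C runtime helpers used by the contract below
import std.c.stdlib;  // malloc, free
import std.c.string;  // memcpy, memcmp

// the problem being that we need a global to copy the input buffer into
// and it could be potentially huge.
// when really all we want is some way to detect whether
// data was written to that address _at all_
byte* buffer_in;

abstract size_t readBlock(void* buffer, size_t size)
in {
  // snapshot the caller's buffer before the read
  buffer_in = cast(byte*) malloc(size);
  memcpy(buffer_in, buffer, size);
}
out (result) {
  // a zero-length read must leave the buffer exactly as it was
  assert(result > 0 ||
        (result == 0 && memcmp(buffer_in, buffer, size) == 0));
  free(buffer_in);  // release the snapshot taken in the 'in' contract
}
/* note, no body, therefore function is still 'abstract' */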

All that assuming it is legal to specify in/out contracts on an abstract method without a body.

It should be possible, it would simply follow the same rules given for inheritance here under "In, Out and Inheritance":
http://www.digitalmars.com/d/1.0/dbc.html

Regan

> Good point, might be safer to check for the 0 return and set c to char.init explicitly.

I think it makes a better design.  This way feels like relying on side-effects and I've spent enough time coding perl to know that making use of side-effects is a great start towards unreadable and unmaintainable code.  The more obvious you make code, the less likely there will be bugs and the easier it will be for someone else to maintain it.

A comment like "c still has .init value if readBlock failed" would also be sufficient.  If I were maintaining this code, I would have (wrongly) assumed a bug and "corrected" it, possibly introducing a new bug.

>         (result == 0 && memcmp(buffer_in, buffer, size) == 0));

Eee-Gad, but that's painful!  Performance could easily be so bad that I'd turn off the checks and then they're no use at all.

I've never known a "read" function to modify bytes beyond the "count" amount returned, but I don't know if it's ever explicitly stated not to do so.

-- Brian

Brian White wrote:
>>         (result == 0 && memcmp(buffer_in, buffer, size) == 0));
> 
> Eee-Gad, but that's painful!  Performance could easily be so bad that I'd turn off the checks and then they're no use at all.

You can use -release to turn off contracts and asserts, so only non-release builds would suffer the penalty.

> I've never known a "read" function to modify bytes beyond the "count" amount returned, but I don't know if it's ever explicitly stated not to do so.

True.  You could perhaps cheat a little and remember just the first byte of the output buffer; chances are, if the first byte hasn't changed, nothing was written to the buffer.

Regan

>>>         (result == 0 && memcmp(buffer_in, buffer, size) == 0));
>>
>> Eee-Gad, but that's painful!  Performance could easily be so bad that I'd turn off the checks and then they're no use at all.
>
> You can use -release to turn off contracts and asserts, so only non-release builds would suffer the penalty.

My worry is that the test code would be such a performance hit that it would be impossible to use without -release.

>> I've never known a "read" function to modify bytes beyond the "count" amount returned, but I don't know if it's ever explicitly stated not to do so.
>
> True.  You could perhaps cheat a little and remember just the first byte of the output buffer, chances are if the first byte hasn't changed, nothing was written to the buffer.

I was just thinking the exact same thing.

-- Brian
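The "first byte" cheat could look roughly like the sketch below. It uses a plain wrapper function instead of an in/out contract on the abstract method, which is a substitution made here to keep the example self-contained; `Reader` and `checkedRead` are hypothetical names, not part of Phobos.

```d
import core.exception : AssertError;
import std.exception : assertThrown;

// Hypothetical reader type: any delegate with readBlock's shape.
alias Reader = size_t delegate(void* buffer, size_t size);

// Calls the reader and applies the first-byte heuristic:
// if 0 bytes are reported, the first byte must be unchanged.
size_t checkedRead(Reader read, void* buffer, size_t size)
{
    if (size == 0)
        return read(buffer, size);

    auto bytes = cast(ubyte*) buffer;
    immutable ubyte first = bytes[0];   // one saved byte, no buffer copy

    immutable result = read(buffer, size);

    // Compiled out by -release, like the contract version in the thread.
    assert(result > 0 || bytes[0] == first,
           "reader returned 0 but modified the buffer");
    return result;
}

void main()
{
    // Well-behaved reader at EOF: returns 0, leaves the buffer alone.
    Reader atEof = delegate size_t(void* buffer, size_t size) {
        return 0;
    };
    // Misbehaving reader: returns 0 but overwrites the first byte.
    Reader scribbler = delegate size_t(void* buffer, size_t size) {
        *cast(char*) buffer = 'X';
        return 0;
    };

    char c;
    assert(checkedRead(atEof, &c, 1) == 0);
    assert(c == char.init);

    // Holds in non-release builds only, where asserts are active.
    assertThrown!AssertError(checkedRead(scribbler, &c, 1));
}
```

The check costs one byte of state per call, at the price of missing a reader that happens to write back the same value that was already in the first byte.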
