Thread overview
stream.getc() doesn't recognize eof
Mar 12, 2008
Brian White
Mar 13, 2008
Regan Heath
Mar 13, 2008
Brian White
Mar 13, 2008
Regan Heath
Mar 13, 2008
Brian White
Mar 14, 2008
Regan Heath
Mar 16, 2008
Brian White
March 12, 2008
I was looking through the std.stream code of Phobos and found this function:

  // reads and returns next character from the stream,
  // handles characters pushed back by ungetc()
  // returns char.init on eof.
  char getc() {
    char c;
    if (prevCr) {
      prevCr = false;
      c = getc();
      if (c != '\n')
        return c;
    }
    if (unget.length > 1) {
      c = cast(char)unget[unget.length - 1];
      unget.length = unget.length - 1;
    } else {
      readBlock(&c,1);
    }
    return c;
  }


Is there something I don't understand?  How does it recognize EOF?  The "readBlock" function is defined as returning 0 (zero) if there is no more data, but its return value is not checked.

-- Brian
March 13, 2008
Brian White wrote:
> I was looking through the std.stream code of Phobos and found this function:
> 
>   // reads and returns next character from the stream,
>   // handles characters pushed back by ungetc()
>   // returns char.init on eof.
>   char getc() {
>     char c;
>     if (prevCr) {
>       prevCr = false;
>       c = getc();
>       if (c != '\n')
>         return c;
>     }
>     if (unget.length > 1) {
>       c = cast(char)unget[unget.length - 1];
>       unget.length = unget.length - 1;
>     } else {
>       readBlock(&c,1);
>     }
>     return c;
>   }
> 
> 
> Is there something I don't understand?  How does it recognize EOF?  The "readBlock" function is defined as returning 0 (zero) if there is no more data, but its return value is not checked.

At EOF readBlock returns 0, but more importantly it does not modify the value of 'c' which it is passed.

The value of 'c' is char.init (due to D's automatic initialisation of variables to their init value).

So, because c == char.init and nothing has modified it, the path which calls readBlock will return char.init when EOF is reached.
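To make that concrete, here is a small sketch that can be compiled with -unittest. The function readNothing is a made-up stand-in for a readBlock that hits EOF and stores nothing; it is not part of Phobos.

```d
// Hypothetical reader: stores nothing and reports 0 bytes,
// the way a well-behaved readBlock would at EOF.
size_t readNothing(void* buffer, size_t size)
{
  return 0;
}

unittest
{
  char c;                     // default-initialised to char.init
  assert(c == char.init);
  assert(char.init == 0xFF);  // 0xFF is never a valid UTF-8 code unit

  size_t n = readNothing(&c, 1);
  assert(n == 0);
  assert(c == char.init);     // untouched, so the caller sees char.init
}
```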

:)

Regan
March 13, 2008
> So, because c == char.init and nothing has modified it, the path which calls readBlock will return char.init when EOF is reached.

Ah, thanks!

I must say that this technique worries me somewhat.  "readBlock" is an abstract function definable by any derived class and I don't believe that "c must remain unchanged where data is not stored" is a defined output requirement of that method.

-- Brian
March 13, 2008
Brian White wrote:
>> So, because c == char.init and nothing has modified it, the path which calls readBlock will return char.init when EOF is reached.
> 
> Ah, thanks!
> 
> I must say that this technique worries me somewhat.  "readBlock" is an abstract function definable by any derived class and I don't believe that "c must remain unchanged where data is not stored" is a defined output requirement of that method.

Good point, might be safer to check for the 0 return and set c to char.init explicitly.
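Something along these lines, as a rough sketch only. MiniStream and ScribblingStream are invented for the example and leave out the prevCr and unget handling of the real getc(); the point is just that the result no longer depends on what readBlock does to the buffer when it returns 0.

```d
// Hypothetical stand-in for a stream class; not the Phobos one.
abstract class MiniStream
{
  // Returns the number of bytes stored, 0 when there is no more data.
  abstract size_t readBlock(void* buffer, size_t size);

  // Returns the next character, or char.init at EOF.
  char getc()
  {
    char c;
    if (readBlock(&c, 1) == 0)
      c = char.init;  // explicit: do not rely on the buffer being untouched
    return c;
  }
}

// A derived class that scribbles on the buffer even when it has no data.
class ScribblingStream : MiniStream
{
  private string data;
  private size_t pos;

  this(string data) { this.data = data; }

  override size_t readBlock(void* buffer, size_t size)
  {
    char* dst = cast(char*) buffer;
    if (pos >= data.length)
    {
      if (size > 0)
        dst[0] = '?';  // clobbers the caller's buffer at EOF
      return 0;
    }
    size_t n = 0;
    while (n < size && pos < data.length)
      dst[n++] = data[pos++];
    return n;
  }
}

unittest
{
  auto s = new ScribblingStream("ab");
  assert(s.getc() == 'a');
  assert(s.getc() == 'b');
  assert(s.getc() == char.init);  // EOF, despite the clobbered buffer
}
```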

Your comment did get me thinking... Is there some way of expressing the requirement using design by contract?  I think the answer is, not easily, you'd have to do something like:

// the problem being that we need a global to copy the input buffer into
// and it could be potentially huge.
// when really all we want is some way to detect whether
// data was written to that address _at all_
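// needs std.c.stdlib (malloc, free) and std.c.string (memcpy, memcmp)
byte* buffer_in;

abstract size_t readBlock(void* buffer, size_t size)
in {
  buffer_in = cast(byte*) malloc(size); // malloc returns void*, so cast
  memcpy(buffer_in, buffer, size);
}
out (result) {
  assert(result > 0 ||
        (result == 0 && memcmp(buffer_in, buffer, size) == 0));
  free(buffer_in); // release the snapshot once it has been compared
}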
/* note, no body, therefore function is still 'abstract' */

All that assuming it is legal to specify in/out contracts on an abstract method without a body.

It should be possible; it would simply follow the same rules given for inheritance here under "In, Out and Inheritance":
http://www.digitalmars.com/d/1.0/dbc.html

Regan
March 13, 2008
> Good point, might be safer to check for the 0 return and set c to char.init explicitly.

I think it makes a better design.  This way feels like relying on side-effects, and I've spent enough time coding Perl to know that making use of side-effects is a great start towards unreadable and unmaintainable code.

The more obvious you make code, the less likely there will be bugs and the easier it will be for someone else to maintain it.  A comment like "c still has .init value if readBlock failed" would also be sufficient.

If I were maintaining this code, I would have (wrongly) assumed a bug and "corrected" it, possibly introducing a new bug.


>         (result == 0 && memcmp(buffer_in, buffer, size) == 0));

Eee-Gad, but that's painful!  Performance could easily be so bad that I'd turn off the checks and then they're no use at all.

I've never known a "read" function to modify bytes beyond the "count" amount returned, but I don't know if it's ever explicitly stated not to do so.

-- Brian
March 14, 2008
Brian White wrote:
>>         (result == 0 && memcmp(buffer_in, buffer, size) == 0));
> 
> Eee-Gad, but that's painful!  Performance could easily be so bad that I'd turn off the checks and then they're no use at all.

You can use -release to turn off contracts and asserts, so only non-release builds would suffer the penalty.

> I've never known a "read" function to modify bytes beyond the "count" amount returned, but I don't know if it's ever explicitly stated not to do so.

True.  You could perhaps cheat a little and remember just the first byte of the output buffer, chances are if the first byte hasn't changed, nothing was written to the buffer.
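A rough sketch of that cheat is below. Since it is still an open question whether contracts are legal on a bodiless abstract method, this version swaps the in/out contract for a plain assert inside a non-abstract wrapper. The names CheckedStream, checkedRead, QuietEof and NoisyEof are made up for the example. Note that it cannot catch a readBlock that happens to write back the same byte value.

```d
import core.exception : AssertError;
import std.exception : assertThrown;

// Hypothetical base class; only readBlock mirrors the real interface.
abstract class CheckedStream
{
  abstract size_t readBlock(void* buffer, size_t size);

  // Wrapper that remembers one byte instead of copying the whole buffer.
  final size_t checkedRead(void* buffer, size_t size)
  {
    if (size == 0)
      return readBlock(buffer, size);
    ubyte first = *cast(ubyte*) buffer;  // snapshot of the first byte only
    size_t result = readBlock(buffer, size);
    assert(result > 0 || *cast(ubyte*) buffer == first,
           "readBlock returned 0 but wrote to the buffer");
    return result;
  }
}

// Leaves the buffer alone at EOF.
class QuietEof : CheckedStream
{
  override size_t readBlock(void* buffer, size_t size) { return 0; }
}

// Writes to the buffer and still reports 0 bytes.
class NoisyEof : CheckedStream
{
  override size_t readBlock(void* buffer, size_t size)
  {
    if (size > 0)
      *cast(char*) buffer = '?';
    return 0;
  }
}

unittest
{
  char c;
  assert((new QuietEof).checkedRead(&c, 1) == 0);
  assert(c == char.init);
  // Only fires when asserts are compiled in, i.e. not with -release.
  assertThrown!AssertError((new NoisyEof).checkedRead(&c, 1));
}
```

The cost per call is one byte load and one compare, so it should be cheap enough to leave on in non-release builds.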

Regan
March 16, 2008
>>>         (result == 0 && memcmp(buffer_in, buffer, size) == 0));
>>
>> Eee-Gad, but that's painful!  Performance could easily be so bad that I'd turn off the checks and then they're no use at all.
> 
> You can use -release to turn off contracts and asserts, so only non-release builds would suffer the penalty.

My worry is that the test code would be such a performance hit that it would be impossible to use without -release.


>> I've never known a "read" function to modify bytes beyond the "count" amount returned, but I don't know if it's ever explicitly stated not to do so.
> 
> True.  You could perhaps cheat a little and remember just the first byte of the output buffer, chances are if the first byte hasn't changed, nothing was written to the buffer.

I was just thinking the exact same thing.

-- Brian