Minor std.stdio.File.ByLine rant - D Programming Language Discussion Forum

February 26, 2014

Minor std.stdio.File.ByLine rant

Posted by H. S. Teoh

I'm writing a CLI program that uses File.ByLine to read input commands, with optional prompting (if run in interactive mode). One would imagine that this should be a natural use for ByLine (perhaps not as common nowadays with the rampant GUI fanboyism, but it still happens in some niches), but it is fraught with peril.

First of all, the way ByLine works is kinda tricky, even in the previous releases. The underlying cause is that at least on Posix, the underlying C feof() call doesn't actually tell you whether you're really at EOF until you try to read something from the file descriptor. I know there are good reasons for this, but this special case percolates up through the standard library code and causes a problem with D's input range primitives, where .empty must tell the caller, right now, whether data is available, *before* .front ever returns anything.
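
To make the feof() behaviour concrete, here is a minimal sketch using an empty scratch file. The function name and the `path` parameter are made up for this illustration; `File.eof` and `File.readln` are the std.stdio calls being exercised.

```d
import std.stdio : File;
static import std.file;

/// Returns [eof before any read, eof after one read attempt] for an empty file.
/// `path` is a scratch file name chosen by the caller (an assumption of this sketch).
bool[2] eofBeforeAndAfterRead(string path)
{
    std.file.write(path, "");           // create an empty file
    scope (exit) std.file.remove(path); // runs after `f` below has been closed

    auto f = File(path, "r");
    bool before = f.eof;                // nothing has been read yet
    char[] buf;
    f.readln(buf);                      // the read attempt is what hits end-of-file
    bool after = f.eof;
    return [before, after];
}
```

Even though the file never had any data, the first value should be false and only the second true: EOF is only reported once a read has actually failed.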

At one time, this problem was worked around by issuing a single fgetc on the underlying C stream in ByLine's .empty method to determine its EOF state, and then doing an ungetc to put the char back into the stream. However, this code is a rather ugly hack, and it means that when an interactive program needs to output a prompt before blocking on input, it has to do so *before* it calls ByLine.empty (otherwise .empty blocks and the prompt doesn't get printed until after the user has hit Enter, which is clearly unacceptable for an interactive shell program). If the stream turns out to be empty after all, the prompt has already been output and there's no way to take it back, so an extraneous prompt is always written.
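
Roughly, the peek-and-put-back trick looks like the sketch below. This is not the old Phobos source, just an illustration of the technique with the C stdio calls from core.stdc.stdio; the name `peekIsEmpty` is invented here.

```d
import core.stdc.stdio;

/// Decides whether a C stream is exhausted by reading one byte and pushing it back.
/// On an interactive stream the fgetc call blocks until the user types something.
bool peekIsEmpty(FILE* fp)
{
    int c = fgetc(fp);   // may block: this is the whole problem
    if (c == EOF)
        return true;     // nothing left to read
    ungetc(c, fp);       // put the byte back so the next read sees it
    return false;
}
```

Any prompt therefore has to be written before `peekIsEmpty` is called, whether or not more input ever arrives.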

Understandably, the ungetc hack was subsequently removed from Phobos and replaced by caching the next line the first time .empty was called, which eliminated the ugliness of ungetc and allowed existing code to continue working as before.

Then recently, and also understandably, caching things in .empty was frowned upon, so the caching was removed from .empty altogether and pushed into the ByLine ctor. From the standpoint of Phobos code, this is perhaps the ideal solution: the ctor reads the stream to get the first line and simultaneously determine the EOF status of the stream, and there is no need for ugly boolean state flags, ungetc ugliness, and generally unpleasant code.
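
A stripped-down sketch of a range that primes itself in its constructor is shown here. `EagerLines` is a made-up name and much simpler than the real ByLine (no refcounting, fixed '\n' terminator), but it has the same shape: the ctor performs the first read.

```d
import std.stdio : File;
import std.string : chomp;

/// Hypothetical line range that reads its first line on construction.
struct EagerLines
{
    private File file;
    private char[] buf;   // reused read buffer
    private char[] line;  // current line without its terminator
    private bool done;

    this(File f)
    {
        file = f;
        popFront();       // priming read: this is the call that can block
    }

    @property bool empty() const { return done; }
    @property char[] front() { return line; }

    void popFront()
    {
        if (file.readln(buf) == 0)   // readln returns 0 at end of file
        {
            done = true;
            line = null;
        }
        else
            line = buf.chomp;
    }
}
```

With this layout .empty is a trivial flag check, at the price that merely constructing the range already waits for input.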

However, what happens is that now, ByLine will block on input *upon construction*. This is rather unpleasant when your program needs to do something like this:

	void main() {
		string prompt;
		...
		ByLine!char input;
		if (useStandardInput) {
			input = stdin.byLine();	// ctor already blocks here, before any prompt is chosen
		} else if (useScriptFile) {
			input = File(filename).byLine();
		}
		...
		if (mode == ProgramMode.modeA) { // mode is an enum
			runModeA(input);
		} else {
			runModeB(input);
		}
	}

	void runModeA(ByLine!char input) {
		write("modeA> ");	// display prompt
		while (!input.empty) {
			...
		}
	}

	void runModeB(ByLine!char input) {
		write("modeB> ");	// display prompt
		while (!input.empty) {
			...
		}
	}

The problem is, when input is initialized, we don't know what prompt to use yet, but ByLine's ctor will already block when it tries to read from stdin!

The current workaround I implemented is to use a wrapper around ByLine that lazily constructs it when .empty is called.
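
For readers who want the general shape of such a wrapper, here is a minimal sketch. It is not the exact code from my program; `LazyRange` and `lazily` are names invented for this example, and it assumes the wrapped range type is default-constructible (as ByLine is).

```d
import std.range.primitives;
import std.traits : ReturnType;

/// Hypothetical wrapper (not part of Phobos): holds a callable and only
/// invokes it, building the real range, when a range primitive is first used.
struct LazyRange(F)
if (isInputRange!(ReturnType!F))
{
    private alias R = ReturnType!F;
    private F make;      // builds the underlying range on demand
    private R inner;     // meaningful only once `built` is true
    private bool built;

    this(F make) { this.make = make; }

    private void ensureBuilt()
    {
        if (!built)
        {
            inner = make();   // for stdin.byLine this is where blocking would happen
            built = true;
        }
    }

    @property bool empty() { ensureBuilt(); return inner.empty; }
    @property auto front() { ensureBuilt(); return inner.front; }
    void popFront() { ensureBuilt(); inner.popFront(); }
}

/// Convenience constructor so the callable's type is inferred.
auto lazily(F)(F make) { return LazyRange!F(make); }
```

Something like `auto input = lazily(() => stdin.byLine());` can then be created early and handed to runModeA or runModeB, which print their prompt before the first .empty call. One caveat of this sketch: a copy made before first use builds its own underlying range, so it is best handed to a single consumer.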

Who knew something so simple as an interactive prompting program that reads input lines could turn into such a nightmare when ByLine is used?

:-(


T

-- 
What is Matter, what is Mind? Never Mind, it doesn't Matter.

February 26, 2014

Re: Minor std.stdio.File.ByLine rant

Posted by Jakob Ovrum
in reply to H. S. Teoh

On Wednesday, 26 February 2014 at 23:45:48 UTC, H. S. Teoh wrote:
> The problem is, when input is initialized, we don't know what prompt to
> use yet, but ByLine's ctor will already block when it tries to read from
> stdin!

Ouch, I think I saw this coming... [1]

[1] https://github.com/D-Programming-Language/phobos/pull/1883

February 26, 2014

Re: Minor std.stdio.File.ByLine rant

Posted by bearophile
in reply to H. S. Teoh

H. S. Teoh:

> I'm writing a CLI program that uses File.ByLine to read input commands,

Isn't using readln() better for that? File.byLine is to read lines of files on disk.
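
A minimal sketch of what a readln-based prompt loop could look like follows. The function name `promptLoop` and its parameters are invented for this example; it only uses `File.write`, `File.flush` and `File.readln` from std.stdio.

```d
import std.stdio : File;
import std.string : chomp;

/// Writes `prompt` to `output` before every blocking read from `input`
/// and collects the lines that were entered.
string[] promptLoop(File input, File output, string prompt)
{
    string[] commands;
    char[] buf;
    for (;;)
    {
        output.write(prompt);
        output.flush();                  // make the prompt visible before blocking
        if (input.readln(buf) == 0)      // readln returns 0 at end of input
            break;
        commands ~= buf.chomp.idup;
    }
    return commands;
}
```

An interactive caller would pass `stdin` and `stdout` (both from std.stdio), for example `promptLoop(stdin, stdout, "modeA> ")`. Because the prompt is written explicitly before each read, no range construction can block ahead of it.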

Bye,
bearophile

February 27, 2014

Re: Minor std.stdio.File.ByLine rant

Posted by Jakob Ovrum
in reply to bearophile

On Wednesday, 26 February 2014 at 23:59:09 UTC, bearophile wrote:
> H. S. Teoh:
>
>> I'm writing a CLI program that uses File.ByLine to read input commands,
>
> Isn't using readln() better for that? File.byLine is to read lines of files on disk.
>
> Bye,
> bearophile

Says who? The type system and documentation only assert that it works on files, with no reservations about what kind of file. The standard input file is as fine a file as any.

February 27, 2014

Re: Minor std.stdio.File.ByLine rant

Posted by Steven Schveighoffer
in reply to H. S. Teoh

On Wed, 26 Feb 2014 18:44:10 -0500, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> First of all, the way ByLine works is kinda tricky, even in the previous
> releases. The underlying cause is that at least on Posix, the underlying
> C feof() call doesn't actually tell you whether you're really at EOF
> until you try to read something from the file descriptor.

This is not a posix problem, it's a general stream problem.

A stream is not at EOF until the write end is closed. Until then, you cannot know whether it's empty until you read and don't get anything back. Even if a primitive existed that allowed you to tell whether the write end was closed, you can race this against the other process closing its write end.
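
A small sketch with an in-process pipe illustrates the first point. It assumes std.process.pipe and typical C stdio buffering; the function name is made up for this example.

```d
import std.process : pipe;
import std.stdio : File;

/// Returns [eof while the writer is still open, eof after the writer closed].
bool[2] eofAroundWriterClose()
{
    auto p = pipe();
    p.writeEnd.writeln("hello");
    p.writeEnd.flush();

    char[] buf;
    p.readEnd.readln(buf);              // consumes "hello\n"
    bool whileOpen = p.readEnd.eof;     // pipe is drained, yet this is not EOF

    p.writeEnd.close();                 // the writer goes away
    p.readEnd.readln(buf);              // reads nothing
    bool afterClose = p.readEnd.eof;    // only now is EOF reported
    return [whileOpen, afterClose];
}
```

Between the two readln calls the pipe holds no data at all, and the reader still has no way of telling whether more is coming short of attempting another (blocking) read.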

I think the correct solution is to block on the first front call. We may be able to do this without storing an additional variable.

-Steve

February 27, 2014

Re: Minor std.stdio.File.ByLine rant

Posted by H. S. Teoh
in reply to Steven Schveighoffer

On Thu, Feb 27, 2014 at 07:55:59AM -0500, Steven Schveighoffer wrote:
> On Wed, 26 Feb 2014 18:44:10 -0500, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
> 
> >First of all, the way ByLine works is kinda tricky, even in the previous releases. The underlying cause is that at least on Posix, the underlying C feof() call doesn't actually tell you whether you're really at EOF until you try to read something from the file descriptor.
> 
> This is not a posix problem, it's a general stream problem.
> 
> A stream is not at EOF until the write end is closed. Until then, you cannot know whether it's empty until you read and don't get anything back. Even if a primitive existed that allowed you to tell whether the write end was closed, you can race this against the other process closing its write end.
> 
> I think the correct solution is to block on the first front call. We may be able to do this without storing an additional variable.
[...]

Unfortunately, you can't. Since Phobos can't know whether the file (which may be a network socket, say) is at EOF without first blocking on read, it won't be able to return the correct value from .empty, and according to the range API, it's invalid to access .front unless .empty returns false. So this solution doesn't work. :-(
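
To see the ordering the range API imposes, here is a small tracing sketch. `Traced` and `traceForeach` are names invented for this example; the range just logs which primitive is called while foreach consumes it.

```d
/// Hypothetical input range over an int[] that records every primitive call.
struct Traced
{
    int[] data;
    string[]* log;   // shared log, so copies made by foreach still record here

    @property bool empty() { *log ~= "empty"; return data.length == 0; }
    @property int front() { *log ~= "front"; return data[0]; }
    void popFront() { *log ~= "popFront"; data = data[1 .. $]; }
}

/// Iterates with foreach and returns the sequence of primitive calls.
string[] traceForeach(int[] data)
{
    string[] log;
    auto r = Traced(data, &log);
    foreach (x; r) {}
    return log;
}
```

For a one-element array the log comes out as empty, front, popFront, empty. The very first call is .empty, so a range that only finds out about EOF inside .front has no correct answer to give at that point.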


T

-- 
All men are mortal. Socrates is mortal. Therefore all men are Socrates.

February 27, 2014

Re: Minor std.stdio.File.ByLine rant

Posted by Sean Kelly
in reply to H. S. Teoh

Are the peek routines standard?  I'm on my phone so I can't easily check right now. Barring that, there's an ioctl call that can tell whether data is available, though I'm not sure offhand what the result would be for a file if you haven't read anything yet.

February 27, 2014

Re: Minor std.stdio.File.ByLine rant

Posted by Steven Schveighoffer
in reply to H. S. Teoh

On Thu, 27 Feb 2014 10:04:47 -0500, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Thu, Feb 27, 2014 at 07:55:59AM -0500, Steven Schveighoffer wrote:
>> On Wed, 26 Feb 2014 18:44:10 -0500, H. S. Teoh
>> <hsteoh@quickfur.ath.cx> wrote:
>>
>> >First of all, the way ByLine works is kinda tricky, even in the
>> >previous releases. The underlying cause is that at least on Posix,
>> >the underlying C feof() call doesn't actually tell you whether you're
>> >really at EOF until you try to read something from the file
>> >descriptor.
>>
>> This is not a posix problem, it's a general stream problem.
>>
>> A stream is not at EOF until the write end is closed. Until then,
>> you cannot know whether it's empty until you read and don't get
>> anything back. Even if a primitive existed that allowed you to tell
>> whether the write end was closed, you can race this against the
>> other process closing its write end.
>>
>> I think the correct solution is to block on the first front call. We
>> may be able to do this without storing an additional variable.
> [...]
>
> Unfortunately, you can't. Since Phobos can't know whether the file
> (which may be a network socket, say) is at EOF without first blocking on
> read, it won't be able to return the correct value from .empty, and
> according to the range API, it's invalid to access .front unless .empty
> returns false. So this solution doesn't work. :-(

Yes, you are right!

Thinking about it, the only correct solution is to do what it already does -- establish the first line on construction. empty cannot depend on front, and doing something different on the first empty vs. every other one makes the range bloated and confusing.

The issue really is, to treat the construction and popFront as blocking. Streams are a tricky business indeed. I think your solution is the only valid one. Unfortunate that you have to do this.

An interesting general solution is to pass a delegate that generates the range. That gives an easy one-line construction without having to write a wrapper range that lazily constructs on empty. The catch is that merely naming a delegate does not call it, so the consuming code needs a small shim. I did come up with this:

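import std.stdio;
import std.range;

// Accepts either an input range or a no-argument callable that returns one.
void foo(R)(R r)
{
    static if(isInputRange!R)
    {
        // Already a range: use it directly.
        alias _r = r;
    }
    else // if is no-arg delegate and returns input range (too lazy to figure this out :)
    {
        // Not a range: assume it is callable and build the range on demand.
        auto _r(){return r();}
    }

    // static if introduces no scope, so _r is visible here in both cases.
    foreach(x; _r)
    {
        writeln(x);
    }
}
void main()
{
    foo(() => stdin.byLine); // byLine is not constructed until foo calls the delegate
    foo([1,2,3]);
}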

The static if at the beginning is awkward, but just allows the rest of the code to be identical whether you call with a delegate or a range.

-Steve

February 27, 2014

Re: Minor std.stdio.File.ByLine rant

Posted by Steven Schveighoffer
in reply to Sean Kelly

On Thu, 27 Feb 2014 11:22:45 -0500, Sean Kelly <sean@invisibleduck.org> wrote:

> Are the peek routines standard?  I'm on my phone so I can't easily check right now. Barring that, there's an ioctl call that can tell whether data is available, though I'm not sure offhand what the result would be for a file if you haven't read anything yet.

Peek doesn't help. You can't, in a non-blocking way, tell if input will be forthcoming without actually receiving the input.

-Steve

February 27, 2014

Re: Minor std.stdio.File.ByLine rant

Posted by H. S. Teoh
in reply to Steven Schveighoffer

On Thu, Feb 27, 2014 at 11:26:42AM -0500, Steven Schveighoffer wrote:
> On Thu, 27 Feb 2014 10:04:47 -0500, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
> 
> >On Thu, Feb 27, 2014 at 07:55:59AM -0500, Steven Schveighoffer wrote:
> >>On Wed, 26 Feb 2014 18:44:10 -0500, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
> >>
> >>>First of all, the way ByLine works is kinda tricky, even in the previous releases. The underlying cause is that at least on Posix, the underlying C feof() call doesn't actually tell you whether you're really at EOF until you try to read something from the file descriptor.
> >>
> >>This is not a posix problem, it's a general stream problem.
> >>
> >>A stream is not at EOF until the write end is closed. Until then, you cannot know whether it's empty until you read and don't get anything back. Even if a primitive existed that allowed you to tell whether the write end was closed, you can race this against the other process closing its write end.
> >>
> >>I think the correct solution is to block on the first front call. We may be able to do this without storing an additional variable.
> >[...]
> >
> >Unfortunately, you can't. Since Phobos can't know whether the file (which may be a network socket, say) is at EOF without first blocking on read, it won't be able to return the correct value from .empty, and according to the range API, it's invalid to access .front unless .empty returns false. So this solution doesn't work. :-(
> 
> Yes, you are right!
> 
> Thinking about it, the only correct solution is to do what it already does -- establish the first line on construction. empty cannot depend on front, and doing something different on the first empty vs. every other one makes the range bloated and confusing.
> 
> The issue really is, to treat the construction and popFront as blocking. Streams are a tricky business indeed. I think your solution is the only valid one. Unfortunate that you have to do this.
> 
> An interesting general solution is to pass a delegate that generates the range. That gives an easy one-line construction without having to write a wrapper range that lazily constructs on empty. The catch is that merely naming a delegate does not call it, so the consuming code needs a small shim. I did come up with this:

Actually, now that I think about it, can't we just make ByLine lazily constructed? It's already a wrapper around ByLineImpl anyway (since it's being refcounted), so why not just make the wrapper create ByLineImpl only when you actually attempt to use it? That would solve the problem: you can call ByLine but it won't block until ByLineImpl is actually created, which is the first time you call ByLine.empty.


T

-- 
Don't drink and derive. Alcohol and algebra don't mix.

Copyright © 1999-2021 by the D Language Foundation