read till EOF from stdin
Thread overview (posts in order, all Dec 11, 2020): kdevel, frame, kdevel, frame, kdevel, Adam D. Ruppe, kdevel, frame
December 11, 2020
Currently as a workaround I read all the chars from stdin with

   import std.file;
   auto s = cast (string) read("/dev/fd/0");

after I found that you can't read from stdin. This is of course
non-portable, Linux-only code. In Perl I frequently use the idiom

   $s = join ('', <>);

that corresponds to D's

   import std.stdio;
   import std.array;
   import std.typecons;
   auto s = stdin.byLineCopy(Yes.keepTerminator).join;

which alas needs an amazing amount of import boilerplate. BTW why does
byLine not suffice in this case? Then there is a third way of reading
all the characters from stdin:

   import std.stdio;
   import std.array;
   auto s = cast (string) stdin.byChunk(1).join;

This version behaves correctly if Ctrl+D is pressed anywhere after
the program is started. This is no longer the case if a larger chunk
is read, e.g.:

   auto s = cast (string) stdin.byChunk(4).join;

As strace reveals, the resulting program sometimes reads zero
characters twice before it terminates:

   read(0, a                                         <-- A, return
   "a\n", 1024)                    = 2
   read(0, "", 1024)                       = 0       <-- ctrl+d
   read(0, "", 1024)                       = 0       <-- ctrl+d

Any comments or ideas?
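
For reference, the byChunk idiom above can be wrapped in a small helper so the call site needs only one import. This is a minimal sketch, not a Phobos facility: the name `slurp` and the 4096-byte chunk size are assumptions chosen for illustration, and it relies on byChunk terminating at EOF.

```d
import std.array : appender;
import std.stdio : File, stdin, write;

/// Hypothetical helper: reads everything from `f` until EOF.
string slurp(File f)
{
    auto sink = appender!(ubyte[])();
    // byChunk reuses its buffer between iterations,
    // so the bytes are copied into the appender.
    foreach (chunk; f.byChunk(4096))
        sink.put(chunk);
    return cast(string) sink.data;
}

void main()
{
    // Echo all of stdin back, unchanged.
    write(slurp(stdin));
}
```

Because the helper takes a `File`, it can be tried on a regular file as well as on stdin.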

December 11, 2020
On Friday, 11 December 2020 at 02:31:24 UTC, kdevel wrote:
>    auto s = cast (string) stdin.byChunk(4).join;
>
> As strace reveals the resulting program sometimes reads twice zero
> characters before it terminates:
>
>    read(0, a                                         <-- A, return
>    "a\n", 1024)                    = 2
>    read(0, "", 1024)                       = 0       <-- ctrl+d
>    read(0, "", 1024)                       = 0       <-- ctrl+d
>
> Any comments or ideas?

I see the expected behaviour here if you use a buffer of length 4. I don't know what you want to achieve here. If you want to stop reading from stdin, you should check for eof() instead. You should not check for the character yourself. eof() can be triggered in multiple ways, and checking it is the only correct way to handle all of them.
December 11, 2020
On Friday, 11 December 2020 at 11:05:59 UTC, frame wrote:
> On Friday, 11 December 2020 at 02:31:24 UTC, kdevel wrote:
>>    auto s = cast (string) stdin.byChunk(4).join;
>>
>> As strace reveals the resulting program sometimes reads twice zero
>> characters before it terminates:
>>
>>    read(0, a                                         <-- A, return
>>    "a\n", 1024)                    = 2
>>    read(0, "", 1024)                       = 0       <-- ctrl+d
>>    read(0, "", 1024)                       = 0       <-- ctrl+d
>>
>> Any comments or ideas?
>
> I see expected behaviour here if you use a buffer of length 4. I don't know what you want to achieve here.

Read till EOF.

> If you want to stop reading from stdin, you should check for eof() instead.

My code cannot do that because the function byChunk has control over the
file descriptor. The OS reports EOF by returning zero from read(2) [1]. The
D documentation of byChunk [2] does not mention such a check for eof
either.

> You should not check yourself for the character.

Where did I do that here?

   auto s = cast (string) stdin.byChunk(4).join;

> eof() can be lock in by multiple ways and it is the only correct way to handle all of them.

??

[1] https://linux.die.net/man/2/read
[2] https://dlang.org/phobos/std_stdio.html#byChunk
December 11, 2020
On Friday, 11 December 2020 at 12:34:19 UTC, kdevel wrote:
> My code cannot do that because the function byChunk has control over the
> file descriptor.

What do you mean by control? It just has the file handle, so why can't you call eof() on the file handle struct?

>> You should not check yourself for the character.
>
> Where did I do that here?
>

I was just assuming that...

>
>> eof() can be lock in by multiple ways and it is the only correct way to handle all of them.
>
> ??

I mean that it's safer to rely on eof(), which should return true if the stream becomes inaccessible, whether caused by read(2) or by some other OS-dependent reason.

...but I was looking in the source and...

yes, byChunk() seems not to care about eof() - but it will just truncate the buffer on read failure which should work for your case. It basically just calls C's fread().

Are you sure that the read(0, "", 1024) trace comes from your Ctrl+D? It could also come from the runtime checking if the handle can be closed or something.

Please note that your terminal could also be the issue.




December 11, 2020
On Friday, 11 December 2020 at 15:57:37 UTC, frame wrote:
> On Friday, 11 December 2020 at 12:34:19 UTC, kdevel wrote:
>> My code cannot do that because the function byChunk has control over the
>> file descriptor.
>
> What do you mean by control?

The error happens while the CPU executes code of the D runtime (or the C library).
After looking into std/stdio.d I found that byChunk uses fread (not read). Thus I think I ran into [1] which seems to affect quite a lot of programs [2] [3].

~~~bychunk.d
void main ()
{
   import std.stdio;
   foreach (buf; stdin.byChunk (4096)) {
      auto s = cast (string) buf;
      writeln ("buf = <", s, ">");
   }
}
~~~

STR:

1. ./bychunk
2. A, [RETURN]
3. CTRL+D

expected: program ends
found: program still reading


[1] https://sourceware.org/bugzilla/show_bug.cgi?id=1190
    Bug 1190 Summary: fgetc()/fread() behaviour is not POSIX compliant

[2] https://unix.stackexchange.com/questions/517064/why-does-hexdump-try-to-read-through-eof

[3] https://stackoverflow.com/questions/52674057/why-does-an-fread-loop-require-an-extra-ctrld-to-signal-eof-with-glibc
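
If the cause really is the glibc behaviour in [1], one possible workaround is to read with rawRead and test the stream's EOF flag after every read, so that no further read is attempted once EOF has been seen. The following is a sketch under that assumption and has not been tested against glibc 2.12; the function name `readUntilEofFlag` and the default chunk size are hypothetical.

```d
import std.stdio : File, stdin, writeln;

/// Hypothetical helper: reads `f` in chunks and stops as soon as the
/// C stream's EOF flag is set, without issuing another read.
ubyte[] readUntilEofFlag(File f, size_t chunkSize = 4096)
{
    ubyte[] result;
    auto buf = new ubyte[chunkSize];
    while (!f.eof)                 // File.eof reports the stream's EOF flag
    {
        auto got = f.rawRead(buf); // slice of buf that was actually filled
        result ~= got;             // appending copies, so buf can be reused
    }
    return result;
}

void main()
{
    auto data = cast(string) readUntilEofFlag(stdin);
    writeln("read ", data.length, " bytes");
}
```

The difference to byChunk is that byChunk only finishes after a read that returns zero bytes, whereas this loop exits right after the short read that set the EOF flag.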

December 11, 2020
On Friday, 11 December 2020 at 16:37:42 UTC, kdevel wrote:
> expected: program ends
> found: program still reading

works for me.... looks like i have

libc-2.30.so

so i guess i have the fixed libc. Can you confirm what version you have? I did `ls /lib/libc*` to pick that out but it might be different on your system.
December 11, 2020
On Friday, 11 December 2020 at 16:49:18 UTC, Adam D. Ruppe wrote:
> libc-2.30.so

The bug was fixed in 2.28 IIRC.

> so i guess i have the fixed libc. Can you confirm what version you have?

Various. I tested the code on a machine running the already EOL CentOS 6
with glibc 2.12.
December 11, 2020
On Friday, 11 December 2020 at 18:18:35 UTC, kdevel wrote:
> On Friday, 11 December 2020 at 16:49:18 UTC, Adam D. Ruppe wrote:
>> libc-2.30.so
>
> The bug was fixed in 2.28 IIRC.
>
>> so i guess i have the fixed libc. Can you confirm what version you have?
>
> Various. I tested the code on a machine running the yet EOL CENTOS-6
> having glibc 2.12.

Of course that could be "your" bug.

But you should test your program with a stream other than stdin to ensure the terminal is not the problem, because read(2) is low-level and you may not see where a call really comes from. Maybe the terminal checks again, or there are some buffers between the terminal and your program.
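
One way to try this from D is to run the same byChunk loop over a regular file instead of stdin. The sketch below assumes a scratch file in the temp directory; the file name and the helper `countChunks` are made up for illustration. Keep in mind that at the end of a regular file read(2) keeps returning zero, so this only shows whether the loop terminates without a terminal involved.

```d
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;

/// Hypothetical helper: counts the chunks byChunk yields for `f`.
size_t countChunks(File f, size_t chunkSize)
{
    size_t n;
    foreach (chunk; f.byChunk(chunkSize))
        ++n;
    return n;
}

void main()
{
    // Scratch file standing in for the terminal input "a", RETURN.
    auto path = buildPath(tempDir(), "bychunk_probe.txt");
    write(path, "a\n");
    scope (exit) remove(path);

    writeln("chunks: ", countChunks(File(path), 4));
}
```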