December 28, 2013 Re: readln() returns new line charater | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jakob Ovrum | On Saturday, 28 December 2013 at 17:15:17 UTC, Jakob Ovrum wrote:
> On Saturday, 28 December 2013 at 16:59:51 UTC, bearophile wrote:
>> void main() {
>> import std.stdio, std.string;
>> immutable txt = readln.chomp;
>> writeln(">", txt, "<");
>> }
>>
>>
>> Bye,
>> bearophile
>
> These examples are cute, but I think in real programs it's usually important to handle `stdin` being exhausted. With `readln`, such code is prone to go into an infinite loop.
>
> Of course in these same real programs, `byLine` is often the better choice anyway...
Usually if you're working with a console though the input stream won't exhaust and thus the blocking 'readln' would be a better option, no?
I'll just use the chomp method as that seems like the best option.
|
December 28, 2013 Re: readln() returns new line charater | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jeroen Bollen | On Saturday, 28 December 2013 at 17:23:30 UTC, Jeroen Bollen wrote:
> Usually if you're working with a console though the input stream won't exhaust and thus the blocking 'readln' would be a better option, no?
The blocking behaviour of `stdin` by default is fine. The issue is that `readln` returns an empty string when `stdin` is empty/closed, which is different from an empty line (which is just the line terminator). Approaches that erase the difference with functions like `chomp` can't tell them apart.
Assuming that `stdin` will never close seems like a bad idea when it's so easy to handle, and the consequences of it closing can be harsh (particularly an infinite loop). Even assuming that `stdin` will never be redirected and always used from a console, an experienced user might use ^Z to close standard input to signal a clean end, only to be faced with either an obscure error, segfault or infinite loop.
|
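A minimal sketch of the distinction described in the post above, assuming the default `\n` terminator. The helper name `readLines` is hypothetical and not part of Phobos. The point is only the order of operations: check for the empty result that signals end of input first, and call `chomp` afterwards.

```d
import std.stdio : File, stdin, writeln;
import std.string : chomp;

/// Hypothetical helper: reads lines from `input` until end of file.
string[] readLines(File input)
{
    string[] lines;
    while (true)
    {
        string raw = input.readln();
        // An empty result means EOF; a blank line would still hold "\n".
        if (raw.length == 0)
            break;
        // Strip the terminator only after the EOF check has been made.
        lines ~= raw.chomp;
    }
    return lines;
}

void main()
{
    foreach (line; readLines(stdin))
        writeln(">", line, "<");
}
```

Calling `chomp` before the check would turn both a blank line and end of input into the same empty string, which is how a loop can end up never terminating.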
December 29, 2013 Re: readln() returns new line charater | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jakob Ovrum | On Saturday, 28 December 2013 at 17:42:26 UTC, Jakob Ovrum wrote:
> On Saturday, 28 December 2013 at 17:23:30 UTC, Jeroen Bollen wrote:
>> Usually if you're working with a console though the input stream won't exhaust and thus the blocking 'readln' would be a better option, no?
>
> The blocking behaviour of `stdin` by default is fine. The issue is that `readln` returns an empty string when `stdin` is empty/closed, which is different from an empty line (which is just the line terminator). Approaches that erase the difference with functions like `chomp` can't tell them apart.
>
> Assuming that `stdin` will never close seems like a bad idea when it's so easy to handle, and the consequences of it closing can be harsh (particularly an infinite loop). Even assuming that `stdin` will never be redirected and always used from a console, an experienced user might use ^Z to close standard input to signal a clean end, only to be faced with either an obscure error, segfault or infinite loop.
Wouldn't `byLine` return an empty string if the input stream is exhausted but not closed?
|
December 29, 2013 Re: readln() returns new line charater | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jeroen Bollen | On Sunday, 29 December 2013 at 17:25:39 UTC, Jeroen Bollen wrote:
> Wouldn't `byLine` return an empty string if the input stream is exhausted but not closed?
No, both `readln` and `byLine` will block until either EOL or EOF. They differ in their handling of EOF - `readln` returns an empty string, while the result of `byLine` reports empty (it is a range) and calling `front` is an error.
|
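For comparison, a small sketch of the range-based approach mentioned in the post above. The helper name `collectLines` is hypothetical. With `byLine`, end of input is reported by the range becoming empty, so a `foreach` simply stops and an empty line stays distinguishable as an empty element.

```d
import std.stdio : File, stdin, writeln;

/// Hypothetical helper: collects every line of `input` without terminators.
string[] collectLines(File input)
{
    string[] lines;
    // The range becomes empty at EOF, so the loop ends on its own.
    foreach (line; input.byLine())
        lines ~= line.idup; // byLine reuses its buffer, so copy each line
    return lines;
}

void main()
{
    foreach (i, line; collectLines(stdin))
        writeln(i, ": >", line, "<");
}
```

The `.idup` copy is a design choice for this sketch: it costs an allocation per line but yields immutable strings that remain valid after the loop moves on.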
December 29, 2013 Re: readln() returns new line charater | ||||
---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | 28-Dec-2013 21:13, Vladimir Panteleev wrote:
> On Saturday, 28 December 2013 at 17:07:58 UTC, Andrei Alexandrescu wrote:
>> On 12/28/13 8:50 AM, Jeroen Bollen wrote:
>>> On Saturday, 28 December 2013 at 16:49:15 UTC, Jeroen Bollen wrote:
>>>> Why is when you do readln() the newline character (\n) gets read too?
>>>> Wouldn't it make more sense for that character to be stripped off?
>>>
>>> I just want to add to this, that it makes it really annoying to work with the command line, as you kinda have to strip off the last character and thus cannot make the string immutable.
>>
>> Try stdin.byLine, which by default strips the newline.
>
> stdin.byLine can't strip \r\n unless you specify that as the line terminator, in which case it can't split by \n.
I've come to the conclusion that the only sane line ending behavior is to do what the Unicode standard says, and detect the following pattern as a line separator:
\r\n | \r | \f | \v | \n | \u0085 | \u2028 | \u2029
This includes never breaking a line in between a \r\n sequence.
-- Dmitry Olshansky
|
December 29, 2013 Re: readln() returns new line charater | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | On Sunday, 29 December 2013 at 18:45:36 UTC, Dmitry Olshansky wrote:
> I've come to the conclusion that the only sane line ending behavior is to do what the Unicode standard says, and detect the following pattern as a line separator:
>
> \r\n | \r | \f | \v | \n | \u0085 | \u2028 | \u2029
>
> This includes never breaking a line in between \r\n sequence.
I don't think something as basic as a line-splitting function should do UTF decoding unless the user asks for it explicitly. Getting UTF-8 decoding errors in splitLines when working with ASCII files has caused me enough frustration to stop using that function altogether (unless I *KNOW* the text is valid UTF-8). I've yet to encounter a need to split by anything other than \n and \r\n.
|
December 29, 2013 Re: readln() returns new line charater | ||||
---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | 29-Dec-2013 23:28, Vladimir Panteleev wrote:
> On Sunday, 29 December 2013 at 18:45:36 UTC, Dmitry Olshansky wrote:
>> I've come to the conclusion that the only sane line ending behavior is to do what the Unicode standard says, and detect the following pattern as a line separator:
>>
>> \r\n | \r | \f | \v | \n | \u0085 | \u2028 | \u2029
>>
>> This includes never breaking a line in between a \r\n sequence.
>
> I don't think something as basic as a line-splitting function should do UTF decoding unless the user asks for it explicitly.
I haven't said decode :)
Just match the pattern as UTF-8 bytes explicitly; the bulk of these separators is side-stepped after a single test instruction + conditional branch (that is fairly predictable - like almost never taken).
> Getting UTF-8 decoding errors in splitLines when working with ASCII files has caused me enough frustration to stop using that function altogether (unless I *KNOW* the text is valid UTF-8). I've yet to encounter a need to split by anything other than \n and \r\n.
I would argue there is a way to do that almost as cheaply as the trio of \r | \n | \r\n would be. Personal experience notwithstanding, it would be better to do the right thing.
P.S. What I know for sure is that there is a strong need for having better support for other encodings. Raw ASCII included, but encoding assumptions must be explicit.
-- Dmitry Olshansky
|
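One possible reading of "match the pattern as UTF-8 bytes" is sketched below. It is not Dmitry's implementation and not a Phobos function; `separatorLength` and `splitLinesBytes` are hypothetical names. The sketch compares raw bytes against the UTF-8 encodings of the listed separators (U+0085 is `C2 85`, U+2028 is `E2 80 A8`, U+2029 is `E2 80 A9`) and never decodes, so invalid UTF-8 elsewhere in the text cannot raise a decoding error.

```d
import std.stdio : writeln;

/// Hypothetical helper: byte length of the line separator at the start
/// of `s`, or 0 if `s` does not start with one. No UTF decoding is done.
size_t separatorLength(const(char)[] s)
{
    if (s.length == 0)
        return 0;
    switch (cast(ubyte) s[0])
    {
        case 0x0A, 0x0B, 0x0C: // \n, \v, \f
            return 1;
        case 0x0D: // \r\n counts as one separator, a lone \r as another
            return (s.length > 1 && s[1] == '\n') ? 2 : 1;
        case 0xC2: // U+0085 is encoded as C2 85
            return (s.length > 1 && s[1] == 0x85) ? 2 : 0;
        case 0xE2: // U+2028 and U+2029 are E2 80 A8 and E2 80 A9
            return (s.length > 2 && s[1] == 0x80
                    && (s[2] == 0xA8 || s[2] == 0xA9)) ? 3 : 0;
        default: // every other byte cannot start a separator
            return 0;
    }
}

/// Hypothetical helper: splits `text` on the separators above.
string[] splitLinesBytes(string text)
{
    string[] lines;
    size_t start = 0, i = 0;
    while (i < text.length)
    {
        immutable n = separatorLength(text[i .. $]);
        if (n == 0)
        {
            ++i;
            continue;
        }
        lines ~= text[start .. i];
        i += n;
        start = i;
    }
    if (start < text.length)
        lines ~= text[start .. $]; // final line without a terminator
    return lines;
}

void main()
{
    writeln(splitLinesBytes("one\r\ntwo\u2028three\n"));
}
```

Whether the common path compiles down to the single test and branch described in the post depends on the compiler; the sketch only shows that the matching itself needs no decoding.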
December 29, 2013 Re: readln() returns new line charater | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jakob Ovrum | On Sunday, 29 December 2013 at 18:13:30 UTC, Jakob Ovrum wrote:
> On Sunday, 29 December 2013 at 17:25:39 UTC, Jeroen Bollen wrote:
>> Wouldn't `byLine` return an empty string if the input stream is exhausted but not closed?
>
> No, both `readln` and `byLine` will block until either EOL or EOF. They differ in their handling of EOF - `readln` returns an empty string, while the result of `byLine` reports empty (it is a range) and calling `front` is an error.
But wouldn't that mean I'd still end up making my char[] mutable, as I still need to manually remove the last character, AFTER I checked it's not empty?
|
December 30, 2013 Re: readln() returns new line charater | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jeroen Bollen | On Sun, 29 Dec 2013 22:03:14 +0000, "Jeroen Bollen" <jbinero@gmail.com> wrote:
> On Sunday, 29 December 2013 at 18:13:30 UTC, Jakob Ovrum wrote:
>> On Sunday, 29 December 2013 at 17:25:39 UTC, Jeroen Bollen wrote:
>>> Wouldn't `byLine` return an empty string if the input stream is exhausted but not closed?
>>
>> No, both `readln` and `byLine` will block until either EOL or EOF. They differ in their handling of EOF - `readln` returns an empty string, while the result of `byLine` reports empty (it is a range) and calling `front` is an error.
>
> But wouldn't that mean I'd still end up making my char[] mutable, as I still need to manually remove the last character, AFTER I checked it's not empty?
No, strings have immutable characters, but there is nothing wrong with using only part of one as an array slice:
string s = readln();
s = s[0 .. $-1];
(just to illustrate)
-- Marco
|
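A short sketch expanding on the slicing illustration above, with a guard added because `s[0 .. $-1]` is out of range when `s` is empty (the end-of-input case). The helper name `withoutNewline` is hypothetical, and the string literal stands in for a line returned by `readln`.

```d
import std.stdio : writeln;
import std.string : chomp;

/// Hypothetical helper: returns `line` without one trailing "\n".
string withoutNewline(string line)
{
    // Check the length first: `$ - 1` is invalid on an empty string.
    if (line.length != 0 && line[$ - 1] == '\n')
        return line[0 .. $ - 1]; // a shorter view of the same characters
    return line;
}

void main()
{
    string raw = "hello\n"; // stands in for a line read with readln
    immutable sliced = withoutNewline(raw);
    immutable chomped = raw.chomp;
    writeln(">", sliced, "<");
    writeln(sliced == chomped);
}
```

Slicing never writes to the characters, so the result can be stored in an `immutable` variable; nothing needs to be mutable along the way.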
December 30, 2013 Re: readln() returns new line charater | ||||
---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | On Sat, 28 Dec 2013 17:08:38 +0000, "Vladimir Panteleev" <vladimir@thecybershadow.net> wrote:
> On Saturday, 28 December 2013 at 17:07:23 UTC, Andrei Alexandrescu wrote:
>> On 12/28/13 8:49 AM, Jeroen Bollen wrote:
>>> Why is when you do readln() the newline character (\n) gets read too?
>>> Wouldn't it make more sense for that character to be stripped off?
>>
>> So you know that if it returns an empty string the file is done.
>
> And also so a readln/writeln loop preserves line endings.
Detect the bug in this sentence.
-- Marco
|