Thread overview
lineSplitter ignores the trailing newline?
Nov 09, 2019 - Jonathan Marler
Nov 09, 2019 - Paul Backus
Nov 09, 2019 - Jonathan Marler
Nov 09, 2019 - Jonathan Marler
Nov 10, 2019 - Jonathan M Davis
Nov 10, 2019 - Jonathan Marler
Nov 10, 2019 - Patrick Schluter
Nov 10, 2019 - Jonathan Marler
November 09, 2019
What do people think: should lineSplitter treat files with a trailing newline differently from files without one, or the same? Currently, lineSplitter ignores the trailing newline. If it didn't, I'd expect a file with a trailing newline to yield an empty string as its last element, but it doesn't.

I noticed this because the "tolf" tool in the tools repo uses lineSplitter and joins the lines with a '\n' character (see https://github.com/dlang/tools/blob/master/tolf.d). Because lineSplitter ignores the trailing newline, the tool always strips the file's final newline. To keep it, we could append an empty string to the lineSplitter range, but then a file that originally had no trailing newline would gain one. This seems like a bug in lineSplitter, but I'm not sure everyone would agree.
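To make the behaviour concrete, here is a small D sketch using only Phobos; the three function names are hypothetical helpers made up for illustration. lineSplitter returns the same lines whether or not the text ends in a newline, while splitting on "\n" produces a trailing empty element. The last function is an assumed reconstruction of the join step described above, not the actual tolf.d source.

```d
import std.array : array, join, split;
import std.string : lineSplitter;

// Newline as a *terminator*: lineSplitter yields no empty element
// after a trailing newline.
string[] terminatedLines(string text)
{
    return text.lineSplitter.array;
}

// Newline as a *separator*: splitting on "\n" yields an empty last
// element when the text ends with a newline.
string[] separatedPieces(string text)
{
    return text.split("\n");
}

// Hypothetical reconstruction (not the real tolf.d code): joining
// lineSplitter's output with "\n" puts the newline only *between*
// lines, so the file's trailing newline is lost.
string rejoinWithLf(string text)
{
    return text.lineSplitter.join("\n");
}
```

For example, rejoinWithLf("a\r\nb\r\n") gives "a\nb": the CRLFs become LFs, but the final newline is gone.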

November 09, 2019
On Saturday, 9 November 2019 at 19:00:31 UTC, Jonathan Marler wrote:
> What do people think: should lineSplitter treat files with a trailing newline differently from files without one, or the same? Currently, lineSplitter ignores the trailing newline. If it didn't, I'd expect a file with a trailing newline to yield an empty string as its last element, but it doesn't.

lineSplitter follows the Unix convention of treating newlines as line *terminators* rather than line *separators* (as we can see from the name of its template argument "KeepTerminator"). Under this convention, a trailing newline terminates the final line in a file, but does not start a new one, so there is no need for an additional empty line.
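The convention is easy to see with the KeepTerminator template argument of std.string.lineSplitter. In this short D sketch (the helper name is hypothetical), each element keeps the newline that ended it. A trailing newline simply terminates the last line rather than starting an empty one, and a final line without a newline comes back unterminated.

```d
import std.array : array;
import std.string : KeepTerminator, lineSplitter;

// Hypothetical helper: split into lines, keeping each line's terminator.
string[] linesWithTerminators(string text)
{
    return text.lineSplitter!(KeepTerminator.yes).array;
}
```

With "a\nb\n" both elements end in '\n'; with "a\nb" the last element is just "b".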
November 09, 2019
On Saturday, 9 November 2019 at 19:28:52 UTC, Paul Backus wrote:
> On Saturday, 9 November 2019 at 19:00:31 UTC, Jonathan Marler wrote:
>> What do people think: should lineSplitter treat files with a trailing newline differently from files without one, or the same? Currently, lineSplitter ignores the trailing newline. If it didn't, I'd expect a file with a trailing newline to yield an empty string as its last element, but it doesn't.
>
> lineSplitter follows the Unix convention of treating newlines as line *terminators* rather than line *separators* (as we can see from the name of its template argument "KeepTerminator"). Under this convention, a trailing newline terminates the final line in a file, but does not start a new one, so there is no need for an additional empty line.

Interesting. Do you have any links or references establishing that Unix uses newlines as terminators rather than separators?
November 09, 2019
On Saturday, 9 November 2019 at 23:04:11 UTC, Jonathan Marler wrote:
> On Saturday, 9 November 2019 at 19:28:52 UTC, Paul Backus wrote:
>> On Saturday, 9 November 2019 at 19:00:31 UTC, Jonathan Marler wrote:
>>> What do people think: should lineSplitter treat files with a trailing newline differently from files without one, or the same? Currently, lineSplitter ignores the trailing newline. If it didn't, I'd expect a file with a trailing newline to yield an empty string as its last element, but it doesn't.
>>
>> lineSplitter follows the Unix convention of treating newlines as line *terminators* rather than line *separators* (as we can see from the name of its template argument "KeepTerminator"). Under this convention, a trailing newline terminates the final line in a file, but does not start a new one, so there is no need for an additional empty line.
>
> Interesting. Do you have any links or references establishing that Unix uses newlines as terminators rather than separators?

Also, if Unix uses the newline as a "terminator" rather than a "separator", then in a file without a trailing newline the last line isn't properly "terminated". Should that last line be ignored? Should it be an error?

If so, then it sounds like the correct solution for the "tolf" tool is to always make sure the last line is terminated with a newline character.  Does that sound correct?
November 09, 2019
On Saturday, November 9, 2019 4:07:29 PM MST Jonathan Marler via Digitalmars-d wrote:
> On Saturday, 9 November 2019 at 23:04:11 UTC, Jonathan Marler wrote:
> > On Saturday, 9 November 2019 at 19:28:52 UTC, Paul Backus wrote:
> >> On Saturday, 9 November 2019 at 19:00:31 UTC, Jonathan Marler wrote:
> >>> What do people think: should lineSplitter treat files with a trailing newline differently from files without one, or the same? Currently, lineSplitter ignores the trailing newline. If it didn't, I'd expect a file with a trailing newline to yield an empty string as its last element, but it doesn't.
> >>
> >> lineSplitter follows the Unix convention of treating newlines as line *terminators* rather than line *separators* (as we can see from the name of its template argument "KeepTerminator"). Under this convention, a trailing newline terminates the final line in a file, but does not start a new one, so there is no need for an additional empty line.
> >
> > Interesting. Do you have any links or references establishing that Unix uses newlines as terminators rather than separators?
>
> Also, if Unix uses the newline as a "terminator" rather than a "separator", then in a file without a trailing newline the last line isn't properly "terminated". Should that last line be ignored? Should it be an error?
>
> If so, then it sounds like the correct solution for the "tolf" tool is to always make sure the last line is terminated with a newline character.  Does that sound correct?

Per the POSIX standard, lines are always terminated by a newline.

https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline

Text editors and the like will generally ensure that it's there. Now, I don't see much reason to treat it as an error if you manage to have a text file or other block of text that doesn't end with a newline, since reaching the end of the file or text makes it pretty clear that the line ended. Also, while POSIX and its utilities may be designed to assume that lines always end with newlines (including at the end of a file), Windows doesn't make that same assumption.

- Jonathan M Davis
November 10, 2019
On Sunday, 10 November 2019 at 01:26:10 UTC, Jonathan M Davis wrote:
> On Saturday, November 9, 2019 4:07:29 PM MST Jonathan Marler via Digitalmars-d wrote:
>> [...]
>
> Per the POSIX standard, lines are always terminated by a newline.
>
> https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline
>
> Text editors and the like will generally ensure that it's there. Now, I don't see much reason to treat it as an error if you manage to have a text file or other block of text that doesn't end with a newline, since reaching the end of the file or text makes it pretty clear that the line ended. Also, while POSIX and its utilities may be designed to assume that lines always end with newlines (including at the end of a file), Windows doesn't make that same assumption.
>
> - Jonathan M Davis

Thanks for the reference. I've opened a PR that fixes the "tolf" tool so it keeps each file's trailing newline, or adds one if the file doesn't have one yet: https://github.com/dlang/tools/pull/385
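A minimal D sketch of the behaviour described for the PR, assuming the tool has already read the whole file into a string; toLfTerminated is a hypothetical name, not the code in the PR. Writing a '\n' after every line, including the last, keeps an existing trailing newline and adds a missing one.

```d
import std.array : appender;
import std.string : lineSplitter;

// Hypothetical sketch (not the PR's actual code): terminate every line
// with '\n', including the last, so the output always ends in a newline
// unless the input had no lines at all.
string toLfTerminated(string text)
{
    auto result = appender!string();
    foreach (line; text.lineSplitter)
    {
        result.put(line);  // line content without its original terminator
        result.put('\n');  // always terminate, POSIX style
    }
    return result.data;
}
```

An empty input stays empty, since lineSplitter yields no lines for it. Note that lineSplitter also treats a lone '\r', '\v', '\f' and a few Unicode separators as line terminators, so those would be normalized to '\n' as well.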
November 10, 2019
On Saturday, 9 November 2019 at 23:07:29 UTC, Jonathan Marler wrote:
> On Saturday, 9 November 2019 at 23:04:11 UTC, Jonathan Marler wrote:
>> On Saturday, 9 November 2019 at 19:28:52 UTC, Paul Backus wrote:
>>> On Saturday, 9 November 2019 at 19:00:31 UTC, Jonathan Marler wrote:
>>>> [...]
>>>
>>> lineSplitter follows the Unix convention of treating newlines as line *terminators* rather than line *separators* (as we can see from the name of its template argument "KeepTerminator"). Under this convention, a trailing newline terminates the final line in a file, but does not start a new one, so there is no need for an additional empty line.
>>
>> Interesting. Do you have any links or references establishing that Unix uses newlines as terminators rather than separators?
>
> Also, if Unix uses the newline as a "terminator" rather than a "separator", then in a file without a trailing newline the last line isn't properly "terminated". Should that last line be ignored? Should it be an error?
>
> If so, then it sounds like the correct solution for the "tolf" tool is to always make sure the last line is terminated with a newline character.  Does that sound correct?

It would also be necessary for most Unices that don't use glibc. In many libc implementations (Solaris definitely has this bug), fgets() doesn't correctly return the last line if it has no line feed. glibc handles this case, so most Linux users never notice the issue.
November 10, 2019
On Sunday, 10 November 2019 at 12:29:12 UTC, Patrick Schluter wrote:
> On Saturday, 9 November 2019 at 23:07:29 UTC, Jonathan Marler wrote:
>> On Saturday, 9 November 2019 at 23:04:11 UTC, Jonathan Marler wrote:
>>> On Saturday, 9 November 2019 at 19:28:52 UTC, Paul Backus wrote:
>>>> [...]
>>>
>>> Interesting. Do you have any links or references establishing that Unix uses newlines as terminators rather than separators?
>>
>> Also, if Unix uses the newline as a "terminator" rather than a "separator", then in a file without a trailing newline the last line isn't properly "terminated". Should that last line be ignored? Should it be an error?
>>
>> If so, then it sounds like the correct solution for the "tolf" tool is to always make sure the last line is terminated with a newline character.  Does that sound correct?
>
> It would also be necessary for most Unices that don't use glibc. In many libc implementations (Solaris definitely has this bug), fgets() doesn't correctly return the last line if it has no line feed. glibc handles this case, so most Linux users never notice the issue.

Interesting. That makes sense, though: it sounds like Unix requires every line to end with a line feed, so a final line without one is non-standard, and it's up to each library whether to support it. It's not surprising that some libraries choose not to.