September 10, 2014 std.range.byLine
I'm missing a range variant of byLine that can operate on strings instead of just File.
This is such a common feature that I believe it should have its place in std.range.
My suggestion is to define this using splitter!(std.uni.isNewline), but I'm missing std.uni.isNewline.
I'm guessing the problem here is that newline separators can be 1 or 2 bytes long; that is, Separator must be of the same type as Range.
Should I add an overload in a PR?
Destroy.

September 10, 2014 Re: std.range.byLine
Posted in reply to Nordlöw | On Wednesday, 10 September 2014 at 21:06:30 UTC, Nordlöw wrote:
> This is such a common feature so I believe it should have its place in std.range.
Or some other Phobos module.

September 10, 2014 Re: std.range.byLine
Posted in reply to Nordlöw | On 09/10/2014 02:06 PM, "Nordlöw" wrote:
> I'm missing a range variant of byLine that can operate on strings
> instead of just File.
>
> This is such a common feature so I believe it should have its place in
> std.range.
>
> My suggestion is to define this using
>
> splitter!(std.uni.isNewline)
>
> but I'm missing std.uni.isNewline.
>
> I'm guessing the problem here is that newline separators can be 1 or 2
> bytes long; that is, Separator must be of the same type as Range.
>
> Should I add an overload in PR?
>
> Destroy.
There is std.ascii.newline. The following works where newline is '\n' e.g. on my Linux system. :)
import std.ascii;
import std.algorithm;
import std.range;

void main()
{
    assert("foo\nbar\n"
           .splitter(newline)
           .filter!(a => !a.empty)
           .equal([ "foo", "bar" ]));
}
Ali

September 10, 2014 Re: std.range.byLine
Posted in reply to Ali Çehreli | On Wednesday, 10 September 2014 at 22:29:55 UTC, Ali Çehreli wrote:
> assert("foo\nbar\n"
> .splitter(newline)
> .filter!(a => !a.empty)
> .equal([ "foo", "bar" ]));
> }
>
> Ali
Ok, great.
So I got.
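import std.range: isForwardRange, front;

auto byLine(Range)(Range input) if (isForwardRange!Range)
{
    import std.algorithm: splitter;
    import std.ascii: newline;
    static if (newline.length == 1)
    {
        // Single-character newline: split on the element itself.
        return input.splitter(newline.front);
    }
    else
    {
        // Multi-character newline (e.g. "\r\n"): split on the whole sequence.
        return input.splitter(newline);
    }
}
unittest
{
    import std.algorithm: equal;
    // Holds where std.ascii.newline is "\n" (e.g. on Linux).
    assert(equal("a\nb".byLine, ["a", "b"]));
}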
One thing still:
Is my optimization for newline.length == 1 unnecessary or perhaps even wrong?

September 10, 2014 Re: std.range.byLine
Posted in reply to Nordlöw | On Wednesday, 10 September 2014 at 22:45:08 UTC, Nordlöw wrote:
> auto byLine(Range)(Range input) if (isForwardRange!Range)
> {
> import std.algorithm: splitter;
> import std.ascii: newline;
> static if (newline.length == 1)
> {
> return input.splitter(newline.front);
> }
> else
> {
> return input.splitter(newline);
> }
> }
IMHO, this should be added to std.string and restricted to isSomeString. Should I do a PR?

September 11, 2014 Re: std.range.byLine
Posted in reply to Nordlöw | On Wednesday, 10 September 2014 at 23:01:44 UTC, Nordlöw wrote:
> On Wednesday, 10 September 2014 at 22:45:08 UTC, Nordlöw wrote:
>> auto byLine(Range)(Range input) if (isForwardRange!Range)
>> {
>> import std.algorithm: splitter;
>> import std.ascii: newline;
>> static if (newline.length == 1)
>> {
>> return input.splitter(newline.front);
>> }
>> else
>> {
>> return input.splitter(newline);
>> }
>> }
>
> IMHO, this should be added to std.string and restricted to isSomeString. Should I do a PR?
Well, the issue is that this isn't very portable for *reading*, as even on linux, you may read files with "\r\n" line endings (It's "standard" for csv files, for example), or read "\n" terminated files on windows.
The issue is that (currently) we don't have any splitter that operates on multiple needles. *That'd* be what needs to be written (probably not too hard either, since "find" already exists).
We also have splitLines, "http://dlang.org/phobos/std_string.html#.splitLines". Is that good enough for you by any chance? Or do you need it to actually be lazy?
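To make the portability concern concrete, here is a minimal sketch contrasting a single-needle splitter with the eager std.string.splitLines mentioned above. The sample string is made up for illustration, and the sketch assumes a reasonably recent Phobos.

```d
import std.algorithm : equal, splitter;
import std.string : splitLines;

void main()
{
    // Hypothetical input with Windows-style "\r\n" line endings.
    auto text = "foo\r\nbar\r\n";

    // Splitting on '\n' alone leaves the '\r' attached to each line,
    // and the trailing terminator yields a final empty element.
    assert(text.splitter('\n').equal(["foo\r", "bar\r", ""]));

    // splitLines recognizes "\r\n", "\r" and "\n", but it is eager:
    // it builds the whole array of lines up front.
    assert(text.splitLines == ["foo", "bar"]);
}
```

So splitLines gives the right lines, at the cost of allocating all of them at once, which is the trade-off raised in the question about laziness.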

September 11, 2014 Re: std.range.byLine
Posted in reply to monarch_dodra | On Thursday, 11 September 2014 at 10:19:17 UTC, monarch_dodra wrote:
> Well, the issue is that this isn't very portable for *reading*, as even on linux, you may read files with "\r\n" line endings (It's "standard" for csv files, for example), or read "\n" terminated files on windows.
> The issue is that (currently) we don't have any splitter that operates on multiple needles. *That'd* be what needs to be written (probably not too hard either, since "find" already exists).
Good idea. So it's "just" a matter of extending splitter with std.algorithm.find with these three keys:
- \n
- \r
- \r\n
then? Or are there more encodings to choose from?
> We also have splitLines, "http://dlang.org/phobos/std_string.html#.splitLines". Is that good enough for you by any chance? Or do you need it to actually be lazy?
Laziness is good in this case because my input files are gigabytes in size :)
I'm playing around with single-pass parsing of ConceptNet5 CSV files at https://github.com/nordlow/justd/blob/master/conceptnet5.d

September 11, 2014 Re: std.range.byLine
Posted in reply to Nordlöw | On Thursday, 11 September 2014 at 20:03:26 UTC, Nordlöw wrote:
> On Thursday, 11 September 2014 at 10:19:17 UTC, monarch_dodra wrote:
>> Well, the issue is that this isn't very portable for *reading*, as even on linux, you may read files with "\r\n" line endings (It's "standard" for csv files, for example), or read "\n" terminated files on windows.
>> The issue is that (currently) we don't have any splitter that operates on multiple needles. *That'd* be what needs to be written (probably not too hard either, since "find" already exists).
>
> Good idea. So its "just" a matter of extending splitter with std.algorithm.find with these three keys:
> - \n
> - \r
> - \r\n
> then? Or are there more encodings to choose from?
Hum... no, those are the correct splitting elements. However, I don't think that would actually work, as "find" will favor the first whole element to match as a "hit", so "\r\n" will never be hit (rather, it will be hit twice, in the form of two individual line breaks, "\r" and "\n").
Bummer...

September 11, 2014 Re: std.range.byLine
Posted in reply to monarch_dodra | On Thursday, 11 September 2014 at 21:29:16 UTC, monarch_dodra wrote:
> Hum... no, those are the correct splitting elements. However, I don't think that would actually work, as "find" will privilege the first whole element to match as a "hit", so "\r\n" never be hit (rather, it will be hit twice, in the form of two individual line breaks `\r` and '\n').
>
> Bummer...
So why not simply change the order of the keys to
- \r\n
- \r
- \n
then?
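Whether reordering helps can be checked directly. As far as I can tell from the std.algorithm.find documentation, when several needles match at the same position the shortest one wins, regardless of argument order, so reordering alone would probably not be enough. A minimal sketch, with an input string chosen purely for illustration:

```d
import std.algorithm : find;

void main()
{
    // find with several needles returns a tuple: the haystack advanced to
    // the match, and the 1-based index of the needle that matched.

    // Original key order: "\n", "\r", "\r\n".
    auto a = find("a\r\nb", "\n", "\r", "\r\n");
    assert(a[0] == "\r\nb");
    assert(a[1] == 2); // "\r" matched, not "\r\n"

    // Reordered keys: "\r\n", "\r", "\n".
    auto b = find("a\r\nb", "\r\n", "\r", "\n");
    assert(b[0] == "\r\nb");
    assert(b[1] == 2); // still "\r", the shorter needle at that position
}
```

If this reading is right, a caller would still have to peek at the next element after a "\r" hit to treat "\r\n" as a single line break.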

September 11, 2014 Re: std.range.byLine
Posted in reply to monarch_dodra | On Thursday, 11 September 2014 at 21:29:16 UTC, monarch_dodra wrote:
> Bummer...
Anyway, it shouldn't be too hard to express this in a new range.
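One way such a range could look is sketched below. LineSplitter is a hypothetical name, not anything in Phobos, and the sketch is restricted to string input. It treats "\r\n", "\r" and "\n" as line breaks and produces slices of the input lazily, one line per popFront.

```d
import std.algorithm : equal;

/// Hypothetical lazy line range over a string (illustration only).
struct LineSplitter
{
    private string rest; // input not yet consumed
    private string line; // current line, without its terminator
    private bool done;   // true once the input is exhausted

    this(string input)
    {
        rest = input;
        popFront(); // prime the first line
    }

    @property bool empty() { return done; }
    @property string front() { return line; }

    void popFront()
    {
        if (rest.length == 0)
        {
            done = true;
            return;
        }
        // Scanning code units is safe here: '\r' and '\n' are ASCII and
        // never occur inside a multi-byte UTF-8 sequence.
        size_t i = 0;
        while (i < rest.length && rest[i] != '\r' && rest[i] != '\n')
            ++i;
        line = rest[0 .. i];
        if (i < rest.length)
        {
            ++i; // skip the terminator
            // Treat "\r\n" as a single line break.
            if (rest[i - 1] == '\r' && i < rest.length && rest[i] == '\n')
                ++i;
        }
        rest = rest[i .. $];
    }
}

void main()
{
    assert(equal(LineSplitter("a\r\nb\nc\rd"), ["a", "b", "c", "d"]));
    assert(equal(LineSplitter("a\n\nb\n"), ["a", "", "b"]));
    assert(LineSplitter("").empty);
}
```

Because each line is a slice of the original string, nothing is copied. A trailing terminator does not produce an extra empty line, which is a design choice made for this sketch.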